The simplest system call: exiting

From the previous post, we know how to make an x86-64 ELF program start, but how do we make it stop? Let's make a program that will exit, say with exit code 110.

In C, this would be easy to do:

$ cat >eleventy.c <<'EOF'

#include <stdlib.h>
int main(void) { exit(110); }

EOF
$ gcc eleventy.c -o eleventy
$ ./eleventy; echo $?
110

Our goal is to write some assembly code to do this, so we'll have to unwrap a few layers of niceness that the C standard library provides for us. The first stop is man 3 exit, which tells us that exit does a bunch of stuff and eventually calls _exit, with an underscore in front. The next stop is man 2 _exit, which tells us:

   C library/kernel differences
       In glibc up to version  2.3,  the  _exit()  wrapper
       function invoked the kernel system call of the same
       name.   Since  glibc  2.3,  the  wrapper   function
       invokes exit_group(2), in order to terminate all of
       the threads in a process.

I love these "C library/kernel differences" sections that are in the man pages for most system calls. Concise and useful!

Alright, so now we know we should call either the exit (number 60) or the exit_group (number 231) system call. (Those numbers come from either searching online or looking at the file arch/x86/entry/syscalls/syscall_64.tbl in the Linux kernel source tree.)
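If you'd rather look the numbers up programmatically, here is a minimal Python sketch. The two table lines are typed in by hand in the four-column layout that syscall_64.tbl uses (number, ABI, name, entry point). The entry-point column in particular differs between kernel versions, so treat the excerpt as an assumption and check your own kernel tree. The name parse_syscall_table is made up for this example.

```python
# Hand-copied excerpt in the layout of arch/x86/entry/syscalls/syscall_64.tbl.
# (Assumption: the exact entry-point names vary between kernel versions.)
EXCERPT = """\
# <number> <abi> <name> <entry point>
60   common  exit        sys_exit
231  common  exit_group  sys_exit_group
"""


def parse_syscall_table(text):
    """Map system call names to numbers, skipping comments and blank lines."""
    numbers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Only the first three columns matter here; split() accepts tabs or spaces.
        number, _abi, name = line.split()[:3]
        numbers[name] = int(number)
    return numbers


table = parse_syscall_table(EXCERPT)
print(table["exit"], hex(table["exit_group"]))
```

This prints 60 and 0xe7, the two numbers quoted above.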

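Next up, the specification in x86-64-psABI-1.0.pdf tells us on page 148 how to actually make a system call in assembly: the system call number goes in rax, the arguments go in rdi, rsi, rdx, r10, r8 and r9 (in that order), the syscall instruction enters the kernel, and the result comes back in rax. A result between -4095 and -1 means an error, and its negation is the error code.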

(Incidentally, this tells us that the whole errno thing in C is an abstraction added by C itself; in assembly, the error information is right there in the return value of the system call.)
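To make the errno remark concrete, here is a small Python sketch of what a C library wrapper does with the raw value that comes back in rax. It assumes the Linux convention that values from -4095 to -1 (read as a signed 64-bit number) are negated error codes. The name decode_syscall_return is invented for this example.

```python
import errno


def decode_syscall_return(rax):
    """Split a raw 64-bit rax value into (result, error_code).

    Assumes the Linux convention: -4095..-1 means failure, and the
    negated value is the error code. Anything else is a plain result.
    """
    # Reinterpret the unsigned 64-bit register contents as a signed number.
    signed = rax - (1 << 64) if rax >= (1 << 63) else rax
    if -4095 <= signed <= -1:
        return -1, -signed      # what C reports as "-1, with errno set"
    return signed, 0


print(decode_syscall_return(3))                         # a successful result
print(decode_syscall_return((1 << 64) - errno.ENOENT))  # a raw -ENOENT
```

The first call is a plain success. The second shows how a raw negative value turns into the familiar "-1 plus errno" pair.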

Summarizing, the only code we need to exit with 110 is:

mov rax, 0xe7    # system call number 231 (decimal) = e7 (hex)
mov rdi, 0x6e    # exit code 110 (decimal) = 6e (hex)
syscall

How to move constants into registers

According to the AMD64 Architecture Programmer's Manual, Volume 3, page 421, it's pretty easy to encode the syscall instruction: 0f 05

Just the two bytes 0f 05 and we're done.

It's a bit harder to encode the mov instructions, because there are so many ways of doing it. There are eight variants just for moving a constant into a register (page 225):

[Figure: eight forms of the mov instruction]

If you look closely at the encoding, though, you'll notice that the encodings for the 16-bit, 32-bit and 64-bit variants look the same on the binary level. That can't be right! How does the processor know the size of the operands? The answer is hidden in a few other tables and figures and sections in the manual, which specify how various instruction prefixes modify the meaning of instructions. Table 1-2 on page 8 tells us how:

[Table: how operand sizes are determined in each operating mode]

Our program will be operating in the 64-bit submode of long mode, so to use a 64-bit operand size, we need a REX prefix with REX.W = 1. After reading the flow-chart on page 2, section 1.2.7 "REX Prefix", and section 2.5.2 "Opcode Syntax", we can finally figure out that mov rax, 0xe7 can be encoded as:

48 b8 e7 00 00 00 00 00 00 00
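As a sanity check on that encoding, here is a minimal Python sketch that assembles it from its parts: a REX prefix with the W bit set, the opcode b8 plus the register number, and the constant as eight little-endian bytes. The helper names rex and mov_r64_imm64 are made up for this example, and only the eight "classic" registers are handled (r8 to r15 would also need REX.B).

```python
import struct

# Register numbers as they appear in instruction encodings (low 8 registers only).
REG = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
       "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}


def rex(w=0, r=0, x=0, b=0):
    """Build a REX prefix byte, which has the bit pattern 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


def mov_r64_imm64(reg, imm):
    """Encode `mov reg, imm` with a full 64-bit immediate (opcode B8+r)."""
    opcode = 0xB8 + REG[reg]
    # "<Q" is a little-endian unsigned 64-bit value: least significant byte first.
    return bytes([rex(w=1), opcode]) + struct.pack("<Q", imm)


print(mov_r64_imm64("rax", 0xE7).hex(" "))
```

Running it prints the same ten bytes as above, starting with 48 (REX.W = 1) and b8.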

But there's more than one way to do it! Let's take this as an invitation to do some premature optimization for code size. We can do the same thing in fewer bytes if we look at figure 2-3 and/or read the surrounding text:

[Figure: a table of the general-purpose registers and how they overlap]

This specifies that the 32-bit register eax lives in the lower half of the 64-bit register rax, and when we save a result to eax the processor sets the upper half of rax to zero automatically. Our constant has its upper half equal to zero, so we could use the encoding:

b8 e7 00 00 00

Can we do even better? Our constant fits into a single byte, so we could hope to just stuff it into the 8-bit register al, which lives in the lowest byte of rax. This would be encoded as:

b0 e7

However, as figure 2-3 above tells us, the rest of rax doesn't get set to zero when we put something in al. We could do that ourselves, and the section on the mov instruction on page 224 tells us: to initialize a register to 0, use the xor instruction with identical destination and source operands

After reading about the xor instruction and the encoding of ModRM bytes, we can encode xor eax, eax (which sets rax to zero) as:

31 c0

So this brings us down to four bytes for something roughly equivalent to mov rax, 0xe7:

31 c0 b0 e7
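Why the xor is needed is easy to see in a toy model of the register. The Python sketch below is only a model of the two write rules described above (a 32-bit write zeroes the upper half, an 8-bit write leaves everything else alone). The function names and the "garbage" starting value are invented for the example.

```python
MASK64 = (1 << 64) - 1


def write32(old, value):
    """Write a 32-bit register such as eax: the upper 32 bits become zero.

    The old 64-bit contents are deliberately ignored.
    """
    return value & 0xFFFFFFFF


def write8(old, value):
    """Write the low byte such as al: the other 56 bits are left untouched."""
    return (old & ~0xFF & MASK64) | (value & 0xFF)


rax = 0x1122334455667788               # pretend rax holds leftover garbage

only_al = write8(rax, 0xE7)            # mov al, 0xe7
cleared = write32(rax, rax ^ rax)      # xor eax, eax
both = write8(cleared, 0xE7)           # xor eax, eax; mov al, 0xe7

print(hex(only_al))
print(hex(both))
```

With mov al alone, the garbage in the upper seven bytes survives. With the xor first, rax ends up as exactly 0xe7.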

Can we do the same thing for the other value we have to set, mov rdi, 0x6e? Not really! Figure 2-3 tells us that the register dil, which lives in the lowest byte of register rdi, can only be used if we have a REX prefix. Any REX prefix will do, even the empty one, 40; without it the same opcode byte means bh, and with REX.B = 1 it would mean r15b. The prefix brings us back up to five bytes:

31 ff 40 b7 6e

However, it turns out that we can encode the instruction mov edi, eax with only two bytes:

89 c7

This means that we can do the following sequence in just 10 bytes:

31 c0    # xor eax, eax
b0 6e    # mov al, 0x6e
89 c7    # mov edi, eax
b0 e7    # mov al, 0xe7
0f 05    # syscall

and the effect is essentially the same as the sequence we started with, which uses 22 bytes:

48 b8 e7 00 00 00 00 00 00 00    # mov rax, 0xe7
48 bf 6e 00 00 00 00 00 00 00    # mov rdi, 0x6e
0f 05                            # syscall

(It's only essentially the same because the 10-byte version affects the status flags before doing the syscall, which we don't care about here. Infinite minutiae!)
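The two register-to-register instructions in the short version (31 c0 and 89 c7) both get their second byte from the ModRM encoding. Here is a minimal Python sketch of how that byte is put together for the register-direct case only. The helper names are made up for this example, and memory operands, SIB bytes and REX extensions are ignored entirely.

```python
# Register numbers for the 32-bit names used here.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm(mod, reg, rm):
    """Pack the three ModRM fields: mod (2 bits), reg (3 bits), rm (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


def reg_to_reg(opcode, dst, src):
    """Encode an `op r/m32, r32` instruction with both operands in registers.

    mod = 0b11 selects register-direct mode; for these opcodes the
    destination goes in the rm field and the source goes in the reg field.
    """
    return bytes([opcode, modrm(0b11, REG32[src], REG32[dst])])


XOR_RM32_R32 = 0x31
MOV_RM32_R32 = 0x89

print(reg_to_reg(XOR_RM32_R32, "eax", "eax").hex(" "))  # xor eax, eax
print(reg_to_reg(MOV_RM32_R32, "edi", "eax").hex(" "))  # mov edi, eax
```

The two printed lines match the first and third instructions of the 10-byte sequence.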

Wrap it all up in an ELF file and we have a program that can make a system call:

$ cat >make-eleventy <<'EOF'
#!/usr/bin/python3

s = """
7f 45 4c 46 02 01 01 00
00 00 00 00 00 00 00 00
02 00 3e 00 01 00 00 00
78 00 01 00 00 00 00 00
40 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 40 00 38 00
01 00 40 00 00 00 00 00

01 00 00 00 05 00 00 00
78 00 00 00 00 00 00 00
78 00 01 00 00 00 00 00
00 00 00 00 00 00 00 00
0a 00 00 00 00 00 00 00
0a 00 00 00 00 00 00 00
00 10 00 00 00 00 00 00

31 c0 b0 6e 89 c7 b0 e7
0f 05
"""

with open('eleventy', 'wb') as f:
    f.write(bytes(int(b, 16) for b in s.split()))

EOF
$ chmod +x make-eleventy
$ ./make-eleventy
$ chmod +x eleventy
$ ./eleventy
$ echo $?
110
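The hex dump is compact but not very readable, so here is a sketch of the same 130 bytes built field by field with Python's struct module. The field names in the comments are the standard ones from the ELF specification. The constant names (BASE, CODE_OFFSET and so on) are mine, and BASE = 0x10000 is simply what the entry point 0x10078 implies. It builds the image in memory rather than writing a file.

```python
import struct

BASE = 0x10000                       # load address implied by the entry point
EHDR_SIZE, PHDR_SIZE = 0x40, 0x38    # sizes of the two headers
CODE = bytes.fromhex("31c0b06e89c7b0e70f05")
CODE_OFFSET = EHDR_SIZE + PHDR_SIZE  # 0x78: the code follows both headers

elf_header = struct.pack(
    "<16sHHIQQQIHHHHHH",
    b"\x7fELF\x02\x01\x01",  # e_ident: magic, 64-bit, little-endian, version 1
                             # (struct pads it with zeros to 16 bytes)
    2,                       # e_type: executable
    0x3E,                    # e_machine: x86-64
    1,                       # e_version
    BASE + CODE_OFFSET,      # e_entry: address of the first instruction
    EHDR_SIZE,               # e_phoff: program header right after this header
    0,                       # e_shoff: no section headers
    0,                       # e_flags
    EHDR_SIZE,               # e_ehsize
    PHDR_SIZE,               # e_phentsize
    1,                       # e_phnum
    0x40,                    # e_shentsize
    0,                       # e_shnum
    0,                       # e_shstrndx
)

program_header = struct.pack(
    "<IIQQQQQQ",
    1,                       # p_type: PT_LOAD
    5,                       # p_flags: read + execute
    CODE_OFFSET,             # p_offset
    BASE + CODE_OFFSET,      # p_vaddr
    0,                       # p_paddr
    len(CODE),               # p_filesz
    len(CODE),               # p_memsz
    0x1000,                  # p_align
)

image = elf_header + program_header + CODE
print(len(image))
```

Writing image to a file and marking it executable should behave like the eleventy built above, since the bytes are the same.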

What's the point?