Lately I’ve been working on an 8080 emulator in Swift. The process of having to take a binary apart and figure out which byte is associated with which instruction has gotten me interested in programs and how they work under the hood.
Instead of doing some private research and calling it a day, I wanted to collect what I've learned in one place.
What Are Binaries?
A binary is nothing more than an executable chunk of data arranged in a certain way. In fact, everything on a computer is a chunk of data. Images, music, movies – these are all just sets of ones and zeros arranged in such a way that they can produce some meaningful output when interpreted in a certain way.
Understanding Binaries
As mentioned, a binary is just data. Nothing but a long series of ones and zeros, the same as everything else on a hard disk. Here’s the trick, though: ones and zeros can mean different things based on how you interpret them.
Take the following series of bits: 0100 0001 0100 0010. A byte is 8 bits, so we're looking at two bytes. If you were to consult a hex editor, it would take each nibble (half a byte, or 4 bits) and display it as a single hex digit. It would tell you that this particular string represents 4142. It's not wrong: this string is definitely 4142 in hexadecimal… but it can mean something entirely different depending on who you ask.
A text editor would take each byte and look up which character it represents. By consulting an ASCII table, we see that 41 in hex is A, and 42 is B. So a text editor's verdict would be that this particular binary string represents AB. Just as the hex editor is correct in reading it as 4142, the text editor is also correct. It's all a matter of interpretation.
If the data wasn’t intended to be human-readable, the result will be mostly garbled. You’ve probably tried to open up a program in a text editor before and seen this yourself.
So, we know about the hex editor’s opinion, as well as the text editor’s. But what would a CPU think? Now things are getting interesting. Before we delve down that rabbit hole, let’s build a binary of our very own.
Building Our Own Binary
With the following source code, and a bit of help from a compiler, we can produce a binary which can then be executed by our computer.
#include <stdio.h>

int main(void) {
    printf("Hello, world!\n");
    return 0;
}
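The command I used was
gcc -fno-stack-protector -D_FORTIFY_SOURCE=0 -g -o hello_world hello_world.c
which asks gcc to produce a binary named hello_world from the hello_world.c source file. The extra arguments make the output a little easier to deal with: -fno-stack-protector and -D_FORTIFY_SOURCE=0 switch off two hardening features that would otherwise add extra checking code to the binary, and -g includes debugging information.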
Alright, we've got our binary! Take it for a spin with ./hello_world. The output probably won't surprise you.
Now that we have our program compiled, we can try to read the result. If you’re anything like me, you prefer reading English characters – a string of binary digits isn’t very helpful. A common way of reading a program after it has been compiled is by converting it to hex. Your computer will likely have a program for this exact purpose: hexdump.
Hexdumps on their own aren't that useful unless you know what to look for. A disassembler, which translates the bytes back into assembly instructions, is often a better way to understand a compiled program.
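To try out hexdump, let's run our binary through it with the command hexdump -C hello_world (the -C flag selects the "canonical" display: hex bytes on the left, printable characters on the right). It's a little on the lengthy side, but it's worth looking over the entire thing.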
00000000 cf fa ed fe 07 00 00 01 03 00 00 80 02 00 00 00 |................|
00000010 10 00 00 00 10 05 00 00 85 00 20 00 00 00 00 00 |.......... .....|
00000020 19 00 00 00 48 00 00 00 5f 5f 50 41 47 45 5a 45 |....H...__PAGEZE|
00000030 52 4f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |RO..............|
00000040 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 19 00 00 00 28 02 00 00 |............(...|
00000070 5f 5f 54 45 58 54 00 00 00 00 00 00 00 00 00 00 |__TEXT..........|
00000080 00 00 00 00 01 00 00 00 00 10 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |................|
000000a0 07 00 00 00 05 00 00 00 06 00 00 00 00 00 00 00 |................|
000000b0 5f 5f 74 65 78 74 00 00 00 00 00 00 00 00 00 00 |__text..........|
000000c0 5f 5f 54 45 58 54 00 00 00 00 00 00 00 00 00 00 |__TEXT..........|
000000d0 40 0f 00 00 01 00 00 00 2a 00 00 00 00 00 00 00 |@.......*.......|
000000e0 40 0f 00 00 04 00 00 00 00 00 00 00 00 00 00 00 |@...............|
000000f0 00 04 00 80 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000100 5f 5f 73 74 75 62 73 00 00 00 00 00 00 00 00 00 |__stubs.........|
00000110 5f 5f 54 45 58 54 00 00 00 00 00 00 00 00 00 00 |__TEXT..........|
00000120 6a 0f 00 00 01 00 00 00 06 00 00 00 00 00 00 00 |j...............|
00000130 6a 0f 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |j...............|
00000140 08 04 00 80 00 00 00 00 06 00 00 00 00 00 00 00 |................|
00000150 5f 5f 73 74 75 62 5f 68 65 6c 70 65 72 00 00 00 |__stub_helper...|
00000160 5f 5f 54 45 58 54 00 00 00 00 00 00 00 00 00 00 |__TEXT..........|
00000170 70 0f 00 00 01 00 00 00 1a 00 00 00 00 00 00 00 |p...............|
00000180 70 0f 00 00 02 00 00 00 00 00 00 00 00 00 00 00 |p...............|
00000190 00 04 00 80 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001a0 5f 5f 63 73 74 72 69 6e 67 00 00 00 00 00 00 00 |__cstring.......|
000001b0 5f 5f 54 45 58 54 00 00 00 00 00 00 00 00 00 00 |__TEXT..........|
000001c0 8a 0f 00 00 01 00 00 00 0f 00 00 00 00 00 00 00 |................|
000001d0 8a 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001e0 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001f0 5f 5f 75 6e 77 69 6e 64 5f 69 6e 66 6f 00 00 00 |__unwind_info...|
00000200 5f 5f 54 45 58 54 00 00 00 00 00 00 00 00 00 00 |__TEXT..........|
00000210 9c 0f 00 00 01 00 00 00 48 00 00 00 00 00 00 00 |........H.......|
00000220 9c 0f 00 00 02 00 00 00 00 00 00 00 00 00 00 00 |................|
00000230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000240 5f 5f 65 68 5f 66 72 61 6d 65 00 00 00 00 00 00 |__eh_frame......|
00000250 5f 5f 54 45 58 54 00 00 00 00 00 00 00 00 00 00 |__TEXT..........|
00000260 e8 0f 00 00 01 00 00 00 18 00 00 00 00 00 00 00 |................|
00000270 e8 0f 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
00000280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000290 19 00 00 00 e8 00 00 00 5f 5f 44 41 54 41 00 00 |........__DATA..|
000002a0 00 00 00 00 00 00 00 00 00 10 00 00 01 00 00 00 |................|
000002b0 00 10 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |................|
000002c0 00 10 00 00 00 00 00 00 07 00 00 00 03 00 00 00 |................|
000002d0 02 00 00 00 00 00 00 00 5f 5f 6e 6c 5f 73 79 6d |........__nl_sym|
000002e0 62 6f 6c 5f 70 74 72 00 5f 5f 44 41 54 41 00 00 |bol_ptr.__DATA..|
000002f0 00 00 00 00 00 00 00 00 00 10 00 00 01 00 00 00 |................|
00000300 10 00 00 00 00 00 00 00 00 10 00 00 03 00 00 00 |................|
00000310 00 00 00 00 00 00 00 00 06 00 00 00 01 00 00 00 |................|
00000320 00 00 00 00 00 00 00 00 5f 5f 6c 61 5f 73 79 6d |........__la_sym|
00000330 62 6f 6c 5f 70 74 72 00 5f 5f 44 41 54 41 00 00 |bol_ptr.__DATA..|
00000340 00 00 00 00 00 00 00 00 10 10 00 00 01 00 00 00 |................|
00000350 08 00 00 00 00 00 00 00 10 10 00 00 03 00 00 00 |................|
00000360 00 00 00 00 00 00 00 00 07 00 00 00 03 00 00 00 |................|
00000370 00 00 00 00 00 00 00 00 19 00 00 00 48 00 00 00 |............H...|
00000380 5f 5f 4c 49 4e 4b 45 44 49 54 00 00 00 00 00 00 |__LINKEDIT......|
00000390 00 20 00 00 01 00 00 00 00 10 00 00 00 00 00 00 |. ..............|
000003a0 00 20 00 00 00 00 00 00 30 02 00 00 00 00 00 00 |. ......0.......|
000003b0 07 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
000003c0 22 00 00 80 30 00 00 00 00 20 00 00 08 00 00 00 |"...0.... ......|
000003d0 08 20 00 00 18 00 00 00 00 00 00 00 00 00 00 00 |. ..............|
000003e0 20 20 00 00 10 00 00 00 30 20 00 00 30 00 00 00 |  ......0 ..0...|
000003f0 02 00 00 00 18 00 00 00 a8 20 00 00 0c 00 00 00 |......... ......|
00000400 78 21 00 00 b8 00 00 00 0b 00 00 00 50 00 00 00 |x!..........P...|
00000410 00 00 00 00 08 00 00 00 08 00 00 00 02 00 00 00 |................|
00000420 0a 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 |................|
00000430 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000440 68 21 00 00 04 00 00 00 00 00 00 00 00 00 00 00 |h!..............|
00000450 00 00 00 00 00 00 00 00 0e 00 00 00 20 00 00 00 |............ ...|
00000460 0c 00 00 00 2f 75 73 72 2f 6c 69 62 2f 64 79 6c |..../usr/lib/dyl|
00000470 64 00 00 00 00 00 00 00 1b 00 00 00 18 00 00 00 |d...............|
00000480 d4 9e ee 72 33 4e 3a 53 be c6 cb c8 c7 73 8b 83 |...r3N:S.....s..|
00000490 24 00 00 00 10 00 00 00 00 0a 0a 00 00 0a 0a 00 |$...............|
000004a0 2a 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 |*...............|
000004b0 28 00 00 80 18 00 00 00 40 0f 00 00 00 00 00 00 |(.......@.......|
000004c0 00 00 00 00 00 00 00 00 0c 00 00 00 38 00 00 00 |............8...|
000004d0 18 00 00 00 02 00 00 00 00 00 bd 04 00 00 01 00 |................|
000004e0 2f 75 73 72 2f 6c 69 62 2f 6c 69 62 53 79 73 74 |/usr/lib/libSyst|
000004f0 65 6d 2e 42 2e 64 79 6c 69 62 00 00 00 00 00 00 |em.B.dylib......|
00000500 26 00 00 00 10 00 00 00 60 20 00 00 08 00 00 00 |&.......` ......|
00000510 29 00 00 00 10 00 00 00 68 20 00 00 00 00 00 00 |).......h ......|
00000520 2b 00 00 00 10 00 00 00 68 20 00 00 40 00 00 00 |+.......h ..@...|
00000530 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000f40 55 48 89 e5 48 83 ec 10 48 8d 3d 3b 00 00 00 c7 |UH..H...H.=;....|
00000f50 45 fc 00 00 00 00 b0 00 e8 0d 00 00 00 31 c9 89 |E............1..|
00000f60 45 f8 89 c8 48 83 c4 10 5d c3 ff 25 a0 00 00 00 |E...H...]..%....|
00000f70 4c 8d 1d 91 00 00 00 41 53 ff 25 81 00 00 00 90 |L......AS.%.....|
00000f80 68 00 00 00 00 e9 e6 ff ff ff 48 65 6c 6c 6f 2c |h.........Hello,|
00000f90 20 77 6f 72 6c 64 21 0a 00 00 00 00 01 00 00 00 | world!.........|
00000fa0 1c 00 00 00 00 00 00 00 1c 00 00 00 00 00 00 00 |................|
00000fb0 1c 00 00 00 02 00 00 00 40 0f 00 00 34 00 00 00 |........@...4...|
00000fc0 34 00 00 00 6b 0f 00 00 00 00 00 00 34 00 00 00 |4...k.......4...|
00000fd0 03 00 00 00 0c 00 01 00 10 00 01 00 00 00 00 00 |................|
00000fe0 00 00 00 01 00 00 00 00 14 00 00 00 00 00 00 00 |................|
00000ff0 01 7a 52 00 01 78 10 01 10 0c 07 08 90 01 00 00 |.zR..x..........|
00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001010 80 0f 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
00001020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00002000 11 22 10 51 00 00 00 00 11 40 64 79 6c 64 5f 73 |.".Q.....@dyld_s|
00002010 74 75 62 5f 62 69 6e 64 65 72 00 51 72 00 90 00 |tub_binder.Qr...|
00002020 72 10 11 40 5f 70 72 69 6e 74 66 00 90 00 00 00 |r..@_printf.....|
00002030 00 01 5f 00 05 00 02 5f 6d 68 5f 65 78 65 63 75 |.._...._mh_execu|
00002040 74 65 5f 68 65 61 64 65 72 00 21 6d 61 69 6e 00 |te_header.!main.|
00002050 25 02 00 00 00 03 00 c0 1e 00 00 00 00 00 00 00 |%...............|
00002060 c0 1e 00 00 00 00 00 00 fa de 0c 05 00 00 00 3c |...............<|
00002070 00 00 00 01 00 00 00 00 00 00 00 14 fa de 0c 00 |................|
00002080 00 00 00 28 00 00 00 01 00 00 00 06 00 00 00 02 |...(............|
00002090 00 00 00 0b 6c 69 62 53 79 73 74 65 6d 2e 42 00 |....libSystem.B.|
000020a0 00 00 00 03 00 00 00 00 02 00 00 00 64 00 00 00 |............d...|
000020b0 00 00 00 00 00 00 00 00 27 00 00 00 64 00 00 00 |........'...d...|
000020c0 00 00 00 00 00 00 00 00 35 00 00 00 66 03 01 00 |........5...f...|
000020d0 f6 76 34 55 00 00 00 00 01 00 00 00 2e 01 00 00 |.v4U............|
000020e0 40 0f 00 00 01 00 00 00 7b 00 00 00 24 01 00 00 |@.......{...$...|
000020f0 40 0f 00 00 01 00 00 00 01 00 00 00 24 00 00 00 |@...........$...|
00002100 2a 00 00 00 00 00 00 00 01 00 00 00 4e 01 00 00 |*...........N...|
00002110 2a 00 00 00 00 00 00 00 01 00 00 00 64 01 00 00 |*...........d...|
00002120 00 00 00 00 00 00 00 00 81 00 00 00 0f 01 10 00 |................|
00002130 00 00 00 00 01 00 00 00 95 00 00 00 0f 01 00 00 |................|
00002140 40 0f 00 00 01 00 00 00 9b 00 00 00 01 00 00 01 |@...............|
00002150 00 00 00 00 00 00 00 00 a3 00 00 00 01 00 00 01 |................|
00002160 00 00 00 00 00 00 00 00 0a 00 00 00 0b 00 00 00 |................|
00002170 00 00 00 40 0a 00 00 00 20 00 2f 55 73 65 72 73 |...@.... ./Users|
00002180 2f 73 61 6d 73 79 6d 6f 6e 73 2f 44 65 73 6b 74 |/samsymons/Deskt|
00002190 6f 70 2f 48 65 6c 6c 6f 57 6f 72 6c 64 2f 00 68 |op/HelloWorld/.h|
000021a0 65 6c 6c 6f 5f 77 6f 72 6c 64 2e 63 00 2f 76 61 |ello_world.c./va|
000021b0 72 2f 66 6f 6c 64 65 72 73 2f 30 37 2f 33 7a 62 |r/folders/07/3zb|
000021c0 31 5f 6e 70 78 30 63 71 33 70 66 5f 6e 34 33 37 |1_npx0cq3pf_n437|
000021d0 63 6e 5f 35 72 30 30 30 30 67 6e 2f 54 2f 68 65 |cn_5r0000gn/T/he|
000021e0 6c 6c 6f 5f 77 6f 72 6c 64 2d 38 31 38 34 37 30 |llo_world-818470|
000021f0 2e 6f 00 5f 6d 61 69 6e 00 5f 5f 6d 68 5f 65 78 |.o._main.__mh_ex|
00002200 65 63 75 74 65 5f 68 65 61 64 65 72 00 5f 6d 61 |ecute_header._ma|
00002210 69 6e 00 5f 70 72 69 6e 74 66 00 64 79 6c 64 5f |in._printf.dyld_|
00002220 73 74 75 62 5f 62 69 6e 64 65 72 00 00 00 00 00 |stub_binder.....|
00002230
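Because I compiled this on Mac OS X, I know from experience that the binary will be in the Mach-O format. I have trust issues, though, so let's check anyway. According to this Mach-O format poster, a Mach-O file starts with a 4-byte magic number: 0xFEEDFACE for 32-bit binaries, or 0xFEEDFACF for 64-bit ones. Intel CPUs are little-endian, storing the lowest byte first, so in the file these show up as CE FA ED FE or CF FA ED FE respectively. A quick comparison with the hexdump (CF FA ED FE) confirms it: this is a 64-bit Mach-O binary!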
A nice feature of hexdump -C is that it checks whether each byte is printable: if it is, the character is shown in the right-hand column, and if it isn't, a period is shown instead. This is why you can see "Hello, world!" right there in the hexdump.
A binary format is a way of laying out a program. Mach-O is the format Macs use for their programs, so that each one can be understood by the operating system. Linux uses a different format (ELF), and Windows uses yet another (PE, the Portable Executable format).
How CPUs Interpret Data
Previously, I asked how a CPU might interpret strings of ones and zeros. It works much like a text editor: it reads a byte, then decides what to do with it. Where a text editor decides which character to display, a CPU decides which instruction to execute, and the rules it follows are defined by its instruction set.
Each CPU family has its own instruction set: 8080 processors have one, 6502 processors have another. An instruction set defines everything that a CPU can do; depending on the CPU, there are instructions for addition, multiplication, manipulating data, encryption, and much more. Every program you write is ultimately distilled into these instructions.
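To read our program the way the CPU does, start at the text section (see the Mach-O poster for how it's laid out). In our binary, the __text section header gives a file offset of 0xF40, and the first byte there, 0x55, is the x86-64 instruction push rbp. From each opcode you can look up which instruction will be executed and how many bytes of operands follow it. This is much easier on the 8080, whose instructions are one to three bytes long, than on x86-64, where a single instruction can be anywhere from one to fifteen bytes.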
There’s not much more to say here without getting into assembly programming, which is a phenomenally large topic on its own. If you want to learn more, check out Programming From The Ground Up for a nice assembly intro.
If you're curious, Intel publishes its architecture manuals as free PDFs. Never fear, new programmers: for most everyday programming jobs, you'll rarely, if ever, need to look at these.
Wrapping Up
You've learned that binaries come in all shapes and sizes, as well as how to rip one apart to understand how it operates. Not bad for a day's work. Try writing a few programs of your own on different operating systems and architectures, so you can get a feel for the various binary formats out there.
If you really want to step up your game, build a binary and then try to edit it so that it does something else. This is how people pirate software and do all sorts of shady things (don’t be a jerk).