Ilya Grigorik from the web performance team at Google had a great write-up of the upcoming preconnect feature in Chrome and Firefox a few weeks ago.
This is the kind of thing which I really love to see implemented in browsers: small, subtle features which yield nice performance boosts.
Whenever I play with MITM attacks against my devices on the local network, it always feels like I’m just putting together pieces of Lego until I find a combination of software that does what I want.
BetterCap has pretty much put an end to this problem – earlier I ran a MITM attack against my iPhone, modifying HTTP traffic in about a dozen lines of code.
After taking Stanford’s introductory networking course earlier this year, I decided it would be a fun exercise to put some of that knowledge into practice and go about recreating traceroute in Rust. Traceroute is a neat little program; it ties together a bunch of networking protocols in a relatively simple way, making for a good test of a language’s networking APIs.
A Brief Overview of Network Protocols
If you’ve done any basic network programming in the past, you’re likely familiar with the IP protocol, along with its partners, TCP and UDP. The IP protocol is one of the most important protocols ever created; it is the basis of the Internet, as it allows for data to be transferred from point to point.
IP, the Internet Protocol, provides a base level of functionality for packets. It allows for packet expiration (known as Time To Live, or TTL), error detection via a header checksum (networks are notoriously unreliable), and routing between IP addresses. Because these properties are so important to the operation of the Internet, TCP and UDP are implemented on top of IP.
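To make the checksum idea a little more concrete, here is a minimal sketch of the 16-bit one's-complement checksum that IPv4 uses over its header. The sample header bytes are made up for illustration and do not come from the capture later in this article.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Illustrative 20-byte IPv4 header with the checksum field (bytes 10-11) zeroed.
sample_header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(sample_header)

# A receiver verifies by summing the header with the checksum filled in;
# an undamaged header yields 0.
filled = sample_header[:10] + struct.pack("!H", checksum) + sample_header[12:]
print(hex(checksum), internet_checksum(filled))
```

Note that this only detects damage to the header. Nothing in it can repair a corrupted packet, which is simply dropped.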
IP forgoes many sophisticated features in favor of simplicity. It has no guarantee of delivery, no promise to deliver packets in the correct sequence, and it doesn’t do anything to prevent the network being flooded. It does the bare minimum that is required of it – anything more is left to higher level protocols.
Coming back to the TTL concept for a second, one consequence of the Internet being a vast web of interconnected devices is the possibility of infinite loops. Router A might decide that Router B is the next best option for a packet, which in turn might choose Router C as the place to go. Router C could then pick Router A, leaving a packet stuck in an endless cycle. To work around this possibility, IP packets can only live for so long. Routers will decrement their TTL fields before forwarding them, meaning that their TTL field will eventually hit 0, causing them to be dropped before they can loop around any further.
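The loop described above is easy to model. This is a toy simulation, not real routing code: the next_hop table and the router names are hypothetical, and each router simply decrements the TTL before passing the packet on.

```python
def forward(ttl, next_hop, start, destination):
    """Follow the next_hop table until the packet arrives or its TTL runs out."""
    router = start
    path = [router]
    while router != destination:
        ttl -= 1  # every router decrements the TTL before forwarding
        if ttl <= 0:
            return path, f"dropped at {router}"
        router = next_hop[router]
        path.append(router)
    return path, "delivered"

# A -> B -> C -> A is a routing loop; destination "D" is never reachable.
routing_loop = {"A": "B", "B": "C", "C": "A"}
path, outcome = forward(5, routing_loop, "A", "D")
print(" -> ".join(path), "|", outcome)
```

Without the TTL check, the while loop would never terminate, which is exactly the situation the TTL field exists to prevent.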
UDP is the simpler of the UDP/TCP pair. It’s so simple, in fact, that it only adds a couple extra features to IP: source and destination ports. (It also has a couple of fields to indicate the length and checksum, in addition to IP’s own length and checksum.)
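As a rough illustration of how small the UDP header is, the sketch below packs its four 16-bit fields with Python's struct module. The port numbers are borrowed from the capture later in this article; the helper name is my own, and the checksum is left at zero (meaning "not computed" in IPv4) to keep the example short.

```python
import struct

UDP_HEADER_LENGTH = 8  # four 16-bit fields

def build_udp_header(src_port, dst_port, payload, checksum=0):
    """Pack source port, destination port, length and checksum in network order."""
    length = UDP_HEADER_LENGTH + len(payload)  # length covers header + payload
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

header = build_udp_header(45462, 33435, b"")
src_port, dst_port, length, checksum = struct.unpack("!HHHH", header)
print(len(header), src_port, dst_port, length, checksum)
```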
UDP, like IP, is connectionless. When you want to send a UDP datagram to another host, you create it, send it, and move on. Once that datagram is out the door, you can forget about it. A consequence of this is that datagrams can get lost along the way, and you won’t know about it; perhaps they get corrupted on the wire, or one of the intermediary routers is too busy to process them and discards them.
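Here is a small sketch of that fire-and-forget behavior using Python's standard socket module. It assumes a working loopback interface, and both ends live in the same process purely so the example is self-contained. The function name is hypothetical.

```python
import socket

def send_and_receive(message: bytes):
    """Send one datagram over loopback and try to read it back."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(1.0)
        address = receiver.getsockname()

        # No handshake and no acknowledgment: sendto() returns as soon as
        # the datagram has been handed to the OS.
        sender.sendto(message, address)

        try:
            data, _source = receiver.recvfrom(1024)
        except socket.timeout:
            data = None  # the sender would never find out about a loss
        return data
    finally:
        sender.close()
        receiver.close()

received = send_and_receive(b"hello")
print(received)
```

On loopback the datagram will almost certainly arrive, but nothing in the sending code depends on that, which is the whole point.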
That may sound a little strange. Why send data if you can’t even guarantee that it’s going to arrive? In some cases, this is exactly what you want. Multiplayer games use UDP at the transport layer because the data is so short lived. If a packet gets dropped, there isn’t enough time to resend it, so it would be best to just send a new packet with newer data. Another major user of UDP is WebRTC. If you have a video frame which gets dropped during transmission, it makes sense to just let a newer frame go through, rather than try to resend the old one.
While UDP has its uses, most of the time you’re going to want to know that your data has been successfully received. TCP, the Transmission Control Protocol, does exactly this. While UDP is very minimal, TCP has been built up over the years to include astounding levels of performance optimization and reliability.
TCP is built around the idea of a connection. Before any real data is transmitted, TCP has to set up a connection with its destination, which it does using its three-way handshake. The sender, perhaps your laptop, will send a SYN (synchronization) packet to its destination, which will respond with a SYN-ACK (synchronization + acknowledgment) packet. The sender will send the final ACK packet along with the first portion of data. Packets can include data along with ACK flags, to save on the total number of packets sent.
Once the connection is established, TCP uses a number of techniques to ensure that data is transmitted in the correct order, while checking that it is not flooding the network with too much traffic. The algorithms used to control network traffic are extremely clever; they’re outside the scope of this article, but I definitely recommend reading up on them.
The Internet Control Message Protocol, ICMP, is not as famous as TCP or UDP. As useful as it is, it typically isn’t the concern of application-level network programmers.
The ICMP protocol is mainly used to send diagnostic information between devices. If a packet could not be delivered to its destination, for example, you would get back an ICMP packet of the type
What Is Traceroute?
When you send a packet to a host, you have no knowledge of how it got from your machine to the target. The complexity of the Internet is entirely abstracted away from you as a programmer – you don’t need to know how a packet got from A to B, it only matters whether or not it did (and to protocols like UDP, even that doesn’t matter).
Sometimes, though, you need to know how packets are traversing the network. If your Internet connection goes down, it would be handy to know where packets are being tripped up. The traceroute program can figure out the exact path taken to send data from you to your target by cleverly harnessing existing protocols like UDP and ICMP.
The key to traceroute’s operation is the TTL field. Every time a router processes a packet, it decreases the TTL value by one – if it decrements this field and notices that the value has reached 0, it won’t send it any further. Instead, it will send back an ICMP error packet to let you know that your packet died en-route. By sending packets out into the ether using a gradually increasing TTL value, traceroute can inspect the packets which come back and use their source IP address to learn where they came from.
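The probing loop can be sketched without touching the network at all. In this toy model the path is just a list of hops ending at the target; the first two addresses match the capture later in the article, while the remaining hop names are placeholders I made up. A real implementation would send UDP probes and read ICMP replies instead of calling send_probe.

```python
def send_probe(path, ttl):
    """Simulate one probe and return (responder, icmp_type)."""
    if ttl < len(path):
        # The TTL hits 0 at an intermediate router, which reports back.
        return path[ttl - 1], "time-exceeded"
    # The probe survived long enough to reach the target itself.
    return path[-1], "port-unreachable"

def traceroute(path, max_hops=30):
    """Raise the TTL one step at a time until the target answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, icmp_type = send_probe(path, ttl)
        hops.append((ttl, responder))
        if icmp_type == "port-unreachable":
            break
    return hops

# Hypothetical path: only the first two hops come from the real capture.
path = ["192.168.1.254", "10.29.242.1", "isp-router-1", "isp-router-2", "target"]
hops = traceroute(path)
for ttl, responder in hops:
    print(ttl, responder)
```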
Merely reading about a topic isn’t nearly enough to truly learn it. To cement this knowledge, you’ll need practical experience with network protocols. I’m going to use the command-line component of Wireshark, tshark, to watch traceroute as it figures out a path from my local IP to
Wireshark is a program which can capture local network traffic for analysis. You can use it to determine which devices are flooding a LAN, or perhaps which server a certain program is trying to reach. It also allows for some pretty elaborate filtering, to allow you to strip out any irrelevant traffic.
This isn’t meant to be a complete introduction to Wireshark, so you may want to check out the man page for tshark to get a better idea of what’s happening here.
Capturing The Data
To capture data with tshark, you need two things: a network interface, and root privileges. To start, I’m going to look up the target’s IP address with dig samsymons.com. Because this website has multiple A records, I want to find out one of its IP addresses ahead of time, so it can be used as a capture filter when reading packets.
› dig samsymons.com
; <<>> DiG 9.8.3-P1 <<>> samsymons.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13652
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;samsymons.com. IN A
;; ANSWER SECTION:
samsymons.com. 300 IN A 22.214.171.124
samsymons.com. 300 IN A 126.96.36.199
;; Query time: 54 msec
;; SERVER: 192.168.1.254#53(192.168.1.254)
;; WHEN: Wed Jul 22 15:29:48 2015
;; MSG SIZE rcvd: 63
The answer section of the query shows two IP addresses. I’ll arbitrarily choose 188.8.131.52 to use with tshark. Running tshark by itself will capture all network traffic and will not display the TTL of each packet, so a bit of command line magic is required.
sudo tshark -f "icmp or host 184.108.40.206" -T fields \
-e frame.number -e frame.time_relative -e ip.src -e ip.dst \
-e _ws.col.Protocol -e ip.ttl -e _ws.col.Info -E occurrence=f
This looks a bit intimidating, but I promise that it’s not that bad. Here’s what’s going down:
- The first option is specifying a capture filter which includes packets to/from the target IP, plus any ICMP packets which are received
- The fields section is including the frame number, IP addresses, etc.
- -E occurrence=f is being used to prevent duplicate data being printed
Now that you’re set up to capture packets, run traceroute 220.127.116.11 to actually trace the route between your local machine and
Reading The Data
After running traceroute, here’s what was printed:
1 0.000000000 192.168.1.78 18.104.22.168 UDP 1 Source port: 45462 Destination port: 33435
2 0.001175000 192.168.1.254 192.168.1.78 ICMP 64 Time-to-live exceeded (Time to live exceeded in transit)
3 0.001813000 192.168.1.78 22.214.171.124 UDP 1 Source port: 45462 Destination port: 33436
4 0.003100000 192.168.1.254 192.168.1.78 ICMP 64 Time-to-live exceeded (Time to live exceeded in transit)
5 0.003234000 192.168.1.78 126.96.36.199 UDP 1 Source port: 45462 Destination port: 33437
6 0.004227000 192.168.1.254 192.168.1.78 ICMP 64 Time-to-live exceeded (Time to live exceeded in transit)
7 0.004378000 192.168.1.78 188.8.131.52 UDP 2 Source port: 45462 Destination port: 33438
8 0.122021000 10.29.242.1 192.168.1.78 ICMP 63 Time-to-live exceeded (Time to live exceeded in transit)
9 0.123096000 192.168.1.78 184.108.40.206 UDP 2 Source port: 45462 Destination port: 33439
10 0.224505000 10.29.242.1 192.168.1.78 ICMP 63 Time-to-live exceeded (Time to live exceeded in transit)
11 0.224687000 192.168.1.78 220.127.116.11 UDP 2 Source port: 45462 Destination port: 33440
12 0.328691000 10.29.242.1 192.168.1.78 ICMP 63 Time-to-live exceeded (Time to live exceeded in transit)
13 0.328855000 192.168.1.78 18.104.22.168 UDP 3 Source port: 45462 Destination port: 33441
14 0.484253000 22.214.171.124 192.168.1.78 ICMP 249 Time-to-live exceeded (Time to live exceeded in transit)
15 0.485000000 192.168.1.78 126.96.36.199 UDP 3 Source port: 45462 Destination port: 33442
16 0.595150000 188.8.131.52 192.168.1.78 ICMP 249 Time-to-live exceeded (Time to live exceeded in transit)
17 0.595348000 192.168.1.78 184.108.40.206 UDP 3 Source port: 45462 Destination port: 33443
18 0.727938000 220.127.116.11 192.168.1.78 ICMP 249 Time-to-live exceeded (Time to live exceeded in transit)
19 0.728994000 192.168.1.78 18.104.22.168 UDP 4 Source port: 45462 Destination port: 33444
20 0.869751000 22.214.171.124 192.168.1.78 ICMP 249 Time-to-live exceeded (Time to live exceeded in transit)
21 0.870464000 192.168.1.78 126.96.36.199 UDP 4 Source port: 45462 Destination port: 33445
22 0.999830000 188.8.131.52 192.168.1.78 ICMP 249 Time-to-live exceeded (Time to live exceeded in transit)
23 0.999994000 192.168.1.78 184.108.40.206 UDP 4 Source port: 45462 Destination port: 33446
24 1.111686000 220.127.116.11 192.168.1.78 ICMP 249 Time-to-live exceeded (Time to live exceeded in transit)
25 1.111879000 192.168.1.78 18.104.22.168 UDP 5 Source port: 45462 Destination port: 33447
26 1.225611000 22.214.171.124 192.168.1.78 ICMP 60 Destination unreachable (Port unreachable)
27 1.228582000 192.168.1.78 126.96.36.199 UDP 5 Source port: 45462 Destination port: 33448
28 1.333820000 188.8.131.52 192.168.1.78 ICMP 60 Destination unreachable (Port unreachable)
29 1.333986000 192.168.1.78 184.108.40.206 UDP 5 Source port: 45462 Destination port: 33449
30 1.415013000 220.127.116.11 192.168.1.78 ICMP 60 Destination unreachable (Port unreachable)
A lot of this data is pretty self-explanatory, but it’s worth some analysis. The first packet being sent is from my laptop’s IP to the target IP. It’s a UDP packet with a TTL of 1, meaning that the very first hop along the path will be the one to kill this packet and send back an ICMP response. The second packet is that exact response – the source IP belongs to the router. The first packet didn’t even make it out of the local network!
You may be a little puzzled by the next 2 outbound packets. They both have a TTL set to 1! Didn’t traceroute already try that value? One of the goals of traceroute is to figure out the round-trip time to each hop, so sending three packets is a way to get the average time.
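As a quick worked example, the round-trip times for the first hop can be recovered from the relative timestamps in the capture above (frames 1 through 6). The pairing of each probe with the reply that follows it is my assumption; it is a small sketch of the arithmetic, not how traceroute itself does its bookkeeping.

```python
# (time probe was sent, time ICMP reply arrived), in seconds, frames 1-6.
first_hop_probes = [
    (0.000000000, 0.001175000),
    (0.001813000, 0.003100000),
    (0.003234000, 0.004227000),
]

# Round-trip time per probe, converted to milliseconds.
rtts_ms = [(reply - sent) * 1000 for sent, reply in first_hop_probes]
average_ms = sum(rtts_ms) / len(rtts_ms)

print([round(rtt, 3) for rtt in rtts_ms])
print(f"average: {average_ms:.2f} ms")
```

All three probes come back in roughly a millisecond, which is about what you would expect from a router sitting on the local network.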
Another reason for sending multiple packets to each hop is the fact that UDP is an unreliable protocol. There’s every chance that a traceroute probe packet might get dropped somewhere along the way. Sometimes you’ll encounter routers which deliberately refuse to respond to certain packets, so traceroute will just give up on these and move on to the next hop.
The final three received packets are the cue to stop; they were received and processed by a machine with the target IP address. Because it had the correct IP, the target will have tried to forward the packets to the designated port on the system, and come up empty-handed.
Traceroute deliberately picks unlikely port numbers because it has nothing to actually deliver to any service on the other end. This is why the final set of ICMP responses will have Destination unreachable types.
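To tie the capture together, here is a small sketch that parses a few of the lines printed above and labels each one. The three sample lines are copied verbatim from the output (addresses included); the Frame type and the classify function are names I have made up, and the whitespace splitting assumes the seven-column layout produced by the tshark command used earlier.

```python
from collections import namedtuple

Frame = namedtuple("Frame", "number time src dst protocol ttl info")

def parse_line(line):
    """Split one line of output into its seven columns."""
    number, time, src, dst, protocol, ttl, info = line.split(maxsplit=6)
    return Frame(int(number), float(time), src, dst, protocol, int(ttl), info)

def classify(frame):
    """Label a frame by the role it plays in the trace."""
    if frame.protocol == "UDP":
        return "probe"
    if frame.info.startswith("Time-to-live exceeded"):
        return "intermediate hop"
    if "Port unreachable" in frame.info:
        return "destination reached"
    return "other"

sample = [
    "1 0.000000000 192.168.1.78 18.104.22.168 UDP 1 Source port: 45462 Destination port: 33435",
    "2 0.001175000 192.168.1.254 192.168.1.78 ICMP 64 Time-to-live exceeded (Time to live exceeded in transit)",
    "26 1.225611000 22.214.171.124 192.168.1.78 ICMP 60 Destination unreachable (Port unreachable)",
]

frames = [parse_line(line) for line in sample]
labels = [classify(frame) for frame in frames]
for frame, label in zip(frames, labels):
    print(frame.number, frame.src, label)
```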
By now you should have a fairly good idea of how traceroute does its thing. In the next article, I’ll investigate how you can use Rust to reimplement traceroute’s basic functionality.
A couple of weeks back, the folks at RPISEC posted the lecture slides and lab contents of their Modern Binary Exploitation course, held earlier this year. The course is designed to take somebody with basic C skills and have them work their way through a series of reverse engineering challenges of increasing difficulty.
This seemed like a great opportunity to fire up Radare2 and put it to work. This series of posts will work through each of the lecture challenges and labs, with a focus on solving them using Radare2 (and a little help from gdb and friends along the way).
I believe that reverse engineering is a fantastic skill for software developers to pick up. The idea may carry connotations of software piracy with it, but it’s tremendously useful for debugging software and learning how compilers work. Plus, it’s just fun.
Introduction to Radare2
The first order of business: what is Radare2? It’s… a little complicated. At its core, it is an open source framework designed to help disassemble software.
It comes with a set of utilities to help with common RE tasks, like base conversion and file info extraction. It also packs a powerful CLI, r2, for interactively disassembling programs. If you’re familiar with IDA Pro or Hopper, then you have a good idea of what this CLI can do.
Programs like IDA may be easier to get up and running with, but I’m a fan of Radare2 because it can be set up on remote servers easily, and often comes preinstalled on CTF sites like Smash The Stack.
To kick things off, I’ll walk through the first few challenges for lecture two. The next entries will take a more direct approach at solving the problems – this article is more concerned about getting familiar with Radare2.
Grab a copy of the challenges and install Radare2. On Mac OS X, brew install radare2 will do the job. For other operating systems, check out the installation page on radare.org.
If you’re keen to check out the rest of the challenges ahead of time, the full set of lecture content is available on the course website. (Be sure to send a thank-you note to the people at RPISEC!)
PS: These are all ELF binaries, so you’ll need a Linux system to run them. I’m running an Ubuntu VM in VMware Workstation, but you could set up a cheap server on Digital Ocean and get the same result. If you just want to statically analyze the files without actually executing them, OS X or Windows will do fine.
PPS: I’m assuming you’re at least somewhat familiar with x86 assembly. If not, Programming From The Ground Up is a good, free introduction. Another great book is Practical Reverse Engineering, which also covers ARM assembly and the Windows kernel.
Challenge 1: crackme0x00a
After running ./crackme0x00a and entering a few passwords unsuccessfully, it’s clear that a brute force approach is not going to work. A smarter tactic is to disassemble the file and figure out how it works by reading the output.
The primary Radare2 UI can be started using the r2 command. It takes a path to a binary as an argument, along with some optional arguments, which I’ll dig into in a future article. For now, run r2 crackme0x00a to open the first challenge.
The main UI is reminiscent of a Meterpreter shell, but capable of much more. It has its own commands and state which you can use to explore a file, as well as editing and running it with a debugger. To illustrate how shell-like Radare2 is, you can navigate the file system just like you would in bash:
sam@remote:/challenges/lecture1$ r2 crackme0x00a
-- Trust no one, nor a zero. Both lie.
[0x100000d78]> cd ~
The first thing you see when launching r2, besides the start-up message, is the input section preceded by a memory address. This memory address indicates your current position in the file. If you were to print out the next 16 hex values, for example, it would do so from that address:
[0x100000d78]> x 16
offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0x100000d78 5548 89e5 4883 c768 4883 c668 5de9 2638 UH..H..hH..h].&8
To navigate the binary, there is the seek command, s. To learn how seek works, give the help operator, ?, a try. It can be appended to any r2 command, so in this situation you would use s?. It can also be used on its own to print a generalized help message.
With that out of the way, it’s time to start analyzing the first challenge. Radare2 can analyze a binary using the a command – this is useful, but the real workhorse is the aa subcommand, short for analyze all. aa will (unsurprisingly) search through the entire binary and analyze its symbols.
Now for the fun part: disassembling functions. Radare2 provides this in the form of the print disassemble function command, pdf. pdf @ main will analyze the main function and print its disassembly.
Here’s what you get when disassembling
I cheated a little bit and added some newlines between a few function calls in the disassembly to help with readability. There are a couple interesting things here:
- Radare2 adds ASCII-art arrows to indicate control flow, which is amazingly useful
- Strings are automatically read from the binary and added as comments
- Strings are prefixed with str., whereas imports use
On lines 30 and 31 of the disassembly, you can see a call to
g00dJ0B! as one of the arguments. There’s a pretty good chance that this is the password.
Enter password: g00dJ0B!
Bingo! Not all of the challenges will have the answer lying out in the open like that, but this is a good start!
Challenge 2: crackme0x00b
Alright, time for round 2. Open up crackme0x00b in r2, use aa to analyze the file and its symbols, then disassemble with pdf @ main. The disassembly is largely uninteresting, but this section sticks out.
0x080484bf 8d44241c lea eax, [esp + 0x1c] ; 0x1c
0x080484c3 89442404 mov dword [esp + 4], eax ; [0x4:4]=0x10101
0x080484c7 c7042440a004. mov dword [esp], sym.pass.1964 ; [0x804a040:4]=119 ; "w" @ 0x804a040
0x080484ce e8bdfeffff call sym.imp.wcscmp ;sym.imp.wcscmp()
Hmm. On the surface, this looks like a string comparison with w as an argument to the wcscmp function. Seems a little too easy… let’s try anyway.
Enter password: w
Alright, that was a long shot. The wcscmp function is interesting though. Perhaps it can be used to figure out if something is preventing the true password from being printed. The man page for wcscmp has this to say:
The wcscmp() function is the wide-character equivalent of the strcmp(3) function. It compares the wide-character string pointed to by s1 and the wide-character string pointed to by s2.
That explains it! Wide characters on Linux are 32 bits in length, so when Radare2 was looking up the string to print beside the wcscmp argument, it would have read the character w before hitting a null character and stopping; it didn’t know to expect 32-bit characters! If we can inspect the location of the string in memory, the password should become obvious.
In the disassembly above, the target string is at sym.pass.1964. What happens when the hex values at this address are printed out using the print hex command, px?
[0x080483e0]> px @ sym.pass.1964
offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0x0804a040 7700 0000 3000 0000 7700 0000 6700 0000 w...0...w...g...
0x0804a050 7200 0000 6500 0000 6100 0000 7400 0000 r...e...a...t...
0x0804a060 0000 0000 4743 433a 2028 5562 756e 7475 ....GCC: (Ubuntu
0x0804a070 2f4c 696e 6172 6f20 342e 362e 312d 3975 /Linaro 4.6.1-9u
0x0804a080 6275 6e74 7533 2920 342e 362e 3100 002e buntu3) 4.6.1...
0x0804a090 7379 6d74 6162 002e 7374 7274 6162 002e symtab..strtab..
0x0804a0a0 7368 7374 72 shstr
As predicted, the password was being prematurely terminated by r2. The program is reading in characters in 32-bit pieces (each hex digit is 4 bits, and there are 8 hex digits per character). The ASCII values of the hex to the right hand side reveal the password:
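To double-check the reading, the bytes from the dump above can be decoded by hand. This sketch assumes little-endian 32-bit wide characters (which is what the 77 00 00 00 pattern suggests) and copies the first 36 bytes from the px output; the variable names are mine.

```python
# The first 36 bytes at 0x0804a040, copied from the px output.
raw = bytes.fromhex(
    "77000000 30000000 77000000 67000000"
    "72000000 65000000 61000000 74000000"
    "00000000"
)

# Reading it as an ordinary C string stops at the first zero byte.
narrow = raw.split(b"\x00", 1)[0]

# Reading it as 32-bit little-endian characters recovers the whole string,
# up to the 32-bit null terminator.
wide = raw.decode("utf-32-le").split("\x00", 1)[0]

print(narrow, wide)
```

The first read stops after a single character, which matches the lone w that r2 showed in its comment. The second read produces the full password.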
Challenge 3: crackme0x01
Alright, we’re through the qualifying rounds! There are now 9 binaries to work through, but let’s just tackle the first one and leave the others for part 2.
crackme0x01 is another nice, simple puzzle. After going through the usual analyze-then-disassemble routine, the main function quickly reveals the solution. There are two calls to printf, followed by a call to scanf which expects an integer as an argument.
Wait a second… how can you tell that it expects an integer? In the comments provided by r2, the string passed to scanf is revealed to be set to %d, the format specifier for signed decimal values. Try running ps @ 0x804854c in Radare2 if you want a second opinion.
After the program prints some labels and takes our input, the program reaches a particularly interesting line: cmp dword [ebp-local_1], 0x149a. This line is comparing a variable against an integer constant, likely the constant that the program expects as the password! There’s just one problem: the constant is hex, and the program’s input is expecting a plain old base 10 value.
Remember that set of utilities mentioned earlier? Radare2 comes with a program called rax2, which is capable of converting one base to another, like decimal (base 10) to hex (base 16). Outside of the r2 shell, rax2 0x149a will print out the integer (and password) 5274. If you don’t want to exit r2 just to convert a number, ? 0x149a will display the value in a range of formats, including binary and octal. Run the program and enter 5274 to win!
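If you would like to sanity-check the conversion outside of Radare2, the same arithmetic takes a couple of lines in Python. This is just a cross-check using built-in functions, not a replacement for rax2.

```python
# Parse the constant from the cmp instruction as base 16.
value = int("0x149a", 16)

# The same number rendered in a few different bases.
formats = {
    "decimal": str(value),
    "hex": hex(value),
    "octal": oct(value),
    "binary": bin(value),
}

for name, text in formats.items():
    print(f"{name:>8}: {text}")
```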
Hopefully you’re beginning to comprehend the raw power of Radare2. This article barely skims the surface – r2 packs a near limitless number of handy features, including an excellent visual interface capable of changing between display modes.
Although these challenges were pretty simple, this is a solid starting point. The next entries in this series will eventually work towards challenges which feature buffer overflows, encryption, ROP chaining, and more. Have fun!
One of my hobbies is taking apart binaries and figuring out how they work. It is really satisfying to take a program and break it apart, before reassembling the pieces in a way that you understand. There are so many resources for picking up this stuff that it seemed like a crime to not collect it in one place.
If you’re just getting into reverse engineering, there are a number of concepts you’ll need to get your head around. If you’ve been writing software for a while then chances are that you’ll have a working knowledge of how disassemblers and debuggers operate, but it’s worth a refresher.
gdb is the main debugger worth knowing these days. I use lldb for work, but it never really made its way into my side projects. Brian Hall wrote a nice gdb intro which is enough to get started.
The subject of disassemblers is a little more complicated. IDA Pro is the big one, both in terms of features and price, but a number of smaller disassemblers have sprouted up around it. Hopper for Mac is a good one, and a personal favorite is radare2. Pick one that sounds good and master it.
The last thing to mention is assembly. Without that, you’ll have no hope of deciphering the output of a disassembler. Programming From The Ground Up is a great way to pick up x86 assembly; once you’ve got that sorted it’s not too tricky to learn the other popular flavors as necessary.
A worthy mention for this section is the Reverse Engineering site on Stack Exchange – there are answers to all sorts of interesting questions over there.
There’s an enormous number of ways to practice reverse engineering in a safe (and legal) manner. Here are the websites which I regularly toy with:
- Exploit Exercises has a bunch of VMs which you can run in something like VirtualBox to get a feel for reverse engineering. Nebula is a good one for finding your feet in Linux, whereas Protostar is the place to go for lower-level memory exploitation.
- crackmes.de is a user driven database of exploitable binaries. Note that, like anything on the Internet, these could be horribly virus-ridden so it’s on you to vet them. (Reading the comments is probably fine.)
- Smash The Stack is the premier wargaming site out there. The IO game is a great place to start.
My all-time favorite low-level security book is Hacking: The Art of Exploitation. I’ve spent many hours poring over this book; it’s been worth every minute. There is a great chapter on writing shellcode, and the section on ARP spoofing is particularly fun.
Another No Starch book is Practical Malware Analysis. I’ve just started on this but so far, so good. Malware analysis is a natural progression point for reverse engineering so keep this book in mind.
Finally, for the Windows crowd, Practical Reverse Engineering is the book to get. There are some wonderful sections on the Windows kernel in there (which I really need to read again one day soon).
Open Security Training recorded some of their security courses and put them up on YouTube for free. I’m still working my way through these – they’re really good so far. The Intro x86 course is a good starting point.
DEFCON and Black Hat are more general security conferences, but there is plenty there to check out. Dive into the DEFCON media server and have fun!