Web

Assignment 5: Binary bomb

Due date: Mon Feb 23 11:59 pm - Hard deadline: Fri Feb 27 11:59 pm

Assignment by Julie Zelenski, ideas taken from the original by Randal Bryant & David O'Hallaron (CMU )

Learning goals

Completing this assignment will give you:

solid practice in reading and tracing IA32 assembly
a clear understanding of how data access, control structures, and function calls translate between C and IA32
experience reverse-engineering object code
a compelling reason to invest in mastering the gdb debugger!

The problem

Those nefarious Cal students have broken into our myth machines and planted some mysterious executables we are calling "binary bombs." These programs are believed to be armed and dangerous. Without the original source, we don't have much to go on, but we have observed that the programs seem to operate in a sequence of levels. Each level challenges the user to enter a string. If the user enter the correct string, it defuses the level and the program proceeds on. But given the wrong input, the bomb explodes by printing an earth-shattering KABOOM! and terminating. To deactivate the entire bomb, one needs to successfully defuse each of its levels.

The Cal students have littered our systems with these landmines and we need your help. Each of you is given a bomb to disable. Your mission is to apply your best IA32 detective skills to work out the input required to pass each level and render the entire bomb harmless.

You will use a variety of tools and techniques to dissect your bomb. The levels get progressively more complex, but the expertise you gain as you move up from each level should offset this difficulty. One confounding factor is that the bomb has a hair trigger, prone to exploding at the least provocation. Each time your bomb explodes, it notifies the staff, which deducts from your score. Thus, there are consequences to detonating the bomb-- you must tread carefully!

Tools and strategies

Your bomb is given to you as an executable, i.e. as compiled object code. There are various utilities for poking around in object code that can be useful in analyzing your bomb. The task before you is one of reverse-engineering, working backwards to reconstruct the original C source from the executable. Making progress on this task will require a mix of approaches and tools. Here are some suggested points of attack:

The nm utility dumps the symbol table from an executable. The symbols includes the names of functions and global variables and their addresses. The symbol table by itself is not a lot to go on, but just reading the names might give you a little sense of the lay of the land.
The strings utility will display all the printable strings in an executable, including all string constants. What strings do you find in your bomb? Do any of them seem of relevance to the task at hand?
The objdump disassembler can dump of the object code into its disassembled equivalent. Reading and tracing the disassembled code is where the bulk of your information will come from. Scrutinizing the lifeless object code without executing is a technique known as deadlisting. Once you sort out what the object code does, you can, in effect, translate it back to C and then see what input is expected. This works reasonably well on simple passages of code, but can become unwieldy when the code is more complex.
The gdb debugger is absolutely invaluable here. You can use gdb to single-step by assembly instruction, examine (and change!) memory and registers, view the runtime stack, disassemble the object code, set breakpoints and watchpoints, re-route control flow, write your own custom commands, and more. Live experimentation on the executing bomb is the most direct way to become familiar in what's happening at the assembly level. Here are some suggestions on how to maximize your use of gdb on the bomb:
- Expand your gdb repertoire. The labs have introduced you to handy commands such as break, x, print, display, info, disassemble, and stepi/nexti. Here are some additional commands that you might find similarly useful: set variable, watch, jump, kill, and return. Within gdb, you can use help name-of-command to get more details about any gdb command.
- Get fancy with your breakpoints. You can breakpoints by function name, source line, or address of a specific instruction. You can make breakpoints conditional using cond. The ignore command will pass over a breakpoint until a later iteration. Use commands to specify a list of commands to be automatically executed whenever a given breakpoint is hit. These commands might print a variable, dump the stack, jump to a different instruction, change values in memory, return early from a function, and so on.
  
  Late-breaking news: gdb 7.7 (current version on myth as of 11/2014) has a bug when attempting to use kill in the commands sequence for a breakpoint that causes a downward spiral of problems -- up to and including crashing gdb itself. The gdb command signal SIGKILL can be used as an alternate means to kill a program from a commands sequence that doesn't trip this bug.
- Using a .gdbinit file. The file named .gdbinit in the current directory can be used to set a startup sequence for gdb. In this text file, you enter a sequence of commands exactly as you would type them to the gdb command prompt. If your personal gdb configuration allows loading an init file, upon starting, gdb will automatically execute the commands from it. This will be a convenient place to put gdb commands to execute every time you start the debugger. Hint: wouldn't this be useful for creating breakpoints with commands that you want to be sure are always in place when running the bomb?
  
  Important note: To enable use of .gdbinit, you must change your personal gdb configuration to allow loading it. See last question in our gdb FAQ about how to tell if loading is declined. Copy and paste the command below in your terminal and execute it to add the necessary setting to your configuration file. You will need to make this configuration change only once.
```
bash -c 'echo set auto-load safe-path / >> ~/.gdbinit'
```
  The .gdbinit file we give you in the starter repo has only one command to echo Executing commands from .gdbinit in current directory. If you see this message when you start gdb, it confirms the .gdbinit file has been loaded.
- Custom gdb commands. Use define to add your own gdb "macros" for often-repeated command sequences. You can add defines to your .gdbinit file so you have access to them in subsequent gdb sessions as well.
- Fire up tui mode (maybe...). The command layout asm followed by layout reg will give you a split window showing disassembly and register values. This visual display is super-neat and can be quite handy for single-stepping through the assembly, but it also comes with its own quirks and bugs. The tui mode can occasionally crash gdb itself, killing off gdb and possibly the bomb while it's at it (one thing we recommend you avoid is resizing the window when tui is active as it seems possibly related to crashing). Even when tui is seemingly working, the display has a habit of turning wonky. A refresh command often fixes it up, but you may occasionally need to exit and re-enter tui mode to reset the state. A garbled display could cause you to misunderstand the program state, misidentify where your bomb is currently executing, or accidentally execute a gdb command you didn't intend. Any explosion suppression mechanism that requires you, the fallible human, to do the right thing at the right time could easily be waylaid by interference from tui, so don't attempt tui before you have invincible automatic protection against explosions.
One of the most important learning goals of this assignment is to become skilled in using the debugger. This is a crucial skill that can pay big dividends the rest of your career!
The gcc compiler. If you're unsure how to a particular C construct translates to assembly or how to access a certain kind of data, another technique is to try starting from the other side. Write a little C program with the code in question, compile it, and then trace its disassembly, either deadlisted or in gdb. For example, if you're not sure how a default case is incorporated into a switch table or how a function pointer works in a qsort call, this would be a good way to find out. Since you have written the test program, you also don't have to fear its explosive nature :-) You can compile by directly invoking gcc or adapting a simple makefile (the Makefile from any CS107 assignment/lab is a good starting point).

We do have one rule: do not use brute force! You could write a program to try every possible input to find a solution. But this is trouble for several reasons:

You lose points on every incorrect guess which explodes the bomb.
A notification is sent on each bomb explosion. Wild guessing will saturate the network, creating ill will among other users and attracting the ire of the system administrators who have the authority to revoke your privileges because you are abusing shared resources.
We haven't told you how long the strings are, nor have we told you what characters they can contain. Even if you made the (wrong) assumptions that they all are less than 80 characters long and only contain lowercase letters, you will have 26⁸⁰ guesses for each level. Trying them all will take an eternity, and you will not have an answer before you graduate.
In the readme file accompanying your submission, we expect you to show understanding of the C code, which guessing will not provide. :-)

Logistics

Our counter-intelligence efforts been able to confirm a few things about how the bombs operate:

If you start the bomb with no command-line argument, it reads input typed at the console.
If you give an argument to the bomb:
```
./bomb input.txt
```
the bomb will read lines from input.txt until it reaches EOF (end of file), and then switch over to reading from the console. This feature allows you to avoid retyping the solutions to those levels you have already defused. This is also how we will test your inputs when grading.
The bomb in your repository was lovingly created just for you and is unique to your id. It is said that the bomb can detect if an impostor attempts to execute your bomb and won't play along.
The bombs are designed for the myth computers (running on the console or logged in remotely). There is a rumor that the bomb will refuse to run anywhere else.
Direct modification of the binary bomb executable can change its behavior, but be forewarned that we will test your submission against your original unmodified binary, so while hacking the executable is good fun, it won't be of much use as a strategy for solving the levels. (Although it can be an entertaining and educational exercise in suppressing explosions...)
The bombs were compiled from rather ordinary C code using gcc without optimization. Apparently Cal students don't know how to edit a Makefile to change the flags to achieve much obfuscation of the object code. (For the curious: the exact flags happen to be gcc -O0 -m32 -std=gnu99 -fno-builtin -fno-stack-protector)
The Cal students also weren't aware the function names would be visible in the object code, so they didn't take pains to disguise them. Thus, a function name of initialize_bomb or read_five_numbers can be a clue. Similarly, they played it straight with use of the standard C library functions, so if you encounter a call to qsort or sscanf, it is the real deal.

Readme file

In the readme.txt file, you are to include:

An overview of the tactics you used to suppress/avoid explosions
An explanation of how each level operates

After sharing how you used your gdb/executable-hacking skills to hold explosions at bay, the rest of the readme is for you to demonstrate that you have a solid grasp on what C code was compiled to create the assembly for each level. It is not necessary that you provide a line-for-line perfectly faithful reproduction of the original C source, but we do expect an accurate description of the "shape" of the code. You should identify what constructs are being used (if/else, loops, switch), how control flows, and the operation/intent of the functions being used. Given an if statement, does it include an else, is there a sequence of if statements that are nested or cascading? Given a loop, where does it start and stop? How does increment? Can you tell if the loop is a for, while, or do-while? Is there any use of break or continue in the loop body? Given a switch statement, how many cases are there and what values do they have? Do any cases fall-through (no break statement), is there a default case? For function calls, what are the parameters being passed? How/what does it return?

Some assembly cannot be definitively matched to an unique C sequence (for example, for and while loops are largely indistinguishable, and a cascading if/else can look a lot like a switch) and likewise, the same C sequence can compile to different assembly. It's fine to provide one correct interpretation and indicate your confidence in how exact the match or acknowledge where there is room for ambiguity/alternatives. There are situations where the code has multiple paths through it. There may be more than one path that leads to a solution, harmless/useless paths, paths that explode, dead ends, and so on. In order to figure out which path(s) is the one you want, you will likely have to at least look at a few. The readme should start with a correct summary of the control "shape", i.e. what paths are available and what decisions determine how/which paths are followed. Your explanation should provide details of what happens when following along the path(s) you do take, but you can be fairly vague about the contents along the other paths you rejected or left unexplored.

A level should be described using C/pseudocode with English annotation as necessary. A painstaking play-by-play of the assembly (e.g. "compares the value at %ebp + 8 to the value at %ebp + 12 and jumps to this address if they are equal") is not appropriate. There is no need to refer registers or assembly instructions at all.

If you get stuck on a level, you can earn partial credit by using the readme to describe the steps you have been able to figure and/or the tactics you have been using to work on it.

The requirement for the readme file is not intended to be onerous. Strunk and White famously say "vigorous writing is concise" and we couldn't agree more. Most levels can be described in handful of C lines and an English sentence or two. In fact, excess verbiage can weaken an otherwise correct explanation, so you're best off sticking the facts!

Grading

For this assignment, the total number of points is expected to be 68, distributed as follows:

Input.txt. (42 points) 7 points for each level you have solved. We test this by running bomb input.txt on your submission. The input.txt file in your submission should contain one line for each level you have solved, starting from level 1, with no extraneous text.
Bomb explosions. (up to -5 points deducted) Each bomb explosion notification that reaches the staff results in a 1/2 point deduction, capped at -5 points total.
Readme.txt. (26 points) 2 points for describing your explosion suppression strategy, 4 points per level. A clear, concise, correct description (C source or appropriate equivalent) will earn full credit. Explanations that are vague, inaccurate, or overly verbose may earn partial credit.

Past scores on this assignment have been quite high (median over 90%) and students report that studying someone else's code can be much less time-consuming that debugging their own. :-) Reverse-engineering is fun and surprisingly addictive, hope you enjoy the diversion!

Getting started

The starter project contains your compiled binary bomb, empty input.txt and readme.txt files, and a .gdbinit file. Check out a copy of your cs107 repository with the command:

    hg clone /afs/ir/class/cs107/repos/assign5/$USER assign5

Do not start by running the bomb to "see what it will do..." You will quickly learn that "what it does" is explode :-) Each wrong input explodes the bomb and earns a 1/2 point deduction. Your first task should be to put your kid gloves on and carefully poke around to figure how you can prevent explosion notifications. By first putting in place protection against explosions, you will then be free to experiment with the levels without any nail-biting anxiety about setting off the bomb.

When grading, we will run your bomb on your input (e.g. invoke bomb input.txt) to verify which levels you have successfully defused. As you solve successive levels, add the necessary inputs to the input.txt file and document your work in the readme.txt file. Be sure to regularly commit your changes to these files.

Finishing

Your final committed version of input.txt should contain the strings that correctly defuse the levels that you solved and readme.txt should contain the information that documents your understanding. Don't forget to use the normal submit process when finished to send your files for grading. If needed, review our late policy.

Frequently asked questions about assign5

How do I know if the bomb has exploded?

When an unadulterated bomb explodes, it prints "KABOOM", notifies the authorities, and terminates. The bomb can only explode when it is "live", i.e., executing in shell or gdb. Using tools such as nm, strings, and objdump to examine the executable will not explode the bomb.

How can I tell if the staff heard my explosion?

The bomb has no secrets -- all the code is right there. If you dig into the code that processes explosions you can determine for yourself how/when/whether the word gets out. Avoiding the entire explosion is one straightforward approach to assure that we won't hear about it.

I did something careless and exploded my bomb. Can I undo that explosion?

We count all explosions that reach us. Consider it a fun challenge to work so carefully that you never detonate it, but should an explosion slip through, be assured that no cute baby animals have actually been harmed and the cost for that uncaught explosion is a mere 1/2 point and capped at 5 points total.

Do we need to recreate every single line of C source in the bomb?

You are asked to provide a correct input to pass each level along with an explanation of how the level operates. This will require a fairly complete exploration of the code path you follow for a given level, but any code outside that path can be investigated on a need-to-know basis.

My input defuses the level when typed manually, but when I added the same input to input.txt, it is rejected. What gives?

The contents of input.txt should consist of the input for each level on its own line and a line should end with a standard Unix newline (n). Stop in gdb and examine the line read from your file to spot the discrepancy between what you need and what you have. Watch for extra spaces before/after the input and pay close attention to the line endings. The unix editors available on myth (emacs, vim, gedit, etc.) use the correct line endings (n) by default. Editors on other platforms that are using the line-ending conventions for Mac (r) or Windows (rn) will cause you grief. The easiest approach to avoid problems is to edit the input.txt file using a unix editor.

I found some other IA32 reference material that seems syntactically/logically inconsistent with the IA32 from our textbook/lecture/lab/objdump/gdb. What's up?

Our unix-based tools support the IA32 att (AT&T) syntax and all of our materials (text, lecture, lab) are consistent with this syntax. If you hunt down other resources in the wild, you may encounter Intel syntax where the order of operands are reversed, register names are not prefixed with %, immediate values are not prefixed with $, indirection is expressed with brackets instead of parens, and so on. For example, the att instruction push %ebp is written as push EBP in Intel and att movl $1, (%esp) becomes movl [ESP], 1. Translating between them isn't terribly hard, but you may find it easier to stick with resources that use the same syntax we do.