Due date: Mon Feb 23 11:59 pm - Hard deadline: Fri Feb 27 11:59 pm
Assignment by Julie Zelenski, ideas taken from the original by Randal Bryant & David O'Hallaron (CMU )
Completing this assignment will give you:
Those nefarious Cal students have broken into our myth machines and planted some mysterious executables we are calling "binary bombs." These programs are believed to be armed and dangerous. Without the original source, we don't have much to go on, but we have observed that the programs seem to operate in a sequence of levels. Each level challenges the user to enter a string. If the user enter the correct string, it defuses the level and the program proceeds on. But given the wrong input, the bomb explodes by printing an earth-shattering KABOOM!
and terminating. To deactivate the entire bomb, one needs to successfully defuse each of its levels.
The Cal students have littered our systems with these landmines and we need your help. Each of you is given a bomb to disable. Your mission is to apply your best IA32 detective skills to work out the input required to pass each level and render the entire bomb harmless.
You will use a variety of tools and techniques to dissect your bomb. The levels get progressively more complex, but the expertise you gain as you move up from each level should offset this difficulty. One confounding factor is that the bomb has a hair trigger, prone to exploding at the least provocation. Each time your bomb explodes, it notifies the staff, which deducts from your score. Thus, there are consequences to detonating the bomb-- you must tread carefully!
Your bomb is given to you as an executable, i.e. as compiled object code. There are various utilities for poking around in object code that can be useful in analyzing your bomb. The task before you is one of reverse-engineering, working backwards to reconstruct the original C source from the executable. Making progress on this task will require a mix of approaches and tools. Here are some suggested points of attack:
nm
utility dumps the symbol table from an executable. The symbols includes the names of functions and global variables and their addresses. The symbol table by itself is not a lot to go on, but just reading the names might give you a little sense of the lay of the land.strings
utility will display all the printable strings in an executable, including all string constants. What strings do you find in your bomb? Do any of them seem of relevance to the task at hand?objdump
disassembler can dump of the object code into its disassembled equivalent. Reading and tracing the disassembled code is where the bulk of your information will come from. Scrutinizing the lifeless object code without executing is a technique known as deadlisting. Once you sort out what the object code does, you can, in effect, translate it back to C and then see what input is expected. This works reasonably well on simple passages of code, but can become unwieldy when the code is more complex.The gdb
debugger is absolutely invaluable here. You can use gdb to single-step by assembly instruction, examine (and change!) memory and registers, view the runtime stack, disassemble the object code, set breakpoints and watchpoints, re-route control flow, write your own custom commands, and more. Live experimentation on the executing bomb is the most direct way to become familiar in what's happening at the assembly level. Here are some suggestions on how to maximize your use of gdb on the bomb:
break
, x, print
, display
, info
, disassemble
, and stepi/nexti
. Here are some additional commands that you might find similarly useful: set variable
, watch
, jump
, kill
, and return.
Within gdb, you can use help name-of-command
to get more details about any gdb command.Get fancy with your breakpoints. You can breakpoints by function name, source line, or address of a specific instruction. You can make breakpoints conditional using cond
. The ignore
command will pass over a breakpoint until a later iteration. Use commands
to specify a list of commands to be automatically executed whenever a given breakpoint is hit. These commands might print a variable, dump the stack, jump to a different instruction, change values in memory, return early from a function, and so on.
Late-breaking news: gdb 7.7 (current version on myth as of 11/2014) has a bug when attempting to use kill
in the commands sequence for a breakpoint that causes a downward spiral of problems -- up to and including crashing gdb itself. The gdb command signal SIGKILL
can be used as an alternate means to kill a program from a commands sequence that doesn't trip this bug.
Using a .gdbinit file. The file named .gdbinit
in the current directory can be used to set a startup sequence for gdb. In this text file, you enter a sequence of commands exactly as you would type them to the gdb command prompt. If your personal gdb configuration allows loading an init file, upon starting, gdb will automatically execute the commands from it. This will be a convenient place to put gdb commands to execute every time you start the debugger. Hint: wouldn't this be useful for creating breakpoints with commands that you want to be sure are always in place when running the bomb?
Important note: To enable use of .gdbinit, you must change your personal gdb configuration to allow loading it. See last question in our gdb FAQ about how to tell if loading is declined. Copy and paste the command below in your terminal and execute it to add the necessary setting to your configuration file. You will need to make this configuration change only once.
bash -c 'echo set auto-load safe-path / >> ~/.gdbinit'
The .gdbinit file we give you in the starter repo has only one command to echo Executing commands from .gdbinit in current directory
. If you see this message when you start gdb, it confirms the .gdbinit file has been loaded.
Custom gdb commands. Use define
to add your own gdb "macros" for often-repeated command sequences. You can add defines to your .gdbinit
file so you have access to them in subsequent gdb sessions as well.
layout asm
followed by layout reg
will give you a split window showing disassembly and register values. This visual display is super-neat and can be quite handy for single-stepping through the assembly, but it also comes with its own quirks and bugs. The tui
mode can occasionally crash gdb itself, killing off gdb and possibly the bomb while it's at it (one thing we recommend you avoid is resizing the window when tui is active as it seems possibly related to crashing). Even when tui is seemingly working, the display has a habit of turning wonky. A refresh
command often fixes it up, but you may occasionally need to exit and re-enter tui mode to reset the state. A garbled display could cause you to misunderstand the program state, misidentify where your bomb is currently executing, or accidentally execute a gdb command you didn't intend. Any explosion suppression mechanism that requires you, the fallible human, to do the right thing at the right time could easily be waylaid by interference from tui
, so don't attempt tui
before you have invincible automatic protection against explosions.One of the most important learning goals of this assignment is to become skilled in using the debugger. This is a crucial skill that can pay big dividends the rest of your career!
The gcc
compiler. If you're unsure how to a particular C construct translates to assembly or how to access a certain kind of data, another technique is to try starting from the other side. Write a little C program with the code in question, compile it, and then trace its disassembly, either deadlisted or in gdb. For example, if you're not sure how a default case is incorporated into a switch table or how a function pointer works in a qsort call, this would be a good way to find out. Since you have written the test program, you also don't have to fear its explosive nature :-) You can compile by directly invoking gcc or adapting a simple makefile (the Makefile from any CS107 assignment/lab is a good starting point).
We do have one rule: do not use brute force! You could write a program to try every possible input to find a solution. But this is trouble for several reasons:
Our counter-intelligence efforts been able to confirm a few things about how the bombs operate:
If you give an argument to the bomb:
./bomb input.txt
the bomb will read lines from input.txt
until it reaches EOF (end of file), and then switch over to reading from the console. This feature allows you to avoid retyping the solutions to those levels you have already defused. This is also how we will test your inputs when grading.
The bomb in your repository was lovingly created just for you and is unique to your id. It is said that the bomb can detect if an impostor attempts to execute your bomb and won't play along.
gcc -O0 -m32 -std=gnu99 -fno-builtin -fno-stack-protector
)initialize_bomb
or read_five_numbers
can be a clue. Similarly, they played it straight with use of the standard C library functions, so if you encounter a call to qsort
or sscanf
, it is the real deal.In the readme.txt file, you are to include:
After sharing how you used your gdb/executable-hacking skills to hold explosions at bay, the rest of the readme is for you to demonstrate that you have a solid grasp on what C code was compiled to create the assembly for each level. It is not necessary that you provide a line-for-line perfectly faithful reproduction of the original C source, but we do expect an accurate description of the "shape" of the code. You should identify what constructs are being used (if/else, loops, switch), how control flows, and the operation/intent of the functions being used. Given an if
statement, does it include an else
, is there a sequence of if
statements that are nested or cascading? Given a loop, where does it start and stop? How does increment? Can you tell if the loop is a for
, while
, or do-while
? Is there any use of break
or continue
in the loop body? Given a switch
statement, how many cases are there and what values do they have? Do any cases fall-through (no break
statement), is there a default case? For function calls, what are the parameters being passed? How/what does it return?
Some assembly cannot be definitively matched to an unique C sequence (for example, for
and while
loops are largely indistinguishable, and a cascading if/else
can look a lot like a switch
) and likewise, the same C sequence can compile to different assembly. It's fine to provide one correct interpretation and indicate your confidence in how exact the match or acknowledge where there is room for ambiguity/alternatives. There are situations where the code has multiple paths through it. There may be more than one path that leads to a solution, harmless/useless paths, paths that explode, dead ends, and so on. In order to figure out which path(s) is the one you want, you will likely have to at least look at a few. The readme should start with a correct summary of the control "shape", i.e. what paths are available and what decisions determine how/which paths are followed. Your explanation should provide details of what happens when following along the path(s) you do take, but you can be fairly vague about the contents along the other paths you rejected or left unexplored.
A level should be described using C/pseudocode with English annotation as necessary. A painstaking play-by-play of the assembly (e.g. "compares the value at %ebp + 8
to the value at %ebp + 12
and jumps to this address if they are equal") is not appropriate. There is no need to refer registers or assembly instructions at all.
If you get stuck on a level, you can earn partial credit by using the readme to describe the steps you have been able to figure and/or the tactics you have been using to work on it.
The requirement for the readme file is not intended to be onerous. Strunk and White famously say "vigorous writing is concise" and we couldn't agree more. Most levels can be described in handful of C lines and an English sentence or two. In fact, excess verbiage can weaken an otherwise correct explanation, so you're best off sticking the facts!
For this assignment, the total number of points is expected to be 68, distributed as follows:
bomb input.txt
on your submission. The input.txt
file in your submission should contain one line for each level you have solved, starting from level 1, with no extraneous text.Past scores on this assignment have been quite high (median over 90%) and students report that studying someone else's code can be much less time-consuming that debugging their own. :-) Reverse-engineering is fun and surprisingly addictive, hope you enjoy the diversion!
The starter project contains your compiled binary bomb, empty input.txt and readme.txt files, and a .gdbinit file. Check out a copy of your cs107 repository with the command:
hg clone /afs/ir/class/cs107/repos/assign5/$USER assign5
Do not start by running the bomb to "see what it will do..." You will quickly learn that "what it does" is explode :-) Each wrong input explodes the bomb and earns a 1/2 point deduction. Your first task should be to put your kid gloves on and carefully poke around to figure how you can prevent explosion notifications. By first putting in place protection against explosions, you will then be free to experiment with the levels without any nail-biting anxiety about setting off the bomb.
When grading, we will run your bomb on your input (e.g. invoke bomb input.txt
) to verify which levels you have successfully defused. As you solve successive levels, add the necessary inputs to the input.txt
file and document your work in the readme.txt
file. Be sure to regularly commit your changes to these files.
Your final committed version of input.txt
should contain the strings that correctly defuse the levels that you solved and readme.txt
should contain the information that documents your understanding. Don't forget to use the normal submit process when finished to send your files for grading. If needed, review our late policy.
When an unadulterated bomb explodes, it prints "KABOOM", notifies the authorities, and terminates. The bomb can only explode when it is "live", i.e., executing in shell or gdb. Using tools such as nm, strings, and objdump to examine the executable will not explode the bomb.
The bomb has no secrets -- all the code is right there. If you dig into the code that processes explosions you can determine for yourself how/when/whether the word gets out. Avoiding the entire explosion is one straightforward approach to assure that we won't hear about it.
We count all explosions that reach us. Consider it a fun challenge to work so carefully that you never detonate it, but should an explosion slip through, be assured that no cute baby animals have actually been harmed and the cost for that uncaught explosion is a mere 1/2 point and capped at 5 points total.
You are asked to provide a correct input to pass each level along with an explanation of how the level operates. This will require a fairly complete exploration of the code path you follow for a given level, but any code outside that path can be investigated on a need-to-know basis.
The contents of input.txt should consist of the input for each level on its own line and a line should end with a standard Unix newline (n). Stop in gdb and examine the line read from your file to spot the discrepancy between what you need and what you have. Watch for extra spaces before/after the input and pay close attention to the line endings. The unix editors available on myth (emacs, vim, gedit, etc.) use the correct line endings (n) by default. Editors on other platforms that are using the line-ending conventions for Mac (r) or Windows (rn) will cause you grief. The easiest approach to avoid problems is to edit the input.txt file using a unix editor.
Our unix-based tools support the IA32 att (AT&T) syntax and all of our materials (text, lecture, lab) are consistent with this syntax. If you hunt down other resources in the wild, you may encounter Intel syntax where the order of operands are reversed, register names are not prefixed with %, immediate values are not prefixed with $, indirection is expressed with brackets instead of parens, and so on. For example, the att instruction push %ebp
is written as push EBP
in Intel and att movl $1, (%esp)
becomes movl [ESP], 1
. Translating between them isn't terribly hard, but you may find it easier to stick with resources that use the same syntax we do.