Assignment 5: Binary bomb

Due date: Mon Feb 23 11:59 pm - Hard deadline: Fri Feb 27 11:59 pm

Assignment by Julie Zelenski, ideas taken from the original by Randal Bryant & David O'Hallaron (CMU )

Learning goals

Completing this assignment will give you:

  1. solid practice in reading and tracing IA32 assembly
  2. a clear understanding of how data access, control structures, and function calls translate between C and IA32
  3. experience reverse-engineering object code
  4. a compelling reason to invest in mastering the gdb debugger!

The problem

Those nefarious Cal students have broken into our myth machines and planted some mysterious executables we are calling "binary bombs." These programs are believed to be armed and dangerous. Without the original source, we don't have much to go on, but we have observed that the programs seem to operate in a sequence of levels. Each level challenges the user to enter a string. If the user enter the correct string, it defuses the level and the program proceeds on. But given the wrong input, the bomb explodes by printing an earth-shattering KABOOM! and terminating. To deactivate the entire bomb, one needs to successfully defuse each of its levels.

The Cal students have littered our systems with these landmines and we need your help. Each of you is given a bomb to disable. Your mission is to apply your best IA32 detective skills to work out the input required to pass each level and render the entire bomb harmless.

You will use a variety of tools and techniques to dissect your bomb. The levels get progressively more complex, but the expertise you gain as you move up from each level should offset this difficulty. One confounding factor is that the bomb has a hair trigger, prone to exploding at the least provocation. Each time your bomb explodes, it notifies the staff, which deducts from your score. Thus, there are consequences to detonating the bomb-- you must tread carefully!

Tools and strategies

Your bomb is given to you as an executable, i.e. as compiled object code. There are various utilities for poking around in object code that can be useful in analyzing your bomb. The task before you is one of reverse-engineering, working backwards to reconstruct the original C source from the executable. Making progress on this task will require a mix of approaches and tools. Here are some suggested points of attack:

We do have one rule: do not use brute force! You could write a program to try every possible input to find a solution. But this is trouble for several reasons:

Logistics

Our counter-intelligence efforts been able to confirm a few things about how the bombs operate:

Readme file

In the readme.txt file, you are to include:

  1. An overview of the tactics you used to suppress/avoid explosions
  2. An explanation of how each level operates

After sharing how you used your gdb/executable-hacking skills to hold explosions at bay, the rest of the readme is for you to demonstrate that you have a solid grasp on what C code was compiled to create the assembly for each level. It is not necessary that you provide a line-for-line perfectly faithful reproduction of the original C source, but we do expect an accurate description of the "shape" of the code. You should identify what constructs are being used (if/else, loops, switch), how control flows, and the operation/intent of the functions being used. Given an if statement, does it include an else, is there a sequence of if statements that are nested or cascading? Given a loop, where does it start and stop? How does increment? Can you tell if the loop is a for, while, or do-while? Is there any use of break or continue in the loop body? Given a switch statement, how many cases are there and what values do they have? Do any cases fall-through (no break statement), is there a default case? For function calls, what are the parameters being passed? How/what does it return?

Some assembly cannot be definitively matched to an unique C sequence (for example, for and while loops are largely indistinguishable, and a cascading if/else can look a lot like a switch) and likewise, the same C sequence can compile to different assembly. It's fine to provide one correct interpretation and indicate your confidence in how exact the match or acknowledge where there is room for ambiguity/alternatives. There are situations where the code has multiple paths through it. There may be more than one path that leads to a solution, harmless/useless paths, paths that explode, dead ends, and so on. In order to figure out which path(s) is the one you want, you will likely have to at least look at a few. The readme should start with a correct summary of the control "shape", i.e. what paths are available and what decisions determine how/which paths are followed. Your explanation should provide details of what happens when following along the path(s) you do take, but you can be fairly vague about the contents along the other paths you rejected or left unexplored.

A level should be described using C/pseudocode with English annotation as necessary. A painstaking play-by-play of the assembly (e.g. "compares the value at %ebp + 8 to the value at %ebp + 12 and jumps to this address if they are equal") is not appropriate. There is no need to refer registers or assembly instructions at all.

If you get stuck on a level, you can earn partial credit by using the readme to describe the steps you have been able to figure and/or the tactics you have been using to work on it.

The requirement for the readme file is not intended to be onerous. Strunk and White famously say "vigorous writing is concise" and we couldn't agree more. Most levels can be described in handful of C lines and an English sentence or two. In fact, excess verbiage can weaken an otherwise correct explanation, so you're best off sticking the facts!

Grading

For this assignment, the total number of points is expected to be 68, distributed as follows:

Past scores on this assignment have been quite high (median over 90%) and students report that studying someone else's code can be much less time-consuming that debugging their own. :-) Reverse-engineering is fun and surprisingly addictive, hope you enjoy the diversion!

Getting started

The starter project contains your compiled binary bomb, empty input.txt and readme.txt files, and a .gdbinit file. Check out a copy of your cs107 repository with the command:

    hg clone /afs/ir/class/cs107/repos/assign5/$USER assign5

Do not start by running the bomb to "see what it will do..." You will quickly learn that "what it does" is explode :-) Each wrong input explodes the bomb and earns a 1/2 point deduction. Your first task should be to put your kid gloves on and carefully poke around to figure how you can prevent explosion notifications. By first putting in place protection against explosions, you will then be free to experiment with the levels without any nail-biting anxiety about setting off the bomb.

When grading, we will run your bomb on your input (e.g. invoke bomb input.txt) to verify which levels you have successfully defused. As you solve successive levels, add the necessary inputs to the input.txt file and document your work in the readme.txt file. Be sure to regularly commit your changes to these files.

Finishing

Your final committed version of input.txt should contain the strings that correctly defuse the levels that you solved and readme.txt should contain the information that documents your understanding. Don't forget to use the normal submit process when finished to send your files for grading. If needed, review our late policy.

Frequently asked questions about assign5

How do I know if the bomb has exploded?

When an unadulterated bomb explodes, it prints "KABOOM", notifies the authorities, and terminates. The bomb can only explode when it is "live", i.e., executing in shell or gdb. Using tools such as nm, strings, and objdump to examine the executable will not explode the bomb.

How can I tell if the staff heard my explosion?

The bomb has no secrets -- all the code is right there. If you dig into the code that processes explosions you can determine for yourself how/when/whether the word gets out. Avoiding the entire explosion is one straightforward approach to assure that we won't hear about it.

I did something careless and exploded my bomb. Can I undo that explosion?

We count all explosions that reach us. Consider it a fun challenge to work so carefully that you never detonate it, but should an explosion slip through, be assured that no cute baby animals have actually been harmed and the cost for that uncaught explosion is a mere 1/2 point and capped at 5 points total.

Do we need to recreate every single line of C source in the bomb?

You are asked to provide a correct input to pass each level along with an explanation of how the level operates. This will require a fairly complete exploration of the code path you follow for a given level, but any code outside that path can be investigated on a need-to-know basis.

My input defuses the level when typed manually, but when I added the same input to input.txt, it is rejected. What gives?

The contents of input.txt should consist of the input for each level on its own line and a line should end with a standard Unix newline (n). Stop in gdb and examine the line read from your file to spot the discrepancy between what you need and what you have. Watch for extra spaces before/after the input and pay close attention to the line endings. The unix editors available on myth (emacs, vim, gedit, etc.) use the correct line endings (n) by default. Editors on other platforms that are using the line-ending conventions for Mac (r) or Windows (rn) will cause you grief. The easiest approach to avoid problems is to edit the input.txt file using a unix editor.

I found some other IA32 reference material that seems syntactically/logically inconsistent with the IA32 from our textbook/lecture/lab/objdump/gdb. What's up?

Our unix-based tools support the IA32 att (AT&T) syntax and all of our materials (text, lecture, lab) are consistent with this syntax. If you hunt down other resources in the wild, you may encounter Intel syntax where the order of operands are reversed, register names are not prefixed with %, immediate values are not prefixed with $, indirection is expressed with brackets instead of parens, and so on. For example, the att instruction push %ebp is written as push EBP in Intel and att movl $1, (%esp) becomes movl [ESP], 1. Translating between them isn't terribly hard, but you may find it easier to stick with resources that use the same syntax we do.