Lab 5: IA32

Lab sessions Mon Feb 09 to Thu Feb 12

Lab written by Julie Zelenski

Learning goals

This lab is designed to give you a chance to:

  1. use objdump and gdb to disassemble and trace assembly code
  2. study the relationship between source code and its assembly translation
  3. reverse-engineer a little assembly back to C
  4. hand-generate a simple sequence in assembly

Find an open computer and somebody new to sit with. Introduce yourself and share your suggestions about how to best prep for the upcoming midterm.

Lab exercises

  1. Get started. Clone the lab starter project using the command

    hg clone /afs/ir/class/cs107/repos/lab5/shared lab5
    

    This creates the lab5 directory which contains source files and a Makefile. Bring up our guide to IA32 basics and this handy IA32 cheat sheet (stolen from The Laboratory for Software Technology) in your browser for reference during lab. And as always, have the online lab checkoff ready so you can jot down things as you go. At the end of the lab period, you will submit the sheet and have the TA check off your work.

  2. Disassembling with objdump. objdump is a tool that operates on object files (i.e. files containing compiled machine code). It can dig out all sorts of information from the object file (the objdump man page is a good resource for learning more), but one of the more common uses is to disassemble object code into assembly form. Let's try it out!

    • Invoking objdump -d disassembles an object file. The output is a list of binary-encoded machine instructions, each alongside its assembly equivalent (the format is similar to the one you used for disassemble() in assignment 4). If the object file was compiled with debugging information, adding the -S flag to objdump will intersperse the original C source with the assembly. Thus, objdump -S -d shows each C construct matched with its compiled translation into assembly. This sort of dump is called a deadlist ("dead" to distinguish it from the study of "live" assembly as it executes). The deadlist of an entire program can be long and a bit overwhelming.
    • The countops.py utility from this lab reports the assembly instructions most heavily used in a given object file. Try out countops.py trace.o for an example. This python program operates by invoking objdump to disassemble the file, tallies instructions by opcode, and reports the top 10 most frequent. Try it out on a few executables (your spellcheck or reassemble, or programs like emacs or gcc and so on) to get an idea of what the mix of IA32 instructions tends to look like.
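
    To make the tallying idea concrete, here is a minimal sketch in C of the two steps described above: pulling the mnemonic out of one line of objdump -d output and bumping its count. It is not how countops.py itself is written (that is a Python program we have not reproduced here); the names extract_mnemonic, tally_add, MAX_OPS, and OP_LEN are made up for illustration, and the sketch assumes the usual tab-separated objdump layout of address, hex bytes, then instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPS 512   /* hypothetical cap on distinct mnemonics */
#define OP_LEN  16    /* hypothetical max mnemonic length, including '\0' */

struct tally {
    char name[OP_LEN];
    int count;
};

/* Copies the mnemonic from one line of `objdump -d` output into out.
 * Assumed layout of an instruction line: "addr:<TAB>hex bytes<TAB>mnemonic operands".
 * Returns 1 on success, 0 if the line has no instruction field (blank lines,
 * labels, headers, byte-only continuation lines) or the mnemonic does not fit. */
int extract_mnemonic(const char *line, char *out, size_t outsize)
{
    assert(line != NULL && out != NULL && outsize > 0);

    const char *tab1 = strchr(line, '\t');
    if (tab1 == NULL) return 0;
    const char *tab2 = strchr(tab1 + 1, '\t');
    if (tab2 == NULL) return 0;

    const char *start = tab2 + 1;
    size_t len = strcspn(start, " \t\n");   /* mnemonic ends at first whitespace */
    if (len == 0 || len >= outsize) return 0;

    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* Increments the count for op in table[0..*n), appending a new entry if unseen.
 * Returns the updated count, or 0 if the table is full. */
int tally_add(struct tally table[], int *n, const char *op)
{
    for (int i = 0; i < *n; i++) {
        if (strcmp(table[i].name, op) == 0) return ++table[i].count;
    }
    if (*n == MAX_OPS) return 0;

    strncpy(table[*n].name, op, OP_LEN - 1);
    table[*n].name[OP_LEN - 1] = '\0';
    table[*n].count = 1;
    (*n)++;
    return 1;
}
```

    A driver would read the disassembly line by line, call extract_mnemonic on each line, pass every hit to tally_add, and finally sort the table by count to report the most frequent entries. Note that an instruction prefix such as rep is counted as its own mnemonic in this simple version.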

  3. Assembly-level debugging in gdb. Our debugger has great support for working with code at the assembly level. Load the trace program in gdb and try out the gdb commands listed below, which allow you to poke around at the assembly level. To learn more about a command available within gdb, use gdb's built-in help.

    • The gdb disassemble command. With no arguments, it prints the disassembled instructions for the currently executing function. You can also give an optional argument saying what to disassemble: a function name, an address, or a range of addresses. disassemble/m will intersperse the assembly with the original C source, which can be helpful when trying to match up the two.

      (gdb) disassemble main
      Dump of assembler code for function main:
      0x08048596 <+0>:     push  %ebp
      0x08048597 <+1>:     mov   %esp,%ebp
      0x08048599 <+3>:     and   $0xfffffff0,%esp
      ...
      

      In the disassembly as printed by gdb, the hex number in the leftmost column is the address in memory of that instruction, and the number in angle brackets is the offset of that instruction relative to the start of the function. You may notice minor differences in presentation between the disassembled instructions as printed by gdb and the output from objdump, such as movl in place of mov, or negative signed values displayed as large unsigned ones.

    • The gdb x command (examine memory) includes an i format. x/i addr will decode the binary-encoded instruction at a given address and print its disassembled translation:

      (gdb) x/i main                 prints first instruction of main
      (gdb) x/8i main                prints first 8 instructions of main
      
    • You can set a breakpoint at a specific machine instruction by specifying its address (b *address) or an offset within a function (b *main+6). Note that the latter is not 6 instructions into main, but 6 bytes worth of instructions into main. Given the variable-length encoding of IA32 instructions, 6 bytes can correspond to one or several instructions.

      (gdb) b *0x08048375            break at specified address
      (gdb) b *main+6                break at instruction 6 bytes past start of main
      
    • The gdb stepi and nexti commands allow you to single-step through assembly instructions. These are the assembly-level equivalents of the source-level step and next commands, and they can be abbreviated si and ni.

      (gdb) stepi                    executes next single machine instruction
      (gdb) nexti                    executes next machine instruction (proceed over fn calls)
      
    • The gdb info reg command will print the values of the eight integer registers and the condition codes (eflags). info all-reg includes floating point and vector registers. You can refer to an individual register by name to view or change the register's value. Within gdb, a register name is prefixed with $ instead of the % used in the assembly.

      (gdb) info reg
      (gdb) p $ebp                   show current value in %ebp register
      (gdb) set $eax = 9             change current value in %eax register
      
    • You can add a display expression to print the current value of a given expression each time your program stops in the debugger. One useful expression to display when stepping is the next instruction to be executed. The eip register holds the address of the next instruction to be executed, so if you set it to display before you stepi, gdb will print the next instruction before executing it. The display command works for other expressions too (variables, parameters, arithmetic, and so on), which is very handy!

      (gdb) display/i $eip
      
    • Last, but certainly not least, this is a great time to try out the tui (text user interface) I have been using in lecture. Tui mode splits your session into panes for simultaneously viewing the C source, assembly translation, and/or current register state. The gdb layout command puts the debugger into tui mode. The layout argument specifies which pane(s) you want (src, asm, regs, or split). Tui mode is a great tool for tracing/visualizing execution, but sadly it can also be a nuisance at times (garbling the display and/or misleading you about the current state of affairs). If your tui window has gotten whacked, the refresh command sometimes works to clean it up. If things get really out of hand, ctrl-x a will exit tui mode and return you to ordinary non-graphical gdb.

  4. Reading assembly. Read over the C code in trace.c. Compile the program and use objdump -d -S trace.o to deadlist the generated assembly interspersed with the original C source (or use disassemble/m fn_name in gdb to do the same for a single function). There are several interesting observations you can make by comparing the C code to its translation. Study the disassembled output in order to answer the following questions.

    In the variables function:

    • How are the values in the nums array initialized? What happened to the strlen call on the string constant to init the last array element?
    • What instructions were emitted to compute the value assigned to count? What does this tell you about the sizeof operator?
    • In the loop body, look at the instructions and addressing mode used to access each array element via nums and compare to the use of ptr. Accessing an element using ptr[i] requires one more memory access than via nums[i]. Can you identify where that happens in the instruction stream? Do you understand why there is an additional memory load in this case?
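
    If you want a smaller specimen to compile and disassemble alongside trace.c, the following sketch shows the same three ideas in isolation. It is not the lab's variables function; the function names and array contents are invented for illustration, and the comments describe what unoptimized gcc output for IA32 typically looks like rather than a guarantee.

```c
#include <stddef.h>

/* sizeof on a fixed-size array is evaluated by the compiler, so the
 * division below is typically folded into a constant in the assembly. */
size_t element_count(void)
{
    int nums[] = {10, 20, 30, 40};
    return sizeof(nums) / sizeof(nums[0]);
}

/* nums lives at a fixed offset in the stack frame, so the address of
 * nums[i] can be formed directly from the frame pointer and i. */
int sum_via_array(void)
{
    int nums[] = {10, 20, 30, 40};
    int total = 0;
    for (size_t i = 0; i < 4; i++) {
        total += nums[i];
    }
    return total;
}

/* ptr is itself a variable stored in memory. Without optimization, its
 * value (the address it holds) must be loaded before ptr[i] can be read. */
int sum_via_pointer(void)
{
    int nums[] = {10, 20, 30, 40};
    int *ptr = nums;
    int total = 0;
    for (size_t i = 0; i < 4; i++) {
        total += ptr[i];
    }
    return total;
}
```

    Compile it without optimization and compare the loop bodies of sum_via_array and sum_via_pointer in the deadlist; higher optimization levels may erase the difference.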

    In the u_arith and s_arith functions:

    • These functions invoke similar arithmetic operations but differ in the signedness of the operands. For add and subtract, there is no difference in the assembly issued for unsigned versus signed. How is it possible that the same add/sub instruction does the correct thing for both unsigned and signed arithmetic?
    • To multiply by 8, what instruction is used? Is it the same for unsigned and signed?
    • To mod by 16, what instruction is used for unsigned? Why does that choice work?
    • To div by 2, what instruction is used for unsigned? For signed, the assembly sequence seems more complex. Trace through by hand or stepi in the debugger. Can you explain what's going on in this case and why it's more complex?
    • When doing a right-shift, does gcc emit an arithmetic (sar) or logical (shr) shift?
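
    The questions above turn on a few bit-level identities, which can be checked in C before you look for them in the assembly. The sketch below is not taken from trace.c; the function names are invented, and it assumes 32-bit two's complement integers and a compiler (such as gcc) that implements >> on negative signed values as an arithmetic shift and wraps on unsigned-to-signed conversion, both of which are implementation-defined in C.

```c
#include <stdint.h>

/* Multiplying by 8 is the same as shifting left by 3 bits. */
uint32_t times8(uint32_t x) { return x << 3; }

/* For unsigned values, x % 16 keeps only the low 4 bits. */
uint32_t mod16(uint32_t x) { return x & 0xF; }

/* For unsigned values, x / 2 is a logical right shift by 1. */
uint32_t udiv2(uint32_t x) { return x >> 1; }

/* Signed division truncates toward zero, but an arithmetic right shift
 * rounds toward negative infinity. Adding the sign bit (1 for negative
 * values, 0 otherwise) before shifting corrects the result.
 * Assumes >> on a negative int32_t is an arithmetic shift. */
int32_t sdiv2(int32_t x)
{
    int32_t bias = (int32_t)((uint32_t)x >> 31);
    return (x + bias) >> 1;
}

/* In two's complement, adding the raw bit patterns as unsigned values
 * yields the same bits as signed addition.
 * Assumes the conversion back to int32_t wraps, as gcc does. */
int32_t add_via_unsigned(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```

    For example, sdiv2(-7) takes -7, adds the bias of 1 to get -6, and shifts to get -3, which matches -7 / 2 in C. Without the bias, the shift alone would produce -4.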

    In the conversions functions:

    • What instructions are emitted to convert/assign a signed int to an unsigned int or vice versa?
    • What instructions are emitted to promote a char to integer bitwidth? Does it matter whether the destination integer is signed or unsigned? Does it matter whether the source char is signed or unsigned?
    • How many instructions are used to convert between float and int? What about when copying the raw bits unconverted?
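
    As a point of comparison for the conversions questions, the sketch below separates conversions that change the bits from those that only change how the bits are interpreted. It is an illustration rather than the lab's code: the function names are invented, and it assumes a 32-bit int and unsigned, and a 4-byte IEEE 754 float, as on the IA32 machines used in this course.

```c
#include <stdint.h>
#include <string.h>

/* Widening a signed char replicates its sign bit into the upper bits. */
int widen_signed(signed char c) { return c; }

/* Widening an unsigned char fills the upper bits with zeros. */
int widen_unsigned(unsigned char c) { return c; }

/* int to unsigned of the same width: the bit pattern is unchanged,
 * only its interpretation differs. */
unsigned int reinterpret_int(int x) { return (unsigned int)x; }

/* float to int is a genuine value conversion: the fractional part is
 * discarded (truncation toward zero) and the bits are re-encoded. */
int convert_float(float f) { return (int)f; }

/* Copying the raw bits leaves the encoding untouched. memcpy is the
 * portable way to do this without violating aliasing rules. */
uint32_t raw_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

    Note that what decides between sign extension and zero extension is the signedness of the source char, not of the destination integer. Disassembling widen_signed and widen_unsigned side by side should make the difference easy to spot.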

    In the loops function:

    • First, examine the C code for the three loop variants. Under which conditions will all three loops have the same behavior and when will they differ?
    • Now examine the assembly code. What is the difference between the assembly for the for loop versus the while loop? For the while versus the do-while?
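
    For reference, here are the three loop forms reduced to their skeletons. This is a hedged illustration, not the loops function from trace.c; the function names are invented and each function simply counts how many times its body runs.

```c
/* for loop: the condition is tested before every iteration. */
int count_for(int n)
{
    int iters = 0;
    for (int i = 0; i < n; i++) {
        iters++;
    }
    return iters;
}

/* while loop: same test-first structure as the for loop. */
int count_while(int n)
{
    int iters = 0;
    int i = 0;
    while (i < n) {
        iters++;
        i++;
    }
    return iters;
}

/* do-while loop: the body runs once before the condition is tested. */
int count_do(int n)
{
    int iters = 0;
    int i = 0;
    do {
        iters++;
        i++;
    } while (i < n);
    return iters;
}
```

    All three return the same count whenever n is at least 1. They differ only when the condition is false on entry: count_for(0) and count_while(0) return 0, while count_do(0) returns 1.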

  5. Reverse-engineering and hand-generation. The program trace uses solve to make several calls to the function mystery and prints its results. Let's look into this mystery further! The mystery function was written directly in assembly, not compiled from C. Open the mystery.s file to read the assembly. Now use gdb's stepi to step through the execution of a call to mystery and observe its operation. Once you understand it, jot down an equivalent C version of mystery. Lastly, try hand-generating a little assembly by editing the mystery.s file to change the implementation of mystery so that it instead returns the additive inverse (i.e. -value) of the smaller of the two arguments. Compile and test to verify that your assembly code is correct.
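
    When testing your hand-written assembly, it can help to have a C reference for the target behavior to compare against. The sketch below covers only the modified version (return -value of the smaller argument), not the original mystery, which is left for you to work out. The name neg_of_smaller is made up, and the sketch assumes mystery takes two int arguments and returns an int.

```c
/* Hypothetical C reference for the modified mystery function:
 * returns the negation of the smaller of the two arguments.
 * Negating INT_MIN overflows, so that input is outside the sketch's scope. */
int neg_of_smaller(int a, int b)
{
    int smaller = (a < b) ? a : b;
    return -smaller;
}
```

    For example, neg_of_smaller(3, 5) returns -3 and neg_of_smaller(-2, 4) returns 2. Running the same argument pairs through your edited mystery should give matching results.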

  6. Visualize awesomeness. Three super-cool CS107 alumni (Thank you Julia, Kat, and Constance!) developed a web-based interactive simulator/visualizer designed for students learning IA32. I think it's pretty nifty! Visit Rainbow Onion to check it out. You can use it as a tutorial by choosing one of the topics from the tutorial menu and walking through its guided example, using the included self-test exercises along the way to confirm your understanding. You can also use it as a simulator for experimentation and play. Edit the assembly code in the main pane and then "Step" through it, while observing updates to the registers and condition codes. Does the visualization help you better understand what's happening at the machine level?

Check off with TA

Before you leave, be sure to submit your checkoff sheet (in the browser) and have your lab TA come by and confirm so you will be properly credited. If you don't finish everything before lab is over, we strongly encourage you to finish the remainder on your own!