Assignment 6: Crash reporter

Due date: Mon Mar 02 11:59 pm - Hard deadline: Fri Mar 06 11:59 pm


Assignment by Julie Zelenski

Learning goals

After completing this assignment, you can proudly say that you have learned how to:

  1. dissect an ELF object file
  2. write a signal handler
  3. manually troll the runtime stack
  4. build a custom library and link it with other programs

The problem

When a program crashes outside of the debugger, its dying words, often nothing more than Segmentation fault, are little help in pinpointing the problem. With knowledge of the runtime stack and the executable file layout, you can come to the rescue by writing a fault handler. The handler can be linked into a program and will intercede on a crash and walk up the stack to give a symbolic backtrace of where the crash occurred. Fault handlers are often built into commercial applications to capture crash reports out in the field.

The assignment consists of two tasks:

  1. Implement the namelist program that prints the list of functions contained in an object file. This program is a simplified version of the nm utility. This intermediate milestone verifies your handling of the ELF function symbol data.
  2. Implement the crash reporter library that contains a fault handler to print a backtrace when a program crashes. It maps return addresses to symbolic names using the same symbol data as namelist.

The ELF format

Both namelist and crash reporter require you to do a little dissection of an object file. An object file is the result of compiling, assembling, and possibly linking, C source code. It contains global variables, string constants, function names, IA32 code, etc., all encoded as binary data. Object files on our Linux machines are in the Executable and Linking Format (ELF). The elf man page and sections 7.3-7.5 of Bryant and O'Hallaron provide explanation and diagrams for ELF. (And for the insomniacs, here is the full 100-page ELF specification). The helpful readelf command can be used to print the contents of a specific part of an ELF file to see how the data is stored. And here is a neat ELF file diagram that passed by HackerNews the other day. (Thanks to Dmitri and Daltron for sharing!)

For this assignment, you will dig into the ELF file to extract the function symbol information. The relevant ELF data includes:

The diagram below shows the parts of the ELF format you will access (uint is used as an abbreviation for unsigned int and uintptr_t types):

[Diagram: the parts of the ELF format accessed in this assignment]

An ELF file is designed to be directly accessed with minimal translation from its on-disk representation. For example, the section header table is a contiguous sequence of section headers, where each section header is the same size and has the same fields in the same order. Thus, the section header table is laid out as an array of section header structures. Similarly, the symtab section is an array of symbol structs. This means that you can apply a typecast to the location of the data within the file and treat it like an array, directly accessing entries using ordinary array notation.

Here are the specific steps to read the symbols from an ELF file:

  1. Read the initial characters of the file and compare them to the expected 32-bit ELF header. If the header is valid, read the entire ELF file into memory. The starter project includes code for this task.
  2. Use the offset and nsectionheaders fields from the file header to identify the location and length of the section header table.
  3. Apply a typecast to the location of the section header table to process it as an array of section headers. Loop over the array to find the section header whose type field is equal to SHT_SYMTAB. This is the header for the symbol table (symtab) section. An ELF file will have at most one symtab section.
  4. Use the offset and size fields from the symtab section header to identify the location and size of the symtab section data. The offset for a section header is the number of bytes between the beginning of the ELF file and the first byte of the section data.
  5. Apply a typecast to the location of the symtab section data to process it as an array of symbols. Loop over the array to access each symbol in the table.
  6. The strings for the symbol names are not directly stored in the symtab section. Instead, there is a separate companion string table (strtab) section that contains the string data for all symbol names. The symtab section header has a strtab_index field, which is the index into the section header table for the companion string table section header. There can be more than one string table section in an ELF file, so be sure to use strtab_index to access the correct one.
  7. Use the offset and size fields from the strtab section header to identify the location and size of the strtab section data.
  8. Apply a typecast to the location of the strtab section data to access the strings. The strtab section data is a sequence of null-terminated strings laid out contiguously. A symbol's name field identifies where to find the symbol's name. The name offset is expressed as the number of bytes from the start of the strtab section data to the start of the symbol's name string.
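
The steps above can be sketched in C roughly as follows. This is a minimal illustration, not the required design: it assumes the struct and field names from the Linux <elf.h> header (e_shoff, e_shnum, sh_type, sh_offset, sh_size, sh_link, st_name, st_info, st_value), which correspond to the offset, nsectionheaders, type, size, strtab_index, and name fields described above. The function name print_function_symbols is hypothetical, and the sketch assumes the file has already been validated and read into memory (step 1).

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdio.h>

/* Print the address and name of every function symbol in a 32-bit ELF image
 * already loaded at `image`. Returns the number of function symbols printed,
 * or -1 if the file has no symtab section.
 * (Hypothetical helper; names of ELF fields come from <elf.h>.) */
int print_function_symbols(const void *image)
{
    assert(image != NULL);
    const char *base = image;
    const Elf32_Ehdr *ehdr = image;

    /* Steps 2-3: treat the section header table as an array and find symtab. */
    const Elf32_Shdr *shdrs = (const Elf32_Shdr *)(base + ehdr->e_shoff);
    const Elf32_Shdr *symtab_hdr = NULL;
    for (int i = 0; i < ehdr->e_shnum; i++) {
        if (shdrs[i].sh_type == SHT_SYMTAB) {
            symtab_hdr = &shdrs[i];
            break;
        }
    }
    if (symtab_hdr == NULL) return -1;

    /* Steps 4-5: treat the symtab section data as an array of symbols. */
    const Elf32_Sym *syms = (const Elf32_Sym *)(base + symtab_hdr->sh_offset);
    size_t nsyms = symtab_hdr->sh_size / sizeof(Elf32_Sym);

    /* Steps 6-8: sh_link is the section index of the companion strtab. */
    const Elf32_Shdr *strtab_hdr = &shdrs[symtab_hdr->sh_link];
    const char *strtab = base + strtab_hdr->sh_offset;

    int count = 0;
    for (size_t i = 0; i < nsyms; i++) {
        if (ELF32_ST_TYPE(syms[i].st_info) != STT_FUNC) continue;  /* functions only */
        printf("%08x %s\n", (unsigned)syms[i].st_value, strtab + syms[i].st_name);
        count++;
    }
    return count;
}
```

A real implementation would also want to check that each offset and size stays within the bounds of the file before dereferencing; the sketch omits those checks for brevity.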

The symbols module

Your first task is to write code to extract the function symbols from an ELF file. Both namelist and crash reporter build on this functionality, thus you will put this code into a shared module that is compiled into both. You are to design the public interface of the symbols module. The goal of any interface is to provide useful functionality via routines that are sensibly designed and easy to use. A client should be able to make a simple request to get the desired information which is returned in a tidy package. Your interface should have the necessary flexibility to support its two known clients (namelist and crash reporter), but you don't need to anticipate needs of other potential clients in an attempt to predict the future.

Just because the internals of the symbols module have to deal with ELF in its ugly, native format doesn't mean it should return data to the clients in this raw state. The symbols module can, and should, abstract away the goopy details and provide the requested information in a form that doesn't require the client to get entangled in the low-level technicalities of ELF.
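
To make "a tidy package" concrete, here is one possible shape for the data a client might receive, along with an address lookup. Everything here is hypothetical (the func_symbol type, its fields, and symbol_for_address are illustrative names, not part of the starter code), and the design of your own interface is up to you. The sketch assumes each symbol carries a start address and a size in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-facing record: no ELF details leak through. */
typedef struct {
    const char *name;   /* function name */
    uintptr_t addr;     /* address of the function's first instruction */
    size_t size;        /* size of the function in bytes */
} func_symbol;

/* Return the symbol whose address range [addr, addr + size) contains
 * `target`, or NULL if no symbol covers it. Linear scan for clarity. */
const func_symbol *symbol_for_address(const func_symbol *syms, size_t n,
                                      uintptr_t target)
{
    assert(syms != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (target >= syms[i].addr && target - syms[i].addr < syms[i].size)
            return &syms[i];
    }
    return NULL;
}
```

If the array is kept sorted by address, the linear scan could be swapped for a binary search, though for the small symbol tables in this assignment either is likely fine.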

The namelist program

The namelist program prints the function symbols from an object file. It is a simplified version of the standard nm utility.

Once you can reliably produce a list of function symbols and match an address to a symbol, you're ready for the fun job: building on this work to create a crash reporter!

The libreporter library

When a program crashes outside the debugger, it leaves few clues to follow up on. If the bug is reproducible, running again under gdb can get you a backtrace. But what about those elusive bugs that aren't repeatable or only occur outside of gdb? A crash reporter can provide critical information about a crash without requiring a debugger.

You are to write the libreporter library that provides a crash reporter tool that can be linked into any program. Once initialized, the crash reporter monitors the executing program and on fatal error, it intercedes to produce a symbolic backtrace of the runtime stack before terminating. How does the crash reporter work?

  1. When initialized, it harvests the list of function symbols/addresses using the symbols module and tucks them away for later use.
  2. Also at initialization, it registers a callback function to be executed on a fatal error. This is done with a signal handler (more on that in a bit).
  3. If a fatal error occurs, the callback function is invoked. It examines the runtime stack by traversing backwards from current frame to previous frames. The return address in each stack frame is mapped to its symbolic name.
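
The stack traversal in step 3 can be sketched as below. This is an illustration under stated assumptions, not the required implementation: it assumes the conventional IA32 frame layout in which each frame begins with the caller's saved frame pointer, followed immediately by the return address, and that the chain ends with a NULL saved frame pointer. The function name collect_return_addresses is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Follow the chain of saved frame pointers starting at `frame`, storing each
 * frame's return address into `out`. Stops at a NULL frame pointer or after
 * `max` entries. Returns the number of return addresses collected.
 * Assumed layout: frame[0] = caller's saved frame pointer,
 *                 frame[1] = return address into the caller. */
size_t collect_return_addresses(void **frame, void **out, size_t max)
{
    assert(out != NULL || max == 0);
    size_t n = 0;
    while (frame != NULL && n < max) {
        out[n++] = frame[1];         /* return address for this frame */
        frame = (void **)frame[0];   /* step to the caller's frame */
    }
    return n;
}
```

With GCC, a starting frame pointer for the currently executing function can be obtained with __builtin_frame_address(0). The walk only works if the code was compiled with frame pointers retained, so treat the sketch as dependent on your compiler flags.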

When an exceptional event occurs (memory access violation, divide by zero, etc.), the kernel sends the process a signal. The default action for fatal signals is to terminate the program. Alternatively, you can register a function as the callback (called a signal handler) to instead process the signal. That callback might attempt error recovery, do cleanup, give information to the user, and so on. Our starting code in reporter.c shows the boilerplate code required to set up a signal handler. If you're curious, you can read more about signals from the man page for sigaction and in section 8.5 of Bryant and O'Hallaron.
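
For reference, registering a handler with sigaction looks roughly like the sketch below. It is not the starter code from reporter.c; the names on_signal, install_handler, and last_signal are hypothetical, and the handler here only records the signal number rather than printing a report.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction and siginfo_t */
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Most recent signal seen by the handler (hypothetical bookkeeping). */
static volatile sig_atomic_t last_signal = 0;

/* Three-argument handler form, selected by SA_SIGINFO. The info and context
 * arguments carry details about the event; this sketch ignores them. */
static void on_signal(int signum, siginfo_t *info, void *context)
{
    (void)info;
    (void)context;
    last_signal = signum;
}

/* Register on_signal as the handler for `signum`.
 * Returns 0 on success, -1 on failure (as sigaction does). */
int install_handler(int signum)
{
    assert(signum > 0);
    struct sigaction action;
    memset(&action, 0, sizeof action);
    action.sa_sigaction = on_signal;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);    /* block no additional signals while handling */
    return sigaction(signum, &action, NULL);
}
```

A crash reporter would call something like install_handler once per fatal signal of interest (SIGSEGV, for example), and its handler would print the report and then terminate the program instead of returning.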

The crash reporter will register a signal handler for fatal signals. The handler prints information about the event (signal number/name, faulting instruction) along with a stack backtrace. A sample crash report looks like this:

Program received signal 11 (Segmentation fault)
Faulting instruction at [0x08048f00] crash_here (+0x10)
[0xf778d410] <unknown>
[0x08048f52] dinky (+0x2c)
[0x08048f17] winky (+0x12)
[0x08048f24] pinky (+0xb)
[0x08048f3f] dinky (+0x19)
[0x08048f66] binky (+0x12)
[0x08048f78] main (+0x10)
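
Each backtrace line in the sample pairs an address with a function name and the offset of that address from the start of the function. A minimal sketch of producing one such line follows; format_frame is a hypothetical helper, and the exact output format you must match is the one verified by sanity check, not this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write one backtrace line for `addr` into `buf`. If `name` is NULL the
 * address did not match any known function and is reported as <unknown>;
 * otherwise `start` is the address where the named function begins.
 * Returns the value from snprintf (length the full line needs). */
int format_frame(char *buf, size_t len, uintptr_t addr,
                 const char *name, uintptr_t start)
{
    assert(buf != NULL && len > 0);
    if (name == NULL)
        return snprintf(buf, len, "[0x%08lx] <unknown>", (unsigned long)addr);
    return snprintf(buf, len, "[0x%08lx] %s (+0x%lx)",
                    (unsigned long)addr, name, (unsigned long)(addr - start));
}
```

For instance, if main begins at 0x08048f68, the return address 0x08048f78 lies 0x10 bytes into it, which corresponds to the last line of the sample report.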

Here is what is expected from your libreporter:

Summary of project

Grading

Background information on how we grade assignments.

Functionality (100 points)

Code quality (buckets weighted to contribute ~20 points)

Here we will read and evaluate your code in areas such as:

Getting started

Check out a copy of the starter project from your cs107 repository using the command

hg clone /afs/ir/class/cs107/repos/assign6/$USER assign6

The assign6 samples directory is linked in your repo as slink and includes a sample namelist program and a solution version of the crash reporter library.

Finishing

There is a sanity check that verifies output conformance of both namelist and crash reporter. Be sure that all debugging print statements are removed/disabled as extraneous output can interfere with the autotester. When finished, submit your code for grading. If needed, be sure to familiarize yourself with our late policy.