
Assignment 7: Heap allocator

Due date: Wed Mar 11 11:59 pm - Hard deadline: Fri Mar 13 11:59 pm

Assignment by Julie Zelenski, based on original assignment given by Randal Bryant & David O'Hallaron (CMU)


Learning goals

In completing this assignment, you will:

study the internals of the heap allocator
experiment with the process of evaluating and choosing among alternative designs
apply optimization techniques to find and mitigate runtime/memory inefficiencies
explore performance trade-offs and weigh improvements against the complexity of code required to achieve them
bring together all of your CS107 skills and knowledge into a satisfying capstone experience!

Your assignment

You are to write an allocator that manages the heap and implements the dynamic allocation functions malloc, realloc, and free. The allocator grabs a large memory segment using the low-level OS memory routines and parcels out this segment to service dynamic memory requests. Your code will manage the heap segment, track the available/in-use space, and update your heap data structures in response to requests. The key goals for the allocator include:

it must correctly service any combination of well-formed requests
it makes compact use of memory (i.e. densely-packed, low fragmentation)
it runs fast

The starter code contains a sample allocator. This trivial allocator is largely correct but it is heinously inefficient. Your goal is to replace this allocator with one that uses intelligent data structures and efficient algorithms. The challenge is to strike a balance between tight memory utilization and high throughput, while maintaining correctness in all situations.
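For reference, here is a minimal sketch of what "correct but heinously inefficient" can look like. This bump allocator hands out the next aligned chunk from the front of a segment on each request and never reuses freed memory. The names `bump_malloc`, `bump_free`, `segment`, and `SEGMENT_BYTES` are hypothetical. The fixed static array stands in for memory obtained through segment.h, and this is not necessarily how the starter allocator works.

```c
#include <stddef.h>

#define ALIGNMENT 8
#define SEGMENT_BYTES 4096   /* hypothetical fixed segment size */

/* Stand-in for the heap segment a real allocator would obtain via segment.h. */
static _Alignas(ALIGNMENT) unsigned char segment[SEGMENT_BYTES];
static size_t bump_offset;   /* offset of the next unused byte */

/* Round sz up to a multiple of mult, which must be a power of two. */
static size_t roundup(size_t sz, size_t mult)
{
    return (sz + mult - 1) & ~(mult - 1);
}

/* Hand out the next aligned chunk, or NULL if the segment is exhausted. */
void *bump_malloc(size_t size)
{
    if (size == 0 || size > SEGMENT_BYTES) return NULL;
    size_t needed = roundup(size, ALIGNMENT);
    if (needed > SEGMENT_BYTES - bump_offset) return NULL;
    void *p = segment + bump_offset;
    bump_offset += needed;
    return p;
}

/* Freed chunks are simply abandoned; their space is never reclaimed. */
void bump_free(void *ptr)
{
    (void)ptr;
}
```

Each request is O(1), so throughput is not the weak spot. Utilization is: space released by free is never handed out again, so a workload that repeatedly allocates and frees marches steadily through the segment.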

Just three little functions? How hard can that be? :-) It is possible to write an efficient allocator with a couple hundred lines of code. However, it will require your most skilled and sophisticated coding and you may end up more proud of those three functions than any code you've written heretofore. What a great way to finish off an awesome quarter!

For this assignment, you have the option of working in a team of two or working individually. The project requirements are the same either way. Working in a team has advantages (two heads are better than one) and disadvantages (scheduling, coordinating, arguing). Choosing to partner won't cut your work in half, but the collaboration may contribute to a higher-quality result and better mastery of the material. A team makes one submission and both partners receive the same score. Read our advice for successful partnership.

Implementation details

Here are the specific requirements for the allocator:

The interface should match the standard libc allocator. Carefully read the malloc man pages to learn what constitutes a well-formed request and the expected allocator behavior. You may ignore the esoteric details from the NOTES section of the man page.
There is no requirement on how you handle improper requests. If a client reallocs a non-malloced pointer, frees an already freed pointer, or overruns an allocated block, your response can be anything, including crashing or corrupting the heap.
Your routines are named mymalloc, myrealloc, and myfree to avoid clashes with the libc version. You will also implement the myinit function that configures the initial state and the validate_heap debugging function that is used to verify internal heap consistency.
An allocated block must be at least as large as requested, but it is not required to be exactly that size. Every allocated block must be aligned to an address that is a multiple of the ALIGNMENT constant (8). A payload address could be 0xa5b8 or 0xa5b0, but not 0xa5b4 or 0xa5b1. The alignment requirement applies to the address of the payload; whatever headers/footers you have placed around it are your own business.
The theoretical maximum size request for a 32-bit address space is 4GB, but in practice, the max size will be constrained by other in-use segments. Your design can assume that payload sizes top out at 1GB. This assumption may help you squeeze out a critical few extra bits in your block housekeeping.
Your allocator will manage a large contiguous heap segment that it parcels out to satisfy allocation requests. We provide routines that wrap the low-level calls and track the heap segment in large page-sized chunks. Read segment.h in the starter code for more information.
You should not invoke any memory-management functions other than those provided in segment.h. By "memory-management" we specifically mean those operations that allocate or deallocate memory, so absolutely no calls to malloc, realloc, free, calloc, sbrk, brk, mmap, or related variants. The use of other library functions is fine, e.g. memmove and memset can be used as they do not allocate or deallocate memory.
You may use a small amount of static global data in your allocator, limited to at most 150 bytes. This means the bulk of the heap housekeeping must be stored within the heap segment itself. There is no constraint on what you store (i.e. no requirement to use or not use block header/footers, an explicit free list, a hashtable, and so on) but there is a condition on where you store it (i.e. at most 150 bytes of global data; any additional housekeeping must be stored within your heap segment).
Your allocator should include an implementation for the validate_heap function. This function is called from our test harness between requests to verify the internal consistency of the heap data structures. This debugging hook is intended to help surface problems in your heap sooner rather than later. Note that validate_heap will not be called during any performance testing, so there is no concern about its efficiency (or lack thereof).
After perfecting the correctness of your allocator, your goal is to make it highly efficient in utilization and throughput. Utilization is the ratio of the total payload bytes allocated (and not yet freed) to the total heap segment size. For example, if 100 blocks of size 32 have been allocated and the heap segment is currently the size of one page (4K), the utilization will be (32*100)/4096 ≈ 78%. Throughput counts the number of requests serviced per second. Throughput is reported as a percentage relative to a fixed target based on the standard libc allocator. Utilization ranges from 0 to 100%; throughput ranges from 0 to over 100%.
Your submitted project will include a readme.txt file that documents your design, how you chose its features, your efforts to optimize, and your evaluation of the end result.
Our code review for this assignment will involve less scrutiny, which is an appropriate segue to the end of CS107. Our TAs have expended untold hours this quarter to provide detailed feedback on your code, but subsequent systems classes tend to grade mostly/solely on functionality. The belief is that you will have internalized the lessons of CS106/107 and can apply your own internal compass. You may rejoice that you can get away with all sorts of questionable practices when no one is looking closely, but we hope our efforts have helped convince you that writing clean code is truly its own reward in terms of easier testing, debugging, maintenance, and more.
Don't miss our page of hints and advice about this assignment!
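Several of the requirements above come together in even the simplest block layout: 8-byte payload alignment, spare header bits, housekeeping stored inside the segment, the 150-byte global limit, and validate_heap. The sketch below assumes a hypothetical design. It uses an implicit list of blocks, each with a 4-byte header whose low bit records allocated/free, plus a first-fit search with splitting and a `validate_blocks` routine in the spirit of validate_heap. A static array stands in for the segment obtained through segment.h, and `reset_heap` plays the role of myinit. All names here are illustrative; the real function signatures come from the starter code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ALIGNMENT 8
#define HDR_SIZE sizeof(uint32_t)  /* hypothetical 4-byte block header */
#define ALLOC_BIT 0x1u
#define MIN_BLOCK 8                /* header plus smallest payload slot */
#define HEAP_BYTES 256

/* Stand-in for the heap segment; all block housekeeping lives in here. */
static _Alignas(ALIGNMENT) unsigned char heap[HEAP_BYTES];

/* The allocator's own global state, which must fit in 150 bytes. */
static struct {
    size_t first_hdr;  /* offset of the first block header */
    size_t end;        /* offset just past the last block */
} g;
_Static_assert(sizeof g <= 150, "global housekeeping exceeds 150 bytes");

static size_t roundup(size_t sz, size_t mult) { return (sz + mult - 1) & ~(mult - 1); }

/* Block sizes are multiples of 8, so their low 3 bits are free for flags.
 * (A 1GB payload cap likewise leaves the top bit of a 32-bit size unused.) */
static uint32_t pack(size_t size, bool allocated) { return (uint32_t)size | (allocated ? ALLOC_BIT : 0u); }
static size_t hdr_size(uint32_t h) { return h & ~(uint32_t)(ALIGNMENT - 1); }
static bool hdr_allocated(uint32_t h) { return (h & ALLOC_BIT) != 0; }
static uint32_t read_hdr(size_t off) { uint32_t h; memcpy(&h, heap + off, sizeof h); return h; }
static void write_hdr(size_t off, uint32_t h) { memcpy(heap + off, &h, sizeof h); }

/* Start over with one free block (what myinit must do between scripts).
 * The first header sits 4 bytes in so the payload after it is 8-aligned. */
void reset_heap(void)
{
    g.first_hdr = ALIGNMENT - HDR_SIZE;
    g.end = g.first_hdr + (HEAP_BYTES - ALIGNMENT);
    write_hdr(g.first_hdr, pack(g.end - g.first_hdr, false));
}

/* First-fit search; splits off the remainder when it can hold a block. */
void *first_fit_alloc(size_t size)
{
    if (size == 0 || size > HEAP_BYTES) return NULL;
    size_t need = roundup(size + HDR_SIZE, ALIGNMENT);
    for (size_t off = g.first_hdr; off < g.end; off += hdr_size(read_hdr(off))) {
        uint32_t h = read_hdr(off);
        size_t bsize = hdr_size(h);
        if (hdr_allocated(h) || bsize < need) continue;
        if (bsize - need >= MIN_BLOCK) {
            write_hdr(off + need, pack(bsize - need, false));
            bsize = need;
        }
        write_hdr(off, pack(bsize, true));
        return heap + off + HDR_SIZE;
    }
    return NULL;
}

/* Walk every block and report the first inconsistency found. */
bool validate_blocks(void)
{
    bool prev_free = false;
    for (size_t off = g.first_hdr; off < g.end; ) {
        uint32_t h = read_hdr(off);
        size_t bsize = hdr_size(h);
        if ((uintptr_t)(heap + off + HDR_SIZE) % ALIGNMENT != 0) {
            printf("offset %zu: misaligned payload\n", off);
            return false;
        }
        if (bsize < MIN_BLOCK || bsize > g.end - off) {
            printf("offset %zu: bad block size %zu\n", off, bsize);
            return false;
        }
        /* Only an error if the design coalesces free neighbors eagerly. */
        if (prev_free && !hdr_allocated(h)) {
            printf("offset %zu: adjacent free blocks\n", off);
            return false;
        }
        prev_free = !hdr_allocated(h);
        off += bsize;
    }
    return true;  /* sizes stayed in bounds, so the walk ended exactly at g.end */
}
```

A fuller design would add a free routine with coalescing, and perhaps footers or an explicit free list so the search skips allocated blocks. The adjacent-free-blocks check only makes sense once free coalesces eagerly.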

Using the alloctest program and script files

You will need to do thorough testing of your allocator. An ordinary C program that makes targeted calls to the allocator functions is one means to test. The simple.c program in the starter project can be used in this way. The starter project also includes alloctest, a script-based test program. A script file is a text file containing a sequence of allocator requests. The alloctest program reads a script file, executes its requests, attempts to verify they were properly serviced, and measures allocator performance. A sample report on one script is shown below.

alloctest -f tiny1.script
Evaluating allocator on tiny1....done.

 script name     correct?    utilization   requests       secs        Kreq/sec
-------------------------------------------------------------------------------
tiny1                Y            17%           12       0.000018        665
-------------------------------------------------------------------------------

Aggregate 1 of 1                  17%           12       0.000018        665
17% (utilization) 6% (throughput, expressed relative to performance target)

Below is an overview of how alloctest operates. You can review the code in alloctest.c for further details.

If you run alloctest with no arguments, it will run all the scripts in the directory /afs/ir/class/cs107/samples/assign7. You can use the -f argument to change which scripts are being run. The -f argument is the path to a single script file or a directory of script files.
Script files
- A script file is a text file containing a sequence of allocator requests expressed in a simple domain-specific language created for this purpose. The alloctest program parses the script file and makes calls to mymalloc and so on as requested in the script.
- The starter project contains a few tiny scripts. These scripts are much too small to produce meaningful performance results. They are intended only as a first step in early development and debugging. We will not test allocator performance on such small scripts and neither should you.
- The directory /afs/ir/class/cs107/samples/assign7 contains a collection of varied scripts. These scripts are best suited for testing the performance of an already debugged implementation as they are very large (thousands of requests). The ones named x-trace were constructed by tracing the allocation calls made by a running program, e.g. reassemble-trace traces our assignment 1. (For the curious: see man page for ltrace on how to snoop on a program's library calls.) These represent "real world" testing patterns. The scripts named x-pattern were mechanically generated via patterns such as alternately allocating a small and large chunk.
- You can make your own script files in order to help with early development or to experiment with different use cases or patterns for performance testing. Open one of the small script files in a text editor to see the file format.
By default, it runs each script first to verify correctness, then runs each script again to measure performance. Change the flags when invoking alloctest to selectively verify correctness (-c) or measure performance (-p). The default (no flags) is to do both.
When testing correctness, it has some simple checks to try to verify that the requests are being properly serviced. For example, it checks that each malloc/realloc call returns a pointer which appears valid (points to a block within the heap segment, start address is correctly aligned, and doesn't overlap any other currently in-use block). It writes a repeated byte to the payload of each allocated block (a number to identify the block). When a block is realloced/freed, it reads the payload to verify the contents have remained intact. It also makes a call to your validate_heap between each request. If any of these checks turn up trouble, it prints an error message to draw attention to the problem.
One easy-to-overlook detail of the alloctest program is that it calls the allocator's myinit function after executing one script and before starting another. Take care to properly implement the myinit function to wipe the slate clean, removing any previous heap contents and starting fresh, lest you encounter bugs due to ghost data left behind in the heap from the previous script.
When evaluating performance, it runs each script several times and counts the cycles used. The alloctest program uses the timer functions from Bryant & O'Hallaron (B&O), which count cycles by reading the processor's cycle counter, and repeats trials until the measurements converge to reduce noise: at least 3 trials and up to 20. It does no correctness checks while running the scripts to measure performance.
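The correctness checks described above are easy to reproduce in a test program of your own (for example, a modified simple.c) when you want quick feedback outside the script harness. The sketch below is not the code in alloctest.c; it is a hypothetical version of the same ideas. `block_looks_valid` tests alignment, containment in the segment, and overlap with blocks still in use, and `fill_payload`/`payload_intact` implement the repeated-byte stamp. The `live_block` struct and all function names are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGNMENT 8

/* One in-use block as the test program remembers it. */
typedef struct {
    const void *start;
    size_t size;
} live_block;

/* Aligned, inside [heap_lo, heap_hi), and disjoint from every live block? */
bool block_looks_valid(const void *p, size_t size,
                       const void *heap_lo, const void *heap_hi,
                       const live_block *live, size_t nlive)
{
    uintptr_t lo = (uintptr_t)p, hi = lo + size;
    if (lo % ALIGNMENT != 0) return false;
    if (lo < (uintptr_t)heap_lo || hi > (uintptr_t)heap_hi) return false;
    for (size_t i = 0; i < nlive; i++) {
        uintptr_t a = (uintptr_t)live[i].start;
        uintptr_t b = a + live[i].size;
        if (lo < b && a < hi) return false;  /* half-open ranges intersect */
    }
    return true;
}

/* Stamp every payload byte with the block's id... */
void fill_payload(void *p, size_t size, unsigned char id)
{
    memset(p, id, size);
}

/* ...and confirm the stamp survived when the block is realloced or freed. */
bool payload_intact(const void *p, size_t size, unsigned char id)
{
    const unsigned char *bytes = p;
    for (size_t i = 0; i < size; i++)
        if (bytes[i] != id) return false;
    return true;
}
```

In a driver, each block that passes would be appended to the live array after malloc and removed on free.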

Grading

Background information on how we grade assignments. This assignment focuses on efficiency and the majority of the points will be awarded on the basis of performance trials.

Correctness (48 points)

Basic cases (26 points) Correct servicing of sequences of mixed requests. We will test on the provided simple program, published sample scripts, and additional tests/scripts. It is essential your program work reliably on correct allocator usage to earn these points.
Robustness (20 points) Handling of required unusual/edge conditions.
Clean compile (2 points) We expect your code to compile cleanly without warnings.

Performance (80 points)

We will measure the utilization and throughput of your allocator on a set of mixed scripts that are similar, but not identical, to the published samples. Those two measurements will be converted according to this scale:

U = utilization - 25. The full-credit benchmark U = 40 is reached at utilization >= 65%, with up to 10 bonus points added for exceeding it.
T = throughput / 2. The full-credit benchmark T = 40 is reached at throughput >= 80%, with up to 10 bonus points added for exceeding it.

The values of U and T will each be a number in the range [0, 50]. Performance is scored using the formula points = 1.5*min(U,T) + 0.5*max(U,T). The performance points are not awarded on the straight sum U + T, but instead on a blend that rewards a balanced optimization. The idea is not to go to extremes optimizing one at the expense of the other, but to strike a balance between the two. 80 points is the full-credit benchmark, and you can earn up to 20 additional bonus points. Submissions with bugs that interfere with performance testing will earn points at a reduced rate, based on the performance formula applied to the limited scripts which operate correctly and a penalty adjustment for the failures.
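To see how the scale and the blend interact, the formulas can be written out directly. This sketch is an illustration, not the course's grading code. It clamps U and T to [0, 50] as stated, ignores any rounding the graders may apply, and does not model the reduced-rate penalty for failing scripts. It also includes the utilization ratio from the implementation details, so the 78% example there can be checked.

```c
#include <stddef.h>

/* Live payload bytes as a percentage of the heap segment size. */
double utilization_percent(size_t payload_bytes, size_t segment_bytes)
{
    return 100.0 * (double)payload_bytes / (double)segment_bytes;
}

static double clamp(double x, double lo, double hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Performance points from utilization and throughput percentages. */
double performance_points(double utilization, double throughput)
{
    double U = clamp(utilization - 25.0, 0.0, 50.0);
    double T = clamp(throughput / 2.0, 0.0, 50.0);
    double smaller = U < T ? U : T;
    double larger = U < T ? T : U;
    return 1.5 * smaller + 0.5 * larger;
}
```

For example, 65% utilization with 80% throughput yields exactly the full 80 points. In contrast, 75% utilization with only 60% throughput gives U = 50 and T = 30, for 1.5*30 + 0.5*50 = 70 points: the extra utilization does not make up for the lagging throughput.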

Utilization is generally stable on the same inputs, but throughput varies with CPU/cache contention, so it is normal to see small ups and downs from run to run and even larger discrepancies when the system is under heavy load (use top or uptime to view activity/load). During grading, we will evaluate performance using an isolated quiescent host to reduce artifacts. All myth machines have identical software, but some have skimpier hardware (e.g. 4MB L2 cache versus 6MB, lower clock speed) which can tweak your results. You can view a host's specs in the file /proc/cpuinfo if you're curious. We will measure performance for all submissions on the same myth under the same conditions to ensure consistency in grading.

Your allocator's performance on the published samples is predictive of the measurement we will make for grading, but not an exact match. The goal is not to tune your allocator to perform at its best on exactly and only the samples, but instead to develop a design that fares well in a wide variety of scenarios, for which the samples are representative possibilities. The scripts for performance grading will be selected from the samples and include a few new scripts of comparable scope/scale. If your allocator has consistent performance across most/all sample scripts, then taking a few out and replacing them with others should result in only a minor change in overall performance. If you have wider variability script-to-script, your allocator performance may experience a more significant swing when we mix things up in grading. You can estimate your worst-case outcome by calculating how your allocator would fare on a mix constructed from the published samples with your top few best performers removed and replaced by copies of your weakest performers.

Code quality (buckets weighted to contribute ~20 points)

We expect your allocator code to be clean and readable. We will look for descriptive names, helpful comments, and consistent layout. We expect your code to show thoughtful design and appropriate decomposition. Control flow should be clear and direct. We expect common code to be factored out and unified. Any tricky, complex tasks or expressions that are repeated deserve to be decomposed into shared helpers.
The code review will also assess the thoroughness and quality of your validate_heap routine.

Design documentation (32 points)

A number of points are allocated for the readme file where you will document your allocator design, along with your process and results. You are to address the issues below with clear, complete, and concise writing. The intended audience is a programmer-literate peer.

Overview. (8 points) Summarize the key implementation features (e.g. data structures and algorithms used).
Rationale. (8 points) Provide a brief justification of why you made those design choices (esp. in light of alternatives tried/considered).
Optimization. (8 points) Describe techniques/strategies you used to analyze performance and make improvements. Where appropriate, provide supporting data (timing results, callgrind counts and cache statistics, disassembly comparison, etc.) on your efforts.
Evaluation. (8 points) Give a realistic evaluation of the overall strengths/weaknesses of your final allocator.
References. You must include proper citation for any resources (people, books, web, etc.) that contributed to the design of your allocator.

We will evaluate your design document on its thoughtfulness and completeness in describing, analyzing, and evaluating your design and process. Readme credit is not tied to successful performance results. If your results are not what you'd hoped for, use the readme to tell us about it, e.g. the observed problems, what their root cause seems to be, what you tried that didn't pan out, what you learned in the process, and so on.

Getting started

Check out a copy of the starter project from your cs107 repository using the command

hg clone /afs/ir/class/cs107/repos/assign7/$USER assign7

The starter repo contains these files:

Makefile The Makefile builds two targets: simple and alloctest. You may edit the definition of ALLOCATOR_EXTRA_CFLAGS in the Makefile to configure the best build settings for your allocator, but should make no other changes to the Makefile.
allocator.c/.h The allocator.c module is where you will write your allocator implementation.
segment.c/.h The segment.c module provides the low-level memory allocator. You will use these functions in writing your allocator. Read the header file to see what is available. Do not edit these files.
fcyc.c/.h The fcyc.c module is the cycle-counting timer from Chapter 5 of the Bryant and O'Hallaron textbook. It is used by alloctest to count cycles used by your allocator. Do not edit these files.
simple.c The simple.c module is a sample program that manipulates linked lists and strings using dynamic allocation. You can use/change/cannibalize this program for testing.
alloctest.c The alloctest.c module contains the script-driven test harness to evaluate your allocator. Do not edit this file.
script files The project contains a few scripts. The tiny scripts are suitable for very simple testing and serve as examples of the script file format. There is a collection of large scripts in the cs107/samples/assign7 directory for further testing and performance measurements. You can also write your own script files.
readme.txt This is a text file for you to write up your design decisions and rationale. This text file is part of your Mercurial repository and changes to it should be committed. The file will be submitted along with your repo and evaluated in grading.

Special attention to the Honor Code

Before you begin coding, you'll want to consider a range of design options. The textbook contains good foundation material which you are encouraged to review as a starting point. You may also find it helpful to talk over implementation options and trade-offs with classmates. Your readme must properly cite any external resources or discussions that influenced your design. When it comes to code-level specifics, your allocator must be your independent work. Under no circumstances should you study code from outside sources or incorporate such code. If your investigations accidentally stumble across allocator code, we expect you to immediately about-face from it and get back to designing and writing your own allocator. We have a zero-tolerance policy for submissions containing code that has been adopted/borrowed from others. Lacking citation, this act constitutes plagiarism and will be handled as a violation of the Honor Code.

Finishing

When finished, don't forget to submit your files for grading. Only one partner makes the submission, not both. No late submissions will be accepted after the hard deadline (Fri Mar 13 11:59 pm). Late days count individually per partner.

It's definitely time to celebrate: congratulations on an awesome quarter!