Assignment 4: A bit of fun

Due date: Mon Feb 09 11:59 pm - Hard deadline: Fri Feb 13 11:59 pm

Assignment written by Julie Zelenski
Thanks to John Regehr (Utah) for saturating arithmetic and Michael Chang for UTF-8

[Quick Links: Implementation details, Advice page, Grading]

Learning goals

During this assignment, you will gain practice in

  1. using the C bitwise operators to manipulate binary data
  2. dissecting the bitwise data representation of various types (ints, floats, instructions)
  3. working within/around the limits of integer and floating point representation

The assignment

Your next assignment consists of a programming "problem set". The material we've been covering recently spans a diverse set of topics that don't easily lend themselves to being aggregated into one combined program, so instead this is a set of small exercises that individually explore topics such as bitwise operators, bit vectors, integer arithmetic, floating point representation, and instruction encoding.

For each exercise below, you are asked to write a function or two to accomplish a specific task. Each of the functions requires just a handful of lines (the disassembler is a bit longer), but don't be fooled! The code may be short, but it can be dense and will require thorough testing to ferret out any bugs that might be lurking in the edge cases.

1. 2 bits, 4 bits, 8 bits, a short...

Your first task is to use bitwise operations to dissect two integers into their component bits to compare the count of "on" bits between the two values. The cmp_bits function is given two integer values a and b. It returns a negative result if the bitwise representation of a has fewer 1s than b, a positive result if a has more 1s than b, and zero if both have the same number of 1s. It should return a correct result for any possible pair of signed integer inputs.

int cmp_bits(int a, int b)

2. Sudoku sets

Treating an unsigned type as a bit vector can compactly represent and efficiently manipulate a set over a given range. Bit vectors are used in time and space-constrained situations to provide very dense storage and allow fast manipulation of the entire group of bits in one unit. In a bit vector, each individual bit is designated to represent a particular value and a bit is turned on to indicate its corresponding value is contained in the set. In this context of this problem, we are representing sets drawn from the range [1-9]. A set is represented using an unsigned short (16 bits). The bit at the nth position (counting from least significant bit at the 0th position) is designated to represent the value n. The bits in the positions 1 to 9 represent the set membership and the other seven bits of the short are ignored. Below is a few example sets and their corresponding bit vectors. The bit positions 1 to 9 are shown in blue to distinguish them from the unused bits around them.

0000000000000000           // {}
0000001010100100           // {2 5 7 9}
0000000000001110           // {1 2 3}

Write the make_set function to create a bit vector set from an array. The arguments to the function are an array of integer values and the array count. The function returns an unsigned short that represents a bit vector set of the values from the array. The bits in positions 1-9 of the returned result mark the set membership, the remaining bits are zeros. You may assume that no value in the array will be outside the range 1-9.

unsigned short make_set(int values[], int nvalues)

Now write the is_single function to manipulate these bit vector sets in efficiently computing a result needed when solving a Sudoku puzzle. The goal of Sudoku is to assign every cell in the grid a digit from 1 to 9 subject to the constraint that each digit can be used only once in each row, column, and block. (You can read about Sudoku in wikipedia if you're curious, but for the purposes of this problem, you don't need to know any more details). The is_single function is given three well-formed bit vector sets representing the digits already used in the row, column, and block for a cell. ("well-formed" means that the short will have bits on only in positions 1-9, remaining bits are zeros). The possibilities for a cell consist of only those digits that are unused; any digit that is already used in the row, column, or block is not a possibility. The function is_single returns true if the already used digits force only one option for this cell (number of possibilities is exactly 1) and false otherwise (i.e. number of possibilities is 0 or 2 or more).

bool is_single(unsigned short used_in_row, unsigned short used_in_col, unsigned short used_in_block)

The ideal implementation of is_single uses only straight-line code (i.e., no loops or conditionals) and bitwise operators (~ & ^ | << >>) with very limited use of the integer arithmetic/relational operators. If you can't quite achieve the ideal, a working solution that is unnecessarily awkward/inefficient can still earn most of the credit.

3. Unicode

A single byte character maps to one of 256 entries in the ASCII set, which covers only the basic roman alphabet, digits, and punctuation. The international-friendly Unicode standard is a broader representation of characters that allows glyphs from a variety of world languages. Each Unicode character is called a code point and there are about a million of them -- including wacky characters you never even knew existed! UTF-8 is commonly used to represent (or encode) a Unicode code point. UTF-8 is a variable-width encoding; some code points are represented in just one byte, others in two or three. One neat feature of UTF-8 encoding is that it is designed to overlay onto ASCII; the lower 128 ASCII characters use the same single byte representation, so those characters are completely backwards-compatible with systems that only know about ASCII. For this problem, you will write the to_utf8 function that takes a Unicode code point and construct its UTF-8 encoding.

void to_utf8(unsigned short code_point, unsigned char buf[])

A Unicode code point is a number, the UTF-8 encoding for a code point is a sequence of one, two, or three bytes that represent that code point. The function to_utf8 has two parameters; an unsigned short code_point (for our purposes we will only consider code points of 16 bits or fewer) and a character array buf. The function converts the given code point into its UTF-8 representation and writes it as a null-terminated sequence of bytes into buf. The buf given by the client is guaranteed to have enough space (up to 4 bytes, including the '') to hold the UTF-8 encoded result.

We define the bit-count of a value as the number of bits required to represent that value in binary. For example, the bit-count of 5 is 3, the bit-count of 256 is 9. The bit-count of the code point dictates whether its UTF-8 encoding uses one, two, or three bytes according to these rules:

The table below demonstrates a few examples. In Unicode, the notation U+NNNN identifies the code point whose value is hex 0xNNNN. In the UTF-8 encoded result, the dark gray bits are the high-order bits that introduce the leading and continuation bytes, and the red, blue, and green bit sequences are taken from the binary representation of the code point and divided among the output bytes according to the rules above.

Unicode UTF-8
Character Hex Binary (bit-count in parens) Hex Binary
A U+0041 00000000 01000001 (7) 41 01000001
Þ U+00DE 00000000 11011110 (8) c3 9e 11000011 10011110
Ւ U+0552 00000101 01010010 (11) d5 92 11010101 10010010
U+221C 00100010 00011100 (14) e2 88 9c 11100010 10001000 10011100

4. Saturating arithmetic

Saturating arithmetic operations are those where results that overflow or underflow "stick" at the maximum or minimum representable result instead of wrapping around like the usual modular arithmetic operations. If you are using saturating addition and subtraction on signed integers, (INT_MAX + 1) == INT_MAX and (INT_MIN - 1) == INT_MIN. Saturating arithmetic is a common feature in 3D graphics and digital signal processing applications. A saturating addition function takes values a and b and returns their sum a+b if it doesn't overflow/underflow the representable range or otherwise returns the appropriate "sticky" value. You are to write two variants of saturating addition: one that operates on unsigned values, another on signed.

stype sat_add_signed(stype a, stype b)
utype sat_add_unsigned(utype a, utype b)

Note that the functions are written in terms of stype/utype which are placeholders that can be filled by any integer family type. The bits.h header file has a #define that controls which type from the integer family (char, short, int, long, or long long) is chosen for stype/utype. Your implementation of the saturating operators must work independently of the bitwidth, using a single unified sequence of operations that works correctly regardless of which type is chosen. This portability requirement will lead you to think through how promotion/truncation operates between the integer-family types.

You are required to execute the exact same code for all bitwidths, which means you should not use any sort of if/switch that divides into cases for the different bitwidths. This doesn't mean that you can't use conditional logic (such as to separately handle overflow or non-overflow cases), but conditionals that dispatch based on the type/sizeof of operands are totally off-limits.

5. The wrath of Kahan

This problem examines computing the average of a float array. Using a straightforward loop to sum the values and divide sum by count seems like it should do the job, but the subtleties of floating point makes even simple code challenging to analyze. A naive average seems transparently "correct" but in fact, it will not compute a correct answer for all inputs. The possibility of overflow (magnitude too large for range), underflow (magnitude too small for range), loss of precision due to rounding error in addition, and loss of significant digits in subtraction all come with the territory in digital floating point and these quirks can individually and collectively interfere to make even ordinary calculations go astray.

Ideally, an average function would compute the correct result for all inputs. In pure mathematics, a sequence of real numbers has one true average. On a computer, the best we can hope for is the nearest representable float to that true average, so that's actually where we set the standard. If a given computation produces that nearest representable float, we will call that a "correct" average. The true average of {1, 1, 0, 0, 0} is .4, but as noted in lab, .4 is a repeating binary decimal. Its nearest representable float is 0.400000006, so this is considered the "correct" result. Note that the inputs being averaged are also subject to representation limits. The true average of {16777217, -1} would be 8388608. But the value 16777217 will be rounded in assignment (in lab, we learned its nearest representable float is 16777216), and this rounding happened before average even starts executing. Given the sequence being averaged is actually {16777216, -1}, the "correct" result is 8388607.5. These limits of finite representation are unavoidable and there is nothing algorithmic that can be done to work around then. However, what we can attack are approaches that miss the goal entirely-- the result computed is not the nearest representable float, sometimes not at all close! These flaws come from vulnerabilities in its use of floating point calculations, which can be reorganized for better outcomes.

An average function operates on a non-empty array of floats. The given code has four variants of average: one straightforward "naive" version and three "improved" variants A, B, and C. Each improved version makes an intentional rearrangement of operations to avoid a specific floating point pitfall present in the naive version. The reformulations are equivalent according to the rules of pure math, but in the floating point world, these simple changes can and do affect the outcome for certain inputs. The improved versions succeed somewhat on their good intentions by fixing some outcomes that were previously incorrect, yet each still has inputs for which it is incorrect.

This problem is formulated like a lab exercise. Start by studying the given code and comparing the four versions to understand their differences. For each improved version, work through why its modifications were made -- what weakness is it trying to address in the naive version? How is the computation changed? For what inputs does this result in a changed outcome? Sketch on paper the effect on various inputs and then write some test code to verify your hypotheses. Experiment and make observations until you can fully identify what improvement is offered by each version as well as what vulnerabilities still remain. From the results of your experimentation, you are to provide targeted input sequences to demonstrate the strengths and weaknesses of the various versions. The four required input sequences are:

You are to implement the body of the fill_sequence function to fill its pass-by-reference parameter with a sequence. The which argument indicates the number of sequence being requested (1, 2, 3, or 4). A sequence must be non-empty and all values in the sequence must be non-exceptional floats. The final entry in the sequence should be followed by INFINITY, this is used as the sequence terminator. Each value in the sequence value should be exactly representable as float (i.e. do not use values which will be rounded on assignment).

void fill_sequence(Sequence *pseq, int which)

An important learning goal is arriving at an understanding of the floating point limitations. You should be able to reason about errors that arise in floating point calculations and recognize the inherent tradeoffs that come in trying to work around them. Your code for fill_sequence must be accompanied a thorough comment that documents the reasoning for each of your sequences. A valid justification is not simply "I found these values and they seem to work"; instead you are to explain how you constructed the sequence, why it produces the desired effect, and demonstrate your understanding of the salient floating point facts. The grading for this question will value the thoughtfulness and thoroughness of your documentation more than perfection in results.

6. A beautiful day in the neighborhood

The epsilon of a given float is defined as the absolute value of the distance from the float to its nearest neighbor on the floating point number line. This measurement of the "gap" between neighboring floats is sometimes referred to as its epsilon. The epsilon function returns the value of epsilon for a given float. The code we give you for epsilon extracts the raw bits from the float and passes them off to a helper function epsilon_bitwise. Your job is to implement epsilon_bitwise to compute the value of epsilon solely in terms of bitwise manipulation on the underlying float bits.

float epsilon(float f)
unsigned int epsilon_bitwise(unsigned int floatbits)

A few issues to note:

7. Some disassembly required

Your final task is to pick apart a binary-encoded IA32 machine instruction and print its human-readable assembly equivalent. This is the job of a disassembler (the inverse of the assembler), such as the objdump tool or the gdb disassemble command. The instructions you must decode are the five variants of the pushl instruction shown in the table below.

Disassembled instruction Variant (what is pushed?) Binary-encoded machine instruction (num bytes in parens) In hex
pushl $0x3f10 immediate 4-byte constant 01101000 00010000 00111111 00000000 00000000 (5) 68 10 3f 00 00
pushl %ebp register 01010101 (1) 55
pushl (%edx) indirect 11111111 00110010 (2) ff 32
pushl 0x8(%eax) indirect with displacement 11111111 01110000 00001000 (3) ff 70 08
pushl 0xff(%ebp,%ecx,4) indirect with displacement and scaled-index 11111111 01110100 10001101 11111111 (4) ff 74 8d ff

IA32 uses a variable-length encoding for machine instructions; some instructions are encoded as a single byte, and others require more than a dozen bytes. The pushl instructions vary from 1 to 5 bytes in length. The first byte of a machine instruction contains the opcode which indicates the type of instruction. The opcode is followed by 0 or more bytes, the number depends on the variant of the instruction. These additional bytes encode the instruction operands, such as which register or constant value to push.

The table above lists various pushl assembly instructions each with its binary-encoded machine equivalent. To interpret the binary encoding:

void disassemble(const unsigned char *raw_instr)

The disassemble function takes a pointer to a sequence of raw bytes that represent a single machine instruction. How many bytes are used by the instruction is figured out during disassembling. The idea is to read the first byte(s) and use the opcode and variant to determine how many additional bytes need to be examined. Call our provided helper function print_hex_bytes to print the raw bytes, then use bitwise manipulation to extract and convert the operands and print the assembly language instruction. For constants (immediate or displacement), use the printf format %#x to show hex digits prefixed with 0x and no leading zeros. Note that the instruction ispushl-- that suffix is l as in long, not the number 1.

As an example, if the disassemble function were passed a pointer to the bytes for the final instruction in the table above, it would print this line of output:

ff 74 8d ff    pushl 0xff(%ebp,%ecx,4)

We will only test your function on well-formed instructions of the five pushl variants listed above. There is no requirement that you gracefully handle any other inputs, such as other variants of push or other instruction types. For slightly obscure reasons, there is a slight anomaly in the encoding used when %ebp or %esp is the base register in indirect/indirect-with-displacement or %esp as the index register for scaled-index. You do not need to allow for this nor make a special case for it, just disassemble as though the standard encoding is used for all registers without exceptions.



Read our page on how assignments are graded for background information on our grading process and policies.

Functionality (~90 points)

Code quality (buckets weighted to contribute ~20 points)

We will prefer solutions that are direct, efficient, and readable to those that sacrifice one of these qualities for the others.

Getting started

The assign4 project contains the source file, the skeleton test program, and a Makefile. Check out the starter project from your cs107 repo using the command

hg clone /afs/ir/class/cs107/repos/assign4/$USER assign4

Note that these exercises are similar in spirit to the partnered explorations you do in lab, but the collaboration rules for assignments are much more restrictive than for lab. It is allowable to discuss high-level concepts with your peers, but these discussions must not descend into assignment-level specifics. When working on an assignment, you should be doing your own independent thinking, design, coding, and debugging.


There is a trivial sanity check for this assignment that merely verifies that one simple call to each routine and checks that the output printed by your disassemble operations is in the proper format for automated testing. Be sure that all other print statements in the required functions are removed/disabled. Any extraneous output can interfere with the autotester and cause test failures. When finished, use submit to turn in your code for grading. You may use late days on this assignment, but we don't recommend it as you don't want to eat into your time to prepare for Friday's midterm!