Assignment 0: High-stakes testing
Due date: Mon Jan 12 11:59 pm - Hard deadline: Mon Jan 12 11:59 pm
Assignment by Julie Zelenski
Completing this assignment will give you valuable practice with
- working in the unix environment (navigating the filesystem, executing programs, editing files, using Mercurial and Valgrind)
- devising strategies for testing and creating test inputs
- writing a high-quality bug report
Before we put you to work coding your first C program, this small exercise will acclimate you to the unix environment and bring out your inner software testing ninja. The assigned task is to make observations of a program's behavior relative to its specification and determine which tasks it does correctly and which it bungles. The program you'll be testing will be your implementation task for the next assignment, so your testing efforts now will set you up nicely for testing your own version next week.
The compiled program you are given is a good-faith attempt at meeting the requirements described in the Assignment 1 Reassemble writeup. The program is mostly correct, but has been maliciously seeded with four known flaws, one in each of these categories:
- Usage. The usage refers to how the program is invoked from the command-line. A program is expected to correctly respond to well-formed usage and gracefully reject incorrect usage with a helpful error message. There is a usage bug that the program doesn't handle in accordance with the requirements given in the writeup.
- Robustness to input format. The writeup details the format for fragment input and lists the specific malformations that must be handled. The program has a slip-up in reading the input that erroneously rejects a well-formed input or fumbles the handling of a malformed input.
- Logic. The writeup dictates how the merging is supposed to proceed. Due to an error in the logic used by the program to align/merge a single pair, there are inputs that fail to reassemble as expected, i.e., the output is incorrectly merged or the program behaves badly when attempting to process it.
- Memory/Valgrind. A correct program should get a clean bill of health when run under Valgrind---no memory errors, no memory leaks. The program has a hidden memory issue. You are looking for a situation in which the program is invoked properly on a well-formed input, reassembles into the correct output, and exits cleanly, all with no visible sign of error, yet this same scenario run under Valgrind reports a memory error or leak.
Your job is to relentlessly torture the program into exposing its four problems and then write up a concise bug report on each.
A plan of attack
- Begin by reading our advice on effective software testing, specifically the section on black-box testing. Then review the requirements in the Assignment 1 writeup, draft a testing plan of what behaviors to examine, and sketch out small inputs that cover the various cases.
- There is a sample executable at /afs/ir/class/cs107/samples/assign0/reassemble_soln. This fully-correct solution demonstrates the expected output and models appropriate handling for error conditions. Comparing a program's behavior to that of the sample allows you to determine where it is out of spec. If ever in doubt about the expected behavior (e.g. should this input be rejected? what is the correctly merged output for these fragments? how long should the reassembly take?), use the sample solution as a guide.
- The bugs are designed so that you can reproduce them using an input of just a few fragments. The particular trigger for the bug may also exist in some/all of the larger sample files, but it can be difficult to identify the pattern when looking only at large files. You'll narrow in on your issue much more easily by building up small examples than by breaking down large ones.
- Our sanity check tool automates the output comparison of your program to the sample. The default sanity check runs a few trivial cases, but there is also a custom option that allows you to provide your own test cases instead. Learn how to use the custom sanity check now! It is necessary to complete this assignment and a useful tool to have in your arsenal for all subsequent assignments.
- It can be convenient to enter input via the terminal, aka stdin (standard in). This allows you to run a quick test without the overhead of creating and saving the input to a file. Just invoke reassemble with no arguments, type the fragments directly at the terminal, and indicate you're finished by typing control-d.
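The test-and-compare loop described above can be sketched in the shell. The names `./reassemble`, `slink/reassemble_soln`, and `test.frags` are the assignment's; to keep this sketch runnable anywhere, the two programs' outputs are simulated with `echo`.

```shell
# Hedged sketch of the test-and-compare workflow. With the real binaries
# you would run something like:
#   printf '<fragments>' | ./reassemble              # quick stdin test
#   ./reassemble test.frags > mine.out               # file-based test
#   slink/reassemble_soln test.frags > soln.out      # expected output
# Here echo stands in for both programs so the sketch runs anywhere.
echo "all is well that ends well" > mine.out
echo "all is well that ends well" > soln.out
if diff -q mine.out soln.out >/dev/null; then
    echo "outputs match"
else
    echo "MISMATCH: investigate further"
fi
```

Any mismatch `diff` reports is a starting point for investigation, per the guidance above.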
Usage and robustness
- To clarify how to distinguish usage from robustness: usage concerns only and exactly how the program is invoked, i.e. properly validating that the user's command-line has appropriate arguments (in terms of number, sequence, values, and so on) that allow the program to get off and running. Once program execution is underway, robustness is the resilience of the program in detecting and handling unusual conditions during execution. For this assignment, the robustness concern is specific to reading the fragment input and validating that it is in the proper format.
- For usage, you'll need to experiment with various ways of invoking the program until the trouble comes to light.
- For robustness, you will need to try an assortment of correct and incorrect inputs to find the case which is mishandled.
- Note the writeup does not require error detection for a file containing too many fragments (> 20000), so this is not the robustness issue you are looking for.
- A crash or garbled output is unmistakable evidence of a problem, but some flaws are more subtle. A program that erroneously rejects a well-formed input or only offers the user crummy error messages needs fixing, too!
- For fatal conditions, a program is expected to do three things: (1) detect the issue, (2) give the user a clear message about the problem, and (3) gracefully exit. A buggy program could fail to detect the problem at all, another might detect it but misidentify the problem, and another might provide a great message but forget to exit and stumble on to a crash. Botching any of the three is a problem.
- The specification does not require the wording of error messages to exactly match the sample, but any substitution should be at least as helpful and appropriate. If a program mismatches the sample, it might be harmless variation, but if the feedback or handling of errors seems noticeably weaker than the solution's, that's definitely a sign you're onto something.
- The logic bug affects the process of aligning/merging fragments. The program is properly invoked (no usage problem) and the input is well-formed and correctly read (no robustness problem), yet the reassembled output doesn't match the expected result, or the program exhibits runtime misbehavior (crash, hang, premature exit, etc.) during reassembly. This indicates a problem in the program's logic.
- The complexity of the code in the align/merge step is rife with opportunities for the logic to be a little off. You'll likely find this bug to be the most challenging one to work through.
- The planted error is in aligning/merging a single pair, and the pattern is easiest to discern from small inputs. Brainstorm the variety of ways two fragments align and flesh out your test repertoire with inputs of just two fragments. A somewhat more roundabout strategy is to start with a complex file that exhibits problems (such as a large sample file) and repeatedly cut it down and re-test until you have narrowed in on the pair of fragments at the heart of the problem.
- It may help to theorize about how the code might have been written and visualize the effect of common mistakes (such as off-by-one loops) to expose the pattern.
- Be sure to first read the CS107 guide to Valgrind. Valgrind will be your go-to tool this quarter for help resolving known memory bugs, but plays another important role for testing: to uncover memory problems you don't yet know you have. The flaw that we seeded is one of those "lucky" asymptomatic ones which doesn't draw attention when executing, but lurks nonetheless. With the help of Valgrind, you will be able to ferret it out.
- The Valgrind problem affects a proper program invocation (no usage issue) on a well-formed input that is correctly read (no robustness issue) and that reassembles into the correct result (no logic issue). Despite the program seeming to have run perfectly, the Valgrind report will show memory errors or leaks. This is the Valgrind problem you seek. It might affect all inputs or be selectively triggered.
- Note that runs that encounter a fatal error are allowed to exit without cleaning up memory, and other runtime errors may themselves produce memory-related symptoms, so don't look at error-causing or buggy inputs for your Valgrind issue. It will be on a completely correct run with no visible execution error.
- A program can take several times longer to run under Valgrind than without it; that slowness is expected and not a sign of a problem.
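A typical run for this hunt might look like the sketch below. The program and input names (`./reassemble`, `good.frags`) are assumptions, and checking for valgrind first keeps the sketch runnable on machines where it isn't installed.

```shell
# Hedged sketch: run the program under Valgrind on a proper invocation
# with a well-formed input (the name good.frags is hypothetical).
# On a course machine you would run something like:
#   valgrind ./reassemble good.frags
# then read the ERROR SUMMARY and LEAK SUMMARY lines at the end of the
# report. A clean program reports 0 errors and no leaked blocks.
if command -v valgrind >/dev/null 2>&1; then
    vg_status="available"
else
    vg_status="missing"
fi
echo "valgrind is $vg_status on this machine"
```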
As a black-box tester working only from a Valgrind report, it's difficult to have much insight into a memory problem. You are not expected to speculate about the root cause or what might fix it. Instead, we ask that your memory bug report provide these two observations:
- Does the problem occur on all valid inputs or just certain varieties?
- Does the size or number of leaks/errors reported seem to meaningfully relate to the size of the input file or does it appear constant or even random?
- We've randomized the bugs seeded in the programs, so each of you has your own custom puzzle to untangle.
- The flaws we've created are representative of ordinary coding errors/oversights and not designed to be intentionally obscure. For example, a logic error could affect every match in which one fragment contains the other, but we would not insert a bug that triggers on one weird isolated case like merging a string of length 17 with another that contains no 'z' characters. The underlying pattern will be straightforward to describe.
- A bug report should narrow the bug to its most specific form and generalize the pattern. Rather than stop at concluding "here are a few inputs that fail", you are to more definitively identify the nature of the problem. Experiment on related cases to identify patterns and isolate the critical feature that distinguishes the non-working cases from the working. Is the bug due to something unusual in the file's format? a peculiarity in one specific fragment? a sensitivity to the order of the fragments within the file? hitting some unexpected size limit on the input or output? or something else entirely? You only need to provide one specific example input that triggers the bug, but your bug report should broadly describe how to create infinitely many more such inputs by describing the feature(s) they have in common.
- We planted only one bug per category, so if you are seeing more than one strange behavior, look for a larger pattern that encompasses both.
- As a rule, any mismatch in the behavior/output of your program compared to the sample solution is a starting point for further investigation. Although the discrepancy might be an equivalent re-wording of a message or harmless difference in tie-breaking, a deviation that is a turn for the worse (e.g. crashing instead of a clean exit, an unhelpful/inappropriate error message) is definitely considered a problem.
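One way to test a size-related hypothesis like those above is to generate a family of inputs that vary only in the suspect feature. This sketch uses hypothetical file names and plain filler text, not the assignment's fragment format.

```shell
# Hedged sketch: generate probe inputs that differ only in size, to test
# a hypothesis such as "fails when the input exceeds N characters".
# File names and contents are illustrative, not the real fragment format.
for n in 10 100 1000; do
    head -c "$n" /dev/zero | tr '\0' 'a' > "probe_$n.txt"
done
wc -c probe_10.txt probe_100.txt probe_1000.txt
```

If only the larger probes misbehave, you have isolated a size threshold worth reporting as the bug's pattern.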
Readme and custom test files
The project directory contains two text files, readme.txt and custom_tests, in which you are to document the results of your testing efforts. Edit readme.txt to include a concise bug report for each category. A bug report should identify the underlying pattern of the bug and document how to reproduce it with a sequence of steps and/or a minimal input. The pattern is best stated as a sentence that lists the specific condition(s) that trigger the bug and the undesirable outcome, e.g. "On any input file whose reassembled result is longer than 1000, the program goes into an infinite loop" or "Valgrind reports an invalid read error. It happens on all inputs and the count of errors is roughly equal to the number of fragments." Assume the audience is the original software developer to whom you are reporting the key facts that allow the developer to reproduce the problem. Document the specific, minimal trigger that reproduces the bad behavior without additional long-winded commentary. Do not reiterate information of which the developer is already well-aware (e.g. no need to repeat the program specification, argue that a crash is a bad thing, and so on).
Each bug report can (and should) be documented in 1-3 sentences. The shell command
wc -w readme.txt will count the total number of words in your readme file. Four solid and concise reports can total under 250 words altogether -- aim for your readme to be that tight and earn the gratitude and approbation of your grader :-)
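The word-count check above can be tried on a stand-in file; the report text here is purely illustrative.

```shell
# Runnable demo of the wc -w check on a stand-in readme. The one-line
# "bug report" below is illustrative, not a real finding.
printf 'Usage: with zero arguments the program crashes instead of printing a usage message.\n' > readme_demo.txt
wc -w readme_demo.txt   # reports 13 words for this sample line
```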
The custom_tests file is for use with sanity check and should be edited to list your test cases (where possible). When you have created your own fragment files to reproduce your bugs, be sure to add them to your repository (using hg add filename) so they will be included with your submission. You can refer to these files in your readme.txt as the reproducible test cases. Note that custom sanity check does not support testing under Valgrind, so your memory test need not be listed in custom_tests.
The text files are managed under Mercurial revision control. After making edits, make a commit to record a snapshot for your revision history. After your final commit, use the submit script to send your work to us for grading. Please note that commit and submit are two different steps and both are essential!
The bug report for each category is worth 6 points, for an assignment total of 24 points. For full credit, we require a concise and accurate bug report in the readme.txt file and where possible/appropriate, a reproducible minimized test case added to custom_tests and submitted with the repo. If you can't nail down a particular bug, you can earn partial credit for describing your efforts and what you learned thus far. For example, providing one example input to trigger the bug but without identifying the broader underlying pattern is worth about half credit.
The assign0 project contains a compiled reassemble program and the two text files readme.txt and custom_tests. The project is distributed as a Mercurial repository. Clone the starter project from your cs107 repository using the command
hg clone /afs/ir/class/cs107/repos/assign0/$USER assign0
The $USER shell variable automatically expands to your sunet id. If you find there is not a repo for your sunet, this means you were not registered for the course when the repos were distributed; please register asap and send email to cs107@cs asking us to create your repo.
/afs/ir/class/cs107/samples/assign0 contains a correct (non-buggy!) reassemble solution and some sample fragment files. You can access these via their full path (e.g.
/afs/ir/class/cs107/samples/assign0/reassemble_soln), but it is a bit unwieldy. Your repo has
slink, which is a symbolic link to this shared samples directory. You can use it to more easily refer to those files (e.g.
slink/reassemble_soln) and avoid retyping the full path. However, since it is a link it has some peculiar qualities: you do not have permission to edit within the samples directory, and if you cd into slink, the parent (..) directory will not refer to your repo, but to the true parent instead.
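The parent-directory quirk can be seen with any symbolic link. This sketch builds a throwaway link under /tmp (the names demo_repo and demo_samples are illustrative) rather than touching the real samples directory.

```shell
# Hedged demo of the symlink ".." quirk using throwaway directories.
mkdir -p /tmp/demo_repo /tmp/demo_samples
ln -sfn /tmp/demo_samples /tmp/demo_repo/slink
# The filesystem resolves ".." through the link's true location, so the
# parent of slink resolves to /tmp's canonical path, not /tmp/demo_repo:
readlink -f /tmp/demo_repo/slink/..
```

The shell's built-in `cd` hides this by tracking the logical path, but programs that resolve `..` on disk see the true parent.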
Once you have cloned, immediately make and commit a simple change to verify that your environment is properly configured:
- Edit the readme.txt file to add your name, save changes
- Commit that change
Now you're good to start bug hunting! If your commit is not successful, resolve the issue before going any further: Mercurial guide, forum/email, office hours, ...
Your final step is to submit your work for grading. Remember that submit is distinct from commit. You commit as you are working to take snapshots of your progress and submit when finished to send your work to the staff to be graded. We recommend you do a trial submit well in advance of the deadline to familiarize yourself with the process and allow time to work through any snags. You can replace that submission with a subsequent one if desired.
You may not use any late days on this assignment. The deadline is absolutely firm and no late submissions will be accepted. Don't miss this chance to snap up some quick points and start your quarter off right!
Frequently asked questions about assign0
When I try to run the reassemble program in my directory, it says "command not found". What can I do?
This classic question has plagued every newcomer to Unix since the beginning of the epoch. The solution is in our frequently asked questions about unix.
Am I responsible for identifying what the root flaw in the code is, or suggesting how the developer might fix it?
No. As a black-box outsider, you aren't privy to the necessary information to make this kind of judgment. When looking for the bug, it may be helpful to speculate about possible bugs and how they would surface (e.g. an off-by-one loop bound) but the bug report you submit needs only to identify the observed pattern and provide the reproducible test input.