Best practices for completing CS107 assignments

Written by Julie Zelenski

We keep your pipeline filled with regular programming assignments, and most of your learning will come from the efforts you expend in completing them. We want to share some of the best practices that will help you achieve successful results on your programs with minimal loss of sanity.

Getting your "C legs"

Syntactically, C is not a large jump from Java or C++, but it has vast philosophical differences. Intended for professional use, C is designed for high efficiency and unrestricted programmer control, with no emphasis on safety and little support for high-level abstractions.

Be vigilant. Having used languages with aggressive compiler warnings and errors or extensive runtime error detection, you may find C's laissez-faire attitude downright shocking. A C compiler won't necessarily complain about such things as uninitialized variables, narrowing conversions, or functions that fail to return a needed value. C has no runtime error support, which means no helpful messages when your code accesses an array out of bounds or dereferences an invalid pointer; such errors compile and execute with surprising results. You must become vigilant about scouring your code for problems that you may have previously depended on the language to detect for you.
Standard libraries. The C standard library is relatively small, offering functions for input/output, primitive string handling, dynamic memory, and not much else. Appendix B of K&R describes the entire standard library and only requires 18 pages! Review that appendix and/or our CS107 guide to the C library so you are aware of what's available. Use the man pages for details of a particular function you need to use. Note the standard library embraces the philosophy of C and has little protection against misuse. Checking for errors would slow down all calls, and many things are completely impossible to detect in C even if you wanted to (e.g. an out-of-bounds array reference). This means you can and will get strange errors if you make bad calls to the C library. When a backtrace shows a crash within code you didn't write (i.e. crashing inside the C library or in code we supplied), it is tempting to assume the problem is within the library code, but almost without exception, the real problem lies in a bad call made by you.
Memory and pointers. By far, the most difficult challenges in C come in mastering pointers and memory. We promise that your first-hand experience making mistakes, resolving difficult bugs, and late-night head-banging will eventually pay off. Take care to study the fundamentals: the relationship between arrays and pointers, how to unravel the various type declarations and uses of * and &, knowing what data is in the stack and what's in the heap, and understanding how memory is laid out and accessed. Keep in mind that the observable effects of a memory error can come at a place and time far removed from the root cause (e.g. running off the end of an array may "work fine" until you later read the contents of a supposedly unrelated variable). gdb and Valgrind can be invaluable weapons against these kinds of tough bugs.
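
As a small illustration of the vigilance described above, here is a minimal sketch of two defensive habits: checking an index yourself before touching an array, and bounding a string copy so it cannot run off the end of the destination. The helper names get_element and copy_name are hypothetical, invented for this example; they are not part of the C library or of any CS107 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper: reads arr[index] only if index is in bounds.
// C performs no such check on its own, so the caller must supply the length.
bool get_element(const int *arr, size_t len, size_t index, int *out)
{
    if (arr == NULL || out == NULL || index >= len) {
        return false;   // refuse rather than read memory we don't own
    }
    *out = arr[index];
    return true;
}

// Hypothetical helper: copies src into dst without overrunning dst.
// Returns true if the whole string fit, false if it had to be truncated.
bool copy_name(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);
    // snprintf writes at most dstsize bytes, including the terminating '\0',
    // and returns the length the full string would have needed.
    int needed = snprintf(dst, dstsize, "%s", src);
    return needed >= 0 && (size_t)needed < dstsize;
}
```

A caller would check the return value of each helper instead of assuming success, for example treating a false result from copy_name as a signal that the buffer was too small.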

Decomposition and incremental development

Our CS106 courses emphasize the value of good decomposition, yet the take-home for some students has been that decomposition is something you do when finished coding to appease your grader. Attempting to get the program working using a chaotic, sprawling main function and then spending even more time to decompose it afterward is utter foolishness. Decompose problems, not programs. Decomposition is not the frosting to spread on the finished cake; it's the tool that helps you get the job done. Starting with a good decomposition, you'll have an easier path through coding, testing, and debugging, and you'll spend less time at it.

Your initial work should be in design space: decomposing the problem from the top down into sub-problems that you further decompose as needed. Sketch each function's role and have a rough idea of its inputs and outputs. A function should be designed to complete one well-defined task. If you can't describe the function's role in a sentence or two, then maybe your function is doing too much and should be decomposed further. Commenting the function before you write the code may help you clarify your design (what the function does, what inputs it takes, what outputs it produces, and how it will be used). Pushing yourself to be specific now will force you to state your assumptions and resolve ambiguities earlier rather than later. Exploit opportunities for code unification: a sufficiently general function can handle multiple use cases within the program.
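
To make the ideas of commenting first and unifying code concrete, here is a minimal sketch. The function count_matching and its two wrappers are hypothetical names chosen for illustration; the only library pieces used are the standard <ctype.h> classifiers. The header comment was written before the body, and one general function serves two use cases.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Function: count_matching (hypothetical example)
 * -----------------------------------------------
 * Role:    counts the characters in s for which the predicate is true.
 * Inputs:  s, a null-terminated string; matches, a predicate with the
 *          same shape as the <ctype.h> classifiers (int in, nonzero = yes).
 * Output:  the number of matching characters.
 * Usage:   count_matching("CS107", isdigit)
 */
size_t count_matching(const char *s, int (*matches)(int))
{
    assert(s != NULL && matches != NULL);
    size_t count = 0;
    for (const char *p = s; *p != '\0'; p++) {
        // The cast keeps negative char values out of the classifier,
        // which would otherwise be undefined behavior.
        if (matches((unsigned char)*p)) {
            count++;
        }
    }
    return count;
}

// Two use cases handled by the one general function above.
size_t count_digits(const char *s) { return count_matching(s, isdigit); }
size_t count_spaces(const char *s) { return count_matching(s, isspace); }
```

The design choice here is that a new counting need (letters, punctuation) costs one line rather than another copied loop.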

When ready to implement, write one function at a time and thoroughly test before moving on. To test, you might need to write code to specifically exercise the function (this may be dead-end code that is later discarded), create sample input files, and/or run under gdb and Valgrind to look for problems. Thorough testing gives you peace of mind that further functions can build on these pieces with confidence, rather than adding another floor on what amounts to a house of cards.
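
One possible shape for the "dead-end" test code mentioned above is sketched below. Everything here is hypothetical: reverse_in_place stands in for whatever function you just wrote, and check_reverse and test_reverse_in_place are throwaway helpers that exercise it on a few hand-picked cases before anything else is built on top of it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Function under test (hypothetical): reverses the string s in place.
void reverse_in_place(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    for (size_t i = 0; i < len / 2; i++) {
        char tmp = s[i];
        s[i] = s[len - 1 - i];
        s[len - 1 - i] = tmp;
    }
}

// Throwaway helper: runs one case and reports a mismatch before asserting.
void check_reverse(const char *input, const char *expected)
{
    char buf[64];
    assert(strlen(input) < sizeof(buf));   // keep the copy in bounds
    strcpy(buf, input);
    reverse_in_place(buf);
    if (strcmp(buf, expected) != 0) {
        printf("FAIL: \"%s\" became \"%s\", expected \"%s\"\n",
               input, buf, expected);
    }
    assert(strcmp(buf, expected) == 0);
}

// Throwaway driver: covers the empty, single, even, and odd length cases.
void test_reverse_in_place(void)
{
    check_reverse("", "");
    check_reverse("a", "a");
    check_reverse("ab", "ba");
    check_reverse("abc", "cba");
}
```

Calling test_reverse_in_place() from a temporary main, ideally under gdb or Valgrind, gives a quick pass/fail signal; the driver can be deleted once the function has earned your trust.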

A corollary to this is my suggestion: always have a working program. Add features to your program one at a time, testing until complete, while verifying no regressions have been introduced. At a given point, your program may not cover all requirements, but the existing code is correct and can be verified as functional. That is vastly preferable to a program that attempts everything yet succeeds at nothing. It is much easier to extend a working but incomplete program than to fix a bug-riddled "complete" one.

Testing

Testing is an incredibly important skill for all programmers and we intend for you to become proficient at it. Testing is not something separate from programming; it is an integral part of the development process. We provide some sample inputs and the simple sanity check to get your testing started, but these are intentionally insufficient. Your efforts are needed to brainstorm and identify additional cases that need to be tested, devise test inputs, and monitor your progress toward satisfying all test cases. Go check out our advice on software testing for a plenitude of ideas about testing tactics and strategies. We're especially keen on short-cycle test-driven development!

Debugging

Many students have up to now done all their debugging via print statements. That works for simple cases, but becomes unwieldy as programs get larger and have more complex interactions. Now is the time to invest in mastering the powerful tools provided by a debugger. During development, you may want to always run your program under gdb (CS107 guide to gdb), so that when the unexpected hits, you have the ability to poke around and get information to better understand the program state.

Successful debugging depends on a careful and systematic approach.

Observe the bug. If you never see the bug, you'll likely never fix it. Another reason you want comprehensive testing!
Create a reproducible input. Creating a trivial input that reliably induces the failure is a huge help.
Narrow the search space. Studying the entire program or tracing the execution line-by-line is generally not feasible. Some suggestions for how to narrow down your focus:
- Start where your intuition believes is the likely culprit, such as a function that recently changed or one you find suspicious.
- Use binary search to dissect. Set a breakpoint at the midpoint and poke around to determine whether the program state is already corrupt (which indicates the problem is in the front half) or looks good (so you need to focus your attention on the back half). Repeat to further narrow down.
- Run under Valgrind to identify the root cause of any lurking memory errors.
- Use gdb conditional breakpoints or watchpoints to identify the point where data is first noticed to be corrupt.
Analyze. With only a small amount of code under scrutiny, execution tracing becomes feasible. Use gdb to see what the facts (values of variables and flow of control) are telling you. Drawing pictures may help.
Devise and run experiments. Make inferences about the root cause and run experiments to validate your hypothesis. Iterate until you identify the root cause.
Modify the code to squash the bug. The fix should be validated by your experiments and by passing the original failed test case. You should be able to explain the series of facts, tests, and deductions that connect the observed symptom to the root cause and the corrected code.
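
One way to support the narrowing step above is to write a small checker for the property you believe should hold and assert it at chosen points; the first assertion that fails brackets the region containing the bug. The sketch below is hypothetical: is_sorted is the invariant checker and insert_sorted stands in for a step of your own program. A checker like this can also be invoked from inside gdb with the call or print commands while stopped at a breakpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical invariant: the array must be in non-decreasing order.
bool is_sorted(const int *arr, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (arr[i - 1] > arr[i]) {
            return false;
        }
    }
    return true;
}

// Hypothetical program step: inserts value into a sorted array.
// Assumes the array has room for at least one more element.
void insert_sorted(int *arr, size_t *n, int value)
{
    assert(is_sorted(arr, *n));   // is the state already corrupt on entry?
    size_t i = *n;
    while (i > 0 && arr[i - 1] > value) {
        arr[i] = arr[i - 1];      // shift larger elements one slot right
        i--;
    }
    arr[i] = value;
    (*n)++;
    assert(is_sorted(arr, *n));   // did this step corrupt it?
}
```

If the entry assertion fires, the corruption happened earlier and the search moves toward the front; if only the exit assertion fires, the culprit is this function.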

Do not change your code haphazardly. This is like a scientist who changes more than one variable at a time. It makes the observed behavior much more difficult to interpret, and tends to introduce new bugs. That said, if you find buggy code, even if it is not obviously related to the bug you are tracking, you still might want to make a detour to fix it, using a reproducible input to trigger that bug and validate its fix. That bug might be related to or obscuring the original bug, and it's good to remove any source of potential interference.

Use Valgrind early and often

We run submissions under Valgrind during grading to report on memory errors and leaks. Some students have the impression that Valgrind is merely a final "double-check" on a finished program. Nothing could be further from the truth. Doing regular Valgrind runs is an important part of your testing coverage. Valgrind reports on two types of memory issues: errors and leaks. Memory errors are toxic and should be found and fixed without delay. Valgrind can be a huge help with this. Memory leaks are of less concern and can be ignored early in development. Given that the wrong deallocation can wreak havoc, we recommend you write the initial code with all free() calls commented out. Much later, after having finished with the correct functionality and turning your attention to polishing, add in the free() calls one at a time, run under Valgrind, and iterate until you verify complete and proper deallocation. (CS107 guide to Valgrind)
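
As an illustration of what "complete and proper deallocation" might look like once the free() calls have been added back, here is a minimal sketch. The names make_copies and free_copies are hypothetical; the point is that every malloc has exactly one matching free, and that the pieces are released before the container that points to them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a heap array holding heap copies of n strings.
// Returns NULL if any allocation fails (nothing is leaked in that case).
char **make_copies(const char *words[], size_t n)
{
    char **copies = malloc(n * sizeof(char *));
    if (copies == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        copies[i] = malloc(strlen(words[i]) + 1);   // +1 for the '\0'
        if (copies[i] == NULL) {
            for (size_t j = 0; j < i; j++) {
                free(copies[j]);                    // undo the partial work
            }
            free(copies);
            return NULL;
        }
        strcpy(copies[i], words[i]);
    }
    return copies;
}

// Hypothetical helper: releases everything make_copies allocated.
void free_copies(char **copies, size_t n)
{
    if (copies == NULL) {
        return;
    }
    for (size_t i = 0; i < n; i++) {
        free(copies[i]);   // each string first...
    }
    free(copies);          // ...then the array that held the pointers
}
```

Running a program that uses these under Valgrind with the --leak-check=full option is one way to confirm that both levels of allocation are released and that no freed memory is touched afterward.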

Mercurial: the power of undo

Using the revision control system may at first seem more impediment than benefit. But the day you accidentally wipe out a critical file or make a last-minute change that introduces an evil bug, you will be eternally grateful for the "undo" capability provided by maintaining a revision history. Even without such catastrophes, revision control allows you to monitor your progress, review changes, try experiments that are easily backed out, and manage your efforts with more effective organization and less room for error. Adopt the habit of committing very regularly: after making a critical fix, after adding a new feature, when pausing for a snack break, and definitely at the end of each work session. Having this complete audit trail can also serve as an insurance policy should something go astray in your submission. We can grab a previous version from your history and confirm its provenance, but if there is nothing in your history to return to, you're stranded. Every serious project is managed under revision control or should be. (CS107 guide to Mercurial)

Write the high-quality version first (and only)

When faced with a challenging programming task, it can be tempting to first hack together a low-quality solution where you use little or no decomposition, use one-letter identifiers, hard-code magic numbers, and copy-and-paste code, all in a slapdash effort to get something working. After much iteration and debugging, you eventually get the functionality together, at which point you go back to clean up the decomposition, choose better names, unify common code, and so on to get the program up to the "A" level. This strategy has been tried quite a bit, and it doesn't work. It's easier and takes less time to write it at the "A" level right from the start. Well-decomposed, readable code is easier to write, easier to test, will have fewer bugs, and what bugs there are will be more isolated and easier to track down. Realistically, most of your development time is not going to be consumed by typing in long identifier names or writing two 10-line functions instead of a 20-line one. Write it once, and write it right. (Read Nick's awesome Landmarks in coding quality.)

Although it may go without saying, the same reasoning justifies why you should strive to write functionally correct code from the get-go. When faced with a question (do I need to use calloc instead of malloc? do I need a +1 or -1 on this calculation? should this void * be cast to a char * or a char **?) you could make a quick guess and figure you'll find out later if it wasn't right. "Throwing code at the wall to see if it sticks" is not an effective strategy! If you happened to guess right, consider yourself lucky, but did you learn how to approach that decision so you can make the right choice in the future (say on the next assignment or the exam)? Worse, if your guess was wrong, how and when will you discover it? Maybe the bug will just lurk there undiscovered (until the autotester sniffs it out), or perhaps extensive testing/debugging/Valgrind will eventually lead you back to the incorrect passage; either way leads to much suffering. The truth is that the simplest and best time to get the code functionally correct is when you are writing it for the first time. When the code you are writing involves something tricky, take the time to think it through: draw some pictures, review the underlying concepts, ask questions about anything unresolved, whatever it takes, so that when you are writing that code, you understand the how/why of each step, you can accurately predict the behavior, and you feel strongly confident that it is correct. (But still test it... anyone can make a mistake!)
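
The void * question above is a good example of one worth reasoning through rather than guessing. The sketch below uses the standard qsort, which hands the comparison function pointers to the array elements. The function names compare_strings, compare_ints, and sort_strings are hypothetical, chosen for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// The array elements are char *, and qsort passes POINTERS TO ELEMENTS,
// so each void * here is really a char ** and needs one dereference.
int compare_strings(const void *a, const void *b)
{
    const char *first = *(const char *const *)a;
    const char *second = *(const char *const *)b;
    return strcmp(first, second);
}

// The array elements are int, so each void * here is really an int *.
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   // avoids the overflow that x - y can cause
}

// Hypothetical wrapper: sorts an array of n C strings alphabetically.
void sort_strings(const char *words[], size_t n)
{
    // The element size is the size of one pointer, not of one string.
    qsort(words, n, sizeof(words[0]), compare_strings);
}
```

Casting the arguments in compare_strings to char * instead would compile without complaint and then compare the bytes of the pointers themselves, which is exactly the kind of wrong guess that surfaces much later as a confusing bug.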

Healthy working habits

Start early. Why not the first day it goes out? You can start thinking about the problem right away, set some background processing in motion, and start iterating on design ideas. You get more days to spread the work over, more opportunities to come to office hours and ask forum questions, and will have built in more of a time cushion if you encounter any unexpected setbacks.
Have a strategy. After decomposing the problem, you have concrete tasks in the form of functions to write. Identify your goals for a session when sitting down to work. Monitor your progress and watch for ratholes -- if something is consuming much more time than anticipated, back away and rethink your strategy, get some help, or take a break.
Don't overdo it. CS lore suggests coding requires caffeine-fueled marathon all-nighters. I think programming does have "in the flow" aspects that can benefit from longer stretches of concentrated time, but working too long and too late is often of little value. Depending on the time of day and how tired or hungry you are, you will reach a point of diminishing returns, but it can be hard to recognize it. You're deep into it and the code is live in your mind, and it seems stopping will derail everything, but when you're tired, you're operating at low efficiency. More than once, I've debugged deep into the night, unwilling to stop because I was stubbornly (and wrongly) convinced that "I am so close". Yet the next day, I fixed the problem in the first 10 minutes despite having to "start over", thanks to coming at it with fresh energy. Learn to recognize when you're losing steam and how to refresh yourself.
Know when to get help. 107 has a lot of helpful resources, but it's up to you to recognize when you need help and of what kind. We have written materials on the web site, textbook readings, lectures, labs, the discussion forum, email help line, office hours, and more. Familiarize yourself with what we have to offer and don't be shy about taking advantage when you need it.
Stay positive. It's your choice how many units to take, what grades to aim for, what extracurriculars to add, whether to spend tonight in the cluster, at a party, or getting to sleep early. What is worth doing? How will it help your sense of achievement, your health, your happiness? Don't let the judgments or expectations of other people make decisions for you. Fill your life with people who respect your choices and provide positive support to you.