Written by Julie Zelenski
Here's our best advice on how to hit a home run with your CVector and CMap implementation!
cvec_replace
is a constant-time operation but if coded as a call to cvec_remove
followed by cvec_insert
it will require linear-time. If cmap_remove
first calls cmap_get
to verify existence of the key then searches again when ready to remove it, you've wastefully done the search twice. When use of other public functions is problematic in these ways, create private helper functions that factor out tasks in common to enable code-sharing without the negative consequences. The goal is to unify common code, whether public-public or public-private is the better option depends on context; reusing code via copy and paste doesn't count as sharing. :-)Incomplete types. Both CVector and CMap are defined as incomplete types. For example, in the cvector.h interface, the CVector type is declared as
typedef struct CVectorImplementation CVector; // this is from cvector.h
This typedef establishes the nickname CVector
as a a synonym for the type struct CVectorImplementation
. But nowhere in cvector.h is this struct defined! This makes CVector an incomplete type. The header only vaguely mention some struct by name, its true nature is deliberately not published to the client. The incomplete type prevents clients from declaring variables of CVector
(only CVector*
is ok), and disallows any attempt to dereference a CVector*
.
The type is completed in the implementation making its internal structure visible only to the implementer. At the top of cvector.c, you complete the type by providing your definition of struct CVectorImplementation like this:
struct CVectorImplementation { // this type definition is at top of cvector.c
int a, b; // include your desired fields in the struct
char c;
}; // don't forget to end with semi-colon!
The true nature of CVector is revealed and your implementation can now freely use CVector as a regular type without restrictions.
The .h files include typedefs for the cleanup and comparator functions used as client callbacks. Those convenient nicknames can be used as the declared type for a variable or a struct field that will store such a function pointer.
cvec_first,cvec_next
. The client retrieves the first element and passes previous to cvec_next
to retrieve the next. The implementation of cvec_next
has to orient itself from the previous to identify the next in sequence. At first, it may seem a bit mystical how you can do this, but don't forget you are the omniscient implementer and can take advantage of your knowledge of how the data is laid out internally to root around in memory to locate where you are in the iteration and access what you need. It may be helpful to know you can subtract two pointers to get the distance between them. Pointer subtraction includes the same implicit scaling as pointer addition.lfind
and bsearch
will come in handy. Students are sometimes confused as to why lfind
requires the client to pass the count of elements by reference -- I attribute it to a misguided desire to be parallel with lsearch
which is required to pass count by reference. It's not a big deal to comply with as a client, just a bit weird.cvec_create
, you use a default capacity based on your guess of the common use case. Vectors span the gamut of size from very small to large, so no one capacity is always best, but leaning toward small makes sense given it can grow on demand. Our solution uses 16.cmap_put
, cmap_get
, and cmap_remove
all do the same search to find a matching key, which suggests this code should be factored into a shared helper. Making the helper work for both put and get is straightforward, incorporating the needs of remove is a little more involved, but still worthwhile.cmap_first,cmap_next
operations allow the client to iterate over entries in the CMap. Just as with the CVector iterator, it can take a little thinking to see how you can orient from the previous key and access the key that follows, but keep in mind you are allowed to take advantage of your privileged knowledge of the CMap internals as necessary.cmap_first,cmap_next
. The public iterator functions are intended for client use and using them as the implementer turns out to be awkward, inappropriate, and can add gross inefficiency. It can also cause functional problems as the CMap iterator requires that the map contents are not changed during iteration, yet both rehash and dispose are definitely mucking with entries as they go. As the implementor of CMap, you should directly walk the internal structure during rehash and dispose -- this gives you clean and efficient access to all entries.void*
only if the pointee type is unknown or can vary. Conversely, don't declare what should be a void*
pointer with a specific pointee type. Declaring a void*
as a char*
misleads the reader into thinking this pointer is a C-string and allows it to be mis-used as such with no compiler warning.void*
is not the same as type void**
. The former is a pointer, but the pointee type is unknown. Without knowledge of the pointee type, a void*
requires a typecast before you can dereference it or apply pointer arithmetic to it. The latter is a pointer to a pointer. Thus a void**
has a known pointee type and you can dereference or apply pointer arithmetic to a void**
, no need for typecast. After dereferencing a void**
once, the resulting void*
has an unknown pointee which does require a typecast to access. (I know that's confusing, re-read these sentences and/or draw a picture to get clarity)void*
s in this program. Remember that pointer arithmetic always includes implicit scaling by the size of the pointee. However, if the pointer is a void*
, its pointee size is unknown, so using it in an arithmetic expression doesn't quite make sense as is and will cause the compiler to balk. To make your intentions clear, you cast the void*
in the expression to indicate the desired scaling (typically you using a cast to char*
to get no scaling and manually multiply).Similarly, although memcpy
can be used to do any kind of assignment, that doesn't mean it always should be used. Consider the three following ways of assigning an integer:
int dst, src;
void *pDst, *pSrc;
pDst = &dst;
pSrc = &src;
dst = src; // option A
*(int *)pDst = *(int *)pSrc; // option B
memcpy(pDst, pSrc, sizeof(int)); // option C
All three of these accomplish the same thing (copying a 4-byte integer from src to dst). Option A, the ordinary assignment statement, is clean and safe and always preferred if possible. If you are forced to operate through a void*
, you will have consider options B and C. Option B uses a typecast which invokes the type system to match the type and size. Given the correct typecast, this is relatively safe and straightforward. Option C operates at the lowest-level, with no type safety. A mismatch in the pointer types or size is silently accepted, yet causes havoc. When trying to read/write contents of a void*
, go with Option B if possible. Drop down to the low-level memory copying of Option C only when you have no other choice. As a rule of thumb: if the size argument to memcpy is a compile-time constant (and especially if it uses sizeof), this strongly suggests you shouldn't be using memcpy. Memcpy is needed only when the type/amount of memory being copied is determined at runtime.
Typecasts should not be thrown about indiscriminately. Any time you introduce a typecast you are assuming all responsibility for type-checking. You will need some in this program, but you should approach them warily and use typecasts only and exactly when you must. A compiler warning about mismatched types is best handled dealt by fixing the mismatch, not by adding a cast to suppress the warning. Redundant or nonsensical casts, such as stringing together sequence of casts or casting a L-value, are worthless. You should never ever cast a function pointer. You also never need to cast anything to a void*
, since it is the universal recipient and is compatible with any pointer type.
memcpy
is more efficient than memmove
and should be used when the source and destination do not overlap. Note that either can be used to write an entire sequential block of a memory in one call, which can be much more efficient that repeated calls to work on the memory in smaller pieces.You will use assert to handle improper client calls as well as internal catastrophes such as allocation failures. The assert
macro from < assert.h > is a mechanism for fatal errors. You assert a condition that must be true to continue:
assert(num != 0);
If the condition fails, assert prints the error and halts the program. A failed assert reports the problem in a fairly cryptic manner, so assert is generally not the appropriate mechanism for communicating a correctable problem to the user. However, liberal use of asserts is a defensive programming practice used for programmer-to-programmer communication about unrecoverable errors. For example, you can verify an allocation request was successful by asserting the result was non-NULL. A NULL result can mean that heap is exhausted or the internal malloc data structures have been damaged, in either case, there is no point in carrying on. The alternative of blundering on with a NULL pointer will eventually lead to a crash, but that may happen far removed from the original error. Asserting at the first sign of trouble is an aid to identifying where the root cause lies.
It is not a good idea to put code in a assert, it is best to only include a test expression with no side effects. For example, avoid using assert like this:
assert ((s = malloc(100) != NULL);
When a project is moving from development to production, asserts may be turned off to improve efficiency by setting a preprocessor flag that disables/removes all asserts With the code above, turning off asserts are will remove not only the test, but also the call to malloc, which would be rather unfortunate!
The CVector/CMap interface list the required asserts. You should assert exactly and only those conditions listed. Some students aspire to go beyond with additional bullet-proofing, for example, rejecting the CMap * if it's NULL or attempting to complain if the client passes the address of an element that is the wrong type/size/level of indirection/already freed. Although we salute your intentions, it is impossible to make good on them. Determining whether a pointer is valid is not a solvable problem in general. Testing ptr == NULL
will catch exactly that one case, but doesn't detect all other invalid addresses. Any measure to detect invalid pointers will be half-hearted at best. As the implementer, you have little choice but to assume the client will supply valid addresses and should write your code accordingly. (That said, our solution CVector/CMap goes out of its way to include special-case defenses against an isolated few client misuses that can be detected. You may have tripped this when working on assign2 or observe it when testing your CVector/CMap against the solution. These protections are not documented in the public interface and we do not want you to implement them.)
Prefer the for
loop for ordinary iteration (iterating over an array, walking a linked list, freeing all buckets in the map) instead of a while
loop. The for
loop is usually a clearer way of expressing the iteration and makes it less likely you will have a careless error like forgetting the initialization or increment step. See example loop below of tidy linked list traversal-- what's not to love?
for (void *cur = head; cur != NULL; cur = get_next(cur))
Don't discount the power of good naming conventions. In the past, we have seen some particularly odd bits of math and logic for when to allocate and how much to allocate, often exacerbated by mis-interpretations of your own fields. Does num_avail
track all allocated or just the allocated but not used? Naming choices like alloc_num
or len
can add to the confusion as it is not clear what units these are expressed in: number of bytes or number of elements? A good naming convention can be a lifesaver--- names like nelems_used
or nbytes_allocated
more clearly communicate what is being tracked.
Attempting to implement and test all of the operations simultaneously is sheer lunacy. One operation at a time! If your cmap_get
isn't finding an entry, is it because cmap_get
searches incorrectly, cmap_put
didn't store the entry, cmap_remove
mistakenly removed it, rehash
scrambled the table, or something else entirely? If you build one operation at a time and throughly test/debug before moving on, you have a reliable foundation to build on and won't be forced into trying to debug all the code all the time. Here is a suggested order of attack:
CVector (definitely do vector first, its implementation is much simpler)
CMap:
(You have read our page on effective software testing, right?) This assignment will require a more significant investment in testing than any previous assignment. The number of variables in play (settings for CVector/CMap when created, which operations are called in which order, etc.) create many diverse code paths to verify. If your testing passes through only a limited subset of those paths, then you can't be confident it will stand up to our rigorous grading scrutiny. Be vigilant and thorough!
cmap_remove
in all of the given test code!). You will have to add your own client testing code to ensure you have adequate testing coverage. gdb.
A line number given to the list
or break
commands is assumed to refer to a line number within the most recently accessed source file. If you intend a line in a different file, you can reference the line prefixed by the filename, e.g. break cmap.c:114
. You can also set breakpoints on functions by name, which finds the named function in whichever file it is defined.In addition to the given client programs in the starter, you may also want to write test programs of your own. Follow these steps to add a new test program to the project:
myprogram.c
file with the desired client program. hg add myprogram.c
to add the file to your repo.myprogram
to the names listed for PROGRAMS
.Now myprogram
is a part of your project and make
will build it. If your assign2 was solid, you could copy your spellcheck.c into your assign3 directory and follow the above steps above to use it as an additional test program.
The Makefile also includes a target to build solution versions of all test programs. The binky_soln version is built from the same binky.c but substitutes the sample library implementation of CVector/CMap for yours. For example, you can build vectest_soln
(which uses our CVector/CMap) and compare its behavior to vectest
(which uses your CVector/CMap). You can make the solution version of a single program by name, e.g. make vectest_soln
, or use make soln
to build solution versions of all programs named in the Makefile.
mytest
and followed the steps above to add to the repo and Makefile. Build the program and its solution using make mytest mytest_soln
. You can then use mytest
as the executable in a custom sanity test and it will compare the output from mytest
to mytest_soln
. Pretty cool!Time spent. Student hours spent (self-reported):
Points earned. Past median assignment score has hovered around ~80% of the total points (95/120). Scores on this project have traditionally been lower than the others, largely due to inadequate testing. Let's see if we can break that trend this quarter!