Computer Systems Organization

CSCI-UA.0201(003), Fall 2022

Lab 2 (clab-part2): Basic C Programming

This lab will continue to hone your C programming skills. The lab asks you to write a few C functions manipulating strings, and to write a hash table which can be used to lookup keys in O(1) time.

To set up your lab repo, click on Lab2's github classroom invitation link (posted in Campuswire) and select your NYU NetID. Next, clone your repo locally by typing the following

$ cd cso-labs
$ git clone git@github.com:nyu-cso-fa22/clab-part2-<YourGithubUsername>.git  lab2
File modification
For this lab, the only files that you should modify are str.c, list.c and htable.c (additionally for bonus, parse_words.c and wordcount.c). You may read other files but you must not change them..

String functions

Complete 3 string functions, string_len, string_cmp, int_to_hex in file str.c by following the instructions in the comments.

Test by typing the following:

$ make
$ ./clab2_test
---Start testing str.c
      string_len...passed
      string_cmp...passed
      int_to_hex...passed
---Test str.c Passed

You should make sure that you've passed the tests for str.c before moving on to the rest of the exercises.

Linked List

You are to implement a linked list, where each node stores a key/value tuple (of type kv_t). The linked list should be sorted alphbetically by each node's key.

Read through the type definition of linked list node in file list.h Then, follow the instructions in list.c to implement the skeleton functions, list_init, list_insert_with_accum, list_find and list_get_all_tuples.

Test by typing make; ./clab2_test. You should see that you've passed the tests for list.c before moving on to the rest of the exercises.

Hash table

Now, you are ready to implement a hash table. In Python or Java, a hash table implementation is readily available to use as part of the language's core library (dictionary in Python, or HashTable from java.util.* in Java). However, C's core library has no hash tale implementation and you must implement your own (or download a 3rd party library).

Read through the type definition of our hash table in file htable.h. Below is a pictorial illustration of an example hash table.

Our hash table stores a collection of key/value tuples. It uses a hash function (hashcode) to map a tuple according to its key to one entry among the hash table's array of entries (also called buckets). The same key always maps to the same location, but different keys may also map to the same location, a scenario referred to as ``collision''. Our hash table handles ``collision'' by chaining; in particular, each entry points to a linked list of key-value tuples. Therefore, if multiple tuples map to the same entry, they can be chained together via the corresponding linked list. In the hash table example pictured above, tuples <key="hello", val=1> and <key="hehaw", val=1> map to the same entry at index 1. Therefore, these two tuples are chained together in a linked list of 2 nodes. By contrast, there is no collision at index 5 and tuple <key="world", val=1> is the only node in its linked list.

You hash table implementation should make use your linked list implementation.

Test by typing make; ./clab2_test. You should see that you've passed the tests for htable.c.

Bonus: counting word frequencies

This is a bonus exercise for those seeking extra practice on C.

In this exercise, you are going to count the frequency of words occured in a text document in order the find the topK most frequent words.

First, you will implement a splitter function parse_n_store_words (file parse_words.c) that parses a collection of words, separated by newline chracter '\n', and stores them into the hash table that you've implemented earlier in this lab.

Test by typing make; ./clab2_test. You should see that you've passed the tests for parse_words.c.

Next, read through the code in file wordcount.c. Complete the comparator function count_cmp.

Now let's download a file containing a list of 2 million breached passwords:

$ wget https://nyu-cso.github.io/labs/breached-passwds-2M.txt
Compile and run your wordcount program to find the top 10 most commonly used passwords in the file breached-passwds-2M.txt.
$ make
$ ./wordcount breached-passwds-2M.txt 

Handin Procedure

Follow these instructions.