Computer Systems Organization (Fall 2022)

Lab-4: Malloc lab

Introduction

In this lab you will be writing a dynamic storage allocator for C programs, i.e., your own version of the malloc, free and realloc routines. You are encouraged to explore the design space creatively and implement an allocator that is correct, efficient and fast.

This is an individual lab.

Obtaining the lab

First, click on Lab4's github classroom invitation link (posted on Campuswire) and select your NYU NetID. Next, clone your repo by typing the following

$ cd cso-labs
$ git clone git@github.com:nyu-cso-fa22/lab4-<YourGithubUsername>.git  lab4

This lab's files are located in the lab4/ subdirectory.

Files you can modify

The only files you should modify for this lab are mm-implicit.c. If you are doing the bonus part of this lab, you'll also modify mm.c, mm.h, mm-unittest.c.

Malloc API

The API of your malloc library consists of the following 5 functions (declared in mm-common.h:

int    mm_init(void);
void*  mm_malloc(size_t size);
void   mm_free(void *ptr);
void*  mm_realloc(void *ptr, size_t size);
heap_info_t mm_checkheap(bool verbose);

Your malloc implementation must complete the above 5 functions. Note that we use the prefix mm_* in the above APIs to distinguish your implementation from those come with the standard C library. The API semantics are as follows:

mm_init: This function is called to initialize your malloc library prior to any any mm_malloc and mm_free calls. In this lab, the given tester programs (mm_unittest.c, mdriver.c) invoke mm_init. The return value is -1 if there was a problem in performing the initialization, 0 otherwise.

mm_malloc: This function allocates size bytes of contiguous space on the heap. It returns a pointer to the allocated space. The allocated space should lie within the heap region and should not overlap with any other allocated space.

We will comparing your implementation to the version of malloc supplied in the standard C library. Since the standard C library malloc always returns payload pointers that are aligned to 16 bytes, your malloc implementation should do likewise and always return 16-byte aligned pointers.

mm_free: This function frees the corresponding chunk whose payload is pointed to by ptr. It returns nothing. This function assumes that the pointer ptr was returned by an earlier call to mm_malloc or mm_realloc and has not yet been freed.

mm_realloc: This function allocates size bytes of contiguious space, with the following constraints.

if ptr is NULL, the call is equivalent to mm_malloc(size)
if size is equal to zero, the call is equivalent to mm_free(ptr)};
if ptr is not NULL, it must have been returned by an earlier call to mm_malloc or mm_realloc. The call to mm_realloc changes the memory block pointed to by ptr (i.e.the old block) to one with size bytes and returns the address of the new block. Notice that the address of the new block might be the same as the old block, or it might be different, depending on your implementation, the amount of internal fragmentation in the old block, and the size of the realloc request.
The contents of the new block are the same as those of the old ptr block, up to the minimum of the old and new sizes (the new size can be either smaller or larger than the old size). Everything else is uninitialized. For example, if the old block is 8 bytes and the new block is 12 bytes, then the first 8 bytes of the new block are identical to the first 8 bytes of the old block and the last 4 bytes are uninitialized. Similarly, if the old block is 8 bytes and the new block is 4 bytes, then the contents of the new block are identical to the first 4 bytes of the old block.

Lastly, mm_checkheap: The mm_cheapheap routine is an auxilary routine that checks the consistency of your heap. The function argument verbose can be used to turn on/off verbose debug printing.

The mm_malloc, mm_free, mm_realloc semantics match those of the C standard library's malloc, realloc, and free routines. Type man malloc for the complete documentation.

When implementing mm_init, mm_free, mm_mallocmm_realloc functions, you need to invoke the following functions which emulate the OS' syscalls, instead of using the real ones (in order to make testing easier). These emulated syscalls are defined in memlib.c and described below:

void *mem_sbrk(int incr): Expands the heap by incr bytes, where incr is a positive non-zero integer and returns a generic pointer to the first byte of the newly allocated heap area. The semantics are identical to the Unix sbrk function, except that mem_sbrk accepts only a positive non-zero integer argument.
void *mem_heap_lo(void): Returns a generic pointer to the first byte in the heap.
void *mem_heap_hi(void): Returns a generic pointer to the last byte in the heap.
size_t mem_heapsize(void): Returns the current size of the heap in bytes.
size_t mem_pagesize(void): Returns the system's page size in bytes

Your implementation must observe the following rules:

You should not invoke any memory-management related library calls or system calls, i.e. you cannot use malloc, calloc, free, realloc, sbrk, brk or any variants of these calls in your code. You may use memcpy and other C library functions that are not related to memory-management. In case of doubt, please check with the staff on Campuswire.
You should not use any dynamically sized data structures such as dynamically-sized arrays, trees, or lists in your program. Yes, you can define a global array of a fixed number of elements of structs, pointers or other scalars.
For consistency with the C standard library's malloc package, which returns blocks aligned on 16-byte boundaries, your allocator must always return pointers that are aligned to 16-byte boundaries. The tester will enforce this requirement for you.

Implement implicit list malloc

You are asked to implement an existing simple design, i.e. the implicit list based malloc (without footer).

To implement implicit-list, you shall work from the skeleton code given to you in mm-implicit.{c,h}. You should only modify mm-implicit.c while leaving mm-implicit.h unchanged.

In mm-implicit.h, we give you the definition of chunk header in the implicit list design. For simplicity, we make the header (16-byte) aligned. Unlike the header discussed in the lecture, we will simplify it further and use an 8-byte integer (size_t) to store the chunk size (including chunk header) and another 8-byte integer to store the status on whether the chunk is allocated or free. Since our implementation frequently refers to the size of the header, we set global variable hdr_size = sizeof(header_t), and use hdr_size later in the code. We don't have a footer in the chunk, so coalescing is only done with the next but not the previous free chunk.

You shall complete the implicit list implementation in the following 3 steps:

Complete helper function next_chunk and use it to complete the heap checker function mm_heapcheck. You can test these two functions by compiling make and running the unit test ./implicit-unittest. The unit test should finish its first 2 tests, test_next_chunk and test_mm_checkheap and fails at the 3rd test (since you've not implemented the rest yet). Please read the unit test file mm-implicit-unittest.c to understand what the unit test is doing.
Complete helper functions ask_os_for_chunk, split, first_fit and use them to implement mm_malloc. Perform unit test on your implementation.
Complete helper functions payload2header, coalesce and use them to implement mm_free.

After you've completed your implemementation, make sure that all unit tests pass.

$ make
$ ./implicit-unittest
start test_next_chunk
start test_mm_checkheap
start test_ask_os_for_chunk
start test_split
start test_first_fit
start test_mm_malloc
start test_payload2header
start test_coalesce
start test_mm_free
start test_mm_realloc
all unit tests passed...

Now you are ready to run our trace-based tester. The tester program is provided as mdriver.c which is linked to your malloc implementation and tests it for correctness, space utilization, and throughput. We name the tester that's been linked with the implicit-list malloc as implicit-mdriver.

The tester implicit-mdriver reads a set of trace files, each of which contains a sequence of allocate, reallocate, and free events corresponding to some example application workload. It then calls your mm_malloc, mm_realloc, and mm_free routines according to the sequence of events.

To run the tester, type:

$ ./implicit-mdriver -V

The -V turns on verbose printing in the tester. It's unlikely that you'll pass all the tracefiles even though you've already passed the unit tests. To debug, first identify which tracefile caused the error based on the output of `./implicit-mdriver -V`. Then, run the tester on that specific tracefile instead of all of the required tracefiles. For example, suppose you've identified that tracefile named amptjp-bal.rep has caused error. Then, type the following to run the test based on amptjp-bal.rep tracefile alone:

$./implicit-mdriver -V -f traces/amptjp-bal.rep

Note that all the tracefiles reside in the traces/ subdirectory. You can also type $./implicit-mdriver -h to see a full list of command line options.

The next step in debugging is to understand why this tracefile has triggered error in your code. In order to do this, you must understand the sequence of malloc/free/realloc requests that your code has been invoked with according to the tracefile. Open the tracefile, which is a simple ASCII file, using any editor of your choice. For example, below is the content of the first 10 lines of realloc-bal.rep. The comments following // annotate what each line means:

a 0 512 //(m)alloc request 0 demands 512 bytes
a 1 128 //(m)alloc request 1 demands 128 bytes
r 0 640 //realloc demands request 0 be enlarged to 640 bytes
a 2 128 //(m)alloc request 2 demands 128 bytes
f 1     //free request 1
r 0 768 //realloc demands request 0 be enlarged to 768 (note this is the 2nd time request 0 has been realloc-ed)
a 3 128 //(m)alloc request 3 demaands 128 bytes
f 2     //free request 2
r 0 896 //realloc demands request 0 be further enlarged to 896 bytes (3rd time request 0 has been realloc-ed)
a 4 128 //(m)alloc request 4 demands 128 bytes
...

Once you know the sequence of malloc/free/realloc requests of a tracefile, then you know what the expected behavior of your heap should be. And you can run your code in gdb to see if the code's actual behavior matches its expected behavior.

Once you got every tracefile to pass correctly, you can test all tracefiles again to make sure everything goes well. Here's the example output if the tester runs successfully.

$ ./implicit-mdriver
Using default tracefiles in ./traces/
Performance index = 60.0 * util + 40.0 * (your throughput)/(libc's throughput)
11 out of 11 traces passed, average performance index 48.4 (out of 100.0)

You can refer to here to understand how the performance index is calculated. As we expect, the implicit list design works but does not achieve great performance.

Bonus part: Implement your own malloc

For extra challenge after finishing implicit-list malloc, you can implement your own malloc design, or do a variation of one of the designs taught in the lecture. For example, you can implement explicit-list, segregated explict-list, buddy system, or invent your own!

You are supposed to modify only these files: mm.c mm.h mm-unittest.c. This time, we no longer provide the unit test implementation nor any skeleton code. You should write your own unit test (mm-unittest.c) in the style of mm-implicit-unittest.c to test the correctness of your implementation in mm.c. The given makefile compiles your unit test mm-unittest.c into binary executable unittest which you can invoke.

Once you finish the implementation, invoke the trace-based tester as follows:

$ ./mdriver -V

Programming Advice

Your code should have a header comment that describes the overall design of your malloc implementation: e.g. the organization of the free list. Each function should also have a header comment that describes what the function does.
Define your chunk header as structs and modularize your implementation using helper functions. The skeleton code of mm-implicit.c gives you an example of modular implementation. Follow it. While Bryant/O'Hallaron's textbook is useful for reviewing the malloc design, I do not recommend follow the textbook programming style of doing pointer arithmetic everywhere and using C macros instead of regular functions. It makes the code harder to read and debug. Define structs to represent chunk meta-data and use regular helper functions, as demonstrated in mm-implicit.c
Heap consistency checker:

Your heap consistency checker mm_checkheap should be thorough. Some examples of what a heap checker might check are:
- Is every block in the free list marked as free?
- If you are coalescing with both previous and next free chunk, there should not be any contiguous free blocks. Is this property always true?
- Is every free block in the free list?
- Do the pointers in the free list point to valid free blocks?
- Do any allocated blocks overlap?
- Do the pointers in a heap block point to valid heap addresses?
Write unit tests (in mm-unittest.c) in the style of mm-implicit-unittest.c. They might seem tedious, but they are crucial for debugging.
When testing with trace-based tester, use simple workloads first to debug correctness. We provide two very simple trace files short{1,2}.rep to simplify debugging. Type mdriver -f ./traces/short1.rep to test your memory allocator on one of these workloads first.
The first 9 traces contain requests to malloc and free. The last 2 traces contain requests for realloc, malloc, and free. We recommend that you start by getting your malloc and free routines working correctly and efficiently on the first 9 traces. Only then should you turn your attention to the realloc implementation. For starters, build mm_realloc on top of your existing mm_malloc and mm_free implementations. But to get really good performance, you will need to build a stand-alone mm_realloc.
Use gdb. It will help you isolate and identify out of bounds memory references.
For advanced students, you should consider learning to use a profiler, such as gprof, to understand and optimize the performance of your program.

Evaluation

Programming style:

As is the case with previous labs, we reserve the right to critique your coding style and we will deduct up to 20% of total lab points for bad style based on our subjective evaluation.

You will receive zero points if you break any of the rules or your code is buggy and crashes the tester. Otherwise, your grade will be calculated as follows:

Correctness of implicit-list (40 points). To achieve full score, your implicit list implementation must pass both the unit test implicit-unittest and trace-based test implicit-mdriver.
Performance of your malloc (40 points). Two performance metrics will be used to evaluate your solution:
- Space utilization: The peak ratio between the aggregate amount of memory used by the tester (i.e., allocated via mm_malloc or mm_realloc but not yet freed via mm_free) and the size of the heap used by your allocator. The optimal ratio equals to 1. You should find good policies to minimize fragmentation in order to make this ratio as close as possible to the optimal.
- Throughput: The average number of operations completed per second.
The tester program summarizes the performance of your allocator by computing a performance index, which is a weighted sum of the space utilization (U) and throughput (T)
```
Performance index = 0.6*U + 0.4*min(1, T/T_libc)
```
where U is your space utilization, T is your throughput, and T_libc is the measured throughput of Standard C library's malloc on your system on the default traces. Throughput is calculated as the number of requests in a given trace divided by how long it takes a malloc library to process all those requests.
Style (20 points)
. We will look at your code to determine if you wrote them in a clear and readable fashion.

Observing that both memory and CPU cycles are expensive system resources, we adopt this formula to encourage balanced optimization of both memory utilization and throughput. To receive a good score, you must achieve a balance between utilization and throughput. Ideally, the performance index could reach 100%, but that is very difficult.

Handin Procedure

Follow these instructions.