Computer Systems Organization

CSCI-UA.0201(003), Fall 2022

Lab-3: Binary Mystery

In this lab, we give you five object files, ex1_sol.o, ex2_sol.o, ..., ex5_sol.o, and withhold their corresponding C sources. Each object file implements a particular mystery function (e.g., ex1_sol.o implements the function ex1). Your task is to deduce what each mystery function does from its x86-64 assembly code, and to write a C function that accomplishes the same thing for each of the five functions.

Obtaining the lab

First, click on Lab3's GitHub Classroom invitation link (posted on Campuswire) and select your NYU NetID. Next, clone your repo by typing the following:

$ cd cso-labs
$ git clone git@github.com:nyu-cso-fa22/lab3-<YourGithubUsername>.git  lab3

This lab's files are located in the lab3/ subdirectory. You will write code for files ex{1-5}.c.

Uncover the mystery of assembly

The object files whose assembly code you seek to understand are ex1_sol.o, ex2_sol.o, and so on, located under the objs/ subdirectory. File ex1_sol.o implements mystery function ex1, file ex2_sol.o implements function ex2, and so on. Your goal in this lab is to figure out what mystery function ex1 does and write its corresponding C code in file ex1.c, then figure out what mystery function ex2 does and write its corresponding C code in file ex2.c, and so on.

No goto's
For this lab, the only files that you should modify are ex[1-5].c. Furthermore, your implementation should not contain any goto statements.
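
As a rough illustration of what this rule means in practice, the sketch below uses a hypothetical function (count_bits, not one of the lab's mystery functions). The first version mirrors jumps one by one with goto, which is what a literal reading of assembly tends to produce and is not acceptable here. The second version expresses the same control flow as a while loop.

```c
/* Hypothetical example; not one of the lab's mystery functions. */

/* Jump-by-jump translation that mirrors assembly labels.
 * This style is NOT allowed in your ex[1-5].c files. */
int count_bits_goto(unsigned long x)
{
    int n = 0;
top:
    if (x == 0)
        goto done;
    n += (int)(x & 1);   /* add the lowest bit */
    x >>= 1;             /* move on to the next bit */
    goto top;
done:
    return n;
}

/* Structured equivalent: the backward jump becomes a while loop. */
int count_bits_loop(unsigned long x)
{
    int n = 0;
    while (x != 0) {
        n += (int)(x & 1);
        x >>= 1;
    }
    return n;
}
```

A backward jump guarded by a condition usually corresponds to a loop, and a forward jump over a few instructions usually corresponds to an if statement.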

Suppose you set out to figure out what function ex1 (implemented in ex1_sol.o) does. There are two approaches to do this. You should use them both to help uncover the mystery.

Do not try to match assembly

Do not try to match the object code of your C function line by line against the code contained in ex{1-5}_sol.o. Doing so is painful and unnecessary. Differences in compiler versions, compilation flags, and small details of the C code all result in different object code, even though they do not affect the code's semantics. Therefore, trying to find a C function that generates the same object code is likely futile.
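
To make this concrete, here is a minimal sketch with two hypothetical functions (sum_index and sum_pointer, neither of which is a lab function). They have the same semantics for n >= 0, yet a compiler is free to emit different instructions for each, and even for the same one under different flags.

```c
#include <assert.h>

/* Hypothetical examples; not among the lab's mystery functions. */

/* Sums the first n elements using array indexing and a for loop. */
long sum_index(const long *a, long n)
{
    long total = 0;
    for (long i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Same result for n >= 0, written with a moving pointer and a while loop. */
long sum_pointer(const long *a, long n)
{
    long total = 0;
    const long *end = a + n;
    while (a < end) {
        total += *a++;
    }
    return total;
}

/* Checks that both versions agree on a small input. */
void check_sums(void)
{
    long data[] = {3, -1, 4, 1, 5};
    assert(sum_index(data, 5) == 12);
    assert(sum_pointer(data, 5) == sum_index(data, 5));
}
```

What matters for this lab is that your function behaves like the mystery function on every input, not that it compiles to the same bytes.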

Test your solution

After you've finished each function (remember to remove the assert(0) statement), you can test its correctness as follows:
$ make
$ ./tester
Testing ex1...
ex1 passed
Testing ex2...
ex2 passed
Testing ex3...
ex3 passed
Testing ex4...
ex4 passed
Testing ex5...
ex5 passed
The above output occurs when all of your ex{1-5} functions pass the test.

To test multiple times, run ./tester -r. The -r option runs the tester using a new seed for its random number generator.

Some of you might want to skip around and implement the five ex* functions in arbitrary order. This is a good strategy if you are stuck on some function. To test just ex2, type ./tester -t 2. The same works for the other functions.

Note: Passing the test does not guarantee that your implementation is correct. During grading, we may manually examine your source code to determine its correctness.
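
To see why a passing run is not proof of correctness, consider the sketch below. It is not the tester's actual source, which this handout does not show. It only assumes that a randomized tester compares a reference function against a candidate on random inputs. All names (ref_max, buggy_max, count_mismatches) are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical names throughout; this is NOT the lab's tester. */
typedef long (*binop)(long, long);

/* Reference behavior: the larger of two values. */
long ref_max(long a, long b)
{
    return a > b ? a : b;
}

/* Buggy candidate: wrong for exactly one input pair. */
long buggy_max(long a, long b)
{
    if (a == 12345 && b == 12345)
        return 0;
    return a > b ? a : b;
}

/* Compares ref and cand on `trials` random input pairs drawn from the
 * given seed and returns how many pairs produced different results. */
int count_mismatches(binop ref, binop cand, unsigned seed, int trials)
{
    int mismatches = 0;
    srand(seed);
    for (int i = 0; i < trials; i++) {
        long a = rand() % 100000;
        long b = rand() % 100000;
        if (ref(a, b) != cand(a, b))
            mismatches++;
    }
    return mismatches;
}
```

A random run is very unlikely to hit the single failing pair, so buggy_max would usually be reported as passing. Trying several seeds improves coverage but still cannot rule out such a bug.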

Notes on unfamiliar assembly and other topics

For this lab, you need to review the lecture notes and textbook to refresh your understanding of x86 assembly. Below is some additional information, not covered in the lecture notes, that is also helpful for this lab.

For those of you who want to go out into the world and explore other object files, you will find the official Intel instruction set manual useful. Note that the Intel manual orders the operands of an instruction the opposite way from the lecture notes: it puts the destination operand first and the source operand last. In the lecture notes and in the disassembled output of gdb and objdump, the destination operand appears last. The difference comes from two assembly syntaxes, AT&T syntax and Intel syntax. The GNU software (gcc, gdb, etc.) and the lecture notes use AT&T syntax, which puts the destination operand last, while the Intel manual (of course) uses Intel syntax, which puts the destination operand first.
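
The sketch below shows the two operand orders side by side for a hypothetical function (sub, not one of the lab's mystery functions). The assembly in the comment is one plausible compilation, not guaranteed output. It assumes the usual x86-64 Linux calling convention, where the first two integer arguments arrive in %rdi and %rsi and the result is returned in %rax.

```c
/* Hypothetical example; not one of the lab's mystery functions. */
long sub(long a, long b)
{
    return a - b;
}

/*
 * One plausible compilation (a in %rdi, b in %rsi, result in %rax).
 * Actual output depends on the compiler version and flags.
 *
 *   AT&T syntax (gcc, gdb, objdump; destination operand last):
 *       mov  %rdi,%rax      # rax = a
 *       sub  %rsi,%rax      # rax = rax - b
 *       ret
 *
 *   Intel syntax (Intel manual; destination operand first):
 *       mov  rax,rdi        ; rax = a
 *       sub  rax,rsi        ; rax = rax - b
 *       ret
 */
```

Subtraction makes the operand order matter: reading sub %rsi,%rax with the Intel operand order would give the wrong direction of subtraction.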

Handin Procedure

Follow these instructions.