PA2: Side Channel Attacks Spring 2022

Total Points: 20
Due: Thursday, April 28 at 11:59pm

This is a group project; you can work in a team of size at most two and submit one project per team. You are not required to work with the same partner on every project. You and your partner should collaborate closely on each part.

You have two late days that you may use to turn in work past the deadline over the entire quarter. A late day is a contiguous 24-hour period. Both you and your partner will be charged for every late day that you use, and you both must have late days to use them. These late days are intended to cover your extension needs for usual circumstances: brief illness, busy with other classes, interviews, travel, extracurricular conflicts, and so on. You do not need to ask permission to use a late day.

The code and other answers you submit must be entirely your team's own work. You may discuss the conceptualization of the project and the meaning of the questions, but you may not look at any part of someone else’s solution or collaborate with anyone other than your partner. You may consult published references, provided that you appropriately cite them (e.g., with program comments).

Solutions must be submitted to Gradescope.

Introduction

The goal of this assignment is to gain hands-on experience exploiting side channels. This assignment will demonstrate how side channels can be just as dangerous as the control flow vulnerabilities you exploited in PA 1.

You will be provided two skeleton files (memhack.c and timehack.c) that you will use to exploit side channels present in a target library (sysapp.c) to obtain a protected key. You may consult any online references you wish. You must document the sources you used in your solutions, for example with program comments. Failure to do so will be considered a violation of the academic integrity policy.

Read this First

This project asks you to develop attacks and test them in a virtual machine you control. Attempting the same kinds of attacks against others’ systems without authorization is prohibited by law and university policies and may result in fines, expulsion, and jail time. You must not attack anyone else’s system without authorization! You are required to respect the privacy and property rights of others at all times, or else you will fail the course.

Alice's company, Security4All, has noticed a possible breach of user passwords. Alice wants to find out whether the breach really happened. A confirmed breach would be very bad news for the company and could seriously damage its reputation. Alice would appreciate your help in developing exploits to verify whether passwords can indeed be leaked through side-channel attacks.

Getting Started

Just like PA1, for this assignment you will be provided with a VirtualBox VM containing all the files you need. You can download the VM image here. As in PA1, download CSE127-PA2-VBOX.zip if VirtualBox is compatible with your machine; download CSE127-PA2-UTM.zip if you are using an M1 Mac (Intel Macs can use either VM software).

Make sure to import the pa2box.vbox file into VirtualBox via the Machine → Add menu item. Don't create a brand-new VirtualBox VM or import the pa2box-data.vmdk file on its own; if you do either, you won't have the correct configuration.

The VM is configured with two users: student, with password hacktheplanet; and root, with password hackallthethings. We recommend you use the VM via ssh and scp, just like previous assignments. Please note that SSH is disabled for root, so you can only SSH in as the student user.

SSH is listening on port 2222, i.e.

ssh -p 2222 student@localhost

You can still log in as root using su or by logging into the VM directly, i.e.

student@CSE127:~$ su root

You can redownload the starter code here.

Assignment Files

Starter files are included inside the student user's home directory within the VM image.

There are two parts to this assignment, each with its own subdirectory, memhack and timehack. Each subdirectory contains exploit starter code (memhack.c or timehack.c), which imports a library (sysapp.c, with the same contents across the two subdirectories) with password-checking functionality vulnerable to side-channel attacks. You should not modify sysapp.c, only memhack.c and timehack.c. Each subdirectory contains a Makefile for building your exploits, i.e. run make or make clean if desired.

Assignment Instructions

You will be writing two exploits, each of which takes advantage of a side channel, to obtain the password in sysapp.c. In memhack.c, you will exploit a memory-based side channel, and in timehack.c you will exploit a timing-based side channel. See the Exploit Construction section below for specific details. Once both of your exploits can determine the password stored in correct_pass in sysapp.c and call the hack_system() function successfully, the assignment is complete.

Submitting Your Solutions

Submit memhack.c and timehack.c to the PA2: Side Channel Attacks assignment on Gradescope.

The score that you receive on Gradescope is NOT your final grade for the PA, so please make sure that you test your code thoroughly.

Grading

We will randomize the contents of the correct_pass variable, and we expect your memhack.c and timehack.c to correctly determine the password. Each attack is worth 10 points.

To summarize, the points will be allocated as follows:

10 points for the memory-based side channel exploit
10 points for the timing-based side channel exploit

Exploit Construction

Memory-Based Side Channel

We recommend you start with the memory-based side channel because it is deterministic and doesn't have problems with noise. Look at the check_pass() function in sysapp.c and note two things:

The password string is passed by reference
The memory it points to is checked against the reference password one character at a time.

Now look in memhack.c and note how a buffer is allocated such that the page starting at page_start is protected (i.e., accessing it will cause a segmentation fault, or SEGV) and the 32 bytes immediately before it are allocated.

Now look at the demonstration function demonstrate_signals(), which shows how referencing any memory in the protected page will produce a fault. Also observe how the function catches that fault. You do not need to use this function; it is merely there to show you how to use signals to capture whether a memory reference touched a page or not.

Next, create a framework (in memhack.c) that calls check_pass() with different inputs and catches any resulting faults, so you can determine whether part of the password is correct. We suggest a loop over the maximum password size (32 characters). For the first guess, store the password so that its first character is one byte before page_start, then iterate over the possible choices for the first character. When you get it right, check_pass() goes on to read the next character, which lies in the protected page, so you will get a page fault. Repeat this to guess the entire password. Note that all ASCII characters from 33 (!) to 126 (~) can be used in the password. Other hints:

You are already given a page protected buffer with enough memory to crack the password. All you need to do is use it appropriately for each guess you make.
The demonstrate_signals() function already shows how to catch the segfaults. You can reuse almost all of it in your code.

Note on `sigsetjmp`/`siglongjmp`

We highly advise reading the man pages for sigsetjmp/siglongjmp so you understand how they work.

One frequent pitfall students run into is calling sigsetjmp inside a helper function that returns before siglongjmp is called, which causes undefined behavior (and tends to show up when running on Gradescope).

An excerpt from man sigsetjmp:

   If the function which called setjmp() returns before longjmp() is called, the behavior is undefined. Some kind of subtle or unsubtle chaos is sure to result.

Timing-Based Side Channel

Unlike the memory-based side channel, the timing-based side channel will deal with noise. Go back and look at check_pass(). An artificial delay has been added when each character is checked to make your life easier (it’s possible to do the assignment without it but it would require a much more careful methodology). The execution time of check_pass() depends on how many characters you guess correctly.

Look in timehack.c and find the macro rdtsc(), which reads the processor's cycle counter (a clock that increments by one for each processor cycle that passes). In general, treat rdtsc() as a free-running timer that returns a long. Insert one call to rdtsc() immediately before the call to check_pass() and another immediately after it. Print the difference between the two values to see how long (in cycles) the password check ran. Run the program a few times. Now change the guess string so that the first character is correct, run it again, and see how the time difference changes.
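
The before-and-after measurement can be sketched as follows. This is an illustration under assumptions rather than the starter code: read_ticks, time_call, and toy_work are hypothetical names, the compiler intrinsic __rdtsc() from x86intrin.h is used in place of the rdtsc() macro in timehack.c, and a nanosecond clock is substituted on non-x86 machines so the sketch still builds there.

```c
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Hypothetical helper: cycle counter on x86, nanosecond clock elsewhere. */
static inline uint64_t read_ticks(void) {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Times a single call to fn(arg) and returns the elapsed ticks. */
uint64_t time_call(void (*fn)(const char *), const char *arg) {
    uint64_t start = read_ticks();
    fn(arg);
    uint64_t end = read_ticks();
    /* Unsigned subtraction stays correct across one counter wraparound. */
    return end - start;
}

/* Hypothetical workload standing in for check_pass(): it spins longer
 * for longer input, so the elapsed time depends on the argument. */
void toy_work(const char *s) {
    volatile unsigned long sink = 0;
    for (const char *p = s; *p != '\0'; p++)
        for (unsigned long i = 0; i < 100000; i++)
            sink += i;
}
```

Keeping the timed region to the single call, with nothing else between the two reads, limits how much unrelated work ends up in the measurement.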

Now automate this entire process, in the style of the original approach in memhack.c. Unlike the memhack attack, the timehack attack has to deal with noise. Depending on what other programs are running, the state of the cache, the contents of the branch target buffer, and so on, the time each check takes can vary significantly. This will matter in practice, so you will want to run many trials before you reach a conclusion about each character. Other hints:

Be careful when using printf calls. These can blow out the instruction and data caches and perturb your results (i.e., overwhelm the timing effects you are trying to detect).
Be careful in averaging across trials. If your process is descheduled in the middle of a measurement, the time cost of that individual trial will be so large that it overwhelms everything else. Instead, the median is your friend. However, feel free to experiment if something does not work for you.
If time is not continuing to increase as you progress through characters, then you probably made a bad guess earlier. Backtrack.
rdtsc() will wrap around at some point. You may need to handle this outlier if it is causing issues.
Debugging advice: make a big array to hold your timing measurements and print them at the end.
Be sure to test a bunch of different passwords. We will when we grade.
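
To illustrate the hint about medians, here is a small sketch of a median over a set of timing samples. The names cmp_u64 and median_u64 are hypothetical, and the choice of the lower middle element for an even number of samples is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Comparison callback for qsort over uint64_t values. */
static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);   /* avoids overflow from subtracting */
}

/* Sorts the samples in place and returns the median
 * (the lower middle element when n is even). */
uint64_t median_u64(uint64_t *samples, size_t n) {
    assert(n > 0);
    qsort(samples, n, sizeof samples[0], cmp_u64);
    return samples[(n - 1) / 2];
}
```

For the samples 100, 102, 101, 5000000, 99, where one trial was presumably descheduled, the median is 101 while the mean is about one million, which is why a single outlier can swamp an average.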

Final Notes

Do not write a solution that simply checks all passwords exhaustively. You will not get credit for this. The timing side channel should be exploitable in linear time (we will stop programs that run for excessive periods), and it will feel essentially instantaneous for passwords of 8 characters or fewer (note that we will not test passwords over 12 characters).
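
To make the difference between linear and exhaustive search concrete, the following sketch counts worst-case guesses for the 94 characters from ASCII 33 to 126. The names are hypothetical, and the counts ignore the repeated trials per guess that the timing attack needs; those multiply the linear count by a constant factor.

```c
#include <stdint.h>

enum { FIRST_CHAR = 33, LAST_CHAR = 126 };          /* '!' through '~' */
enum { ALPHABET = LAST_CHAR - FIRST_CHAR + 1 };     /* 94 characters */

/* Worst-case guesses when each position is recovered on its own. */
uint64_t linear_guesses(unsigned len) {
    return (uint64_t)ALPHABET * len;
}

/* Worst-case guesses for exhaustive search over all strings of length len.
 * Returned as a double because 94^10 already exceeds the uint64_t range. */
double exhaustive_guesses(unsigned len) {
    double total = 1.0;
    for (unsigned i = 0; i < len; i++)
        total *= ALPHABET;
    return total;
}
```

For an 8-character password this gives 752 single-position guesses, against roughly 6.1e15 candidate passwords for exhaustive search.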