I am a computer science and engineering undergraduate at the University of Michigan
about to complete my bachelor's degree in May 2025. I am interested in computer science theory,
low-level software, and computer architecture. I am focused on developing skills at the
intersection of those areas, particularly high-performance computing and system architecture.
I work with Professor Max New to hold office hours, answer student questions on Piazza,
lead a discussion section, and provide general support for course logistics during the Winter 2025 semester.
Analog Devices
Hardware & Systems Engineering Intern
May - Aug. 2024
During this internship, I was tasked with automating an existing workflow for a Neural Processing Unit's SDK for
PPA (power, performance, and area) analysis. This involved modifying memory map tables, generating linker scripts,
compiling TensorFlow Lite neural networks into binary executables for the hardware, running SystemC simulations,
scraping simulation logs for performance data, generating/simulating parameterizable RTL, and performing power
usage analysis. Much of this work involved Cadence tools as well as Cadence DSPs. I also profiled and optimized
an existing CUDA and C++ codebase for 5G signal processing. This codebase performed
downconversion, OFDM via FFTs, channel estimation, demodulation, and EVM calculations on Nvidia GPUs.
The GPU kernels were profiled with a mixture of Nsight Systems, Nsight Compute, roofline analysis, and
Python scripting.
Both tasks were fairly high-visibility and led to significantly improved workflows and data collection.
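As a rough illustration of the roofline analysis mentioned above: a kernel's attainable throughput is bounded by the lesser of peak compute and memory bandwidth multiplied by arithmetic intensity. The sketch below is a minimal model, and the function names and any numbers passed to it are hypothetical, not figures from the internship hardware.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops: float, bytes_moved: float) -> float:
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    intensity = flops / bytes_moved  # FLOPs per byte of DRAM traffic
    return min(peak_gflops, bandwidth_gbs * intensity)


def is_memory_bound(peak_gflops: float, bandwidth_gbs: float,
                    flops: float, bytes_moved: float) -> bool:
    """A kernel is memory bound when its intensity sits left of the ridge point."""
    ridge = peak_gflops / bandwidth_gbs  # intensity where the two roofs meet
    return flops / bytes_moved < ridge
```

A kernel that lands well below its bound is usually limited by something the model ignores (latency, occupancy, uncoalesced accesses), which is where a profiler such as Nsight Compute becomes useful.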
Analog Devices
System Software Engineering Intern
May - Aug. 2023
During this internship, I worked on the product security team and developed a SystemC model of an elliptic curve
cryptographic block in an embedded security enclave to aid in its functional verification and NIST certification.
Super scary secret stuff.
3. Projects
Below you will find various notable technical projects I have worked on as part of my studies and free-time tinkering.
I am slowly filling out the descriptions for some.
Sharded Paxos based key/value service
Academic project
Dec. 2024
I implemented a distributed sharded key/value storage system with fault tolerance and consistency using Paxos.
The system partitions keys across multiple replica groups, each responsible for a subset of shards, with a central
"shard master" managing shard assignments and reconfigurations. I developed the shard master to handle configuration
changes using Join(), Leave(), Move(), and Query() RPCs exposed to the client,
ensuring even shard redistribution while maintaining fault tolerance. Later on, I built the sharded key/value
servers to handle client requests with single-copy semantics, designing a robust protocol for shard transfers during
configuration changes. This system successfully manages dynamic reconfigurations and ensures
consistency across all replica groups.
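One way to picture the "even shard redistribution" step after a Join() or Leave() is the sketch below. It is a minimal illustration under assumptions of my own choosing (the name `rebalance`, `None` for an unassigned shard, and integer group IDs are all hypothetical) and is not the code from the project. It evens out shard counts while moving as few shards as possible.

```python
def rebalance(assignment: list, groups: list) -> list:
    """Return a new shard -> group assignment that is as even as possible.

    assignment[i] is the group that currently owns shard i, or None.
    Shards owned by groups no longer in `groups` are treated as unassigned.
    """
    if not groups:
        return [None] * len(assignment)
    groups = sorted(groups)  # deterministic order so every replica agrees
    result = [g if g in groups else None for g in assignment]
    base, extra = divmod(len(result), len(groups))
    owned = {g: [i for i, o in enumerate(result) if o == g] for g in groups}
    # Groups that already hold the most shards get the larger quota,
    # which minimises how many shards have to move.
    by_load = sorted(groups, key=lambda g: -len(owned[g]))
    quota = {g: base + (1 if rank < extra else 0)
             for rank, g in enumerate(by_load)}
    # Take surplus shards away from overloaded groups.
    for g in groups:
        while len(owned[g]) > quota[g]:
            result[owned[g].pop()] = None
    # Hand every unassigned shard to a group that is below its quota.
    free = [i for i, o in enumerate(result) if o is None]
    for g in groups:
        while len(owned[g]) < quota[g]:
            shard = free.pop()
            result[shard] = g
            owned[g].append(shard)
    return result
```

Sorting the group IDs before doing anything else matters in a replicated setting: every shard master replica applies the same operation log, so each must compute an identical assignment.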
Multiprocessor cache coherence protocol
Academic project
Nov. 2024
I designed and formally verified a directory-based cache coherence protocol for a 4-core processor.
The protocol was a modification of the canonical MSI scheme found in many computer architecture courses.
Modifications mainly included additional handshakes and transient states to deal with the fact that messages could
be arbitrarily reordered by the interconnect network. Formal verification was done by encoding the replicated state
machine of the processor and directory cache controllers in the Murphi language
and exploring the state space of ~700,000 possible states for invariant violations and deadlocks.
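The kind of exhaustive state-space exploration an explicit-state model checker performs can be sketched with a breadth-first search. The toy model below is an assumption-heavy simplification: it is written in Python rather than Murphi, it uses atomic MSI transitions with no interconnect, no transient states, and no directory, and the names `successors`, `swmr`, and `explore` are hypothetical. It only shows the shape of the technique, not the protocol from the project.

```python
from collections import deque


def successors(state):
    """Yield next states of a toy atomic MSI model.

    `state` is a tuple with one of 'I', 'S', 'M' per cache.
    """
    n = len(state)
    for i in range(n):
        if state[i] == 'I':
            # Read request: any M copy is downgraded to S, requester gets S.
            yield tuple('S' if (j == i or s == 'M') else s
                        for j, s in enumerate(state))
        if state[i] != 'M':
            # Write request: requester gets M, every other copy is invalidated.
            yield tuple('M' if j == i else 'I' for j in range(n))
        if state[i] != 'I':
            # Eviction: the line is dropped from cache i.
            yield tuple('I' if j == i else s for j, s in enumerate(state))


def swmr(state):
    """Single-writer/multiple-reader invariant."""
    writers = state.count('M')
    readers = state.count('S')
    return writers == 0 or (writers == 1 and readers == 0)


def explore(initial, invariant):
    """BFS over reachable states.

    Returns (states seen so far, first violating state or None).
    """
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return len(seen), state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen), None
```

Calling `explore(('I',) * 4, swmr)` walks every reachable configuration of four caches. A realistic model grows far larger because in-flight messages and transient states become part of each state.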
Paxos based key/value service
Academic project
Nov. 2024
In this project, I implemented a Paxos-based key/value storage system that ensures high reliability and fault
tolerance. The system replaces a single master view server with Paxos to manage consensus, enabling all replicas
to process client requests in a consistent order without relying on a single point of failure. My work involved
designing and implementing a Paxos library that supports concurrent agreement across multiple instances,
managing memory efficiently for forgotten instances, and maintaining linearizable semantics for the key/value
service. The project also required careful handling of duplicate requests and ensuring the system could
recover state when replicas lagged behind. By layering the Paxos library, a replicated state machine,
and the key/value server, I structured the implementation to separate concerns and simplify complexity.
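For readers unfamiliar with Paxos, the core of a single agreement instance can be sketched as below. This is a minimal in-memory illustration with hypothetical names (`Acceptor`, `propose`), with direct method calls standing in for RPCs, and without failures, retries, or the learner and forgetting logic. It is not the library from the project.

```python
class Acceptor:
    """State one acceptor keeps for a single Paxos instance."""

    def __init__(self):
        self.promised = -1      # highest proposal number promised so far
        self.accepted_n = -1    # number of the highest accepted proposal
        self.accepted_v = None  # value of that proposal

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, self.accepted_n, self.accepted_v

    def accept(self, n, v):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_v = v
            return True
        return False


def propose(acceptors, n, value):
    """Run one proposer round; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    promises = [(an, av) for ok, an, av in replies if ok]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, the proposer must adopt the
    # one with the highest proposal number instead of its own.
    best_n, best_v = max(promises, key=lambda p: p[0])
    if best_n >= 0:
        value = best_v
    acks = sum(1 for a in acceptors if a.accept(n, value))
    return value if acks >= majority else None
```

The value-adoption rule in `propose` is what makes a decision stick: once a majority has accepted a value, any later proposal with a higher number ends up proposing that same value.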
Computer security exploits
Several academic projects
Jan. 2024
Disclaimer: All computer security projects listed on this page and subsequent exploits were sanctioned
by the University of Michigan. Any infrastructure mentioned as a target of exploits or breaches is owned
and operated by the university for educational purposes.
1) In the first project, I implemented several exploits against known vulnerabilities in cryptosystems
like MD5, SHA-1, SHA-2, and other hashes using the Merkle-Damgård construction.
First, I wrote scripts which perform a length extension attack and hash collision attack. I also
created scripts that performed padding oracle attacks on a vulnerable endpoint that contained
encrypted communications and made the mistake of doing decryption before message authentication
in its backend.
2) In the second project, I executed several website security exploits such as SQL injection, Cross-site scripting (XSS),
and Cross-site request forgery (CSRF) against various tiers of defenses.
3) In the third project, I retraced the steps an attacker took to breach a fictional company's website and database. I had
to employ tools like Wireshark, Python scripting, and networking code to find security vulnerabilities in the company's
mobile device management (MDM) infrastructure, particularly in their DNS resolver and their multi-factor authentication tool.
4) In the fourth project, I performed several exploits against application security vulnerabilities, focusing on buffer
overflows and control-flow hijacking. First, I developed input that triggered stack variable overwrites in a controlled
environment, demonstrating the risks of improper memory management. Then, I crafted payloads to overwrite return addresses,
redirecting program execution to arbitrary code. Using tools like GDB and x86-64
assembly, I created exploits that bypassed
security defenses like DEP (Data Execution Prevention) and ASLR (Address Space Layout Randomization). Finally, I applied
reverse engineering techniques with Ghidra to analyze a closed-source binary, identifying and exploiting vulnerabilities
to achieve the desired control. These tasks enhanced my understanding of machine architecture, assembly language, and
the importance of secure coding practices.
5) In the final project, I was tasked with performing a forensic analysis of a fictional cyber criminal named Leslie. I
was provided with a forensic copy of his hard drive as well as his physical machine. In order to find incriminating evidence,
I had to use many of the techniques from previous projects, along with new ones such as steganography,
password crackers, binwalk, spectrograms, and the Autopsy digital forensics software.
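As background for the length extension attack in the first project above: Merkle-Damgård hashes pad every message to a block boundary in a fully predictable way, so anyone who knows only the length of a secret prefix can reconstruct the padding. The sketch below computes that padding for hashes with 64-byte blocks. The helper names `md_padding` and `extended_message` are hypothetical, and the final step of resuming the hash from a known digest is deliberately not shown.

```python
import struct


def md_padding(message_len: int, big_endian_length: bool = False) -> bytes:
    """Padding a 64-byte-block Merkle-Damgard hash appends to a message.

    MD5 encodes the bit length little-endian; SHA-1 and SHA-256 use
    big-endian, so pass big_endian_length=True for those.
    """
    zeros = (55 - message_len) % 64  # fill so that 8 length bytes end a block
    fmt = ">Q" if big_endian_length else "<Q"
    bit_length = (message_len * 8) % 2**64
    return b"\x80" + b"\x00" * zeros + struct.pack(fmt, bit_length)


def extended_message(known: bytes, secret_len: int, suffix: bytes,
                     big_endian_length: bool = False) -> bytes:
    """Message whose hash under H(secret || m) continues from H(secret || known)."""
    glue = md_padding(secret_len + len(known), big_endian_length)
    return known + glue + suffix
```

This is also why constructions such as HMAC are used for message authentication instead of hashing a secret prefix directly.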
For this project, my team and I designed and implemented a 32-bit RISC-V superscalar out-of-order processor in behavioral SystemVerilog.
Simulation and synthesis were performed via Synopsys VCS. Our design was inspired by the MIPS R10000
implementation of Tomasulo's algorithm and
included core components like a reservation station (RS), reorder buffer (ROB), physical register file (PRF), and a retirement
register allocation table (RRAT).
We chose a 3-way superscalar configuration to balance complexity and performance, incorporating
advanced features such as early tag broadcasting (ETB), a non-speculative load-store queue with internal data forwarding from in-flight stores to dependent loads,
and a non-blocking L1 data cache with prefetching. ETB enabled dependent instructions to execute back-to-back with minimal delays, while the load-store queue
effectively reduced memory access contention. The load-store queue handled dependency tracking with bit masks. Each load in the RS maintained a bit mask that
indicated which stores in the Store Queue (SQ) were older, and a second bit mask tracked which older stores had unresolved addresses. This design was
inspired by similar mechanisms in the Berkeley Out-of-Order Machine.
Our I$ and D$ were designed with prefetching and non-blocking mechanisms, enhancing data throughput.
Memory handling was enhanced by implementing miss status handling registers (MSHRs) to
reduce stalls caused by cache misses. Performance analysis revealed improvements in cycles per instruction (CPI) compared to
baseline in-order designs, particularly on benchmarks leveraging instruction-level parallelism (ILP). However, challenges
like cache aliasing and limited branch prediction accuracy highlighted areas for future improvement. Additionally, a
React-based GUI debugger was developed to visualize all processor signals at every cycle of a program's execution to aid debugging.
We were able to meet timing with a 13.7 ns clock period (~73 MHz frequency).
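The bit-mask dependency tracking in the load-store queue described above can be sketched in a few lines. This is a behavioral illustration in Python rather than SystemVerilog, with hypothetical function names, and it assumes the unresolved-address mask is shared across the SQ with one bit per store entry. The actual design may organize these masks differently.

```python
def load_can_issue(older_stores_mask: int, unresolved_mask: int) -> bool:
    """A load may issue once no older store still has an unknown address."""
    return (older_stores_mask & unresolved_mask) == 0


def resolve_store(unresolved_mask: int, sq_index: int) -> int:
    """Clear the bit of a store whose address has just been computed."""
    return unresolved_mask & ~(1 << sq_index)


def forwarding_candidates(older_stores_mask: int, store_addrs: list,
                          load_addr: int) -> list:
    """SQ indices of older stores whose address matches the load's address."""
    return [i for i, addr in enumerate(store_addrs)
            if (older_stores_mask >> i) & 1 and addr == load_addr]
```

With masks like these, the issue check reduces to a single AND followed by a zero test per load, which maps directly onto combinational logic.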
C to x86-64 optimizing compiler
Several academic projects
Jan - Mar. 2024
Over the course of several projects, I iteratively implemented an optimizing compiler which compiled
a sizeable subset of the C language (which we dubbed Oat) to x86-64 machine code.
The compiler was entirely implemented in OCaml
and followed the AMD64 System V ABI calling conventions. It was written in the following phases.
Phase 1:
Implemented an assembler and simulator for a small, idealized subset of the x86-64 platform
that will serve as the target language for the compiler.
Phase 2:
Implemented a non-optimizing compiler for a subset of the LLVM IR language
(dubbed LLVMlite) with x86-64 as the target.
At this point, the compiler's backend was largely completed.
Phase 3:
Implemented a non-optimizing compiler for Oat with LLVMlite as the target. At this point,
the compiler's frontend was largely completed and it supported compiling simple Oat programs.
[Oatv1 rules]
Phase 4:
Implemented new Oat language features such as structs, function pointers, a distinction between
possibly null and definitely not null references, and array initializers, and updated the type system
to support all of these additions.
[Oatv2 rules]
Phase 5:
Implemented compiler optimizations at the LLVMlite IR
level in the backend. These included dataflow analysis, dead code elimination, constant propagation, and a proper
register allocation heuristic instead of placing all variables and intermediate values on the stack. For register
allocation, I chose to implement Chaitin's algorithm with conservative node coalescing.
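The simplify/select core of graph-coloring register allocation can be sketched as follows. This is a simplified illustration in Python rather than OCaml with a hypothetical function name. It omits coalescing and spill-cost heuristics, and it attempts to color potential spill candidates optimistically (a refinement usually attributed to Briggs) rather than spilling them immediately as in Chaitin's original formulation. The interference graph is assumed to be symmetric, with every node present as a key.

```python
def color_graph(interference: dict, k: int) -> dict:
    """Assign each node one of k colors (registers), or None if it must spill.

    `interference` maps each node to the set of nodes live at the same time.
    """
    graph = {n: set(adj) for n, adj in interference.items()}
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        # Simplify: a node with fewer than k neighbours is always colorable.
        node = next((n for n in sorted(work) if len(work[n]) < k), None)
        if node is None:
            # No such node: push the highest-degree node as a spill candidate.
            node = max(sorted(work), key=lambda n: len(work[n]))
        for neighbour in work[node]:
            work[neighbour].discard(node)
        del work[node]
        stack.append(node)
    colors = {}
    while stack:
        # Select: pop nodes in reverse order and give each the lowest free color.
        node = stack.pop()
        used = {colors[m] for m in graph[node] if colors.get(m) is not None}
        free = [c for c in range(k) if c not in used]
        colors[node] = free[0] if free else None
    return colors
```

Nodes mapped to `None` would be spilled to the stack, after which a full allocator rebuilds the interference graph and repeats the process.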
Multithreaded network file server
Academic project
Nov - Dec. 2023
Implemented an ACID-compliant network file server using C++20 and the
Boost libraries for regex, thread, and reader-writer lock functionality.
The file server can be run on any x86 Unix machine, utilizes BSD sockets for interprocess network communication,
and has several design considerations for fault tolerance.
CNN forwarding layers optimized w/ Nvidia GPUs
Academic project
Nov. 2023
Unix virtual memory pager
Academic project
Sept - Oct. 2023
Implemented a simulator of the pager portion of a Unix operating system kernel used to manage application
processes' virtual address spaces. The pager was written in C++20 and
implemented system calls, such as the Unix fork(), which are used
to create, copy, and destroy address spaces, allocate more space in existing ones, and switch between address
spaces. The pager features a simulated MMU and an interrupt handler for memory faults.