David Mekhtiev

Last updated: April 2025

1. About

I am a recent computer science and engineering graduate from the University of Michigan, interested in computer science theory, low-level software, and computer architecture. I am focused on developing skills at the intersection of those areas, particularly high-performance computing and system architecture.

Email

davidmek@umich.edu

2. Education

University of Michigan

B.S.E. in Computer Science & Engineering

August 2021 - May 2025

During my time at the University of Michigan, I was a member of the M-Fly autonomous vehicle engineering project team, a TA for an upper-level CS course on compiler design, and a recipient of the engineering scholarship of honor. A list of relevant coursework is available on my LinkedIn.

3. Experience

Listed in descending order of recency.

Teaching Assistant

EECS 483 (Compiler Design) [course site] [prof site]

January - May 2025

Working with Professor Max New to provide office hours, responses to student questions on Piazza, a weekly discussion section, exam proctoring/grading, and general support for course logistics during the Winter 2025 semester.

Analog Devices

Hardware & Systems Engineering Intern

May - August 2024

During this internship, I was tasked with automating an existing PPA (power, performance, and area) analysis workflow for a Neural Processing Unit's SDK. This involved modifying memory map tables, generating linker scripts, compiling TensorFlow Lite neural networks into binary executables for the hardware, running SystemC simulations, scraping simulation logs for performance data, generating and simulating parameterizable RTL, and performing power usage analysis. I also profiled and optimized an existing CUDA and C++ codebase for 5G signal processing, which covered downconversion, OFDM via FFTs, channel estimation, demodulation, and EVM calculations on Nvidia GPUs. The GPU kernels were profiled with a mixture of Nsight Systems, Nsight Compute, roofline analysis, and Python scripting. Both tasks were fairly high in visibility and led to significantly improved workflows and data collection.

Analog Devices

System Software Engineering Intern

May - August 2023

During this internship, I worked on the product security team and developed a SystemC model of an elliptic curve cryptographic block in an embedded security enclave to aid in its functional verification and NIST certification. The model was designed with transaction-level granularity.

4. Projects

Below you will find various notable technical projects I have worked on.

Sharded Paxos-based key/value service

Academic project

December 2024

I implemented a distributed sharded key/value storage system with fault tolerance and consistency using Paxos. The system partitions keys across multiple replica groups, each responsible for a subset of shards, with a central "shard master" managing shard assignments and reconfigurations. I developed the shard master to handle configuration changes using Join(), Leave(), Move(), and Query() RPCs exposed to the client, ensuring even shard redistribution while maintaining fault tolerance. Later on, I built the sharded key/value servers to handle client requests with single-copy semantics, designing a robust protocol for shard transfers during configuration changes. This system successfully manages dynamic reconfigurations and ensures consistency across all replica groups.
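
As a rough illustration of the even shard redistribution described above, here is a minimal sketch. It is not the project's code: the function name `rebalance`, the list-based shard table, and the tie-breaking rule are all assumptions made for the example. It keeps shard counts within one of each other while moving as few shards as possible.

```python
def rebalance(assignment, groups):
    """Return a new shard table that spreads shards evenly over `groups`.

    assignment[shard] is the owning group id, or None if unassigned.
    (Hypothetical data layout chosen for this sketch.)
    """
    if not groups:
        return [None] * len(assignment)
    groups = sorted(groups)
    owned = {g: [] for g in groups}
    orphans = []  # shards whose owner left or that were never assigned
    for shard, g in enumerate(assignment):
        if g in owned:
            owned[g].append(shard)
        else:
            orphans.append(shard)

    base, extra = divmod(len(assignment), len(groups))
    # Groups that already own the most shards get the larger quotas,
    # which minimizes the number of shards that have to move.
    order = sorted(groups, key=lambda g: (-len(owned[g]), g))
    quota = {g: base + (1 if i < extra else 0) for i, g in enumerate(order)}

    # Take shards away from over-full groups ...
    for g in groups:
        while len(owned[g]) > quota[g]:
            orphans.append(owned[g].pop())
    # ... and hand them to groups that are below their quota.
    new_assignment = list(assignment)
    for g in groups:
        while len(owned[g]) < quota[g]:
            shard = orphans.pop()
            owned[g].append(shard)
            new_assignment[shard] = g
    return new_assignment
```

In this sketch a Join() or Leave() would simply call `rebalance` with the updated group list; only the shards that must change owners are touched.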

Multiprocessor cache coherence protocol

Academic project

November 2024

I designed and formally verified a directory-based cache coherence protocol for a 4-core processor. The protocol was a modification of the canonical MSI scheme found in many computer architecture courses. Modifications mainly included additional handshakes and transient states to deal with the fact that messages could be arbitrarily reordered by the interconnect network. Formal verification was done by encoding the replicated state machines of the processor and directory cache controllers in the Murphi language and exploring the state space of ~700,000 states for invariant violations and deadlocks.
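
The explicit-state exploration that a tool like Murphi performs can be sketched in a few lines. The example below is a deliberately simplified assumption, not the verified protocol: it models plain MSI with atomic transitions (no directory, no transient states, no message reordering), so its state space is tiny. The names `successors`, `invariant`, and `explore` are made up for this illustration.

```python
from collections import deque

N_CACHES = 4  # matches the 4-core setup described above


def successors(state):
    """Yield every state reachable from `state` in one atomic step."""
    for i in range(len(state)):
        # Read miss: requester becomes Shared, a Modified copy is downgraded.
        if state[i] == "I":
            yield tuple("S" if (j == i or s == "M") else s
                        for j, s in enumerate(state))
        # Write request: requester becomes Modified, all other copies are invalidated.
        if state[i] != "M":
            yield tuple("M" if j == i else "I" for j in range(len(state)))
        # Eviction: the requester drops its copy.
        if state[i] != "I":
            yield tuple("I" if j == i else s for j, s in enumerate(state))


def invariant(state):
    """Single writer or multiple readers: a Modified copy excludes all others."""
    modified = state.count("M")
    return modified == 0 or (modified == 1 and state.count("S") == 0)


def explore(initial):
    """Breadth-first search over all reachable states, collecting violations."""
    seen = {initial}
    frontier = deque([initial])
    violations = []
    while frontier:
        state = frontier.popleft()
        if not invariant(state):
            violations.append(state)
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen, violations
```

Calling `explore(("I",) * N_CACHES)` visits every reachable combination of cache states and reports any that break the invariant. The real protocol has far more states because each in-flight message and transient state adds to the product.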

Paxos-based key/value service

Academic project

November 2024

I implemented a Paxos-based key/value storage system that ensures high reliability and fault tolerance. The system replaces a single master view server with Paxos to manage consensus, enabling all replicas to process client requests in a consistent order without relying on a single point of failure. My work involved designing and implementing a Paxos library that supports concurrent agreement across multiple instances, managing memory efficiently for forgotten instances, and maintaining linearizable semantics for the key/value service. The project also required careful handling of duplicate requests and ensuring the system could recover state when replicas lagged behind. By layering the Paxos library, a replicated state machine, and the key/value server, I structured the implementation to separate concerns and reduce complexity.
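
For readers unfamiliar with the agreement step underneath such a library, here is a minimal single-instance (single-decree) Paxos sketch. It is an illustration under simplifying assumptions, not the project's library: everything runs in one process with direct method calls instead of RPCs, and the names `Acceptor` and `propose` are hypothetical.

```python
class Acceptor:
    """State one replica keeps for a single Paxos instance."""

    def __init__(self):
        self.promised = -1      # highest proposal number promised so far
        self.accepted_n = -1    # number of the proposal accepted so far, if any
        self.accepted_v = None  # value of that proposal

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, self.accepted_n, self.accepted_v

    def accept(self, n, value):
        """Phase 2: accept the proposal unless a higher number was promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_v = value
            return True
        return False


def propose(acceptors, n, value):
    """Run both phases; return the chosen value, or None if no majority."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None
    # If any acceptor already accepted a value, the proposer must adopt
    # the one with the highest proposal number instead of its own.
    best_n, best_v = max(granted, key=lambda reply: reply[0])
    if best_n >= 0:
        value = best_v
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

Once a value is chosen, a later proposer with a higher number can only re-propose that same value, which is what lets every replica apply client requests in the same order.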

Computer security exploits

Several academic projects

January 2024

Disclaimer: All computer security projects listed on this page and subsequent exploits were sanctioned by the University of Michigan. Any infrastructure mentioned as a target of exploits or breaches is owned and operated by the university for educational purposes.

1) In the first project, I implemented several exploits against known vulnerabilities in hash functions built on the Merkle-Damgård construction, such as MD5, SHA-1, and SHA-2. First, I wrote scripts that perform a length extension attack and a hash collision attack. I also created scripts that performed padding oracle attacks on a vulnerable endpoint that served encrypted communications and whose backend made the mistake of decrypting before authenticating the message.

2) In the second project, I executed several website security exploits such as SQL injection, cross-site scripting (XSS), and cross-site request forgery (CSRF) against various tiers of defenses.

3) In the third project, I retraced the steps an attacker took to breach a fictional company's website and database. I used tools like Wireshark, Python scripting, and networking code to find security vulnerabilities in the company's mobile device management (MDM) infrastructure, particularly in their DNS resolver and their multi-factor authentication tool.

4) In the fourth project, I performed several exploits against application security vulnerabilities, focusing on buffer overflows and control-flow hijacking. First, I developed input that triggered stack variable overwrites in a controlled environment, demonstrating the risks of improper memory management. Then, I crafted payloads to overwrite return addresses, redirecting program execution to arbitrary code. Using tools like GDB and x86-64 assembly, I created exploits that bypassed security defenses like DEP (Data Execution Prevention) and ASLR (Address Space Layout Randomization). Finally, I applied reverse engineering techniques with Ghidra to analyze a closed-source binary, identifying and exploiting vulnerabilities to achieve the desired control. These tasks enhanced my understanding of machine architecture, assembly language, and the importance of secure coding practices.

5) In the final project, I was tasked with performing a forensic analysis on a fictional cyber criminal named Leslie. I was provided with a forensic copy of his hard drive as well as his physical machine. To find incriminating evidence, I used many of the techniques from previous projects along with new ones such as steganography, password crackers, binwalk, spectrograms, and the Autopsy digital forensics software.
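
The length extension attack mentioned in the first project hinges on reproducing the padding that a Merkle-Damgård hash appends internally. The sketch below covers only that padding step, for SHA-256, and is an illustration rather than the project's script; the helper names `md_padding` and `forged_message` are assumptions. A full attack would also resume the hash computation from the known digest, which is omitted here.

```python
import struct

BLOCK_BYTES = 64  # SHA-256 processes its input in 64-byte blocks


def md_padding(message_len: int) -> bytes:
    """Padding SHA-256 appends to a message that is `message_len` bytes long."""
    # One 0x80 byte, then zeros until 8 bytes remain in the final block,
    # then the message length in bits as a big-endian 64-bit integer.
    zeros = (BLOCK_BYTES - 8 - 1 - message_len) % BLOCK_BYTES
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", message_len * 8)


def forged_message(known_data: bytes, secret_len: int, suffix: bytes) -> bytes:
    """Data an attacker would submit: original data, glue padding, new suffix.

    `secret_len` is the (guessed) length of the secret prefix the server
    prepends before hashing.
    """
    glue = md_padding(secret_len + len(known_data))
    return known_data + glue + suffix
```

MD5 uses the same layout but encodes the length little-endian, so the `">Q"` format would become `"<Q"` for that hash.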

Parameterizable superscalar out-of-order processor

Academic project [final report]

February - March 2024

For this project, my team and I designed and implemented a 32-bit RISC-V superscalar out-of-order processor in behavioral SystemVerilog. Simulation and synthesis were performed via Synopsys VCS. Our design was inspired by the MIPS R10000 implementation of Tomasulo's algorithm and included core components like a reservation station (RS), reorder buffer (ROB), physical register file (PRF), and a retirement register allocation table (RRAT).

We chose a 3-way superscalar configuration to balance complexity and performance, incorporating advanced features such as early tag broadcasting (ETB), a non-speculative load-store queue with internal data forwarding from in-flight stores to dependent loads, and a non-blocking L1 data cache with prefetching. ETB enabled dependent instructions to execute back-to-back with minimal delays, while the load-store queue effectively reduced memory access contention. The load-store queue handled dependency tracking with bit masks. Each load in the RS maintained a bit mask that indicated which stores in the Store Queue (SQ) were older, and a second bit mask was used to track which older stores had unresolved addresses. This design was inspired by similar mechanisms in the Berkeley Out-of-Order Machine.
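
The bit-mask dependency check can be modeled in software as follows. This is a behavioral sketch under stated assumptions, not the SystemVerilog design: store queue slots are assumed to be numbered in age order (higher index means younger, no wraparound), the unresolved mask is modeled as queue-wide, addresses are compared for exact equality, and the function names are hypothetical.

```python
def load_can_issue(older_stores: int, unresolved: int) -> bool:
    """A non-speculative load may issue only once every older store
    has a resolved address.

    Bit i of `older_stores` is set if store queue slot i holds a store older
    than the load; bit i of `unresolved` is set if slot i has no address yet.
    """
    return (older_stores & unresolved) == 0


def forwarding_store(load_addr, older_stores, store_addrs):
    """Return the slot of the youngest older store writing `load_addr`,
    or None if the load has to read from the cache instead."""
    for slot in reversed(range(len(store_addrs))):
        is_older = (older_stores >> slot) & 1
        if is_older and store_addrs[slot] == load_addr:
            return slot
    return None
```

For example, with `older_stores = 0b0111` a load is blocked while slot 2 is unresolved, but not by an unresolved store in slot 3, which is younger than the load.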

Our I$ and D$ were designed with prefetching and non-blocking mechanisms, enhancing data throughput. Memory handling was enhanced by implementing miss status handling registers (MSHRs) to reduce stalls caused by cache misses. Performance analysis revealed improvements in cycles per instruction (CPI) compared to baseline in-order designs, particularly on benchmarks leveraging instruction-level parallelism (ILP). However, challenges like cache aliasing and limited branch prediction accuracy highlighted areas for future improvement. Additionally, a React-based GUI debugger was developed to visualize all processor signals at every cycle of a program's execution to aid debugging. We were able to meet timing with a 13.7 ns clock period (~73 MHz frequency).

C to x86-64 optimizing compiler

Several academic projects

January - March 2024

Over the course of several projects, I iteratively implemented an optimizing compiler from a sizeable subset of the C language (which we dubbed Oat) to x86-64 machine code. The compiler was entirely implemented in OCaml and followed the System V AMD64 ABI calling convention. It was written in the following phases.

Phase 1: Implemented an assembler and simulator for a small, idealized subset of the x86-64 platform that serves as the target language for the compiler.

Phase 2: Implemented a non-optimizing compiler for a subset of the LLVM IR language (dubbed LLVMlite) with x86-64 as the target. At this point, the compiler's backend was largely completed.

Phase 3: Implemented a non-optimizing compiler for Oat with LLVMlite as the target. At this point, the compiler's frontend was largely completed and it supported compiling simple Oat programs. [Oatv1 rules]

Phase 4: Implemented new Oat language features such as structs, function pointers, a distinction between possibly null and definitely not null references, and array initializers, and updated the type system to support all of these additions. [Oatv2 rules]

Phase 5: Implemented compiler optimizations at the LLVMlite IR level in the backend. These included dataflow analysis, dead code elimination, constant propagation, and a proper register allocation heuristic instead of placing all variables and intermediate values on the stack. For register allocation, I chose to implement Chaitin's algorithm with conservative node coalescing.
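
To make "conservative coalescing" concrete, the sketch below shows one common safety test, the Briggs criterion. Whether the compiler used this exact criterion is an assumption, and the example is written in Python for brevity rather than OCaml; the function name and the dict-of-sets graph representation are also made up for illustration.

```python
def briggs_can_coalesce(adj, a, b, k):
    """Decide whether move-related nodes a and b may be merged safely.

    adj maps each node of the interference graph to the set of its neighbors;
    k is the number of available registers. Merging is allowed only if the
    combined node would have fewer than k neighbors of significant degree
    (degree >= k), which guarantees the merge cannot make a k-colorable
    graph uncolorable.
    """
    if b in adj[a]:
        return False  # interfering nodes can never share a register
    merged_neighbors = (adj[a] | adj[b]) - {a, b}
    significant = 0
    for n in merged_neighbors:
        degree = len(adj[n])
        # A neighbor of both a and b loses one edge once they are merged.
        if a in adj[n] and b in adj[n]:
            degree -= 1
        if degree >= k:
            significant += 1
    return significant < k
```

The test is conservative in the sense that it may reject merges that would have been safe, trading a few leftover move instructions for the guarantee that coalescing never introduces spills.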

Multithreaded network file server

Academic project

November - December 2023

Implemented an ACID-compliant network file server using C++ and the Boost libraries for regex, thread, and reader-writer lock functionality. The file server can be run on any Unix machine, utilizes BSD sockets for interprocess communication, and has several design considerations for fault tolerance.

CNN forwarding layers optimized w/ Nvidia GPUs

Academic project

November 2023

Given a pretrained convolutional neural network written in CUDA for classifying Fashion-MNIST clothing images into one of several discrete bins, I optimized the provided forwarding kernel (it comprised 98.95% of total execution time) by utilizing several GPU optimization techniques. These included placing constant filter values into the GPU's constant memory (which takes orders of magnitude fewer cycles to access), rewriting memory access patterns so that they were coalesced and minimized memory bank conflicts, unrolling loops, and rewriting the algorithm to leverage the massive parallel compute capability of the Tesla V100 datacenter GPU I worked with. In fact, I was able to parallelize the work for a batch of 10,000 images, the subsequent iterations over all output feature maps, and the iterations over each individual pixel. In summary, although the theoretical number of calculations did not change in any of the kernel invocations, the kernel was rewritten in such a way that the V100 scheduled as much work as possible across its 80 streaming multiprocessors. The final kernel ran in 0.17s across both passes, whereas the original implementation took 13.78s in total (~81x speedup).

Unix virtual memory pager

Academic project

September - October 2023

Implemented a simulator of the pager portion of a Unix operating system, which manages the virtual address spaces of application processes. The pager was written in C++ and implemented system calls like the Unix fork(), which are used to create, copy, and destroy address spaces, allocate more space in existing ones, and switch between address spaces.

Unix thread library

Academic project

September 2023

Implemented a POSIX-like thread library in C++, enabling thread creation, synchronization, and context switching on multicore machines. I managed threads using custom thread control blocks (TCBs) and preemptive scheduling with interrupt safety.
