Harsit Upadhya

M.S. Computer Science · Emory University · Class of 2026

Open to AI/ML, data engineering, and SWE roles.

I build AI systems across data engineering, deep learning, and production software, from sub-second analytics on 385M+ events to biomedical RAG and medical image enhancement. I currently research digital health at the Emory FIT Lab and am looking for teams where rigorous engineering meets real-world impact.

Research Interests

My research explores how large language models and deep learning can be applied to real-world problems in healthcare and science. Current focus areas: retrieval-augmented generation for domain-specific QA, medical image enhancement, graph neural networks, and digital health monitoring using passive technology interaction data.

Retrieval-Augmented Generation · Large Language Models · Medical Imaging · Digital Health · Graph Neural Networks · Data Engineering · Deep Learning

Projects

CodeNexus: AI Code Review Knowledge Graph

New

A topology-aware code review system that fuses deterministic static analysis with specialized LangGraph agents. Instead of dumping raw files into an LLM, CodeNexus builds an AST-driven project graph so agents can reason over imports, call chains, inheritance, security risks, and developer rationale with precise codebase context.

LangGraph · Tree-sitter · NetworkX · Static Analysis · Ruff · Semgrep · Bandit · ESLint

GitHub
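
As a rough illustration of the AST-driven project graph described for CodeNexus, the sketch below extracts import edges from Python source. It is not the project's implementation: it swaps in the standard-library `ast` module and a plain dictionary where the project lists Tree-sitter and NetworkX, and `import_edges` and the `demo` sources are hypothetical names made up for the example.

```python
import ast
from collections import defaultdict


def import_edges(sources):
    """Map each module name to the set of modules it imports.

    `sources` is a hypothetical dict of {module_name: source_text}.
    """
    graph = defaultdict(set)
    for name, text in sources.items():
        tree = ast.parse(text)
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # "import a, b" yields one alias per imported module
                for alias in node.names:
                    graph[name].add(alias.name)
            elif isinstance(node, ast.ImportFrom) and node.module:
                # "from a import b" records the source module "a"
                graph[name].add(node.module)
    return dict(graph)


# Hypothetical three-module project used only for illustration.
demo = {
    "app": "import utils\nfrom models import User\n",
    "models": "import utils\n",
    "utils": "import os\n",
}
edges = import_edges(demo)
```

With edges like these, a review agent could look up which modules depend on a changed file before reading any source text.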

E-Commerce Behavior Analytics Platform

Featured

A high-performance analytics platform that turns 385M+ raw e-commerce events into sub-second business intelligence. Built with a three-tier cloud-native stack (React + FastAPI + PostgreSQL star schema) on Google Cloud.

PostgreSQL 14 · FastAPI · React 18 · Material-UI · Google Cloud SQL · Cloud Run · Docker

Live Demo · GitHub

CodeSelf: Self-Improving Coding Agent

Build Log

A reproducible research framework for studying whether coding agents can improve through execution feedback. CodeSelf converts benchmarks into canonical tasks, runs generated Python in a sandbox, scores correctness-first rewards, and scaffolds GRPO/PPO experiments with held-out statistical evaluation.

12-milestone build log · GRPO + PPO scaffolds · Sandboxed execution

Python · Reinforcement Learning · GRPO · Coding Agents · Sandboxing · Evaluation

GitHub · Blog Series
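
One piece of the GRPO scaffolding can be sketched in a few lines: turning a group of rewards for several samples of the same task into group-relative advantages. This is a minimal illustration, not CodeSelf's code. The function name, the `eps` term, and the use of the population standard deviation are assumptions, and the rewards stand in for a correctness-first score such as the fraction of tests passed.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same task.

    Each advantage is (reward - group mean) / (group std + eps), so samples
    are scored relative to their siblings rather than on an absolute scale.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled solutions: 1.0 = all tests passed.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Passing samples receive positive advantages and failing ones negative, and the `eps` term keeps the division defined when every sample in a group earns the same reward.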

Code Visualizer: Python Runtime Map

Interactive

A browser-based Python execution visualizer that makes runtime behavior spatial. Paste code, run it with Pyodide, and scrub through line-by-line traces of scopes, variables, references, loops, function calls, object mutations, stdout, and errors.

sys.settrace snapshots · Timeline playback · Shareable #cv links

React · TypeScript · Pyodide · Web Workers · SVG · Vitest

Dashboard · GitHub
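
The `sys.settrace` snapshot idea behind the visualizer can be sketched in plain CPython. This is a simplified illustration under stated assumptions, not the project's tracer: `trace_snapshots` and `total` are hypothetical names, and only the traced function's own frame is recorded.

```python
import sys


def trace_snapshots(func, *args):
    """Run func and record (relative line, copy of locals) before each line runs."""
    snapshots = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # skip frames that do not belong to func
        if event == "line":
            relative_line = frame.f_lineno - code.co_firstlineno
            snapshots.append((relative_line, dict(frame.f_locals)))
        return tracer  # keep receiving events for this frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always restore normal execution
    return result, snapshots


def total(n):
    acc = 0
    for i in range(n):
        acc += i
    return acc


result, snapshots = trace_snapshots(total, 3)
```

Each snapshot pairs a line offset with the local variables at that moment, which is the kind of data a timeline scrubber can step through.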

Retcon: Local-First LM Adaptation Lab

Lab

A reproducible lab for domain-adaptive language model experiments. Retcon turns local corpora into config-hashed training and evaluation artifacts, then tracks LoRA runs, checkpoint evaluations, contamination checks, forgetting signals, provenance, and reports.

LoRA + partial unfreeze · Forgetting detection · Provenance manifests

Python · Hugging Face · PEFT / LoRA · Accelerate · Typer · Pydantic · Streamlit

GitHub
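
Config-hashed artifacts of the kind Retcon describes can be illustrated with a short sketch: serialize the config canonically, then hash it, so identical settings always map to the same artifact ID. The function name, the 12-character truncation, and the example config keys are assumptions for illustration, not Retcon's actual scheme.

```python
import hashlib
import json


def config_hash(config, length=12):
    """Return a short, stable hash of a JSON-serializable config dict."""
    # Sorting keys and fixing separators makes the serialization canonical,
    # so key order in the source dict does not change the hash.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


# Hypothetical run configs used only for illustration.
run_a = {"base_model": "example-model", "lora_rank": 8, "lr": 2e-4}
run_b = {"lr": 2e-4, "lora_rank": 8, "base_model": "example-model"}
run_c = {**run_a, "lora_rank": 16}

same_settings = config_hash(run_a) == config_hash(run_b)
changed_rank = config_hash(run_a) != config_hash(run_c)
```

Here `run_a` and `run_b` differ only in key order and hash identically, while changing the LoRA rank in `run_c` produces a different ID.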

Medical Image Enhancement (Pix2Pix)

Extended Pix2Pix with self-attention to improve chest X-ray quality on NIH ChestX-ray14. Built a synthetic degradation pipeline to generate paired training data.

PSNR 39.97 dB · SSIM 0.9755

PyTorch · Pix2Pix / cGAN · U-Net · PatchGAN · Self-Attention

Report
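
For readers unfamiliar with the PSNR metric reported for the Pix2Pix project, here is a minimal sketch of how it is computed from mean squared error. It assumes 8-bit pixel intensities (peak value 255) and flat lists of pixels; the function name and sample values are made up for illustration and are not taken from the project's evaluation code.

```python
import math


def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(enhanced):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, enhanced)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical four-pixel images: two pixels differ by one intensity level.
reference = [52, 55, 61, 59]
enhanced = [53, 55, 60, 59]
score = psnr(reference, enhanced)
```

Higher values mean the enhanced image is closer to the reference. SSIM, the other reported metric, compares local structure rather than per-pixel error and is not sketched here.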

nlTGCR: Second-Order Optimizer

A second-order optimizer using the Fisher Information Matrix that beats Adam and RMSProp on MLPs. Applied Nyström approximation and Kronecker-factored preconditioning with JAX-style JIT compilation.

17× faster per epoch · +3.2% accuracy vs Adam

PyTorch · Fisher Information Matrix · Nyström Approx. · K-FAC · JIT Compilation · CIFAR-10

Report

PEGASUS Paper Summarizer

Abstractive summarization pipeline for arXiv papers using google/pegasus-pubmed. Trained on 1,000 papers with beam search decoding. Published model on Hugging Face.

ROUGE-1: 0.377 · ROUGE-2: 0.126 · ROUGE-L: 0.219

PEGASUS · PyTorch Lightning · Hugging Face · A100 / CUDA · ROUGE

Report · Model Card
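
To make the ROUGE-1 score reported for the summarizer concrete, the sketch below computes a simplified unigram-overlap F1 between a reference and a candidate summary. It is an illustration only: standard ROUGE tooling applies its own tokenization and optional stemming, and whether the reported scores are F1, precision, or recall is an assumption here. The function name and sample sentences are hypothetical.

```python
from collections import Counter


def rouge1_f1(reference, candidate):
    """Simplified ROUGE-1 F1 using lowercase whitespace tokens."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per shared token.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and generated summaries.
reference = "the model summarizes long papers"
candidate = "the model summarizes papers well"
score = rouge1_f1(reference, candidate)
```

ROUGE-2 applies the same idea to bigrams, and ROUGE-L replaces token overlap with the longest common subsequence.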

TasteMatch: AI Dietitian Chatbot

LLM-powered personal dietitian for diabetes management. Generates personalized meal recommendations with glycemic index verification and portion size calculations against clinical guidelines.

Ollama · FastAPI · LLMs · RAG · Diabetes Care

Demo

GNN Document Classification

Document relationship modeling on CORA using Graph Neural Networks. Compared GCN, GAT, and GraphSAGE architectures for citation prediction and clustering.

PyTorch Geometric · GCN · GAT · GraphSAGE · CORA Dataset

Code

RAG-BioQA

Retrieval-augmented biomedical QA on PubMedQA. Dense retrieval with BioBERT + FAISS, re-ranking with BM25/ColBERT/MonoT5, and a LoRA fine-tuned T5 generator.

BioBERT · FAISS · T5 + LoRA · ColBERT · MonoT5 · PubMedQA

Details

Publications

Harsit Upadhya, Upadhyay, A.

XNLI 2.0: Improving XNLI Dataset and Performance on Cross-Lingual Understanding

IEEE 8th I2CT Conference · 2023

PDF · arXiv

Experience

Jan 2026 – Present

Graduate Teaching Assistant

Emory University · Atlanta, GA

Instruct 40 undergraduates in Data Science 100, covering R tidyverse, data cleaning, visualization, and EDA

Jan 2025 – Present

Graduate Research Assistant

Emory FIT Lab · Atlanta, GA

Built automated pipeline to extract and analyze Amazon Alexa voice interaction logs using Python/Selenium
Identified technology engagement patterns indicating functional decline in older adults for digital health monitoring

Oct 2024 – Dec 2025

Proctor

Emory University, Office of Undergraduate Education

Supervised 50+ exams while enforcing academic integrity protocols and supporting 300+ students during testing sessions

May 2025 – Oct 2025

VP, International Student Affairs

Graduate Student Government Association (GSGA)

Selected executive board member; served as primary liaison for 500+ international graduate students

Feb 2025 – Aug 2025

International Student Liaison

Emory University, Laney Graduate School

Mentored international graduate students through academic and cultural transitions, strengthening peer support networks and community-building initiatives
Collaborated with university staff to surface student challenges and improve orientation programming and campus integration

Get in Touch

Open to collaborations in AI/ML, digital health, NLP, and research-backed product work.

[email protected]

You can also reach me via GitHub or LinkedIn.