Harsit Upadhya

M.S. Computer Science · Emory University · Class of 2026

Open to AI/ML, data engineering, and SWE roles.

I build AI systems across data engineering, deep learning, and production software, from sub-second analytics on 385M events to biomedical RAG and medical image enhancement. I currently research digital health at the Emory FIT Lab, and I am looking for teams where rigorous engineering meets real-world impact.

01

Research Interests

My research explores how large language models and deep learning can be applied to real-world problems in healthcare and science. Current focus areas: retrieval-augmented generation for domain-specific QA, medical image enhancement, graph neural networks, and digital health monitoring using passive technology interaction data.

Retrieval-Augmented Generation · Large Language Models · Medical Imaging · Digital Health · Graph Neural Networks · Data Engineering · Deep Learning

02

Projects

CodeSelf: Self-Improving Coding Agent

New

A reproducible research framework for studying whether coding agents can improve through execution feedback. CodeSelf converts benchmarks into canonical tasks, runs generated Python in a sandbox, scores correctness-first rewards, and scaffolds GRPO/PPO experiments with held-out statistical evaluation.

12-milestone build log · GRPO + PPO scaffolds · Sandboxed execution
Python · Reinforcement Learning · GRPO · Coding Agents · Sandboxing · Evaluation
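
For illustration, the execute-then-score loop described above might look like the sketch below. It is not CodeSelf's actual code: it assumes a child interpreter with a timeout stands in for the sandbox, and that the reward is simply the fraction of test cases passed. The names `run_candidate` and `correctness_reward` are hypothetical.

```python
import subprocess
import sys
from typing import Optional


def run_candidate(code: str, stdin_text: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run candidate code in a separate interpreter; return stdout, or None on failure.

    Assumption: a child process with a timeout is only a stand-in for a sandbox.
    Real isolation would also need resource limits and filesystem/network controls.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I runs Python in isolated mode
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout if proc.returncode == 0 else None


def correctness_reward(code: str, cases: list[tuple[str, str]]) -> float:
    """Fraction of (stdin, expected stdout) cases the candidate passes."""
    if not cases:
        return 0.0
    passed = 0
    for stdin_text, expected in cases:
        out = run_candidate(code, stdin_text)
        if out is not None and out.strip() == expected.strip():
            passed += 1
    return passed / len(cases)
```

A crashing or timed-out candidate scores zero on that case, so correctness dominates any secondary signal a fuller reward function might add.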

Code Visualizer: Python Runtime Map

New

A browser-based Python execution visualizer that makes runtime behavior spatial. Paste code, run it with Pyodide, and scrub through line-by-line traces of scopes, variables, references, loops, function calls, object mutations, stdout, and errors.

sys.settrace snapshots · Timeline playback · Shareable #cv links
React · TypeScript · Pyodide · Web Workers · SVG · Vitest
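
As a rough sketch of how `sys.settrace` can produce line-by-line snapshots like those described above, the example below records the line number, function name, and local variables before each line runs. The function `trace_snapshots` and the snapshot layout are assumptions for illustration, not the visualizer's actual format.

```python
import sys


def trace_snapshots(source: str) -> list[dict]:
    """Execute source and record one snapshot per line event."""
    snapshots = []
    filename = "<user_code>"  # hypothetical label used to recognise user frames

    def tracer(frame, event, arg):
        if frame.f_code.co_filename != filename:
            return None  # do not trace frames outside the user's code
        if event == "line":
            snapshots.append({
                "line": frame.f_lineno,
                "function": frame.f_code.co_name,
                # repr() keeps the snapshot serialisable; dunder names are hidden
                "locals": {
                    name: repr(value)
                    for name, value in frame.f_locals.items()
                    if not name.startswith("__")
                },
            })
        return tracer  # keep tracing inside this frame

    compiled = compile(source, filename, "exec")
    sys.settrace(tracer)
    try:
        exec(compiled, {})
    finally:
        sys.settrace(None)  # always remove the trace hook
    return snapshots
```

Each snapshot is taken just before its line executes, so a timeline scrubber can show the state a line starts from, and the next snapshot shows that line's effect.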

Retcon: Local-First LM Adaptation Lab

New

A reproducible lab for domain-adaptive language model experiments. Retcon turns local corpora into config-hashed training and evaluation artifacts, then tracks LoRA runs, checkpoint evaluations, contamination checks, forgetting signals, provenance, and reports.

LoRA + partial unfreeze · Forgetting detection · Provenance manifests
Python · Hugging Face · PEFT / LoRA · Accelerate · Typer · Pydantic · Streamlit
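
One common way to get config-hashed artifacts is to hash a canonical serialisation of the run configuration and use the digest as a directory name. The sketch below assumes JSON-serialisable configs and SHA-256. The helper names `config_hash` and `artifact_dir` are hypothetical and not taken from Retcon.

```python
import hashlib
import json


def config_hash(config: dict, length: int = 12) -> str:
    """Short, stable hash of a JSON-serialisable config."""
    # sort_keys and fixed separators make the encoding independent of key order
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def artifact_dir(root: str, config: dict) -> str:
    """Directory that identifies every artifact produced under this config."""
    return f"{root}/{config_hash(config)}"
```

Two runs with the same settings resolve to the same directory, while any changed hyperparameter yields a new one, which keeps training and evaluation outputs traceable to the config that produced them.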

Medical Image Enhancement (Pix2Pix)

Extended Pix2Pix with self-attention to improve chest X-ray quality on NIH ChestX-ray14. Built a synthetic degradation pipeline to generate paired training data.

PSNR 39.97 dB · SSIM 0.9755
PyTorch · Pix2Pix / cGAN · U-Net · PatchGAN · Self-Attention
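
To make the paired-data idea and the PSNR metric concrete, here is a minimal pure-Python sketch. It assumes additive Gaussian noise as the degradation (the degradations actually used in the project are not stated here) and treats an image as a flat list of 8-bit intensities. `degrade` and `psnr` are illustrative names.

```python
import math
import random


def degrade(pixels, noise_std=10.0, seed=0):
    """Create a degraded copy of an image by adding clipped Gaussian noise."""
    rng = random.Random(seed)  # seeded so the (clean, degraded) pair is reproducible
    return [min(255.0, max(0.0, p + rng.gauss(0.0, noise_std))) for p in pixels]


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A clean image and its degraded copy form one training pair; the enhancement model is then judged by how far it raises PSNR relative to the clean reference.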

nlTGCR: Second-Order Optimizer

A second-order optimizer using the Fisher Information Matrix that beats Adam and RMSProp on MLPs. Applied Nyström approximation and Kronecker-factored preconditioning with JAX-style JIT compilation.

17× faster per epoch · +3.2% accuracy vs Adam
PyTorch · Fisher Information Matrix · Nyström Approx. · K-FAC · JIT Compilation · CIFAR-10

PEGASUS Paper Summarizer

Abstractive summarization pipeline for arXiv papers using google/pegasus-pubmed. Trained on 1,000 papers; summaries are decoded with beam search. Published the model on Hugging Face.

ROUGE-1: 0.377 · ROUGE-2: 0.126 · ROUGE-L: 0.219
PEGASUS · PyTorch Lightning · Hugging Face · A100 / CUDA · ROUGE
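
For readers unfamiliar with the metric, ROUGE-N is an F1 score over overlapping n-grams between a reference and a generated summary. The sketch below is a simplified version that assumes lowercased whitespace tokenisation and no stemming, so its numbers will not exactly match standard ROUGE tooling. `rouge_n_f1` is an illustrative name.

```python
from collections import Counter


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 between a reference and a candidate summary."""

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For the reference "the cat sat on the mat" and the candidate "the cat", unigram precision is 1 and recall is 1/3, so the ROUGE-1 F1 comes out at about 0.5.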

TasteMatch: AI Dietitian Chatbot

LLM-powered personal dietitian for diabetes management. Generates personalized meal recommendations with glycemic index verification and portion size calculations against clinical guidelines.

Ollama · FastAPI · LLMs · RAG · Diabetes Care

GNN Document Classification

Document relationship modeling on CORA using Graph Neural Networks. Compared GCN, GAT, and GraphSAGE architectures for citation prediction and clustering.

PyTorch Geometric · GCN · GAT · GraphSAGE · CORA Dataset

RAG-BioQA

Retrieval-augmented biomedical QA on PubMedQA. Dense retrieval with BioBERT + FAISS, re-ranking with BM25, ColBERT, or MonoT5, and a LoRA-fine-tuned T5 generator.

BioBERT · FAISS · T5 + LoRA · ColBERT · MonoT5 · PubMedQA
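
The re-ranking stage can be illustrated with the simplest of the three options, BM25. The sketch below is a self-contained approximation rather than the project's code: it assumes whitespace tokenisation, the common defaults k1 = 1.5 and b = 0.75, and collection statistics computed over the retrieved candidates only. `bm25_rerank` is a hypothetical name.

```python
import math
from collections import Counter


def bm25_rerank(query, passages, k1=1.5, b=0.75):
    """Return (score, passage) pairs sorted from most to least relevant."""
    if not passages:
        return []
    docs = [p.lower().split() for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0  # guard against empty docs
    doc_freq = Counter(term for d in docs for term in set(d))

    scored = []
    for passage, tokens in zip(passages, docs):
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # idf with +1 inside the log so it never goes negative
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, passage))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In a pipeline like the one described, the passages would be the top hits from dense retrieval, and the re-ranked order decides which ones reach the generator's context.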

View all projects & research →

03

Publications

Harsit Upadhya, Upadhyay, A.

XNLI 2.0: Improving XNLI Dataset and Performance on Cross-Lingual Understanding

IEEE 8th I2CT Conference · 2023

View all publications →

04

Experience

Jan 2025 – Present

Graduate Research Assistant

Emory FIT Lab · Atlanta, GA

  • Built automated pipeline to extract and analyze Amazon Alexa voice interaction logs using Python/Selenium
  • Identified technology engagement patterns indicating functional decline in older adults for digital health monitoring

Jan 2026 – Present

Graduate Teaching Assistant

Emory University · Atlanta, GA

  • Instruct 40 undergraduates in Data Science 100, covering R tidyverse, data cleaning, visualization, and EDA

May 2025 – Oct 2025

VP, International Student Affairs

Graduate Student Government Association (GSGA)

  • Selected executive board member; served as primary liaison for 500+ international graduate students

Feb 2025 – Aug 2025

International Student Liaison

Emory University, Laney Graduate School

  • Mentored international graduate students through academic and cultural transitions, strengthening peer support networks and community-building initiatives
  • Collaborated with university staff to surface student challenges and improve orientation programming and campus integration

Oct 2024 – Dec 2025

Proctor

Emory University, Office of Undergraduate Education

  • Supervised 50+ exams while enforcing academic integrity protocols and supporting 300+ students during testing sessions

05

Get in Touch

Open to collaborations in AI/ML, digital health, NLP, and research-backed product work.

Reach me via GitHub or LinkedIn.