🔬 Research & Technical Projects

Research Assistant — Carnegie Mellon University, Pittsburgh, PA
February 2025 – June 2025

Designed and developed ORBIT, a unified sequential recommendation benchmark enabling fair and reproducible model comparison with hidden test evaluation.
Built ClueWeb-Reco, a high-quality, privacy-preserving web recommendation dataset by mapping private U.S. browsing sequences to public webpages via large-scale dense semantic retrieval.
Standardized data splits and evaluation protocols to improve cross-paper comparability in recommendation research.

Research Assistant — Carnegie Mellon University, Pittsburgh, PA
September 2023 – May 2024

Proposed BACR, a cluster-routing model for fine-grained search over cluster-based Approximate Nearest Neighbor (ANN) indices.
Improved ANN recall by 8.7% over FAISS IVFFlatIP on MS MARCO Web Search.
Extended DiskANN with multi-node index construction for memory–SSD hybrid large-scale retrieval.

Undergraduate Research Assistant — The Cottrell Lab, UC San Diego
January 2023 – June 2023

Trained word, object, and face expert networks to study subordinate-level image classification in the Fusiform Face Area.
Designed architectures with ResNet-18, custom CNNs, and gating layers; achieved 99.4% expert gating accuracy.
Optimized OpenCV data pipelines, reducing per-epoch data loading time from 1 hour to 13 minutes.

Carnegie Mellon University
October 2023 – December 2023

Developed a rule-based polar (yes/no) question detector to guide Mistral-7B in answering both binary and factual questions (team of four).
Fine-tuned T5-base for question generation on Wikipedia datasets.
Integrated Sentence-T5 retrieval for retrieval-augmented answer generation.

University of California San Diego
December 2022

Fine-tuned BERT-base-uncased for intent classification across 60 classes, achieving 87.32% sentence-level accuracy (team of five).
Applied UMAP to analyze representation clustering under supervised contrastive, cross-entropy, and contrastive learning objectives.

University of California San Diego
November 2022

Implemented encoder–decoder models pairing custom CNN and ResNet-50 encoders with LSTM decoders for MS COCO caption generation (team of three).
Achieved 58.04% BLEU-1 overall, with BLEU-4 of 86.53–100% on the best captions.

University of California San Diego
March 2022 – June 2022

Co-developed an Android application for San Diego Zoo navigation using Android Studio.
Applied Agile methodology for iterative feature development and usability improvement.