🔬 Research & Technical Projects
Open Recommendation Benchmark for Reproducible Research with Hidden Tests (ORBIT)
Research Assistant — Carnegie Mellon University, Pittsburgh, PA
February 2025 – June 2025
- Designed and developed ORBIT, a unified sequential recommendation benchmark enabling fair and reproducible model comparison with hidden test evaluation.
- Built ClueWeb-Reco, a high-quality, privacy-preserving web recommendation dataset constructed by mapping private U.S. browsing sequences to public webpages via large-scale dense semantic retrieval.
- Standardized data splits and evaluation protocols to improve cross-paper comparability in recommendation research.
Efficient Dense Retrieval with Boundary-Aware Cluster Routing (BACR)
Research Assistant — Carnegie Mellon University, Pittsburgh, PA
September 2023 – May 2024
- Proposed BACR, a cluster-routing model for fine-grained search over cluster-based Approximate Nearest Neighbor (ANN) indices.
- Improved ANN recall by 8.7% over a FAISS IVFFlatIP baseline on MS MARCO Web Search.
- Extended DiskANN with multi-node index construction to support large-scale retrieval over hybrid memory–SSD storage.
Mixture-of-Experts Neurocomputational Network Modeling the Fusiform Face Area
Undergraduate Research Assistant — The Cottrell Lab, UC San Diego
January 2023 – June 2023
- Trained word, object, and face expert networks to study subordinate-level image classification in the Fusiform Face Area.
- Designed architectures with ResNet-18, custom CNNs, and gating layers; achieved 99.4% expert gating accuracy.
- Optimized OpenCV-based data-loading pipelines, reducing per-epoch loading time from 1 hour to 13 minutes.
Question Answering System with Retrieval-Augmented Generation
Carnegie Mellon University
October 2023 – December 2023
- Developed a rule-based polar (yes/no) question detector to guide Mistral-7B in answering both yes/no and factual questions (team of four).
- Fine-tuned T5-base for question generation on Wikipedia datasets.
- Integrated Sentence-T5 dense retrieval to supply supporting passages for answer generation.
Transformer-Based User Intent Classification (Amazon MASSIVE Dataset)
University of California San Diego
December 2022
- Fine-tuned BERT-base-uncased on 60 intent classes, achieving 87.32% sentence-level accuracy (team of five).
- Applied UMAP to analyze representation clustering under supervised contrastive, cross-entropy, and contrastive learning objectives.
Image Captioning with CNN–LSTM Encoder–Decoder Architecture
University of California San Diego
November 2022
- Implemented encoder–decoder captioning models pairing custom CNN and ResNet-50 encoders with an LSTM decoder on MSCOCO (team of three).
- Achieved a 58.04% BLEU-1 score overall, with the best individual captions scoring 86.53–100% BLEU-4.
ZooSeeker — Android Navigation Application
University of California San Diego
March 2022 – June 2022
- Co-developed an Android application in Android Studio for navigating the San Diego Zoo.
- Applied Agile methodology for iterative feature development and usability improvement.