Harsh Raj

I am Harsh Raj, an M.S. student in Computer Science at Northeastern University, Boston, and I am currently a Machine Learning Research Intern at Scale AI in NYC.

My research centers on language agents, language model evaluation, and software engineering, with the goal of building systems that are not just capable but also reliable and reproducible. Over the past few years I have been fortunate to grow as a researcher with the support of generous collaborators and mentors.

Most recently, I led the MixtureVitae project, developing state-of-the-art pretraining corpora for LLMs. Our work produced the first permissible dataset to achieve results comparable to, or even surpassing, those of non-permissible sources (paper).

In addition, I am a core author of Terminal-Bench and Harbor, and I collaborate with Professor Ludwig Schmidt’s lab.

News and Timeline

2026

2025

2024

2023

2022