i_am_a_fox

I am Harsh Raj, currently pursuing a Master of Science in Computer Science at Northeastern University, Boston. I am also the Co-Founder of the open-source organization Ontocord AI and a Researcher at the AI Risk and Vulnerability Alliance (ARVA). Previously, I worked as an Applied Scientist at VijilAI, where I focused on making AI agents trustworthy.

I am passionate about making language models safe, useful, and controllable. Over the past few years, I have taken my first steps as a researcher, thanks to some wonderful collaborators and mentors.

I am currently working with David Bau on understanding reasoning models through mechanistic interpretability. Most recently, I co-led the Preventing Adversarial Reward Optimization project (in collaboration with Dom) at the AI Safety Camp. During my time as an Applied Scientist at VijilAI, I worked alongside Leif to develop a database of red-teaming prompts.

Previously, I collaborated with Subho, Dom, and Vipul on evaluating and improving the consistency of language models.

Through the MLC community, I was fortunate to work with Yash and Laura on quantifying the robustness transfer from pretraining to downstream tasks.
