i_am_a_fox

I am Harsh Raj, currently pursuing a Master of Science in Computer Science at Northeastern University, Boston. I am also the Co-Founder of the open-source organization Ontocord AI and a Researcher at the AI Risk and Vulnerability Alliance (ARVA). Previously, I worked as an Applied Scientist at VijilAI, where I focused on making AI agents trustworthy.

I am passionate about making language models safe, useful, and controllable. Over the past few years, I have taken my first steps as a researcher, thanks to some wonderful collaborators and mentors.

I am currently working with David Bau on understanding reasoning models through mechanistic interpretability. Most recently, I co-led the Preventing Adversarial Reward Optimization project (in collaboration with Dom) at the AI Safety Camp. During my time as an Applied Scientist at VijilAI, I worked alongside Leif to develop a database of red-teaming prompts.

Previously, I collaborated with Subho, Dom, and Vipul on evaluating and improving the consistency of language models.

Through the MLC community, I was fortunate to work with Yash and Laura on quantifying the robustness transfer from pretraining to downstream tasks.
