I am Harsh Raj, an Applied Scientist at VijilAI, where my focus is making AI agents trustworthy.
Broadly, I am interested in making language models safe, useful, and controllable. Over the past few years, I have taken my first steps as a researcher, thanks to some wonderful people and collaborations.
Most recently, I have been working with folks from CAIS (particularly Dom) on mitigating fine-tuning attacks on LLMs and the reward hacking they can induce. At VijilAI, I am working with Leif on building the largest database of red-teaming prompts.
Before that, I worked with Subho, Dom, and Vipul on evaluating and improving the consistency of language models.
I was fortunate to collaborate with Yash and Laura via the MLC community on quantifying how robustness transfers from pretraining to downstream tasks, through the lens of computer vision.
I did my bachelor's thesis with Anil S. Parihar on Vision and Language Navigation (VLN), and we were fortunate to secure a top-3 position on the popular VLN challenge R2R.
During my undergrad, I also spent a summer internship as a researcher at Thoucentric with Manu, where I built a deep learning framework for tabular data.
News and Timeline
2024
- September: Published my first LessWrong blog post on interpreting the effects of jailbreaking in LLMs.
- June: Released the preprint of our work on Reverse Preference Attack, led by Domenic.
- January: Joined VijilAI as an Applied Scientist.
- January: Our work on Vision and Language Navigation ranked 3rd on the R2R leaderboard. Team Name: MLR_Lab_DTU.
2023
- December: Presented our work on robustness transfer at NeurIPS 2023 in New Orleans.
- May: Our work on robustness transfer accepted to NeurIPS 2023. Led by Laura and mentored by Yash.
2022
- November: Our work on consistency evaluation won the Best Paper Award, with a $5,000 cash prize.
- April: Two papers accepted to NeurIPS 2022 workshops: one on consistency evaluation and another on evaluating the robustness of biomedical concept normalization.