I'm a Researcher. An Engineer.

Mudit was born in Varanasi, India, and has since lived in various places. He pursued his undergraduate studies in Information Technology at the Delhi College of Engineering and earned his Ph.D. in Computer Science at Arizona State University, advised by Dr. Subbarao Kambhampati. He currently works as a Research Scientist at Google Gemini/Bard.

His research interests are agentic LLMs (LLM planning and reasoning) and improving RLHF/PbRL methods for sequential decision making, with a short stint in human-aware AI methods (like trust & teaming).

In his leisure time, Mudit enjoys sharing his insights on research papers through his YouTube channel, "Papers & Chill" (currently quite inactive), and likes to play chess. You can find him on lichess as "WhatsappOnly" and "pawnTakesPawnTakes". For resources on becoming a better scientist, MLE, and candidate, see his notes.

Education

Arizona State University

Ph.D. in Computer Science
(2019-2024)
CGPA : 4.0/4.0

Delhi Technological University

B.Tech in Information Technology
(2015-2019)
Gold Medalist
CGPA : 9.51/10.0

Ramjas School Pusa Road

Alumnus (2008-2015)
12th : 96.4%
10th : 10 CGPA

Experience

Sept. 2024 - Present
Mountain View, CA, USA

Research Scientist, Google LLC

Gemini/Bard group.

Internships

Summer 2023
Cupertino, USA

Machine Learning Research Intern, Apple Inc.

Research work with Machine Learning Research (MLR) Group. Advised by Rin Metcalf Susa and Barry Theobald. Hindsight PRIORs for Reward Learning from Human Preferences. (ICLR 2024)

Summer 2022
Cupertino, USA

Machine Learning Research Intern, Apple Inc.

Preference-based Reinforcement Learning research with the Machine Learning Research (MLR) group. Advised by Rin Metcalf Susa and Barry Theobald. Symbol Guided Hindsight Priors for Reward Learning from Human Preferences at IROS RLCONFORM, NeurIPS HILL 2022.

Summer 2021
Santa Clara, USA

Deep Learning Software Engineering Intern, Intel Corporation

• First analysis of float32 ResNet50 architecture on Intel IceLake (ICX) machines. Advised by Wei Wang.
• Proposed several optimizations (in parallel computing), such as shared processes, to achieve BFloat16 performance (as benchmarked on CooperLake machines) on an ICX cluster.
• Additionally, first to provide the Best Known Method (an automated way) for working with ResNet50 on the Intel Endeavour Cluster.
• In parallel, first to work with Quantized ResNet models to show the discrepancy in saliency-based explanations between the original RN50 and the Quantized RN50.

Summer 2018
Bangalore, India

Software Engineering Intern, Samsung Semiconductor India Research

• Created a DRAM Bank Simulator (400 times faster) with enhanced Fault Classes. Advised by Atishay Kumar.
• Novel Approach to Redundancy Analysis Algorithms through State Space Reduction schemes & Beating RA through Monte Carlo Tree Search and Residual Networks.
• Awarded Best Intern Project at SSIR.

Summer 2017
Bangalore, India

Software Engineering Intern, Samsung Semiconductor India Research

• Diagnosed issues with SSDs & Implemented SSD Simulator for Read/Write/Garbage Collection. Advised by Sandeep Sammatshetti.
• Created an LSTM based Algorithm - Stream Selection for Smart Data Categorization (STRASDAC) to reduce write-wearing in SSDs and in turn further improve Garbage Collection.
• Reached Best Intern Project Finals at SSIR.

Research

2024

Guidance Priors to Reduce Human Feedback Burden in Sequential Decision Making

Mudit Verma

PhD Defense
Committee : Dr. Subbarao Kambhampati (Chair/Advisor), Dr. Dimitri Bertsekas, Dr. Siddharth Srivastava, Dr. Yu Zhang
Video

Hindsight PRIORs for Reward Learning from Human Preferences

Mudit Verma, Katherine Metcalf

ICLR 2024
Paper Poster

Theory of Mind abilities of Large Language Models in Human-Robot Interaction : An Illusion?

Mudit Verma*, Siddhant Bhambri*, Subbarao Kambhampati

HRI 2024

Invited Talk : AGI Leap Summit 2024
Paper

LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks

Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Saldyt, Anil Murthy

ICML 2024
Paper

On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models

Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati

Preprint, 2024
Paper

Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning

Mudit Verma*, Atharva Gundawar*, Lin Guan, Karthik Valmeekam, Siddhant Bhambri, Subbarao Kambhampati

Preprint, 2024
Paper

2023

Trust-Aware Planning: Modeling Trust Evolution in Iterated Human-Robot Interaction.

Zahra Zahedi, Mudit Verma, Sarath Sreedharan, Subbarao Kambhampati

Human Robot Interaction (HRI)
Paper Poster

Methods and Mechanisms for Interactive Novelty Handling in Adversarial Environments.

Tung Thai, Mudit Verma, Utkarsh Soni, Gopalakrishnan S., Shen M., Garg M., Kalani A., Vaidya N., Kambhampati S., Varshney N., Baral C., Sinapov J., Scheutz M.

AAMAS Extended Abstract
Paper

Preference Proxies: Evaluating Large Language Models in capturing Human Preferences in Human-AI Tasks

Mudit Verma*, Siddhant Bhambri*, Subbarao Kambhampati

Theory of Mind Workshop, Many Facets of Preference Learning (Oral) Workshop at ICML 2023.
Paper Poster

Exploiting Action Distances for Reward Learning from Human Preferences.

Mudit Verma*, Siddhant Bhambri*, Subbarao Kambhampati

In Many Facets of Preference Learning Workshop at ICML 2023.
Paper Slides

Data Driven Reward Initialization for Preference based Reinforcement Learning

Mudit Verma, Subbarao Kambhampati

In AAAI R2HCAI 2023.
Paper Slides

Exploiting Unlabeled Data for Feedback Efficient Human Preference based Reinforcement Learning.

Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati

In AAAI R2HCAI 2023.
Paper Slides

2022

Symbol Guided Hindsight Priors for Reward Learning from Human Preferences

Mudit Verma and Katherine Metcalf

NeurIPS HILL 2022

IROS RLCONFORM 2022
Paper

Advice Conformance Verification by Reinforcement Learning agents for Human-in-the-Loop

Mudit Verma, Ayush Kharkwal, Subbarao Kambhampati

NeurIPS HILL 2022

IROS RLCONFORM 2022
Paper Video

Towards customizable reinforcement learning agents: Enabling preference specification through online vocabulary expansion

Utkarsh Soni, Sarath Sreedharan, Mudit Verma, Subbarao Kambhampati

NeurIPS HILL 2022
Paper

Computing Policies That Account for the Effects of Human Uncertainty During Execution in Markov Decision Processes

Sriram Gopalakrishnan, Mudit Verma, Subbarao Kambhampati

ICAPS Workshop on Explainable AI Planning (XAIP) 2022
Paper

Bridging the Gap: Providing Post-Hoc Symbolic Explanations for Sequential Decision-Making Problems with Inscrutable Representations

Sarath Sreedharan, Utkarsh Soni, Mudit Verma, Siddharth Srivastava and Subbarao Kambhampati

ICLR 2022
Paper Video

Symbols as a Lingua Franca for Bridging Human-AI Chasm for Explainable and Advisable AI Systems.

Subbarao Kambhampati, Sarath Sreedharan, Mudit Verma, Yantian Zha, Lin Guan

In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) Blue Sky Track
Paper

Modeling the interplay between human trust and monitoring

Zahra Zahedi, Sarath Sreedharan, Mudit Verma and Subbarao Kambhampati

HRI 2022 (Late breaking paper)
Paper

2021

Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation.

Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati

In Advances in Neural Information Processing Systems. (NeurIPS) Spotlight
Paper

Trust-Aware Planning: Modeling Trust Evolution in Longitudinal Human-Robot Interaction

Zahra Zahedi, Mudit Verma, Sarath Sreedharan, Subbarao Kambhampati

In ICAPS 2021 Workshop on Explainable AI Planning, Also in ICAPS 2021 Workshop on Planning and Robotics
Paper

Synthesizing Policies That Account For Human Execution Errors Caused By State Aliasing In Markov Decision Processes

Sriram Gopalakrishnan, Mudit Verma, Subbarao Kambhampati

In ICAPS 2021 Workshop on Explainable AI Planning.
Paper

2020

Bridging the Gap: Providing Post-Hoc Symbolic Explanations for Sequential Decision-Making Problems with Black Box Simulators

Sarath Sreedharan, Utkarsh Soni, Mudit Verma, Siddharth Srivastava, Subbarao Kambhampati

ICML Workshop on Human in the Loop Learning (HILL)
Paper Poster

Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning

Lin Guan*, Mudit Verma*, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati

NeurIPS Deep Reinforcement Learning Workshop (DRL)
NeurIPS Workshop on Human And Model in the Loop Evaluation and Training Strategies (HAMLETS)
Paper

Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning

Lin Guan*, Mudit Verma*, Subbarao Kambhampati

ICML Workshop on Human in the Loop Learning (HILL)
Paper Poster

Fine-grained Language Identification with MultilingualCapsNetModel.

Mudit Verma, Arun Balaji Buduru

IEEE International Conference on Multimedia Big Data (BigMM)
Paper Slides

2019

A Novel Framework for Neural Architecture Search in the Hill Climbing Domain.

Mudit Verma, Pradyumna Sinha, Karan Goyal, Apoorva Verma, Seba Susan

IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)
Paper

Making Smart Homes Smarter: Optimizing Energy Consumption with Human in the Loop

Mudit Verma, Siddhant Bhambri, Saurabh Gupta, Arun Balaji Buduru
Paper

Teaching & Service

Teaching Assistant, CSE 471 - Introduction To Artificial Intelligence, ASU (Fall 2019) by Dr. Subbarao Kambhampati

Reviewer, ICAPS XAIP-2022, ICAPS XAIP-2021

PC Member, ICLR-2024, 2023

PC Member/Reviewer, ICML-2024, 2023, 2022

PC Member, IJCAI-2024

PC Member/Reviewer, NeurIPS-2023, NeurIPS-2022, NeurIPS GenPlan 2023

PC Member/Reviewer, AAAI-2023, AAAI-2022

Awards

ASU SCAI Doctoral Fellowship, ASU 2024

ASU SCAI Doctoral Fellowship, ASU 2023

Engineering Graduate Fellowship, ASU 2022

ASU University Graduate/Doctoral Fellowship, ASU 2019

DTU Merit Department Rank Scholarship BTech, DTU, 2019, 2018, 2017, 2016

4th, Hack In The North (IIIT Allahabad), 2018

Selected for Education Innovation Mentorship Programme, ReadAlliance, 2018

Department Topper for 6 consecutive semesters, DTU, 2018

1st, READing Hackathon (USAID), 2017

Pramod Jain Scholarship, best student at DTU, 2017

Top 15, World Food India Hackathon, 2017

Award for Exemplary Contribution, Computer Society of India-DTU Chapter, 2017

Interest Development Group Head, CSI-DTU Chapter, 2017

46th Rank at HackerEarth MLChallenge-1, 2017

Top 10, Synergy DTU-Hack, DTU, 2017

Projects & Other Stuff

Technical Report, Perfect Observability is a Myth: Restraining Bolts in the Real World. Spring 2021. Paper

Technical Report, Implementation and Analysis of Recommender Systems. Spring 2021. Paper

Technical Report, Diverging Emerging Field of Multi-Task Reinforcement Learning. Spring 2020. Paper

Technical Report, Colors of the Desert: used D3 to highlight that deserts are indeed colorful. Spring 2020. Paper

Technical Report, Randomly Wired Networks are on the rise, have we been creating wrong Networks all along? Fall 2019. Paper Slides Code

Shut The Fake Up: App/website combining wisdom of the majority and AI for fake news detection.

Text Summarization: Human-like summarization using Pointer Generator Networks.

StressOut: App that uses machine learning to check one's stress levels and suggest better work timings to bring relief.

CookHub: Open-source community for recipes where one can chat, push, pull, fork, collaborate, and view trending recipes and contributors.

Tutoring All Children (TAC): App that adapts and teaches children/adults (especially those with dyslexia) to read, write, and recognize using ML techniques.