Mudit was born in Varanasi, India, and has since lived in various places. He pursued his undergraduate studies in Information Technology at the Delhi College of
Engineering and earned his Ph.D. in Computer Science at Arizona State University, advised by Dr. Subbarao Kambhampati. He is currently working as a Research Scientist at Google Gemini/Bard.
His research interests are agentic LLMs (LLM planning and reasoning) and improving RLHF/PbRL methods for sequential decision making, with a short stint in human-aware AI methods (like trust & teaming).
In his leisure time, Mudit enjoys sharing his insights on research papers through his YouTube channel, "Papers & Chill" (currently quite inactive), and likes to play chess. You can find him on lichess as "WhatsappOnly" and "pawnTakesPawnTakes".
Some resources for becoming a better scientist, MLE, and candidate: see my notes.
Sept. 2024 - Present
Mountain View, CA, USA
Research Scientist, Google LLC
Gemini/Bard group.
Summer 2023
Cupertino, USA
Machine Learning Research Intern, Apple Inc.
Research work with the Machine Learning Research (MLR) group. Advised by Rin Metcalf Susa and Barry Theobald. Hindsight PRIORs for Reward Learning from Human Preferences (ICLR 2024).
Summer 2022
Cupertino, USA
Machine Learning Research Intern, Apple Inc.
Preference-based Reinforcement Learning research with the Machine Learning Research (MLR) group. Advised by Rin Metcalf Susa and Barry Theobald. Symbol Guided Hindsight Priors for Reward Learning from Human Preferences at IROS RLCONFORM and NeurIPS HILL 2022.
Summer 2021
Santa Clara, USA
Deep Learning Software Engineering Intern, Intel Corporation
• First analysis of float32 ResNet50 architecture on Intel IceLake (ICX) machines. Advised by Wei Wang.
• Proposed several optimizations (in parallel computing), like shared processes, to achieve BFloat16 performance (as benchmarked on CooperLake machines) on an ICX cluster.
• Additionally, first to provide the Best Known Method (an automated way) for working with ResNet50 on the Intel Endeavour Cluster.
• In parallel, first to work with Quantized ResNet models to show the discrepancy in saliency-based explanations between the original RN50 and Quantized RN50.
Summer 2018
Bangalore, India
Software Engineering Intern, Samsung Semiconductor India Research
• Created a DRAM Bank Simulator (400 times faster) with enhanced Fault Classes. Advised by Atishay Kumar.
• Novel approach to Redundancy Analysis (RA) algorithms through State Space Reduction schemes & beating RA through Monte Carlo Tree Search and Residual Networks.
• Awarded Best Intern Project at SSIR.
Summer 2017
Bangalore, India
Software Engineering Intern, Samsung Semiconductor India Research
• Diagnosed issues with SSDs and implemented an SSD Simulator for Read/Write/Garbage Collection. Advised by Sandeep Sammatshetti.
• Created an LSTM-based algorithm, Stream Selection for Smart Data Categorization (STRASDAC), to reduce write-wearing in SSDs and in turn further improve Garbage Collection.
• Reached Best Intern Project Finals at SSIR.
2024
Guidance Priors to Reduce Human Feedback Burden in Sequential Decision Making
Mudit Verma
PhD Defense
Committee: Dr. Subbarao Kambhampati (Chair/Advisor), Dr. Dimitri Bertsekas, Dr. Siddharth Srivastava, Dr. Yu Zhang
Hindsight PRIORs for Reward Learning from Human Preferences
Mudit Verma, Katherine Metcalf
ICLR 2024
Theory of Mind abilities of Large Language Models in Human-Robot Interaction : An Illusion?
Mudit Verma*, Siddhant Bhambri*, Subbarao Kambhampati
HRI 2024
Invited Talk : AGI Leap Summit 2024
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Saldyt, Anil Murthy
ICML 2024
On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models
Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati
Preprint, 2024
Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning
Mudit Verma*, Atharva Gundawar*, Lin Guan, Karthik Valmeekam, Siddhant Bhambri, Subbarao Kambhampati
Preprint, 2024
2023
Trust-Aware Planning: Modeling Trust Evolution in Iterated Human-Robot Interaction.
Zahra Zahedi, Mudit Verma, Sarath Sreedharan, Subbarao Kambhampati
Human Robot Interaction (HRI)
Methods and Mechanisms for Interactive Novelty Handling in Adversarial Environments.
Tung Thai, Mudit Verma, Utkarsh Soni, Gopalakrishnan S., Shen M., Garg M., Kalani A., Vaidya N., Kambhampati S., Varshney N., Baral C., Sinapov J., Scheutz M.
AAMAS Extended Abstract
Preference Proxies: Evaluating Large Language Models in capturing Human Preferences in Human-AI Tasks
Mudit Verma*, Siddhant Bhambri*, Subbarao Kambhampati
Theory of Mind Workshop, Many Facets of Preference Learning (Oral) Workshop at ICML 2023.
Exploiting Action Distances for Reward Learning from Human Preferences.
Mudit Verma*, Siddhant Bhambri*, Subbarao Kambhampati
In Many Facets of Preference Learning Workshop at ICML 2023.
Data Driven Reward Initialization for Preference based Reinforcement Learning
Mudit Verma, Subbarao Kambhampati
In AAAI R2HCAI 2023.
Exploiting Unlabeled Data for Feedback Efficient Human Preference based Reinforcement Learning.
Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati
In AAAI R2HCAI 2023.
2022
Symbol Guided Hindsight Priors for Reward Learning from Human Preferences
Mudit Verma and Katherine Metcalf
NeurIPS HILL 2022
IROS RLCONFORM 2022
Advice Conformance Verification by Reinforcement Learning agents for Human-in-the-Loop
Mudit Verma, Ayush Kharkwal, Subbarao Kambhampati
NeurIPS HILL 2022
IROS RLCONFORM 2022
Towards customizable reinforcement learning agents: Enabling preference specification through online vocabulary expansion
Utkarsh Soni, Sarath Sreedharan, Mudit Verma, Subbarao Kambhampati
NeurIPS HILL 2022
Computing Policies That Account for the Effects of Human Uncertainty During Execution in Markov Decision Processes
Sriram Gopalakrishnan, Mudit Verma, Subbarao Kambhampati
ICAPS Workshop on Explainable AI Planning (XAIP) 2022
Bridging the Gap: Providing Post-Hoc Symbolic Explanations for Sequential Decision-Making Problems with Inscrutable Representations
Sarath Sreedharan, Utkarsh Soni, Mudit Verma, Siddharth Srivastava and Subbarao Kambhampati
ICLR 2022
Symbols as a Lingua Franca for Bridging Human-AI Chasm for Explainable and Advisable AI Systems.
Subbarao Kambhampati, Sarath Sreedharan, Mudit Verma, Yantian Zha, Lin Guan
In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) Blue Sky Track
Modeling the interplay between human trust and monitoring
Zahra Zahedi, Sarath Sreedharan, Mudit Verma and Subbarao Kambhampati
HRI 2022 (Late breaking paper)
2021
Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation.
Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati
In Advances in Neural Information Processing Systems (NeurIPS), Spotlight
Trust-Aware Planning: Modeling Trust Evolution in Longitudinal Human-Robot Interaction
Zahra Zahedi, Mudit Verma, Sarath Sreedharan, Subbarao Kambhampati
In ICAPS 2021 Workshop on Explainable AI Planning, Also in ICAPS 2021 Workshop on Planning and Robotics
Synthesizing Policies That Account For Human Execution Errors Caused By State Aliasing In Markov Decision Processes
Sriram Gopalakrishnan, Mudit Verma, Subbarao Kambhampati
In ICAPS 2021 Workshop on Explainable AI Planning.
2020
Bridging the Gap: Providing Post-Hoc Symbolic Explanations for Sequential Decision-Making Problems with Black Box Simulators
Sarath Sreedharan, Utkarsh Soni, Mudit Verma, Siddharth Srivastava, Subbarao Kambhampati
ICML Workshop on Human in the Loop Learning (HILL)
Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning
Lin Guan*, Mudit Verma*, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati
NeurIPS Deep Reinforcement Learning Workshop (DRL)
NeurIPS Workshop on Human And Model in the Loop Evaluation and Training Strategies (HAMLETS)
Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning
Lin Guan*, Mudit Verma*, Subbarao Kambhampati
ICML Workshop on Human in the Loop Learning (HILL)
Fine-grained Language Identification with Multilingual CapsNet Model.
Mudit Verma, Arun Balaji Buduru
IEEE International Conference on Multimedia Big Data (BigMM)
2019
A Novel Framework for Neural Architecture Search in the Hill Climbing Domain.
Mudit Verma, Pradyumna Sinha, Karan Goyal, Apoorva Verma, Seba Susan
IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)
Making Smart Homes Smarter: Optimizing Energy Consumption with Human in the Loop
Mudit Verma, Siddhant Bhambri, Saurabh Gupta, Arun Balaji Buduru
Teaching Assistant, CSE 471 - Introduction To Artificial Intelligence, ASU (Fall 2019) by Dr. Subbarao Kambhampati
Reviewer, ICAPS XAIP-2022, ICAPS XAIP-2021
PC Member, ICLR-2024, 2023
PC Member/Reviewer, ICML-2024, 2023, 2022
PC Member, IJCAI-2024
PC Member/Reviewer, NeurIPS-2023, NeurIPS-2022, NeurIPS GenPlan 2023
PC Member/Reviewer, AAAI-2023, AAAI-2022
ASU SCAI Doctoral Fellowship, ASU 2024
ASU SCAI Doctoral Fellowship, ASU 2023
Engineering Graduate Fellowship, ASU 2022
ASU University Graduate/Doctoral Fellowship, ASU 2019
DTU Merit Department Rank Scholarship BTech, DTU, 2019, 2018, 2017, 2016
4th, Hack In The North (IIIT Allahabad), 2018
Selected for Education Innovation Mentorship Programme, ReadAlliance, 2018
Department Topper for 6 consecutive semesters, DTU, 2018
1st READing Hackathon (USAID), 2017
Pramod Jain Scholarship, best student at DTU, 2017
Top 15, World Food India Hackathon, 2017
Award for Exemplary Contribution, Computer Society of India-DTU Chapter, 2017
Interest Development Group Head, CSI-DTU Chapter, 2017
46th Rank at HackerEarth MLChallenge-1, 2017
Top 10 Synergy DTU-Hack, DTU, 2017
Technical Report, Perfect Observability is a Myth: Restraining Bolts in the Real World. Spring 2021
Technical Report, Implementation and Analysis of Recommender Systems. Spring 2021
Technical Report, Diverging Emerging Field of Multi-Task Reinforcement Learning. Spring 2020
Technical Report, Colors of the Desert: used D3 to highlight that deserts are indeed colorful. Spring 2020
Technical Report, Randomly Wired Networks are on the rise, have we been creating wrong Networks all along? Fall 2019
Shut The Fake Up: App/Website, Wisdom of Majority & AI for Fake News detection.
Text Summarization: Human-like summarization using Pointer Generator Networks.
StressOut: App to check one's stress levels and suggest better work timings to bring relief through Machine Learning.
CookHub: Open Source Community for Recipes where one can chat, push, pull, fork, collaborate & view trending recipes and contributors.
Tutoring All Children (TAC): App that adapts and teaches children/adults (especially dyslexic) to read/write/recognize using ML Techniques.