I'm a Researcher. An Engineer.

Mudit was born in Varanasi, India, and has since lived in various places. He pursued his undergraduate studies in Information Technology at the Delhi College of Engineering and earned his Ph.D. in Computer Science at Arizona State University, advised by Dr. Subbarao Kambhampati. He is currently working as a Research Scientist at Google Gemini/Bard.

His research interests are agentic LLMs (LLM planning and reasoning) and improving RLHF/PbRL methods for sequential decision making, with a short stint in human-aware AI methods (like trust & teaming).

In his leisure time, Mudit enjoys sharing his insights on research papers through his YouTube channel, "Papers & Chill" (currently quite inactive), and playing chess. You can find him on lichess as "WhatsappOnly" and "pawnTakesPawnTakes". For some resources on becoming a better scientist, MLE, and candidate: see my notes.

Education

Arizona State University

Ph.D. in Computer Science
(2019-2024)
CGPA : 4.0/4.0

Delhi Technological University

B.Tech in Information Technology
(2015-2019)
Gold Medalist
CGPA : 9.51/10.0

Ramjas School Pusa Road

Alumnus (2008-2015)
12th : 96.4%
10th : 10 CGPA

Experience


Sept. 2024 - Present
Mountain View, CA, USA

Research Scientist, Google LLC

Gemini/Bard group.

Internships


Summer 2023
Cupertino, USA

Machine Learning Research Intern, Apple Inc.

Research work with Machine Learning Research (MLR) Group. Advised by Rin Metcalf Susa and Barry Theobald. Hindsight PRIORs for Reward Learning from Human Preferences. (ICLR 2024)


Summer 2022
Cupertino, USA

Machine Learning Research Intern, Apple Inc.

Preference-based Reinforcement Learning research with the Machine Learning Research (MLR) group. Advised by Rin Metcalf Susa and Barry Theobald. Symbol Guided Hindsight Priors for Reward Learning from Human Preferences at IROS RLCONFORM, NeurIPS HILL 2022.



Summer 2021
Santa Clara, USA

Deep Learning Software Engineering Intern, Intel Corporation

• First analysis of float32 ResNet50 architecture on Intel IceLake (ICX) machines. Advised by Wei Wang.
• Proposed several optimizations (in parallel computing), such as shared processes, to achieve BFloat16 performance (as benchmarked on CooperLake machines) on an ICX cluster.
• Additionally, first to provide the Best Known Method (an automated way) for working with ResNet50 on the Intel Endeavour Cluster.
• In parallel, first to work with Quantized ResNet models to show discrepancy in saliency-based explanations between the original RN50 and the Quantized RN50.

Summer 2018
Bangalore, India 

Software Engineering Intern, Samsung Semiconductor India Research

• Created a DRAM Bank Simulator (400 times faster) with enhanced Fault Classes. Advised by Atishay Kumar.
• Novel approach to Redundancy Analysis algorithms through State Space Reduction schemes & beating RA through Monte Carlo Tree Search and Residual Networks.
• Awarded Best Intern Project at SSIR.

Summer 2017
Bangalore, India 

Software Engineering Intern, Samsung Semiconductor India Research

• Diagnosed issues with SSDs & Implemented SSD Simulator for Read/Write/Garbage Collection. Advised by Sandeep Sammatshetti.
• Created an LSTM based Algorithm - Stream Selection for Smart Data Categorization (STRASDAC) to reduce write-wearing in SSDs and in turn further improve Garbage Collection.
• Reached Best Intern Project Finals at SSIR.

Research

2024

  • Guidance Priors to Reduce Human Feedback Burden in Sequential Decision Making

    Mudit Verma

    PhD Defense
    Committee : Dr. Subbarao Kambhampati (Chair/Advisor), Dr. Dimitri Bertsekas, Dr. Siddharth Srivastava, Dr. Yu Zhang

    Video
  • Hindsight PRIORs for Reward Learning from Human Preferences

    Mudit Verma, Katherine Metcalf

    ICLR 2024

    Paper Poster
  • Theory of Mind abilities of Large Language Models in Human-Robot Interaction : An Illusion?

    Mudit Verma*, Siddhant Bhambri*, Subbarao Kambhampati

    HRI 2024

    Invited Talk : AGI Leap Summit 2024

    Paper
  • LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks

    Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Saldyt, Anil Murthy

    ICML 2024

    Paper
  • On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models

    Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati

    Preprint, 2024

    Paper
  • Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning

    Mudit Verma*, Atharva Gundawar*, Lin Guan, Karthik Valmeekam, Siddhant Bhambri, Subbarao Kambhampati

    Preprint, 2024

    Paper

2023

  • Trust-Aware Planning: Modeling Trust Evolution in Iterated Human-Robot Interaction.

    Zahra Zahedi, Mudit Verma, Sarath Sreedharan, Subbarao Kambhampati

    Human Robot Interaction (HRI)

    Paper Poster
  • Methods and Mechanisms for Interactive Novelty Handling in Adversarial Environments.

    Tung Thai, Mudit Verma, Utkarsh Soni, Gopalakrishnan S., Shen M., Garg M., Kalani A., Vaidya N., Kambhampati S., Varshney N., Baral C., Sinapov J., Scheutz M.

    AAMAS Extended Abstract

    Paper
  • Preference Proxies: Evaluating Large Language Models in capturing Human Preferences in Human-AI Tasks

    Mudit Verma*, Siddhant Bhambri*, Subbarao Kambhampati

    Theory of Mind Workshop, Many Facets of Preference Learning (Oral) Workshop at ICML 2023.

    Paper Poster
  • Exploiting Action Distances for Reward Learning from Human Preferences.

    Mudit Verma*, Siddhant Bhambri*, Subbarao Kambhampati

    In Many Facets of Preference Learning Workshop at ICML 2023.

    Paper Slides
  • Data Driven Reward Initialization for Preference based Reinforcement Learning

    Mudit Verma, Subbarao Kambhampati

    In AAAI R2HCAI 2023.

    Paper Slides
  • Exploiting Unlabeled Data for Feedback Efficient Human Preference based Reinforcement Learning.

    Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati

    In AAAI R2HCAI 2023.

    Paper Slides

2022

  • Symbol Guided Hindsight Priors for Reward Learning from Human Preferences

    Mudit Verma and Katherine Metcalf

    NeurIPS HILL 2022

    IROS RLCONFORM 2022

    Paper
  • Advice Conformance Verification by Reinforcement Learning agents for Human-in-the-Loop

    Mudit Verma, Ayush Kharkwal, Subbarao Kambhampati

    NeurIPS HILL 2022

    IROS RLCONFORM 2022

    Paper Video
  • Towards customizable reinforcement learning agents: Enabling preference specification through online vocabulary expansion

    Utkarsh Soni, Sarath Sreedharan, Mudit Verma, Subbarao Kambhampati

    NeurIPS HILL 2022

    Paper
  • Computing Policies That Account for the Effects of Human Uncertainty During Execution in Markov Decision Processes

    Sriram Gopalakrishnan, Mudit Verma, Subbarao Kambhampati

    ICAPS Workshop on Explainable AI Planning (XAIP) 2022

    Paper
  • Bridging the Gap: Providing Post-Hoc Symbolic Explanations for Sequential Decision-Making Problems with Inscrutable Representations

    Sarath Sreedharan, Utkarsh Soni, Mudit Verma, Siddharth Srivastava and Subbarao Kambhampati

    ICLR 2022

    Paper Video
  • Symbols as a Lingua Franca for Bridging Human-AI Chasm for Explainable and Advisable AI Systems.

    Subbarao Kambhampati, Sarath Sreedharan, Mudit Verma, Yantian Zha, Lin Guan

    In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) Blue Sky Track

    Paper
  • Modeling the interplay between human trust and monitoring

    Zahra Zahedi, Sarath Sreedharan, Mudit Verma and Subbarao Kambhampati

    HRI 2022 (Late breaking paper)

    Paper

2021

  • Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation.

    Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati

    In Advances in Neural Information Processing Systems. (NeurIPS) Spotlight

    Paper
  • Trust-Aware Planning: Modeling Trust Evolution in Longitudinal Human-Robot Interaction

    Zahra Zahedi, Mudit Verma, Sarath Sreedharan, Subbarao Kambhampati

    In ICAPS 2021 Workshop on Explainable AI Planning, Also in ICAPS 2021 Workshop on Planning and Robotics

    Paper
  • Synthesizing Policies That Account For Human Execution Errors Caused By State Aliasing In Markov Decision Processes

    Sriram Gopalakrishnan, Mudit Verma, Subbarao Kambhampati

    In ICAPS 2021 Workshop on Explainable AI Planning.

    Paper

2020

  • Bridging the Gap: Providing Post-Hoc Symbolic Explanations for Sequential Decision-Making Problems with Black Box Simulators

    Sarath Sreedharan, Utkarsh Soni, Mudit Verma, Siddharth Srivastava, Subbarao Kambhampati

    ICML Workshop on Human in the Loop Learning (HILL)

    Paper Poster
  • Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning

    Lin Guan*, Mudit Verma*, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati

    NeurIPS Deep Reinforcement Learning Workshop (DRL)
    NeurIPS Workshop on Human And Model in the Loop Evaluation and Training Strategies (HAMLETS)

    Paper
  • Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning

    Lin Guan*, Mudit Verma*, Subbarao Kambhampati

    ICML Workshop on Human in the Loop Learning (HILL)

    Paper Poster
  • Fine-grained Language Identification with Multilingual CapsNet Model.

    Mudit Verma, Arun Balaji Buduru

    IEEE International Conference on Multimedia Big Data (BigMM)

    Paper Slides

2019

  • A Novel Framework for Neural Architecture Search in the Hill Climbing Domain.

    Mudit Verma, Pradyumna Sinha, Karan Goyal, Apoorva Verma, Seba Susan

    IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)

    Paper
  • Making Smart Homes Smarter: Optimizing Energy Consumption with Human in the Loop

    Mudit Verma, Siddhant Bhambri, Saurabh Gupta, Arun Balaji Buduru

    Paper

Teaching & Service

  • Teaching Assistant, CSE 471 - Introduction To Artificial Intelligence, ASU (Fall 2019) by Dr. Subbarao Kambhampati

  • Reviewer, ICAPS XAIP-2022, ICAPS XAIP-2021

  • PC Member, ICLR-2024, 2023

  • PC Member/Reviewer, ICML-2024, 2023, 2022

  • PC Member, IJCAI-2024

  • PC Member/Reviewer, NeurIPS-2023, NeurIPS-2022, NeurIPS GenPlan 2023

  • PC Member/Reviewer, AAAI-2023, AAAI-2022

Awards

  • ASU SCAI Doctoral Fellowship, ASU 2024

  • ASU SCAI Doctoral Fellowship, ASU 2023

  • Engineering Graduate Fellowship, ASU 2022

  • ASU University Graduate/Doctoral Fellowship, ASU 2019

  • DTU Merit Department Rank Scholarship BTech, DTU, 2019, 2018, 2017, 2016

  • 4th, Hack In The North (IIIT Allahabad), 2018

  • Selected for Education Innovation Mentorship Programme, ReadAlliance, 2018

  • Department Topper for 6 consecutive semesters, DTU, 2018

  • 1st, READing Hackathon (USAID), 2017

  • Pramod Jain Scholarship, best student at DTU, 2017

  • Top 15, World Food India Hackathon, 2017

  • Award for Exemplary Contribution, Computer Society of India-DTU Chapter, 2017

  • Interest Development Group Head, CSI-DTU Chapter, 2017

  • 46th Rank at HackerEarth MLChallenge-1, 2017

  • Top 10, Synergy DTU-Hack, DTU, 2017

Projects & Other Stuff

  • Technical Report, Perfect Observability is a Myth: Restraining Bolts in the Real World. Spring 2021 Paper

  • Technical Report, Implementation and Analysis of Recommender Systems. Spring 2021 Paper

  • Technical Report, Diverging Emerging Field of Multi-Task Reinforcement Learning. Spring 2020 Paper

  • Technical Report, Colors of the Desert. Used D3 to highlight that deserts are indeed colorful. Spring 2020 Paper

  • Technical Report, Randomly Wired Networks are on the rise, have we been creating wrong Networks all along? Fall 2019 Paper Slides Code

  • Shut The Fake Up: App/Website. Wisdom of Majority & AI for Fake News detection.

  • Text Summarization: Human-like summarization using Pointer Generator Networks.

  • StressOut: App to check one’s stress levels and suggest better work timings to bring relief through Machine Learning.

  • CookHub: Open Source Community for Recipes where one can chat, push, pull, fork, collaborate & view trending recipes and contributors.

  • Tutoring All Children (TAC): App that adapts and teaches children/adults (especially dyslexic) to read/write/recognize using ML Techniques.