Pranay Gupta

I am an incoming PhD student at the Robotics Institute at Carnegie Mellon University. Previously, I was an MSR student at the Robotics Institute at Carnegie Mellon University, co-advised by Prof. Henny Admoni and Prof. David Held. Before that, I was a PreDoc Apprentice at TCS Research, working under Ms. Ramya Hebbalaguppe and Dr. Rahul Narain. I did my undergraduate studies in computer science at IIIT Hyderabad, where I worked as a research assistant at CVIT under the guidance of Prof. Ravi Kiran Sarvadevabhatla.

At CMU, I have been working on developing an intelligent driver alert system that alerts a driver to important objects in the scene that the driver is unaware of. At TCS, I worked on the problem of single-view 3D reconstruction. At CVIT, I worked on problems related to skeleton-based action recognition, as well as zero-shot and generalised zero-shot skeleton action recognition. I spent the summer of 2020 working as an Applied Scientist intern at Amazon India, where I worked on semantic text similarity using BERT-based Siamese networks. I have also worked under the supervision of Dr. Manish Gupta on knowledge-aware video question answering.

Email  /  CV  /  Google Scholar  /  Twitter  /  Github


I am currently working on developing methods for real-time situational awareness (SA) estimation of a driver using their eye gaze. Prior to this, I worked on the problem of estimating an object's importance for making a safe driving decision. My research interests revolve around human-robot interaction and human nonverbal behaviour understanding. I am also interested in exploring multimodal learning.

An Interactive Protocol to Measure a Driver’s Situational Awareness
Abhijat Biswas*, Pranay Gupta*, David Held, Henny Admoni
7th International Workshop on Virtual, Augmented, and Mixed-Reality for Human-Robot Interactions (VAM-HRI)

Commonly used protocols for capturing the ground-truth situational awareness (SA) of drivers involve halting a simulation and querying the driver. SA data collected in this way is unsuitable for training models for predicting real-time SA, since it is inherently intermittent and does not capture transitions of SA (e.g. from not aware to aware). We introduce an efficient VR-based interactive protocol designed to capture a driver's ground-truth SA in real time. Our protocol mitigates the aforementioned limitations of prior approaches and allows capturing continuous object-level SA labels that are more suitable for downstream real-time SA prediction tasks. Our initial findings highlight its potential as a scalable solution for curating large-scale driving datasets with ground-truth SA.

Leveraging Vision and Language Models for Zero-Shot Personalization of Household Multi-Object Rearrangement Tasks
Benjamin A. Newman, Pranay Gupta, Yonatan Bisk, Kris Kitani, Henny Admoni, Chris Paxton
HRI '24 Workshop on Human – Large Language Model Interaction

Robots should adhere to personal preferences when performing household tasks. Many household tasks can be posed as multi-object rearrangement tasks, but solutions to these problems often target a single, hand-defined solution or are trained to match a solution drawn from a distribution of human-demonstrated data. In this work, we consider using an internet-scale pre-trained vision-and-language foundation model as the backbone of a robot policy for producing personalized task plans to solve household multi-object rearrangement tasks. We present initial results on a one-step table-setting task that show a proof of concept for this method.

Object Importance Estimation using Counterfactual Reasoning for Intelligent Driving
Pranay Gupta, Abhijat Biswas, Henny Admoni, David Held
Project Page / Code & Dataset / arXiv

The ability to identify important objects in a complex and dynamic driving environment is essential for autonomous driving agents to make safe and efficient driving decisions. It also helps assistive driving systems decide when to alert drivers. We tackle object importance estimation in a data-driven fashion and introduce HOIST: Human-annotated Object Importance in Simulated Traffic. HOIST contains driving scenarios with human-annotated importance labels for vehicles and pedestrians. We additionally propose a novel approach that relies on counterfactual reasoning to estimate an object's importance. We generate counterfactual scenarios by modifying the motion of objects and ascribe importance based on how the modifications affect the ego vehicle's driving. Our approach outperforms strong baselines on the task of object importance estimation on HOIST. We also perform ablation studies to justify our design choices and show the significance of the different components of our proposed approach.

NewsKVQA: Knowledge-Aware News Video Question Answering
Pranay Gupta, Manish Gupta
PAKDD, 2022
Dataset / arXiv

In this paper, we explore knowledge-based question answering in the context of news videos. To this end, we curate a new dataset with over 1M multiple-choice question-answer pairs. Using this dataset, we propose a novel approach, NewsKVQA (Knowledge-Aware News Video Question Answering), which performs multi-modal inferencing over textual multiple-choice questions, videos, their transcripts, and a knowledge base.

Quo Vadis, Skeleton action recognition?
Pranay Gupta, Anirudh Thatipelli, Aditya Aggarwal, Shubh Maheshwari, Neel Trivedi,
Sourav Das, Ravi Kiran Sarvadevabhatla
IJCV, Special Issue on Human pose, Motion, Activities and Shape in 3D, 2021
project page / code / arXiv

In this paper, we study current and upcoming frontiers across the landscape of skeleton-based human action recognition. We introduce Skeletics-152, a large-scale in-the-wild skeleton action dataset. We extend our analysis to out-of-context actions by introducing the Skeleton-Mimetics dataset. Finally, we introduce Metaphorics, a dataset with caption-style annotated YouTube videos of the popular social game Dumb Charades and interpretative dance performances. We benchmark state-of-the-art models on the NTU-120 dataset and provide a multi-layered assessment of the results.

Syntactically Guided Generative Embeddings for Zero-Shot Skeleton Action Recognition
Pranay Gupta, Divyanshu Sharma, Ravi Kiran Sarvadevabhatla
ICIP, 2021
project page / code / arXiv

In this paper, we study the effect of learning part-of-speech-aware generative embeddings for zero-shot and generalised zero-shot skeleton action recognition.

Teaching Assistant, Computer Vision course, Spring 2020

Design and source code from Jon Barron's website