Pranay Gupta

I am a PreDoc Apprentice at TCS Research, working under Ms. Ramya Hebbalaguppe and Dr. Rahul Narain. I did my undergraduate studies in computer science from IIIT Hyderabad, where I worked as a research assistant at CVIT under the guidance of Dr. S Ravi Kiran .

At TCS, I have been working on the problem of 3-D single view reconstruction. At CVIT, I have worked on problems related to skeleton based action recognition and zero shot and generalised zero shot skeleton action recognition. I spent the summer of 2020 working as an Applied Scientist intern at Amazon India, where I worked on semantic text similarity using Bert based siamese networks. I also work under the supervision of Dr. Manish Gupta on Knowledge aware video question answering.

Email  /  CV  /  Google Scholar  /  Twitter  /  Github

profile photo

My research interests revolve around video analysis, human action understanding, and 3-D computer vision. I am also interested in exploring multimodal learning, focusing on the interplay of vision and language.

News KVQA:Knowledge-Aware News Video Question Answering
Pranay Gupta, Gupta, Manish
PAKDD, 2022
code(coming soon) / arXiv

In this paper, we explore knowledge-based question answering in the context of news videos. To this end, we curate a new dataset with over 1M multiple-choice question-answer pairs. Using this dataset, we propose a novel approach, NEWSKVQA (Knowledge-Aware News Video Question Answering) which performs multi-modal inferencing over textual multiple-choice questions, videos, their transcripts and knowledge base

Quo Vadis, Skeleton action recognition?
Pranay Gupta, Anirudh Thatipelli, Aditya Aggarwal, Shubh Maheshwari, Neel Trivedi,
Sourav Das, Sarvadevabhatla, Ravi Kiran
IJCV, Special Issue on Human pose, Motion, Activities and Shape in 3D, 2021
project page / code / arXiv

In this paper, we study current and upcoming frontiers across the landscape of skeleton-based human action recognition. We introduce skeletics-152, a large scale into-the-wild skeleton action dataset. We extend out analysis to out of context actions by introducing Skelton-Mimetics dataset. Finally we introduce Metaphorics, a dataset with caption-style annotated YouTube videos of the popular social game Dumb Charades and interpretative dance performances. We benchmark state-of-the-art models on the NTU-120 dataset and provide multi-layered assessment of the results.

Syntactically Guided Generative Embeddings for Zero-Shot Skeleton Action Recognition
Pranay Gupta, Divyanshu Sharma, Sarvadevabhatla, Ravi Kiran
ICIP, 2021
project page / code / arXiv

In this paper, we study the effect of learning Part of Speech aware generative embeddings for zero shot and generalised zero shot skelton action recognition.

Teaching Assistant, Computer Vision course, Spring 2020

Design and source code from Jon Barron's website