Sai Kumar Dwivedi

InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

Sai Kumar Dwivedi, Dimitrije Antić, Shashank Tripathi, Omid Taheri, Cordelia Schmid, Michael J. Black, Dimitrios Tzionas

CVPR, 2025

Won the Human Contact Challenge at CVPR 2025

InteractVLM is a novel method to estimate 3D contact points on human bodies and objects from single in-the-wild images, enabling accurate joint human-object reconstruction by leveraging large foundational models.

PICO: Reconstructing 3D People In Contact with Objects

Alpár Cseke*, Shashank Tripathi*, Sai Kumar Dwivedi, Arjun Lakshmipathy, Agniv Chatterjee, Michael J. Black, Dimitrios Tzionas

CVPR, 2025

PICO introduces PICO-db, a dataset of natural images with dense 3D human-object contact annotations, and PICO-fit, an optimization method that uses these annotations to jointly fit 3D body and object meshes to images.

TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

Sai Kumar Dwivedi*, Yu Sun*, Priyanka Patel, Yao Feng, Michael J. Black

CVPR, 2024

Integrated into Meshcapade's commercial solution

TokenHMR addresses the paradox that the 3D accuracy of human pose and shape (HPS) methods declines as their 2D precision increases, by introducing a Threshold-Adaptive Loss Scaling (TALS) loss and reformulating the problem as token prediction.

ChatPose: Chatting about 3D Human Pose

Yao Feng*, Jing Lin*, Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Michael J. Black

CVPR, 2024

ChatPose integrates Large Language Models to comprehend and reason about 3D human poses from images or textual descriptions, leveraging world knowledge and body language understanding to unify pose estimation and generation tasks.

POCO: 3D Pose and Shape Estimation using Confidence

Sai Kumar Dwivedi, Cordelia Schmid, Hongwei Yi, Michael J. Black, Dimitrios Tzionas

3DV, 2024 (Oral)

Featured in RSIP Vision Magazine

POCO is a novel framework that can be applied to common human pose and shape regressors, extending them to also estimate their own confidence in the result without any downside.

Detecting Human-Object Contact in Images

Yixin Chen, Sai Kumar Dwivedi, Michael J. Black, Dimitrios Tzionas

CVPR, 2023

HOT introduces a novel dataset and detector to identify human-object contact in images, enhancing human-centered AI by addressing the absence of reliable detection methods.

Learning to Regress Bodies from Images using Differentiable Semantic Rendering

Sai Kumar Dwivedi, Nikos Athanasiou, Muhammed Kocabas, Michael J. Black

ICCV, 2021

DSR introduces a novel Differentiable Semantic Rendering (DSR) loss that utilizes semantic clothing information to improve 3D human body estimation, surpassing prior state-of-the-art methods.

ProtoGAN: Towards Few Shot Learning for Action Recognition

Sai Kumar Dwivedi, Vikram Gupta, Rahul Mitra, Shuaib Ahmed, Arjun Jain

ICCV Workshops, 2019

ProtoGAN addresses the challenge of few-shot learning for action recognition by synthesizing additional examples for novel categories using class prototype vectors, improving generalization towards novel classes.

Out-Of-Distribution Detection for Generalized Zero-Shot Action Recognition

Devraj Mandal, Sanath Narayan, Sai Kumar Dwivedi, Vikram Gupta, Shuaib Ahmed, Fahad Shahbaz Khan, Ling Shao

CVPR, 2019

Our novel framework for generalized zero-shot action recognition incorporates an out-of-distribution detector to distinguish between seen and unseen action categories, achieving significant improvements over existing methods.

Progression Modelling for Online and Early Gesture Detection

Vikram Gupta, Sai Kumar Dwivedi, Rishabh Dabral, Arjun Jain

3DV, 2019 (Oral)

Our simple yet effective multi-task learning framework addresses online and early gesture detection by modelling gesture progression alongside frame-level recognition.