publications

2026

Endpoint Anticipation for Low-Latency Spoken Dialogue

Sathvik Udupa, Shinji Watanabe, Petr Schwarz, and Jan Černocký

Under review, 2026

spoken dialogue system

Grounding Spoken LLMs in Multi-Speaker Audio via Diarization Conditioning

Anon

Under review, 2026

speech recognition, audio understanding

Learning Critical Articulators via Phoneme Classification Loss

Jesuraj Bandekar, Sathvik Udupa, and Prasanta Kumar Ghosh

Under review, 2026

speech production

ORCA: Open-ended Response Correctness Assessment for Audio Question Answering

Šimon Sedláček, Sara Barahona, Bolaji Yusuf, Laura Herrera-Alarcón, Santosh Kesiraju, Cecilia Bolaños, Alicia Lozano-Diez, Sathvik Udupa, Fernando López, Allison Ferner, Ramani Duraiswami, and Jan Černocký

Under review, 2026

audio understanding, evaluation

arXiv
MMAU-Pro: A challenging and comprehensive benchmark for holistic evaluation of audio general intelligence

Sonal Kumar, Šimon Sedláček, Vaibhavi Lokegaonkar, Fernando López, Wenyi Yu, Nishit Anand, Hyeonggon Ryu, Lichang Chen, Maxim Plička, Miroslav Hlaváček, William Fineas, Sathvik Udupa, Siyuan Hou, Allison Ferner, Sara Barahona, Cecilia Bolaños, Satish Rahi, Laura Herrera-Alarcón, Satvik Dixit, Siddhi Patil, Soham Deshmukh, Lasha Koroshinadze, Yao Liu, Leibny Paola Garcia Perera, Eleni Zanou, Themos Stafylakis, Joon Son Chung, David Harwath, Chao Zhang, Dinesh Manocha, and 4 more authors

AAAI, 2026

audio understanding, dataset

arXiv

2025

Streaming Endpointer for Spoken Dialogue using Neural Audio Codecs and Label-Delayed Training

Sathvik Udupa, Shinji Watanabe, Petr Schwarz, and Jan Černocký

ASRU, 2025

spoken dialogue system

arXiv

Improving Dialect Identification in Indian Languages Using Multimodal Features from Dialect Informed ASR

Saurabh Kumar, Sumit Sharma, Sathvik Udupa, Sandhya Badiger, Abhayjeet Singh, Jesuraja Bandekar, Savitha Murthy, Prasanta Kumar Ghosh, and others

ICASSP, 2025

speech recognition

RESPIN-S1.0: A read speech corpus of 10000+ hours in dialects of nine Indian Languages

Saurabh Kumar, Abhayjeet Singh, Jesuraj Bandekar, Savitha Murthy, Sumit Sharma, Sandhya Badiger, Sathvik Udupa, Amala Nagireddi, Srinivasa Raghavan KM, Rohan Saxena, Jai Nanavati, Janani Sridharan, Arjun Mehta, Ashish S, Sai Mora, Prashanthi Venkataramakrishnan, Gauri Date, Karthika P, and Prasanta Ghosh

NeurIPS, 2025

speech recognition, dataset

LIMMITS’24: Multi-Speaker, Multi-Lingual INDIC TTS With Voice Cloning

Sathvik Udupa, Jesuraja Bandekar, Abhayjeet Singh, Deekshitha G, Saurabh Kumar, Sandhya Badiger, Amala Nagireddi, Roopa R, Prasanta Kumar Ghosh, Hema A. Murthy, Pranaw Kumar, Keiichi Tokuda, Mark Hasegawa-Johnson, and Philipp Olbrich

IEEE Open Journal of Signal Processing, 2025

speech synthesis

2024

Adapter pre-training for improved speech recognition in unseen domains using low resource adapter tuning of self-supervised models

Sathvik Udupa, Jesuraj Bandekar, Saurabh Kumar, Deekshitha G, Sandhya B, Abhayjeet S, Savitha Murthy, Priyanka Pai, Srinivasa Raghavan, Raoul Nanavati, and Prasanta Kumar Ghosh

Interspeech, 2024

speech recognition

IndicMOS: Multilingual MOS Prediction for 7 Indian languages

Sathvik Udupa, Soumi Maiti, and Prasanta Kumar Ghosh

Interspeech, 2024

speech quality estimation

Articulatory synthesis using representations learnt through phonetic label-aware contrastive loss

Jesuraj Bandekar, Sathvik Udupa, and Prasanta Kumar Ghosh

Interspeech, 2024

speech production

Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech

Abhayjeet Singh, Amala Nagireddi, Anjali Jayakumar, Jesuraja Bandekar, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh, Hema A. Murthy, Heiga Zen, Pranaw Kumar, Kamal Kant, Amol Bole, Bira Chandra Singh, Keiichi Tokuda, Mark Hasegawa-Johnson, and Philipp Olbrich

IEEE Open Journal of Signal Processing, 2024

speech synthesis

A machine-learning tool to identify bistable states from calcium imaging data

Aalok Varma, Sathvik Udupa, Mohini Sengupta, Prasanta Kumar Ghosh, and Vatsala Thirumalai

The Journal of Physiology, 2024

neuroscience

2023

Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages

Sathvik Udupa, Jesuraja Bandekar, G Deekshitha, Saurabh Kumar, Prasanta Kumar Ghosh, Sandhya Badiger, Abhayjeet Singh, Savitha Murthy, Priyanka Pai, Srinivasa Raghavan, and Raoul Nanavati

ASRU, 2023

speech recognition

Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models

Sathvik Udupa, C Siddarth, and Prasanta Kumar Ghosh

ICASSP, 2023

speech production

Real-time MRI video synthesis from time aligned phonemes with sequence-to-sequence networks

Sathvik Udupa and Prasanta Kumar Ghosh

ICASSP, 2023

speech production

Exploring a classification approach using quantised articulatory movements for acoustic to articulatory inversion

Jesuraj Bandekar, Sathvik Udupa, and Prasanta Kumar Ghosh

Interspeech, 2023

speech production

Speaking rate attention-based duration prediction for speed control TTS

Jesuraj Bandekar, Sathvik Udupa, Abhayjeet Singh, Anjali Jayakumar, Sandhya Badiger, Saurabh Kumar, Pooja VH, Prasanta Kumar Ghosh, and others

preprint, 2023

speech synthesis

arXiv

2022

Gram Vaani ASR Challenge on spontaneous telephone speech recordings in regional variations of Hindi

Anish Bhanushali, Grant Bridgman, Deekshitha G, Prasanta Ghosh, Pratik Kumar, Saurabh Kumar, Adithya Raj Kolladath, Nithya Ravi, Aaditeshwar Seth, Ashish Seth, Abhayjeet Singh, Vrunda Sukhadia, Umesh S, Sathvik Udupa, and Lodagala V. S. V. Durga Prasad

Interspeech, 2022

speech recognition

Streaming model for Acoustic to Articulatory Inversion with transformer networks

Sathvik Udupa, Aravind Illa, and Prasanta Kumar Ghosh

Interspeech, 2022

speech production

Watch Me Speak: 2D Visualization of Human Mouth during Speech

C Siddarth, Sathvik Udupa, and Prasanta Kumar Ghosh

Interspeech, 2022

speech production

2021

Estimating Articulatory Movements in Speech Production with Transformer Networks

Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, and Prasanta Kumar Ghosh

Interspeech, 2021

speech production

Web Interface for Estimating Articulatory Movements in Speech Production from Acoustics and Text

Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, and Prasanta Kumar Ghosh

Interspeech, 2021

speech production

2020

Voice based classification of patients with Amyotrophic Lateral Sclerosis, Parkinson’s Disease and Healthy Controls with CNN-LSTM using transfer learning

Jhansi Mallela, Aravind Illa, Suhas B. N., Sathvik Udupa, Yamini Belur, Nalini Atchayaram, Ravi Yadav, Pradeep Reddy, Dipanjan Gope, and Prasanta Kumar Ghosh

ICASSP, 2020

speech and health