publications

2026

  1. ORCA: Open-ended Response Correctness Assessment for Audio Question Answering
    Šimon Sedláček, Sara Barahona, Bolaji Yusuf, Laura Herrera-Alarcón, Santosh Kesiraju, Cecilia Bolaños, Alicia Lozano-Diez, Sathvik Udupa, Fernando López, Allison Ferner, Ramani Duraiswami, and Jan Černocký
    Under submission, 2026
    audio understanding evaluation
  2. MMAU-Pro: A challenging and comprehensive benchmark for holistic evaluation of audio general intelligence
    Sonal Kumar, Šimon Sedláček, Vaibhavi Lokegaonkar, Fernando López, Wenyi Yu, Nishit Anand, Hyeonggon Ryu, Lichang Chen, Maxim Plička, Miroslav Hlaváček, William Fineas, Sathvik Udupa, Siyuan Hou, Allison Ferner, Sara Barahona, Cecilia Bolaños, Satish Rahi, Laura Herrera-Alarcón, Satvik Dixit, Siddhi Patil, Soham Deshmukh, Lasha Koroshinadze, Yao Liu, Leibny Paola Garcia Perera, Eleni Zanou, Themos Stafylakis, Joon Son Chung, David Harwath, Chao Zhang, Dinesh Manocha, and 4 more authors
    AAAI, 2026
    audio understanding dataset

2025

  1. Streaming Endpointer for Spoken Dialogue using Neural Audio Codecs and Label-Delayed Training
    Sathvik Udupa, Shinji Watanabe, Petr Schwarz, and Jan Černocký
    ASRU, 2025
    spoken dialogue system
  2. Improving Dialect Identification in Indian Languages Using Multimodal Features from Dialect Informed ASR
    Saurabh Kumar, Sumit Sharma, Sathvik Udupa, Sandhya Badiger, Abhayjeet Singh, Jesuraja Bandekar, Savitha Murthy, Prasanta Kumar Ghosh, and others
    ICASSP, 2025
    speech recognition
  3. RESPIN-S1.0: A read speech corpus of 10000+ hours in dialects of nine Indian Languages
    Saurabh Kumar, Abhayjeet Singh, Jesuraj Bandekar, Savitha Murthy, Sumit Sharma, Sandhya Badiger, Sathvik Udupa, Amala Nagireddi, Srinivasa Raghavan KM, Rohan Saxena, Jai Nanavati, Janani Sridharan, Arjun Mehta, Ashish S, Sai Mora, Prashanthi Venkataramakrishnan, Gauri Date, Karthika P, and Prasanta Ghosh
    NeurIPS, 2025
    speech recognition dataset
  4. LIMMITS’24: Multi-Speaker, Multi-Lingual INDIC TTS With Voice Cloning
    Sathvik Udupa, Jesuraja Bandekar, Abhayjeet Singh, Deekshitha G, Saurabh Kumar, Sandhya Badiger, Amala Nagireddi, Roopa R, Prasanta Kumar Ghosh, Hema A. Murthy, Pranaw Kumar, Keiichi Tokuda, Mark Hasegawa-Johnson, and Philipp Olbrich
    IEEE Open Journal of Signal Processing, 2025
    speech synthesis

2024

  1. Adapter pre-training for improved speech recognition in unseen domains using low resource adapter tuning of self-supervised models
    Sathvik Udupa, Jesuraj Bandekar, Saurabh Kumar, Deekshitha G, Sandhya B, Abhayjeet S, Savitha Murthy, Priyanka Pai, Srinivasa Raghavan, Raoul Nanavati, and Prasanta Kumar Ghosh
    Interspeech, 2024
    speech recognition
  2. IndicMOS: Multilingual MOS Prediction for 7 Indian languages
    Sathvik Udupa, Soumi Maiti, and Prasanta Kumar Ghosh
    Interspeech, 2024
    speech quality estimation
  3. Articulatory synthesis using representations learnt through phonetic label-aware contrastive loss
    Jesuraj Bandekar, Sathvik Udupa, and Prasanta Kumar Ghosh
    Interspeech, 2024
    speech production
  4. Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech
    Abhayjeet Singh, Amala Nagireddi, Anjali Jayakumar, Jesuraja Bandekar, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh, Hema A. Murthy, Heiga Zen, Pranaw Kumar, Kamal Kant, Amol Bole, Bira Chandra Singh, Keiichi Tokuda, Mark Hasegawa-Johnson, and Philipp Olbrich
    IEEE Open Journal of Signal Processing, 2024
    speech synthesis
  5. A machine-learning tool to identify bistable states from calcium imaging data
    Aalok Varma, Sathvik Udupa, Mohini Sengupta, Prasanta Kumar Ghosh, and Vatsala Thirumalai
    The Journal of Physiology, 2024
    neuroscience

2023

  1. Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages
    Sathvik Udupa, Jesuraja Bandekar, G Deekshitha, Saurabh Kumar, Prasanta Kumar Ghosh, Sandhya Badiger, Abhayjeet Singh, Savitha Murthy, Priyanka Pai, Srinivasa Raghavan, and Raoul Nanavati
    ASRU, 2023
    speech recognition
  2. Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models
    Sathvik Udupa, C Siddarth, and Prasanta Kumar Ghosh
    ICASSP, 2023
    speech production
  3. Real-time MRI video synthesis from time aligned phonemes with sequence-to-sequence networks
    Sathvik Udupa and Prasanta Kumar Ghosh
    ICASSP, 2023
    speech production
  4. Exploring a classification approach using quantised articulatory movements for acoustic to articulatory inversion
    Jesuraj Bandekar, Sathvik Udupa, and Prasanta Kumar Ghosh
    Interspeech, 2023
    speech production
  5. Speaking rate attention-based duration prediction for speed control TTS
    Jesuraj Bandekar, Sathvik Udupa, Abhayjeet Singh, Anjali Jayakumar, Sandhya Badiger, Saurabh Kumar, Pooja VH, Prasanta Kumar Ghosh, and others
    preprint, 2023
    speech synthesis

2022

  1. Gram Vaani ASR Challenge on spontaneous telephone speech recordings in regional variations of Hindi
    Anish Bhanushali, Grant Bridgman, Deekshitha G, Prasanta Ghosh, Pratik Kumar, Saurabh Kumar, Adithya Raj Kolladath, Nithya Ravi, Aaditeshwar Seth, Ashish Seth, Abhayjeet Singh, Vrunda Sukhadia, Umesh S, Sathvik Udupa, and Lodagala V. S. V. Durga Prasad
    Interspeech, 2022
    speech recognition
  2. Streaming model for Acoustic to Articulatory Inversion with transformer networks
    Sathvik Udupa, Aravind Illa, and Prasanta Kumar Ghosh
    Interspeech, 2022
    speech production
  3. Watch Me Speak: 2D Visualization of Human Mouth during Speech
    C Siddarth, Sathvik Udupa, and Prasanta Kumar Ghosh
    Interspeech, 2022
    speech production

2021

  1. Estimating Articulatory Movements in Speech Production with Transformer Networks
    Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, and Prasanta Kumar Ghosh
    Interspeech, 2021
    speech production
  2. Web Interface for Estimating Articulatory Movements in Speech Production from Acoustics and Text
    Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, and Prasanta Kumar Ghosh
    Interspeech, 2021
    speech production

2020

  1. Voice based classification of patients with Amyotrophic Lateral Sclerosis, Parkinson’s Disease and Healthy Controls with CNN-LSTM using transfer learning
    Jhansi Mallela, Aravind Illa, Suhas B. N., Sathvik Udupa, Yamini Belur, Nalini Atchayaram, Ravi Yadav, Pradeep Reddy, Dipanjan Gope, and Prasanta Kumar Ghosh
    ICASSP, 2020
    speech and health