publications
2026
- ORCA: Open-ended Response Correctness Assessment for Audio Question Answering. Under submission, 2026. [audio understanding evaluation]
- MMAU-Pro: A challenging and comprehensive benchmark for holistic evaluation of audio general intelligence. AAAI, 2026. [audio understanding dataset]
2025
- Streaming Endpointer for Spoken Dialogue using Neural Audio Codecs and Label-Delayed Training. ASRU, 2025. [spoken dialogue system]
- Improving Dialect Identification in Indian Languages Using Multimodal Features from Dialect Informed ASR. ICASSP, 2025. [speech recognition]
- RESPIN-S1.0: A read speech corpus of 10000+ hours in dialects of nine Indian Languages. NeurIPS, 2025. [speech recognition dataset]
- LIMMITS’24: Multi-Speaker, Multi-Lingual INDIC TTS With Voice Cloning. IEEE Open Journal of Signal Processing, 2025. [speech synthesis]
2024
- Adapter pre-training for improved speech recognition in unseen domains using low resource adapter tuning of self-supervised models. Interspeech, 2024. [speech recognition]
- IndicMOS: Multilingual MOS Prediction for 7 Indian languages. Interspeech, 2024. [speech quality estimation]
- Articulatory synthesis using representations learnt through phonetic label-aware contrastive loss. Interspeech, 2024. [speech production]
- Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech. IEEE Open Journal of Signal Processing, 2024. [speech synthesis]
- A machine-learning tool to identify bistable states from calcium imaging data. The Journal of Physiology, 2024. [neuroscience]
2023
- Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages. ASRU, 2023. [speech recognition]
- Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models. ICASSP, 2023. [speech production]
- Real-time MRI video synthesis from time aligned phonemes with sequence-to-sequence networks. ICASSP, 2023. [speech production]
- Exploring a classification approach using quantised articulatory movements for acoustic to articulatory inversion. Interspeech, 2023. [speech production]
- Speaking rate attention-based duration prediction for speed control TTS. Preprint, 2023. [speech synthesis]
2022
- Gram Vaani ASR Challenge on spontaneous telephone speech recordings in regional variations of Hindi. Interspeech, 2022. [speech recognition]
- Streaming model for Acoustic to Articulatory Inversion with transformer networks. Interspeech, 2022. [speech production]
- Watch Me Speak: 2D Visualization of Human Mouth during Speech. Interspeech, 2022. [speech production]
2021
- Estimating Articulatory Movements in Speech Production with Transformer Networks. Interspeech, 2021. [speech production]
- Web Interface for Estimating Articulatory Movements in Speech Production from Acoustics and Text. Interspeech, 2021. [speech production]
2020
- Voice based classification of patients with Amyotrophic Lateral Sclerosis, Parkinson’s Disease and Healthy Controls with CNN-LSTM using transfer learning. ICASSP, 2020. [speech and health]