Sunil Kumar Kopparapu
Tata Consultancy Services
Machine learning, Speaker recognition, Speaker diarisation, Mel-frequency cepstrum, Intelligent word recognition, Pattern recognition, Biometrics, Natural language, Natural language processing, Spoken language, Mobile phone, Sentence, Devanagari, Speech recognition, Feature (machine learning), Computer science, Feature extraction, Speech analytics, Speech processing, Robustness (computer science)
125 Publications
8 H-index
370 Citations
Publications 99
Gaussian Mixture Model-Universal Background Model (GMM‑UBM) supervectors are used to identify spoken Indian languages. The supervectors are computed from short-time MFCCs and their first and second derivatives. The UBM builds a generalized Indian language model, and mean adaptation transforms it into a duration-normalized, language-specific GMM. Multi-class support vector machine and artificial neural network classifiers are used to identify language labels from the supervectors. Experimental evaluatio...
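As a rough illustration of how a supervector can be obtained by mean adaptation, the sketch below applies the standard relevance-MAP update to the means of a diagonal-covariance UBM and concatenates the adapted means. It is a minimal sketch under stated assumptions, not the method of the paper: the function names, the relevance factor of 16, and the toy UBM are all hypothetical, and the duration normalization mentioned in the abstract is not modelled.

```python
import math


def gaussian_diag_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of UBM means to the frames of one utterance.

    The relevance factor is an assumed value, not one taken from the paper.
    """
    k = len(weights)
    dim = len(means[0])
    counts = [0.0] * k                      # soft frame count per component
    sums = [[0.0] * dim for _ in range(k)]  # posterior-weighted feature sums
    for x in frames:
        log_p = [
            math.log(weights[c]) + gaussian_diag_logpdf(x, means[c], variances[c])
            for c in range(k)
        ]
        top = max(log_p)                    # subtract the max for stability
        post = [math.exp(lp - top) for lp in log_p]
        total = sum(post)
        for c in range(k):
            gamma = post[c] / total         # responsibility of component c
            counts[c] += gamma
            for d in range(dim):
                sums[c][d] += gamma * x[d]
    adapted = []
    for c in range(k):
        alpha = counts[c] / (counts[c] + relevance)
        if counts[c] > 0.0:
            expected = [s / counts[c] for s in sums[c]]
        else:
            expected = list(means[c])       # no data: keep the UBM mean
        adapted.append(
            [alpha * e + (1.0 - alpha) * m for e, m in zip(expected, means[c])]
        )
    return adapted


def supervector(adapted_means):
    """Concatenate the adapted component means into one vector."""
    return [value for mean in adapted_means for value in mean]


# Toy two-component UBM in two dimensions (hypothetical values).
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [10.0, 10.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]
utterance = [[1.0, 1.0]] * 16               # stand-in for MFCC frames
sv = supervector(map_adapt_means(utterance, ubm_weights, ubm_means, ubm_vars))
```

In this toy case, components that see many frames move toward the data while components that see almost none stay close to the UBM, so the supervector has a fixed length regardless of utterance duration.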
#1 Ayush Tripathi, H-Index: 3
Last. Sunil Kumar Kopparapu (Tata Consultancy Services), H-Index: 8
view all 3 authors...
Imbalance in the proportion of training samples belonging to different classes often degrades the performance of conventional classifiers. This is primarily due to the tendency of the classifier to be biased towards the majority classes in the imbalanced dataset. In this paper, we propose a novel three-step technique to address imbalanced data. As a first step, we significantly oversample the minority class distribution by employing the traditional Synthetic Minority OverSampling Technique (...
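The oversampling idea behind the first step can be sketched as follows. This is a minimal, illustrative version of the standard SMOTE interpolation only. The remaining steps of the proposed technique are not visible in the truncated abstract and are not reproduced here. The function name `smote`, the neighbour count, and the toy data are assumptions.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square (hypothetical data).
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority_samples, n_synthetic=10, k=2)
```

Each synthetic point lies on a line segment between two existing minority samples, so the new points stay inside the region already covered by the minority class.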
#1 Swapnil Bhosale, H-Index: 3
Last. Sunil Kumar Kopparapu (Tata Consultancy Services), H-Index: 8
view all 3 authors...
Few-shot learning aims to generalize to unseen classes that appear during testing but are unavailable during training. Prototypical networks incorporate few-shot metric learning by constructing a class prototype in the form of a mean vector of the embedded support points within a class. The performance of prototypical networks in extreme few-shot scenarios (like one-shot) degrades drastically, mainly because variations within the clusters are disregarded while constructing prototypes. In this pape...
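The prototype construction described above can be sketched in a few lines. This is a minimal sketch of the baseline prototypical-network idea only, assuming the support points have already been passed through an embedding network. The function names and toy embeddings are hypothetical, and the modification proposed in the paper is not shown.

```python
def class_prototypes(support):
    """Map each class label to the mean of its embedded support points.

    `support` is a dict: label -> list of embedding vectors.
    """
    prototypes = {}
    for label, points in support.items():
        dim = len(points[0])
        prototypes[label] = [
            sum(p[d] for p in points) / len(points) for d in range(dim)
        ]
    return prototypes


def classify(query, prototypes):
    """Assign the query embedding to the nearest prototype
    (squared Euclidean distance)."""
    return min(
        prototypes,
        key=lambda label: sum(
            (q - c) ** 2 for q, c in zip(query, prototypes[label])
        ),
    )


# Toy two-way, two-shot episode with 2-D embeddings (hypothetical values).
support_set = {
    "a": [[0.0, 0.0], [2.0, 0.0]],
    "b": [[10.0, 10.0], [12.0, 10.0]],
}
protos = class_prototypes(support_set)
label_near_a = classify([1.0, 1.0], protos)
label_near_b = classify([9.0, 9.0], protos)
```

With a single support point per class (one-shot), the prototype is simply that point, so the mean carries no information about the spread within the class.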
#1 Sri Harsha Dumpala (Tata Consultancy Services), H-Index: 1
#2 Rupayan Chakraborty (Tata Consultancy Services), H-Index: 1
Last. Sunil Kumar Kopparapu (Tata Consultancy Services), H-Index: 8
view all 3 authors...
The binary class imbalance problem refers to the scenario where the number of training samples in one class is much lower than the number of samples in the other class. This imbalance hinders the ability of conventional machine learning algorithms to classify accurately. Moreover, many real-world training datasets are not only imbalanced but also low-resourced. In this paper, we introduce a novel technique to handle the class imbalance problem, even...
Oct 25, 2020 in INTERSPEECH (Conference of the International Speech Communication Association)
#2 Karan Nathwani (IITs: Indian Institutes of Technology), H-Index: 5
Last. Sunil Kumar Kopparapu (Tata Consultancy Services), H-Index: 8
view all 4 authors...
May 4, 2020 in ICASSP (International Conference on Acoustics, Speech, and Signal Processing)
#1 Swapnil Bhosale (Tata Consultancy Services), H-Index: 3
#2 Rupayan Chakraborty (Tata Consultancy Services), H-Index: 1
Last. Sunil Kumar Kopparapu (Tata Consultancy Services), H-Index: 8
view all 3 authors...
An end-to-end model with convolutional layers and a multi-head self-attention mechanism is proposed for the Speech Emotion Recognition (SER) task. As inputs, we propose to use both the deep encoded linguistic features, which carry the language-related context of emotion, and the audio spectrogram, which represents the acoustic cues. To facilitate the deep linguistic feature representation, we use outputs from the intermediate layers of a pre-trained Automatic Speech Recognition (ASR) model, where t...
May 4, 2020 in ICASSP (International Conference on Acoustics, Speech, and Signal Processing)
#1 Ayush Tripathi, H-Index: 3
#2 Swapnil Bhosale, H-Index: 3
Last. Sunil Kumar Kopparapu, H-Index: 8
view all 3 authors...
Dysarthria is a motor speech impairment caused by muscle weakness. Individuals with this condition are unable to control rapid movement of the velum, leading to a reduction in intelligibility, audibility, naturalness and efficiency of vocal communication. Systems that can assess the intelligibility of dysarthric speech can help clinicians diagnose the impact of therapy and medication. In this paper, we propose a usable novel method to assess the intelligibility of dysarthric speakers. The approach is base...
May 4, 2020 in ICASSP (International Conference on Acoustics, Speech, and Signal Processing)
#1 Ayush Tripathi, H-Index: 3
#2 Swapnil Bhosale, H-Index: 3
Last. Sunil Kumar Kopparapu, H-Index: 8
view all 3 authors...
Individuals with dysarthria are unable to control rapid movement of the velum, leading to a reduction in intelligibility, audibility, naturalness and efficiency of vocal communication. Automatic intelligibility assessment of dysarthric patients allows clinicians to diagnose the impact of therapy and medication and also to plan a future course of action. Earlier works have concentrated on building speaker-dependent machine learning systems for intelligibility assessment, due to limited availability of da...
1 Citation