Deep Encoded Linguistic and Acoustic Cues for Attention Based End to End Speech Emotion Recognition

Published: May 1, 2020
Abstract
An end-to-end model with convolutional layers and a multi-head self-attention mechanism is proposed for the Speech Emotion Recognition (SER) task. As inputs, we propose to use both deep encoded linguistic features, which carry the language-related context of emotion, and audio spectrograms, which represent acoustic cues. To facilitate the deep linguistic feature representation, we use outputs from the intermediate layers of a...