Deep learning approaches for speech emotion recognition

Authors
Srinivasan, Sriram
Advisors
Kshirsagar, Shruti
Issue Date
2024-05
Type
Thesis
Abstract

This thesis addresses the challenge of speech emotion recognition, focusing on continuous emotion estimation using deep learning techniques. Emotion detection plays a vital role in various domains, including healthcare, human-computer interaction, and affective computing. However, traditional approaches often struggle to recognize emotions accurately in the presence of noise and reverberation, leading to limited diagnostic accuracy and applicability. To overcome these limitations, our study proposes a novel approach that integrates speech enhancement as a preprocessing step using advanced deep learning techniques. Our experiments use the AVEC 2018 challenge datasets, comprising audio/video recordings from diverse cultural backgrounds.
The experimental pipeline involves several key components, including feature extraction, model training, and data/speech enhancement techniques. We employ LSTM (Long Short-Term Memory) models for temporal dependency modeling and investigate the effect of different hyperparameters, such as batch size, learning rate, and optimizer choice. We aim to evaluate the effectiveness of speech enhancement methods and explore the impact of these hyperparameters on emotion recognition performance. The results of our experiments demonstrate promising performance improvements when leveraging data/speech enhancement techniques such as single Spectral Enhancement (SSE) and the Speech Enhancement Generative Adversarial Network (SEGAN). These approaches show potential for capturing complex temporal relationships and contextual information, leading to enhanced emotion recognition capabilities.
Overall, this research contributes to advancing the field of speech emotion recognition by providing insights into the effectiveness of different deep learning techniques and hyperparameters. By improving emotion detection accuracy, our work lays the groundwork for future developments in healthcare monitoring technologies and human-computer interaction systems, ultimately enhancing patient outcomes and user experiences.

Description
Thesis (M.S.)-- Wichita State University, College of Engineering, School of Computing
Publisher
Wichita State University