Investigating speech enhancement towards robust synthetic audio spoofing detection in the wild
Authors
Advisors
Avila, Anderson
Abstract
Logical access (LA) attacks use text-to-speech (TTS) or voice conversion (VC) techniques to generate spoofed speech data. They pose a serious threat to automatic speaker verification, as intruders can use them to bypass biometric security systems. In this study, we train a state-of-the-art model to distinguish between bona fide and spoofed speech samples and investigate its performance in the wild. To that end, we use the LA data provided in the ASVspoof 2019 Challenge in the presence of different types and levels of background noise. We also explore two enhancement algorithms, namely SEGAN and MetricGAN+, to mitigate the detrimental effects of noisy speech. Results show that applying enhancement prior to the LA task can improve performance in the more degraded scenarios. We also find that quality measures such as PESQ can serve as useful indicators of an enhancement algorithm's performance.
Description
Research completed in the School of Computing, Wichita State University, and the Department of Computer Science, INRS-Canada.
Series
v. 21

