Date

August 30, 2024

Time

10:30

Track

CommSec Track

COMMSEC: Breaking Fake Voice Detection with Speaker-Irrelative Features

PhD Student

Lanzhou University

Associate Professor

Lanzhou University

Master Student

Lanzhou University

Professor

Zhejiang University


Voice is a vital medium for transmitting information. Advances in speech synthesis technology have produced high-quality synthesized voices that human ears cannot distinguish from real speech. These fake voices have been widely used in Deepfake production and other malicious activities, raising serious security and privacy concerns. In response, many studies have worked on detecting fake voices and have reported excellent performance. However, is the story really over?

In this research, we propose SiFDetectCracker, a black-box adversarial attack framework against fake voice detection that is based on Speaker-Irrelative Features (SiFs), that is, features that do not depend on the speaker. We select background noise and the silent segments before and after the speaker's voice as the primary attack features. Manipulating these features in synthesized speech causes the fake speech detector to make erroneous judgments. Experimental results indicate that SiFDetectCracker achieved a success rate exceeding 80% in circumventing existing state-of-the-art fake voice detection systems. Furthermore, we provide an analysis of why current detectors are sensitive to our adversarial attack.
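
To make the idea concrete, the following is a minimal sketch of what manipulating speaker-irrelative features in a black-box setting could look like. It is not the SiFDetectCracker implementation: the function names `perturb_sif` and `random_search_attack`, the `fake_score` callable (a stand-in for a detector that returns a higher score for audio it considers fake), the parameter ranges, and the use of plain random search are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def perturb_sif(
    samples: Sequence[float],
    sample_rate: int,
    lead_ms: float,
    tail_ms: float,
    noise_std: float,
    rng: random.Random,
) -> List[float]:
    """Pad silence before and after the speech, then add Gaussian background noise.

    The speech samples themselves are only touched by the low-level noise;
    the speaker's voice content is otherwise left unchanged.
    """
    lead = int(sample_rate * lead_ms / 1000)
    tail = int(sample_rate * tail_ms / 1000)
    padded = [0.0] * lead + list(samples) + [0.0] * tail
    noisy = [s + rng.gauss(0.0, noise_std) for s in padded]
    # Keep samples inside the normalized amplitude range [-1, 1].
    return [max(-1.0, min(1.0, s)) for s in noisy]


def random_search_attack(
    samples: Sequence[float],
    sample_rate: int,
    fake_score: Callable[[Sequence[float]], float],  # hypothetical black-box detector
    queries: int = 50,
    threshold: float = 0.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Query the detector with randomly perturbed audio and keep the lowest score."""
    rng = random.Random(seed)
    best_audio, best_score = list(samples), fake_score(samples)
    for _ in range(queries):
        if best_score < threshold:
            break  # the detector now judges the audio as genuine
        candidate = perturb_sif(
            samples,
            sample_rate,
            lead_ms=rng.uniform(0.0, 500.0),   # assumed search range
            tail_ms=rng.uniform(0.0, 500.0),   # assumed search range
            noise_std=rng.uniform(0.0, 0.02),  # assumed search range
            rng=rng,
        )
        score = fake_score(candidate)
        if score < best_score:
            best_audio, best_score = candidate, score
    return best_audio, best_score
```

The attacker here only needs the detector's output score, which is what makes the setting black-box. A real attack would likely use a more query-efficient search than uniform random sampling.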