Science Cast

The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio

Ruibo Fu

June 7, 2024 4:14pm

Voice is AI-generated

Description

The deepfake audio dataset for Audio Language Models (ALMs) and the corresponding high-generalization deepfake audio detection method have been open-sourced.

Code: https://github.com/xieyuankun/Codecfake

Dataset: https://zenodo.org/records/11171708

Connected to paper

This paper is a preprint and has not been certified by peer review.

The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio

arXiv, May 8, 2024 12:00am

Authors

Yuankun Xie, Yi Lu, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Jianhua Tao, Xin Qi, Xiaopeng Wang, Yukun Liu, Haonan Cheng, Long Ye, Yi Sun

Abstract

With the proliferation of deepfake audio based on Audio Language Models (ALMs), there is an urgent need for generalized detection methods. ALM-based deepfake audio is currently widespread, highly deceptive, and versatile in type, posing a significant challenge to current audio deepfake detection (ADD) models trained solely on vocoded data. To detect ALM-based deepfake audio effectively, we focus on the mechanism of ALM-based audio generation: the conversion from neural codec to waveform. We first construct the Codecfake dataset, an open-source, large-scale dataset focused on ALM-based audio detection that includes 2 languages, over 1M audio samples, and various test conditions. As a countermeasure, to achieve universal detection of deepfake audio and to tackle the domain ascent bias issue of the original SAM, we propose the CSAM strategy to learn a domain-balanced and generalized minimum. In our experiments, we first demonstrate that an ADD model trained with the Codecfake dataset can effectively detect ALM-based audio. Furthermore, our proposed generalization countermeasure yields the lowest average Equal Error Rate (EER) of 0.616% across all test conditions compared to baseline models. The dataset and associated code are available online.
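The abstract reports results as an Equal Error Rate (EER), the operating point at which the rate of fake audio accepted as genuine equals the rate of genuine audio rejected. The following is a minimal sketch of how such a value could be computed from detector scores. It is not taken from the Codecfake repository. The function name `compute_eer`, the score convention (higher means more likely bona fide), and the label encoding (1 for bona fide, 0 for fake) are all assumptions made for illustration.

```python
import numpy as np


def compute_eer(scores, labels):
    """Return (eer, threshold) for a binary bona fide vs. fake detector.

    Assumed conventions (illustrative, not from the paper's code):
      - higher score means "more likely bona fide"
      - labels: 1 = bona fide, 0 = fake
      - a sample is accepted as bona fide when score >= threshold
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    bona = scores[labels == 1]
    fake = scores[labels == 0]
    if bona.size == 0 or fake.size == 0:
        raise ValueError("need at least one bona fide and one fake sample")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = np.unique(scores)
    thresholds = np.concatenate([thresholds, [thresholds[-1] + 1.0]])

    # False acceptance rate: fraction of fake samples accepted as bona fide.
    far = np.array([(fake >= t).mean() for t in thresholds])
    # False rejection rate: fraction of bona fide samples rejected.
    frr = np.array([(bona < t).mean() for t in thresholds])

    # With finitely many samples the two curves may not cross exactly, so take
    # the threshold where they are closest and average the two rates there.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


if __name__ == "__main__":
    toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
    eer, thr = compute_eer(toy_scores, toy_labels)
    print(f"EER = {eer:.2%} at threshold {thr}")
```

The function returns the EER as a fraction, so a reported figure such as 0.616% would correspond to a return value of about 0.00616. An average over test conditions, as described in the abstract, would then be the mean of the per-condition values under this sketch.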
