Critical Information Only: A Content Privacy-Preserving Framework for Detecting Audio Deepfakes

Abstract

Recent text-to-speech and voice-conversion systems can generate highly realistic speech, making audio deepfakes a serious security threat. Most existing countermeasures rely on full speech content, which is problematic in privacy-sensitive applications. We propose SafeEar, a content-privacy-preserving framework that detects deepfakes using acoustic cues while suppressing exposure of semantic content. SafeEar uses a neural audio codec with a decoupling model to separate semantic from acoustic information, then performs detection on the acoustic representation alone, applying real-world augmentations such as codec compression and reverberation to improve robustness. Across five benchmark datasets, SafeEar achieves strong detection performance while preventing both machine-based and human recovery of speech content.
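The pipeline described above can be sketched at a high level. This is a toy illustration only, not SafeEar's implementation: the `decouple`, `augment`, and `detect` functions below are hypothetical stand-ins for the neural codec's decoupling model, the real-world augmentations (codec compression, reverberation), and the trained classifier operating on acoustic features.

```python
import numpy as np

rng = np.random.default_rng(0)

def decouple(features):
    # Hypothetical stand-in for the codec's decoupling model:
    # split a feature vector into "semantic" and "acoustic" halves.
    mid = features.shape[-1] // 2
    return features[..., :mid], features[..., mid:]

def augment(acoustic):
    # Stand-in for real-world augmentations (codec artifacts, reverb);
    # here modeled as small additive noise.
    return acoustic + 0.01 * rng.standard_normal(acoustic.shape)

def detect(acoustic, threshold=0.0):
    # Toy detector scoring only acoustic features; a real system would
    # use a trained classifier on the acoustic representation.
    score = float(acoustic.mean())
    return score > threshold

# Detection never sees the semantic half, preserving content privacy.
features = rng.standard_normal(64)
semantic, acoustic = decouple(features)
prediction = detect(augment(acoustic))
```

The key design point the sketch mirrors is that the detector consumes only the acoustic branch, so speech content in the semantic branch is never exposed to the detection stage.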

Publication
In IEEE Transactions on Dependable and Secure Computing