
Recent text-to-speech and voice-conversion systems can generate highly realistic speech, making audio deepfakes a serious security threat. Most existing countermeasures require access to the full speech content, which is problematic in privacy-sensitive applications. We propose SafeEar, a content-privacy-preserving framework that detects deepfakes from acoustic cues alone while withholding the semantic content of the speech. SafeEar employs a neural audio codec as a decoupling model to separate semantic from acoustic information, then performs detection on the acoustic representations, which are augmented with real-world distortions such as codec compression and reverberation to improve robustness. Across five benchmark datasets, SafeEar achieves strong detection performance while preventing both machine- and human-driven recovery of the speech content.
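To make the decoupling idea concrete, the following is a minimal toy sketch (not the authors' implementation) of a residual-vector-quantization codec whose first codebook is treated as the semantic stream and whose remaining codebooks are treated as the acoustic stream; only the acoustic tokens are passed to a placeholder detector. All sizes, the codebook split, and the `detect` function are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rvq_encode(frames, codebooks):
    """Toy residual vector quantization: each stage quantizes the
    residual left by the previous stage and emits token indices."""
    residual = frames
    tokens = []
    for cb in codebooks:
        # nearest-codeword lookup per frame
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(axis=1)
        tokens.append(idx)
        residual = residual - cb[idx]
    return tokens

# hypothetical sizes: 50 frames of 8-dim features, 4 codebooks of 16 entries
frames = rng.normal(size=(50, 8))
codebooks = [rng.normal(size=(16, 8)) for _ in range(4)]
tokens = rvq_encode(frames, codebooks)

# assumed split: first quantizer ~ semantic, the rest ~ acoustic
semantic_tokens = tokens[0]             # withheld: never leaves the codec
acoustic_tokens = np.stack(tokens[1:])  # only these feed the detector

def detect(acoustic):
    """Stand-in deepfake detector over acoustic tokens only.
    A real system would train a classifier; this is a placeholder."""
    return float(acoustic.mean())

score = detect(acoustic_tokens)
```

The privacy property in this sketch comes purely from the interface: the detector is handed `acoustic_tokens` and structurally cannot see the semantic stream, mirroring how SafeEar exposes only acoustic information to the countermeasure.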