Deepfake Detection Across Image, Video, and Audio: A Comprehensive Survey with Empirical Evaluation of Generalization and Robustness

Deepfakes (DFs) have emerged as a significant threat in recent years. They are exploited for malicious purposes such as impersonation, misinformation dissemination, and artistic style imitation, raising critical ethical and security concerns. This survey presents a comprehensive cross-modality analysis of passive DF detection, examining the image, video, and audio modalities together. Unlike multimodal detection, which fuses multiple data streams (e.g., audio-visual or text-visual), our cross-modality analysis investigates the interconnected relationships, shared methodological principles, and common vulnerabilities across independent modalities. We systematically categorize detection approaches by their underlying methodology: forensic-based, data-driven, fingerprint-based, and hybrid techniques for the visual modalities, and handcrafted versus learnable features for audio. We also extend our analysis beyond detection accuracy to performance dimensions essential for real-world deployment, such as generalization and robustness. Additionally, this survey provides a unified evaluation protocol across 10 popular datasets to assess detection accuracy, generalization, and robustness in each modality. Specifically, we conduct extensive empirical evaluations along three critical dimensions: (1) verification of the reported within-domain accuracy of 13 unimodal detectors, (2) cross-domain generalization assessment of 33 methods specifically designed to enhance generalization, and (3) robustness evaluation of 6 methods against adversarial attacks. Our experiments reveal a persistent generalization gap, with performance degrading by 10-15% in out-of-distribution scenarios. They also show that detectors remain vulnerable to white-box adversarial attacks, with attack success rates exceeding 80%. We also analyze the advantages and limitations of existing datasets, benchmarks, and evaluation metrics for passive DF detection.
Finally, we propose future research directions to address unexplored and emerging issues in passive DF detection. This survey serves as a comprehensive resource for researchers and practitioners, offering insights into the current landscape, methodological approaches, and promising future directions in this rapidly evolving field.