A Comparative Benchmark of Deep Learning Architectures for AI-Assisted Breast Cancer Detection in Mammography Using the MammosighTR Dataset: A Nationwide Turkish Screening Study (2016–2022)


Creative Commons License

Azgınoğlu N.

CMES - COMPUTER MODELING IN ENGINEERING AND SCIENCES, vol. 146, no. 1, pp. 38-39, 2026 (SCI-Expanded, Scopus)

Abstract

Breast cancer screening programs rely heavily on mammography for early detection; however, diagnostic performance is strongly affected by inter-reader variability, breast density, and the limitations of conventional computer-aided detection systems. Recent advances in deep learning have enabled more robust and scalable solutions for large-scale screening, yet systematic comparisons of modern object detection architectures on nationally representative datasets remain limited. This study presents a comprehensive quantitative comparison of prominent deep learning–based object detection architectures for artificial intelligence (AI)-assisted mammography analysis using the MammosighTR dataset, developed within the Turkish National Breast Cancer Screening Program. The dataset comprises 12,740 patient cases collected between 2016 and 2022 and annotated with BI-RADS categories, breast density levels, and lesion localization labels. A total of 31 models, spanning One-Stage, Two-Stage, and Transformer-based architectures, were evaluated under a unified experimental framework at both the patient and breast levels. The results demonstrate that Two-Stage architectures consistently outperform One-Stage models, achieving approximately 2%–4% higher Macro F1-Scores and more balanced precision–recall trade-offs, with Double-Head R-CNN and Dynamic R-CNN yielding the highest overall performance (Macro F1 ≈ 0.84–0.86). This advantage is primarily attributed to the region proposal mechanism and the improved class balance inherent to Two-Stage designs. One-Stage detectors exhibited higher sensitivity and faster inference, reaching Recall values above 0.88, but showed slightly lower Precision and overall accuracy (by 1%–2%) than Two-Stage models.
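The abstract reports Macro F1-Scores at both the patient and the breast level but does not spell out how the two levels relate. The following is a minimal sketch, not the study's evaluation code: it computes a macro-averaged F1 (the unweighted mean of one-vs-rest per-class F1) and assumes, hypothetically, that a patient's label is the most suspicious of that patient's breast-level labels. The integer severity coding, the helper names (`per_class_f1`, `macro_f1`, `to_patient_level`) and the toy records are illustrative assumptions, not MammosighTR data.

```python
def per_class_f1(y_true, y_pred, labels):
    """One-vs-rest F1 for each label; a class with no TP, FP or FN scores 0.0."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so rare classes weigh as much as common ones."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


def to_patient_level(breast_records):
    """Hypothetical rule: each patient takes the most severe label among their breasts.

    breast_records: (patient_id, true_label, pred_label) tuples with integer
    severity labels, where a larger value means more suspicious.
    Returns (y_true, y_pred) lists ordered by patient_id.
    """
    true_by_patient, pred_by_patient = {}, {}
    for pid, t, p in breast_records:
        true_by_patient[pid] = max(true_by_patient.get(pid, t), t)
        pred_by_patient[pid] = max(pred_by_patient.get(pid, p), p)
    pids = sorted(true_by_patient)
    return ([true_by_patient[i] for i in pids],
            [pred_by_patient[i] for i in pids])


# Toy records (illustrative only, not MammosighTR data):
# 0 = normal, 1 = benign, 2 = suspicious; two breasts per patient.
records = [
    ("p1", 0, 0), ("p1", 0, 0),
    ("p2", 2, 2), ("p2", 0, 1),
    ("p3", 1, 1), ("p3", 0, 0),
    ("p4", 1, 2), ("p4", 0, 0),
]
labels = [0, 1, 2]

breast_true = [t for _, t, _ in records]
breast_pred = [p for _, _, p in records]
patient_true, patient_pred = to_patient_level(records)

breast_macro = macro_f1(breast_true, breast_pred, labels)
patient_macro = macro_f1(patient_true, patient_pred, labels)
print(f"breast-level Macro F1:  {breast_macro:.3f}")
print(f"patient-level Macro F1: {patient_macro:.3f}")
```

Because the assumed patient-level rule collapses each patient's breasts into one label, the same predictions can give different Macro F1 values at the two levels; a different aggregation rule (for example, one that also requires the finding to be on the correct side) would change the numbers. For real evaluations, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same unweighted macro average.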
Among Transformer-based architectures, Deformable DETR (DEtection TRansformer) demonstrated strong robustness and consistency across datasets, achieving Macro F1-Scores comparable to those of CNN-based detectors (0.83–0.85) with minimal performance degradation under distributional shifts. Breast density–based analysis revealed higher misclassification rates in the intermediate-density categories (types B and C), whereas Transformer-based architectures maintained more stable performance in high-density (type D) tissue. These findings quantitatively confirm that both architectural design and tissue characteristics play a decisive role in diagnostic accuracy. Overall, the study provides a reproducible benchmark and highlights the potential of hybrid approaches that combine the accuracy of Two-Stage detectors with the contextual modeling capability of Transformer architectures for clinically reliable breast cancer screening systems.
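A density-stratified analysis amounts to computing an error rate separately within each breast-density category rather than one pooled figure. The following is a minimal sketch under stated assumptions: each evaluated breast is tagged with an ACR BI-RADS density category from A to D and with a single true and predicted label. The function name `misclassification_by_density` and the toy records are hypothetical and do not reproduce the study's data or results.

```python
from collections import Counter


def misclassification_by_density(records):
    """Fraction of wrong predictions within each breast-density category.

    records: (density, true_label, pred_label) tuples, where density is an
    ACR BI-RADS category "A" (almost entirely fatty) to "D" (extremely dense).
    Returns {density: error_rate}, ordered by category.
    """
    total, wrong = Counter(), Counter()
    for density, t, p in records:
        total[density] += 1
        if t != p:
            wrong[density] += 1
    return {d: wrong[d] / total[d] for d in sorted(total)}


# Toy records (illustrative only, not results from the study).
toy_records = [
    ("A", 0, 0), ("A", 1, 0),
    ("B", 0, 0), ("B", 2, 2),
    ("C", 1, 2), ("C", 0, 0), ("C", 1, 1),
    ("D", 2, 2),
]
for density, rate in misclassification_by_density(toy_records).items():
    print(f"density {density}: misclassification rate {rate:.2f}")
```

Running the same stratification for each detector exposes density-dependent differences that a single pooled accuracy would hide; with small per-category counts, confidence intervals would be needed before comparing categories.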