A Novel Multi-Head Attention Framework for COVID-19 Detection: Hybrid Integration of MobileNet and VGG19 with Enhanced Feature Learning


Kılıç Ş.

Çukurova Üniversitesi Mühendislik Fakültesi Dergisi, vol.3, no.40, pp.655-670, 2025 (TRDizin)

Abstract

The COVID-19 pandemic has underscored the urgent need for rapid, accurate, and affordable diagnostic tools to complement RT-PCR testing. This study proposes a novel multi-head attention framework that integrates VGG19 and MobileNet for automated COVID-19 detection from chest X-rays. The model employs a hybrid attention mechanism combining spatial, channel, and self-attention components, enhancing feature representation while preserving computational efficiency.
Evaluations on 7,132 chest X-ray images across four categories (COVID-19, Normal, Pneumonia, Tuberculosis) demonstrated strong performance: 99.0% accuracy and macro and weighted F1-scores of 99.0%, with near-perfect class-specific results (100% for Tuberculosis, 99.7% for COVID-19, 99.5% for Normal, and 96.0% for Pneumonia). Inference time was only 63 ms per image, with a compact 14.8 MB model size.
These results surpass the baseline MobileNet and DenseNet121 models by 2.63% and 4.32% in accuracy, respectively. The proposed framework offers reliable rapid screening and differential diagnosis, supported by interpretable attention maps, making it well suited for deployment in resource-limited healthcare and point-of-care settings.
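
The abstract does not specify the fusion details, so the following is a minimal, illustrative tf.keras sketch of a dual-backbone design in the spirit described: VGG19 and MobileNet feature maps are concatenated and refined by channel, spatial, and multi-head self-attention before four-class classification. The SE/CBAM-style attention blocks, layer sizes, and fusion-by-concatenation choice are assumptions rather than the paper's exact architecture (note that two full ImageNet backbones are far larger than the reported 14.8 MB model).

```python
# Illustrative sketch only: a dual-backbone (VGG19 + MobileNet) classifier with
# channel, spatial, and multi-head self-attention. Block designs and sizes are
# assumptions; the paper's exact architecture is not given in the abstract.
import tensorflow as tf
from tensorflow.keras import layers, Model

def channel_attention(x, reduction=8):
    # Squeeze-and-excitation style channel reweighting (assumed variant).
    c = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)
    s = layers.Dense(c // reduction, activation="relu")(s)
    s = layers.Dense(c, activation="sigmoid")(s)
    return layers.Multiply()([x, layers.Reshape((1, 1, c))(s)])

def spatial_attention(x):
    # CBAM-style spatial mask built from per-pixel channel statistics
    # (assumed variant).
    avg = layers.Lambda(lambda t: tf.reduce_mean(t, axis=-1, keepdims=True))(x)
    mx = layers.Lambda(lambda t: tf.reduce_max(t, axis=-1, keepdims=True))(x)
    mask = layers.Conv2D(1, 7, padding="same", activation="sigmoid")(
        layers.Concatenate()([avg, mx]))
    return layers.Multiply()([x, mask])

def self_attention(x, heads=4):
    # Multi-head self-attention over the flattened spatial grid, with a
    # residual connection back to the input feature map.
    h, w, c = x.shape[1], x.shape[2], x.shape[3]
    seq = layers.Reshape((h * w, c))(x)
    att = layers.MultiHeadAttention(num_heads=heads, key_dim=c // heads)(seq, seq)
    return layers.Reshape((h, w, c))(layers.Add()([seq, att]))

def build_model(input_shape=(224, 224, 3), n_classes=4):
    inp = layers.Input(shape=input_shape)
    # Frozen ImageNet backbones; in practice each expects its own preprocessing.
    vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet",
                                      input_shape=input_shape)
    mob = tf.keras.applications.MobileNet(include_top=False, weights="imagenet",
                                          input_shape=input_shape)
    vgg.trainable = False
    mob.trainable = False
    fused = layers.Concatenate()([vgg(inp), mob(inp)])  # (7, 7, 512 + 1024)
    x = channel_attention(fused)
    x = spatial_attention(x)
    x = self_attention(x)
    x = layers.GlobalAveragePooling2D()(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    return Model(inp, out)

model = build_model()
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Applying the attention stages sequentially (channel, then spatial, then self-attention) is one plausible reading of the "hybrid mechanism"; the paper may instead apply them in parallel heads and merge the outputs.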
