eprintid: 4905
rev_number: 9
eprint_status: archive
userid: 2
dir: disk0/00/00/49/05
datestamp: 2022-12-05 23:30:10
lastmod: 2023-07-17 23:30:38
status_changed: 2022-12-05 23:30:10
type: article
metadata_visibility: show
creators_name: Rustam, Furqan
creators_name: Ashraf, Imran
creators_name: Jabbar, Shehbaz
creators_name: Tutusaus, Kilian
creators_name: Mazas Pérez-Oleaga, Cristina
creators_name: Pascual Barrera, Alina Eugenia
creators_name: de la Torre Diez, Isabel
creators_id:
creators_id:
creators_id:
creators_id: kilian.tutusaus@uneatlantico.es
creators_id: cristina.mazas@uneatlantico.es
creators_id: alina.pascual@unini.edu.mx
creators_id:
title: Prediction β-Thalassemia carriers using complete blood count features
ispublished: pub
subjects: uneat_eng
divisions: uneatlantico_produccion_cientifica
divisions: unincol_produccion_cientifica
divisions: uninimx_produccion_cientifica
divisions: uninipr_produccion_cientifica
divisions: unic_produccion_cientifica
full_text_status: public
keywords: Computational biology and bioinformatics; Health care
abstract: β-Thalassemia is one of the serious causes of high mortality in Mediterranean countries. Substantial resources are required to save the life of a β-Thalassemia carrier, and early detection of thalassemia enables appropriate treatment that can increase the carrier’s life expectancy. Because it is a genetic disease, it cannot be prevented; however, analysing several indicators in the parents’ blood can reveal the disorders that cause Thalassemia. Laboratory tests for Thalassemia, such as high-performance liquid chromatography, a Complete Blood Count (CBC) with peripheral smear, and genetic testing, are time-consuming and expensive. Alternatively, red blood cell indices from the CBC can be used with machine learning models for the same task. Despite existing approaches for detecting Thalassemia carriers from CBC data, a gap remains between the desired and the achieved accuracy.
Moreover, the data imbalance problem is not well studied, which makes the models less generalizable. This study proposes a highly accurate approach for β-Thalassemia carrier detection that applies supervised machine learning to red blood cell indices from the CBC. Because not all features carry predictive information about the target variable, this study employs a unified framework of two feature-extraction techniques, Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). The data imbalance between β-Thalassemia carriers and non-carriers is handled with the Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic sampling (ADASYN). Extensive experiments are performed with many state-of-the-art machine learning and deep learning models. Experimental results indicate that the proposed approach outperforms existing approaches, with an accuracy of 0.96.
date: 2022-11
publication: Scientific Reports
volume: 12
number: 1
id_number: doi:10.1038/s41598-022-22011-8
refereed: TRUE
issn: 2045-2322
official_url: http://doi.org/10.1038/s41598-022-22011-8
access: open
language: en
citation: Article. Subjects: Engineering. Collections (Research > Scientific Output): Universidad Europea del Atlántico; Fundación Universitaria Internacional de Colombia; Universidad Internacional Iberoamericana México; Universidad Internacional Iberoamericana Puerto Rico; Universidad Internacional do Cuanza. Open access. English.
Rustam, Furqan; Ashraf, Imran; Jabbar, Shehbaz; Tutusaus, Kilian; Mazas Pérez-Oleaga, Cristina; Pascual Barrera, Alina Eugenia and de la Torre Diez, Isabel (2022) Prediction β-Thalassemia carriers using complete blood count features. Scientific Reports, 12 (1). ISSN 2045-2322
document_url: http://repositorio.uneatlantico.es/id/eprint/4905/1/s41598-022-22011-8.pdf
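The abstract names two building blocks of its preprocessing stage, SMOTE oversampling and PCA/SVD feature extraction, but gives no implementation details. The NumPy sketch below is for illustration only and implements the generic form of each technique. It is not the authors' code. The function names `smote` and `pca_via_svd`, the toy Gaussian data, `k=5` and the choice of two components are all hypothetical assumptions. The sketch also shows why PCA and SVD are closely related: the principal axes are the right singular vectors of the centered data matrix.

```python
import numpy as np


def pca_via_svd(X, n_components):
    """Project X onto its top principal components via an SVD of the centered data.

    Returns the projected data and the variance explained by each kept component.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, sorted by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S ** 2 / (X.shape[0] - 1)
    Z = X_centered @ Vt[:n_components].T
    return Z, explained_variance[:n_components]


def smote(X_minority, n_new, k=5, seed=None):
    """Minimal SMOTE: each synthetic sample lies on the segment between a random
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = X_minority.shape[0]
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)
        nb = neighbours[base, rng.integers(k)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic[i] = X_minority[base] + gap * (X_minority[nb] - X_minority[base])
    return synthetic


# Hypothetical toy data only (not the study's CBC dataset): 90 majority rows,
# 10 minority rows, 4 features standing in for red blood cell indices.
rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(90, 4))
X_minor = rng.normal(2.0, 1.0, size=(10, 4))

# Oversample the minority class up to 90 rows, then reduce to 2 components.
X_syn = smote(X_minor, n_new=80, k=5, seed=1)
X_bal = np.vstack([X_major, X_minor, X_syn])
y_bal = np.array([0] * 90 + [1] * 90)

Z, var = pca_via_svd(X_bal, n_components=2)
print(X_bal.shape, Z.shape)  # (180, 4) (180, 2)
```

In practice, scikit-learn (`PCA`, `TruncatedSVD`) and imbalanced-learn (`SMOTE`, `ADASYN`) provide tested implementations. ADASYN differs from SMOTE in that it generates more synthetic samples near minority points that are harder to learn. Oversampling should be applied to the training split only, so that synthetic points derived from test samples do not inflate the reported accuracy.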