eprintid: 12751
rev_number: 8
eprint_status: archive
userid: 2
dir: disk0/00/01/27/51
datestamp: 2024-06-17 23:30:29
lastmod: 2024-06-17 23:30:30
status_changed: 2024-06-17 23:30:29
type: article
metadata_visibility: show
creators_name: Shaha, Tumpa Rani
creators_name: Begum, Momotaz
creators_name: Uddin, Jia
creators_name: Yélamos Torres, Vanessa
creators_name: Alemany Iturriaga, Josep
creators_name: Ashraf, Imran
creators_name: Samad, Md. Abdus
creators_id: 
creators_id: 
creators_id: 
creators_id: vanessa.yelamos@funiber.org
creators_id: josep.alemany@uneatlantico.es
creators_id: 
creators_id: 
title: Feature group partitioning: an approach for depression severity prediction with class balancing using machine learning algorithms
ispublished: pub
subjects: uneat_eng
divisions: uneatlantico_produccion_cientifica
divisions: uninimx_produccion_cientifica
divisions: uninipr_produccion_cientifica
divisions: unic_produccion_cientifica
divisions: uniromana_produccion_cientifica
full_text_status: public
keywords: Machine learning; Depression prediction; Class balancing; Oversampling; SMOTE; ADASYN; Stratified cross validation; Burn depression checklist; Feature group partitioning
abstract: In contemporary society, depression has emerged as a prominent mental disorder that exhibits exponential growth and exerts a substantial influence on premature mortality. Although numerous studies have applied machine learning methods to forecast signs of depression, only a limited number have treated the severity level as a multiclass variable. Moreover, data are rarely distributed equally among all classes in practice, so the inevitable class imbalance for multiple variables is considered a substantial challenge in this domain. This research therefore emphasizes the significance of addressing class imbalance in the context of multiple classes. We introduce a new approach, feature group partitioning (FGP), in the data preprocessing phase, which effectively reduces the dimensionality of features to a minimum. This study used synthetic oversampling techniques, specifically the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN), for class balancing. The dataset used in this research was collected from university students by administering the Burn Depression Checklist (BDC). For methodological modifications, we implemented heterogeneous ensemble learning (stacking), homogeneous ensemble learning (bagging), and five distinct supervised machine learning algorithms. The issue of overfitting was addressed by evaluating accuracy on the training, validation, and testing datasets. To justify the effectiveness of the prediction models, balanced accuracy, sensitivity, specificity, precision, and F1-score were used. Overall, a comprehensive analysis demonstrates the difference between the Conventional Depression Screening (CDS) and FGP approaches. In summary, the results show that the stacking classifier for FGP with SMOTE yields the highest balanced accuracy, at 92.81%.
The empirical evidence demonstrates that the FGP approach, when combined with SMOTE, is able to produce better performance in predicting the severity of depression. Most importantly, the optimization of the training time of the FGP approach for all of the classifiers is a significant achievement of this research.
date: 2024-06
publication: BMC Medical Research Methodology
volume: 24
number: 1
id_number: doi:10.1186/s12874-024-02249-8
refereed: TRUE
issn: 1471-2288
official_url: http://doi.org/10.1186/s12874-024-02249-8
access: open
language: en
citation:   Article Subjects > Engineering <http://repositorio.uneatlantico.es/view/subjects/uneat=5Feng.html> Universidad Europea del Atlántico > Research > Articles and books <http://repositorio.uneatlantico.es/view/divisions/uneatlantico=5Fproduccion=5Fcientifica.html>
Universidad Internacional Iberoamericana México > Research > Scientific Production <http://repositorio.uneatlantico.es/view/divisions/uninimx=5Fproduccion=5Fcientifica.html>
Universidad Internacional Iberoamericana Puerto Rico > Research > Scientific Production <http://repositorio.uneatlantico.es/view/divisions/uninipr=5Fproduccion=5Fcientifica.html>
Universidad Internacional do Cuanza > Research > Scientific Production <http://repositorio.uneatlantico.es/view/divisions/unic=5Fproduccion=5Fcientifica.html>
Universidad de La Romana > Research > Scientific Production <http://repositorio.uneatlantico.es/view/divisions/uniromana=5Fproduccion=5Fcientifica.html> Open English metadata Shaha, Tumpa Rani; Begum, Momotaz; Uddin, Jia; Yélamos Torres, Vanessa; Alemany Iturriaga, Josep; Ashraf, Imran and Samad, Md. Abdus mail UNSPECIFIED, UNSPECIFIED, UNSPECIFIED, vanessa.yelamos@funiber.org, josep.alemany@uneatlantico.es, UNSPECIFIED, UNSPECIFIED <http://repositorio.uneatlantico.es/id/eprint/12751/1/s12874-024-02249-8.pdf> (2024) Feature group partitioning: an approach for depression severity prediction with class balancing using machine learning algorithms. BMC Medical Research Methodology, 24 (1). ISSN 1471-2288
document_url: http://repositorio.uneatlantico.es/id/eprint/12751/1/s12874-024-02249-8.pdf