eprintid: 6067
rev_number: 10
eprint_status: archive
userid: 2
dir: disk0/00/00/60/67
datestamp: 2023-02-27 23:30:11
lastmod: 2023-03-16 23:30:07
status_changed: 2023-02-27 23:30:11
type: article
metadata_visibility: show
creators_name: Rodríguez Velasco, Carmen Lilí
creators_name: García Villena, Eduardo
creators_name: Brito Ballester, Julién
creators_name: Durántez Prados, Frigdiano Álvaro
creators_name: Silva Alvarado, Eduardo René
creators_name: Crespo Álvarez, Jorge
creators_id: carmen.rodriguez@uneatlantico.es
creators_id: eduardo.garcia@uneatlantico.es
creators_id: julien.brito@uneatlantico.es
creators_id: durantez@uneatlantico.es
creators_id: eduardo.silva@funiber.org
creators_id: jorge.crespo@uneatlantico.es
title: Forecasting of Post-Graduate Students’ Late Dropout Based on the Optimal Probability Threshold Adjustment Technique for Imbalanced Data
ispublished: pub
subjects: uneat_eng
subjects: uneat_fp
divisions: uneatlantico_produccion_cientifica
divisions: uninimx_produccion_cientifica
divisions: uninipr_produccion_cientifica
divisions: unic_produccion_cientifica
full_text_status: public
keywords: optimal likelihood threshold, imbalanced data, student dropout prediction, resample techniques, distance learning courses
abstract: The purpose of this research article was to contrast the benefits of the optimal probability threshold adjustment technique with other imbalanced data processing techniques, applied to the prediction of post-graduate students’ late dropout from distance learning courses at two universities in the Ibero-American space. In this context, optimizing the Logistic Regression, Random Forest, and Neural Network classifiers together with different techniques, attributes, and algorithms (hyperparameters, SMOTE, SMOTE_SVM, and ADASYN) produced a set of metrics for decision-making, prioritizing the reduction of false negatives. The best model was the Neural Network in combination with SMOTE_SVM, obtaining a recall of 0.75 and an f1-score of 0.60. Likewise, the robustness of the Random Forest classifier for imbalanced data was demonstrated by achieving, with an optimal threshold of 0.427, metrics very similar to those obtained by the consensus of the three best models found. This shows that, for Random Forest, the optimal prediction probability threshold is an excellent alternative to resampling techniques with different optimal thresholds. Finally, it is hoped that this research paper will help boost the application of this simple but powerful technique, which is highly underrated relative to data resampling techniques for imbalanced data.
date: 2023
publication: International Journal of Emerging Technologies in Learning (iJET)
volume: 18
number: 04
pagerange: 120-155
id_number: doi:10.3991/ijet.v18i04.34825
refereed: TRUE
issn: 1863-0383
official_url: http://doi.org/10.3991/ijet.v18i04.34825
access: open
language: en
citation: Rodríguez Velasco, Carmen Lilí; García Villena, Eduardo; Brito Ballester, Julién; Durántez Prados, Frigdiano Álvaro; Silva Alvarado, Eduardo René and Crespo Álvarez, Jorge (2023) Forecasting of Post-Graduate Students’ Late Dropout Based on the Optimal Probability Threshold Adjustment Technique for Imbalanced Data. International Journal of Emerging Technologies in Learning (iJET), 18 (04). pp. 120-155. ISSN 1863-0383
document_url: http://repositorio.uneatlantico.es/id/eprint/6067/1/document.pdf
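Note: as a rough illustration of the optimal probability threshold adjustment technique summarized in the abstract above, the following Python/scikit-learn sketch fits a Random Forest, sweeps candidate thresholds over its predicted dropout probabilities, and keeps the threshold that maximizes the f1-score. This is not the authors' implementation; the feature matrix X, the binary labels y, and all hyperparameters are placeholder assumptions.

    # Illustrative sketch (not the authors' code) of probability threshold
    # adjustment for an imbalanced binary "dropout" problem. Assumes a
    # feature matrix X and 0/1 labels y are already available.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import f1_score, recall_score

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)

    clf = RandomForestClassifier(n_estimators=300, random_state=42)
    clf.fit(X_train, y_train)

    # Predicted probability of the positive (dropout) class.
    proba = clf.predict_proba(X_test)[:, 1]

    # Sweep candidate thresholds and keep the one that maximizes the
    # f1-score; recall could be favored instead when false negatives
    # are the priority, as in the study described above.
    thresholds = np.linspace(0.05, 0.95, 181)
    scores = [f1_score(y_test, (proba >= t).astype(int)) for t in thresholds]
    best_t = thresholds[int(np.argmax(scores))]

    print(f"optimal threshold: {best_t:.3f}")
    print(f"recall at optimum: {recall_score(y_test, (proba >= best_t).astype(int)):.3f}")

Instead of resampling the training data (SMOTE, SMOTE_SVM, ADASYN), this approach only moves the decision cut-off away from the default 0.5, which is the alternative the article argues is underrated.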