eprintid: 17864
rev_number: 7
eprint_status: archive
userid: 2
dir: disk0/00/01/78/64
datestamp: 2025-10-27 09:57:27
lastmod: 2025-10-30 18:35:10
status_changed: 2025-10-27 09:57:27
type: article
metadata_visibility: show
creators_name: Aslam, Zahid
creators_name: Missen, Malik Muhammad Saad
creators_name: Ghaffar, Arslan Abdul
creators_name: Mehmood, Arif
creators_name: Gracia Villar, Mónica
creators_name: Silva Alvarado, Eduardo René
creators_name: Ashraf, Imran
creators_id:
creators_id:
creators_id:
creators_id:
creators_id: monica.gracia@uneatlantico.es
creators_id: eduardo.silva@funiber.org
creators_id:
title: Advancing fake news combating using machine learning: a hybrid model approach
ispublished: pub
subjects: uneat_eng
divisions: uneatlantico_produccion_cientifica
divisions: unincol_produccion_cientifica
divisions: uninimx_produccion_cientifica
divisions: unic_produccion_cientifica
divisions: uniromana_produccion_cientifica
full_text_status: public
keywords: Information processing; Fake news detection; Natural language processing; Machine learning; Ensemble model; Social media news
abstract: The digital era, while offering unparalleled access to information, has also seen the rapid proliferation of fake news, a phenomenon with the potential to distort public perception and influence sociopolitical events. The need to identify and mitigate the spread of such disinformation is crucial for maintaining the integrity of public discourse. This research introduces a multi-view learning framework that achieves high precision by systematically integrating diverse feature perspectives. Using a diverse dataset of news articles, the approach combines several feature extraction methods, including TF-IDF for individual words (unigrams) and word pairs (bigrams), and count vectorization to represent text in multiple ways.
To capture additional linguistic and semantic information, advanced features, such as readability scores, sentiment scores, and topic distributions generated by latent Dirichlet allocation (LDA), are also extracted. The framework implements a multi-view learning strategy, where separate views focus on basic text, linguistic, and semantic features, feeding into a final ensemble model. Models such as logistic regression, random forest, and LightGBM are employed to analyze each view, and a stacked ensemble integrates their outputs. Through rigorous tenfold cross-validation, our proposed multi-view ensemble achieves a state-of-the-art accuracy of 0.9994, outperforming strong baselines, including single-view models and a BERT-based classifier. Robustness testing confirms the model maintains high accuracy even under data perturbations, establishing the value of structured feature separation and intelligent ensemble techniques.
date: 2025-09
publication: Knowledge and Information Systems
id_number: doi:10.1007/s10115-025-02588-y
refereed: TRUE
issn: 0219-1377
official_url: http://doi.org/10.1007/s10115-025-02588-y
access: open
language: en
citation: Article. Subjects > Engineering. Universidad Europea del Atlántico > Research > Articles and books. Fundación Universitaria Internacional de Colombia > Research > Scientific Production. Universidad Internacional Iberoamericana México > Research > Scientific Production. Universidad Internacional do Cuanza > Research > Scientific Production. Universidad de La Romana > Research > Scientific Production. Open. English. Aslam, Zahid; Missen, Malik Muhammad Saad; Ghaffar, Arslan Abdul; Mehmood, Arif; Gracia Villar, Mónica; Silva Alvarado, Eduardo René and Ashraf, Imran; mail: unspecified, unspecified, unspecified, unspecified, monica.gracia@uneatlantico.es, eduardo.silva@funiber.org, unspecified. (2025) Advancing fake news combating using machine learning: a hybrid model approach. Knowledge and Information Systems. ISSN 0219-1377
document_url: http://repositorio.uneatlantico.es/id/eprint/17864/1/s10115-025-02588-y.pdf