eprintid: 17864
rev_number: 7
eprint_status: archive
userid: 2
dir: disk0/00/01/78/64
datestamp: 2025-10-27 09:57:27
lastmod: 2025-10-30 18:35:10
status_changed: 2025-10-27 09:57:27
type: article
metadata_visibility: show
creators_name: Aslam, Zahid
creators_name: Missen, Malik Muhammad Saad
creators_name: Ghaffar, Arslan Abdul
creators_name: Mehmood, Arif
creators_name: Gracia Villar, Mónica
creators_name: Silva Alvarado, Eduardo René
creators_name: Ashraf, Imran
creators_id: 
creators_id: 
creators_id: 
creators_id: 
creators_id: monica.gracia@uneatlantico.es
creators_id: eduardo.silva@funiber.org
creators_id: 
title: Advancing fake news combating using machine learning: a hybrid model approach
ispublished: pub
subjects: uneat_eng
divisions: uneatlantico_produccion_cientifica
divisions: unincol_produccion_cientifica
divisions: uninimx_produccion_cientifica
divisions: unic_produccion_cientifica
divisions: uniromana_produccion_cientifica
full_text_status: public
keywords: Information processing; Fake news detection; Natural language processing; Machine learning; Ensemble model; Social media news
abstract: The digital era, while offering unparalleled access to information, has also seen the rapid proliferation of fake news, a phenomenon with the potential to distort public perception and influence sociopolitical events. Identifying and mitigating the spread of such disinformation is crucial for maintaining the integrity of public discourse. This research introduces a multi-view learning framework that achieves high precision by systematically integrating diverse feature perspectives. Using a diverse dataset of news articles, the approach combines several feature extraction methods, including TF-IDF over individual words (unigrams) and word pairs (bigrams) and count vectorization, to represent text in multiple ways. To capture additional linguistic and semantic information, advanced features, such as readability scores, sentiment scores, and topic distributions generated by latent Dirichlet allocation (LDA), are also extracted. The framework implements a multi-view learning strategy in which separate views focus on basic text, linguistic, and semantic features and feed into a final ensemble model. Models such as logistic regression, random forest, and LightGBM are employed to analyze each view, and a stacked ensemble integrates their outputs. Through rigorous tenfold cross-validation, our proposed multi-view ensemble achieves a state-of-the-art accuracy of 0.9994, outperforming strong baselines, including single-view models and a BERT-based classifier. Robustness testing confirms that the model maintains high accuracy even under data perturbations, establishing the value of structured feature separation and intelligent ensemble techniques.
date: 2025-09
publication: Knowledge and Information Systems
id_number: doi:10.1007/s10115-025-02588-y
refereed: TRUE
issn: 0219-1377
official_url: http://doi.org/10.1007/s10115-025-02588-y
access: open
language: en
citation:   Article Subjects > Engineering <http://repositorio.uneatlantico.es/view/subjects/uneat=5Feng.html> Universidad Europea del Atlántico > Research > Articles and books <http://repositorio.uneatlantico.es/view/divisions/uneatlantico=5Fproduccion=5Fcientifica.html>
Fundación Universitaria Internacional de Colombia > Research > Scientific Production <http://repositorio.uneatlantico.es/view/divisions/unincol=5Fproduccion=5Fcientifica.html>
Universidad Internacional Iberoamericana México > Research > Scientific Production <http://repositorio.uneatlantico.es/view/divisions/uninimx=5Fproduccion=5Fcientifica.html>
Universidad Internacional do Cuanza > Research > Scientific Production <http://repositorio.uneatlantico.es/view/divisions/unic=5Fproduccion=5Fcientifica.html>
Universidad de La Romana > Research > Scientific Production <http://repositorio.uneatlantico.es/view/divisions/uniromana=5Fproduccion=5Fcientifica.html> Open English
metadata Aslam, Zahid; Missen, Malik Muhammad Saad; Ghaffar, Arslan Abdul; Mehmood, Arif; Gracia Villar, Mónica; Silva Alvarado, Eduardo René and Ashraf, Imran mail UNSPECIFIED, UNSPECIFIED, UNSPECIFIED, UNSPECIFIED, monica.gracia@uneatlantico.es, eduardo.silva@funiber.org, UNSPECIFIED <http://repositorio.uneatlantico.es/id/eprint/17864/1/s10115-025-02588-y.pdf> (2025) Advancing fake news combating using machine learning: a hybrid model approach. Knowledge and Information Systems. ISSN 0219-1377
document_url: http://repositorio.uneatlantico.es/id/eprint/17864/1/s10115-025-02588-y.pdf