eprintid: 17792
rev_number: 8
eprint_status: archive
userid: 2
dir: disk0/00/01/77/92
datestamp: 2025-05-15 23:30:07
lastmod: 2025-05-15 23:30:09
status_changed: 2025-05-15 23:30:07
type: article
metadata_visibility: show
creators_name: Azim, Komal
creators_name: Tahir, Alishba
creators_name: Shahroz, Mobeen
creators_name: Karamti, Hanen
creators_name: Vázquez, Annia A.
creators_name: Rojas Vistorte, Angel Olider
creators_name: Ashraf, Imran
creators_id: 
creators_id: 
creators_id: 
creators_id: 
creators_id: annia.almeyda@uneatlantico.es
creators_id: angel.rojas@uneatlantico.es
creators_id: 
title: Ensemble stacked model for enhanced identification of sentiments from IMDB reviews
ispublished: pub
subjects: uneat_eng
divisions: uneatlantico_produccion_cientifica
divisions: unincol_produccion_cientifica
divisions: uninimx_produccion_cientifica
divisions: uninipr_produccion_cientifica
divisions: unic_produccion_cientifica
divisions: uniromana_produccion_cientifica
full_text_status: public
keywords: Sentiment analysis, Text classification, Urdu text analysis, Machine learning, Ensemble learning
abstract: The emergence of social media platforms has led to widespread sharing of ideas, thoughts, events, and reviews. These shared views and comments carry people’s sentiments, and analyzing them has become one of the most popular fields of study. Like sentiment analysis in other languages, Urdu sentiment analysis is an important research problem; however, it remains under-investigated. On social media platforms such as X (Twitter), millions of Urdu speakers write in the Urdu script, which makes sentiment analysis in Urdu important. To this end, this study proposes RRLS, an ensemble model that stacks random forest, recurrent neural network, logistic regression (LR), and support vector machine (SVM) models. The study applies Urdu sentiment analysis to Internet Movie Database (IMDB) movie reviews and Urdu tweets. The Urdu data are preprocessed with the Urduhack library, whose operations include normalizing individual characters, combining characters, and fixing spaces around punctuation. Its normalization module ensures that Urdu characters are encoded correctly and replaces Arabic letters with their Urdu equivalents. Several models are evaluated extensively for Urdu sentiment analysis using accuracy, precision, recall, and F-measure, and the results are promising. Among the machine learning models, SVM and LR achieve 87% accuracy, while long short-term memory (LSTM) and bidirectional LSTM (BiLSTM) models achieve 84%. The proposed RRLS ensemble outperforms the other learning algorithms and current methods with 90% accuracy, and applying the synthetic minority oversampling technique (SMOTE) further raises accuracy to 92.77%.
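The SMOTE step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes the minority-class samples are already numeric feature vectors, and the function name `smote` and its parameters (`n_new`, `k`, `seed`) are hypothetical. Each synthetic sample is placed at a random point on the segment between a randomly chosen minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical SMOTE sketch: create n_new synthetic minority samples
    by interpolating between a minority sample and one of its k nearest
    minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)             # fixed seed for reproducibility
    k = min(k, len(minority) - 1)         # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # pick a minority sample at random
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()                # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Usage: oversample a toy minority class until both classes have equal size.
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]
new = smote(minority, n_new=len(majority) - len(minority), k=2)
print(len(minority) + len(new))  # 6
```

Because new points are interpolated rather than copied, the classifier sees a denser minority region instead of exact duplicates, which is the usual motivation for SMOTE over plain random oversampling.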
date: 2025-04
publication: Scientific Reports
volume: 15
number: 1
id_number: doi:10.1038/s41598-025-97561-8
refereed: TRUE
issn: 2045-2322
official_url: http://doi.org/10.1038/s41598-025-97561-8
access: open
language: en
citation: Azim, Komal; Tahir, Alishba; Shahroz, Mobeen; Karamti, Hanen; Vázquez, Annia A.; Rojas Vistorte, Angel Olider and Ashraf, Imran (2025) Ensemble stacked model for enhanced identification of sentiments from IMDB reviews. Scientific Reports, 15 (1). ISSN 2045-2322
document_url: http://repositorio.uneatlantico.es/id/eprint/17792/1/s41598-025-97561-8.pdf