Roman urdu hate speech detection using hybrid machine learning models and hyperparameter optimization
Article
Open access
English
With the rapid growth in the number of social media users, cyberbullying and hate speech problems have emerged in recent years. Automatic hate speech detection (HSD) from text is an emerging research problem in natural language processing (NLP). Researchers have developed various approaches to automatic hate speech detection using different corpora in various languages; however, research on the Urdu language is rather scarce. This study addresses the HSD task on Twitter using Roman Urdu text. The contribution of this research is the development of a hybrid model for Roman Urdu HSD, an approach that has not been previously explored. The novel hybrid model integrates deep learning (DL) and transformer models for automatic feature extraction, combined with machine learning algorithms (MLAs) for classification. To further enhance model performance, we employ several hyperparameter optimization (HPO) techniques, including Grid Search (GS), Randomized Search (RS), and Bayesian Optimization with Gaussian Processes (BOGP). Evaluation is carried out on two publicly available benchmark Roman Urdu corpora: the HS-RU-20 corpus and the RUHSOLD hate speech corpus. Results show that the Multilingual BERT (MBERT) feature learner, paired with a Support Vector Machine (SVM) classifier and optimized using RS, achieves state-of-the-art performance. On the HS-RU-20 corpus, this model attained an accuracy of 0.93 and an F1 score of 0.95 for the Neutral-Hostile classification task, and an accuracy of 0.89 with an F1 score of 0.88 for the Hate Speech-Offensive task. On the RUHSOLD corpus, the same model achieved an accuracy of 0.95 and an F1 score of 0.94 for the Coarse-grained task, alongside an accuracy of 0.87 and an F1 score of 0.84 for the Fine-grained task. These results demonstrate the effectiveness of our hybrid approach for Roman Urdu hate speech detection.
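As a rough illustration of the Randomized Search (RS) step mentioned in the abstract, the sketch below samples hyperparameter configurations at random and keeps the best-scoring one. It is not the authors' implementation: the function names, the search ranges for `C` and `gamma`, and the scoring function are all assumptions made for illustration. In particular, `toy_validation_score` is a synthetic stand-in for what would, in the described setting, be a validation score of an SVM trained on MBERT features.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def toy_validation_score(params):
    """Hypothetical stand-in for a validation score (an assumption).

    A real pipeline would train and evaluate a classifier here. This
    synthetic function simply peaks at C = 10 and gamma = 0.01.
    """
    c_term = (math.log10(params["C"]) - 1.0) ** 2
    g_term = (math.log10(params["gamma"]) + 2.0) ** 2
    return 1.0 / (1.0 + c_term + g_term)


def randomized_search(score_fn, n_iter=50, seed=0):
    """Sample n_iter random configurations and return the best one found."""
    rng = random.Random(seed)  # fixed seed keeps the search reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Assumed search ranges; SVM hyperparameters are commonly
        # explored on a logarithmic scale.
        params = {
            "C": sample_log_uniform(rng, 1e-2, 1e3),
            "gamma": sample_log_uniform(rng, 1e-4, 1e0),
        }
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    params, score = randomized_search(toy_validation_score)
    print(f"best C={params['C']:.3g}, gamma={params['gamma']:.3g}, score={score:.3f}")
```

Unlike Grid Search, which evaluates every combination on a fixed grid, this approach spends a fixed budget of `n_iter` evaluations regardless of how many hyperparameters are tuned.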
Ashiq, Waqar; Kanwal, Samra; Rafique, Adnan; Waqas, Muhammad; Khurshaid, Tahir; Caro Montero, Elizabeth (elizabeth.caro@uneatlantico.es); Bustamante Alonso, Alicia (alicia.bustamante@uneatlantico.es) and Ashraf, Imran
(2024)
Roman urdu hate speech detection using hybrid machine learning models and hyperparameter optimization.
Scientific Reports, 14 (1).
ISSN 2045-2322
Text: s41598-024-79106-7.pdf (10MB). Available under License Creative Commons Attribution Non-commercial No Derivatives.
Document Type: | Article |
---|---|
Keywords: | Hate speech detection, Deep learning, Model optimization, Urdu text classification |
Subject Classification: | Subjects > Engineering |
Divisions: | Universidad Europea del Atlántico > Research > Articles and books; Universidad Internacional Iberoamericana México > Research > Scientific Production; Universidad Internacional Iberoamericana Puerto Rico > Research > Scientific Production; Universidad Internacional do Cuanza > Research > Scientific Production; Universidad de La Romana > Research > Scientific Production |
Deposited: | 28 Nov 2024 23:30 |
Last Modified: | 28 Nov 2024 23:30 |
URI: | https://repositorio.uneatlantico.es/id/eprint/15444 |
Enzymatic treatment shapes in vitro digestion pattern of phenolic compounds in mulberry juice
The health benefits of mulberry fruit are closely associated with its phenolic compounds. However, the effects of enzymatic treatments on the digestion patterns of these compounds in mulberry juice remain largely unknown. This study investigated the impact of pectinase (PE), pectin lyase (PL), and cellulase (CE) on the release of phenolic compounds in whole mulberry juice. The digestion patterns were further evaluated using an in vitro simulated digestion model. The results revealed that PE significantly increased chlorogenic acid content by 77.8 %, PL enhanced cyanidin-3-O-glucoside by 20.5 %, and CE boosted quercetin by 44.5 %. Following in vitro digestion, the phenolic compound levels decreased differently depending on the treatment, while cyanidin-3-O-rutinoside content increased across all groups. In conclusion, the selected enzymes effectively promoted the release of phenolic compounds in mulberry juice. However, during gastrointestinal digestion, the degradation of phenolic compounds surpassed their enhanced release, with effects varying based on the compound's structure.
Peihuan Luo, Jian Ai, Qiongyao Wang, Yihang Lou, Zhiwei Liao, Francesca Giampieri (francesca.giampieri@uneatlantico.es), Maurizio Battino (maurizio.battino@uneatlantico.es), Elwira Sieniawska, Weibin Bai, Lingmin Tian
What works in financial education? Experimental evidence on program impact
Financial education is increasingly essential for safeguarding both individual and corporate well-being. This study systematically reviews global financial education experiments using a dual-method framework that integrates a deep learning classifier with advanced multivariate statistical techniques. Our analysis indicates that while short-term improvements in financial literacy are common, such gains tend to diminish over time without ongoing reinforcement. Moreover, the limited impact of digital innovations and monetary incentives suggests that successful financial education depends on more than simply deploying technological solutions or extrinsic rewards. Overall, this review provides valuable insights into the evolving landscape of financial education in a dynamic economic context and underscores the need for sustainable strategies that secure lasting improvements in financial literacy.
Gonzalo Llamosas García, Cristina Mazas Pérez-Oleaga (cristina.mazas@uneatlantico.es)
Background: Before the incorporation of enfortumab vedotin with pembrolizumab, the standard of care for patients with locally advanced or metastatic urothelial carcinoma who do not progress after platinum-based chemotherapy was avelumab maintenance therapy, as demonstrated by the JAVELIN 100 trial. However, real-world European data remain scarce. Patients and Methods: AVEBLADDER is a retrospective study conducted at 14 hospitals in Northern Spain, including patients with locally advanced or metastatic urothelial carcinoma diagnosed between January 2021 and June 2023. Outcomes of overall survival (OS) and progression-free survival (PFS) were analyzed for patients treated with platinum-based chemotherapy, with and without subsequent avelumab maintenance therapy. non-avelumab patients. Median PFS was 11.33 months (95% CI: 10–13.6) with avelumab and 6.43 months (95% CI: 6–7.6) without. One-year OS probabilities were 81.6% vs. 45.6% (p < 0.001) in the avelumab and non-avelumab groups, respectively. No unexpected toxicities were reported. Conclusions: Despite proven survival benefits, avelumab uptake in real-world practice is limited by barriers like access, reimbursement, and awareness. These findings align with JAVELIN 100 and underscore the need for further real-world studies to address treatment disparities.
Marta Sotelo, Mireia Peláez (mireia.pelaez@uneatlantico.es), Laura Basterretxea, Estrella Varga, Ricardo Sánchez-Escribano, Eduardo Pujol Obis, Carmen Santander, Mireia Martínez Kareaga, Mikel Arruti Ibarbia, Inmaculada Rodríguez Ledesma, Carlos Álvarez Fernández, Pablo Piedra, Verónica Calderero Aragón, Nuria Lainez, Juan Antonio Verdún Aguilar, Irene Gil Arnáiz, Ricardo Fernández, Cristina Mazas Pérez-Oleaga (cristina.mazas@uneatlantico.es), Ignacio Duran
Although financial literacy would seem relevant to university students’ education, it is not currently offered as a transversal subject within European academic curricula. It should therefore come as no surprise that a common solution is ad hoc courses, with students often acquiring additional valuable learning through their own experiences in business environments. With this and the recent literature on the drivers of financial literacy in mind, the authors explored the context shaped by socio-demographic, academic and work-related factors that either promote or prevent European university students from developing appropriate financial skills, such as managing personal finances, planning for short- and long-term needs, and distinguishing among different sources of non-traditional funding. The study used a sample of 881 undergraduate and postgraduate university students from Romania, Poland and Spain from different fields of study, with information obtained through an anonymous online survey. The applied econometric model was cumulative regression with location-scale estimation using the R software, version 4.3.2. The variables directly associated with the development of basic financial skills were age, gender and country, as well as specific training and work and entrepreneurial experience. The authors stress the importance of providing financial management education connected to reality, especially the business and entrepreneurial environment.
Inna Alexeeva-Alexeev (inna.alexeeva@uneatlantico.es), Ana Kaminska, Cristina Mazas Pérez-Oleaga (cristina.mazas@uneatlantico.es), Sorin Gabriel Anton
Introduction: Aging is a biological and inevitable phenomenon associated with molecular and cellular damage over time. This process significantly increases the risk of various clinical syndromes, such as frailty and cognitive decline. Consequently, various tools, including physical exercise, have been developed to reduce or prevent these issues in the older population. Objective: The objective of this study is to assess the effectiveness of multicomponent exercise programs in individuals over 65 years of age, focusing on their effects in reducing signs of frailty and cognitive decline. Methods: Following PRISMA guidelines, searches were conducted in four databases: PubMed, Google Scholar, SciELO, and Dialnet, selecting a total of twenty-two articles published between 2014 and 2024. Eight studies were chosen where multicomponent training was used to address frailty and cognitive decline. Results: The results from this systematic review indicate that engaging in a multicomponent exercise program for a minimum duration of 8-12 weeks improves signs of frailty and cognitive decline in older individuals. Conclusions: Multicomponent exercise also appears to be an effective tool in preventing and/or reducing disability, frailty, and cognitive decline.
Andrea Charda Colina, Marta Victoria Santiago García, Susana Pulgar (susana.pulgar@uneatlantico.es)