A deep learning approach for Named Entity Recognition in Urdu language

Article (open access, English)
Khan, Hikmat Ullah; Anam, Rimsha; Anwar, Muhammad Waqas; Jamal, Muhammad Hasan; Bajwa, Usama Ijaz; Diez, Isabel de la Torre; Silva Alvarado, Eduardo René; Soriano Flores, Emmanuel and Ashraf, Imran (2024) A deep learning approach for Named Entity Recognition in Urdu language. PLOS ONE, 19 (3). e0300725. ISSN 1932-6203
Contact: Eduardo René Silva Alvarado (eduardo.silva@funiber.org); Emmanuel Soriano Flores (emmanuel.soriano@uneatlantico.es). E-mail addresses of the other authors are not specified.

Full text
journal.pone.0300725.pdf (1 MB)
Available under a Creative Commons Attribution license.

Official URL: http://doi.org/10.1371/journal.pone.0300725

Abstract

Named Entity Recognition (NER) is a natural language processing task that has been widely explored for many languages over the past decade, but it remains under-researched for Urdu because of the language's rich morphology and complexity. Existing state-of-the-art Urdu NER studies apply various deep learning approaches that learn features automatically from word embeddings. This paper presents a deep learning approach for Urdu NER that uses FastText and Floret word embeddings, which capture a word's contextual information from its surrounding words, for improved feature extraction. Publicly available pre-trained FastText and Floret embeddings for Urdu are used to generate feature vectors for four benchmark Urdu datasets. These features then serve as input for training various combinations of Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Conditional Random Field (CRF) models. The results show that the proposed approach significantly outperforms existing state-of-the-art Urdu NER studies, achieving an F-score of up to 0.98 with BiLSTM+GRU on Floret embeddings. Error analysis shows low classification error rates, ranging from 1.24% to 3.63% across the datasets, which indicates the robustness of the approach.
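
FastText, and floret, which builds on it, compose a word's vector from the vectors of its character n-grams. This is one reason such embeddings suit a morphologically rich language: an inflected form never seen in training still shares n-grams with known forms. The sketch below illustrates only that composition step, under loudly stated assumptions: n-gram lengths 3 to 6 with `<`/`>` boundary markers (fastText's default for unsupervised training; published pre-trained models may use other settings), `zlib.crc32` in place of fastText's own hash, a hypothetical 1,000-bucket table, and random placeholder vectors instead of the pre-trained Urdu vectors used in the paper. It is not the paper's feature pipeline.

```python
import random
import zlib

DIM = 4         # toy vector size; real pre-trained vectors are much larger
BUCKETS = 1000  # hypothetical, tiny number of hash buckets for this sketch


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list[str]:
    """Return '<word>' plus its character n-grams of length minn..maxn.

    The '<' and '>' markers let prefixes and suffixes produce n-grams
    that differ from word-internal ones.
    """
    wrapped = f"<{word}>"
    grams = [wrapped]
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            gram = wrapped[i:i + n]
            if gram != wrapped:  # the whole token is already in the list
                grams.append(gram)
    return grams


def bucket_vector(gram: str) -> list[float]:
    """Placeholder vector for an n-gram, found through a hash bucket.

    zlib.crc32 stands in for fastText's own hash, and a seeded RNG stands
    in for trained weights, so these values carry no linguistic meaning.
    """
    bucket = zlib.crc32(gram.encode("utf-8")) % BUCKETS
    rng = random.Random(bucket)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def word_vector(word: str) -> list[float]:
    """Average the vectors of all subword units of a word."""
    grams = char_ngrams(word)
    total = [0.0] * DIM
    for gram in grams:
        for k, value in enumerate(bucket_vector(gram)):
            total[k] += value
    return [value / len(grams) for value in total]


if __name__ == "__main__":
    # "Pakistan" and the derived form "Pakistani" share prefix n-grams,
    # so the second word still receives a vector related to the first.
    shared = set(char_ngrams("پاکستان")) & set(char_ngrams("پاکستانی"))
    print(sorted(shared))
    print(word_vector("پاکستانی"))
```

Running it prints the shared n-grams (for example `<پا`) and a four-dimensional placeholder vector. With real pre-trained vectors, the recurrent taggers would receive one such vector per token as input.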

Document type:	Article
Subject classification:	Subjects > Engineering
Divisions:	Universidad Europea del Atlántico > Research > Articles and books; Universidad Internacional Iberoamericana México > Research > Scientific output; Universidad Internacional Iberoamericana Puerto Rico > Research > Scientific output; Universidad Internacional do Cuanza > Research > Scientific output; Fundación Universitaria Internacional de Colombia > Research > Scientific output
Deposited:	30 May 2024 20:51
Last modified:	09 Dec 2024 23:30
URI:	https://repositorio.uneatlantico.es/id/eprint/12369

Enzymatic treatment shapes in vitro digestion pattern of phenolic compounds in mulberry juice

The health benefits of mulberry fruit are closely associated with its phenolic compounds. However, the effects of enzymatic treatments on the digestion patterns of these compounds in mulberry juice remain largely unknown. This study investigated the impact of pectinase (PE), pectin lyase (PL), and cellulase (CE) on the release of phenolic compounds in whole mulberry juice. The digestion patterns were further evaluated using an in vitro simulated digestion model. The results revealed that PE significantly increased chlorogenic acid content by 77.8 %, PL enhanced cyanidin-3-O-glucoside by 20.5 %, and CE boosted quercetin by 44.5 %. Following in vitro digestion, the phenolic compound levels decreased differently depending on the treatment, while cyanidin-3-O-rutinoside content increased across all groups. In conclusion, the selected enzymes effectively promoted the release of phenolic compounds in mulberry juice. However, during gastrointestinal digestion, the degradation of phenolic compounds surpassed their enhanced release, with effects varying based on the compound's structure.

Articles and books

Peihuan Luo, Jian Ai, Qiongyao Wang, Yihang Lou, Zhiwei Liao, Francesca Giampieri (francesca.giampieri@uneatlantico.es), Maurizio Battino (maurizio.battino@uneatlantico.es), Elwira Sieniawska, Weibin Bai, Lingmin Tian

Avelumab maintenance in advanced urothelial carcinoma: real-world data from Northern Spain (AVEBLADDER study)

Background: Before the incorporation of enfortumab vedotin with pembrolizumab, avelumab maintenance therapy, as demonstrated by the JAVELIN 100 trial, was the standard of care for patients with locally advanced or metastatic urothelial carcinoma who do not progress after platinum-based chemotherapy. However, real-world European data remain scarce. Patients and Methods: AVEBLADDER is a retrospective study conducted at 14 hospitals in Northern Spain, including patients with locally advanced or metastatic urothelial carcinoma diagnosed between January 2021 and June 2023. Overall survival (OS) and progression-free survival (PFS) were analyzed for patients treated with platinum-based chemotherapy, with and without subsequent avelumab maintenance therapy. Results: […] non-avelumab patients. Median PFS was 11.33 months (95% CI: 10–13.6) with avelumab and 6.43 months (95% CI: 6–7.6) without. One-year OS probabilities were 81.6% vs. 45.6% (p < 0.001) in the avelumab and non-avelumab groups, respectively. No unexpected toxicities were reported. Conclusions: Despite proven survival benefits, avelumab uptake in real-world practice is limited by barriers such as access, reimbursement, and awareness. These findings align with JAVELIN 100 and underscore the need for further real-world studies to address treatment disparities.

Articles and books

Marta Sotelo, Mireia Peláez (mireia.pelaez@uneatlantico.es), Laura Basterretxea, Estrella Varga, Ricardo Sánchez-Escribano, Eduardo Pujol Obis, Carmen Santander, Mireia Martínez Kareaga, Mikel Arruti Ibarbia, Inmaculada Rodríguez Ledesma, Carlos Álvarez Fernández, Pablo Piedra, Verónica Calderero Aragón, Nuria Lainez, Juan Antonio Verdún Aguilar, Irene Gil Arnáiz, Ricardo Fernández, Cristina Mazas Pérez-Oleaga (cristina.mazas@uneatlantico.es), Ignacio Duran
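
Median PFS and one-year OS figures like those in the AVEBLADDER abstract are typically read off Kaplan-Meier survival curves, although the abstract does not name the estimator used. The following generic sketch of the Kaplan-Meier product-limit estimator shows how such numbers arise from censored follow-up data: at each event time, the running survival estimate is multiplied by (1 - events / patients still at risk). The helper names and the follow-up times in the example are hypothetical and unrelated to the study cohort.

```python
from typing import List, Optional, Sequence, Tuple

Step = Tuple[float, float]  # (event time, estimated survival just after it)


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Step]:
    """Kaplan-Meier product-limit estimate of the survival function S(t).

    times  -- follow-up time per patient (e.g. months)
    events -- True if progression/death was observed, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps: List[Step] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Process every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= removed  # events and censored patients both leave the risk set
    return steps


def survival_at(steps: List[Step], t: float) -> float:
    """S(t): value of the last step at or before t (1.0 before any event)."""
    value = 1.0
    for time, prob in steps:
        if time > t:
            break
        value = prob
    return value


def median_survival(steps: List[Step]) -> Optional[float]:
    """Smallest event time at which S(t) <= 0.5, or None if never reached."""
    for time, prob in steps:
        if prob <= 0.5:
            return time
    return None


if __name__ == "__main__":
    # Hypothetical months of follow-up; False marks a censored patient.
    months = [3, 5, 5, 8, 10, 12, 14, 18]
    progressed = [True, True, False, True, False, True, True, False]
    curve = kaplan_meier(months, progressed)
    print(curve)
    print("S(12) =", survival_at(curve, 12))
    print("median =", median_survival(curve))
```

For these hypothetical data the curve steps down at months 3, 5, 8, 12 and 14 and reaches about 0.4 at month 12, so the median is 12 months. Confidence intervals such as the 95% CIs quoted above need an additional variance formula (for example Greenwood's), which this sketch omits.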

More than Socio- and Geo-demographics: How Complementary Education and Business Experience Shape Students' Financial Behaviour in Europe

Although financial literacy would seem relevant to university students’ education, it is not currently offered as a transversal subject in European academic curricula. It is therefore unsurprising that a common solution is specific ad hoc courses, with students often acquiring additional valuable learning through their own experience in business environments. With this in mind, and drawing on recent literature on the drivers of financial literacy, the authors explore the socio-demographic, academic, and work-related factors that either promote or prevent European university students from developing appropriate financial skills, such as managing personal finances, planning for short- and long-term needs, and distinguishing among different sources of non-traditional funding. The study used a sample of 881 undergraduate and postgraduate students from Romania, Poland, and Spain, enrolled in different fields of study, with data obtained through an anonymous online survey. The econometric model applied was a cumulative regression with location-scale estimation, fitted in R version 4.3.2. The variables directly associated with the development of basic financial skills were age, gender, and country, as well as specific training and work and entrepreneurial experience. The authors stress the importance of financial management education connected to real-world practice, especially the business and entrepreneurial environment.

Articles and books

Inna Alexeeva-Alexeev (inna.alexeeva@uneatlantico.es), Ana Kaminska, Cristina Mazas Pérez-Oleaga (cristina.mazas@uneatlantico.es), Sorin Gabriel Anton
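
A cumulative (ordinal) regression with location-scale estimation, as used in the financial-behaviour study above, models an ordered response, such as a financial-skill rating, through a latent variable whose location depends on predictors x and whose spread depends on predictors z: P(Y ≤ j) = F((θ_j − xᵀβ) / exp(zᵀζ)). In R this model is commonly fitted with `clm()` from the `ordinal` package via its `scale` argument, although the abstract does not say which package was used. The Python sketch below only evaluates category probabilities for given parameters with a logit link; the function names, thresholds, coefficients, and predictor values are hypothetical, not estimates from the study.

```python
import math
from typing import List, Sequence


def logistic(value: float) -> float:
    """Inverse logit link F."""
    return 1.0 / (1.0 + math.exp(-value))


def category_probs(
    thresholds: Sequence[float],  # increasing cut-points theta_1 < ... < theta_{J-1}
    x: Sequence[float],           # location predictors
    beta: Sequence[float],        # location coefficients
    z: Sequence[float],           # scale predictors
    zeta: Sequence[float],        # scale coefficients
) -> List[float]:
    """P(Y = j) for j = 1..J under a cumulative logit location-scale model.

    P(Y <= j) = logistic((theta_j - x.beta) / exp(z.zeta)); category
    probabilities are differences of consecutive cumulative probabilities.
    """
    location = sum(xi * bi for xi, bi in zip(x, beta))
    scale = math.exp(sum(zi * gi for zi, gi in zip(z, zeta)))
    cumulative = [logistic((theta - location) / scale) for theta in thresholds]
    cumulative.append(1.0)  # P(Y <= J) = 1
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    return probs


if __name__ == "__main__":
    # Hypothetical 4-level skill rating with a centred location (x.beta = 0).
    cuts = [-1.0, 0.0, 1.0]
    baseline = category_probs(cuts, x=[0.0], beta=[0.5], z=[0.0], zeta=[1.0])
    wider = category_probs(cuts, x=[0.0], beta=[0.5], z=[2.0], zeta=[1.0])
    print([round(p, 3) for p in baseline])  # symmetric around the middle
    print([round(p, 3) for p in wider])     # more mass in the extreme categories
```

In the example, increasing the scale term keeps the symmetric distribution centred but moves probability into the extreme categories. This is how a location-scale model can represent groups that differ in the dispersion of their ratings, not only in their average level.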

A novel machine learning-based proposal for early prediction of endometriosis disease

Background: Endometriosis is one of the causes of female infertility, with some studies estimating its prevalence at around 10 % of reproductive-age women worldwide and between 30 and 50 % in symptomatic women. However, its diagnosis is complex and often delayed, highlighting the need for more accessible and accurate diagnostic methods. The difficulty lies in its diverse etiology and the variability of symptoms among those affected. Methods: This study proposes a predictive model based on supervised machine learning for the early identification of endometriosis, providing support for decision-making by healthcare professionals. For this purpose, an anonymised dataset of 5,143 female patients diagnosed with endometriosis at the private fertility clinic Inebir was used. The model integrates clinical records and genetic analysis through supervised machine learning algorithms, focusing on clinical variables and pathogenic and potentially pathogenic genetic variants. Results: The developed predictive model achieves high accuracy in identifying the presence of endometriosis, highlighting the importance of combining clinical and genetic data in diagnosis. The integration of this data into the DELFOS platform, a clinical decision support system, demonstrates the utility of machine learning in improving the diagnosis of endometriosis. Conclusions: The findings underscore the potential of clinical and genetic factors in the early diagnosis of endometriosis using supervised machine learning algorithms. This study contributes to the classification of clinical variables that influence endometriosis, offering a valuable tool for clinicians in making therapeutic and management decisions for their female patients.

Articles and books

Elena Enamorado-Díaz, Leticia Morales-Trujillo, Julián-Alberto García-García, Ana Teresa Marcos Rodríguez (anateresa.marcos@uneatlantico.es), José Manuel Navarro-Pando (jose.navarro@uneatlantico.es), María-José Escalona-Cuaresma

Detecting hate in diversity: a survey of multilingual code-mixed image and video analysis

The proliferation of harmful content on social media in today’s digital environment has increased the need for efficient hate speech detection systems. This paper presents a thorough examination of hate speech detection methods in a variety of settings, including code-mixed, multilingual, visual, audio, and textual scenarios. Unlike previous research focusing on single modalities, this study examines hate speech detection across multiple forms. We classify the many types of hate speech, show how they appear on different platforms, and highlight the specific difficulties of multimodal and multilingual settings. We address research gaps by assessing a variety of methods, including deep learning, machine learning, and natural language processing, especially for complex data such as code-mixed and cross-lingual text. We also provide key comparisons of techniques, acknowledging their benefits and drawbacks, and suggest future research directions that prioritize multimodal analysis and ethical data handling. This study aims to support scholarly research and real-world applications on social media platforms by serving as an essential resource for improving hate speech detection across diverse data sources.

Articles and books

Hafiz Muhammad Raza Ur Rehman, Mahpara Saleem, Muhammad Zeeshan Jhandir, Eduardo René Silva Alvarado (eduardo.silva@funiber.org), Helena Garay (helena.garay@uneatlantico.es), Imran Ashraf
