Contextual Urdu Lemmatization Using Recurrent Neural Network Models

Article | Open access | English

Hafeez, Rabab; Anwar, Muhammad Waqas; Jamal, Muhammad Hasan; Fatima, Tayyaba; Martínez Espinosa, Julio César; Dzul López, Luis Alonso; Bautista Thompson, Ernesto and Ashraf, Imran (2023) Contextual Urdu Lemmatization Using Recurrent Neural Network Models. Mathematics, 11 (2). p. 435. ISSN 2227-7390

Author emails (where specified): ulio.martinez@unini.edu.mx, luis.dzul@uneatlantico.es, ernesto.bautista@unini.edu.mx

Full text: mathematics-11-00435.pdf (1MB)
Available under License Creative Commons Attribution.

Official URL: http://doi.org/10.3390/math11020435

Abstract

In natural language processing, machine translation is a rapidly developing research area that helps humans communicate more effectively by bridging the linguistic gap. In machine translation, normalization and morphological analysis are the first and perhaps the most important modules for information retrieval (IR). To build a morphological analyzer, or to complete the normalization process, it is important to extract the correct root from different word forms. Stemming and lemmatization are the techniques commonly used to find the correct root words in a language. A few studies on IR systems for the Urdu language have shown that lemmatization is more effective than stemming because of the infixes found in Urdu words; however, lemmatization techniques for resource-scarce languages such as Urdu are not very common. This paper presents a lemmatization algorithm for the Urdu language based on recurrent neural network models. The proposed model is trained and tested on two datasets, namely, the Urdu Monolingual Corpus (UMC) and the Universal Dependencies Corpus of Urdu (UDU). Both datasets are lemmatized using recurrent neural network models. The Word2Vec model and edit trees are used to generate semantic and syntactic embeddings. Bidirectional long short-term memory (BiLSTM), bidirectional gated recurrent unit (BiGRU), bidirectional gated recurrent neural network (BiGRNN), and attention-free encoder-decoder (AFED) models are trained under defined hyperparameters. Experimental results show that the attention-free encoder-decoder model achieves an accuracy, precision, recall, and F-score of 0.96, 0.95, 0.95, and 0.95, respectively, and outperforms existing models.
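
The abstract mentions edit trees without describing how they are built. The following is a minimal sketch of the general edit-tree idea, not the authors' implementation: a word and its lemma are split around their longest common substring, and the mismatched prefix and suffix are handled recursively until only a literal replacement remains. The function names `build_edit_tree` and `apply_edit_tree` and the tuple layout are assumptions made for illustration, and toy English pairs are used only for readability.

```python
from difflib import SequenceMatcher


def build_edit_tree(word, lemma):
    """Build a nested edit tree that rewrites `word` into `lemma`."""
    matcher = SequenceMatcher(None, word, lemma, autojunk=False)
    m = matcher.find_longest_match(0, len(word), 0, len(lemma))
    if m.size == 0:
        # No shared characters: the leaf is a literal replacement.
        return ("replace", word, lemma)
    left = build_edit_tree(word[:m.a], lemma[:m.b])
    right = build_edit_tree(word[m.a + m.size:], lemma[m.b + m.size:])
    # Store only the lengths around the shared span, so the tree can be
    # reused on other words that follow the same pattern.
    suffix_len = len(word) - (m.a + m.size)
    return ("split", m.a, suffix_len, left, right)


def apply_edit_tree(tree, word):
    """Apply an edit tree to `word`; return None if it does not fit."""
    if tree[0] == "replace":
        _, src, dst = tree
        return dst if word == src else None
    _, prefix_len, suffix_len, left, right = tree
    if prefix_len + suffix_len > len(word):
        return None
    middle_end = len(word) - suffix_len
    head = apply_edit_tree(left, word[:prefix_len])
    tail = apply_edit_tree(right, word[middle_end:])
    if head is None or tail is None:
        return None
    return head + word[prefix_len:middle_end] + tail


# A tree learned from one pair generalizes to words with the same pattern.
tree = build_edit_tree("walking", "walk")
print(apply_edit_tree(tree, "talking"))  # talk
print(apply_edit_tree(tree, "sang"))     # None: the "-ing" rule does not fit
```

Because the tree stores lengths rather than the shared characters themselves, many word forms map to the same tree, which is what makes such trees usable as class labels or as input to an embedding.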

Document Type:	Article
Keywords:	neural networks; natural language processing; inflectional morphology; derivational morphology; MSC: 68T50
Subject classification:	Subjects > Engineering
Divisions:	Universidad Europea del Atlántico > Research > Articles and books; Fundación Universitaria Internacional de Colombia > Research > Scientific Output; Universidad Internacional Iberoamericana México > Research > Scientific Output; Universidad Internacional Iberoamericana Puerto Rico > Research > Scientific Output; Universidad Internacional do Cuanza > Research > Scientific Output
Deposited:	01 Feb 2023 23:30
Last Modified:	21 Oct 2024 23:30
URI:	https://repositorio.uneatlantico.es/id/eprint/5660

open

Effects of a 12-week multicomponent exercise programme on physical function in older adults with cancer: Study protocol for the ONKO-FRAIL randomised controlled trial

Introduction: Cancer in older adults is often associated with functional limitations, geriatric syndromes, poor self-rated health, vulnerability, and frailty, and these conditions might worsen treatment-related side effects. Recent guidelines for patients with cancer during and after treatment have documented the beneficial effects of exercise to counteract certain side effects; however, little is known about the role of exercise during cancer treatment in older adults.

Materials and Methods: This is a multicentre randomised controlled trial in which 200 participants will be allocated to a control group or an intervention group (the sample size has been calculated to detect a clinical difference of 1 point in Short Physical Performance Battery (SPPB) score, assuming an α error of 0.05, a β error of 0.20, and a 10% loss rate). Patients aged ≥70 years, diagnosed with any type of solid cancer and candidates for systemic treatment are eligible. Subjects in the intervention group are invited to participate in a 12-week supervised multicomponent exercise programme in addition to receiving usual care. Study assessments are conducted at baseline and three months. The primary outcome measure is physical function as assessed by the SPPB. Secondary outcome measures include comprehensive geriatric assessment scores (including social situation, basic and instrumental activities of daily living, cognitive function, depression, nutritional status, polypharmacy, geriatric syndromes, pain, and emotional distress), anthropometric characteristics, frailty status, physical fitness, physical activity, cognitive function, quality of life, fatigue, and nutritional status. Study assessments also include analysis of inflammatory, endocrine, and nutritional mediators in serum and plasma as potential frailty biomarkers at mRNA and protein levels and multiparametric flow cytometric analysis to measure immunosenescence markers on T and NK cells.

Discussion: This study seeks to extend our knowledge on exercise interventions during systemic anticancer treatment in patients over 70 years of age. Results from this research will guide the management of older adults during systemic treatment in hospitals seeking to enhance the standard of care.

Articles and books

Julia García-García, Ana Rodriguez-Larrad, Maren Martinez de Rituerto Zeberio, Jenifer Gómez Mediavilla, Borja López-San Vicente, Nuria Torrego Artola, Izaskun Zeberio Etxetxipia, Irati Garmendia, Ainhoa Alberro, David Otaegui, Francisco Borrego Rabasco, María M. Caffarel, Kalliopi Vrotsou, Jon Irazusta, Haritz Arrieta, Mireia Peláez (mireia.pelaez@uneatlantico.es), Jon Belloso, Laura Basterretxea

open

Benchmarking multiple instance learning architectures from patches to pathology for prostate cancer detection and grading using attention-based weak supervision

Histopathological evaluation is necessary for the diagnosis and grading of prostate cancer, which is still one of the most common cancers in men globally. Traditional evaluation is time-consuming, prone to inter-observer variability, and challenging to scale. The clinical usefulness of current AI systems is limited by the need for comprehensive pixel-level annotations. The objective of this research is to conduct a large-scale benchmarking study of a weakly supervised deep learning framework that minimizes the need for annotation and ensures interpretability for automated prostate cancer diagnosis and International Society of Urological Pathology (ISUP) grading using whole slide images (WSIs). This study rigorously tested six cutting-edge multiple instance learning (MIL) architectures (CLAM-MB, CLAM-SB, ILRA-MIL, AC-MIL, AMD-MIL, WiKG-MIL), three feature encoders (ResNet50, CTransPath, UNI2), and four patch extraction techniques (varying sizes and overlap) using the PANDA dataset (10,616 WSIs), yielding 72 experimental configurations. The methodology used distributed cloud computing to process over 31 million tissue patches, implementing advanced attention mechanisms to ensure clinical interpretability through Grad-CAM visualizations. The optimum configuration (UNI2 encoder with ILRA-MIL, 256 × 256 patches, 50% overlap) achieved 78.75% accuracy and 90.12% quadratic weighted kappa (QWK), outperforming traditional methods and approaching expert pathologist-level diagnostic capability. Overlapping smaller patches offered the best balance of spatial resolution and contextual information, while domain-specific foundation models performed noticeably better than generic encoders. This work is the first large-scale, comprehensive comparison of weakly supervised MIL methods for prostate cancer diagnosis and grading. The proposed approach has excellent clinical diagnostic performance, scalability, practical feasibility through cloud computing, and interpretability using visualization tools.
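
As a rough illustration of two figures in this abstract, the sketch below enumerates the 6 × 3 × 4 = 72 experimental configurations and shows one common definition of quadratic weighted kappa (QWK). It is not taken from the study: the abstract does not name the four patch extraction settings, so the `patch_cfg_*` labels are hypothetical placeholders, and the function `quadratic_weighted_kappa` is a plain-Python assumption about how such a score could be computed for integer grade labels.

```python
from itertools import product

architectures = ["CLAM-MB", "CLAM-SB", "ILRA-MIL", "AC-MIL", "AMD-MIL", "WiKG-MIL"]
encoders = ["ResNet50", "CTransPath", "UNI2"]
# Hypothetical placeholders: the abstract only says sizes and overlap vary.
patch_settings = ["patch_cfg_1", "patch_cfg_2", "patch_cfg_3", "patch_cfg_4"]

configs = list(product(architectures, encoders, patch_settings))
print(len(configs))  # 72


def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """QWK for integer labels in range(n_classes).

    Undefined (division by zero) if every label falls into a single class.
    """
    n = len(y_true)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes))
                 for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Disagreements are penalized by the squared grade distance.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * hist_true[i] * hist_pred[j] / n
    return 1.0 - weighted_observed / weighted_expected


# Perfect agreement on six grade levels gives a kappa of 1.0.
print(quadratic_weighted_kappa([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], 6))  # 1.0
```

The quadratic weighting is why QWK suits ordinal grading: predicting a grade one step away from the reference costs far less than predicting one several steps away.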

Articles and books

Naveed Anwer Butt, Dilawaiz Sarwat, Irene Delgado Noya (irene.delgado@uneatlantico.es), Kilian Tutusaus (kilian.tutusaus@uneatlantico.es), Nagwan Abdel Samee, Imran Ashraf

open

Securing internet of things devices using a hybrid approach

As the number of Internet of Things (IoT) devices increases, managing complexity and protection becomes more challenging. Lightweight cryptographic algorithms are secure and suitable for limited-resource environments; however, their hash functions provide encrypted data but not integrity. Strong security features are available, but setup is difficult and expensive. Network security mechanisms increase power consumption and latency. As IoT networks grow, managing cryptographic keys and securely authenticating large numbers of devices become complex tasks. Efficient key management strategies are required to ensure the necessary scalability. Existing state-of-the-art solutions lack standardization and scalability and are complex and costly. Thus, this research proposes a secure solution for resource-constrained IoT devices that combines strong data integrity with lightweight encryption and is therefore called a hybrid. This hybrid approach integrates SHA-512 and the PRESENT cipher, thus ensuring higher security than state-of-the-art models. This combination not only enhances the algorithm's resistance against cryptographic attacks but also improves its processing speed. The proposed approach is used to reduce the processing time for encryption in the IoT platform and to preserve the trade-off between security and efficiency. In terms of memory use, execution time, and precision, the proposed approach is compared with recent state-of-the-art research. The experimental results indicate that our approach is efficient as measured by the avalanche effect, authentication success rate, collision events, and execution time. The efficiency is 53% to 65%, and the avalanche effect indicates sensitivity to input variations, suggesting moderate-to-considerable reactivity to small data changes. The experimental tests conducted across 10,000 and 80,000 runs reveal no collisions and show that the proposed approach is resilient in managing unique IDs. Moreover, our approach performs consistently, with an average execution time of 0.088246 s, ranging from 0.075954 to 0.094583 s. Finally, our approach provides a practical and scalable solution for securing IoT devices in resource-constrained environments, addressing practical problems for IoT devices.
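
The avalanche effect mentioned in this abstract can be measured by flipping a single input bit and counting how many output bits change. The sketch below does this for SHA-512 alone, using Python's standard `hashlib`; it does not reproduce the authors' hybrid scheme or the PRESENT cipher, and the helper names `flip_bit`, `hamming_distance`, and `avalanche_ratio`, as well as the sample message, are assumptions made for illustration.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def avalanche_ratio(message: bytes, bit_index: int) -> float:
    """Fraction of SHA-512 output bits that change after a one-bit flip."""
    original = hashlib.sha512(message).digest()
    modified = hashlib.sha512(flip_bit(message, bit_index)).digest()
    return hamming_distance(original, modified) / (len(original) * 8)


# Hypothetical device identifier, used only as sample input.
message = b"sensor-node-0001"
ratios = [avalanche_ratio(message, i) for i in range(len(message) * 8)]
print(f"mean avalanche ratio: {sum(ratios) / len(ratios):.3f}")
```

For a well-behaved hash function the mean ratio is expected to sit close to 0.5, meaning that roughly half of the output bits change for any single-bit change in the input.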

Articles and books

R. Sherine Jenny, N. Sugirtham, B. Thiyaneswaran, S. Kumarganesh, Martin Sagayam K., Syed Immamul Ansarullah, Farhan Amin, Isabel de la Torre Díez, Carlos Manuel Osorio García (carlos.osorio@uneatlantico.es), Alina Eugenia Pascual Barrera (alina.pascual@unini.edu.mx)

open

Constraint of Lignin–Carbohydrate Complex Orchestrated on Polyphenol in Oil–Water Interface Targeting Ulcerative Colitis Therapy

Polyphenols have therapeutic potential in ulcerative colitis (UC), mediated through immune modulation and gut microbiota homeostasis. To enhance the oral bioavailability of polyphenols, we designed, for the first time, a colon-targeted W1/O/W2 emulsion system featuring a rationally designed lignin–carbohydrate complex (LCC) as a dual-functional emulsifier system. Based on the innate structural duality of LCC, which comprises hydrophobic lignin and hydrophilic carbohydrates, we employed LCC as an O/W emulsifier. This inherent amphiphilicity was further engineered via laccase-mediated grafting of isovanillin, yielding a modified LCC with tailored lipophilicity for effective W/O interfacial stabilization. The W1/O/W2 emulsion not only ensured the stability of encapsulated polyphenols of divergent polarity but also enabled pH-responsive payload release under colonic conditions (pH >7.0). In DSS-induced colitis, the system demonstrated a synergistic effect: the LCC itself acted as a prebiotic to modulate the gut microbiota, specifically enriching short-chain fatty acid-producing bacteria, while the released polyphenols reinforced the intestinal barrier, which collectively accelerated mucosal healing. This research proposes a carbon-neutral therapeutic strategy for colitis, not only establishing a proof of concept for replacing synthetic emulsifiers with engineered biomass, but also providing a multi-functional platform for stabilizing colon-targeted co-delivery systems and regulating the microbiome in colitis.

Articles and books

Qian Wu, Xingyu Zhang, Jingjia Zhang, Gaohui Huang, Chen Zhou, Chunlin Li, Xiaojun Huang, Jianbo Xiao, Nianjie Feng, Yuanbin She

open

A Systematic Literature Review on Integrated Deep Learning and Multi-Agent Vision-Language Frameworks for Pathology Image Analysis and Report Generation

This systematic literature review (SLR) investigates the integration of deep learning (DL), vision-language models (VLMs), and multi-agent systems in the analysis of pathology images and automated report generation. The rapid advancement of whole-slide imaging (WSI) technologies has posed new challenges in pathology, especially due to the scale and complexity of the data. DL techniques in general and convolutional neural networks (CNNs) and transformers in particular have significantly enhanced image analysis tasks including segmentation, classification, and detection. However, these models often lack generalizability to generate coherent, clinically relevant text, thus necessitating the integration of VLMs and large language models (LLMs). This review examines the effectiveness of VLMs and LLMs in bridging the gap between visual data and clinical text, focusing on their potential for automating the generation of pathology reports. Additionally, multi-agent systems, which leverage specialized artificial intelligence (AI) agents to collaboratively perform diagnostic tasks, are explored for their contributions to improving diagnostic accuracy and scalability. Through a synthesis of recent studies, this review highlights the successes, challenges, and future directions of these AI technologies in pathology diagnostics, offering a comprehensive foundation for the development of integrated, AI-driven diagnostic workflows.

Articles and books

Usama Ali, Imran Shafi, Jamil Ahmad, Arlette Zárate Cáceres, Thania Chio Montero, Hafiz Muhammad Raza ur Rehman, Imran Ashraf
