SIIA Público

Book title: Proceedings of the 18th International Workshop on Semantic Evaluation, SemEval-2024
Chapter title: iimasNLP at SemEval-2024 Task 8: Unveiling structure-aware language models for automatic generated text identification

UNAM authors:
HELENA MONTSERRAT GOMEZ ADORNO; GEMMA BEL ENGUIX;
External authors:

Language:

Year of publication:
2024
Keywords:

Classification (of information); Computational linguistics; Computer aided language translation; Artificial intelligence systems; Classification tasks; Human like; Language levels; Language model; Lexical semantics; Multilevels; Structure-aware; Text identification; Writing style; Semantics


Abstract:

Large language models (LLMs) are artificial intelligence systems that can generate text, translate languages, and answer questions in a human-like way. While these advances are impressive, there is concern that LLMs could also be used to generate fake or misleading content. In this work, as part of our participation in SemEval-2024 Task 8, we investigate the ability of LLMs to identify whether a given text was written by a human or by a specific AI. We believe that human and machine writing styles follow different patterns, so integrating features at different language levels can help in this classification task. For this reason, we evaluate several LLMs that aim to extract valuable multilevel information (such as lexical, semantic, and syntactic) from the text during their training process. Our best scores on Subtask A (monolingual) and Subtask B were 71.5% and 38.2% in accuracy, respectively (both using the ConvBERT LLM); for both subtasks, the baseline (RoBERTa) achieved an accuracy of 74%.


UNAM entities cited: