SIIA Público

Book title: Annual Conference of the North American Fuzzy Information Processing Society - NAFIPS
Chapter title: Computing text similarity using Tree Edit Distance

UNAM authors:
HELENA MONTSERRAT GOMEZ ADORNO;
External authors:

Language:

Year of publication:
2015
Keywords:

Cost functions; Forestry; Heuristic algorithms; Information retrieval; Information science; Modeling languages; Natural language processing systems; Semantics; Syntactics; Text processing; Vector spaces; Calculation of similarities; Computational model; Dependency trees; Natural language processing; Research fields; Text similarity; Tree edit distance; Vector space models; Computational linguistics


Abstract:

In this paper, we propose applying the Tree Edit Distance (TED) to calculate the similarity between syntactic n-grams, which is then used to detect soft similarity between texts. Computing text similarity is a basic task in many natural language processing problems, and it remains an open research field. Syntactic n-grams are text features, extracted from dependency trees, that are used to construct a Vector Space Model. Soft similarity is an application of the Vector Space Model that takes the similarity between features into account. First, we discuss the advantages of applying the TED to syntactic n-grams. Then, we present a procedure based on the TED and syntactic n-grams for calculating soft similarity between texts. © 2015 IEEE.
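
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the record does not give their cost functions, normalization, or similarity formula. The sketch assumes unit edit costs, a simple recursive (memoized) ordered tree edit distance, a hypothetical normalization `1 - TED / (n1 + n2)` to turn the distance into a feature similarity, and a soft-cosine-style formula for comparing two texts. All function names (`tree`, `ted`, `similarity`, `soft_cosine`) are illustrative.

```python
import math
from functools import lru_cache


def tree(label, *children):
    """Build a hashable ordered tree: (label, tuple_of_child_trees)."""
    return (label, tuple(children))


def size(t):
    """Number of nodes in a tree."""
    return 1 + sum(size(child) for child in t[1])


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests (tuples of trees), unit costs."""
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children become roots.
        return forest_distance(f, g[:-1] + g[-1][1]) + 1
    if not g:
        # Delete the rightmost root of f; its children become roots.
        return forest_distance(f[:-1] + f[-1][1], g) + 1
    (label_v, kids_v), (label_w, kids_w) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + kids_v, g) + 1
    insert = forest_distance(f, g[:-1] + kids_w) + 1
    # Match the two rightmost roots (cost 1 if the labels differ).
    match = (forest_distance(kids_v, kids_w)
             + forest_distance(f[:-1], g[:-1])
             + (label_v != label_w))
    return min(delete, insert, match)


def ted(t1, t2):
    """Tree edit distance between two ordered labeled trees."""
    return forest_distance((t1,), (t2,))


def similarity(t1, t2):
    """Assumed normalization: 1.0 for identical trees, never below 0.0."""
    return 1.0 - ted(t1, t2) / (size(t1) + size(t2))


def soft_dot(a, b):
    """Dot product of two feature-weight dicts, weighted by feature similarity."""
    return sum(wa * wb * similarity(fa, fb)
               for fa, wa in a.items() for fb, wb in b.items())


def soft_cosine(a, b):
    """Soft-cosine-style similarity between two texts given as {tree: weight}."""
    denom = math.sqrt(soft_dot(a, a) * soft_dot(b, b))
    return soft_dot(a, b) / denom if denom else 0.0


# Two texts, each with one syntactic-n-gram-like feature (head -> dependent).
text_a = {tree("eat", tree("apple")): 1.0}
text_b = {tree("eat", tree("pear")): 1.0}
score = soft_cosine(text_a, text_b)
```

In this toy example the two texts share no identical feature, so an ordinary cosine over exact feature matches would be zero, while the soft variant still gives them a nonzero score because their features are one relabeling apart.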


Cited UNAM entities: