A Low-Resourced Peruvian Language Identification Model

Linares A.E.; Oncevay-Marcos A.

Date

2017

Authors

Linares A.E.

Oncevay-Marcos A.

Publisher

CEUR-WS

Abstract

Owing to the linguistic revitalization in Peru over recent years, there is growing interest in reinforcing bilingual education in the country and in increasing research focused on its native languages. From a computer science perspective, one of the first steps in supporting the study of these languages is implementing an automatic language identification tool using machine learning methods. This work therefore focuses on two steps: (1) building a digital, annotated corpus for 16 Peruvian native languages, extracted from documents in web repositories, and (2) fitting a supervised learning model for the language identification task, using features identified in related state-of-the-art studies, such as n-grams. The results obtained were promising (an average precision of 97%), and the corpus and the model are expected to be used for more complex tasks in the future.
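
To make the idea of n-gram features for supervised language identification concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the classifier or the exact feature set, so the multinomial naive Bayes model, the character n-gram range of 1 to 3, and the names `char_ngrams` and `NgramNaiveBayes` are all assumptions made for illustration. The training sentences are English and Spanish placeholders, not samples from the paper's corpus.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return character n-grams of length n_min..n_max from padded, lowercased text."""
    padded = f" {text.lower().strip()} "
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-gram counts (assumed model choice)."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter(labels)   # label -> number of training texts
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.counts.items():
            # Log prior plus add-one smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / n_docs)
            for gram in char_ngrams(text):
                score += math.log(
                    (counts[gram] + 1) / (self.totals[label] + vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy placeholder data (English and Spanish), NOT the paper's corpus.
train_texts = [
    "the house is near the river",
    "we are going to the market today",
    "la casa está cerca del río",
    "hoy vamos al mercado con la familia",
]
train_labels = ["en", "en", "es", "es"]

model = NgramNaiveBayes().fit(train_texts, train_labels)
print(model.predict("the river is cold"))
print(model.predict("el río está frío"))
```

Character n-grams are a common choice for this task because they need no tokenizer or dictionary, which matters when little annotated text is available for a language. Applying the same sketch to 16 languages would only require adding labeled sentences for each one.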

Keywords

Learning systems, Big data, Education, Information management, Automatic language identification, Bilingual education, Complex task

URI

https://hdl.handle.net/20.500.12390/488

Collections

1.1 Institutional events
6.1 Scientific research projects
