Parallel corpora word alignment and applications

Simões, Alberto
Source: Universidade do Minho. Publisher: Universidade do Minho
Type: Master's dissertation
Published: 2004. Language: Portuguese
Parallel corpora are valuable resources for natural language processing and, in particular, for translation. They can be used not only by translators but can also be analyzed and processed by computers to learn and extract information about the languages involved. This document discusses several processes in the parallel corpora life cycle, focusing on word alignment. The need for a robust word aligner arose with the TerminUM project, whose goal is to gather parallel corpora from different sources and to align, analyze and use them to create bilingual resources, such as terminology or translation memories, for machine translation. The starting point was Twente-Aligner, an open-source word aligner developed by Djoerd Hiemstra. Its results were interesting, but it worked only for small corpora. The work began with the re-engineering of Twente-Aligner, followed by an analysis of the alignment results and the development of several tools based on the extracted probabilistic dictionaries. The re-engineering process was based on formal methods: the algorithms and data structures were formalized, optimized and re-implemented. The timings and alignment results were analyzed. The speed improvement derived from the re-engineering process and the scale-up derived from the alignment by chunks...
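To make the ideas of a probabilistic dictionary and of alignment by chunks more concrete, here is a minimal sketch in Python. It is not the Twente-Aligner algorithm and not the dissertation's implementation: it assumes a plain sentence-level co-occurrence count normalised into probabilities, and all function names, the chunk size, and the three-sentence toy corpus are hypothetical. It only illustrates why counts collected per chunk can be merged into a single dictionary.

```python
from collections import Counter, defaultdict


def count_cooccurrences(pairs):
    """Count, for each source word, the target words seen in the same sentence pair.

    Assumption: simple co-occurrence counting, used here only as a stand-in
    for a real word-alignment model.
    """
    counts = defaultdict(Counter)
    for source, target in pairs:
        target_words = set(target.lower().split())
        for source_word in set(source.lower().split()):
            counts[source_word].update(target_words)
    return counts


def merge_counts(total, partial):
    """Add the counts of one chunk into the running total."""
    for source_word, target_counts in partial.items():
        total[source_word].update(target_counts)
    return total


def chunks(pairs, size):
    """Yield consecutive slices of at most `size` sentence pairs."""
    for start in range(0, len(pairs), size):
        yield pairs[start:start + size]


def to_dictionary(counts):
    """Normalise counts so that each source word's entries sum to 1."""
    dictionary = {}
    for source_word, target_counts in counts.items():
        total = sum(target_counts.values())
        dictionary[source_word] = {
            target_word: n / total for target_word, n in target_counts.items()
        }
    return dictionary


def build_dictionary(pairs, chunk_size=2):
    """Process the corpus chunk by chunk, then normalise the merged counts."""
    total = defaultdict(Counter)
    for chunk in chunks(pairs, chunk_size):
        merge_counts(total, count_cooccurrences(chunk))
    return to_dictionary(total)


# Hypothetical toy corpus (English / Portuguese), for illustration only.
corpus = [
    ("the house", "a casa"),
    ("the cat", "o gato"),
    ("the white house", "a casa branca"),
]

dictionary = build_dictionary(corpus, chunk_size=2)
for target_word, probability in sorted(dictionary["house"].items()):
    print(target_word, round(probability, 2))
```

Because counts are additive, the chunk size in this sketch changes only how much of the corpus is held in memory at once, not the resulting dictionary. A real aligner would replace the counting step with an iterative estimation method.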