Tagging Romanian Texts: a Case Study for QTAG, a Language Independent Probabilistic Tagger.

This paper describes an experiment on tagging Romanian using QTAG, a part-of-speech tagger that was originally developed for English, but with a clear separation between the (probabilistic) processing engine and the (language-specific) resource data. As a result, the tagger is usable across various languages, as shown by successful experiments on three quite different languages: English, Swedish and Romanian. After a brief presentation of the QTAG tagger, the paper focuses on language resources for Romanian and the evaluation of the results. A complexity metric for tagging experiments is proposed, which considers the performance of a tagger with respect to the "difficulty" of a text.

Introduction

Lexical ambiguity resolution is a key task in natural language processing (Baayen & Sproat, 1996). It can be regarded as a classification problem: an ambiguous lexical item is one that can be classified differently in different contexts, and given a specified context the disambiguator/classi...
References in zbMATH (referenced in 3 articles)
- Zou, Xuchang; Settimi, Raffaella; Cleland-Huang, Jane: Improving automated requirements trace retrieval: a study of term-based enhancement methods (2010)
- De La Calle, Guillermo; García-Remesal, Miguel; Chiesa, Stefano; De La Iglesia, Diana; Maojo, Victor: BIRI: a new approach for automatically discovering and indexing available public bioinformatics resources from the literature (2009)
- Nadeau, David; Turney, Peter D.: A supervised learning approach to acronym identification (2005)