QTAG

Tagging Romanian Texts: a Case Study for QTAG, a Language Independent Probabilistic Tagger

This paper describes an experiment on tagging Romanian using QTAG, a part-of-speech tagger that was originally developed for English, but with a clear separation between the (probabilistic) processing engine and the (language-specific) resource data. This separation makes the tagger usable across languages, as shown by successful experiments on three quite different languages: English, Swedish and Romanian. After a brief presentation of the QTAG tagger, the paper discusses the language resources for Romanian and the evaluation of the results. A complexity metric for tagging experiments is proposed, which considers the performance of a tagger with respect to the "difficulty" of a text.

Introduction

Lexical ambiguity resolution is a key task in natural language processing (Baayen & Sproat, 1996). It can be regarded as a classification problem: an ambiguous lexical item is one that can be classified differently in different contexts and, given a specified context, the disambiguator /classi...