Stempel - Algorithmic Stemmer for Polish Language.
Conflating the different inflected forms of a word is an important component of many Information Retrieval systems. It helps improve the system's recall and can significantly reduce the index size. This is especially true for highly inflected languages such as those of the Slavic family (Czech, Slovak, Polish, Russian, Bulgarian, etc.). This page describes a software package consisting of high-quality stemming tables for Polish and a universal algorithmic stemmer that operates on these tables. The stemmer code is taken virtually unchanged from the Egothor project. You can download both the Java software distribution and the stemmer tables, which were prepared using an extensive corpus of the Polish language (see details below). This work is available under an Apache-style Open Source license: the stemmer code is covered by the Egothor License, and the tables and other additions are covered by the Apache License 2.0. Both licenses allow the code to be used in Open Source as well as commercial (closed source) projects.
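As a rough illustration of how conflation improves recall and shrinks an index, the following Python sketch maps several inflected forms to one index term. It is not the Stempel or Egothor implementation: the suffix table, the function names `toy_stem` and `build_index`, and the minimum stem length are all hypothetical, and a plain longest-suffix lookup stands in for the real table format, which is not described here.

```python
from collections import defaultdict

# Hypothetical toy table mapping a suffix to its replacement.
# This is NOT the Stempel table data; it only covers a few Polish noun endings.
TOY_TABLE = {
    "ami": "",
    "ach": "",
    "om": "",
    "ą": "",
    "ę": "",
    "a": "",
    "i": "",
}


def toy_stem(word, table=TOY_TABLE, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    # Try longer suffixes first so that "ami" wins over "i".
    for suffix in sorted(table, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: len(word) - len(suffix)] + table[suffix]
    return word  # no rule applied


def build_index(docs):
    """Build an inverted index keyed by stem instead of by surface form."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[toy_stem(token)].add(doc_id)
    return index


# Three documents, two of which use different forms of the same noun.
docs = {1: "książka", 2: "książki", 3: "dom"}
index = build_index(docs)
```

With these toy rules, documents 1 and 2 end up under a single index key, so a query for either form retrieves both, and the index holds two terms instead of three. A real table-driven stemmer learns its rules from a corpus rather than from a hand-written list like this one.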
References in zbMATH (referenced in 1 article)
- Jędrzejewski, Krzysztof; Zamorski, Maurycy: Performance of $k$-nearest neighbors algorithm in opinion classification (2013)