An open-source toolkit for mining Wikipedia
The online encyclopedia Wikipedia is a vast, constantly evolving tapestry of interlinked articles. For developers and researchers it represents a giant multilingual database of concepts and semantic relations, a potential resource for natural language processing and many other research areas. This paper introduces the Wikipedia Miner toolkit, an open-source software system that allows researchers and developers to integrate Wikipedia's rich semantics into their own applications. The toolkit creates databases that contain summarized versions of Wikipedia's content and structure, and includes a Java API to provide access to them. Wikipedia's articles, categories and redirects are represented as classes, and can be efficiently searched, browsed, and iterated over. Advanced features include parallelized processing of Wikipedia dumps, machine-learned semantic relatedness measures and annotation features, and XML-based web services. Wikipedia Miner is intended to be a platform for sharing data mining techniques.
References in zbMATH (referenced in 10 articles, 1 standard article)
- Jiang, Yuncheng: A formal model of semantic computing (2019)
- Dinov, Ivo D.; Siegrist, Kyle; Pearl, Dennis K.; Kalinin, Alexandr; Christou, Nicolas: Probability Distributome: a web computational infrastructure for exploring the properties, interrelations, and applications of probability distributions (2016)
- Flati, Tiziano; Vannella, Daniele; Pasini, Tommaso; Navigli, Roberto: MultiWiBi: the multilingual Wikipedia bitaxonomy project (2016)
- Astrakhantsev, N. A.; Fedorenko, D. G.; Turdakov, D. Yu.: Methods for automatic term recognition in domain-specific text collections: A survey (2015) ioport
- David Milne; Ian H. Witten: An open-source toolkit for mining Wikipedia (2013) not zbMATH
- Milne, David; Witten, Ian H.: An open-source toolkit for mining Wikipedia (2013) ioport
- Navigli, Roberto; Ponzetto, Simone Paolo: BabelNet: the automatic construction, evaluation and application of a wide-coverage multilingual semantic network (2012)
- Medelyan, Olena; Milne, David; Legg, Catherine; Witten, Ian H.: Mining meaning from Wikipedia (2009) ioport
- Wang, Pu; Hu, Jian; Zeng, Hua-Jun; Chen, Zheng: Using Wikipedia knowledge to improve text classification (2009) ioport
- Medelyan, Olena; Legg, Catherine; Milne, David N.; Witten, Ian H.: Mining meaning from Wikipedia (2008) ioport