MedPost: A part-of-speech tagger for BioMedical text.

MedPost was originally developed by Larry Smith, Tom Rindflesch, and W. John Wilbur of the National Center for Biotechnology Information (NCBI) [Smith, Wilbur] and the Lister Hill National Center for Biomedical Communications (LHNCBC) [Rindflesch]. MedPost is currently written in a combination of C++ and Perl. The tagger is described in: MedPost: A Part of Speech Tagger for BioMedical Text. Smith et al., Bioinformatics, 2004.

The MedPost/SKR POS Tagger is a Java implementation of MedPost, formulated specifically for work on Semantic Knowledge Representation (SKR). MedPost/SKR has modified functionality and produces only SPECIALIST lexicon tags, but its base algorithms are the same as those of MedPost.

MedPost is a stochastic part-of-speech tagger that uses a hidden Markov model (HMM) to combine contextual information with lexical information, improving on baseline tagging accuracy. It splits the input text into sentences, tokenizes each sentence, and then tags the tokens. Transition probabilities are estimated from a static table of bigrams derived during the initial training phase. For words in the lexicon, the output probabilities of the HMM assume that all of a word's possible tags are equally likely. Output probabilities for unknown words are based on word orthography (e.g., upper- or lowercase, numerals) and on word endings up to 4 letters long. The Viterbi algorithm finds the most likely tag sequence through the HMM for the given tokens. MedPost was trained specifically for biological text, using MEDLINE abstracts as the training corpus.
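To make the tagging steps concrete, here is a minimal Java sketch of bigram-HMM Viterbi decoding in the spirit of the description above. It is not MedPost/SKR source code: the class name `HmmTaggerSketch`, the five-tag toy tagset, the transition probabilities, the lexicon, the suffix table, the numeral rule, and the smoothing floor `FLOOR` are all hypothetical. Lexicon words get equal output probability for each of their possible tags; unknown words fall back to an orthography check and then to the longest known ending of up to 4 letters; whitespace splitting stands in for real sentence splitting and tokenization.

```java
import java.util.*;

/** Hypothetical bigram-HMM tagger sketch; NOT MedPost/SKR source. All tables are invented toy data. */
public class HmmTaggerSketch {
    static final List<String> TAGS = List.of("det", "adj", "noun", "verb", "prep");
    static final double FLOOR = 1e-4; // hypothetical probability for bigrams missing from the table

    static final Map<String, Double> TRANS = new HashMap<>();          // "prev next" -> P(next | prev)
    static final Map<String, List<String>> LEXICON = new HashMap<>();  // word -> possible tags
    static final Map<String, String> SUFFIX_TAG = new HashMap<>();     // ending (<= 4 letters) -> tag

    static {
        // Toy bigram table; "<s>" marks the start of a sentence.
        TRANS.put("<s> det", 0.5);   TRANS.put("<s> noun", 0.3);
        TRANS.put("det adj", 0.3);   TRANS.put("det noun", 0.6);
        TRANS.put("adj noun", 0.8);
        TRANS.put("noun verb", 0.4); TRANS.put("noun prep", 0.3); TRANS.put("noun noun", 0.2);
        TRANS.put("verb det", 0.3);  TRANS.put("verb noun", 0.3); TRANS.put("verb prep", 0.2);
        TRANS.put("prep det", 0.4);  TRANS.put("prep noun", 0.4);

        LEXICON.put("the", List.of("det"));
        LEXICON.put("to", List.of("prep"));
        LEXICON.put("binds", List.of("noun", "verb")); // ambiguous entry
        LEXICON.put("receptor", List.of("noun"));

        SUFFIX_TAG.put("tion", "noun");
        SUFFIX_TAG.put("ase", "noun"); // e.g. "kinase"
        SUFFIX_TAG.put("ic", "adj");
        SUFFIX_TAG.put("ed", "verb");
    }

    static double trans(String prev, String next) {
        return TRANS.getOrDefault(prev + " " + next, FLOOR);
    }

    /** Lexicon tags if known; else a numeral rule; else the longest known ending (<= 4 letters); else every tag. */
    public static List<String> candidates(String token) {
        String w = token.toLowerCase(Locale.ROOT);
        List<String> known = LEXICON.get(w);
        if (known != null) return known;
        if (w.chars().anyMatch(Character::isDigit)) return List.of("noun"); // hypothetical orthography rule
        for (int k = Math.min(4, w.length()); k >= 1; k--) {
            String tag = SUFFIX_TAG.get(w.substring(w.length() - k));
            if (tag != null) return List.of(tag);
        }
        return TAGS;
    }

    /** Viterbi decoding in log space over each token's candidate tags. */
    public static List<String> tag(List<String> tokens) {
        int n = tokens.size();
        if (n == 0) return List.of();
        List<List<String>> cands = new ArrayList<>();
        for (String t : tokens) cands.add(candidates(t));

        double[][] score = new double[n][];
        int[][] back = new int[n][];
        for (int i = 0; i < n; i++) {
            List<String> cur = cands.get(i);
            double emit = Math.log(1.0 / cur.size()); // equal probability for each possible tag
            score[i] = new double[cur.size()];
            back[i] = new int[cur.size()];
            for (int j = 0; j < cur.size(); j++) {
                if (i == 0) {
                    score[0][j] = Math.log(trans("<s>", cur.get(j))) + emit;
                    continue;
                }
                List<String> prev = cands.get(i - 1);
                double best = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < prev.size(); p++) {
                    double s = score[i - 1][p] + Math.log(trans(prev.get(p), cur.get(j)));
                    if (s > best) { best = s; back[i][j] = p; }
                }
                score[i][j] = best + emit;
            }
        }
        // Choose the best final tag, then follow back-pointers to recover the sequence.
        int j = 0;
        for (int k = 1; k < score[n - 1].length; k++) {
            if (score[n - 1][k] > score[n - 1][j]) j = k;
        }
        String[] out = new String[n];
        for (int i = n - 1; i >= 0; i--) {
            out[i] = cands.get(i).get(j);
            j = back[i][j];
        }
        return Arrays.asList(out);
    }

    public static void main(String[] args) {
        // Whitespace splitting stands in for real sentence splitting and tokenization.
        List<String> tokens = Arrays.asList("The kinase binds to the receptor".split("\\s+"));
        System.out.println(tag(tokens)); // prints [det, noun, verb, prep, det, noun]
    }
}
```

In this toy run, "kinase" is not in the lexicon and is tagged through its "-ase" ending, while the ambiguous "binds" is resolved to a verb because the bigram table makes noun-verb-prep more probable than noun-noun-prep. Scoring in log space avoids numeric underflow on long sentences, and the floor value keeps unseen bigrams from zeroing out every path.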
References in zbMATH (referenced in 3 articles)
- Dai, Hong-Jie; Chang, Yen-Ching; Tsai, Richard Tzong-Han; Hsu, Wen-Lian: New challenges for biological text-mining in the next decade (2009)
- Rokach, Lior; Romano, Roni; Maimon, Oded: Negation recognition in medical narrative reports (2008)
- Smith, L.; Wilbur, W.J.: Retrieving definitional content for ontology development (2004)