The CMU Statistical Language Modeling (SLM) Toolkit

The Carnegie Mellon Statistical Language Modeling (CMU SLM) Toolkit is a set of Unix software tools designed to facilitate language modeling work in the research community. Some of the tools process general textual data into: word frequency lists and vocabularies; word bigram and trigram counts; vocabulary-specific word bigram and trigram counts; bigram- and trigram-related statistics; various backoff bigram and trigram language models ...
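To make the kinds of output listed above concrete, here is a minimal Python sketch of word frequency lists, a vocabulary, and bigram and trigram counts. It is not the toolkit's implementation and uses none of its commands; the helper names (`ngram_counts`, `restrict_to_vocab`), the `<UNK>` placeholder, and the frequency cutoff of 2 are assumptions chosen only for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def restrict_to_vocab(tokens, vocab, unk="<UNK>"):
    """Map out-of-vocabulary words to a placeholder (assumed convention)."""
    return [t if t in vocab else unk for t in tokens]


text = "the cat sat on the mat the cat ran"
tokens = text.split()

# Word frequency list: how often each word occurs.
word_freq = Counter(tokens)

# Vocabulary: here, words seen at least twice (illustrative cutoff).
vocab = {w for w, c in word_freq.items() if c >= 2}

# Word bigram and trigram counts over the raw text.
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

# Vocabulary-specific bigram counts: words outside the vocabulary
# are replaced by the placeholder before counting.
vocab_bigrams = ngram_counts(restrict_to_vocab(tokens, vocab), 2)

for pair, count in vocab_bigrams.most_common():
    print(" ".join(pair), count)
```

Counts of this kind are the raw material from which a backoff bigram or trigram model is estimated: when a trigram has not been observed, the model falls back to the corresponding bigram, and then to the unigram.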

