TMG
TMG: A MATLAB Toolbox for Generating Term-Document Matrices from Text Collections

A wide range of computational kernels in data mining and information retrieval from text collections involve techniques from linear algebra. These kernels typically operate on data presented in the form of large sparse term-document matrices (tdms). We present TMG, a research and teaching toolbox for generating sparse tdms from text collections and for incrementally modifying these tdms by means of additions or deletions. The toolbox is written entirely in MATLAB, a popular problem-solving environment that is powerful in computational linear algebra, in order to streamline document preprocessing and the prototyping of algorithms for information retrieval. Several design issues concerning the use of MATLAB's sparse infrastructure and data structures are addressed. We illustrate the use of the tool in numerical explorations of the effect of stemming and of different term-weighting policies on the performance of querying and clustering tasks.
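The abstract names three operations: building a sparse tdm from raw text, weighting its entries, and updating it incrementally. The following is a minimal sketch of those steps in Python, assuming a simple pipeline; it is not TMG's MATLAB code or API. All function names (`tokenize`, `build_tdm`, `append_document`, `tfidf_weights`, `query_scores`) are hypothetical, stemming is omitted, and the weighting scheme (raw term frequency times log(n/df), with each document column scaled to unit length) is one common choice assumed here, not necessarily TMG's default. The matrix is kept in compressed sparse column (CSC) layout, the column-oriented storage MATLAB uses for its sparse matrices, so adding a document amounts to appending a column.

```python
import math
import re
from collections import Counter

# Hypothetical helper names for illustration; this is not TMG's API.


def tokenize(text, stopwords=frozenset()):
    # Lowercase and keep alphabetic runs; this sketch does no stemming.
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]


def build_tdm(docs, stopwords=frozenset()):
    """Build a term-document matrix in compressed sparse column (CSC) form.

    Returns (terms, colptr, rowind, values): the nonzeros of column j
    (document j) are rowind[colptr[j]:colptr[j+1]], with raw term counts
    in the matching slice of values.
    """
    counts = [Counter(tokenize(d, stopwords)) for d in docs]
    terms = sorted(set().union(*counts))
    index = {t: i for i, t in enumerate(terms)}
    colptr, rowind, values = [0], [], []
    for c in counts:
        for t in sorted(c, key=index.get):  # row indices ascending within a column
            rowind.append(index[t])
            values.append(float(c[t]))
        colptr.append(len(rowind))
    return terms, colptr, rowind, values


def append_document(tdm, text, stopwords=frozenset()):
    """Incremental addition: append one column in place; unseen terms get new rows."""
    terms, colptr, rowind, values = tdm
    index = {t: i for i, t in enumerate(terms)}
    c = Counter(tokenize(text, stopwords))
    for t in c:
        if t not in index:
            index[t] = len(terms)
            terms.append(t)
    for t in sorted(c, key=index.get):
        rowind.append(index[t])
        values.append(float(c[t]))
    colptr.append(len(rowind))


def tfidf_weights(tdm):
    """Weight each nonzero by tf * log(n / df), then scale columns to unit length."""
    _, colptr, rowind, values = tdm
    n = len(colptr) - 1
    df = Counter(rowind)  # a row index occurs at most once per column
    w = [v * math.log(n / df[r]) for r, v in zip(rowind, values)]
    for j in range(n):
        lo, hi = colptr[j], colptr[j + 1]
        norm = math.sqrt(sum(x * x for x in w[lo:hi]))
        if norm > 0:
            for k in range(lo, hi):
                w[k] /= norm
    return w


def query_scores(tdm, weights, query):
    """Inner product of a raw query vector with each weighted document column."""
    terms, colptr, rowind, _ = tdm
    index = {t: i for i, t in enumerate(terms)}
    q = Counter(index[t] for t in tokenize(query) if t in index)
    return [sum(weights[k] * q.get(rowind[k], 0) for k in range(colptr[j], colptr[j + 1]))
            for j in range(len(colptr) - 1)]


docs = ["Sparse matrices store only nonzeros.",
        "Term weighting changes query results.",
        "Sparse term matrices for text."]
tdm = build_tdm(docs)
w = tfidf_weights(tdm)
# The third document ranks highest; the second shares no query terms and scores 0.0.
print(query_scores(tdm, w, "sparse matrices"))
```

Because every document column has unit length, ranking by this inner product gives the same order as ranking by cosine similarity. After `append_document`, the document count and some document frequencies change, so the weights must be recomputed; this is one reason a tool supporting incremental updates may keep raw counts separate from weighted values.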
References in zbMATH (referenced in 10 articles)
- Scott, Tony C.; Therani, Madhusudan; Wang, Xing M.: Data clustering with quantum mechanics (2017)
- Vecharynski, Eugene; Saad, Yousef: Fast updating algorithms for latent semantic indexing (2014)
- Koessler, Denise R.; Martin, Benjamin W.; Kiefer, Bruce E.; Berry, Michael W.: The effects of tabular-based content extraction on patent document clustering (2012)
- Cai, Ruichu; Zhang, Zhenjie; Hao, Zhifeng: BASSUM: a Bayesian semi-supervised method for classification feature selection (2011)
- Dunlavy, Daniel M.; Kolda, Tamara G.; Kegelmeyer, W. Philip: Multilinear algebra for analyzing data with multiple linkages (2011)
- de Castro, Pablo A. D.; de França, Fabrício O.; Ferreira, Hamilton M.; Palermo Coelho, Guilherme; Von Zuben, Fernando J.: Query expansion using an immune-inspired biclustering algorithm (2010)
- Boutsidis, C.; Gallopoulos, E.: SVD based initialization: A head start for nonnegative matrix factorization (2008)
- Fritzsche, David; Mehrmann, Volker; Szyld, Daniel B.; Virnik, Elena: An SVD approach to identifying metastable states of Markov chains (2008)
- Laflamme-Sanders, Alexandra; Zhu, Mu: LAGO on the unit sphere (2008)
- Zhu, Mu; Ghodsi, Ali: Automatic dimensionality selection from the scree plot via the use of profile likelihood (2006)