gensim

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible. It is aimed at processing raw, unstructured digital texts (“plain text”). The algorithms in gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation and Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary: you only need a corpus of plain text documents. Once these statistical patterns are found, any plain text document can be succinctly expressed in the new semantic representation and queried for topical similarity against other documents.
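The workflow described above (build a corpus, map each document into a reduced representation, then query by similarity) can be illustrated with a small standard-library sketch. This is not gensim's API or implementation: it is a toy version of the random projection idea only, and every name in it (`build_vocabulary`, `bag_of_words`, `random_projection_matrix`, `project`, `query`), the three sample documents, and the choice of 4 dimensions are assumptions made for illustration.

```python
import math
import random


def tokenize(text):
    # Naive whitespace tokenizer; real pipelines do much more preprocessing.
    return text.lower().split()


def build_vocabulary(documents):
    # Assign each distinct token an integer id in order of first appearance.
    vocab = {}
    for doc in documents:
        for token in tokenize(doc):
            vocab.setdefault(token, len(vocab))
    return vocab


def bag_of_words(text, vocab):
    # Dense term-count vector; tokens outside the vocabulary are ignored.
    vec = [0.0] * len(vocab)
    for token in tokenize(text):
        idx = vocab.get(token)
        if idx is not None:
            vec[idx] += 1.0
    return vec


def random_projection_matrix(num_terms, num_dims, seed=0):
    # Gaussian random matrix (num_dims x num_terms), scaled by 1/sqrt(num_dims).
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(num_dims)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(num_terms)]
            for _ in range(num_dims)]


def project(vec, matrix):
    # Map a term-count vector into the low-dimensional space.
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine(a, b):
    # Cosine similarity; defined as 0.0 when either vector is all zeros.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Hypothetical training corpus of plain text documents.
documents = [
    "human machine interface for computer applications",
    "graph of trees and minors",
    "user survey of computer system response time",
]

vocab = build_vocabulary(documents)
matrix = random_projection_matrix(len(vocab), num_dims=4, seed=42)
index = [project(bag_of_words(doc, vocab), matrix) for doc in documents]


def query(text):
    # Express new text in the reduced space and compare it to every document.
    q = project(bag_of_words(text, vocab), matrix)
    return [cosine(q, doc_vec) for doc_vec in index]


sims = query("human computer interaction")
for doc, sim in zip(documents, sims):
    print(f"{sim:+.3f}  {doc}")
```

With so few dimensions and documents the similarity scores are only rough approximations of the original bag-of-words similarities. The sketch is meant to show the shape of the pipeline, not the quality that trained models such as Latent Semantic Analysis or Latent Dirichlet Allocation would give.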


References in zbMATH (referenced in 13 articles)

  1. Bioglio, Livio; Rho, Valentina; Pensa, Ruggero G.: Ranking by inspiration: a network science approach (2020)
  2. Veale, Tony: Changing channels: divergent approaches to the creative streaming of texts (2020)
  3. Mortveit, Henning S.; Pederson, Ryan D.: Attractor stability in finite asynchronous biological system models (2019)
  4. Aggarwal, Charu C.: Machine learning for text (2018)
  5. Doss, Hani; Park, Yeonhee: An MCMC approach to empirical Bayes inference and Bayesian sensitivity analysis via empirical processes (2018)
  6. George, Clint P.; Doss, Hani: Principled selection of hyperparameters in the latent Dirichlet allocation model (2018)
  7. Schreiber, Jacob: pomegranate: fast and flexible probabilistic modeling in Python (2018)
  8. Zhang, Yazhou; Song, Dawei; Zhang, Peng; Wang, Panpan; Li, Jingfei; Li, Xiang; Wang, Benyou: A quantum-inspired multimodal sentiment analysis framework (2018)
  9. Azqueta-Gavaldón, Andrés: Developing news-based economic policy uncertainty index with unsupervised machine learning (2017)
  10. Kaliszyk, Cezary; Urban, Josef: MizAR 40 for Mizar 40 (2015)
  11. Gerlach, Martin; Altmann, Eduardo G.: Scaling laws and fluctuations in the statistics of word frequencies (2014)
  12. Mu, Tingting; Miwa, Makoto; Tsujii, Junichi; Ananiadou, Sophia: Discovering robust embeddings in (dis)similarity space for high-dimensional linguistic features (2014)
  13. Borbinha, José; Bouche, Thierry; Nowiński, Aleksander; Sojka, Petr: Project EuDML -- a first year demonstration (2011)