MLlib: machine learning in Apache Spark

Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark’s open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark’s rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has experienced rapid growth due to its vibrant open-source community of over 140 contributors, and includes extensive documentation to support further growth and to let users quickly get up to speed.

References in zbMATH (referenced in 9 articles, 1 standard article)

  1. Kordík, Pavel; Černý, Jan; Frýda, Tomáš: Discovering predictive ensembles for transfer learning and meta-learning (2018)
  2. Andrea Esuli, Tiziano Fagni, Alejandro Moreo Fernandez: JaTeCS an open-source JAva TExt Categorization System (2017) arXiv
  3. Bacciu, Davide; Carta, Antonio; Gnesi, Stefania; Semini, Laura: An experience in using machine learning for short-term predictions in smart transportation systems (2017)
  4. Ghesmoune, Mohammed; Azzag, Hanene; Benbernou, Salima; Lebbah, Mustapha; Duong, Tarn; Ouziri, Mourad: Big Data: from collection to visualization (2017)
  5. Masegosa, Andrés R.; Martinez, Ana M.; Langseth, Helge; Nielsen, Thomas D.; Salmerón, Antonio; Ramos-López, Darío; Madsen, Anders L.: Scaling up Bayesian variational inference using distributed computing clusters (2017)
  6. Ralf Mikut, Andreas Bartschat, Wolfgang Doneit, Jorge Angel Gonzalez Ordiano, Benjamin Schott, Johannes Stegmaier, Simon Waczowicz, Markus Reischl: The MATLAB Toolbox SciXMiner: User’s Manual and Programmer’s Guide (2017) arXiv
  7. Iwen, M.A.; Ong, B.W.: A distributed and incremental SVD algorithm for agglomerative data analysis on large networks (2016)
  8. Meng, Xiangrui; Bradley, Joseph; Yavuz, Burak; Sparks, Evan; Venkataraman, Shivaram; Liu, Davies; Freeman, Jeremy; Tsai, Db; Amde, Manish; Owen, Sean; Xin, Doris; Xin, Reynold; Franklin, Michael J.; Zadeh, Reza; Zaharia, Matei; Talwalkar, Ameet: MLlib: machine learning in Apache Spark (2016)
  9. Tianqi Chen, Carlos Guestrin: XGBoost: A Scalable Tree Boosting System (2016) arXiv