MLlib: machine learning in Apache Spark
Apache Spark is a popular open-source platform for large-scale data processing that is well suited to iterative machine learning tasks. In this paper we present MLlib, Spark’s open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark’s rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has experienced rapid growth due to its vibrant open-source community of over 140 contributors, and it includes extensive documentation to support further growth and to let users quickly get up to speed.
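The abstract refers to optimization primitives that run over distributed data. The sketch below illustrates one common pattern of that kind, data-parallel gradient descent: each partition computes a partial gradient, the partial sums are added together, and a single driver updates the weights. It is an assumption-laden illustration, not MLlib's code. Plain Python lists stand in for Spark partitions, logistic loss is an assumed example objective, and the function names (`local_gradient`, `add_vectors`, `data_parallel_gd`) are hypothetical.

```python
import math
from functools import reduce


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(partition, w):
    """Sum of logistic-loss gradients over one partition (the 'map' side).

    `partition` is a list of (feature_vector, label) pairs, label in {0, 1}.
    """
    grad = [0.0] * len(w)
    for x, y in partition:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def add_vectors(a, b):
    """Combine two partial gradients (the 'reduce' side)."""
    return [ai + bi for ai, bi in zip(a, b)]


def data_parallel_gd(partitions, dim, step=0.5, iters=200):
    """Full-batch gradient descent where the gradient is assembled from partitions.

    Hypothetical stand-in: partitions are processed one after another here;
    on a cluster each partial gradient would be computed on its own worker.
    """
    n = sum(len(p) for p in partitions)
    w = [0.0] * dim
    for _ in range(iters):
        partials = [local_gradient(p, w) for p in partitions]
        total = reduce(add_vectors, partials)  # aggregate partial sums
        w = [wi - step * gi / n for wi, gi in zip(w, total)]  # driver update
    return w


if __name__ == "__main__":
    # Toy data: features (bias, x1), label 1 when x1 > 0, split into 3 "partitions".
    parts = [
        [((1.0, -2.0), 0), ((1.0, -1.0), 0)],
        [((1.0, 1.0), 1), ((1.0, 2.0), 1)],
        [((1.0, -0.5), 0), ((1.0, 0.5), 1)],
    ]
    print(data_parallel_gd(parts, dim=2))
```

Because a sum of gradients is associative and commutative, splitting the data across partitions yields the same gradient as computing it on one machine, which is what makes the aggregation step safe to parallelize.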
References in zbMATH (referenced in 9 articles, 1 standard article)
- Kordík, Pavel; Černý, Jan; Frýda, Tomáš: Discovering predictive ensembles for transfer learning and meta-learning (2018)
- Esuli, Andrea; Fagni, Tiziano; Moreo Fernandez, Alejandro: JaTeCS an open-source JAva TExt Categorization System (2017) arXiv
- Bacciu, Davide; Carta, Antonio; Gnesi, Stefania; Semini, Laura: An experience in using machine learning for short-term predictions in smart transportation systems (2017)
- Ghesmoune, Mohammed; Azzag, Hanene; Benbernou, Salima; Lebbah, Mustapha; Duong, Tarn; Ouziri, Mourad: Big Data: from collection to visualization (2017)
- Masegosa, Andrés R.; Martinez, Ana M.; Langseth, Helge; Nielsen, Thomas D.; Salmerón, Antonio; Ramos-López, Darío; Madsen, Anders L.: Scaling up Bayesian variational inference using distributed computing clusters (2017)
- Mikut, Ralf; Bartschat, Andreas; Doneit, Wolfgang; Gonzalez Ordiano, Jorge Angel; Schott, Benjamin; Stegmaier, Johannes; Waczowicz, Simon; Reischl, Markus: The MATLAB Toolbox SciXMiner: User’s Manual and Programmer’s Guide (2017) arXiv
- Iwen, M.A.; Ong, B.W.: A distributed and incremental SVD algorithm for agglomerative data analysis on large networks (2016)
- Meng, Xiangrui; Bradley, Joseph; Yavuz, Burak; Sparks, Evan; Venkataraman, Shivaram; Liu, Davies; Freeman, Jeremy; Tsai, DB; Amde, Manish; Owen, Sean; Xin, Doris; Xin, Reynold; Franklin, Michael J.; Zadeh, Reza; Zaharia, Matei; Talwalkar, Ameet: MLlib: machine learning in Apache Spark (2016)
- Chen, Tianqi; Guestrin, Carlos: XGBoost: A Scalable Tree Boosting System (2016) arXiv