MLlib: machine learning in Apache Spark
Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark’s open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark’s rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has grown rapidly thanks to its vibrant open-source community of over 140 contributors, and it includes extensive documentation to support further growth and to let users quickly get up to speed.
References in zbMATH (referenced in 5 articles, 1 standard article)
- Andrea Esuli, Tiziano Fagni, Alejandro Moreo Fernandez: JaTeCS an open-source JAva TExt Categorization System (2017) arXiv
- Ralf Mikut, Andreas Bartschat, Wolfgang Doneit, Jorge Angel Gonzalez Ordiano, Benjamin Schott, Johannes Stegmaier, Simon Waczowicz, Markus Reischl: The MATLAB Toolbox SciXMiner: User’s Manual and Programmer’s Guide (2017) arXiv
- Iwen, M.A.; Ong, B.W.: A distributed and incremental SVD algorithm for agglomerative data analysis on large networks (2016)
- Meng, Xiangrui; Bradley, Joseph; Yavuz, Burak; Sparks, Evan; Venkataraman, Shivaram; Liu, Davies; Freeman, Jeremy; Tsai, DB; Amde, Manish; Owen, Sean; Xin, Doris; Xin, Reynold; Franklin, Michael J.; Zadeh, Reza; Zaharia, Matei; Talwalkar, Ameet: MLlib: machine learning in Apache Spark (2016)
- Tianqi Chen, Carlos Guestrin: XGBoost: A Scalable Tree Boosting System (2016) arXiv