MLlib: machine learning in Apache Spark

Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark’s open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark’s rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has experienced rapid growth due to its vibrant open-source community of over 140 contributors, and it includes extensive documentation to support further growth and to let users quickly get up to speed.

References in zbMATH (referenced in 25 articles, 1 standard article)

Showing results 1 to 20 of 25.
Sorted by year

  1. Avalos, Omar: GSA for machine learning problems: a comprehensive overview (2021)
  2. Berthold, Michael R.; Fillbrunn, Alexander; Siebes, Arno: Widening: using parallel resources to improve model quality (2021)
  3. Fernandez-Basso, Carlos; Ruiz, M. Dolores; Martin-Bautista, Maria J.: Spark solutions for discovering fuzzy association rules in big data (2021)
  4. Lu, Haiping; Liu, Xianyuan; Turner, Robert; Bai, Peizhen; Koot, Raivo E.; Zhou, Shuo; Chasmai, Mustafa; Schobs, Lawrence: PyKale: knowledge-aware machine learning from multiple sources in Python (2021) arXiv
  5. Anil, Robin; Capan, Gokhan; Drost-Fromm, Isabel; Dunning, Ted; Friedman, Ellen; Grant, Trevor; Quinn, Shannon; Ranjan, Paritosh; Schelter, Sebastian; Yılmazel, Özgür: Apache Mahout: machine learning on distributed dataflow systems (2020)
  6. Dahl, David B.: Integration of R and Scala using rscala (2020) not zbMATH
  7. Lu, Haihao; Mazumder, Rahul: Randomized gradient boosting machine (2020)
  8. Pérez-Chacón, R.; Asencio-Cortés, G.; Martínez-Álvarez, F.; Troncoso, A.: Big data time series forecasting based on pattern sequence similarity and its application to the electricity demand (2020)
  9. Salehi, Abbas; Masoumi, Behrooz: KATZ centrality with biogeography-based optimization for influence maximization problem (2020)
  10. Raissi, Maziar; Babaee, Hessam; Karniadakis, George Em: Parametric Gaussian process regression for big data (2019)
  11. Roy, Asim; Qureshi, Shiban; Pande, Kartikeya; Nair, Divitha; Gairola, Kartik; Jain, Pooja; Singh, Suraj; Sharma, Kirti; Jagadale, Akshay; Lin, Yi-Yang; Sharma, Shashank; Gotety, Ramya; Zhang, Yuexin; Tang, Ji; Mehta, Tejas; Sindhanuru, Hemanth; Okafor, Nonso; Das, Santak; Gopal, Chidambara N.; Rudraraju, Srinivasa B.; Kakarlapudi, Avinash V.: Performance comparison of machine learning platforms (2019)
  12. Tsamardinos, Ioannis; Borboudakis, Giorgos; Katsogridakis, Pavlos; Pratikakis, Polyvios; Christophides, Vassilis: A greedy feature selection algorithm for big data of high dimensionality (2019)
  13. Xiao, Lin; Yu, Adams Wei; Lin, Qihang; Chen, Weizhu: DSCOVR: randomized primal-dual block coordinate algorithms for asynchronous distributed optimization (2019)
  14. Yu, Hong; Chen, Yun; Lingras, Pawan; Wang, Guoyin: A three-way cluster ensemble approach for large-scale data (2019)
  15. Gudivada, Venkat N.; Arbabifard, Kamyar: Open-source libraries, application frameworks, and workflow systems for NLP (2018)
  16. Smith, Virginia; Forte, Simone; Ma, Chenxin; Takáč, Martin; Jordan, Michael I.; Jaggi, Martin: CoCoA: a general framework for communication-efficient distributed optimization (2018)
  17. Esuli, Andrea; Fagni, Tiziano; Moreo Fernandez, Alejandro: JaTeCS an open-source JAva TExt Categorization System (2017) arXiv
  18. Bacciu, Davide; Carta, Antonio; Gnesi, Stefania; Semini, Laura: An experience in using machine learning for short-term predictions in smart transportation systems (2017)
  19. Ghesmoune, Mohammed; Azzag, Hanene; Benbernou, Salima; Lebbah, Mustapha; Duong, Tarn; Ouziri, Mourad: Big data: from collection to visualization (2017)
  20. Kanavos, Andreas; Nodarakis, Nikolaos; Sioutas, Spyros; Tsakalidis, Athanasios; Tsolis, Dimitrios; Tzimas, Giannis: Large scale implementations for Twitter sentiment classification (2017)
