Apache Spark

Apache Spark is a fast, general-purpose cluster computing system for big data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.


References in zbMATH (referenced in 41 articles)

Showing results 1 to 20 of 41.
Sorted by year.

  1. Anil, Robin; Capan, Gokhan; Drost-Fromm, Isabel; Dunning, Ted; Friedman, Ellen; Grant, Trevor; Quinn, Shannon; Ranjan, Paritosh; Schelter, Sebastian; Yılmazel, Özgür: Apache Mahout: machine learning on distributed dataflow systems (2020)
  2. Kalina, Jan; Vidnerová, Petra: Regression neural networks with a highly robust loss function (2020)
  3. Ketsman, Bas; Albarghouthi, Aws; Koutris, Paraschos: Distribution policies for Datalog (2020)
  4. Lu, Haihao; Mazumder, Rahul: Randomized gradient boosting machine (2020)
  5. Salehi, Abbas; Masoumi, Behrooz: KATZ centrality with biogeography-based optimization for influence maximization problem (2020)
  6. Sambasivan, Rajiv; Das, Sourish; Sahu, Sujit K.: A Bayesian perspective of statistical machine learning for big data (2020)
  7. Liu, Heng; Ditzler, Gregory: A semi-parallel framework for greedy information-theoretic feature selection (2019)
  8. Pan, Xianli; Xu, Yitian: A safe reinforced feature screening strategy for Lasso based on feasible solutions (2019)
  9. Raissi, Maziar; Babaee, Hessam; Karniadakis, George Em: Parametric Gaussian process regression for big data (2019)
  10. Rodrigo, Enrique G.; Aledo, Juan A.; Gámez, José A.: spark-crowd: a spark package for learning from crowdsourced big data (2019)
  11. Rompf, Tiark; Amin, Nada: A SQL to C compiler in 500 lines of code (2019)
  12. Roy, Asim; Qureshi, Shiban; Pande, Kartikeya; Nair, Divitha; Gairola, Kartik; Jain, Pooja; Singh, Suraj; Sharma, Kirti; Jagadale, Akshay; Lin, Yi-Yang; Sharma, Shashank; Gotety, Ramya; Zhang, Yuexin; Tang, Ji; Mehta, Tejas; Sindhanuru, Hemanth; Okafor, Nonso; Das, Santak; Gopal, Chidambara N.; Rudraraju, Srinivasa B.; Kakarlapudi, Avinash V.: Performance comparison of machine learning platforms (2019)
  13. Sainudiin, Raazesh; Teng, Gloria: Minimum distance histograms with universal performance guarantees (2019)
  14. Tsamardinos, Ioannis; Borboudakis, Giorgos; Katsogridakis, Pavlos; Pratikakis, Polyvios; Christophides, Vassilis: A greedy feature selection algorithm for big data of high dimensionality (2019)
  15. Viroli, Mirko; Beal, Jacob; Damiani, Ferruccio; Audrito, Giorgio; Casadei, Roberto; Pianini, Danilo: From distributed coordination to field calculus and aggregate computing (2019)
  16. Yu, Hong; Chen, Yun; Lingras, Pawan; Wang, Guoyin: A three-way cluster ensemble approach for large-scale data (2019)
  17. Chung, Moo K.: Statistical challenges of big brain network data (2018)
  18. Convolbo, Moïse W.; Chou, Jerry; Hsu, Ching-Hsien; Chung, Yeh Ching: GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers (2018)
  19. Karim, Md. Rezaul; Cochez, Michael; Beyan, Oya Deniz; Ahmed, Chowdhury Farhan; Decker, Stefan: Mining maximal frequent patterns in transactional databases and dynamic data streams: a Spark-based approach (2018)
  20. Ketsman, Bas; Albarghouthi, Aws; Koutris, Paraschos: Distribution policies for Datalog (2018)