Apache Spark

Apache Spark is a fast, general-purpose cluster computing system for big data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.

References in zbMATH (referenced in 29 articles)

Showing results 1 to 20 of 29.

  1. Salehi, Abbas; Masoumi, Behrooz: KATZ centrality with biogeography-based optimization for influence maximization problem (2020)
  2. Raissi, Maziar; Babaee, Hessam; Karniadakis, George Em: Parametric Gaussian process regression for big data (2019)
  3. Rodrigo, Enrique G.; Aledo, Juan A.; Gámez, José A.: spark-crowd: a Spark package for learning from crowdsourced big data (2019)
  4. Rompf, Tiark; Amin, Nada: A SQL to C compiler in 500 lines of code (2019)
  5. Sainudiin, Raazesh; Teng, Gloria: Minimum distance histograms with universal performance guarantees (2019)
  6. Tsamardinos, Ioannis; Borboudakis, Giorgos; Katsogridakis, Pavlos; Pratikakis, Polyvios; Christophides, Vassilis: A greedy feature selection algorithm for big data of high dimensionality (2019)
  7. Viroli, Mirko; Beal, Jacob; Damiani, Ferruccio; Audrito, Giorgio; Casadei, Roberto; Pianini, Danilo: From distributed coordination to field calculus and aggregate computing (2019)
  8. Yu, Hong; Chen, Yun; Lingras, Pawan; Wang, Guoyin: A three-way cluster ensemble approach for large-scale data (2019)
  9. Chung, Moo K.: Statistical challenges of big brain network data (2018)
  10. Convolbo, Moïse W.; Chou, Jerry; Hsu, Ching-Hsien; Chung, Yeh Ching: GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers (2018)
  11. Kocsis, Zoltan A.; Swan, Jerry: Genetic programming + proof search = automatic improvement (2018)
  12. Kordík, Pavel; Černý, Jan; Frýda, Tomáš: Discovering predictive ensembles for transfer learning and meta-learning (2018)
  13. Neukirchen, Helmut: Elephant against Goliath: performance of big data versus high-performance computing DBSCAN clustering implementations (2018)
  14. Nghiem, Peter P.: Best trade-off point method for efficient resource provisioning in Spark (2018)
  15. Pelucchi, Mauro; Psaila, Giuseppe; Toccu, Maurizio: Hadoop vs. Spark: impact on performance of the Hammer query engine for open data corpora (2018)
  16. Sainudiin, Raazesh; Véber, Amandine: Full likelihood inference from the site frequency spectrum based on the optimal tree resolution (2018)
  17. Smith, Virginia; Forte, Simone; Ma, Chenxin; Takáč, Martin; Jordan, Michael I.; Jaggi, Martin: CoCoA: a general framework for communication-efficient distributed optimization (2018)
  18. Bacciu, Davide; Carta, Antonio; Gnesi, Stefania; Semini, Laura: An experience in using machine learning for short-term predictions in smart transportation systems (2017)
  19. García, José; Pope, Christopher; Altimiras, Francisco: A distributed K-means segmentation algorithm applied to Lobesia botrana recognition (2017)
  20. Ghesmoune, Mohammed; Azzag, Hanene; Benbernou, Salima; Lebbah, Mustapha; Duong, Tarn; Ouziri, Mourad: Big Data: from collection to visualization (2017)