Apache Spark

Apache Spark: Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.


References in zbMATH (referenced in 63 articles)

Showing results 41 to 60 of 63.
Sorted by year
  1. Viroli, Mirko; Beal, Jacob; Damiani, Ferruccio; Audrito, Giorgio; Casadei, Roberto; Pianini, Danilo: From distributed coordination to field calculus and aggregate computing (2019)
  2. Yu, Hong; Chen, Yun; Lingras, Pawan; Wang, Guoyin: A three-way cluster ensemble approach for large-scale data (2019)
  3. Chung, Moo K.: Statistical challenges of big brain network data (2018)
  4. Convolbo, Moïse W.; Chou, Jerry; Hsu, Ching-Hsien; Chung, Yeh Ching: GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers (2018)
  5. Karim, Md. Rezaul; Cochez, Michael; Beyan, Oya Deniz; Ahmed, Chowdhury Farhan; Decker, Stefan: Mining maximal frequent patterns in transactional databases and dynamic data streams: a Spark-based approach (2018)
  6. Ketsman, Bas; Albarghouthi, Aws; Koutris, Paraschos: Distribution policies for Datalog (2018)
  7. Kocsis, Zoltan A.; Swan, Jerry: Genetic programming + proof search = automatic improvement (2018)
  8. Mondelli, Maria Luiza; Magalhães, Thiago; Loss, Guilherme; Wilde, Michael; Foster, Ian; Mattoso, Marta; Katz, Daniel S.; Barbosa, Helio J. C.; Vasconcelos, Ana Tereza R.; Ocaña, Kary; Gadelha, Luiz M. R., Jr.: BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments (2018) arXiv
  9. Neukirchen, Helmut: Elephant against Goliath: performance of big data versus high-performance computing DBSCAN clustering implementations (2018)
  10. Sainudiin, Raazesh; Véber, Amandine: Full likelihood inference from the site frequency spectrum based on the optimal tree resolution (2018)
  11. Smith, Virginia; Forte, Simone; Ma, Chenxin; Takáč, Martin; Jordan, Michael I.; Jaggi, Martin: CoCoA: a general framework for communication-efficient distributed optimization (2018)
  12. Bacciu, Davide; Carta, Antonio; Gnesi, Stefania; Semini, Laura: An experience in using machine learning for short-term predictions in smart transportation systems (2017)
  13. García, José; Pope, Christopher; Altimiras, Francisco: A distributed K-means segmentation algorithm applied to Lobesia botrana recognition (2017)
  14. Ghesmoune, Mohammed; Azzag, Hanene; Benbernou, Salima; Lebbah, Mustapha; Duong, Tarn; Ouziri, Mourad: Big data: from collection to visualization (2017)
  15. Kanavos, Andreas; Nodarakis, Nikolaos; Sioutas, Spyros; Tsakalidis, Athanasios; Tsolis, Dimitrios; Tzimas, Giannis: Large scale implementations for Twitter sentiment classification (2017)
  16. Masegosa, Andrés R.; Martinez, Ana M.; Langseth, Helge; Nielsen, Thomas D.; Salmerón, Antonio; Ramos-López, Darío; Madsen, Anders L.: Scaling up Bayesian variational inference using distributed computing clusters (2017)
  17. Cuzzocrea, Alfredo; Cosulschi, Mirel; de Virgilio, Roberto: An effective and efficient MapReduce algorithm for computing BFS-based traversals of large-scale RDF graphs (2016)
  18. Hupel, Lars; Kuncak, Viktor: Translating Scala programs to Isabelle/HOL. System description (2016)
  19. Iwen, M. A.; Ong, B. W.: A distributed and incremental SVD algorithm for agglomerative data analysis on large networks (2016)
  20. Meng, Xiangrui; Bradley, Joseph; Yavuz, Burak; Sparks, Evan; Venkataraman, Shivaram; Liu, Davies; Freeman, Jeremy; Tsai, DB; Amde, Manish; Owen, Sean; Xin, Doris; Xin, Reynold; Franklin, Michael J.; Zadeh, Reza; Zaharia, Matei; Talwalkar, Ameet: MLlib: machine learning in Apache Spark (2016)