Apache Spark

Apache Spark is a fast, general-purpose cluster computing system for big data. It provides high-level APIs in Scala, Java, Python, and R, along with an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.

References in zbMATH (referenced in 59 articles)

Showing results 1 to 20 of 59.
Sorted by year.


  1. Iwasaki, Hideya; Emoto, Kento; Morihata, Akimasa; Matsuzaki, Kiminori; Hu, Zhenjiang: Fregel: a functional domain-specific language for vertex-centric large-scale graph processing (2022)
  2. Nanongkai, Danupon; Scquizzato, Michele: Equivalence classes and conditional hardness in massively parallel computations (2022)
  3. Ahlawat, Khyati; Chug, Anuradha; Singh, Amit Prakash: A novel hybrid sampling algorithm for solving class imbalance problem in big data (2021)
  4. Avalos, Omar: GSA for machine learning problems: a comprehensive overview (2021)
  5. Azhari, Mourad; Abarda, Abdallah; Ettaki, Badia; Zerouaoui, Jamal; Dakkon, Mohamed: Using machine learning with PySpark and MLib for solving a binary classification problem: case of searching for exotic particles (2021)
  6. Berthold, Michael R.; Fillbrunn, Alexander; Siebes, Arno: Widening: using parallel resources to improve model quality (2021)
  7. Chelly Dagdia, Zaineb; Zarges, Christine: A detailed study of the distributed rough set based locality sensitive hashing feature selection technique (2021)
  8. Soshnikov, Dmitry; Valieva, Yana: mPyPl: Python monadic pipeline library for complex functional data processing (2021) arXiv
  9. Dong, Bin; Wu, Kesheng; Byna, Suren: User-defined tensor data analysis (2021)
  10. Dutta, R.; Schoengens, M.; Pacchiardi, L.; Ummadisingu, A.; Widmer, N.; Künzli, P.; Onnela, J.-P.; Mira, A.: ABCpy: a high-performance computing perspective to approximate Bayesian computation (2021) not zbMATH
  11. Fernandez-Basso, Carlos; Ruiz, M. Dolores; Martin-Bautista, Maria J.: Spark solutions for discovering fuzzy association rules in big data (2021)
  12. Gong, Chaoyu; Su, Zhi-gang; Wang, Pei-hong; Wang, Qian; You, Yang: Evidential instance selection for K-nearest neighbor classification of big data (2021)
  13. Kappelman, Ashton Conrad; Sinha, Ashesh Kumar: Optimal control in dynamic food supply chains using big data (2021)
  14. Maté, Carlos G.: Combining interval time series forecasts. A first step in a long way (research agenda) (2021)
  15. Młodak, Andrzej: k-means, Ward and probabilistic distance-based clustering methods with contiguity constraint (2021)
  16. Tayarani N., Mohammad-H.: Applications of artificial intelligence in battling against COVID-19: a literature review (2021)
  17. Zhu, Xuening; Li, Feng; Wang, Hansheng: Least-square approximation for a distributed system (2021)
  18. Anil, Robin; Capan, Gokhan; Drost-Fromm, Isabel; Dunning, Ted; Friedman, Ellen; Grant, Trevor; Quinn, Shannon; Ranjan, Paritosh; Schelter, Sebastian; Yılmazel, Özgür: Apache Mahout: machine learning on distributed dataflow systems (2020)
  19. Feng, Jun; Yang, Laurence T.; Gati, Nicholaus J.; Xie, Xia; Gavuna, Benard S.: Privacy-preserving computation in cyber-physical-social systems: a survey of the state-of-the-art and perspectives (2020)
  20. Kalina, Jan; Vidnerová, Petra: Regression neural networks with a highly robust loss function (2020)
