Apache Spark
Apache Spark: Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.
References in zbMATH (referenced in 41 articles)
Showing results 1 to 20 of 41, sorted by year.
- Anil, Robin; Capan, Gokhan; Drost-Fromm, Isabel; Dunning, Ted; Friedman, Ellen; Grant, Trevor; Quinn, Shannon; Ranjan, Paritosh; Schelter, Sebastian; Yılmazel, Özgür: Apache Mahout: machine learning on distributed dataflow systems (2020)
- Kalina, Jan; Vidnerová, Petra: Regression neural networks with a highly robust loss function (2020)
- Ketsman, Bas; Albarghouthi, Aws; Koutris, Paraschos: Distribution policies for Datalog (2020)
- Lu, Haihao; Mazumder, Rahul: Randomized gradient boosting machine (2020)
- Salehi, Abbas; Masoumi, Behrooz: KATZ centrality with biogeography-based optimization for influence maximization problem (2020)
- Sambasivan, Rajiv; Das, Sourish; Sahu, Sujit K.: A Bayesian perspective of statistical machine learning for big data (2020)
- Liu, Heng; Ditzler, Gregory: A semi-parallel framework for greedy information-theoretic feature selection (2019)
- Pan, Xianli; Xu, Yitian: A safe reinforced feature screening strategy for Lasso based on feasible solutions (2019)
- Raissi, Maziar; Babaee, Hessam; Karniadakis, George Em: Parametric Gaussian process regression for big data (2019)
- Rodrigo, Enrique G.; Aledo, Juan A.; Gámez, José A.: spark-crowd: a spark package for learning from crowdsourced big data (2019)
- Rompf, Tiark; Amin, Nada: A SQL to C compiler in 500 lines of code (2019)
- Roy, Asim; Qureshi, Shiban; Pande, Kartikeya; Nair, Divitha; Gairola, Kartik; Jain, Pooja; Singh, Suraj; Sharma, Kirti; Jagadale, Akshay; Lin, Yi-Yang; Sharma, Shashank; Gotety, Ramya; Zhang, Yuexin; Tang, Ji; Mehta, Tejas; Sindhanuru, Hemanth; Okafor, Nonso; Das, Santak; Gopal, Chidambara N.; Rudraraju, Srinivasa B.; Kakarlapudi, Avinash V.: Performance comparison of machine learning platforms (2019)
- Sainudiin, Raazesh; Teng, Gloria: Minimum distance histograms with universal performance guarantees (2019)
- Tsamardinos, Ioannis; Borboudakis, Giorgos; Katsogridakis, Pavlos; Pratikakis, Polyvios; Christophides, Vassilis: A greedy feature selection algorithm for big data of high dimensionality (2019)
- Viroli, Mirko; Beal, Jacob; Damiani, Ferruccio; Audrito, Giorgio; Casadei, Roberto; Pianini, Danilo: From distributed coordination to field calculus and aggregate computing (2019)
- Yu, Hong; Chen, Yun; Lingras, Pawan; Wang, Guoyin: A three-way cluster ensemble approach for large-scale data (2019)
- Chung, Moo K.: Statistical challenges of big brain network data (2018)
- Convolbo, Moïse W.; Chou, Jerry; Hsu, Ching-Hsien; Chung, Yeh Ching: GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers (2018)
- Karim, Md. Rezaul; Cochez, Michael; Beyan, Oya Deniz; Ahmed, Chowdhury Farhan; Decker, Stefan: Mining maximal frequent patterns in transactional databases and dynamic data streams: a Spark-based approach (2018)
- Ketsman, Bas; Albarghouthi, Aws; Koutris, Paraschos: Distribution policies for Datalog (2018)