MapReduce

MapReduce is a new parallel programming model initially developed for large-scale web content processing. A central problem in data analysis is how to perform computations over extremely large datasets, and MapReduce makes it possible to use commodity hardware for massively parallel data analysis applications. How relational algebra operators should be translated into, and optimized as, MapReduce programs is still an open and active research field. In this paper, we focus on a special type of data analysis query, namely the multiple group by query. We first study the communication cost of the MapReduce model and then give an initial implementation of the multiple group by query. We then propose an optimized version that addresses the communication cost issues. The optimized version shows better speedup and better scalability than the initial version.
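The abstract connects multiple group by queries to MapReduce communication cost. A small single-process simulation can make that connection concrete. This is a hypothetical sketch, not the paper's implementation. The record fields (`region`, `product`, `amount`), the helper names, and the use of a map-side combiner as the "optimization" are all assumptions, because the abstract does not say which optimization the authors use.

In the naive mapper, every record emits one key-value pair per grouping column. The combiner variant pre-aggregates within each input split, so fewer pairs cross the shuffle. The shuffled-pair count stands in for communication cost.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical record and key types for this sketch.
Record = Dict[str, object]
Key = Tuple[str, object]  # (grouping column, column value)


def map_naive(split: List[Record], group_cols: List[str], measure: str) -> Iterator[Tuple[Key, float]]:
    # Emit one (key, value) pair per record per grouping column.
    for rec in split:
        for col in group_cols:
            yield (col, rec[col]), float(rec[measure])


def map_with_combiner(split: List[Record], group_cols: List[str], measure: str) -> Iterator[Tuple[Key, float]]:
    # Pre-aggregate locally (valid because SUM is associative and commutative),
    # so each mapper emits at most one pair per distinct key in its split.
    partial: Dict[Key, float] = defaultdict(float)
    for key, value in map_naive(split, group_cols, measure):
        partial[key] += value
    yield from partial.items()


def run_job(splits, mapper, group_cols, measure):
    # Shuffle: group mapper output by key and count the pairs sent.
    shuffle: Dict[Key, List[float]] = defaultdict(list)
    shuffled_pairs = 0
    for split in splits:
        for key, value in mapper(split, group_cols, measure):
            shuffle[key].append(value)
            shuffled_pairs += 1
    # Reduce: sum the values for each (column, value) group.
    result = {key: sum(values) for key, values in shuffle.items()}
    return result, shuffled_pairs


# Illustrative input: two splits, grouped by region and by product.
splits = [
    [{"region": "EU", "product": "A", "amount": 10},
     {"region": "EU", "product": "B", "amount": 5},
     {"region": "US", "product": "A", "amount": 7}],
    [{"region": "US", "product": "A", "amount": 3},
     {"region": "EU", "product": "A", "amount": 2}],
]
group_cols = ["region", "product"]

naive_result, naive_pairs = run_job(splits, map_naive, group_cols, "amount")
combined_result, combined_pairs = run_job(splits, map_with_combiner, group_cols, "amount")

print(naive_pairs, combined_pairs)  # 10 7
print(naive_result == combined_result)  # True
```

Both variants compute the same aggregates: EU 17, US 10, product A 22, product B 5. The combiner version shuffles 7 pairs instead of 10. The saving grows with the number of records that share a key within a split.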


References in zbMATH (referenced in 230 articles, 1 standard article)

Showing results 1 to 20 of 230.

  1. Tang, Lu; Zhou, Ling; Song, Peter X.-K.: Distributed simultaneous inference in generalized linear models via confidence distribution (2020)
  2. Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario: Parallel extraction of association rules from genomics data (2019)
  3. Ali, Syed Muhammad Fawad; Mey, Johannes; Thiele, Maik: Parallelizing user-defined functions in the ETL workflow using orchestration style sheets (2019)
  4. Biletskyy, Borys: Distributed Bayesian machine learning procedures (2019)
  5. Brefeld, Ulf; Lasek, Jan; Mair, Sebastian: Probabilistic movement models and zones of control (2019)
  6. Claesson, Anders; Guðmundsson, Bjarki Ágúst: Enumerating permutations sortable by k passes through a pop-stack (2019)
  7. Dhaenens, Clarisse; Jourdan, Laetitia: Metaheuristics for data mining (2019)
  8. Gyssens, Marc; Hellings, Jelle; Paredaens, Jan; Van Gucht, Dirk; Wijsen, Jef; Wu, Yuqing: Calculi for symmetric queries (2019)
  9. Jiang, Yiwei; Zhou, Ping; Cheng, T. C. E.; Ji, Min: Optimal online algorithms for MapReduce scheduling on two uniform machines (2019)
  10. Jiang, Yiwei; Zhou, Ping; Zhou, Wei: MapReduce machine covering problem on a small number of machines (2019)
  11. Jiang, Yun; Zhuo, Junyu; Zhang, Juan; Xiao, Xiao: The optimization of parallel convolutional RBM based on Spark (2019)
  12. Pericini, Matheus H. M.; Leite, Lucas G. M.; De Carvalho-Junior, Francisco H.; Machado, Javam C.; Rezende, Cenez A.: MAPSkew: metaheuristic approaches for partitioning skew in MapReduce (2019)
  13. Quiroz, Matias; Kohn, Robert; Villani, Mattias; Tran, Minh-Ngoc: Speeding up MCMC by efficient data subsampling (2019)
  14. Raissi, Maziar; Babaee, Hessam; Karniadakis, George Em: Parametric Gaussian process regression for big data (2019)
  15. Wang, Weina; Harchol-Balter, Mor; Jiang, Haotian; Scheller-Wolf, Alan; Srikant, R.: Delay asymptotics and bounds for multitask parallel jobs (2019)
  16. Afrati, Foto N.; Sharma, Shantanu; Ullman, Jonathan R.; Ullman, Jeffrey D.: Computing marginals using MapReduce (2018)
  17. Caballero, Rafael; Martin-Martin, Enrique; Riesco, Adrián; Tamarit, Salvador: Declarative debugging of concurrent Erlang programs (2018)
  18. Convolbo, Moïse W.; Chou, Jerry; Hsu, Ching-Hsien; Chung, Yeh Ching: GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers (2018)
  19. Fischetti, Matteo; Monaci, Michele; Salvagnin, Domenico: SelfSplit parallelization for mixed-integer linear programming (2018)
  20. Gonen, Yaron; Gudes, Ehud; Kandalov, Kirill: New and efficient algorithms for producing frequent itemsets with the Map-Reduce framework (2018)