MapReduce is a new parallel programming model initially developed for large-scale web content processing. Data analysis faces the problem of how to compute over extremely large datasets. The arrival of MapReduce makes it possible to use commodity hardware for massively parallel data analysis applications. The translation and optimization of relational algebra operators into MapReduce programs is still an open and active research field. In this paper, we focus on a special type of data analysis query, namely the multiple group-by query. We first study the communication cost of the MapReduce model, then give an initial implementation of the multiple group-by query. We then propose an optimized version that addresses and reduces the communication cost. Our optimized version shows better speed-up and better scalability than the initial implementation.
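As an illustration of the idea in the abstract, the following is a minimal single-process sketch of a multiple group-by query expressed as map and reduce steps. It is not the paper's implementation: the function names (`map_phase`, `combine`, `reduce_phase`, `multiple_group_by`), the use of SUM as the aggregate, and the local pre-aggregation step (a combiner) as the way to cut communication are all assumptions made for this example. The number of key-value pairs handed to the reduce step stands in for communication cost.

```python
from collections import defaultdict


def map_phase(records, group_by_columns, measure):
    """Emit one ((column, value), measure) pair per record and group-by column."""
    for record in records:
        for column in group_by_columns:
            yield (column, record[column]), record[measure]


def combine(pairs):
    """Pre-aggregate pairs inside one partition (assumed optimization)."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def reduce_phase(pairs):
    """Sum all values that share the same (column, value) key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def multiple_group_by(partitions, group_by_columns, measure, use_combiner=True):
    """Run every group-by in one pass; also return the number of shuffled pairs."""
    shuffled = []
    for partition in partitions:
        pairs = list(map_phase(partition, group_by_columns, measure))
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)
    return reduce_phase(shuffled), len(shuffled)


if __name__ == "__main__":
    # Hypothetical toy data: two partitions of a sales table.
    partitions = [
        [{"region": "EU", "product": "A", "sales": 10},
         {"region": "EU", "product": "B", "sales": 5}],
        [{"region": "US", "product": "A", "sales": 7},
         {"region": "EU", "product": "A", "sales": 3}],
    ]
    totals, shuffled = multiple_group_by(partitions, ["region", "product"], "sales")
    print(totals, shuffled)
```

Calling `multiple_group_by` with `use_combiner=False` returns the same totals but passes more pairs to the reduce step, which is the kind of communication overhead the abstract refers to.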

References in zbMATH (referenced in 206 articles, 1 standard article)

Showing results 1 to 20 of 206.

  1. Brefeld, Ulf; Lasek, Jan; Mair, Sebastian: Probabilistic movement models and zones of control (2019)
  2. Afrati, Foto N.; Sharma, Shantanu; Ullman, Jonathan R.; Ullman, Jeffrey D.: Computing marginals using MapReduce (2018)
  3. Caballero, Rafael; Martin-Martin, Enrique; Riesco, Adrián; Tamarit, Salvador: Declarative debugging of concurrent Erlang programs (2018)
  4. Convolbo, Moïse W.; Chou, Jerry; Hsu, Ching-Hsien; Chung, Yeh Ching: GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers (2018)
  5. Fischetti, Matteo; Monaci, Michele; Salvagnin, Domenico: SelfSplit parallelization for mixed-integer linear programming (2018)
  6. Gonen, Yaron; Gudes, Ehud; Kandalov, Kirill: New and efficient algorithms for producing frequent itemsets with the Map-Reduce framework (2018)
  7. Haller, Philipp; Miller, Heather; Müller, Normen: A programming model and foundation for lineage-based distributed computation (2018)
  8. Huang, Jidan; Zheng, Feifeng; Xu, Yinfeng; Liu, Ming: Online MapReduce processing on two identical parallel machines (2018)
  9. Interlandi, Matteo; Tanca, Letizia: A datalog-based computational model for coordination-free, data-parallel systems (2018)
  10. Jiang, Yiwei; Zhou, Wei; Zhou, Ping: An optimal preemptive algorithm for online MapReduce scheduling on two parallel machines (2018)
  11. Li, Guozhi; Guo, Songtao; Liu, Guiyan; Yang, Yuanyuan: Application and analysis of multicast blocking modelling in fat-tree data center networks (2018)
  12. Lin, Shao-Bo; Zhou, Ding-Xuan: Distributed kernel-based gradient descent algorithms (2018)
  13. Liu, Jiapeng; Liao, Xiuwu; Huang, Wei; Yang, Jian-bo: A new decision-making approach for multiple criteria sorting with an imbalanced set of assignment examples (2018)
  14. Lucic, Mario; Faulkner, Matthew; Krause, Andreas; Feldman, Dan: Training Gaussian mixture models at scale via coresets (2018)
  15. Mahajan, Dhruv; Agrawal, Nikunj; Keerthi, S. Sathiya; Sellamanickam, Sundararajan; Bottou, Léon: An efficient distributed learning algorithm based on effective local functional approximations (2018)
  16. Mishra, Deepa; Gunasekaran, Angappa; Papadopoulos, Thanos; Childe, Stephen J.: Big data and supply chain management: a review and bibliometric analysis (2018)
  17. Mukhopadhyay, Subhadeep; Nandi, Shinjini: LPiTrack: eye movement pattern recognition algorithm and application to biometric identification (2018)
  18. Papanagnou, Christos I.; Matthews-Amune, Omeiza: Coping with demand volatility in retail pharmacies with the aid of big data exploration (2018)
  19. Pelucchi, Mauro; Psaila, Giuseppe; Toccu, Maurizio: Hadoop vs. Spark: impact on performance of the Hammer query engine for open data corpora (2018)
  20. Qian, Hang: Big data Bayesian linear regression and variable selection by normal-inverse-gamma summation (2018)