MapReduce is a new parallel programming model initially developed for large-scale web content processing. Data analysis faces the problem of how to compute over extremely large datasets. MapReduce makes it possible to use commodity hardware for massively parallel data analysis applications. The translation and optimization of relational algebra operators into MapReduce programs is still an open and active research field. In this paper, we focus on a special type of data analysis query, namely the multiple group-by query. We first study the communication cost of the MapReduce model, then give an initial implementation of the multiple group-by query. We then propose an optimized version that addresses and reduces the communication cost. Our optimized version shows better speed-up and better scalability than the initial version.
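
To make the idea of a multiple group-by query and its communication cost concrete, here is a minimal in-memory sketch. It is not the implementation described in the abstract: the function names, the sample records, and the use of a combiner as the cost-saving step are all assumptions chosen for illustration, and the number of key-value pairs passed to the reduce phase stands in as a rough proxy for communication cost.

```python
from collections import defaultdict

# Hypothetical input: two input splits, each a list of records.
SPLITS = [
    [{"region": "EU", "product": "A", "sales": 10},
     {"region": "EU", "product": "B", "sales": 5}],
    [{"region": "US", "product": "A", "sales": 7},
     {"region": "US", "product": "A", "sales": 3}],
]

def map_record(record, group_by_columns, measure):
    """Emit one ((column, value), measure) pair per group-by column."""
    for column in group_by_columns:
        yield (column, record[column]), record[measure]

def combine(pairs):
    """Pre-aggregate pairs sharing a key within one split (combiner step)."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())

def run_job(splits, group_by_columns, measure, use_combiner):
    """Simulate map, optional combine, shuffle and reduce in one process.

    Returns the aggregated totals and the number of pairs that crossed
    the simulated shuffle, used here as a stand-in for communication cost.
    """
    shuffled = []
    for split in splits:
        pairs = [pair
                 for record in split
                 for pair in map_record(record, group_by_columns, measure)]
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)

    # Reduce: sum the measure for every (column, value) group.
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)

naive_totals, naive_cost = run_job(SPLITS, ["region", "product"], "sales", False)
combined_totals, combined_cost = run_job(SPLITS, ["region", "product"], "sales", True)
print(naive_cost, combined_cost)
```

Both runs produce the same totals for the `region` and `product` groupings, but the run with local pre-aggregation sends fewer pairs to the reduce phase. A real MapReduce job would express the same steps through the framework's mapper, combiner and reducer interfaces.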

References in zbMATH (referenced in 180 articles, 1 standard article)

Showing results 1 to 20 of 180.


  1. Afrati, Foto N.; Sharma, Shantanu; Ullman, Jonathan R.; Ullman, Jeffrey D.: Computing marginals using MapReduce (2018)
  2. Convolbo, Moïse W.; Chou, Jerry; Hsu, Ching-Hsien; Chung, Yeh Ching: GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers (2018)
  3. Huang, Jidan; Zheng, Feifeng; Xu, Yinfeng; Liu, Ming: Online MapReduce processing on two identical parallel machines (2018)
  4. Jiang, Yiwei; Zhou, Wei; Zhou, Ping: An optimal preemptive algorithm for online MapReduce scheduling on two parallel machines (2018)
  5. Li, Guozhi; Guo, Songtao; Liu, Guiyan; Yang, Yuanyuan: Application and analysis of multicast blocking modelling in fat-tree data center networks (2018)
  6. Lin, Shao-Bo; Zhou, Ding-Xuan: Distributed kernel-based gradient descent algorithms (2018)
  7. Liu, Jiapeng; Liao, Xiuwu; Huang, Wei; Yang, Jian-bo: A new decision-making approach for multiple criteria sorting with an imbalanced set of assignment examples (2018)
  8. Mukhopadhyay, Subhadeep; Nandi, Shinjini: LPiTrack: eye movement pattern recognition algorithm and application to biometric identification (2018)
  9. Wang, Zi-Hao; Lin, Hong-Wei; Xu, Chen-Kai: Data driven composite shape descriptor design for shape retrieval with a VoR-tree (2018)
  10. Xia, Dawen; Lu, Xiaonan; Li, Huaqing; Wang, Wendong; Li, Yantao; Zhang, Zili: A MapReduce-based parallel frequent pattern growth algorithm for spatiotemporal association analysis of mobile trajectory big data (2018)
  11. Afzal, Asif; Ansari, Zahid; Rimaz Faizabadi, Ahmed; Ramis, M. K.: Parallelization strategies for computational fluid dynamics software: state of the art review (2017)
  12. Annoni, Jennifer; Seiler, Peter: A method to construct reduced-order parameter-varying models (2017)
  13. Chen, Cong; Xu, Yinfeng; Zhu, Yuqing; Sun, Chengyu: Online MapReduce scheduling problem of minimizing the makespan (2017)
  14. Fabisiak, Tomasz; Danilecki, Arkadiusz: Browser-based harnessing of voluntary computational power (2017)
  15. Fuerst, Carlo; Pacut, Maciej; Schmid, Stefan: Data locality and replica aware virtual cluster embeddings (2017)
  16. García, José; Pope, Christopher; Altimiras, Francisco: A distributed $K$-means segmentation algorithm applied to Lobesia botrana recognition (2017)
  17. Hopf, Michael; Thielen, Clemens; Wendt, Oliver: Competitive algorithms for multistage online scheduling (2017)
  18. Lanza, Daniel; Chávez, F.; Fernandez, Francisco; Garcia-Valdez, M.; Trujillo, Leonardo; Olague, Gustavo: Profiting from several recommendation algorithms using a scalable approach (2017)
  19. Luo, Taibo; Zhu, Yuqing; Wu, Weili; Xu, Yinfeng; Du, Ding-Zhu: Online makespan minimization in MapReduce-like systems with complex reduce tasks (2017)
  20. Masegosa, Andrés R.; Martinez, Ana M.; Langseth, Helge; Nielsen, Thomas D.; Salmerón, Antonio; Ramos-López, Darío; Madsen, Anders L.: Scaling up Bayesian variational inference using distributed computing clusters (2017)
