MapReduce

MapReduce is a new parallel programming model initially developed for large-scale web content processing. Data analysis faces the problem of how to compute over extremely large datasets. MapReduce makes it possible to use commodity hardware for massively parallel data analysis applications. Translating relational algebra operators into MapReduce programs, and optimizing the result, is still an open and active research field. In this paper, we focus on a special type of data analysis query, namely the multiple group-by query. We first study the communication cost of the MapReduce model, then give an initial implementation of the multiple group-by query. We then propose an optimized version that addresses the communication cost issues. Our optimized version shows better speedup and better scalability than the initial version.
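To make the idea of a multiple group-by query concrete, the following is a minimal in-memory sketch in Python. It is not the implementation described in the paper, and it does not reproduce the optimized version. It only illustrates how one map phase can emit a key per grouping column so that several group-by aggregates are computed in a single pass. The function names, the column names, and the sample records are all hypothetical.

```python
from collections import defaultdict


def map_record(record, group_by_columns, measure):
    """Map phase: emit one ((column, value), measure) pair per group-by column."""
    for column in group_by_columns:
        yield (column, record[column]), record[measure]


def shuffle(pairs):
    """Shuffle phase (simulated in memory): collect values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce phase: sum the measure for one (column, value) group."""
    return key, sum(values)


def multiple_group_by(records, group_by_columns, measure):
    """Compute SUM(measure) for every group-by column in one pass."""
    pairs = (
        pair
        for record in records
        for pair in map_record(record, group_by_columns, measure)
    )
    return dict(reduce_group(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample data, for illustration only.
records = [
    {"country": "FR", "product": "A", "sales": 10},
    {"country": "FR", "product": "B", "sales": 5},
    {"country": "DE", "product": "A", "sales": 7},
]
result = multiple_group_by(records, ["country", "product"], "sales")
```

In this sketch every record produces one intermediate pair per grouping column, so the volume of data passed from the map phase to the reduce phase grows with the number of group-by columns. That intermediate volume is the kind of communication cost the abstract refers to.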


References in zbMATH (referenced in 175 articles, 1 standard article)

Showing results 1 to 20 of 175, sorted by year.


  1. Afrati, Foto N.; Sharma, Shantanu; Ullman, Jonathan R.; Ullman, Jeffrey D.: Computing marginals using MapReduce (2018)
  2. Huang, Jidan; Zheng, Feifeng; Xu, Yinfeng; Liu, Ming: Online MapReduce processing on two identical parallel machines (2018)
  3. Li, Guozhi; Guo, Songtao; Liu, Guiyan; Yang, Yuanyuan: Application and analysis of multicast blocking modelling in fat-tree data center networks (2018)
  4. Lin, Shao-Bo; Zhou, Ding-Xuan: Distributed kernel-based gradient descent algorithms (2018)
  5. Liu, Jiapeng; Liao, Xiuwu; Huang, Wei; Yang, Jian-bo: A new decision-making approach for multiple criteria sorting with an imbalanced set of assignment examples (2018)
  6. Mukhopadhyay, Subhadeep; Nandi, Shinjini: LPiTrack: eye movement pattern recognition algorithm and application to biometric identification (2018)
  7. Wang, Zi-Hao; Lin, Hong-Wei; Xu, Chen-Kai: Data driven composite shape descriptor design for shape retrieval with a VoR-tree (2018)
  8. Afzal, Asif; Ansari, Zahid; Rimaz Faizabadi, Ahmed; Ramis, M.K.: Parallelization strategies for computational fluid dynamics software: state of the art review (2017)
  9. Annoni, Jennifer; Seiler, Peter: A method to construct reduced-order parameter-varying models (2017)
  10. Chen, Cong; Xu, Yinfeng; Zhu, Yuqing; Sun, Chengyu: Online MapReduce scheduling problem of minimizing the makespan (2017)
  11. Fabisiak, Tomasz; Danilecki, Arkadiusz: Browser-based harnessing of voluntary computational power (2017)
  12. Fuerst, Carlo; Pacut, Maciej; Schmid, Stefan: Data locality and replica aware virtual cluster embeddings (2017)
  13. García, José; Pope, Christopher; Altimiras, Francisco: A distributed $K$-means segmentation algorithm applied to Lobesia botrana recognition (2017)
  14. Lanza, Daniel; Chávez, F.; Fernandez, Francisco; Garcia-Valdez, M.; Trujillo, Leonardo; Olague, Gustavo: Profiting from several recommendation algorithms using a scalable approach (2017)
  15. Luo, Taibo; Zhu, Yuqing; Wu, Weili; Xu, Yinfeng; Du, Ding-Zhu: Online makespan minimization in MapReduce-like systems with complex reduce tasks (2017)
  16. Masegosa, Andrés R.; Martinez, Ana M.; Langseth, Helge; Nielsen, Thomas D.; Salmerón, Antonio; Ramos-López, Darío; Madsen, Anders L.: Scaling up Bayesian variational inference using distributed computing clusters (2017)
  17. Ni, Eric C.; Ciocan, Dragos F.; Henderson, Shane G.; Hunter, Susan R.: Efficient ranking and selection in parallel computing environments (2017)
  18. Stewart, Iain A.: On the combinatorial design of data centre network topologies (2017)
  19. Zhu, Yao; Gleich, David F.; Grama, Ananth: Erasure coding for fault-oblivious linear system solvers (2017)
  20. Arias, Jacinto; Gamez, Jose A.; Nielsen, Thomas D.; Puerta, Jose M.: A scalable pairwise class interaction framework for multidimensional classification (2016)
