MapReduce

MapReduce is a new parallel programming model initially developed for large-scale web content processing. Data analysis faces the problem of how to compute over extremely large datasets. The arrival of MapReduce makes it possible to use commodity hardware for massively parallel data analysis applications. Translating relational algebra operators into MapReduce programs, and optimizing the result, is still an open and active research field. In this paper, we focus on a special type of data analysis query, namely the multiple group-by query. We first study the communication cost of the MapReduce model, then give an initial implementation of the multiple group-by query. We then propose an optimized version that addresses and reduces the communication cost. Our optimized version shows better speed-up and better scalability than the initial version.
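
To illustrate what a multiple group-by query looks like in the MapReduce model, here is a minimal single-process sketch in Python. It is not the implementation from the paper: the records, the column names (`region`, `product`, `sales`), and the helper functions are all hypothetical, and the shuffle step is simulated with an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical toy records; the column names are assumptions for illustration.
RECORDS = [
    {"region": "EU", "product": "A", "sales": 10},
    {"region": "EU", "product": "B", "sales": 5},
    {"region": "US", "product": "A", "sales": 7},
]

# One aggregate (sum of sales) is computed separately for each grouping column.
GROUP_BY_COLUMNS = ["region", "product"]


def map_record(record):
    """Map step: emit one ((column, value), measure) pair per grouping column."""
    for column in GROUP_BY_COLUMNS:
        yield (column, record[column]), record["sales"]


def shuffle(pairs):
    """Simulated shuffle: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: aggregate the values of one group."""
    return key, sum(values)


def multiple_group_by(records):
    """Run map, shuffle, and reduce; return results and the intermediate pair count."""
    pairs = [pair for record in records for pair in map_record(record)]
    results = dict(reduce_group(k, v) for k, v in shuffle(pairs).items())
    return results, len(pairs)


result, intermediate_pairs = multiple_group_by(RECORDS)
for key, total in sorted(result.items()):
    print(key, total)
```

In this sketch every input record produces one intermediate pair per grouping column, so the volume of data passed from map to reduce grows with the number of group-by dimensions. That is the kind of communication cost the abstract refers to; how the paper's optimized version reduces it is not stated here.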


References in zbMATH (referenced in 185 articles, 1 standard article)

Showing results 1 to 20 of 185.
Sorted by year (citations)


  1. Afrati, Foto N.; Sharma, Shantanu; Ullman, Jonathan R.; Ullman, Jeffrey D.: Computing marginals using MapReduce (2018)
  2. Convolbo, Moïse W.; Chou, Jerry; Hsu, Ching-Hsien; Chung, Yeh Ching: GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers (2018)
  3. Fischetti, Matteo; Monaci, Michele; Salvagnin, Domenico: SelfSplit parallelization for mixed-integer linear programming (2018)
  4. Haller, Philipp; Miller, Heather; Müller, Normen: A programming model and foundation for lineage-based distributed computation (2018)
  5. Huang, Jidan; Zheng, Feifeng; Xu, Yinfeng; Liu, Ming: Online MapReduce processing on two identical parallel machines (2018)
  6. Jiang, Yiwei; Zhou, Wei; Zhou, Ping: An optimal preemptive algorithm for online MapReduce scheduling on two parallel machines (2018)
  7. Li, Guozhi; Guo, Songtao; Liu, Guiyan; Yang, Yuanyuan: Application and analysis of multicast blocking modelling in fat-tree data center networks (2018)
  8. Lin, Shao-Bo; Zhou, Ding-Xuan: Distributed kernel-based gradient descent algorithms (2018)
  9. Liu, Jiapeng; Liao, Xiuwu; Huang, Wei; Yang, Jian-bo: A new decision-making approach for multiple criteria sorting with an imbalanced set of assignment examples (2018)
  10. Mukhopadhyay, Subhadeep; Nandi, Shinjini: LPiTrack: eye movement pattern recognition algorithm and application to biometric identification (2018)
  11. Papanagnou, Christos I.; Matthews-Amune, Omeiza: Coping with demand volatility in retail pharmacies with the aid of big data exploration (2018)
  12. Wang, Zi-Hao; Lin, Hong-Wei; Xu, Chen-Kai: Data driven composite shape descriptor design for shape retrieval with a VoR-tree (2018)
  13. Xia, Dawen; Lu, Xiaonan; Li, Huaqing; Wang, Wendong; Li, Yantao; Zhang, Zili: A MapReduce-based parallel frequent pattern growth algorithm for spatiotemporal association analysis of mobile trajectory big data (2018)
  14. Afzal, Asif; Ansari, Zahid; Rimaz Faizabadi, Ahmed; Ramis, M. K.: Parallelization strategies for computational fluid dynamics software: state of the art review (2017)
  15. Annoni, Jennifer; Seiler, Peter: A method to construct reduced-order parameter-varying models (2017)
  16. Brandt, Jörgen; Reisig, Wolfgang; Leser, Ulf: Computation semantics of the functional scientific workflow language cuneiform (2017)
  17. Chen, Cong; Xu, Yinfeng; Zhu, Yuqing; Sun, Chengyu: Online MapReduce scheduling problem of minimizing the makespan (2017)
  18. Fabisiak, Tomasz; Danilecki, Arkadiusz: Browser-based harnessing of voluntary computational power (2017)
  19. Fuerst, Carlo; Pacut, Maciej; Schmid, Stefan: Data locality and replica aware virtual cluster embeddings (2017)
  20. García, José; Pope, Christopher; Altimiras, Francisco: A distributed K-means segmentation algorithm applied to Lobesia botrana recognition (2017)
