MapReduce is a new parallel programming model initially developed for large-scale web content processing. Data analysis faces the problem of computing over extremely large datasets, and the arrival of MapReduce makes it possible to use commodity hardware for massively parallel data analysis applications. Translating relational algebra operators into MapReduce programs, and optimizing that translation, is still an open and active research field. In this paper, we focus on a special type of data analysis query, namely the multiple group by query. We first study the communication cost of the MapReduce model, then give an initial implementation of the multiple group by query. We then propose an optimized version that addresses the communication cost issues. The optimized version shows better speedup and better scalability than the initial version.
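
To make the idea of a multiple group by query and its communication cost concrete, the following is a minimal single-process sketch in Python, not the implementation described in the abstract. The record layout, the names `RECORDS`, `GROUP_BY`, `map_naive`, `map_preaggregated`, `reduce_sum` and `run`, and the use of a SUM aggregate are all assumptions made for illustration. The pre-aggregating mapper is a generic combiner-style technique for shrinking the shuffled data; the abstract does not say whether the paper's optimized version works this way.

```python
from collections import defaultdict

# Hypothetical toy data: each record has two grouping dimensions and one measure.
RECORDS = [
    {"region": "EU", "product": "A", "sales": 10},
    {"region": "EU", "product": "B", "sales": 5},
    {"region": "US", "product": "A", "sales": 7},
    {"region": "US", "product": "A", "sales": 3},
]
GROUP_BY = ["region", "product"]  # one GROUP BY per dimension, same measure


def map_naive(record):
    """Emit one (key, value) pair per grouping dimension for a single record."""
    for dim in GROUP_BY:
        yield (dim, record[dim]), record["sales"]


def map_preaggregated(split):
    """Sum values locally within one input split before emitting anything."""
    partial = defaultdict(int)
    for record in split:
        for key, value in map_naive(record):
            partial[key] += value
    yield from partial.items()


def reduce_sum(pairs):
    """Sum all values that share a key (the reduce phase, run in-process here)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(splits, preaggregate):
    """Return the aggregates and the number of pairs that crossed the 'shuffle'."""
    shuffled = []
    for split in splits:
        if preaggregate:
            shuffled.extend(map_preaggregated(split))
        else:
            for record in split:
                shuffled.extend(map_naive(record))
    return reduce_sum(shuffled), len(shuffled)


if __name__ == "__main__":
    splits = [RECORDS[:2], RECORDS[2:]]  # pretend each split goes to one mapper
    print(run(splits, preaggregate=False))
    print(run(splits, preaggregate=True))
```

Both variants return the same aggregates; they differ only in how many intermediate pairs are handed to the reduce phase, which stands in here for communication cost.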

References in zbMATH (referenced in 133 articles, 1 standard article)

Showing results 1 to 20 of 133.

  1. Chen, Cong; Xu, Yinfeng; Zhu, Yuqing; Sun, Chengyu: Online MapReduce scheduling problem of minimizing the makespan (2017)
  2. Lanza, Daniel; Chávez, F.; Fernandez, Francisco; Garcia-Valdez, M.; Trujillo, Leonardo; Olague, Gustavo: Profiting from several recommendation algorithms using a scalable approach (2017)
  3. Luo, Taibo; Zhu, Yuqing; Wu, Weili; Xu, Yinfeng; Du, Ding-Zhu: Online makespan minimization in MapReduce-like systems with complex reduce tasks (2017)
  4. Zhu, Yao; Gleich, David F.; Grama, Ananth: Erasure coding for fault-oblivious linear system solvers (2017)
  5. Arias, Jacinto; Gamez, Jose A.; Nielsen, Thomas D.; Puerta, Jose M.: A scalable pairwise class interaction framework for multidimensional classification (2016)
  6. Bermanis, Amit; Salhov, Moshe; Wolf, Guy; Averbuch, Amir: Measure-based diffusion grid construction and high-dimensional data discretization (2016)
  7. Chawla, Priyanka; Chana, Inderveer; Rana, Ajay: Cloud-based automatic test data generation framework (2016)
  8. Choi, Woohyuk; Hong, Sumin; Jeong, Won-Ki: Vispark: GPU-accelerated distributed visual computing using Spark (2016)
  9. Derbeko, Philip; Dolev, Shlomi; Gudes, Ehud; Sharma, Shantanu: Security and privacy aspects in Mapreduce on clouds: A survey (2016)
  10. Grandi, Umberto; Loreggia, Andrea; Rossi, Francesca; Saraswat, Vijay: A Borda count for collective sentiment analysis (2016)
  11. Hameed, Abdul; Khoshkbarforoushha, Alireza; Ranjan, Rajiv; Jayaraman, Prem Prakash; Kolodziej, Joanna; Balaji, Pavan; Zeadally, Sherali; Malluhi, Qutaibah Marwan; Tziritas, Nikos; Vishnu, Abhinav; Khan, Samee U.; Zomaya, Albert: A survey and taxonomy on energy efficient resource allocation techniques for cloud computing systems (2016)
  12. Lu, Hongyuan; Pang, Guodong: Gaussian limits for a fork-join network with nonexchangeable synchronization in heavy traffic (2016)
  13. Lu, Jing; Hoi, Steven C.H.; Wang, Jialei; Zhao, Peilin; Liu, Zhi-Yong: Large scale online kernel learning (2016)
  14. Meng, Xiangrui; Bradley, Joseph; Yavuz, Burak; Sparks, Evan; Venkataraman, Shivaram; Liu, Davies; Freeman, Jeremy; Tsai, Db; Amde, Manish; Owen, Sean; Xin, Doris; Xin, Reynold; Franklin, Michael J.; Zadeh, Reza; Zaharia, Matei; Talwalkar, Ameet: MLlib: machine learning in Apache Spark (2016)
  15. Mirzasoleiman, Baharan; Karbasi, Amin; Sarkar, Rik; Krause, Andreas: Distributed submodular maximization (2016)
  16. Rizk, Amr; Poloczek, Felix; Ciucu, Florin: Stochastic bounds in Fork-Join queueing systems under full and partial mapping (2016)
  17. Shakiba, A.; Hooshmandasl, M.R.: Data volume reduction in covering approximation spaces with respect to twenty-two types of covering based rough sets (2016)
  18. Wang, Xi; Fan, Jianxi; Jia, Xiaohua; Lin, Cheng-Kuan: An efficient algorithm to construct disjoint path covers of DCell networks (2016)
  19. Zhao, Jiaqi; Tao, Jie; Streit, Achim: Enabling collaborative MapReduce on the cloud with a single-sign-on mechanism (2016)
  20. Afrati, Foto N.; Koutris, Paraschos; Suciu, Dan; Ullman, Jeffrey D.: Parallel skyline queries (2015)
