HaLoop

HaLoop: efficient iterative data processing on large clusters. Simply speaking, HaLoop = Ha, Loop :-). HaLoop is a modified version of the Hadoop MapReduce framework designed to serve iterative data-analysis applications. It not only extends MapReduce with programming support for iterative computations, but also dramatically improves their efficiency by making the task scheduler loop-aware and by adding various caching mechanisms. Evaluated on real queries and real datasets, HaLoop on average reduces query runtimes by a factor of 1.85 compared with Hadoop and shuffles only 4% of the data between mappers and reducers. In short, HaLoop has the following features: 1) it provides caching options for loop-invariant data access, 2) it lets users reuse major building blocks from their applications' existing Hadoop implementations, and 3) it offers intra-job fault-tolerance mechanisms similar to Hadoop's. HaLoop is also backward-compatible with Hadoop jobs. Note that at this stage HaLoop is a prototype rather than a production system; work is ongoing to make it more robust and stable.
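
To make the motivation concrete, the following is a minimal sketch of how an iterative computation is typically driven on plain Hadoop: the driver resubmits one MapReduce job per iteration, and both the loop-invariant input (e.g., a link table) and the loop-variant state are re-read and re-shuffled on every pass. This per-iteration overhead is exactly what HaLoop's loop-aware scheduler and caching options avoid. The class names RankMapper/RankReducer, the paths, and the tab-separated record format are illustrative assumptions, not part of HaLoop's API or of any published implementation.

    // Illustrative driver for an iterative job on plain (unmodified) Hadoop.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class IterativeDriver {

      // Placeholder mapper: parses "key<TAB>payload" lines from either input
      // and re-emits them keyed by node; a real job would distinguish the
      // invariant link table from the current ranks.
      public static class RankMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
            throws java.io.IOException, InterruptedException {
          String[] parts = value.toString().split("\t", 2);
          if (parts.length == 2) ctx.write(new Text(parts[0]), new Text(parts[1]));
        }
      }

      // Placeholder reducer: a real job would recompute each node's state here.
      public static class RankReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
            throws java.io.IOException, InterruptedException {
          for (Text v : values) ctx.write(key, v);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path invariant = new Path("links");   // loop-invariant data, re-read each pass
        Path state = new Path("ranks/iter0"); // loop-variant state

        for (int i = 1; i <= 10; i++) {       // fixed iteration count; no fixpoint test
          Job job = Job.getInstance(conf, "rank-iteration-" + i);
          job.setJarByClass(IterativeDriver.class);
          job.setMapperClass(RankMapper.class);
          job.setReducerClass(RankReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(Text.class);

          // Every iteration is a fresh job: the invariant table and the current
          // state are both loaded and shuffled again from scratch.
          FileInputFormat.addInputPath(job, invariant);
          FileInputFormat.addInputPath(job, state);
          state = new Path("ranks/iter" + i);
          FileOutputFormat.setOutputPath(job, state);

          if (!job.waitForCompletion(true)) System.exit(1);
        }
      }
    }

In contrast, HaLoop moves the loop into the framework: the job itself is declared iterative (with an iteration bound or a fixed-point termination test), and loop-invariant inputs can be marked for caching so they are not re-shuffled on every pass, while the loop-aware scheduler co-locates tasks with the cached data.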


References in zbMATH (referenced in 3 articles)


  1. Pelucchi, Mauro; Psaila, Giuseppe; Toccu, Maurizio: Hadoop vs. Spark: impact on performance of the Hammer query engine for open data corpora (2018)
  2. Berlińska, Joanna; Drozdowski, Maciej: Scheduling multilayer divisible computations (2015)
  3. Zhang, Junbo; Wong, Jian-Syuan; Li, Tianrui; Pan, Yi: A comparison of parallel large-scale knowledge acquisition using rough set theory on different MapReduce runtime systems (2014)