HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm This extended abstract describes and analyses a near-optimal probabilistic algorithm, HyperLogLog, dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, “short bytes”), HyperLogLog performs a single pass over the data and produces an estimate of the cardinality such that the relative accuracy (the standard error) is typically about 1·04/m. This improves on the best previously known cardinality estimator, LogLog, whose accuracy can be matched by consuming only 64% of the original memory. For instance, the new algorithm makes it possible to estimate cardinalities well beyond 109 with a typical accuracy of 2% while using a memory of only 1.5 kilobytes. The algorithm parallelizes optimally and adapts to the sliding window model.

References in zbMATH (referenced in 13 articles , 1 standard article )

Showing results 1 to 13 of 13.
Sorted by year (citations)

  1. Rachkovskij, D.A.: Binary vectors for fast distance and similarity estimation (2017)
  2. Rachkovskij, D.A.: Index structures for fast similarity search for binary vectors (2017)
  3. Woodruff, David P.; Zhang, Qin: When distributed computation is communication expensive (2017)
  4. Brandes, Philipp; Kardas, Marcin; Klonowski, Marek; Pająk, Dominik; Wattenhofer, Roger: Approximating the size of a radio network in beeping model (2016)
  5. Cohen, Reuven; Katzir, Liran; Yehezkel, Aviv: A unified scheme for generalizing cardinality estimators to sum aggregation (2015)
  6. Fuchs, Michael; Hwang, Hsien-Kuei; Zacharovas, Vytas: An analytic approach to the asymptotic variance of trie statistics and related structures (2014)
  7. Li, Rong-Hua; Yu, Jeffrey Xu; Huang, Xin; Cheng, Hong; Shang, Zechao: Measuring the impact of MVC attack in large complex networks (2014)
  8. Csűrös, Miklós: Approximate counting with a floating-point counter (2010)
  9. Lumbroso, Jérémie: An optimal cardinality estimation algorithm based on order statistics and its full analysis (2010)
  10. Giroire, Frédéric: Order statistics and estimating cardinalities of massive data sets (2009)
  11. Nedjar, Sébastien; Casali, Alain; Cicchetti, Rosine; Lakhal, Lotfi: Emerging cubes: borders, size estimations and lossless reductions (2009) ioport
  12. Chabchoub, Yousra; Fricker, Christine; Meunier, Frédéric; Tibi, Danielle: Analysis of an algorithm catching elephants on the Internet (2008)
  13. Flajolet, P.; Fusy, É.; Gandouet, O.; Meunier, F.: HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm (2007)