HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm This extended abstract describes and analyses a near-optimal probabilistic algorithm, HyperLogLog, dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, “short bytes”), HyperLogLog performs a single pass over the data and produces an estimate of the cardinality such that the relative accuracy (the standard error) is typically about 1·04/m. This improves on the best previously known cardinality estimator, LogLog, whose accuracy can be matched by consuming only 64% of the original memory. For instance, the new algorithm makes it possible to estimate cardinalities well beyond 109 with a typical accuracy of 2% while using a memory of only 1.5 kilobytes. The algorithm parallelizes optimally and adapts to the sliding window model.
Keywords for this software
References in zbMATH (referenced in 6 articles , 1 standard article )
Showing results 1 to 6 of 6.
- Cohen, Reuven; Katzir, Liran; Yehezkel, Aviv: A unified scheme for generalizing cardinality estimators to sum aggregation (2015)
- Fuchs, Michael; Hwang, Hsien-Kuei; Zacharovas, Vytas: An analytic approach to the asymptotic variance of trie statistics and related structures (2014)
- Csűrös, Miklós: Approximate counting with a floating-point counter (2010)
- Giroire, Frédéric: Order statistics and estimating cardinalities of massive data sets (2009)
- Nedjar, Sébastien; Casali, Alain; Cicchetti, Rosine; Lakhal, Lotfi: Emerging cubes: borders, size estimations and lossless reductions (2009)
- Flajolet, P.; Fusy, É.; Gandouet, O.; Meunier, F.: HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm (2007)