| | |
Summary: SIAM J. COMPUT. © 2007 Society for Industrial and Applied Mathematics
Vol. 37, No. 2, pp. 359–379
RANGE-EFFICIENT COUNTING OF DISTINCT ELEMENTS IN A
MASSIVE DATA STREAM
A. PAVAN AND SRIKANTA TIRTHAPURA
Abstract. Efficient one-pass estimation of F0, the number of distinct elements in a data stream,
is a fundamental problem arising in various contexts in databases and networking. We consider range-
efficient estimation of F0: estimation of the number of distinct elements in a data stream where each
element of the stream is not just a single integer but an interval of integers. We present a randomized
algorithm which yields an $(\epsilon, \delta)$-approximation of F0, with the following time and space
complexities (n is the size of the universe of the items): (1) The amortized processing time per
interval is $O(\log\frac{1}{\delta} \log\frac{n}{\epsilon})$. (2) The workspace used is
$O(\frac{1}{\epsilon^2} \log\frac{1}{\delta} \log n)$ bits. Our algorithm improves upon a previous
algorithm by Bar-Yossef, Kumar and Sivakumar [Proceedings of the 13th ACM–SIAM Symposium on Discrete
Algorithms (SODA), 2002, pp. 623–632], which requires O( 1
|