Reservoir Sampling
Reservoir Sampling
Problem Statement
Given a singly linked list, return a random node's value from the linked list. Each node must have the same probability of being chosen.
This is a classic application of Reservoir Sampling, where we need to select a random sample of size k (in this case, k=1) from a list of n items, where n is either a very large or unknown number.
Example
Input: A stream of numbers [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] and k = 3
Data Stream (n=10)
Reservoir (k=3)
A random sample of 3 items is selected from the stream.
Output: A random sample of 3 elements from the stream, e.g., [4, 7, 2]
Solution (Algorithm R)
Reservoir sampling is a family of randomized algorithms for choosing a simple random sample of k items from a population of unknown size n in a single pass over the items. The size of the population n is not known to the algorithm.
Algorithm Steps
- Create a reservoir array of size
k
. - Fill the reservoir with the first
k
items from the stream. - For each subsequent item
i
(fromk+1
ton
), generate a random integerj
from 0 toi
. - If
j
is less thank
, replace the element at indexj
in the reservoir with the itemi
. - After iterating through the entire stream, the reservoir contains a random sample of size
k
.
Data Stream
Reservoir (k=3)