Hide sidebar

Reservoir Sampling

Reservoir Sampling

RandomizedData Stream
MediumLeetCode #382
20 min

Problem Statement

Given a singly linked list, return a random node's value from the linked list. Each node must have the same probability of being chosen.

This is a classic application of Reservoir Sampling, where we need to select a random sample of size k (in this case, k=1) from a list of n items, where n is either a very large or unknown number.

Example

Example 1:

Input: A stream of numbers [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] and k = 3

Data Stream (n=10)

1
2
3
4
5
6
7
8
9
10

Reservoir (k=3)

?
?
?

A random sample of 3 items is selected from the stream.

Output: A random sample of 3 elements from the stream, e.g., [4, 7, 2]

Solution (Algorithm R)

Reservoir sampling is a family of randomized algorithms for choosing a simple random sample of k items from a population of unknown size n in a single pass over the items. The size of the population n is not known to the algorithm.

Algorithm Steps

  • Create a reservoir array of size k.
  • Fill the reservoir with the first k items from the stream.
  • For each subsequent item i (from k+1 to n), generate a random integer j from 0 to i.
  • If j is less than k, replace the element at index j in the reservoir with the item i.
  • After iterating through the entire stream, the reservoir contains a random sample of size k.

Data Stream

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

Reservoir (k=3)

Start with an empty reservoir of size k=3.
Reservoir Sampling Solution

import random

class Solution:
    def __init__(self, head: Optional[ListNode]):
        self.head = head

    def getRandom(self) -> int:
        scope = 1
        chosen_value = 0
        curr = self.head
        
        while curr:
            # decide whether to replace the chosen value with the current node's value
            if random.random() < 1 / scope:
                chosen_value = curr.val
            curr = curr.next
            scope += 1
        return chosen_value