KYCNA

Gum Wrappers

Image: Dragossmeu, CC BY-SA 4.0, via Wikimedia Commons

When I was a kid, bubble gums had fancy wrappers such as sport car models. I don't know if they still have those kind of things. An unpleasant fact was that you could not see which car image you would get until you bought it. Here is the question:

Question:

A kid wants to collect all 50 different wrappers of a gum brand. What is the average number of gums he should chew? Assume that all wrappers have 1/50 probability to appear.

Answer:

We want the expected number of gums until collecting all different wrappers:

\(E[N]=E[N_{1}+N_{2}+...+N_{50}]=E[N_{1}+N_{2}+...+N_{50}]=E[N_{1}]+E[N_{2}]+...+E[N_{50}]\)

Where \(N_i\) is the number of gums bought between i-1'th and i'th wrappers.

Let's think step by step.

First distinct wrapper will be acquired with the first bubble gum.

\(E[N_{1}]=1\)

Once the kid have one wrapper the second gum will have either the same wrapper with probability of 1/50 or have a different wrapper with a probability of 49/50. If the second wrapper has the same image with the first one, then the same probabilites are valid for the new gum. Thus, the process is the same with the new gum until it has a different wrapper. It can be written as a recursive expectation.

\(E[N_{2}] = \frac{1}{50}(E[N_{2}] +1) + \frac{49}{50}(1)\)

\(E[N_{2}] = \frac{50}{49}\)

The same logic applies for the third distinct gum wrapper with probabilities of 2/50 and 48/50.

\(E[N_{3}] = \frac{2}{50}(E[N_{3}] +1) + \frac{48}{50}(1)\)

\(E[N_{2}] = \frac{50}{48}\)

I think the pattern is obvious now.

\(E[N] = 1+\frac{50}{49}+\frac{50}{48}+...+\frac{50}{1}=224.96\)

If you know the geometric distribution, another method is also available. Geometric distribution describes a process which ends with the first succesful outcome among the attempts. If probability of success is p, the distribution can be described as \(f(k)=p(1-p)^{k-1}\). We can easily formulate the expected value of this distribution:

\(E[k]=\sum_{1}^{\infty}kf(k) =\sum_{1}^{\infty}kp(1-p)^{k-1}\)

\(=p\sum_{1}^{\infty}k(1-p)^{k-1}\)

\(=-p\sum_{1}^{\infty}\frac{\partial(1-p)^{k}}{\partial p}\)

\(=-p\frac{\partial \sum_{1}^{\infty}(1-p)^{k}}{\partial p}\)

\(=-p\frac{\partial \frac{1-p}{p}}{\partial p}\)

\(=-p\left( \frac{-p-1+p}{p^2} \right)\)

\(=\frac{1}{p}\)

For the second distinct wrapper, we have a geometric distribution with p = 49/50. Thus, the expected number of gums chewed is 50/49 to get the second distinct wrapper. Once we have two different wrappers, same probability becomes 48/50 for the third one. Thus it takes 50/48 to find new wrapper.

The R code below can be used to simulate the problem.

> n_items <- 50
> ns <- c()
> for (i in 1:1000)
+ {
+ n <- 0
+ remained <- 1:n_items
+ while (length(remained)>0) {
+ remained <- remained[!(remained %in% sample(1:n_items,1))]
+ n <- n + 1
+ }
+ ns <- c(ns,n)
+ }
> mean(ns)
[1] 225.903

statistics probability

K-Means Clustering Simulation

Tags machine-learning

Mean Squared Error

Tags statistics data-analysis data-science machine-learning probability

Gum Wrappers

Tags statistics probability