Derivation of Welford's Algorithm

kylepena

Kyle Pena

Posted on October 28, 2024

Derivation of Welford's Algorithm

This will be somewhat out of context if you're coming here first. It's really a footnote in a much longer series of blog posts on summarizing data distributions in computing environments where storage is at a premium.

In that blog post, I wanted to explain how to derive Welford's Algorithm for the recurrence relation for the second central moment, and found the explanations I could find a little lacking (at least for me).

I found this helpful post to be a great starting point, but the algebra part of the post skipped over so many steps that I couldn't follow it.

So I worked it out in greater detail. I'm hoping this is helpful to other mere mortals who, like myself, couldn't quite connect the dots.

Notation:

  • y represents a new observation.
  • μ' is the updated mean (the mean after incorporating y).
  • M2M_2 is the second central moment - well, not quite. Technically you'd have to divide by N to get the central moment. But we'll call it the central moment here.
  • N is the number of observations including y.

First, some work on the mean update formula:

μ=μ+yμN\mu^\prime = \mu + \frac{y - \mu}{N}
Nμ=Nμ+yμN\mu^\prime = N\mu + y - \mu
(N1)μ+μ=(N1)μ+y(N-1)\mu^\prime + \mu^\prime = (N-1)\mu + y
(N1)μ(N1)μ=yμ(N-1)\mu^\prime - (N-1)\mu = y - \mu^\prime
(N1)μ(N1)μ=μy(N-1)\mu - (N-1)\mu^\prime = \mu^\prime - y

Now, the main derivation:

M2M2=1N(yiμ)21N1(yiμ)2M_2^\prime - M_2 = \sum_1^N{(y_i - \mu^\prime)^2} - \sum_1^{N-1}{(y_i - \mu)^2}
=(yμ)2+1N1(yiμ)2(yiμ)2= (y - \mu^\prime)^2 + \sum_1^{N-1}{(y_i - \mu^\prime)^2 - (y_i - \mu)^2}
=(yμ)2+1N1(yi22yiμ+μ2)(yi22yiμ+μ2)= (y - \mu^\prime)^2 + \sum_1^{N-1}{(y_i^2 - 2y_i\mu^\prime +\mu^{\prime 2}) - (y_i^2 - 2y_i\mu + \mu^2)}
=(yμ)2+1N12yiμ+2yiμ+(μ2μ2))= (y - \mu^\prime)^2 + \sum_1^{N-1}{-2y_i\mu^\prime + 2y_i\mu + (\mu^{\prime 2} - \mu^2))}
=(yμ)2+1N12yi(μμ)+(μμ)(μ+μ)= (y - \mu^\prime)^2 + \sum_1^{N-1}{-2y_i(\mu^\prime - \mu) + (\mu^\prime - \mu)(\mu^\prime + \mu)}
=(yμ)2+1N1(μμ)(2yi+μ+μ)= (y - \mu^\prime)^2 + \sum_1^{N-1}{(\mu^\prime - \mu)(-2y_i + \mu^\prime + \mu)}
=(yμ)2+(μμ)1N1(2yiμμ)= (y - \mu^\prime)^2 + (\mu - \mu^\prime) \sum_1^{N-1}{(2y_i - \mu^\prime - \mu)}
=(yμ)2+(μμ)[2(N1)μ(N1)μ(N1)μ]= (y - \mu^\prime)^2 + (\mu - \mu^\prime) [ 2(N-1)\mu - (N-1)\mu^\prime - (N-1)\mu ]
=(yμ)2+(μμ)[(N1)μ(N1)μ]= (y - \mu^\prime)^2 + (\mu - \mu^\prime) [ (N-1)\mu - (N-1)\mu^\prime ]

Substitute the mean update term we worked out earlier.

=(yμ)2+(μμ)(μy)= (y - \mu^\prime)^2 + (\mu - \mu^\prime) (\mu^\prime - y)
=(yμ)2+(μμ)(yμ)= (y - \mu^\prime)^2 + (\mu^\prime - \mu) (y - \mu^\prime)
=(yμ)(yμ+μμ)= (y - \mu^\prime)(y - \mu^\prime + \mu^\prime - \mu)
=(yμ)(yμ)= (y - \mu^\prime)(y - \mu)

This equals the difference M2M2M_2^\prime - M_2 , and so the complete recurrence relation for the second central moments is:

M2=M2+(yμ)(yμ)M_2^\prime = M_2 + (y - \mu^\prime)(y - \mu)

And thus the recurrence relation for the corrected sample variance based on the second central moment is:

σ2=1N1[M2+(yμ)(yμ)]\sigma^{\prime 2} = \frac{1}{N-1} [ M_2 + (y - \mu^\prime)(y - \mu) ]
💖 💪 🙅 🚩
kylepena
Kyle Pena

Posted on October 28, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

Derivation of Welford's Algorithm
statistics Derivation of Welford's Algorithm

October 28, 2024