Bayesian approach to analyzing goalies, Part 4: Regression to the Mean, Luongo vs Schneider, Thomas vs Rask, Hiller vs Fasth

Last time, in Part 3 of this series, we saw how our Bayesian estimate of Luongo's true ESSV% changes over time, i.e., as we get more information and use more data. See here for links to all posts in this series.


In this article, we'll show results for more goalies than just Luongo. I haven’t really heard much analysis of the goaltending situation in Vancouver (joke). I’ve felt burdened by this obvious need that the hockey world has (joke), so I thought I’d start with a little statistical analysis of Roberto Luongo and Cory Schneider (not meant to be a joke, though you may disagree after reading this). Really, I just want to use them as an example to help illustrate this approach to analyzing goalies.


One of the things we noticed in Part 3 is that there is some regression to the mean going on.  Luongo's Bayesian estimate was always between his observed ESSV% and the league average ESSV%.  Let's start with a figure that shows that this "regression to the mean" happens for all goalies. 
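A bit of algebra suggests why the estimate lands between those two numbers. This is only a sketch under an assumption that the series does not state explicitly: that the prior is a Beta distribution with parameters α and β (so the prior mean α/(α+β) plays the role of the league average), and that a goalie has made s saves on n shots. Under that assumption, the Bayesian estimate (posterior mean) can be written as a weighted average:

```latex
\hat{p} \;=\; \frac{\alpha + s}{\alpha + \beta + n}
\;=\; w \cdot \frac{s}{n} \;+\; (1 - w) \cdot \frac{\alpha}{\alpha + \beta},
\qquad
w \;=\; \frac{n}{\alpha + \beta + n}.
```

Since w is strictly between 0 and 1 whenever n > 0, the estimate sits between the observed ESSV% (s/n) and the prior mean. As n grows, w moves toward 1 and the estimate leans more on the goalie's own numbers.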
 

Bayesian approach to analyzing goalies, Part 3: Updating estimates with more data

In Part 1 of this series, we gave an introduction to the series, and the motivation behind this new method of analyzing goalies.  In Part 2, we started with an example of Luongo's first 10 shots to build some intuition for what estimates this method gives and the advantages it has over traditional SV% or even strength save percentage (ESSV%).  We got a picture like this:


After 10 shots, Luongo's curve (black) is still very similar to what our prior expectation was based on typical ESSV% in the NHL. Since he saved 10 out of his first 10 shots, his curve (and the corresponding dot) is shifted slightly to the right. 
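To see roughly how small that shift is, here is a minimal sketch of the update. It assumes a Beta prior, and the prior numbers (a league mean of .920 carrying the weight of 500 "pseudo-shots") are hypothetical values chosen for illustration, not the prior used to draw the curves in this series.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about a goalie's true ESSV%, modeled as a Beta distribution."""
    saves: float  # Beta "alpha": prior pseudo-saves plus observed saves
    goals: float  # Beta "beta": prior pseudo-goals plus observed goals

    def update(self, saves: int, goals: int) -> "BetaBelief":
        # Conjugate update: add the observed counts to the prior counts.
        return BetaBelief(self.saves + saves, self.goals + goals)

    @property
    def mean(self) -> float:
        # Mean of the Beta distribution, used here as the point estimate.
        return self.saves / (self.saves + self.goals)


# Hypothetical prior: 460 pseudo-saves and 40 pseudo-goals (mean .920).
prior = BetaBelief(saves=460.0, goals=40.0)

# Luongo's first 10 shots: 10 saves, 0 goals.
posterior = prior.update(saves=10, goals=0)

print(f"prior mean:     {prior.mean:.4f}")
print(f"posterior mean: {posterior.mean:.4f}")
```

With these made-up prior values, the estimate moves only from .9200 to about .9216, far from the observed 1.000. A weaker prior (fewer pseudo-shots) would let the 10 saves pull the estimate further to the right.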

In this article, we'll continue this example and discuss the animation that we showed at the end of Part 2.  The animation shows what Luongo's curve looks like after 20 shots, 30 shots, 40 shots, etc.:

Bayesian approach to analyzing goalies, Part 2: An example

In Part 1 of this series, we gave an introduction to this method of analyzing goalies and the motivation behind it. We discussed why many analysts use even strength save percentage (ESSV%) instead of traditional SV%, and talked about two problems with ESSV% that we'd like to deal with. This series focuses on Problem 1: ESSV% tends to be very inconsistent, especially for goalies who have faced a relatively small number of shots. Our approach is a Bayesian analysis, which lets us use "prior information".

Let's begin with a simple example.  Let's consider the first 10 shots that Roberto Luongo faced during the 2008-09 season.  None of these shots were goals.  If this is the only information we have about Luongo, what should our estimate of his true ESSV% be?  In other words, what should we expect his ESSV% to be going forward?  We knew a lot about Luongo at the beginning of the 2008-09 season, but for the sake of explaining what's going on, let's pretend we didn't.

One approach to answering this question is to use his observed ESSV%. In this case, his ESSV% is 1.000, since he had 10 saves on these 10 shots. This estimate is pretty high... 1.000 is way higher than what any typical NHL goalie would ever have. This is a pretty extreme example, since we are using only 10 shots, and it's sort of obvious that observed ESSV% should not be our estimate in this case. But it will help us explain what is going on, and this kind of thing can still happen when using 100, 500, or 1000 shots, though the observed ESSV% won't typically be as extreme.
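To put rough numbers on "won't typically be as extreme", the sketch below computes the binomial standard error of observed ESSV% at those shot counts. It assumes each shot is an independent save-or-goal trial and uses a hypothetical true ESSV% of .920; both are simplifications for illustration, not claims about any particular goalie.

```python
import math

# Hypothetical true ESSV%, chosen only to illustrate the arithmetic.
ASSUMED_TRUE_ESSV = 0.920


def essv_standard_error(true_essv: float, shots: int) -> float:
    """Standard error of observed save percentage over `shots` independent shots."""
    return math.sqrt(true_essv * (1.0 - true_essv) / shots)


for shots in (10, 100, 500, 1000):
    se = essv_standard_error(ASSUMED_TRUE_ESSV, shots)
    print(f"{shots:>5} shots: observed ESSV% typically off by about {se:.3f}")
```

Under these assumptions the typical error is around .086 after 10 shots and still around .009 after 1000 shots. The noise shrinks with the square root of the number of shots, so it takes a lot of data before observed ESSV% settles down.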

Instead of using observed ESSV%, we could use a Bayesian estimate.  Suppose we know that the league's ESSV% are typically distributed like this: