Bayesian approach to analyzing goalies, Part 1

Save percentage (SV%) is a statistic that is commonly used to analyze goalies.  Analysts have started using even strength save percentage (ESSV%) instead, for various reasons. One reason is that there isn't much data for special teams, so it's hard to draw any conclusions about performance or ability on the penalty kill.

Another is that a goalie doesn't control how many special teams situations he has to face, and so doesn't control how many special teams shots he has to face.  If a goalie's team takes a lot of (non-coincidental) minor penalties, that goalie will face a lot of shots on the penalty kill.  Those shots are tougher to stop, and this will tend to drag down the goalie's SV%.  Even if he is one of the league's top goalies, his SV% may not reflect that because he is facing so many more shorthanded situations than the other top goalies in the league.  ESSV% avoids all this.
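
To put rough numbers on that effect, here is a small sketch. The shot counts and the two save percentages (.925 at even strength, .870 on the penalty kill) are made up for illustration and are not taken from any real goalie. The function name `overall_sv` is also just something I chose for this example.

```python
def overall_sv(es_shots, pk_shots, es_sv=0.925, pk_sv=0.870):
    """Overall SV% for a goalie with the given even strength and penalty kill workloads.

    es_sv and pk_sv are hypothetical save percentages, used only for illustration.
    """
    saves = es_shots * es_sv + pk_shots * pk_sv
    return saves / (es_shots + pk_shots)

# Two goalies with identical ability in each situation (hypothetical workloads).
disciplined_team = overall_sv(es_shots=1500, pk_shots=200)
penalized_team = overall_sv(es_shots=1500, pk_shots=400)

print(f"Disciplined team: {disciplined_team:.3f}")
print(f"Penalized team:   {penalized_team:.3f}")
```

Both goalies stop even strength shots and penalty kill shots at exactly the same rates, but the one behind the more penalized team ends up with an overall SV% about five points lower (roughly .913 versus .919), purely because of the mix of shots he faces.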

There are still a couple of issues with ESSV%, though.

Problem 1: ESSV% still tends to be very inconsistent, especially for goalies who have faced a relatively small number of shots. This has been noted by several analysts. The top three results in a Google search for “save percentage regression to the mean hockey” are a good sample of articles that discuss this inconsistency, and why ignoring it can lead an analyst to draw questionable conclusions about the ability of a goalie or about the other players on a goalie’s team.
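
One way to see why small samples are so noisy is to look at the standard error of a proportion. The sketch below assumes a simple binomial model, where every shot is independent and has the same chance of being saved. That is a simplification (it is exactly the assumption Problem 2 questions), and the "true" ESSV% of .920 is a hypothetical value chosen for illustration.

```python
import math

def sv_standard_error(true_sv, shots):
    """Standard error of an observed save percentage under a simple binomial model."""
    return math.sqrt(true_sv * (1 - true_sv) / shots)

TRUE_SV = 0.920  # hypothetical "true" even strength save percentage

for shots in (100, 500, 2000):
    se = sv_standard_error(TRUE_SV, shots)
    low, high = TRUE_SV - 2 * se, TRUE_SV + 2 * se
    print(f"{shots:>5} shots: SE = {se:.4f}, rough 95% range = {low:.3f} to {high:.3f}")
```

Under these assumptions, a goalie's observed ESSV% after 100 shots can easily land several points away from his true level in either direction, while after 2000 shots the plausible range is much narrower.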

Problem 2: When we use ESSV%, we treat every shot the same.  Basically, we assume that all shots are created equal. In other words, every shot has an equal probability of being a goal.  A shot from 5 feet is the same as a shot from 50 feet. A shot from the center of the ice is the same as a shot from a wide angle.

Intuitively, this isn't true.  There is also statistical evidence that this isn't true.  Various analysts have shown that things like distance and angle, or alternatively, shot location, are related to the probability that a shot will be a goal. For example, here is goal percentage by distance:

Goal percentage decreases as distance increases. 

In this series we'll focus on Problem 1.  This work is based on joint work with Nick Clark.  We'll save Problem 2 and discussions about shot quality for another series of articles. That will be based on a project that one of my students, Calla Glavin, is working on.  The figure above is courtesy Calla. 

We note that Problem 1 was also discussed in the BUF and OTT team write-ups in the new 2013-14 Hockey Prospectus book.  Information about purchasing that book can be found here.  In this series, we'll go into a little more detail about how the method works and what kind of results we get.  We'll also have additional figures and an animation.

We'll also analyze the goalie situations of some NHL teams.  We won't do BUF and OTT again; see the book for those.  We'll focus on a couple of other teams, and also focus on a couple of individual players.

A Bayesian approach 

If we know ESSV% is fairly inconsistent, and we know that extremely high or low save percentages typically regress heavily to the league's mean ESSV%, perhaps we could develop a metric that is more conservative and less subject to large fluctuations.  One approach is to use a Bayesian statistical analysis.  In this kind of analysis, we still use data about the goalie's saves, and the number of shots he's faced.  But we also use some "prior information".  More specifically, we can use information about the known distribution of save percentages in the NHL.  For example, if we know that save percentages are typically distributed something like this, we can use that information:

Almost everyone is between .900 and .940, and most are between .910 and .930.  If we use this prior information in an appropriate way, a goalie whose save percentage is far away from this region will get pulled in towards the league average. The extent to which he gets pulled towards the league average will depend on how many shots he has faced.  Goalies with lots of shots won't get pulled towards the mean as much as goalies with fewer shots.
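
Here is a minimal sketch of that pulling effect. It is not necessarily the exact model used in this series. It assumes a Beta prior combined with binomial data, which is one common way to do this kind of analysis. The league mean of .920 and the prior strength of 1000 "shots" are my own illustrative choices. They give a prior with a standard deviation of about .0086, so roughly .903 to .937 within two standard deviations, which is in the same ballpark as the ranges described above.

```python
LEAGUE_MEAN = 0.920   # assumed league-average ESSV% (illustrative)
PRIOR_SHOTS = 1000    # assumed prior strength, in "pseudo-shots" (illustrative)

# Beta prior parameters: pseudo-saves and pseudo-goals.
PRIOR_SAVES = LEAGUE_MEAN * PRIOR_SHOTS
PRIOR_GOALS = PRIOR_SHOTS - PRIOR_SAVES

def shrunken_sv(saves, shots):
    """Posterior mean save percentage under a Beta prior and binomial data."""
    return (PRIOR_SAVES + saves) / (PRIOR_SHOTS + shots)

# Two hypothetical goalies, both with an observed ESSV% of .950.
small_sample = shrunken_sv(saves=95, shots=100)
large_sample = shrunken_sv(saves=1900, shots=2000)

print(f"100 shots at .950  -> estimate {small_sample:.3f}")
print(f"2000 shots at .950 -> estimate {large_sample:.3f}")
```

With these assumed numbers, the goalie with only 100 shots gets pulled almost all the way back to the league average (about .923), while the goalie with 2000 shots keeps most of his edge (.940). Choosing a different prior strength would change how hard the pull is, but not the basic pattern.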

That's just the big picture for now.  Next time, we'll use Roberto Luongo as an example to illustrate what kinds of results our Bayesian approach gives.  And we'll talk more about this "regression to the mean" and show some figures illustrating this effect.

And since I can't help myself, I'll give a teaser.  We'll also discuss this animation later in the series:

(If the animation isn't working, try refreshing your browser.)

Links to other parts:
Part 1 - Introduction
Part 2 - An example using only 10 shots
Part 3 - Updating estimates with more information
Part 4 - Regression to the mean, Luongo vs Schneider, Thomas vs Rask, Hiller vs Fasth
