After briefly looking around for a rainbow in the sky, and after overcoming the minor disappointment of not seeing one, I noticed that the drizzle was becoming more and more like a steady rain:

# Greater Than Plus Minus

### A real life application of Venn Diagrams, or How I spent my Sunday afternoon.

### NHL Heat map

@StatsbyLopez posted a photo on Twitter from a recent presentation where I showed an NHL heat map. So I thought I'd give a quick explanation of said heat map.

First, here's the heat map: it's for all 5-on-5 shots in the NHL:

The size of each rectangle represents the number of shots taken at that location (bigger rectangle = more shots). The color represents the percentage of shots that were goals (red = high percentage, blue = low percentage).
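To make those two quantities concrete, here's a minimal sketch of how shot counts and goal percentages could be tallied by location. The square grid, the `cell_size` parameter, the `summarize_shots` name, and the sample shots are all assumptions for illustration; the rectangles in the actual heat map may well be defined differently.

```python
from collections import defaultdict


def summarize_shots(shots, cell_size=10.0):
    """Group shots into square cells and return {cell: (shot_count, goal_pct)}.

    shot_count would drive rectangle size, goal_pct would drive color.
    """
    counts = defaultdict(int)
    goals = defaultdict(int)
    for x, y, is_goal in shots:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
        goals[cell] += int(is_goal)
    return {cell: (n, goals[cell] / n) for cell, n in counts.items()}


# Hypothetical 5-on-5 shots as (x, y, is_goal), coordinates in feet.
shots = [
    (5, 2, True), (7, 4, False), (8, 1, False), (3, 9, True),  # cell (0, 0)
    (45, 30, False), (48, 33, False),                          # cell (4, 3)
]

summary = summarize_shots(shots)
for cell, (n, pct) in sorted(summary.items()):
    print(cell, n, f"{pct:.3f}")
```

In this made-up sample, the cell near the origin would get the bigger, redder rectangle: more shots, and a higher share of them were goals.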

### Bayesian approach to analyzing goalies, Part 4: Regression to the Mean, Luongo vs Schneider, Thomas vs Rask, Hiller vs Fasth

Last time, in Part 3 of this series, we saw what our Bayesian estimate for Luongo's true ESSV% looks like over time; i.e. when we get more and more information and use more and more data. See here for links to all posts in this series.

One of the things we noticed in Part 3 is that there is some regression to the mean going on. Luongo's Bayesian estimate was always between his observed ESSV% and the league average ESSV%. Let's start with a figure that shows that this "regression to the mean" happens for all goalies.

In this article, we'll show results for more goalies than just Luongo. I haven’t really heard much analysis of the goaltending situation in Vancouver (joke). I’ve felt burdened by this obvious need that the hockey world has (joke), so I thought I’d start with a little statistical analysis of Roberto Luongo and Cory Schneider (not meant to be a joke, though you may disagree after reading this). Really, I just want to use them as an example to help illustrate this approach to analyzing goalies.
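For a rough sense of why a Bayesian estimate lands between the observed ESSV% and the league average, here's a minimal sketch. It assumes a Beta prior, which is a standard choice for this kind of problem but not necessarily the one used in this series, and `LEAGUE_AVG`, `PRIOR_SHOTS`, and both goalies are hypothetical.

```python
# Illustrative prior only (an assumption, not fitted values from the series):
# it behaves like 500 earlier shots saved at the league-average rate.
LEAGUE_AVG = 0.920
PRIOR_SHOTS = 500


def bayes_estimate(saves, shots):
    """Posterior mean of true ESSV% under a Beta prior (conjugate update)."""
    prior_saves = LEAGUE_AVG * PRIOR_SHOTS
    return (prior_saves + saves) / (PRIOR_SHOTS + shots)


def weight_on_observed(shots):
    """Share of the estimate that comes from the goalie's own shots."""
    return shots / (PRIOR_SHOTS + shots)


# Two hypothetical goalies with the same observed ESSV% of .940.
small = bayes_estimate(saves=188, shots=200)
large = bayes_estimate(saves=1880, shots=2000)

print(f"200 shots:  estimate {small:.3f}, weight {weight_on_observed(200):.2f}")
print(f"2000 shots: estimate {large:.3f}, weight {weight_on_observed(2000):.2f}")
```

Under these assumptions the estimate is a weighted average of the observed ESSV% and the league average, so the goalie with fewer shots gets pulled further toward the mean.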

### Bayesian approach to analyzing goalies, Part 3: Updating estimates with more data

In Part 1 of this series, we gave an introduction to the series, and the motivation behind this new method of analyzing goalies. In Part 2, we started with an example of Luongo's first 10 shots to build some intuition for what estimates this method gives and the advantages it has over traditional SV% or even strength save percentage (ESSV%). We got a picture like this:

After 10 shots, Luongo's curve (black) is still very similar to what our prior expectation was based on typical ESSV% in the NHL. Since he saved 10 out of his first 10 shots, his curve (and the corresponding dot) is shifted slightly to the right.

In this article, we'll continue this example and discuss the animation that we showed at the end of Part 2. The animation shows what Luongo's curve looks like after 20 shots, 30 shots, 40 shots, etc.:
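In the same spirit as the animation, here's a minimal sketch of how an estimate could be tracked as shots accumulate. The Beta prior (`PRIOR_SAVES`, `PRIOR_GOALS`) is an assumption, and only the first batch (10 saves on 10 shots) comes from the example; the later batches are made up.

```python
import math

# Hypothetical Beta prior with mean .920 (an assumption for illustration).
PRIOR_SAVES, PRIOR_GOALS = 460.0, 40.0


def posterior_summary(saves, goals):
    """Mean and standard deviation of the Beta posterior after the given totals."""
    a = PRIOR_SAVES + saves
    b = PRIOR_GOALS + goals
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


# Saves in successive 10-shot batches. The first entry matches the example
# in the text; the remaining entries are hypothetical.
batches = [10, 9, 9, 9, 9, 10]

saves = goals = 0
history = []
for batch_saves in batches:
    saves += batch_saves
    goals += 10 - batch_saves
    history.append((saves + goals, *posterior_summary(saves, goals)))

for shots, mean, sd in history:
    print(f"{shots:3d} shots: estimate {mean:.4f}, sd {sd:.4f}")
```

The mean plays the role of the dot and the standard deviation is a crude stand-in for the width of the curve. With this sample the curve barely moves from batch to batch and ends up a little narrower than where it started.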

### Bayesian approach to analyzing goalies, Part 2: An example

In Part 1 of this series, we gave an introduction to this method of analyzing goalies and the motivation behind it. We discussed why many analysts use even strength save percentage (ESSV%) instead of traditional SV%, and talked about two problems that ESSV% has which we'd like to deal with. This series focuses on Problem 1: ESSV% tends to be very inconsistent especially for goalies who have faced a relatively small number of shots. Our approach is a Bayesian analysis, where we can use "prior information".

Let's begin with a simple example. Let's consider the first 10 shots that Roberto Luongo faced during the 2008-09 season. None of these shots were goals. If this is the only information we have about Luongo, what should our estimate of his true ESSV% be? In other words, what should we expect his ESSV% to be going forward? We knew a lot about Luongo at the beginning of the 2008-09 season, but for the sake of explaining what's going on, let's pretend we didn't.

One approach to answering this question is to use his observed ESSV%. In this case, his ESSV% is 1.000 since he had 10 saves on these 10 shots. This estimate is pretty high... 1.000 is way higher than what any typical NHL goalie would ever have. This is a pretty extreme example, since we are using only 10 shots, and it's sort of obvious that observed ESSV% should not be our estimate in this case. But it will help us explain what is going on, and this kind of thing can still happen when using 100, 500, or 1000 shots, though the observed ESSV% won't typically be as extreme.

Instead of using observed ESSV%, we could use a Bayesian estimate. Suppose we know that the league's ESSV% are typically distributed like this:
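As a rough illustration of what such an estimate could look like for the 10-shot example, here's a minimal sketch that assumes the league distribution can be approximated by a Beta distribution. The `PRIOR` values (mean .920, worth 500 shots) are hypothetical and not taken from the series.

```python
def beta_posterior(prior_saves, prior_goals, saves, goals):
    """Conjugate Beta-Binomial update: add observed saves and goals
    to the prior's pseudo-counts."""
    return prior_saves + saves, prior_goals + goals


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior standing in for the league's ESSV% distribution.
PRIOR = (460.0, 40.0)

# The first 10 shots from the example: 10 saves, 0 goals.
a, b = beta_posterior(*PRIOR, saves=10, goals=0)

observed = 10 / 10
estimate = beta_mean(a, b)
print(f"observed {observed:.3f}, prior {beta_mean(*PRIOR):.3f}, estimate {estimate:.3f}")
```

With these assumed numbers the estimate comes out around .922: a touch above the prior mean of .920, and nowhere near the observed 1.000.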

### Bayesian approach to analyzing goalies, Part 1

Save percentage (SV%) is a statistic that is commonly used to analyze goalies. Analysts have started using even strength save percentage (ESSV%) instead, for various reasons. One reason is that there isn't much data for special teams, so it's hard to draw any conclusions about performance or ability on the penalty kill.

Another is that a goalie doesn't control how many special teams situations he has to face, and so doesn't control how many special teams shots he has to face. If a goalie's team takes a lot of (non-coincidental) minor penalties, that goalie will face a lot of shots on the penalty kill. Those shots are tougher to stop, and this will tend to drag down the goalie's SV%. Even if he is one of the league's top goalies, his SV% may not reflect that because he is facing so many more short handed situations than the other top goalies in the league. ESSV% avoids all this.
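Here's a minimal sketch of that effect with made-up numbers. Both goalies are given identical save rates at even strength and on the penalty kill; only the number of penalty kill shots differs. The rates, the shot counts, and the `overall_sv` helper are hypothetical, and shots faced while on the power play are ignored to keep things simple.

```python
def overall_sv(es_shots, pk_shots, es_rate, pk_rate):
    """Overall SV% as a shot-weighted blend of even strength (ES)
    and penalty kill (PK) save rates."""
    saves = es_shots * es_rate + pk_shots * pk_rate
    return saves / (es_shots + pk_shots)


# Hypothetical goalies of identical ability: .925 at ES, .870 on the PK.
disciplined = overall_sv(es_shots=1500, pk_shots=200, es_rate=0.925, pk_rate=0.870)
penalized = overall_sv(es_shots=1500, pk_shots=500, es_rate=0.925, pk_rate=0.870)

print(f"team takes few penalties:  SV% {disciplined:.3f}")
print(f"team takes many penalties: SV% {penalized:.3f}")
print("ESSV% for both: 0.925")
```

The goalie behind the more penalized team ends up with the lower SV% even though the two were given exactly the same ability, while their ESSV% is identical.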

There are still a couple issues with ESSV% though.

**Problem 1**: ESSV% still tends to be very inconsistent, especially for goalies who have faced a relatively small number of shots. This has been noted by several analysts. The top three results in a Google search for “save percentage regression to the mean hockey” are a good sample of articles that discuss these issues, and why ignoring these issues can lead an analyst to draw questionable conclusions about the ability of a goalie or about the other players on a goalie’s team.

### More places to buy 2013-14 Hockey Prospectus

See here for links to various places to buy the book. There are .pdf versions and paperback versions.

As noted at that link, the cheapest paperback is available from CreateSpace using the discount code **7B8F3CZ7**. It is $22.95 after discount.

The .pdf version is $12.95.

### Hockey Prospectus 2013-14 now available

Hockey Prospectus 2013-14 is now available for purchase. See here for details on how to make said purchase.

For those interested, I contributed to the BUF, OTT, and WPG pages. For WPG, I wrote about stuff related to realignment, and how WPG's travel will be affected. For BUF and OTT, I analyzed their goalie situations using a new technique.

I'll write more about those techniques in a forthcoming post, but the basic idea is that the approach automagically regresses a goalie's estimated true SV% towards the league average, especially for those goalies who have faced relatively few shots.

### PLAY - More analysis and top 10 PLAYers

In Parts 1 thru 3, we developed a new playmaking metric that is (1) more consistent than assists and (2) better than assists at predicting future assists. @**spamventura** had an observation about the results that we'll study a little further in this article. Along the way, we'll give the top 10 PLAYers (capitalization and pun intended) from 2009-10, and how they did in 2010-11.

### PLAY: A Playmaking Metric, Part 3 - Results

In Part 2 of this series on a new playmaking metric PLAY, we talked about "altruistic contribution" based on shots. It is basically the difference in shots taken by a player’s *teammates* (excluding the player’s own shots) when he is on the ice versus off the ice. It’s kind of like a shot-based WOWY that doesn’t include the player's own shots. If a player's *teammates* take more shots when he is on the ice versus off the ice, his altruistic contribution will be high.
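Here's a minimal sketch of that on-ice versus off-ice comparison. Expressing shots as a rate per 60 minutes is an assumption made here so that the two situations are comparable; the exact definition used in Part 2 may differ, and the function names and all of the numbers are hypothetical.

```python
def shots_per_60(shots, minutes):
    """Shot rate per 60 minutes of ice time."""
    return 60.0 * shots / minutes


def altruistic_contribution(on_team_shots, on_own_shots, on_minutes,
                            off_team_shots, off_minutes):
    """Teammates' shot rate with the player on the ice (his own shots
    removed) minus the team's shot rate with him off the ice."""
    teammates_on = on_team_shots - on_own_shots
    return (shots_per_60(teammates_on, on_minutes)
            - shots_per_60(off_team_shots, off_minutes))


# Hypothetical season totals for one player.
contribution = altruistic_contribution(
    on_team_shots=700,    # all team shots while he is on the ice
    on_own_shots=150,     # his own shots, excluded from the comparison
    on_minutes=1200,
    off_team_shots=1500,  # team shots while he is off the ice
    off_minutes=3600,
)
print(f"altruistic contribution: {contribution:+.1f} shots per 60")
```

A positive number means his teammates shoot more often when he is on the ice than the team does without him, which is the pattern we'd expect from a good playmaker.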

Now we'll use this altruistic contribution to build our playmaking metric PLAY, and show that PLAY is better than assists in two quantifiable ways: (1) it is more consistent than assists, and (2) it is better than assists at predicting future assists.
