In the March 8th issue of ESPN The Magazine is an article by John Hollinger on the subject of plus/minus. In “Fuzzy Math: Plus/Minus Tell a Story, Though Not the Whole One”, Hollinger details the problems with the latest addition to the standard box score. Unfortunately, I was unable to find an on-line version of the article, so let me try to summarize the issues Hollinger raises.
- The first critique comes from Dean Oliver (author of Basketball on Paper and currently the statistical analyst for the Denver Nuggets). Oliver is quoted as saying, “It’s (the plus/minus measure) noisy, uncertain and kind of a black box – you have a hard time understanding why it’s coming out the way it is.”
- Hollinger also notes the plus/minus stat doesn’t help us compare players across teams.
- On a related point, the stat also doesn’t take into account substitution patterns.
Much of what Hollinger says in this article was originally stated in an article he first posted at ESPN.com in 2005 (insider access required). That article also noted that a player’s teammates impacted his plus/minus.
Fans of this approach, though, might argue that all that’s needed is adjusted plus/minus. This approach – originally developed by Wayne Winston and Jeff Sagarin – employs regression analysis to control for a player’s teammates. Theoretically, adjusted plus/minus should answer Hollinger’s critiques (though not, as I will note in a moment, Oliver’s criticism).
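For readers unfamiliar with the mechanics, here is a minimal sketch of the kind of stint-level regression that adjusted plus/minus rests on. The four players and the stint data below are entirely made up for illustration; real APM models use thousands of stints and every player in the league, and the exact specification varies by analyst.

```python
import numpy as np

# Toy stint data (entirely made up): each row is one stint.
# Columns = players; +1 if on the court for the home side, -1 if on the
# court for the away side, 0 if on the bench.
players = ["A", "B", "C", "D"]
X = np.array([
    [ 1,  1, -1, -1],
    [ 1, -1,  1, -1],
    [-1,  1,  1, -1],
    [ 1, -1, -1,  1],
    [-1, -1,  1,  1],
], dtype=float)
# Point margin per 100 possessions for each stint (also made up).
y = np.array([6.0, -2.0, 3.0, 1.0, -4.0])

# Ordinary least squares: each coefficient is a player's estimated impact
# on the margin, controlling for who else was on the floor with him.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, b in zip(players, beta):
    print(f"{name}: {b:+.2f}")
```

Because every coefficient is estimated jointly from who-was-on-the-floor overlaps, the output depends on the whole pattern of stints, which is part of why the estimates behave like a “black box.”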
When we look at the adjusted plus/minus numbers, though, it doesn’t look like Hollinger’s issues have gone away. Consider the case of Darius Songaila.
Last season Songaila posted a -0.076 WP48 [Wins Produced per 48 minutes] with the Washington Wizards. This result is not unusual. Songaila posted WP48 marks in the negative range in 2005-06, 2006-07, and 2007-08 (he posted a 0.056 mark – his career best – as a rookie at the age of 25 in 2003-04).
Adjusted plus/minus, though, told a very different story. According to basketballvalue.com, Songaila was the third best player on the Washington Wizards in 2008-09. Again, adjusted plus/minus is supposed to control for a player’s teammates. So when the New Orleans Hornets acquired Songaila, they probably expected to see a positive adjusted plus/minus as well. But currently Songaila is posting the lowest adjusted plus/minus in New Orleans. So with completely different teammates, Songaila – according to adjusted plus/minus – is a very different player.
This is odd, though, since Songaila’s WP48 with the Hornets is still in the negative range. In other words, his box score numbers are not very different despite the fact that his teammates have all changed.
This result is not just confined to Songaila. JC Bradbury and I – in a forthcoming article in the Journal of Sports Economics — report that only 7% of a player’s adjusted plus/minus is explained by what a player did the previous season (oddly enough, unadjusted plus/minus has a stronger – albeit still relatively weak – correlation). In other words, the correlation coefficient for adjusted plus/minus from season-to-season is below 0.30. And when we look at players who switch teams – as Songaila did – we fail to find a statistically significant relationship. In contrast, any measure (PER, Wages of Wins measures, NBA Efficiency, Win Shares, etc.) based on the box score will have a correlation coefficient of at least 0.65, and often these marks are above 0.80. And that correlation remains strong even when a player changes teams.
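To see how the 7% figure and the 0.30 correlation relate, recall that explained variance is the square of the correlation coefficient. The only inputs below are the numbers quoted above; the arithmetic simply connects them.

```python
import math

# Explained variance (R^2) vs. correlation (r): r = sqrt(R^2).
apm_r2 = 0.07                        # adjusted plus/minus, season to season
box_r_low, box_r_high = 0.65, 0.80   # typical box-score correlations

apm_r = math.sqrt(apm_r2)
print(f"APM year-to-year correlation: r = {apm_r:.2f}")   # about 0.26, below 0.30
print(f"Box-score explained variance: {box_r_low**2:.0%} to {box_r_high**2:.0%}")
```

So a box-score measure with r = 0.65 explains about 42% of next season’s value, roughly six times the explanatory power of adjusted plus/minus.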
What does this mean for decision-makers? Decisions are about the future. Unfortunately – because plus/minus is so inconsistent across time — it doesn’t appear this measure can be relied upon to make decisions about the future.
It’s important to note that inconsistency is not the only problem with this measure. The standard errors associated with this measure – even when multiple years are added – tend to be so large that for many players the results are statistically insignificant (Bradbury and I make this point in our article as well).
Even if the problems of inconsistency and the standard errors could be solved, the critique from Dean remains. As Dean notes, this measure is essentially a “black box.” A decision-maker has no idea why a specific result is obtained. So it’s hard to know what the results mean.
One can state this last critique as follows: what plus/minus can show is a correlation. When a specific player is on the court, a team tends to do well or poorly. But it doesn’t show causation. And therefore it’s hard for a decision-maker to know what this really means.
Of course, all of this doesn’t stop decision-makers from using this information. And as Avery Johnson details, the Golden State Warriors’ upset of the Dallas Mavericks in 2007 can be partially attributed to Johnson following the dictates of plus/minus analysis.
Let me close with three more observations.
- One senses that people might be able to tell a story about why Songaila’s plus/minus numbers have changed. Such stories, though, are also a problem. Analysis should begin with a story, and then that story should be tested. We should try to avoid looking at a test result and then making up a story to fit it.
- We should note that adjusted plus/minus analysts have fully acknowledged Dean’s observation that this measure has “noise.” Unfortunately, when specific players are analyzed this observation seems to vanish. In other words, we never seem to see someone argue that a player’s current adjusted plus/minus is just “noise.” But if there is “noise” in the model, some of these results have to also be “noise.”
- The fact that some teams have turned to such measures confirms what has been argued about traditional approaches to player evaluation. Teams are turning to these measures because the traditional approaches do not appear to work.
So although adjusted plus/minus has problems, it is understandable that teams are turning to this measure. One suspects, though, that the problems – detailed by Hollinger, Oliver, Bradbury, and I — are simply not well understood by everyone.
– DJ
The WoW Journal Comments Policy
Our research on the NBA was summarized HERE.
The Technical Notes at wagesofwins.com provide substantially more information on the published research behind Wins Produced and Win Score.
Wins Produced, Win Score, and PAWSmin are also discussed in the following posts:
Simple Models of Player Performance
What Wins Produced Says and What It Does Not Say
Introducing PAWSmin — and a Defense of Box Score Statistics
Finally, A Guide to Evaluating Models contains useful hints on how to interpret and evaluate statistical models.
Jason
March 8, 2010
Hey Dr Berri,
If you are an Insider and have magazine access, you can get to the latest version digitally at this link: http://insider.espn.go.com/insider/espnmag
Italian Stallion
March 8, 2010
I don’t disagree with anything you are saying, but I think you can find flip-side examples of adjusted +/- picking up on attributes that the box score did not, even though they were apparent to all impartial observers.
I watch every Knicks game (sometimes twice).
Jared Jeffries is not a particularly strong player from a box score perspective, but he was far and away the Knicks best defender this year and one of the teams more valuable players.
He did many things that didn’t show up in the boxscore.
He took a lot of charges, tipped rebounds when he couldn’t secure them (which kept them alive and sometimes led to a rebound by another Knick), played terrific help defense, guarded multiple positions, deflected balls, etc.
It has also been obvious from observing the Knicks this season that David Lee is wildly overmatched defensively against a lot of the bigger centers and can’t keep up with the more athletic players of a similar size. He also doesn’t do much when one of the perimeter defenders gets beat.
Jeffries often covered for Lee’s poor defensive ability and held the defense together as best as possible.
There were endless games where, as soon as he came out, the team fell apart even though he was only contributing marginally from a boxscore perspective.
Adjusted +/- has tended to reflect what has been obvious to all Knicks fans this year about both Lee and Jeffries.
Lee’s defensive liabilities as a C are gargantuan and Jeffries held the defense together.
That became even more obvious when Jeffries was traded.
Whether Jeffries has similar value to Houston is highly debatable because that team has “different needs”. I think the interaction of players and team balance is something that no stats that I have seen to date can capture.
While I favor boxscore stats over adjusted +/- and especially favor your model, I still find it silly to dismiss seemingly obvious weaknesses in the current boxscore stats (however marginal). So an analyst might as well look at other tools that have some demonstrable value at capturing what the boxscore misses.
Crow
March 8, 2010
If you look at Songaila’s 2-year Adjusted plus/minus as reported at basketballvalue for the 2008-09 season in Washington (based on the 07-08 and 08-09 seasons) and his 2-year number for the 09-10 season in New Orleans (based on the 08-09 and 09-10 seasons), the amount of change in his Adjusted +/- is only about one third of what the 1-year Adjusted +/- comparison shows: a shift from around +3 to a bit below zero, rather than from about +2 to -6.
If you look at 1-year Regularized APM at hoopnumbers.com, Songaila’s Adjusted estimate only moved from +0.3 to -1.3.
If you want some ideas to try to help understand “why it’s coming out the way it is,” you can look at the 4 Factor breakout at hoopnumbers.com.
Crow
March 8, 2010
The standard errors for Songaila’s 2 year Adjusted +/- at basketballvalue are about 3 points. The standard errors for Songaila’s 1 year Adjusted +/- at hoopnumbers are also a fairly modest 2-3 points.
The standard errors for Hoopnumbers multi-year estimates tend to be about 1.5 points. Using Hoopnumbers multi-year estimates as calculated after 08-09 and as calculated just recently, Songaila’s Adjusted +/- also only moved about 1 point.
dberri
March 8, 2010
Crow,
Standard errors are evaluated relative to the corresponding coefficient. These are not evaluated in absolute terms. So I do not know what you mean when you say a standard error of 2-3 points is “fairly modest”.
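To make the point concrete: significance depends on the ratio of coefficient to standard error, not on the standard error alone. The numbers below are hypothetical, chosen to sit near the ranges mentioned in this thread.

```python
# A standard error only means something next to its coefficient.
def is_significant(coef, se, t_crit=1.96):
    """Rough two-tailed test at the 5% level: |coef / se| must exceed ~1.96."""
    return abs(coef / se) > t_crit

# A +2.0 APM estimate with a 3.0-point standard error: t = 0.67, not significant.
print(is_significant(2.0, 3.0))   # False
# The same 3.0-point standard error on a +10.0 estimate: t = 3.33, significant.
print(is_significant(10.0, 3.0))  # True
```

So a “modest” 2-3 point standard error swamps the typical player’s estimate, which sits only a few points from zero; only the extreme estimates clear the bar.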
I would also add that you can’t assess consistency by looking at numbers drawn from 2007-08/2008-09 and then 2008-09/2009-10. You are comparing numbers drawn partially from the same source.
IS,
Your analysis of APM ignores the substantial issue with “noise.” How is a decision-maker supposed to know if a specific result means something or is just “noise” from the model? If you cannot answer that question, then I don’t see how this model really helps.
What I think happens is people look at the results from this model, and then look at the player. Once you have that result in your mind, you start to confirm the model by watching how the player plays. Obviously this is not the way to do analysis. Statistical tests are used to test the hypothesis. We don’t do the test, and then start forming the hypothesis.
Crow
March 8, 2010
If you “can’t” or folks don’t want to look at 2-year numbers drawn from 2007-08/2008-09 and then 2008-09/2009-10, then it is still possible to use the independent 1-year Regularized APM at hoopnumbers.com for 08-09 and 09-10 and only see 1 point of movement for Songaila.
Crow
March 8, 2010
Correction: 1.6 points.
dberri
March 8, 2010
Crow,
The RAPM numbers have a much smaller scale (relative to BasketballValue). You can see this if you look at the range of values from RAPM.
I did look at the correlation between 09-10 RAPM and 08-09 numbers. The results are almost the same as we see from BasketballValue (explanatory power of 7.7%).
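The smaller scale of RAPM is a direct consequence of regularization: a ridge penalty shrinks every coefficient toward zero, compressing the spread of the estimates. Here is a small illustration on random, made-up data (not real NBA stints); the penalty value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stint-style design matrix: 200 stints, 10 players.
X = rng.choice([-1.0, 0.0, 1.0], size=(200, 10))
y = X @ rng.normal(0, 3, size=10) + rng.normal(0, 8, size=200)

def ridge(X, y, lam):
    """Closed-form ridge regression: (X'X + lam*I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

ols = ridge(X, y, 0.0)      # lam = 0 is ordinary least squares (APM-style)
reg = ridge(X, y, 500.0)    # heavy penalty (RAPM-style)

print(f"OLS range:   {ols.max() - ols.min():.2f}")
print(f"Ridge range: {reg.max() - reg.min():.2f}")   # noticeably smaller
```

This is why RAPM and unregularized APM values can’t be compared point-for-point, though correlations across seasons (as computed above) remain a fair test for either.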
Crow
March 8, 2010
I know the scale is smaller for RAPM and I think that is appropriate.
That you ran the correlation for RAPM as well and it is close to the same is good to know. Thanks.
VH
March 8, 2010
Is there a place that has the correlation coefficients for all the most common advanced stats in one place? e.g. wp48, per, adj. +/-, ORtg/DRtg, etc.
Westy
March 8, 2010
I wasn’t there, but other accounts have suggested that Avery Johnson said he thought he should have stuck with his Game 1 strategy. That he went away from what APM said and then lost the series would seem to be counter the point you’re attempting to make?
Westy
March 8, 2010
One other thought, and I hope it doesn’t seem harsh, but wanted to note it. I think it’s a little unfair to use Hollinger’s and Oliver’s critiques of the APM systems without acknowledging that both find probably equal fault with elements of yours.
I’m theorizing here, but I think both of them would say that we’re on a journey towards a better understanding of basketball via statistics, and the advanced analysis thereof. But I think both would agree we haven’t arrived. And I’m not sure they’d agree the WP system is yet further down the line than APM in terms of individual player evaluation. I think they would expect work to continue on all fronts.
That said, theirs and your critiques of APM are good observations.
dberri
March 8, 2010
Westy,
Did you follow the above link? Avery Johnson states (and I quote from the article): “But in the Mavs’ infamous first-round loss to Golden State two seasons later? ‘I got burned when following the advanced stats,’ Johnson says.”
And I would also add… there are criteria to evaluate the quality of any model. What is missing from the APBRmetrics discussion of any of this is a clear statement of the criteria one adopts to judge a model. Without saying what you are looking for, how will you ever know when you found it?
Crow
March 8, 2010
You note that only 7 to 7.7% of a player’s adjusted plus/minus in a current season is explained by what a player did the previous season.
But is that the right criterion to evaluate the model?
The ability of a minutes-weighted sum of the previous season’s 1-year adjusted plus/minus for players to explain next season’s actual wins is apparently not good, but what about multi-year APM, or regularized APM, or multi-year RAPM?
And what about Joe Sill’s use of cross-validation to evaluate his RAPM model?
Any commentary about what he does and says here?
http://tinyurl.com/y8v4ssv
Crow
March 8, 2010
If multi-year APM, regularized APM, and multi-year RAPM still fall way short of other methods in predicting wins, then they shouldn’t be used to predict wins or build winners, at least not all by themselves.
Could a blend of Adjusted and some other metric do better than either alone? It may be possible. Worth checking.
But maybe it could also still help the thought process associated with player evaluation in some fashion. Where statistical and Adjusted give dramatically different ratings, maybe you look harder. Harder at whether to move a guy on your team or harder at the possibility of finding an undervalued asset on another team.
thecornerthree
March 8, 2010
What you fail to mention is that many people are advocating the use of APM for 5 man units. What are your thoughts on the use of +/- for lineups?
Westy
March 9, 2010
Re: what Johnson said, I take him using the word ‘burned’ only to mean he took heat for his decision from the players and the media. He earlier had given credit to APM for another playoff series victory. The article goes on to note, in regard to following the APM recommendations in Game 1, “Johnson, though, stands by the decision. ‘It was the right move.’”
As thecornerthree asks about, I take it to mean Johnson was advocating the use of APM in lineup evaluation.
Kent
March 13, 2010
Dr. Berri, what is the correlation of WP48 for players year to year that switch teams? Thanks
dberri
March 13, 2010
Kent,
As with any box score measure, the correlation is 0.6 to 0.7. And the link for players who switch teams is statistically significant. For APM, the link for players who switch teams is not statistically significant at all. So a GM using APM to evaluate a player he wishes to add to his roster isn’t really getting much useful information.
Fred Rempfer
March 25, 2010
“detailed by Hollinger, Oliver, Bradbury, and I.”
Should be, “me” not “I”. Object of the preposition. I hope your math is better than your English :-)
Statement
July 28, 2010
Bruce Bowen is a classic example of a player who has a crappy WP/48 but is generally regarded as a great defender and thus helps his team. This indicates that WP is not great for evaluating EACH AND EVERY player in the league. However, Winston does acknowledge that he feels that WP does a good enough job with most players.
The thing I like about WP is that it is consistent over time. Though it probably won’t value ALL players correctly, it does a good enough job for most and is consistent and is thus a good model for Sports Nerds like me to look at.