Later today I might post something on LeBron, Kobe, and the MVP award. Then again, I have a pile of grading to finish by tomorrow. So my comments on this award might have to wait.
In the meantime, let me do some lazy blogging. Specifically, rather than write something original, let me re-post two comments with a Wages of Wins theme. The first is from Stacey Brook (co-author of The Wages of Wins). Stacey has a wonderful blog that he calls Hawkonomics, and recently he commented on the link between NBA payroll and performance:
NBA Payroll and Performance for 2008-2009
Recently the USA Today ran a story on NBA team payrolls. The story seemed to conclude that NBA teams that have the highest paid players do the best and thus teams that have lower paid players do not perform so well. They make their point by using player salaries from the LA Lakers and the Boston Celtics as examples. This is a variant on the argument that teams with high payroll will perform better than teams with lower payroll, and I have to disagree that NBA (or for that matter NHL, MLB or NFL) teams that have high payrolls result in higher winning percentages; nor am I the first to say this.
In essence, the main premise of Michael Lewis’ book, Moneyball was to examine how the Oakland A’s did so well with one of the lowest payrolls in Major League Baseball. Additionally, as we state in The Wages of Wins, team payroll does not explain a high degree of team performance. How do we back up this statement statistically? We analyzed team performance and relative team payroll data (to account for increasing overall payrolls over multiple seasons), and calculated the coefficient of determination, also called r-squared or R2. We use R2 since we are interested in the proportion of variance that is in common between NBA team payroll and NBA team performance. Since R2 is between zero and one, the number is the percentage of the variance that is in common between NBA team payroll and NBA team performance. What we find is that the proportion of variance that is in common between NBA team performance and NBA team payroll is rather small.
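As a minimal sketch of the calculation described above, the snippet below fits a simple least-squares line and computes R2 as one minus the ratio of residual variation to total variation. The payroll and winning-percentage numbers are made up purely for illustration; they are not the USA Today data or the data used in The Wages of Wins, and the function names are hypothetical.

```python
import math

# Hypothetical, made-up numbers for illustration only.
payroll = [55.0, 62.0, 68.0, 71.0, 74.0, 80.0, 91.0]   # team payroll, $ millions
win_pct = [0.40, 0.62, 0.35, 0.55, 0.48, 0.70, 0.45]   # team winning percentage


def r_squared(x, y):
    """R2 from a one-variable least-squares fit: 1 - SS_res / SS_tot."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


r2 = r_squared(payroll, win_pct)
r = pearson_r(payroll, win_pct)
print(f"R2 = {r2:.3f}, r = {r:.3f}")
```

With a single explanatory variable, the R2 from the fitted line and the square of r agree up to floating-point rounding, which is worth keeping in mind for the discussion of r versus R2 that follows.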
Some have argued – incorrectly – that we use the wrong statistical measure. They say the true measure is the correlation coefficient – also called r. Why is this incorrect? As I explained in this post on The Wages of Wins Journal, the correlation coefficient does not measure how much of the variation between NBA team payroll and NBA team performance is in common, but rather whether NBA payroll and NBA performance change together or change oppositely.
Sometimes correlations can lead us astray. For example, see my blog post about there being a high positive correlation between vocabulary and corporate success. If we use correlation as our guide to the importance that one variable has on another, we would conclude that studying the dictionary (or watching The Daily Show) will allow us to climb higher on the corporate ladder. While I do not have the data, my guess is that the R2 is rather low, since the amount of variation that is common between these two variables is most likely tiny. These cases where you get very high correlations (positive or negative) are referred to as spurious correlation.
So with the stats stuff briefly discussed, let me show you why I disagree with the USA Today’s inferences about NBA payroll and team performance. If we calculate the coefficient of determination (R2) for NBA team payroll – using the USA Today’s NBA salary database and the NBA’s final season team performance – the R2 is 0.041. What this means is that the proportion of variance that is common between NBA team payroll and NBA team performance is 4.1%. Just to be clear, the correlation coefficient is 0.202.
Not only that, but I also tested whether this past year’s NBA team payroll and team performance were related. Using the test statistic ((n-2)*R2)/(1-R2) with 1 degree of freedom and 30 degrees of freedom, I found that the calculated test statistic was less than the critical value at the 5% probability level in the F distribution, so we would accept the null hypothesis, which is that the two variables (NBA payroll and NBA performance) are unrelated. So not only is the proportion of variance that is common between the two tiny, but here I am able to show that the correlation coefficient between the two populations (NBA payroll and NBA performance) for the 2008-2009 season is statistically zero.
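The test above can be sketched with the figures quoted in the post. This is only an illustration under stated assumptions: n = 30 is assumed because the NBA had 30 teams that season, which gives n - 2 = 28 denominator degrees of freedom rather than the 30 mentioned in the text, so both tabulated 5% critical values are shown. The critical values are approximate table lookups, and the function name is hypothetical.

```python
def f_statistic(r2, n):
    """Test statistic from the post: ((n - 2) * R2) / (1 - R2)."""
    return (n - 2) * r2 / (1.0 - r2)


n_teams = 30      # assumption: one observation per NBA team
r2 = 0.041        # R2 reported in the post

f_value = f_statistic(r2, n_teams)

# Approximate 5% critical values from a standard F table.
F_CRIT_1_28 = 4.20   # F(1, 28)
F_CRIT_1_30 = 4.17   # F(1, 30), the degrees of freedom cited in the post

print(f"F = {f_value:.2f}")
print("exceeds 5% critical value, 28 df:", f_value > F_CRIT_1_28)
print("exceeds 5% critical value, 30 df:", f_value > F_CRIT_1_30)
```

Under these assumptions the statistic comes out at roughly 1.2, well below either critical value, so the conclusion does not hinge on which denominator degrees of freedom is used.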
Now since I am only looking at the 2008-09 NBA season, I did not calculate relative payroll as we did in The Wages of Wins. If I were to calculate relative payroll – like we did in The Wages of Wins – we would get the same answer since relative payroll is a monotonic transformation of total payroll.
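A quick way to see why a single season gives the same answer is sketched below. It assumes relative payroll means a team's payroll divided by the league-average payroll for that season; that definition is an assumption here, and the numbers are made up. Within one season that is division by a positive constant, and the correlation coefficient is unchanged by that kind of rescaling. (The invariance shown is for positive linear rescaling specifically, not for every monotonic transformation.)

```python
import math

# Hypothetical, made-up numbers for illustration only.
payroll = [55.0, 62.0, 68.0, 71.0, 74.0, 80.0, 91.0]   # $ millions
win_pct = [0.40, 0.62, 0.35, 0.55, 0.48, 0.70, 0.45]


def pearson_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Assumed definition: payroll relative to the league average in that season.
league_avg = sum(payroll) / len(payroll)
relative_payroll = [p / league_avg for p in payroll]

r_total = pearson_r(payroll, win_pct)
r_relative = pearson_r(relative_payroll, win_pct)
print(f"r with total payroll:    {r_total:.6f}")
print(f"r with relative payroll: {r_relative:.6f}")
```

Across multiple seasons the divisor changes from year to year, which is where relative payroll would start to make a difference.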
Earlier this year, an unnamed NHL executive and I looked at NHL payroll (using their data) and NHL team performance, and we found in essence the exact same result – which was a surprise to him, but not to me.
Bottom line: team payrolls are poor gauges in measuring team performance.
Click here for more information on correlation.
The next stolen comment is from Matthew Yglesias. What he states is both a) obvious and b) as he notes, generally missed by many sportswriters.
My basketblogging has gotten pretty lame around here. So lame that I didn’t even watch the Rockets upset the Lakers last night. Huge mistake. That said, this seems like a good time to revisit a classic theme of Yglesias NBA commentary: a lot of times you hear that guys are making awesome contributions that don’t show up in the box score when, in fact, their contributions show up in the box score. Thus this from J.A. Adande:
And Chuck Hayes? Well, you couldn’t even find a box score by his locker. He said he doesn’t even bother to read them anymore, because they don’t reflect his contributions. “What he does, it does show up … just in winning and losing,” Morey said.
My copy of the box score shows that Hayes only played 6 minutes. Obviously, under the circumstances he didn’t make that huge an impact. But it also shows that during those six minutes he grabbed three rebounds and a steal while taking zero shots and committing zero turnovers. A guy who played 30 minutes and grabbed 15 rebounds and five steals should, I think, be seen as making a huge contribution to his team as long as he plays defense well even if he doesn’t score many points. The key thing is that your possession monster can’t be missing tons of shots. Hayes used to be a modest scorer whose field goal percentage was consistently over 50 percent. Add that to great rebounding, and you have a very effective player whose contributions are very much being captured by the box score. This season, however, Hayes’ FG% and FT% are both way down which makes him less useful.
All this, however, is right there in the box score. The box score has its limits—most notably it’s hard to draw any conclusions about defense from box scores—but unless by “box score” you mean “raw point total without considering shots taken or minutes played” then it really is a very informative thing.
Again, I should have something original posted on the MVP award soon. At least, as soon as I get done with all this grading.
– DJ
JNS
May 5, 2009
Regarding Stacey Brook’s comments, I have to say: oh, dear. First of all, it is a simple mathematical fact that R^2 is r, squared. There is a very straightforward relationship between the correlation and the coefficient of determination: as long as the relationship in question is bivariate, the coefficient of determination contains exactly the same information as the correlation, except that it doesn’t tell you whether the linear relationship in question is positive or negative. So, for the vocabulary and corporate success example, if r is very high, then R^2 mathematically has to be very high, as well. So when Brook says, “the correlation coefficient does not measure how much of the variation between NBA team payroll and NBA team performance is in common,” that’s not really true. Take the absolute value of the correlation coefficient, and you have a straightforward, if nonlinearly transformed, measure of the shared variation between NBA team payroll and NBA team performance.
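The two figures quoted in the post can be checked against this relationship with a couple of lines of arithmetic. This is just a sanity check on the reported numbers, not a reanalysis of the underlying data.

```python
r = 0.202            # correlation coefficient reported in the post
r2 = r ** 2          # coefficient of determination implied by r

print(round(r2, 3))  # 0.041, matching the R2 reported in the post

# Squaring discards the sign: r = -0.202 would imply the same R2.
same_r2_for_negative_r = (-r) ** 2 == r2
print(same_r2_for_negative_r)
```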
Another point: both r and R^2 measure linear relationships only. Language about “shared variation” can be made to sound more general than this, but it is not. In the present context, the crucial question about linearity would be whether there are diminishing returns to scale in spending.
Also, when Brook says, “here I am able to show that the correlation coefficient between the two populations (NBA payroll and NBA performance) for the 2008-2009 season is statistically zero,” this is an abuse of hypothesis testing. To say that the correlation is statistically zero is to say that we can generate a confidence interval for the correlation coefficient that is centered at zero and has width zero. Obviously, this is not the case. What is true is that we are unable to show that, if the probability model is right, a true population with a correlation coefficient of zero would generate fewer than 5 yearly payroll/victory combinations out of every 100 samples with sample correlation coefficients as big or bigger than the one we observed. This is routinely glossed with the phrase “statistically indistinguishable from zero,” but not “statistically zero.”
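To make the confidence-interval point concrete, here is a minimal sketch using the Fisher z-transformation, a standard approximate method for putting an interval around a sample correlation. The inputs r = 0.202 and n = 30 are taken or assumed from the post (30 NBA teams), the 95% level and normal approximation are assumptions, and the function name is hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error on the z scale
    lo = math.tanh(z - z_crit * se)   # transform the bounds back to the r scale
    hi = math.tanh(z + z_crit * se)
    return lo, hi


r = 0.202   # sample correlation reported in the post
n = 30      # assumption: one observation per NBA team

lo, hi = fisher_ci(r, n)
print(f"approximate 95% CI for r: ({lo:.2f}, {hi:.2f})")
```

Under these assumptions the interval runs from roughly -0.17 to roughly 0.52: it includes zero, but it is neither centered on zero nor narrow, which is the distinction between "statistically indistinguishable from zero" and "statistically zero."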
Finally, “spurious correlation” is about bivariate correlations that disappear when theoretically appropriate covariates are added to create a multivariate version of the model. It has nothing to do with the difference between r and R^2. In the present context, one might be tempted to ask whether there might be something of a spurious non-correlation between spending and performance. I don’t know if this is the case, but one plausible alternative hypothesis might be that underperforming teams tend to add payroll more quickly than other teams in the effort to improve their situation…
Todd
May 5, 2009
“The box score has its limits—most notably it’s hard to draw any conclusions about defense from box scores”
I find the quoting of this comment peculiar on this website since one of the major premises of win score seems to be that the box score gives credit for rebounds and the rebounder should get credit for all of the defense provided by his team other than blocks and steals.
Acknowledging this weakness of the box score is basically acknowledging that win score overrates rebounding and underrates defense.
Oren
May 6, 2009
Todd,
You are aware that Berri didn’t write that quote, right?
He was quoting the whole article of a different author (Matthew Yglesias) because he was too busy to write his own.
That doesn’t mean that Berri agrees with it.
mrparker
May 6, 2009
todd,
Win score does not give all credit for defense to the rebounder. It does however give much more credit than other player evaluations.
Michael
May 6, 2009
^Case in point, Denver’s defense being so much better after losing Marcus Camby. (Billups and Anderson obviously also get some credit.)
Michael
May 6, 2009
Actually that reminds me, I think I owe Charley Rosen an apology concerning Camby!
Christopher
May 6, 2009
How about no column on the MVP. LeBron James has easily surpassed Kobe Bryant and was the only credible choice for MVP. This is really a non-issue. Maybe an MVP by position would be intriguing. But I’m more intrigued by the notion that the Cavs look like they could win the title without losing a single game. I’d like some perspective on that and on coaching. I remember you saying something about a subset of coaches who made their players better. Are they still coaching? Is Mike Brown one of them?
Chip Crain
May 6, 2009
Why not blow off the grading and focus your attention on your blog? I am sure your students would appreciate it!
Just teasing and keep up the good work on both ends.
Clipfan
May 6, 2009
The boxscore in the game missed arguably the most critical contribution to the win by Chuck Hayes. Andrew Bynum and Pau Gasol missed shots and were discouraged from shooting because of Chuck’s defense. This was not captured for Chuck in the boxscore. Win score is pretty useful for breaking down win contribution into component team parts. It is nearly useless for breaking down win contribution into component player parts. Mr. Berri should stop representing it as such.
James
May 6, 2009
@ Clips fan. Wins Produced has a team defense adjustment. This is necessary because without the team defensive adjustment the box score statistics would only explain about 75% of wins (I think this is about right). The defensive adjustment is not the magic trick of this formula, it is important though.
From what I’ve seen about Wins Produced (not Win Score) the general criticism is not about whether it effectively explains wins at the team level, but instead whether it assigns wins correctly to players. Also, Win Score isn’t supposed to be Wins Produced. Win Score (If I remember correctly) doesn’t have the team or positional adjustment, but instead just the weighted box score stats. The criticism against Wins Produced is that Dberri can’t draw the conclusions about player wins based on WP data because it is uncertain whether it explains player wins well enough. The criticism is not “his player level data is clearly wrong” it is “you can’t be sure that WP effectively explains player wins”. I’m not statistically inclined enough to understand whether the formula does explain player level wins effectively enough to be an effective player-production model, but I can say that you can’t just throw out the formula because you have a question about its effectiveness. There needs to be some data that says it is ineffective. Seriously, what data set can you think of that you could correlate to WP that would allow you to prove or disprove the way that WP assigns wins to individual players?
The only thing I want to know is does the team defensive adjustment correctly assign defensive production or does it need to be improved?
Also, Dberri doesn’t use Win Score to draw player conclusions unless he wants to do a brief and general overview of a topic.
Matthew Yglesias
May 6, 2009
What James said. And, indeed, what I said in the first place. The box score stats can tell you a lot about a player’s contribution. There is, however, a meaningful residual element related to defense.
Don Taylor
May 7, 2009
http://sports.espn.go.com/nba/playoffs/2009/columns/story?columnist=hollinger_john&page=PERDiem-090506
mrparker
May 7, 2009
Michael,
We all spoke about Denver’s defense this year on another comment thread. Camby’s defense was replaced by a healthy Nene, Chris Anderson and Renaldo Balkman and some other minutes subtracted from a player whose defense wasn’t that good. Denver didn’t add 2 players this year as everyone wants to pretend. They pretty much have a new player at every position (1 Billups, 3 Balkman, 4 Anderson, 5 Nene) besides 2 guard.
Luke Nelson
May 7, 2009
Hi,
I love your statistical work. I was just wondering when the final 2008-09 wins produced numbers for individual players will be calculated? I was looking to get a jump start on an initial win prediction for the Bucks next year (should they actually remain healthy, which seems to be a yearly struggle).
Thanks!!!
Anon
May 7, 2009
As a Bucks fan, I can assure you that they won’t be good :(