Columbia Statistics Professor Asks a Few Questions

Posted on June 14, 2006 by


Andrew Gelman, a professor of statistics at Columbia University, has taken an interest in The Wages of Wins. At his blog he posted a few questions concerning the methods we use in our book. Below are his questions with my answers. These were very good questions, and I hope my answers further clarify the methods we use to measure productivity in the NBA, as well as Malcolm Gladwell's perspective on our work.

AG: 1. Reading Gladwell's article, I assume that Berri et al. are doing regression analysis, i.e., estimating player abilities as a linear combination of individual statistics. I have the same question that Bill James asked in the context of baseball statistics: why restrict to linear functions? A function of the form A*B/C (that's what James used in his runs created formula, or more fully, something like (A1 + A2 +…)*(B1 + B2 +…)/C) could make more sense.
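
To make the contrast in this question concrete, here is a minimal sketch of the two functional forms. The product form follows the basic version of James's runs created, (hits + walks) × total bases / (at bats + walks). The linear weights and the stat lines are invented for illustration and come from neither James nor the book.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic runs created, A * B / C: on-base events times advancement over opportunities."""
    return (hits + walks) * total_bases / (at_bats + walks)


def linear_value(hits, walks, total_bases, weights=(0.5, 0.3, 0.2)):
    """A linear combination of the same inputs. The weights are hypothetical."""
    w_hits, w_walks, w_tb = weights
    return w_hits * hits + w_walks * walks + w_tb * total_bases


# Invented stat line, for illustration only.
base = dict(hits=150, walks=50, total_bases=250, at_bats=500)

# In the product form, the value of 10 extra total bases depends on how often
# the player reaches base. In the linear form it is the same for every player.
rc_gain = runs_created(150, 50, 260, 500) - runs_created(**base)
rc_gain_low_obp = runs_created(100, 20, 260, 500) - runs_created(100, 20, 250, 500)
lin_gain = linear_value(150, 50, 260) - linear_value(150, 50, 250)
lin_gain_low_obp = linear_value(100, 20, 260) - linear_value(100, 20, 250)

print(round(rc_gain, 3), round(rc_gain_low_obp, 3))
print(round(lin_gain, 3), round(lin_gain_low_obp, 3))
```

The interaction between the A and B terms is what a purely linear specification cannot capture without adding explicit interaction terms.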

DB: Before we discuss alternatives to what we have done, we first need to establish exactly what we did. I would emphasize that the book does not present any math. Still, I think the models are described in enough detail that one can follow what we did, and did not do.

For now, let me offer a brief description. It is important to note at the outset the motivation behind the model. We are economists interested in using the productivity data generated by the NBA to answer questions we think are interesting. To do this research, one first needs to make sense of the data. The NBA tracks a collection of statistics to measure player performance, but the statistics are not easily understood. For example, points, rebounds, and assists all seem important, but what is each stat's relative value? To answer this question, it makes sense to use the tool economists most often employ, regression analysis. But how one builds the regression is not entirely straightforward.

In 1999 I published a paper which presented an early effort. The basic model employed in The Wages of Wins improves upon this effort on a number of dimensions. It is simpler, more accurate, and, I think, more theoretically sound. The specific model described in the book (and I would emphasize again the word "described," since there is no math in our book) begins with a very simple regression. Specifically, wins are regressed on both offensive and defensive efficiency, where offensive efficiency is defined as points divided by possessions employed and defensive efficiency is defined as points surrendered divided by possessions acquired.

Now, that one regression is just the beginning of the story. Assists, blocked shots, and personal fouls are not part of any element of offensive or defensive efficiency. That does not mean, though, that these three factors don't matter. To get the value of these statistics, one needs to craft additional regressions. Of these, the regression designed to determine the impact of assists was easily the most difficult to construct.

I would emphasize that one approach often taken in the study of NBA performance is to attempt to logically derive the value of the statistics. We do not take this approach, but instead rely entirely on regression analysis. In other words, the relative value of each statistic is determined by the regressions and the data. The trick, of course, is defining the regressions correctly (which I think we do).

One last observation: in an end note in the book, we note that the results one derives from the model based on offensive and defensive efficiency can also be derived from the model presented in the 1999 paper, if one makes a few modifications to the earlier work. That you get the same results with a different formulation suggests that the findings are fairly robust.
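
For readers who want to see the shape of that first regression, here is a minimal sketch in plain Python. It is not the book's model or data: the team totals, the "true" coefficients, and the helper names (`efficiency`, `solve`, `ols`) are all invented for illustration, and the wins are generated without noise so the fit can be checked against known values.

```python
def efficiency(points, possessions):
    """Points per possession; used for both the offensive and defensive measure."""
    return points / possessions


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations, with an intercept prepended."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented season totals for six hypothetical teams:
# (points scored, possessions employed, points surrendered, possessions acquired)
teams = [
    (8200, 7600, 7900, 7600),
    (7900, 7700, 8100, 7650),
    (8400, 7800, 8000, 7750),
    (7700, 7500, 7750, 7550),
    (8100, 7650, 8300, 7700),
    (8000, 7700, 7800, 7600),
]

# Hypothetical coefficients used only to generate the synthetic wins.
TRUE_INTERCEPT, TRUE_OFF, TRUE_DEF = 41.0, 125.0, -125.0

features = [(efficiency(ps, pe), efficiency(pa, pq)) for ps, pe, pa, pq in teams]
wins = [TRUE_INTERCEPT + TRUE_OFF * off + TRUE_DEF * dfn for off, dfn in features]

# Regress wins on offensive and defensive efficiency.
intercept, b_off, b_def = ols(features, wins)
print(round(intercept, 3), round(b_off, 3), round(b_def, 3))
```

Because the synthetic wins contain no noise, the fit simply recovers the coefficients it was given. With real team-season data the estimates would carry sampling error, and the values of assists, blocked shots, and personal fouls would still require the additional regressions described above.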

AG: 2. Have Berri et al. looked at the plus-minus statistic, which is "the difference in how the team plays with the player on court versus performance with the player off court"? When I started reading Gladwell's article, I thought he was going to talk about the plus-minus statistic, actually.

DB: We talk about plus-minus in the book briefly. The Wins Produced model is not a plus-minus approach, at least not in the way that term has been defined with respect to the NBA. The Wins Produced model utilizes the standard statistics the NBA tracks for its players and we find that these statistics do allow one to measure each player's contributions to team wins. I plan on commenting on plus-minus in more detail later on, but for now, I would say I think both plus-minus and the Wins Produced model are valid approaches and often (although not always) come to similar conclusions.
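
As a rough illustration of the plus-minus idea quoted in the question (how the team plays with the player on court versus off court), here is a minimal sketch. The stint data are invented, the function names are hypothetical, and scaling the margin to 48 minutes is an assumption made for readability; published plus-minus measures differ in their details.

```python
def net_per_48(stints):
    """Point margin per 48 minutes over stints of (minutes, points_for, points_against)."""
    minutes = sum(m for m, _, _ in stints)
    margin = sum(pf - pa for _, pf, pa in stints)
    return margin * 48 / minutes


def on_off_plus_minus(stints):
    """Team margin per 48 with the player on court, minus the margin with him off court."""
    on = [(m, pf, pa) for is_on, m, pf, pa in stints if is_on]
    off = [(m, pf, pa) for is_on, m, pf, pa in stints if not is_on]
    return net_per_48(on) - net_per_48(off)


# Invented stints for one game: (player on court?, minutes, team points, opponent points)
stints = [
    (True, 12.0, 28, 22),
    (False, 6.0, 10, 14),
    (True, 18.0, 40, 36),
    (False, 12.0, 24, 26),
]

print(on_off_plus_minus(stints))  # 32.0 for this invented data
```

Note that this calculation uses only the scoreboard and who was on the floor, whereas the Wins Produced model works from each player's own box score statistics.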

AG: 3. I'm concerned about Gladwell's causal interpretation of regression coefficients. I don't know what was in the analysis of all-star voting, but if you run a regression including points scored and also rebounds, turnovers, etc., then the coefficient for "points scored" is implicitly comparing two players with different points scored but identical numbers of rebounds, assists, etc.–i.e., "holding all else constant." But that is not the same as answering what happens "if a rookie increases his scoring by ten per cent." If a rookie increases his scoring by 10%, I'd guess he'd get more playing time (maybe I'm wrong on this, I'm just guessing here), thus more opportunities for rebounds, steals, etc. Just to be clear here: I'm not knocking the descriptive regression. In particular, you can play with it to model what might happen if players are switched in and out of teams (as long as you think carefully about issues such as playing time, I suppose). I'm just sensitive to mistakenly-causal interpretations of regression coefficients–the idea that you can change one variable while holding all else constant.

DB: If you have two rookies, equal in every way except one has 10% more points, then the one with more points will have 23% more voting points. That's how I read the coefficient. I think the key result, and this is found in studies of different decisions, is that how many points a player scores dominates decision making in the NBA. Studies of what determines a player's salary, factors that cause a player to be cut from a team, and the coaches' voting for the All-Rookie team all indicate that points scored is the most important factor. Rebounds, turnovers, steals, and shooting efficiency determine wins, but do not have as much impact on player evaluation.

AG: 4. Gladwell's article is subtitled, "When it comes to athletic prowess, don't believe your eyes," and he writes, "We see Allen Iverson, over and over again, charge toward the basket, twisting and turning and writhing through a thicket of arms and legs of much taller and heavier men–and all we learn is to appreciate twisting and turning and writhing. We become dance critics, blind to Iverson's dismal shooting percentage and his excessive turnovers, blind to the reality that the Philadelphia 76ers would be better off without him." But it seems here that the problem is not that people are ignoring the statistics, but that they're using the wrong (or overly simplified) statistics. After all, he points out in the first paragraph of his article that Iverson has led the league in scoring and steals, and his team has done well. Even if he didn't look cool flying to the basket, Iverson might have gotten recognition from these statistics, right? This is a point that Bill James made (with regard to batting average in Fenway Park, ERA in Dodger Stadium, etc.): people can overinterpret statistics in isolation.

DB: Let me try to shed additional light on what Gladwell was saying by taking you through a story from Gladwell's work. There is a great story in Blink (Gladwell's latest book) where he talks about medical doctors trying to determine if someone is having a heart attack the moment the person arrives in the emergency room with chest pains. If the doctor said yes and he was wrong, the patient would tie up staff and space unnecessarily. If the doctor said no and he was wrong, the patient could be sent home and be in very serious trouble. According to Gladwell, given the importance of the decision, doctors would look at everything. Unfortunately, much of what they considered was not important to the decision. A cardiologist named Lee Goldman developed a simple algorithm which found that only three factors truly mattered. Furthermore, and this is the interesting result, doctors who looked at everything could not come close to the accuracy of the simple algorithm.

After reading his piece in the New Yorker, I think Gladwell looks at our algorithm the same way he sees the algorithm designed to predict heart attacks. People in the NBA try to look at everything in evaluating players. So they watch every move the player makes on the court, trying to figure out who is good and who is bad. Watching, though, is biased towards the dramatic, which often is scoring. Although we show that scoring is of course important, scoring itself is not always evaluated correctly, and often the non-scoring actions are just as important. Specifically, with respect to scoring, the issue is not really how much you score, but how you score. If you score inefficiently, you might look good on the court, but you are not helping your team win very much. Furthermore, non-scoring factors like turnovers, steals, and rebounds, which may not stand out when you just watch a player, really impact outcomes. In the end a player with high scoring totals and, in Gladwell's words, good dance moves can easily be evaluated incorrectly if all you do is watch him. Perhaps decision-makers would be better off first looking at the numbers, and then watching the player to be sure that the numbers the player posted in the past are the numbers you will likely see in the future. In other words, start with the numbers, which tell you how productive the player has been. Then watch the player to see if you can figure out why that productivity is happening.