Answering a Critic

Posted on November 26, 2006 by dberri

NOTE TO VISITORS FROM GLADWELL.COM: If you are looking for my response to Hollinger, please look at the post entitled John Hollinger Responds.

A few weeks ago I was asked at The Wages of Wins Journal to comment on how the Wins Produced metric differed from John Hollinger’s Player Efficiency Rating. As I noted, the question had been originally asked by Malcolm Gladwell as he prepared to write his review in The New Yorker. After I posted my answer a few days ago, Gladwell also decided to comment on my PERs critique at his blog.

Interestingly, the comments at Gladwell’s blog have not been directed to the difference between Hollinger’s methods and the metric we offer in The Wages of Wins. Rather, a group of people inspired by Dan Rosenbaum have taken the opportunity to attack our work.

As JC Bradbury — of Sabernomics fame — noted at Gladwell.com, these comments – posted at Gladwell.com and in other forums on the Internet – have been “nasty.” Given the source and tone of these comments, we have had a debate at WOW about whether we should respond. Stacey says no we shouldn’t. Marty says we should. This leaves me as the tiebreaker, and since I have nothing else to write about on Sunday (and I do try and post each day), let me try and offer a response without being “nasty.”

The Rosenbaum Critique:

At Gladwell.com Rosenbaum posted a summary of his critique:

Wins Produced is a metric that (a) professes to be regression-based, but is only marginally so, (b) misapplies its own logic when it derives (rather than estimates) its linear weights, (c) proposes explaining team wins as a barometer when ANY metric with a team adjustment (no matter how bizarre) would explain team wins just as well, and (d) only performs microscopically better than points per game at explaining how teams do when particular line-ups are on the floor.

Let me respond to each point.

To points (a) and (b) – yes, the Wins Produced model is based on regression analysis. In fact, the estimated weights are the result of several regressions. So this particular critique is simply factually incorrect. We did attempt to explain the intuition behind our results, which might give the impression you could have reached the same conclusions without the regressions. I do not think, though, that this is actually true.

To point (c) – Rosenbaum has fixated on the team adjustments as the secret to the Wins Produced model. I disagree that any model with a team adjustment could explain wins as well and offer the same ability to forecast future player and team performance. As we note in the book, the team adjustments we use do not dramatically impact the rankings of players derived from our model. So his story about the team adjustment is incorrect.

To point (d) – this statement refers to an approach Rosenbaum offered several months ago to test our model. What he did is regress the player rankings from his replication of Winston-Sagarin (described below) on player rankings derived from Wins Produced and other models. In other words, he evaluated how well each model predicted his model.

We are going to have to plead ignorance to this evaluation method. We tried to create a model that would take the statistics tracked for individual players and explain team wins, as well as future player and team performance. We did not know that the “Holy Grail” (a term often used in the APBRmetrics community) of player performance models was the one that explained Rosenbaum’s player rankings the best. If that is the standard, though, then I guess we are going to have to live with a model that comes up short.

The Winston-Sagarin Model

What exactly is the Winston-Sagarin model? It is difficult to describe accurately in a simple post, but I think a good description would be that the Winston-Sagarin model is a sophisticated version of plus-minus. Such an approach ignores all of the traditional player statistics and focuses solely on how a team does with and without a specific player. Having exchanged e-mails with Wayne Winston, I think such an approach is interesting, although perhaps not the best approach for our research.
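To make the "with and without a specific player" idea concrete, here is a minimal sketch of raw on/off plus-minus. It is not the Winston-Sagarin model, whose details have not been published and which also adjusts for the other players on the floor; the stint data, the player labels, and the function name `raw_plus_minus` are all hypothetical and chosen only for illustration.

```python
# Raw on/off plus-minus: compare a team's scoring margin per 48 minutes
# when a player is on the court versus off it.
# NOTE: illustrative only. The data and names are made up, and this is the
# simple unadjusted version, not the Winston-Sagarin approach.

# Each stint: (players on court for our team, points for, points against, minutes)
stints = [
    ({"A", "B"}, 20, 15, 8.0),
    ({"A", "C"}, 18, 20, 8.0),
    ({"B", "C"}, 10, 16, 8.0),
]


def raw_plus_minus(stints, player):
    """Return the on-court margin per 48 minutes minus the off-court margin per 48."""
    on_diff = on_min = off_diff = off_min = 0.0
    for lineup, pts_for, pts_against, minutes in stints:
        diff = pts_for - pts_against
        if player in lineup:
            on_diff += diff
            on_min += minutes
        else:
            off_diff += diff
            off_min += minutes
    # Guard against a player who never sat (or never played) in the sample.
    on_rate = 48 * on_diff / on_min if on_min else 0.0
    off_rate = 48 * off_diff / off_min if off_min else 0.0
    return on_rate - off_rate


if __name__ == "__main__":
    for name in ("A", "B", "C"):
        print(name, raw_plus_minus(stints, name))
```

Notice that nothing in this calculation uses points, rebounds, or any other box score statistic for the individual player; only the team's margin matters. A raw figure like this also mixes a player's effect with that of his teammates, which is the problem the more sophisticated versions try to address.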

Although Winston did send a few e-mails to me when our book first hit the market, the exchange was always congenial and we basically left the discussion acknowledging the pluses and minuses (sorry about that) of our two approaches. I would note that Winston and Rosenbaum are not working together. Winston, working with Jeff Sagarin, originated this approach which was purchased by the Dallas Mavericks (who are owned by one of Winston’s former students). Rosenbaum replicated this model later on and sold his approach to the Cleveland Cavaliers. At no point were all the details of this approach published in a refereed journal, a point that further hampers our ability to use this in our research.

The Wages of Wins Story, Again

It is important that I once again re-state the basic argument we make in The Wages of Wins. This quote I think captures this essential story:

“People in the game often claim to know instinctively how to measure intangibles, but salaries suggest otherwise. Teams pay for little more than the glory statistics (points, rebounds and, to a lesser extent, assists).

Although steals, blocks, shooting percentage and an ability to avoid turnovers are crucial to a team’s performance, players proficient in these aspects are rarely rewarded with bigger paychecks.”

Interestingly, the author of this statement was not one of the authors of The Wages of Wins. The author is Dan Rosenbaum, who published this statement in The New York Times in April of 2005.

The Importance of Player Evaluation in the NBA

Let me close by making a point I have made a few times in the last few days. Basketball is a game invented by James Naismith. As Sports Illustrated.com noted about ten days ago, the game Naismith developed was based on a 19th century game called “Duck on a Rock.” A century later people are spending the Thanksgiving weekend debating how to best measure a player’s performance in “Duck on a Rock.” When you put it this way it is clear this is not a very important issue.

We never claimed that the Wins Produced metric by itself was “important.” I do believe this measure is a good representation of what a player does on a basketball court. Consequently, with this representation in hand, we can investigate various topics in economics. And I think some of our work in economics can be thought of as “important” (at least to us).

Beyond what we do in economics, we can also tell some stories that might be of interest to fans of basketball. For the most part, that is what I try and offer in this forum. And as long as I think there is an audience for these stories, I will keep writing.

– DJ

Posted in: Basketball Stories

16 Responses “Answering a Critic” →

Jordan Lichty

November 26, 2006

In my mind one criterion of a good model would be this: given that you know the ten players who were on the court, which team was home, and how many minutes (or possessions) were played, how accurately can you predict the results of this period? Is team A or B going to be leading, and by how much? Adjusted plus-minus ratings (like Dan’s) give each player in the league a value that most accurately answers this question. It is not a rating system; it is a description of how well teams played with certain players on the court. I hope you know this, Mr. Berri. If you do not, please educate yourself; if you do, please do not misrepresent Dan’s system.

So a relevant question is: how good is your system (WP) at predicting how teams do with a given player on the court, adjusted for the presence of the other players? It does barely better than points per game at this. This indicates that either the team adjustment is covering your tracks, or you are mis-assigning credit within the team in order to accurately predict team records.

silverbird

November 27, 2006

Regarding points (a) and (b):

As I understand it, the ‘Rosenbaum critique’ isn’t so much that the entire Wins Produced method is regression-free, but that the values of certain statistics are derived theoretically rather than empirically. Thus, the value of each additional point scored (.033) is determined by regressing wins on team offensive efficiency measures, and additional regression models are used for assists and blocks. However, it is unclear from your book whether the values of possession stats are similarly determined, or whether they are in fact logically derived under the assumption that if teams average approximately 1 pt per possession, then the absolute value of each possession employed and acquired can be equated to that of 1 pt. Thus, for offensive rebounds, you discuss how a simple, bivariate model erroneously yields a coefficient of -.2, but you never report the coefficient from the ‘corrected’, multivariate model. Or do you?

Harold Almonte

November 27, 2006

Jordan, plus-minus is a rating system (and a team adjustment at the same time), because you compare players with each other. It may not be a player stats rating system, but it is a player influence rating system that tries to tell us who the best and worst players are (by point differentials). You can’t predict and bet on single games with player performance rating systems (maybe with plus-minus); I think you need to make some adjustments, and people say WOW already includes those adjustments. But I think WOW was not created to explain single-game betting, maybe just some team season predictions and a player’s win value (a particular economic view of a player’s contribution). I think WOW is not guilty of the box score “rebound” problem all ratings have, although they try to minimize it. I do believe this problem and the “shot creation” stuff are so complex that no stat method currently in use can resolve them.

silverbird

November 27, 2006

also, for point (c), I thought it was the position adjustments – not the team adjustments – that have the APBR crowd so suspicious. And not because they impact the rankings, but because they might exaggerate the fit between cumulative Wins Produced and team Wins Produced. Do the adjustments (both team and position) have an equally negligible impact here as well?

as for point (d), I agree that Rosenbaum’s habit of regressing other metrics on his own is completely mystifying, especially since the entire apbr MO is about the rejection of subjective, intuition-based analysis, and this is basically just a quantitative variation on just that. like if i proposed testing an objective measure of beauty by seeing how well it predicted the girls i’ve slept with.

Jason

November 27, 2006

What isn’t at all clear to me about +/- systems is whether they have predictive value for future events. If the +/- results from any player are influenced substantially by their teammates, then it’s difficult to deal with new combinations. Are player +/- consistent over time? Are the isolated +/- of individual players constant enough such that the group +/- can be derived if the prior +/- of the individual players are known? If yes, then it’s a reasonable forecasting model. If not, it’s another way to describe what has happened, but it’s not predictive. These data I have not seen. If it has been published, I haven’t found it. If it hasn’t been published, it’s difficult to assess the merits. That the supporter posting it here (who has called it ‘his method’ yet appears to be hiding behind a pseudonym) seems to be insisting that not only is his method superior but also his is the way to evaluate methods makes me think that he’s blowing smoke and has no real interest in anything other than self promotion.

“Jordan,” you appear to be generating a rather impressive strawman in framing the debate around your (Dan’s) model. You appear to be saying that the real value of WP would be how well it predicts the outcome of short periods with a player (or group of players). “[H]ow good is your system (WP) at predicting how teams do with a given player on the court, adjusted for the presence of the other players” is *not* a relevant question for a method that never claimed to be evaluating this. It is the debate you wish to frame and seem insistent on interjecting into as many threads as you can, but it’s not so relevant as you claim and I suspect you should know better. As I understand WP never attempted to predict player combinations for short stretches of floor time.

As I understand it, WP is a player rating system that tries to compute the individual contributions of a player towards wins over a given period (generally a season). David claims that WP is reasonably consistent from season to season for players and thus provides a measure of prediction for how a team will do in future seasons with new combinations of players on the team. As such, claiming that it should be judged according to how well it can predict a short segment of performance (e.g. a game or a quarter) is as ridiculous as trying to measure the cargo capacity of a dump-truck by looking at what sort of reception the radio gets, and your attempt to hijack the criteria for evaluating it as such smacks of mean-spirited bullying (and likely childish self promotion).

Dan’s use of regression to show how a model holds up against another unpublished method with regard to a task that the model was not designed to address nor did the authors claim it could address is disingenuous and indicates a poor understanding of the model. Some of this can be forgiven. I would like very much to be discussing a published work from a refereed journal as there are aspects that I cannot fully evaluate from what is in WOW and in the earlier published versions of the model. Still, there *is* a description of the method in WOW and there *are* prior published attempts at models based on a similar method of regression-derived values of component stats. Some is not so easy to forgive as regressing models designed to do different things against one’s own model without substantiating that said model is the proper benchmark indicates that knowing how to run a regression is not the same thing as knowing why to run a regression.

dberri

November 27, 2006

I think Jason, once again, has summarized the argument quite nicely.

With respect to Silverbird’s comments… all of the values we propose for the statistics were derived from regression analysis. I do not know how to state it any more clearly than that. The position adjustments do not impact the predictive power of the model. And I have already commented on the team adjustments.

Jordan Lichty

November 28, 2006

Adjusted plus-minus is slightly less consistent year to year than WP. PER is significantly more consistent year to year than WP (this doesn’t mean it’s better, although that argument has been made in this space in the past).

I am still confused why a model whose objective is to describe player ability and performance as accurately as possible would not be able to predict how a team does with a given player on the court adjusted for all other factors. This would seem to be the exact definition of player performance.

By refusing to use plus-minus to separate the effects of players from each other you run the risk of misattributing a team’s credit among the individuals within the team. Mr. Berri has clearly made a commitment to his system and does not want to take a closer look at his system’s predictive ability in a more precise way.

Jason

November 28, 2006

For a predictive model to be useful, you have to know what it is that you are trying to predict. “Jordan”, you seem pathologically hung up on your question being the only one of interest and seem to have some real antipathy for asking a different question.

My understanding of WP is that it’s a measure that looks at the contributions over an extended period and tries to isolate the contributions from that individual, minimizing as much as possible factors of other players and situations. As a predictive model, it relies on WP being relatively constant over time and not changing significantly due to the presence of other players and that the factors measured covary with victories to a high degree. I’m not weighing in on whether or not it can do this, but that’s my understanding of what it’s useful for. As a predictive model, it can be used to influence decisions about what players to acquire at what price and what sort of playing time particular players warrant to maximize returns *over a reasonably large sample.* It doesn’t try to maximize particular combinations on the floor in the short time.

If WP works, then the wins for the upcoming season should be predictable if you know what players have done in the past, what players are on a team and what sort of PT particular players are to get. If it works, it should predict changes in the number of wins a team experiences over a significant sample when a particular player plays for the team or is lost to the team. If it works in this regard, it works and it addressed the question at hand. This doesn’t seem like it’s what you’re particularly interested in, which is fine, but you should realize what a model is addressing and realize that not everyone wants to address the same questions as you. It does not claim to be a model that should be used to manage floor combinations and perhaps the particular combinations are a product of some of the error in the model and perhaps why Dave suggests that coaching does have an impact on wins. That’s entirely plausible, but that’s more refinement or a different model entirely.

One of the beautiful things about academic endeavours is that a published model can serve as the basis for other models to build upon. This is one of the reasons that I believe it is important to publish if one actually wants to make progress in a field. You are free to take his model and use it if you are inclined and see if you can modify it or aspects of it to answer your own questions. Insisting that *he* cares about the particular question you have (and it seems to go beyond suggestion when you repeat it in thread after thread like a parrot subjected to the tape loop of “adjusted plus-minus”) and that he is somehow obligated to make his model suitable to answer the questions *you* want to address though is rather ridiculous. To insist that your question is the only important one strikes me as rather narrow, though that does appear to be your position.

You are free to see flaws in the model that may be improved by reallocating the individual statistical components via some other method (e.g. weighting them according to +/- or weighing them according to an opportunity cost to other players) and you are free to use the modifications accordingly and to see whether or not this makes the model better either for its initial task or perhaps makes it more applicable to a wider range of questions. You are free to be interested in different questions that should be evaluated by a completely different model, but you should realize when you’re comparing apples to butane lighters and realize that they aren’t the same thing. You do not have any particular right to insist that Dave does the work for you because it’s the question you care about and he is under no obligation to see that your question is the appropriate one and modify his model because you insist it’s a better question.

I am skeptical that any model is able to address *all* questions and I never expect a model to come out initially that is perfect. It’s part of a process to add to what we know, not to arrive immediately at some known end of all knowledge. I don’t know if you’re aware of this or not, but you behave as if you expect everything to answer your questions, as if the measure of a model is how it lives up to answering your questions and have little or no appreciation for not just different approaches, but different questions. You have commented on peer review as if you are skeptical that the system has value and seem to imply that there’s some sort of dishonesty in the whole process, so perhaps this explains why you show little patience with differing opinions. (Are you perhaps a disgruntled academic?)

David has every right to try to address the questions he sees fit to address and you have every right to care or not, but being a jerk about it says more about you than about his method or model. He is being responsible in trying to publish his views in an arena where others can comment and where others can use what he’s come up with. By saying that he is “refusing” to do something and “does not want to take a closer look at his system’s predictive ability in a more precise way” you seem to suggest that you know his actual thoughts and that he’s being purposefully and willfully negligent and irresponsible. That’s rather rude and shortsighted of the processes that go into actual research, else you’re just adding it for bullying insulting effect and don’t know how to behave in public. I do not know why you chose to behave accordingly, but that’s entirely your issue, not Dave’s. I see no reason why he should pay you any notice at all if you’re so readily willing to question his integrity in your critiques.

Jordan Lichty

November 28, 2006

Just to make it clear, I am an undergraduate student, and although I know people who have done adjusted plus-minus work (my roommate for example) I have not done it myself. I’m not sure if my roommate is ready to publish his stuff yet. WP is remarkably flawed, and I hope that eventually Berri gets someone in the NBA to listen to him so I can watch the Titanic type debacle that ensues.

And “Jason”, let’s get one last thing clear, if there was such a thing as a “jerk store”, David J. Berri would be their all time best seller.

Jason

November 28, 2006

I’m curious how someone can conclude that WP is remarkably flawed without seemingly knowing what the model is supposed to do and without having seen the details of how it fits actual data. At the same time, it’s curious that you can conclude another model that you haven’t actually worked with is clearly better.

I’m also mildly curious as to what you hope to accomplish in posting, Jordan. I hope very much that you’re not a psych major, Jordan. It’s curious psychology to ask someone to do something like modify his model while simultaneously claiming that it’s flawed and that the author of the model is a jerk. It sounds like you have some personal dispute beyond this forum with David.

Harold Almonte

November 29, 2006

I think, if you have a player rating with stats translated to wins (not points), and you get into coaches’ minds to know how much time they will give to starters and bench, and you know each player’s will in each game, then you could have a win differential, a not-so-useful percentage for betting. With plus-minus, you need all of the former, and it will give you a “points” differential. Go and get rich. Yes, we are comparing apples and butane lighters.

kjb

December 1, 2006

How undramatic is the team adjustment? I’ve read Dan’s critique, in which he inserted some absurdly radical values for blocked shots (for example), and once the team adjustment was added, it had no effect on the overall ratings.

All that said, one thing I do find interesting is the heat involved in this debate between some very smart people who agree on the conclusion: that NBA teams overvalue “glory stats” and undervalue other important factors that affect a team’s ability to win games.

Myles Brand

December 1, 2006

“Adjusted plus-minus is slightly less consistent year to year than WP. PER is significantly more consistent year to year than WP (this doesn’t mean it’s better, although that argument has been made in this space in the past).”

This debate seems to be about a comparison of the forecasting performance of different metrics. Why make all these vague assertions like “slightly less consistent” and “significantly more consistent”? Why not use an accepted forecasting measure like root mean square error to evaluate these metrics? Oh, I see – you’re an undergraduate. Of course that would be too much work.
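For readers unfamiliar with the measure, root mean square error can be sketched in a few lines. This is only an illustration of the calculation: the function name `rmse` and the win totals below are hypothetical and are not results for WP, PER, or adjusted plus-minus.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between forecasts and observed outcomes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


if __name__ == "__main__":
    # Made-up team win totals and two made-up sets of preseason forecasts.
    actual_wins = [50, 41, 30]
    forecast_one = [47, 45, 30]
    forecast_two = [50, 41, 36]
    # The forecast with the lower RMSE tracked the outcomes more closely.
    print("forecast one:", rmse(forecast_one, actual_wins))
    print("forecast two:", rmse(forecast_two, actual_wins))
```

Because the errors are squared before averaging, one large miss counts for more than several small ones, which is worth keeping in mind when comparing metrics this way.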