More on the Price-Wolfers Study

Posted on May 2, 2007 by dberri

Jonathan Weiler – of the Starting Five, and formerly of Sports Media Review – asked me to comment on a column by John Hollinger (insider blog). Hollinger looked at the Price-Wolfers study – entitled “Racial Discrimination Among NBA Referees” – and noted that the size of the effect seems small.

As I told Weiler, this reaction misses the point. In an e-mail to Weiler, I noted the following (which Weiler posted at The Starting Five):

Whether or not this costs a team a game is, in my thinking, irrelevant. This is not a paper about basketball. It is a paper about how people judge people who are different (in this case, of another race). And this paper shows evidence that people are judged differently based on race. Given the circumstance, that is an impressive finding. NBA referees receive a great deal of training. Their decisions are consistently reviewed. If they were racists, you would think they would not choose this line of employment. Given all this, Price-Wolfers still find an effect. So I think that is the important story. Even in a situation where you would think implicit bias wouldn’t be there, it is still there.

Perhaps we can think of it this way. dwil (at The Starting Five) noted the issue of “Driving While Black.” I guess one could argue that this is not a real problem. The incident he described didn’t impact his lifetime earnings. He still survived. So one could argue, DWB doesn’t have much “economic significance.” But the fact that it happens should be a concern. Whether it has an “economic” impact is not the point.

Readers of The Wages of Wins know that we tend to be all about “economic significance.” But on this issue, I think the statistical significance is the important point. Given how NBA referees are evaluated, one might not expect to see any evidence of this bias. Yet, there it is in the data. And that tells us something bigger about how human beings evaluate information and make decisions.

– DJ

Posted in: Basketball Stories

25 Responses “More on the Price-Wolfers Study” →

Fred Flintstone

May 2, 2007

From the paper:

“… Table 3 is instructive, showing that the rate at which fouls are earned by black players is largely invariant to the racial composition of the refereeing crew. By contrast the rate at which fouls are earned by white players responds quite strongly to referee race. Further regression-based tests yield a similar pattern (see in particular the coefficient on %white referees in Table 4), suggesting that the impact of the biases we document is on white players, who are either favored by white referees, or disfavored by black referees. ”

From Yahoo:

“… According to an upcoming paper by a University of Pennsylvania professor and a Cornell graduate student, white referees called fouls against black players at a higher rate than they did against white players.

Their study also found that black officials called fouls on white players more frequently than they did against blacks, but the disparity wasn’t as great…”

The media is portraying this as white refs discriminating against black players, but the paper seems to say that black and white refs don’t treat black players much differently, but they treat white players differently.
dwil

May 2, 2007

It is incorrect to think that police stoppages due to DWB do not have a potential (negative) economic significance on the DWB victim.

Should the event occur on the way to work it will have a detrimental effect on wages earned and perhaps employment. Should the event occur on the way home when one is “taking work home with them” (as in the case of a black judge on his way home to San Jose, Ca.) there can be a palpable negative effect on one’s ability to finish the work load brought home.

Additionally, there is the psychological factor of gathering one’s self in the aftermath of this sort of incident that has never been addressed relative to its negative economic impact; the loss of work time, quality of work after such an incident, etc.

——————–

Briefly on the study and its meaning:

The study reports that black and white referees treat opposite race players with a bias; white referees’ bias is more pronounced than black refs.

The conclusion is that through the set of variables that represents NBA play relative to fouls called, there is statistical evidence of subconscious (at least) bias in the NBA workplace that can be transposed to the workplace, in general.

(I am constantly surprised that anyone watching an NBA game with their own inherent set of biases toward the games and players they view can think bias does not exist between referees and players based on color.)
Steve Walters

May 2, 2007

Hmm. Very interesting. A little over a week ago this WoW guest blogger posted something called “It Ain’t Necessarily So”; it was about accepting empirical findings too quickly, the peer review process, and scholarly research. Now along comes a yet-to-be-published study that is getting INCREDIBLY big play. What are we to make of it?

The first point to stress is that it is still early days in the process of peer-reviewing this piece of work. It’s nice that smart guys at Harvard, Yale (Law), and Cal State had a chance to read the paper before Schwartz ran with the story. But this work is going to be chewed over much more before it makes it into print in a reputable journal. Accordingly, we should be careful about reaching any conclusions about what it “shows” about human behavior.

One valuable guide here should be a classic article by one of my old profs at UCLA, Ed Leamer; it’s titled “Let’s Take the Con Out of Econometrics.” Leamer used a nice big data set on capital punishment and its effects on homicide rates, and showed that a variety of models could pass conventional tests of statistical significance [note to Dave: I’d disagree, therefore, with your statement that “on this issue, I think the statistical significance is the important point”]. If you believed that capital punishment was a deterrent, you could construct and estimate a model that confirmed that prior belief with 95% (or greater) confidence; if you believed the opposite, another model specification could, with the same data set, also confirm your priors.

Leamer’s point was not to say, as many scoffers do, that “you can prove anything you want with statistics.” It was the opposite–that it’s really HARD to prove things with statistics, and you have to be very careful and very thorough about reporting things like, e.g., how sensitive your conclusions are to model specification. He made some very pointed recommendations about how to do econometrics and report your findings.

In coming weeks and months, given the prominence of this study, people will be going over it really aggressively (as they should) to see exactly how sensitive the “finding” is; i.e., if the paper has warts, they’ll be found.

In the meantime, reserve judgment about the real ability of NBA refs–and people in general–to be color blind. Dave’s right that this is an important question. Too important to consider settled just yet.
dwil

May 2, 2007

Steve-
It is important to fully read the co-authored study before trotting out the well-worn claim that all statistical analyses can be skewed to match a given hypothesis.

No one is color-blind – No One. That is tantamount to claiming that a human can be objective. Again, I am constantly surprised that people are surprised and/or taken aback or are wary of the outcome of this study.

Then again, the fear of admission that race does play a part in one’s everyday being is a tough pill to swallow when Western society is geared to suppressing this fact.
bob

May 2, 2007

mr. berry could you give us a basic equation or methodology that price and wolfers used? just something simple like x number of fouls per game over y number of minutes played showed that white referees called 2% more fouls on black players.

i don’t want to read 40+ pages or whatever of the study.

that must have been a lot of work, going over calls by looking at video recordings.
Steve Walters

May 3, 2007

Re: dwil’s remark that “No one is color-blind – No One. That is tantamount to claiming that a human can be objective. Again, I am constantly surprised that people are surprised and/or taken aback or are wary of the outcome of this study.”

This is what is known as having “strong priors.” When you start a research endeavor with such priors, there is considerable danger that you will stop investigating immediately after finding the result that validates those priors.

You really do need to read Leamer’s paper before you wave it off as “well-worn,” dwil. You’ll learn a lot about model specification, significance testing, and the scientific method.
dwil

May 3, 2007

Steve–
Actually, “human as subjective” is a basic tenet of postmodern anthropological theory. Having “strong priors” (an antiquated Western anthro-thought if there ever was one) is simply admitting biases before formulating a hypothesis; it is of primary significance that a tester first examine his/her biases before entering into any hypothesis-making endeavor.

Without that requisite look inward scientific method is nothing.

(Also: Please refrain from further academic challenges. Though our fields of expertise or study may differ and though I left my chosen field behind, postmodern anthropological study – and in my case, mesoamerican archaeology focus – has, for some 45 years, viewed statistical analyses with a wary, and sometimes jaundiced eye. The fact that the pendulum has, in the last 10 years or so, swung back toward a middle ground as far as the “absolute” worth of such analyses, is more a reflection of an overall societal backlash at the premise of self-examination than it is the worth, or non-worth, of such analyses.)
Guy

May 3, 2007

“As I told Weiler, this reaction misses the point…..Whether or not this costs a team a game or not is, in my thinking, irrelevant. This is not a paper about basketball. ”

This is not correct. The paper is about BOTH human behavior AND basketball. The authors made explicit claims about the impact this bias has on actual game outcomes, saying that it involves several games per season. In fact, based on the foul results, replacing a black player with a white player would result in something like 9 fewer fouls per season, a tiny impact. It’s completely reasonable for Hollinger to point out that the authors have hugely exaggerated the actual impact.

In fact, as a reader who has in some sense vouched for the study, shouldn’t it be your role to either challenge Hollinger’s analysis or acknowledge the errors and urge the authors to correct them?

As for the human behavior aspect of the analysis, I suppose it’s a glass-half-full-half-empty issue to some extent. Let’s say they are right that an all-white ref squad will call 3% fewer fouls on white players than they should (the main finding of bias that I can see). Does that show “there’s still racism in America,” or that “it’s amazing how much we can reduce bias through good training and monitoring behavior?” Both are true, I think. But if studies come out next week showing that child poverty among blacks is “still 103%” of the white rate, or that the net worth of blacks is now “just 97%” of whites’, wouldn’t we consider that cause for celebration? I would. (And I say this as a white liberal who is totally willing to believe evidence of white racism, and believes there is still far more racism in our society than most white folks realize/acknowledge.)
Scott

May 3, 2007

I have only had a chance to skim the Price-Wolfers paper, but I would like to see what happens if they run the analysis while permuting the labels of race.

For example, what is the effect size/significance if you compared people born in even-numbered months vs odd-numbered months instead of black vs non-black…and more to my point would be a process of doing hundreds if not thousands of random permutations of the race labels and seeing how they compare to the result seen in the paper when we use the correct labels.

After that is done it would be much easier to judge with confidence whether the Price-Wolfers results are due more to data mining and finding a result that is easy to write about (one wouldn’t think to write a paper about even month ref/player bias vs odd-month…but race is an interesting and possibly important story). So I’m still on the fence as to whether this analysis is both statistically significant and meaningful.
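
The relabeling check described above can be sketched as a permutation test. This is only a minimal illustration on made-up numbers, not the Price-Wolfers data or their method; the function name `permutation_test`, the 0/1 label coding, and the synthetic values are all assumptions introduced here.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    values: one outcome per observation (say, a foul rate); hypothetical data
    labels: 0/1 group flag per observation
    Returns (observed difference, approximate p-value).
    """
    rng = random.Random(seed)

    def diff(lbls):
        ones = [v for v, l in zip(values, lbls) if l == 1]
        zeros = [v for v, l in zip(values, lbls) if l == 0]
        return mean(ones) - mean(zeros)

    observed = diff(labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between label and outcome
        if abs(diff(shuffled)) >= abs(observed):
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from an exact zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Synthetic illustration only: group 1 is built to sit 1.0 above group 0.
values = [5.0 + 0.01 * i for i in range(20)] + [4.0 + 0.01 * i for i in range(20)]
true_labels = [1] * 20 + [0] * 20
# A placebo label (think even vs. odd birth month) unrelated to the outcome.
placebo_labels = [i % 2 for i in range(40)]

obs_true, p_true = permutation_test(values, true_labels)
obs_placebo, p_placebo = permutation_test(values, placebo_labels)
print(f"true labels:    diff={obs_true:.3f}, p={p_true:.4f}")
print(f"placebo labels: diff={obs_placebo:.3f}, p={p_placebo:.4f}")
```

If the real labels produce a difference that almost no shuffled labeling matches, while a placebo labeling looks like the shuffles, that is some reassurance the result is not just an artifact of searching for a story.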
Guy

May 3, 2007

Dave: A technical question on the authors’ win score analysis (table 5). I can’t tell, but it appears that by running separate regressions for each performance variable, they are in a sense double (or triple) counting the referee impact on performance in various ways. For example, an extra offensive foul called by a white ref would also register as a turnover, and also reduce the points scored by that player. Is it correct to sum the impact of all these variables using Win Score, as they do here?
Johnny Hatchett

May 3, 2007

It’s quite fun to listen to critics of this study trot out all sorts of goofy explanations, namely that there are more blacks in the league so they’ll (of course!) draw more fouls … (even though the study finds that whites have a higher foul rate!).

& the most baffling explanation yet is John Hollinger’s claim that teams with more black players had/have a slightly lower win rate than do those with fewer blacks because of the emergence of European players in the league. So, his story goes, teams that scout poorly and draft or sign fewer (skilled) European players are at a disadvantage not because of ref bias but because they put less talented (more black!) teams on the floor. (By the way, as near as I can tell, there are black European ballers. I’m not certain how that fits into Hollinger’s critique.)

Anyway, these explanations tend to ignore the study’s main contribution – that the chances of winning with an all black team decrease as the number of white referees increases. The latter point of that statement … “as the number of white referees increases” is left out of most of these critiques, including Hollinger’s.

That said, I think that Steve Walters’s caution, a tentative, “yes, but…” is appropriate. As far as I can tell, Walters isn’t denying that bias exists, isn’t denying that ref performance is influenced by “strong priors” but that we remain curious about the models used in this study and we remain open to alternative explanations and, even, alternative models of what and where bias is in the NBA.

To kind of merge DWil’s comments with Walters’s, those strong priors & all the biases we bring to our everyday experiences – occupations or whatever – are important. It seems to me, then, that the haggling over whether bias is there / bias isn’t there (an argument that the NBA has more interest in winning than do a few academics) is less important than working to figure out what kind of bias is there, how it looks, operates, and influences outcomes, chances, etc.

peace love gap
Johnny Hatchett
Guy

May 3, 2007

“So, his story goes, teams that scout poorly and draft or sign fewer (skilled) European players are at a disadvantage not because of ref bias but because they put less talented (more black!) teams on the floor. ”

Did you read the study? It clearly shows that teams with a higher proportion of Black players (in minutes) are less likely to win, whether there are 1, 2 or 3 white refs (only when all 3 refs are black is having more black players an advantage). And the authors concede that this pattern is itself not evidence of bias.

A separate impact from # of white refs would suggest bias, of course. But if you look closely at figure 1, and compare the 1-, 2- and 3-white ref patterns, it appears the refs’ race has only a tiny impact, if any at all, on the point spread between the teams.
tangotiger

May 3, 2007

I’m late to the party, but I’ll add my two cents anyway.

Since the days of collusion, franchise valuations have risen at around 10% per year. As has revenue. As has salary.

The S&P has risen 7-fold over the last 20 years (which includes dividends reinvested). Wanna guess the compounded rate of return? That’s right, 10%.
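
As a quick check on that arithmetic, here is a minimal sketch of the compounding calculation. The helper name `compound_annual_rate` is made up for illustration, and the inputs are just the 7-fold and 20-year figures quoted above.

```python
def compound_annual_rate(growth_multiple, years):
    """Annual rate r such that (1 + r) ** years equals growth_multiple."""
    return growth_multiple ** (1.0 / years) - 1.0


# A 7-fold rise over 20 years, as quoted above.
rate = compound_annual_rate(7.0, 20)
print(f"{rate:.1%}")  # roughly 10.2% per year
```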

The Forbes estimates for valuations are just that, estimates. But that’s the same as with a house down the street. You don’t know exactly how much it’s worth, but you do your comparative market analysis, and you know how much it’s worth. Once a team is sold in some sport, you use that as an important indicator to determine the value of all the teams. Forbes constantly recalculates its valuation based on known sale prices, and they estimate complex transactions (like the Red Sox), etc.

The present value of future earnings is likely close to zero, meaning it’s hard to believe that the capital appreciation is what it is. Since sports franchises have limited recurring operating income, owning a sports franchise is like owning a Picasso.

***

In hockey, 50% of the best players are outside of North America. 50% of 1st round picks are outside North America (though this year and in the future, additional rules were put in place that may change this behaviour). However, 33% of all players are from outside North America.

What does this mean? Well, teams stock up their 3rd and 4th lines with North Americans, since it’s not cost effective to fill out your roster with non-North Americans.

That means that the AVERAGE European in the NHL is better than the AVERAGE North American in the NHL.

Even think about the very first years that Russians came over. Who do you think made the NHL 20 years ago? Markov, Larionov, Krutov, Fetisov, Kasatonov, (and a couple of others) followed by Fedorov, Bure, Mogilny. The average Russian, in the NHL, was undoubtedly better than the average North American.

So, when we look at the NBA or any other league, it’s almost a given that the average European is better than the average North American, playing in the NBA. But, that’s simply an artifact of the economic rules, that makes it more cost effective to fill out your bench with North Americans.

If you want to argue only among the starting players, then the average European playing in the NBA is probably the equal of the average North American, and the average black is equal to the average white (again, limited to the starting players).

***

As for the study itself, I posted my comments on my blog.

One thing that I saw is that the mixed-race crews had no bias. It was the segregated ones.

As well, when we talk about “significance”, we are talking about “statistical significance”. This is far different from the common term of “significant”, as in “easily observable in its impact”. This story shows how there’s a huge gap in terms of reporting statistical papers. Getting 3% more calls may be “statistically significant”, but it’s hardly “significant”.

Finally, I don’t like the way the authors extrapolated the effect to an all-white team.

I do want to add a couple of plausible things that came up on other boards:
1 – You have a limited number of referees, meaning you could have a “bad egg” situation. It may not so much be a black/white thing, as opposed to simply a strong signal from one or two guys that are averaged out across the group. So, the significance isn’t necessarily that you have a black/white thing, but rather, that you have *something*.

2 – How about the age of the referees? There’s more Archie Bunkers in the world at age 60 than age 30.
tangotiger

May 3, 2007

(oops, I posted two posts in one. If you can delete the part of the post from “Picasso” upwards, I’d appreciate it.)
kevinbroom

May 3, 2007

As I wrote in my blog, I’m less interested in whether the effect (if there is one) is big enough to change the outcome of games. The issue of whether refs are affected by racial bias also isn’t all that interesting to me.

What IS interesting is what it says about the rest of us. The effect may be small, but then so is virtually all racial bias. Few blacks are getting called the n-word by whites anymore, but there are myriad subtle things that communicate the “blacks are less than whites” message. Stuff like the woman who shifts her purse away from the black man who got on the elevator next to her. Or the cashier who drops a black person’s change on the counter, but hands it to a white person. Yes, those acts are small and seemingly insignificant. But they can accumulate into a pattern of subtle, unconscious and (worse) unexamined bias that can have significant effects.
bob

May 3, 2007

guy, so you’re saying that if a black player and white player both played the average number of minutes played in the nba, then the black player will be called for only 9 more fouls from white referees according to the study?
Guy

May 4, 2007

Bob: Hollinger estimates the impact at 6-7 additional fouls for the black player, if he played a lot of minutes, so maybe 5 for someone playing an average amount.

Dave: one suggestion I hope you will pass on to the authors is to include tables that show us team win% results broken down by type of ref crew, so we can see how high- and low-black player% teams compare. For example, if we take the top quintile of teams in terms of black player playing time, what is their win% in front of 1 white ref, 2 white refs, etc. And same for player performance metrics (FG%, rebounds, etc.), shown separately for white and black players.

The authors do this for fouls (table 3), so we can see the uncontrolled results and then compare it to their regression results. There’s not much difference, which makes sense given random assignment of ref crews. They should give us the same data for the other two components of their analysis: player performance and team performance. This will presumably give them powerful supporting evidence (and results that are easier to explain to non-academics).
tangotiger

May 4, 2007

I agree with Guy.
AOM

May 4, 2007

Not to discount this, but can you post some thoughts/reactions on the Golden State – Dallas series.
tangotiger

May 4, 2007

Message cross-posted to my blog:

If everything is random, then you don’t need the opposing team’s racial makeup. But, clearly the authors have the data, so there’s no reason not to use it. Furthermore, it reduces noise, without costing you anything. So, not only is there a reason not to use it, there’s a reason that it should be used.

Then, it can be easily explained that when the racial makeup (weighting by playing time, of course, of the game in question) is greatly balanced toward an all-black v semi-black team, the team win% is 45% with an all-black crew and 49% with an all-white crew, that’s extremely telling.

This is the kind of simple message that everyone will understand.
tangotiger

May 4, 2007

Incredibly, I meant to post a triple-negative here, so here’s the corrected blurb, which probably makes this more unreadable than it should:
“not only is there not a reason not to use it…”
Steve Walters

May 5, 2007

Yeah, Tango, I was wondering if that was a bobbled triple negative back there. Tough play, that; probably no E will be charged…

As the Price-Wolfers paper wends its way through the refereeing process, I’m hopeful the authors include some “fragility analysis.” The coefficients they report, while statistically significant, don’t seem very stable. And they have a long, long list of control variables and interaction effects; including or deducting some might affect their results. While they don’t have to show us all possible specifications, they should report the range of coefficients they obtained as they performed their specification search.
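
To make the idea of reporting a range of coefficients concrete, here is a rough sketch: estimate the coefficient of interest under every subset of a set of controls and report the smallest and largest values. It is a toy on synthetic data, not the paper’s model or Leamer’s exact procedure; the names `ols`, `coefficient_range`, `c1`, and `c2` are hypothetical.

```python
import itertools


def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X)b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def coefficient_range(focus, controls, y):
    """Coefficient on `focus` under every subset of the control variables.

    Returns (min, max, {subset_of_control_names: estimate}).
    """
    names = list(controls)
    estimates = {}
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            X = [[1.0, focus[i]] + [controls[name][i] for name in subset]
                 for i in range(len(y))]
            estimates[subset] = ols(X, y)[1]  # index 1 is the focus variable
    vals = list(estimates.values())
    return min(vals), max(vals), estimates


# Synthetic data: the true effect of `focus` is 0.5, and c1 is correlated
# with focus, so leaving c1 out inflates the estimate. c2 is irrelevant.
n = 30
focus = [float(i % 5) for i in range(n)]
controls = {
    "c1": [float(i % 3) + 0.5 * focus[i] for i in range(n)],
    "c2": [float((7 * i) % 4) for i in range(n)],
}
y = [2.0 + 0.5 * focus[i] + 1.0 * controls["c1"][i] for i in range(n)]

lo, hi, estimates = coefficient_range(focus, controls, y)
for spec, est in estimates.items():
    print(spec, round(est, 3))
print("range:", round(lo, 3), "to", round(hi, 3))
```

In this toy case the estimate moves from about 0.5 to about 1.0 depending on whether `c1` is included, which is exactly the kind of spread a reader would want to see reported.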
tangotiger

May 5, 2007

Steve, perhaps you can explain some of the background process for us with papers.

First off, I think having the Wisdom of the Crowd (WotC) at work here is great. There’s a huge think tank of NBA analysts (Oliver, Kupfer, Hollinger, etc) who can propose additional parameters to consider. While we know there’s something at play here, we’re not sure it’s necessarily black/white, if important parameters are missing.

Anyway, in order to validate and expand their work, do the authors of these papers release the data for our consumption? Ed Kupfer for example provided on his site the data of refs by game (for a 13 year period), and I was able to confirm that the refs are not part of fixed crews. Missing in his data is the all-important race data.

In this WotC world of papers, do you think the data should be released now? Are the authors afraid of having their work usurped if someone else posts a different analysis using the same data?

I imagine with peer review, there’s a code of conduct that leaves the authors with first dibs, until publication?

Thanks…
Steve Walters

May 6, 2007

Tango: The two things you worry about with evaluating academic research are (a) fairness and (b) rigor.

To increase the likelihood of (a), we have double-blind peer review. When I evaluate your work, I shouldn’t know it’s you–so I have no reason to be intimidated, or snobby, or anything; I should just judge it on its merits, not your rep. And that rep, or your network of friends (or enemies, for that matter), shouldn’t affect how your stuff is handled. (Some of the biggest, most embarrassing blunders in academic publishing have been a result of an editor greasing the skids for friends or colleagues.)

On (b), editors first rely on a stable of trusted referees, who (usually for no pay whatever) go through submissions and write detailed reviews. Most submissions go to 2 or 3 refs, and most get rejected; even so, the process improves the product. Sometimes refs will suggest revisions that, if made, will lead to acceptance.

Many if not most journals these days also require researchers to make their data available as a condition of publication. The refs attest that the paper passes a minimum threshold of quality; then the broader profession gets its hooks in, and it’s not uncommon for folks to publish critical comments, challenging the findings in whole or in part. Or, “replicators” take on the same subject, but using different data or methods. I’ve got an article on submission right now that’s a replication and update of a paper that appeared 27 years ago; we’re seeing if the findings hold up (using better methods!), and, even if they do, whether market participants (it’s about baseball free agency) learn from prior mistakes.

I’d agree that there’s wisdom in crowds. But there’s also a lot of noise (don’t you find yourself wasting time answering critics who don’t have standing to criticize? Isn’t it hard to get attention among the cacophony when there are NO barriers to entry?). So the academic publishing process aims to limit the extraneous noise, but then, once quality is initially assayed, the crowd can “have at it,” and is usually assured access to the tools they need to do the job.
tangotiger

May 8, 2007

Thanks for that detailed explanation. For this study in question then, we’ll have to wait for the study to be published before the Crowd gets its teeth into it.

As for your last paragraph, the answer is an emphatic no. I receive plenty of comments on my blog and email, and I answer virtually every one. I really don’t judge what I receive based on “standing” (someone even asked me how to calculate ERA). My basic position is that I’m on a journey to learn, and I want to take people along with me. If that means that not all of my time is used effectively, that’s ok. But, I’d rather be inefficient with my time, and listen to all, than be efficient with my time, and only listen to those that have standing.

What ends up happening is that my blog itself acts as a de facto filter, just as the WoW blog does as well. The only people who post here are those that care to do so, and not those who post on a whim.