A Discussion of Science and Journalism

Posted on November 4, 2009 by


Yesterday I was asked about the Book of Basketball by Bill Simmons. Regular readers of the Wages of Wins Journal will know that Simmons and I do not always agree. Still, I did buy his book (haven’t had a chance to read it yet, though), and the question yesterday reminded me of an essay I had read online in September.

Unfortunately I couldn’t remember where I read this essay. Luckily, Tim posted a link in the comments. The following is from a blog titled “No Time To Read.” It’s not clear from the blog who wrote this, and it doesn’t look like the blog has been updated since this was posted last September. Nevertheless, I found this to be an excellent discussion of how the approach of scientists (I play the scientist in this discussion) differs from the approach of journalists (a role played by Joe Posnanski and Bill Simmons). Hopefully everyone finds this as interesting as I did. And if you are the author of this work, please let us know.

No Time to Read (September 7, 2009)

I got to thinking about a difference between writers and commenters. One crucial difference is skill, naturally. However, I am thinking about some of the emails sportswriters such as Joe Posnanski, Dave Berri, Peter King, and Bill Simmons get. The best correspondence they publish tends to follow up on a thought, often giving an example about some tragedy the pundits had written about.

Considering this small and selective sample, I concluded that the main difference between lay writers and professionals is context. Professionals establish the context in which lay writers tend to work. That is, professional writers organize examples by their themes, while lay writers (i.e., commenters) write single examples. This leads, firstly, to a difference in length. The commenters provide an example or a vignette that refers to an established idea. I suppose one-graf bloggers tend to fall into this category, no matter how good the actual prose is. The professional writer will have developed the context for his main argument before using examples to emphasize his own point. While longer is not always better, developing ideas takes up space, and this leads to longer pieces. It takes a bit of skill to compress ideas into a paragraph (try reading abstracts from science papers and see if they make sense to someone outside the field you work in; the good ones will).

For now, I want to focus on the difference between a professional writer’s and a scientist’s mode of writing. At the level of sports pundits and analysis, there are the Joe Posnanskis and Bill Simmonses of the world, and there are popularizers of research, like Dave Berri. All three are wonderful writers in their fields, but I would rather read Posnanski and Simmons than Berri, if considering only the literary aspects of their writing. Nevertheless, the main difference between the two modes is not in the scope but in the details that provide context for their pieces.

Recently, Posnanski wrote about his desire to adopt a baseball stat for his blog. He hinted at reasons for disliking OPS (simply, on-base percentage + slugging average), and presented an argument for his “hitting average.” That’s all fine and good; readers of Dave Berri’s blog and book Wages of Wins will note that Berri in fact tries to find statistical measures of athlete “productivity” that relate to point production and thus, wins. Now, here’s the difference between Posnanski’s and Berri’s approaches. It certainly isn’t scope, since both are ostensibly doing the same thing. However, Berri’s approach is scientifically sound where Posnanski’s isn’t, despite Posnanski dealing with objective mathematical measures.

A caveat: I am not saying that Posnanski’s stat or approach is wrong. Posnanski has made every attempt to say that what he is doing is more for aesthetic reasons than to find THE stat, the single model that explains MOST aspects of baseball. Again, I am merely considering their styles of presentation, which are partially limited by the scope and how they approach the details.

In any case, Posnanski details how stat-geek readers of his blog, led by Tom Tango, generated a new stat called “linear weights ratio.” Posnanski tests this stat out by checking the rankings of a number of players; of course, there is some alignment with more traditional advanced baseball stats. He also presents the formula for his hitting average, for readers to play with. Again, there’s nothing intrinsically wrong with this; Posnanski isn’t doing econometrics. If anything, he is doing a great service by getting various readers to think mathematically. But Posnanski doesn’t provide a context in which to evaluate that new metric. Mainly, he doesn’t compare this metric to established metrics. In contrast, Berri’s approach is, in essence, scientific, since his arguments are constrained by the context of describing and comparing these metrics.

This context is the difference between a layman’s approach and a scientist’s approach. Berri did much the same thing as Posnanski suggests in researching basketball players’ productivity. Berri looked at the linear regression of things like points scored, shooting percentage, rebounds, turnovers, and so forth, on the amount of points scored. Based on these stats and the weights identified from the regression analysis, he generated a linear model. He placed this stat, Wins Produced, into context in three ways: he applied it to all NBA players through all years for which stats are available; he compared its correlation to points scored for and against with that of existing NBA statistical models; and he generated points of comparison for each NBA player against the “mean” player at his position. In this way, he was able to actually determine that his measure has a higher correlation to the efficiency differential (points scored – points given up) than the other stats. He was also able to identify the main difference between his and other models, in that the other models tend to use points scored as opposed to the ratio of points scored to shots attempted.
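To make the "weights from a regression, then a linear model" step concrete, here is a minimal sketch of the general technique in plain Python. It is not Berri's actual Wins Produced model: the two stats, the outcome column, and every number below are made up for illustration, and the data is deliberately constructed to be exactly linear so the recovered weights are easy to check. The helper names (`solve`, `fit_weights`, `predict`) are hypothetical.

```python
# Hypothetical illustration only: least-squares weights for two box-score
# stats, then a linear score built from those weights. All data is made up.

def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_weights(rows, target):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    x = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, target)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, row):
    """Apply the fitted linear model: intercept + sum of weight * stat."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))

# Made-up team rows: (rebounds per game, turnovers per game).
rows = [(40, 15), (45, 14), (42, 12), (38, 13), (44, 16)]
# Made-up outcome, built as 0.5 * rebounds - 1.0 * turnovers - 5.
target = [0.0, 3.5, 4.0, 1.0, 1.0]

weights = fit_weights(rows, target)
print("intercept, rebound weight, turnover weight:", weights)
print("model score for a (41, 14) team:", predict(weights, (41, 14)))
```

The point of the sketch is the order of operations: the weights come out of the fit to aggregate data first, and only afterwards is the model applied to individual rows. With real, noisy data one would also look at how well the fitted score correlates with the outcome before trusting any ranking built on it.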

The weights Berri used are not arbitrary in the sense that he simply pulled them out in order to emphasize some difference between NBA players that he thought should exist. Naturally, he might have removed some measures from his model because the weight isn’t high enough, but that’s a different matter from “fine tuning” the weight. Regardless, the most important point is that generally, he made a model from the aggregates that significantly correlated with efficiency differential before applying the model to the players. In this way, he has created rankings of NBA player productivity that have generated some arguments in the sports pundit community (for an example, see here, here, here and here).

While the particulars aren’t important, the conflict is illustrative of a scientific versus a more laid-back (although it could still be rigorous) analytical approach. Berri simply sets up a model, cranks out the numbers, and then organizes his views of the players by examining the stats. In the laid-back approach, one sees if the stat is properly associated with a player. Again, this latter approach is fine, within its domain. Sports writers are not scientists, nor do they control the purse strings for a sports team. Even within a sports franchise, one does not need to rely on statistics if one does not want to. As Berri notes, the stats comprise merely one component of NBA evaluation. They are a shortcut to organizing players’ performance. In no case do they substitute for ways of identifying why certain players are not rebounding, or generating enough assists, or reducing their turnovers.

In the Posnanski example, he presented a stat which is correlated with runs scored in baseball. He didn’t say whether this correlation is necessarily higher than that of other measures (such as OPS). This is a subtle point that is often missed. If the two measures correlate with runs scored about equally well, then there really is no difference. Of course, there may be a lot more numbers involved in one than in the other, but most scientists would simply choose the one with fewer values. It’s probably also easier to calculate. Using the other numbers does not give you added value. I have seen people talk about complex stats as if complexity (lots of math squigglies) is somehow better or more correct. That is not the case.
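The comparison described above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the team numbers are invented, the two metrics are stand-ins rather than real OPS or hitting-average values, and the `tolerance` cutoff in the hypothetical `prefer_simpler` helper is an arbitrary threshold, not a statistical test.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def prefer_simpler(r_simple, r_complex, tolerance=0.01):
    """Pick the complex metric only if it clearly beats the simple one.

    `tolerance` is an arbitrary, hypothetical cutoff for "about the same".
    """
    return "complex" if r_complex - r_simple > tolerance else "simple"

# Made-up team-season data: runs scored and two candidate metrics.
runs = [700, 750, 800, 650, 720]
simple_metric = [0.730, 0.760, 0.790, 0.700, 0.745]   # few inputs
complex_metric = [0.310, 0.325, 0.338, 0.296, 0.317]  # many inputs

r_simple = pearson(simple_metric, runs)
r_complex = pearson(complex_metric, runs)
print("simple metric vs runs: ", round(r_simple, 3))
print("complex metric vs runs:", round(r_complex, 3))
print("metric to keep:", prefer_simpler(r_simple, r_complex))
```

A careful analyst would go further and ask whether any gap between the two correlations is larger than what sampling noise alone would produce, but the sketch captures the basic habit: measure both candidates against the same outcome before declaring a winner.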

So, how does this relate to writing styles? Well, if the laymen write in examples, and professional writers extract themes and trends from examples, then scientists try to extract ideas/themes/trends that apply to all examples (well, ideally all, but in general they try to capture data from a meaningful sample that is indicative of the whole population).

However, there is a limitation in the presentation of a scientific finding: the conclusions are bound by the premise of the hypothesis and the methods and measures that are used. Thus, in Berri’s case, he presents arguments for NBA players’ productivity in terms of his measure (or other measures, if he’s interested in comparing the different metrics). But he is constrained by that, less so in his blog, but certainly in his peer-reviewed papers. As a matter of fact, Berri’s blog tends to be a bit dry, breaking down a player’s deficiencies by examining the particulars of how low his shooting percentage, rebounds, assists, etc. are relative to the league or position average. Just as importantly, Berri suggests that the metric is best used as an entry point into proper player evaluation and development. It’s a shorthand for identifying players who might be improved. Although Berri suggests players don’t change much from year to year, from team to team, or from coach to coach, that may be because no one has tailored a practice program for players based on this simple evaluation. Or it may reflect the ceiling offered by a player’s talent. Aside from these straightforward analyses of why players have below, above, or near average productivity, Berri doesn’t write about how he might enjoy watching certain NBA players. I think it gives an unfair impression that he is a bloodless machine who doesn’t know what a basketball looks like. His model does not account for the flair, style, or aesthetics that are probably the raison d’etre for watching sports in the first place.

Sports writers like Simmons and Posnanski approach it from the aesthetic domain first. The assumption is that they have an eye for talent and style, and that this is applicable to how everyone else enjoys watching that player or game. I don’t mean that they are interested in a so-called objective way to rank the entertainment or productive value of these players. I mean that they want to identify an essence of a player that can be applied without qualification or exception and can be easily demonstrated, and they are frustrated that they can’t always do so. The clearest example is in the way some describe and compare Kobe Bryant to Michael Jordan. Dave Berri can rank the two, not only in absolute terms but as some number of standard deviations above the league average for their eras. In that comparison, not only is Jordan more “productive” than Kobe, he is nearly twice so. Simmons would argue that Kobe is the best there is now. He might be a cut below Jordan, but there is no player closer.
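For readers who want to see what "standard deviations above the league average for their eras" means mechanically, here is a small hedged sketch. The productivity numbers are invented and do not come from Berri's data, and the `z_score` helper is a hypothetical name. The idea is only that each player is measured against his own league, which puts players from different eras on a common scale.

```python
from statistics import mean, pstdev

def z_score(value, league_values):
    """How many standard deviations `value` sits above the league mean."""
    return (value - mean(league_values)) / pstdev(league_values)

# Made-up per-player productivity numbers for two different eras.
league_era_a = [0.05, 0.08, 0.10, 0.12, 0.15, 0.30]
league_era_b = [0.06, 0.09, 0.10, 0.11, 0.14, 0.22]

star_a = 0.30  # hypothetical star from era A
star_b = 0.22  # hypothetical star from era B

print("era A star, SDs above average:", round(z_score(star_a, league_era_a), 2))
print("era B star, SDs above average:", round(z_score(star_b, league_era_b), 2))
```

Using the population standard deviation (`pstdev`) treats the listed players as the whole league; with a sample of players, `stdev` would be the usual choice instead.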

One solution here is to recognize that there is a difference between the professional and the scientific presentation of ideas. Berri started from the metrics first, despite whatever he might think about the players. Simmons cannot, or would not, separate the aesthetics and productivity of the players he enjoys watching. There is nothing wrong with either approach. The only difference is that Berri’s work easily translates into a scientific publication format. Its details all concern finding some measure, defending that measure, identifying advantages of using that measure, and discussing how this measure may be insufficient. In other words, Berri and other scientists are biased, for better or for worse, toward finding “measurables,” because in the end, the basic scientific hypothesis is “how much.” How much did this drug improve patient outcome? How much did the tumor shrink? How much is a photon deflected from its true path by a massive body? How many molecules of this do we have?

This isn’t necessarily a reductionist approach; at its best, finding quantifiables is a way of creating a reference point so we can start to discuss things. Thus, the proper angle to take against a scientist (i.e., Berri) is to identify and improve on his assumptions, find a different metric that gives a higher correlation, or improve on his metric by finding more terms that add value to enhance correlation. In other words, scientific discussion is limited by the context of the methods, which acts as a framework for subsequent arguments.

The sports writers do not have this limitation. They can segue between stats and aesthetics. Like Simmons, they can also sprinkle in pop-culture references that actually advance their argument. However, I think because they do approach things from an aesthetic angle first, they tend to provide contexts based on motifs and not on metrics. In other words, it allows Simmons to focus on the literary spin of his piece, relating the NBA offseason to lines from the movie Almost Famous. It allows Posnanski to say that he wants a new stat, because he doesn’t like how OPS is pronounced “ops” and not “Oh-Pee-Ess”. There is a lot of room for literary flourish, which shouldn’t make the argument any more objective, but it becomes much more enjoyable.

Interestingly enough (although, ironically, I haven’t looked at this for all cases), I think for the most part Simmons and Berri emphasize the same attributes they want from their ideal basketball player. They want someone who can shoot well (i.e., high shooting percentage), score a lot of points, make passes for assists, not cough the ball up, and grab rebounds. Where they differ is in how they rank the so-called “top players”. Berri has noted that most conventional player evaluation centers on points scored (without regard to the number of misses the player made). He has noted that player rankings and player salaries have a correlation of 0.99 with points scored. And strangely enough, Berri’s work showed that scoring points, by itself, does not lead to higher efficiency differentials. Despite what writers and general managers profess about finding complete basketball players, they put their money on the point-getters. In other words, for all the verbiage devoted to arguing how smooth and graceful players are, and how much one should enjoy their talent before they fade into old age, the ideas of “aesthetics” and “points” are no different. It’s interesting that Berri noted that there may in fact be an implicit metric being used to evaluate players under the guise of the so-called explicit measure of a player’s style/gracefulness/aesthetics.

