A Guide to Evaluating Models

Posted on November 18, 2007

One of my tasks this past weekend (beyond watching football and basketball) was to review a paper for the Journal of Sports Economics.  Although I can’t reveal the details of the paper, I can note that it was fairly standard for the sports economics literature.  A model was proposed, estimated, and discussed.  My task as a reviewer was to evaluate the “quality” of this work.

So how do we do this?  How do we evaluate a model?  My sense from reading comments on-line is that this is not a well-understood process.  So I thought I would take the time to write down some basic guidelines that I and other researchers consider when we review a model.

It’s important to note, as I will emphasize again before I conclude this column, that beyond the first issue there is no specific order of importance to these guidelines.  Rather, these are just a collection of issues we keep in mind when reviewing a paper.  So although I have numbered these points, don’t let the numbering suggest a ranking.

I would also add that although this column goes on for more than 2,000 words, this exercise merely serves as an introduction to the topic of model evaluation.  If you want more information you probably need to take a few classes in statistics and econometrics.

Okay, enough of the caveats, here are the guidelines:

1.  The theoretical foundation of the model

This is THE most important issue.  A regression model looks at the link between a dependent variable (what you are trying to explain) and one or more independent variables (what you think do the explaining).  The choice of independent variables must be guided by theory.
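
To make this concrete, here is a minimal sketch of what estimating a regression model looks like in practice.  It uses Python’s statsmodels library and entirely made-up data; the variable names (points, rebounds, salary) are illustrative assumptions, not numbers from any actual study.

```python
# A minimal regression sketch with made-up data (illustrative only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200

# Independent variables: what we think does the explaining.
points = rng.normal(15, 5, n)     # hypothetical points per game
rebounds = rng.normal(6, 2, n)    # hypothetical rebounds per game

# Dependent variable: what we are trying to explain (here, salary),
# generated so that points matter more than rebounds by construction.
salary = 1.0 + 0.30 * points + 0.10 * rebounds + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([points, rebounds]))
results = sm.OLS(salary, X).fit()
print(results.summary())
```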

Why is theory so important? It’s important to remember that statistical analysis can tell us about correlations. Causation is inferred from theory.  So if you have no theory, it’s not clear what your model is telling us.  And without a theory, it’s not clear what other researchers and/or decision-makers would ultimately do with your results.  In essence (with very few exceptions), if you ain’t got a theory, you ain’t got a model.

So having a sound theory is important.  But you have to do more if you want to see your work published.  Once it’s established that your work rests on a solid theoretical foundation, we then consider a series of other issues.

2.  Statistical significance and economic significance

After we understand the theory being tested, and assuming the model was estimated correctly (correct functional form, no econometric issues like heteroskedasticity, autocorrelation, multicollinearity, etc.), we tend to look at the statistical significance of the estimated coefficients next.  By statistical significance I mean whether or not the estimated coefficients (for the independent variables) are different from zero.  If they are, then the researcher might have found something.
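
In the sketch above, this check amounts to reading the t-statistics and p-values that statsmodels reports for each coefficient.  A common (if arbitrary) convention is to call a coefficient statistically significant when its p-value falls below 0.05.  Continuing the toy example, and reusing the results object from the previous block:

```python
# Continuing the sketch above: which coefficients are
# statistically distinguishable from zero at the 5% level?
for name, coef, p in zip(["const", "points", "rebounds"],
                         results.params, results.pvalues):
    flag = "significant" if p < 0.05 else "not significant"
    print(f"{name}: coefficient = {coef:.3f}, p-value = {p:.4f} ({flag})")
```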

As we note in The Wages of Wins, Deirdre McCloskey has spent quite a bit of time hammering home the point that statistical significance is not the final word in evaluating a coefficient.  We also want to know the economic significance of the results.  What this means is that we want to know how much each coefficient matters.

To illustrate, we find that salary in basketball is linked to scoring, rebounds, blocked shots, and assists.  These factors have a statistically significant impact on how much a player is paid.  But what is important is which factor matters most.  And when we look at economic significance, we find that scoring is the most important determinant of player compensation.

Historically, as McCloskey points out, many economists stopped with a discussion of statistical significance.  But I think more recently, probably due to the persistence of McCloskey, economists are increasingly taking the time to talk about economic significance.
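
One common way to get at economic significance is to compare standardized (or “beta”) coefficients, which put every variable in standard-deviation units so the magnitudes can be compared directly.  A sketch, again reusing the toy regression from the first block (the variables and weights there were made up, so the output is purely illustrative):

```python
# Continuing the toy example: standardized (beta) coefficients
# express each effect in standard-deviation units, so magnitudes
# are directly comparable across variables.
data = np.column_stack([points, rebounds])
betas = results.params[1:] * data.std(axis=0) / salary.std()
for name, b in zip(["points", "rebounds"], betas):
    print(f"{name}: a one-sd increase moves salary by {b:.3f} sd")
```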

3.  Robustness of results

If we estimated the model differently would we get substantially different results? When we ask this question we are asking if the results are “robust.”

To illustrate this point, consider the Price-Wolfers paper. This study found evidence of racial bias in how NBA referees called personal fouls.  What was impressive about this result was that the finding of racial bias remained even after the authors tried a multitude of specifications.  When we see such “robustness” our confidence in the results tends to rise.
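
In code, a robustness check often amounts to re-estimating the model under several specifications and verifying that the coefficient of interest keeps roughly the same sign and size.  A self-contained sketch with made-up data (x1 stands in for the variable of interest; nothing here comes from the Price-Wolfers study itself):

```python
# Robustness sketch: does the coefficient on x1 survive
# alternative specifications? (Made-up data, illustrative only.)
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)                # the variable of interest
x2 = rng.normal(size=n)                # a control
x3 = 0.5 * x1 + rng.normal(size=n)     # a correlated control
y = 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

specs = {
    "x1 only": [x1],
    "x1 + x2": [x1, x2],
    "x1 + x2 + x3": [x1, x2, x3],
}
for label, cols in specs.items():
    X = sm.add_constant(np.column_stack(cols))
    fit = sm.OLS(y, X).fit()
    print(f"{label}: coefficient on x1 = {fit.params[1]:.3f}")
```

If the estimate on x1 stays close to the same value across all three specifications, the result is robust in the sense described above.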

4. Explanatory power

Non-economists and students tend to get most excited about explanatory power, or R2.  R2, for those who are not statistically inclined, is the ratio of the variation in the dependent variable that your model explains to the total variation in the dependent variable.  For example, if you have an R2 of 0.55, then your model explains 55% of the variation in the dependent variable.  This means that 45% of the variation was not explained.
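
R2 can be computed directly from this definition: one minus the ratio of unexplained variation to total variation.  A small self-contained sketch with made-up data:

```python
# Computing R-squared from its definition (made-up data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 1.5 * x + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(x)).fit()
ss_res = np.sum(fit.resid ** 2)          # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
print(f"R-squared by hand: {1 - ss_res / ss_tot:.3f}")
print(f"R-squared from statsmodels: {fit.rsquared:.3f}")
```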

If you are comparing two models that are seeking to explain the same thing, you might consider explanatory power in deciding which model you prefer.  However, and this is a point emphasized in most econometric textbooks, just because one model can explain more does not mean it’s the preferred model. A model with a higher explanatory power, but with no theory behind it or serious econometric problems, will be rejected by researchers and reviewers.

I would also note that I have had models published with an R2 that was less than 10%. In fact, the aforementioned Price-Wolfers paper presented several models, ranging in explanatory power from 1% to 28%.  Again, explanatory power tells us something, but it doesn’t tell us everything.

5. Forecasting power out of sample

Related to explanatory power is the issue of forecasting.  Explanatory power evaluates a model within the sample examined.  Forecasting considers how well a model does out of sample.

To be honest, few papers make any effort to forecast out of sample.  Still, this is an issue one could consider in evaluating a model.  Basically, if a model has high explanatory power but cannot forecast, then we suspect the results are entirely specific to the sample tested.  This tells us that the results cannot be generalized and are perhaps of little interest.  Furthermore, for decision-makers, an inability to forecast with a model is a significant problem.  After all, decisions are about the future.  If your model only applies to the past, then it probably cannot help a person make better choices about the future.
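
A basic way to run this check is to hold out part of the sample, fit the model on the rest, and see how well it predicts the held-out observations.  A self-contained sketch, again with made-up data:

```python
# Out-of-sample check: fit on one part of the sample,
# forecast the other, and compare errors. (Made-up data.)
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

# Fit on the first 150 observations, forecast the last 50.
X_train, X_test = sm.add_constant(x[:150]), sm.add_constant(x[150:])
fit = sm.OLS(y[:150], X_train).fit()
forecast = fit.predict(X_test)

rmse = np.sqrt(np.mean((y[150:] - forecast) ** 2))
print(f"Out-of-sample RMSE: {rmse:.3f}")
```

A model whose out-of-sample errors are far larger than its in-sample errors is probably fit too tightly to the particular sample it was estimated on.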

6. Simplicity of model

Which is a “better” model, NBA Efficiency, Game Score, or Player Efficiency Rating (PER)?  Before answering, let me quickly note that Game Score is John Hollinger’s simple version of PER.  The calculation of Game Score is as follows.

Game Score = Points + 0.4*Field goals made + 0.7*Offensive rebounds + 0.3*Defensive rebounds + Steals + 0.7*Assists + 0.7*Blocked shots – 0.7*Field goal attempts – 0.4*Free throws missed – 0.4*Personal fouls – Turnovers

Game Score does not make the adjustments that PER makes for team pace.  It simply adds and subtracts the box score statistics, according to the various weights Hollinger has chosen.
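
Translating Hollinger’s formula directly into code makes the weighting explicit.  A straightforward sketch (the argument names are just illustrative labels for the box score statistics):

```python
# Game Score, exactly as the formula above reads.
# Free throws missed = free throw attempts minus free throws made.
def game_score(pts, fgm, fga, ftm, fta, orb, drb, stl, ast, blk, pf, tov):
    return (pts + 0.4 * fgm + 0.7 * orb + 0.3 * drb + stl
            + 0.7 * ast + 0.7 * blk - 0.7 * fga
            - 0.4 * (fta - ftm) - 0.4 * pf - tov)

# Example: a hypothetical stat line (all field goals worth two points).
print(game_score(pts=28, fgm=11, fga=20, ftm=6, fta=8,
                 orb=2, drb=5, stl=2, ast=4, blk=1, pf=3, tov=2))
```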

Is PER “better” than Game Score?  One might think the more complicated model (PER) must be “better”.  But when we look at player performance last season we find a 0.99 correlation between a player’s per-minute Game Score and his PER.  In essence, each model tells the same story.

What about Game Score and NBA Efficiency?  Here is the calculation for NBA Efficiency.

NBA Efficiency = Points + Rebounds + Steals + Assists + Blocked Shots – Turnovers – All Missed Shots
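
The same exercise works for NBA Efficiency, where “all missed shots” means missed field goals plus missed free throws.  Again a minimal sketch:

```python
# NBA Efficiency, exactly as the formula above reads.
# "All missed shots" = missed field goals + missed free throws.
def nba_efficiency(pts, reb, stl, ast, blk, tov, fga, fgm, fta, ftm):
    missed = (fga - fgm) + (fta - ftm)
    return pts + reb + stl + ast + blk - tov - missed

# The same hypothetical stat line as above.
print(nba_efficiency(pts=28, reb=7, stl=2, ast=4, blk=1,
                     tov=2, fga=20, fgm=11, fta=8, ftm=6))
```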

Game Score is a bit more complicated than NBA Efficiency.  But when we look at player performance in 2006-07, we again find a 0.99 correlation between these two measures. So these models are also telling us the same story.
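
Correlations like these are simple to compute once you have per-player values for both measures.  With numpy, for example (the arrays below are made-up stand-ins for the real 2006-07 per-player data):

```python
# Sketch of the correlation check, with made-up per-player values.
import numpy as np

game_scores = np.array([12.4, 8.1, 15.7, 5.2, 10.9, 18.3])
efficiencies = np.array([14.0, 9.5, 17.2, 6.1, 12.3, 20.1])

r = np.corrcoef(game_scores, efficiencies)[0, 1]
print(f"Correlation: {r:.2f}")
```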

Given these similarities, which should we prefer?  In general we prefer a simple model – or what we call a parsimonious model – to a complex model.  In other words, we don’t add complexity to a model unless it helps us do something (i.e. solve an econometric problem, improve explanatory power, etc…).  If two models give essentially the same answer, the simple model will be easier to both explain and work with.  So that is our choice.
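
One way econometrics textbooks formalize this preference for parsimony is the adjusted R2, which penalizes a model for each additional variable: a more complex model wins only if its extra variables explain enough to offset the penalty.  A quick self-contained sketch:

```python
# Adjusted R-squared penalizes added complexity: an irrelevant
# regressor raises plain R-squared slightly but can lower the
# adjusted version. (Made-up data.)
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=100)
irrelevant = rng.normal(size=100)    # a variable with no real effect
y = 1.2 * x + rng.normal(size=100)

simple = sm.OLS(y, sm.add_constant(x)).fit()
bigger = sm.OLS(y, sm.add_constant(np.column_stack([x, irrelevant]))).fit()

print(f"Simple: R2 = {simple.rsquared:.4f}, adj. R2 = {simple.rsquared_adj:.4f}")
print(f"Bigger: R2 = {bigger.rsquared:.4f}, adj. R2 = {bigger.rsquared_adj:.4f}")
```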

Comparing NBA Efficiency to Wins Produced, Win Score, and PAWS

My list of factors is not complete.  For example, I left out whether the work is actually “interesting” or “important.”  Still, I think this list gives a good set of issues to consider when looking at a model.  To illustrate how this checklist can be used, let’s think about NBA Efficiency.

On the plus side, this model is quite simple (issue #6) and it does forecast itself (although not team wins) fairly well (issue #5).  But its theoretical foundation is weak (issue #1) and a team’s NBA Efficiency only explains 23% of the variation in team wins (issue #4).  Now if you change the dependent variable from team wins to player salaries, the usefulness of NBA Efficiency increases.  Although NBA Efficiency doesn’t do a good job of explaining wins, it does a nice job of explaining free agent salary.  In essence, NBA Efficiency (and models like it) does a nice job of telling us about perceptions of performance.  It just has some problems if our objective is to measure the impact a player has on wins.

Now consider Wins Produced, Win Score, and PAWS.  Wins Produced is derived from the relationship between wins and offensive and defensive efficiency.  This relationship is based on sound theory, so there’s a clear theoretical foundation.  And as end note 37 from Chapter Six of The Wages of Wins indicates, the results are also somewhat robust.  One can make a few minor alterations to the model reported in Berri (1999) and get the same results.

Beyond theory and robustness, we also see that Wins Produced does a nice job explaining current wins and allows us to forecast.  Again, those are positives.  Finally, one can derive two simple models from Wins Produced.   Both Win Score and PAWS are quite easy to calculate and essentially tell the same story as Wins Produced. 

So when we consider issues like theory, robustness, explanatory power, forecasting, and simplicity, the Wages of Wins basketball measures appear to be at least “reasonable” models.  In contrast, NBA Efficiency appears to have a few shortcomings. 

A Few Last Issues to Consider

It’s important to note that these are guidelines, not a score-sheet.  In other words, it’s not like each factor is assigned a value and we simply choose models that have the highest scores.  No, this process is not quite that precise. All we do is review the model with these issues in mind.  Then if we think the model falls short with respect to one or more issues we suggest the author make some changes (or if the problems are quite severe, reject the paper).

Given the lack of precision in the process, it’s not uncommon for two researchers looking at the same model to come to different conclusions.  Certainly authors of a paper often disagree with the reviewers.  The blind peer-review process does not give us the “truth” or a definitive answer.  It is, though, better than any alternative we have come up with to review research (by the way, I should dedicate an entire post to why reviews need to be blind and generated by an author’s peers).

I would emphasize that although the process often leads to disagreement, in my experience few people take the disagreements personally.  Some of my best friends in economics produce work I am not too sure about (and they probably feel the same way about my stuff).  Non-researchers often characterize what we do as a “contest.”  In reality, research is just a cool way to spend your time.

Let me close by emphasizing that there is one factor I did not include in the guidelines.  Often I see people stating that model A is “good” because it fits what everyone already believes.  Or model B is “bad” because it doesn’t.  I have serious problems with such an approach.  In general, in evaluating a model we do not consider whether the model confirms or rejects our pre-conceived notions.  So with respect to NBA Efficiency, Wins Produced, or any other model designed to measure player performance, we do not consider whether the model returns a ranking consistent with our prior beliefs. 

I often argue that if we let prior beliefs determine whether we accept or reject a model, then we might as well skip this entire process and simply take a vote.  And although voting is nice, I am not convinced the democratic process produces results that trump solid statistical analysis.

– DJ

The following column offers even more on the importance of simplicity in building a model: Talking with Henry Abbott and a Comment about Model Building

Our research on the NBA was summarized HERE.

The Technical Notes at wagesofwins.com provide substantially more information on the published research behind Wins Produced and Win Score.

Wins Produced, Win Score, and PAWSmin are also discussed in the following posts:

Simple Models of Player Performance

Wins Produced vs. Win Score

What Wins Produced Says and What It Does Not Say

Introducing PAWSmin — and a Defense of Box Score Statistics
