Today’s guest blogger is Steve Walters. Apart from his day job as a professor of economics at Loyola College in Maryland, Steve has served as a consultant to two MLB teams and writes occasional statistical analysis features for The Sporting News. He grew up in Salem, Massachusetts, remains a citizen of Red Sox Nation, and counts as his most cherished piece of sports memorabilia an autographed copy of MBA: Management by Auerbach.
Let’s start with a pop quiz: At what age does the average big-league ballplayer reach his peak?
If you said 27, you are an unusually conscientious student of sports and, quite likely, a devotee of the great Bill James, one of the founding fathers of sabermetrics and a consultant to the Red Sox.
You’re also wrong. (Profs love to do this sort of thing, don’t they? Make people nervous with an obscure question, let the brown-nose down in front show off a minute, then slap him down. This is why everybody hates profs.)
James took on this research question back in his 1982 Baseball Abstract. Baseball traditionalists held that players are in their prime from age 28 to 32. But James concluded that this belief was “blatantly false.”
He examined the career stats of 502 hitters born in the 1930s and found that, on average, they had their best years (“peaked”) at age 27. Most players, he wrote, “attain their greatest value before the 28-32 period even begins, are declining throughout that age range and have lost nearly half of their peak value by the time it ends.” Among statheads, that eventually became accepted as gospel.
Then along came Bowling Green State University statistician Jim Albert, who looked at a broader sample of data in 2002 and found that James’s findings were… well, flukey (see: http://bayes.bgsu.edu/papers/career_trajectory.pdf).
Albert examined the productivity of hitters born over six decades, and found that James’s 1930s sample of players, for whatever reason, peaked at a younger average age than any before or since. Players born in the 1910s peaked at age 28.0, those born in the Roaring ‘20s at 28.6, and Depression babies peaked at 27.1. Those born in the ‘40s, however, peaked at 28.9, the ‘50s at 28.7, and the ‘60s (thank you, modern training methods) at an average age of 29.8.
It’s fair to say, then, that most players through history have peaked closer to age 29 than 27. Albert also found that half of the players born in the ’60s peaked between age 27.9 and 32, while a quarter peaked before and a quarter after that age range. That could be another statistical fluke, of course. But it could also mean that the traditionalists are right on target and the smart-ass sabermetrician led his readers astray.
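For readers who want to check that "closer to 29 than 27" arithmetic, here is a minimal sketch using only the decade averages quoted above. It assumes each birth decade counts equally, since the cohort sizes aren't given here, so treat the result as a rough check rather than Albert's own calculation.

```python
# Average peak ages by birth decade, as quoted from Albert's study above.
peak_age_by_birth_decade = {
    "1910s": 28.0,
    "1920s": 28.6,
    "1930s": 27.1,
    "1940s": 28.9,
    "1950s": 28.7,
    "1960s": 29.8,
}

# Assumption: every decade gets equal weight (cohort sizes are not given here).
mean_peak = sum(peak_age_by_birth_decade.values()) / len(peak_age_by_birth_decade)

# The cohort with the youngest average peak.
earliest = min(peak_age_by_birth_decade, key=peak_age_by_birth_decade.get)

print(f"unweighted mean peak age: {mean_peak:.1f}")
print(f"earliest-peaking cohort: {earliest}")
```

On these six figures the unweighted mean lands at about 28.5, and the 1930s cohort that James studied is the youngest-peaking of the lot.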
Why do I bring this up? Emphatically not to suggest that profs always know more than best-selling writers like Bill James. Believe me, I know a lot of professors, and quite a few can’t find their hindquarters with both hands, much less divine the truth from data. (Especially humanities profs, but don’t get me started about that.)
The point is that it’s actually damned hard to figure out what’s really, really true by sifting through numbers. Sometimes profs do it better than intelligent laymen, and sometimes the reverse is true.
All I’m saying is that we need to be careful before we conclude that some “study” by anyone actually “proves” something. As the James/Albert episode points out, sometimes a well-constructed study coughs up a result for one era that turns out not to be typical of others. Or a researcher’s methodology may unintentionally twist things in a particular way. Or a boatload of statistical subtleties may confound things.
Unfortunately, thanks to Al Gore’s invention, we are awash in data, making it wicked easy to crank out studies and proofs. Post-Moneyball, just about every big-league ballclub has hired some kid with an Ivy League diploma to “crunch the numbers.” Or actually call the shots.
Not that there’s anything wrong with that—until and unless we get sloppy, credulous, and excessively eager to do new stuff without making sure the old stuff is, you know, actually true. Like the old saying goes, it isn’t what you don’t know that does you harm, it’s what you know that’s wrong.
In academic research, there are (at least) two devices that help protect us against knowing wrong stuff. Before we publish something, we have to submit to blind peer review, which means a few jealous, picky, anonymous rivals get to dissect our work. Usually for excruciating months. And if it’s ultimately published, other rivals are invited to try to replicate our conclusions using our data, or new data from other samples.
This is not a guarantor of truth. Sometimes malodorous stuff sneaks through these filters, which reduce but don’t eliminate the chance of error. But there’s value to you, the reader, in knowing whether “research” has passed through such a vetting process. Caveat emptor. When you’re consuming statanalysis, ask yourself whether the author is an expert or pseudo-expert—and even then whether other experts have had a crack at debunking the work. (E.g., it’s notable that the book which inspires this blog is from a renowned university press, and that much of the research on which it’s based was initially published in refereed journals.)
And if you’re a researcher yourself, I’d encourage you to spend some effort on replication of others’ work. Ask for their numbers and crunch ‘em yourself, or get fresh ones like Albert did. (Aside: Has anyone studied when basketball players peak? Paul Pierce turns 30 before next season, and I’m worried that he’ll be completely cooked before the Celtics get good again.)
In any case, whether we’re readers or researchers our inspiration should be the great philosopher Porgy, who once famously confessed that “I takes dat gospel whenever it’s pos’ble, but wid a grain o’ salt.”
–SW
john harrington
April 21, 2007
Excellent job with the Iverson analysis. I posted your original assessment of the trade in my fantasy league message board and they thought you were crazy. I loved posting this message, along with links to some of my favorite blog postings from your site.
WoW has completely reshaped the way I watch and enjoy basketball. It helps me better understand why the Lakers lose when my favorite player, Kobe, scores 50 points on 33% shooting with 7 turnovers. It is so much better to understand how players contribute instead of listening to the conventional wisdom that says, “Kobe had 50, where was the rest of the team?”
Anyway, I would love it if you guys would do some statistical analysis of Barry Bonds setting the single season HR record. How many standard deviations outside the mean was hitting 73? And how likely is it that A-Rod will make it to 74, considering his quick start this year?
Is A-Rod’s quick start simply something we should expect to see every once in a while with a 50-HR hitter?
(I presume it is very unlikely he will hit more than 60 HRs this year, as he has hit more than 52 only once: 57.)
john harrington
April 21, 2007
btw- I meant to post that in the previous article’s comment section.
Professor Walters, I enjoyed your post as well!
Jeremy
April 21, 2007
That’s actually a great post. It’s funny to think that at the end of the day a statistical proof still boils down to simple matters of trust–the capability and expertise of the person producing the work, the process they used, and the people who validate it (be it peers or a public audience).
The reason I came to trust the Wages of Wins relatively quickly had a lot to do with the apparent credibility of the authors and the fact of the work’s academic rigor. Those were unique qualities for subject matter that often competes with sports journalist statheads without obvious expertise or background.
At the same time I know some of the statistical arguments and language Wages of Wins used is a real turn off to a lot of sports fans. They distrust far out number-powered ivory tower theories and prefer their own eyes and own stories. And more power to them. In a lot of cases the wildly inaccurate stories are more fun.
Steve Walters
April 22, 2007
Thanks for the kind words, Jeremy and John.
On the topic of homerun records, John, you might be interested in a piece I did on “homer inflation” for TSN a while back. It’s available here:
http://findarticles.com/p/articles/mi_m1208/is_21_228/ai_n6126723
What A-Rod is doing now is mind-boggling. And given that the Red Sox are nevertheless in 1st, even I’m enjoying it…
The Franchise
April 22, 2007
A-Rod doing well is something all baseball fans should be pleased with, since all too many idiot Yankee fans have criticized him. It seems that carrying his team through the season leaves him worn down by the end of the year, when he is criticized for poor performance. That team would struggle to get there without him.
Pizza Cutter
April 22, 2007
As a researcher myself (in psychology), I’ve spent those many months waiting to hear back from journal editors. Maybe we need a Journal of Sabermetrics as a formal record of things… Imagine someone trying to cite that in a dissertation.
Guy
April 22, 2007
It looks to me like Albert and James are employing different definitions of peak value. Albert is using a rate stat in his study (linear wts per PA), while James was using a measure of performance effectively weighted by playing time (“VAM”). It’s certainly plausible that players log more playing time at ages 26-27 than at 28-32, on average, in which case they may be producing more total value despite a slightly lower performance rate per PA. Either approach is a valid way to define peak performance, of course, but an analysis using one definition cannot refute an analysis based on the other.
In addition, Albert limits his study to players with at least 5000 PA, a long career in baseball terms. It’s possible — indeed, likely — that a sample of players with longer than average careers will have peaked at a later than average age.
So I don’t think the data you’ve cited supports the conclusion that James was “wrong.”
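To make the rate-versus-total distinction concrete, here is a small sketch with made-up numbers. The seasons below are hypothetical, and "total value" is taken simply as rate times PA, which is a stand-in for a playing-time-weighted measure and not James's actual VAM formula.

```python
# Hypothetical seasons for one player: (age, plate appearances, value per PA).
# These numbers are invented for illustration; they are not real data.
seasons = [
    (25, 600, 0.020),
    (26, 650, 0.024),
    (27, 680, 0.026),
    (28, 600, 0.027),
    (29, 520, 0.028),
    (30, 500, 0.026),
]


def peak_age_by_rate(seasons):
    """Age of the season with the highest value per PA (a rate definition)."""
    return max(seasons, key=lambda s: s[2])[0]


def peak_age_by_total(seasons):
    """Age of the season with the highest rate * PA (a playing-time-weighted definition)."""
    return max(seasons, key=lambda s: s[1] * s[2])[0]


print("peak by rate: ", peak_age_by_rate(seasons))
print("peak by total:", peak_age_by_total(seasons))
```

With these invented seasons the rate peaks at 29 while the total peaks at 27, because playing time falls off before the rate does. The same career yields two different "peak ages" depending on the definition.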
tangotiger
April 23, 2007
“In academic research, there are (at least) two devices that help protect us against knowing wrong stuff. …Caveat emptor. ”
Your quote above, from start to finish (I cut most of it so as not to clutter things up here), may sound right, but you can’t ascertain the degree to which things are caught by the filter.
Speaking as someone who works “in sabermetrics”, I can tell you that there are enough academics who publish sports-related analysis in journals and books that I wouldn’t trust those apparent peer reviews any more or less than what I read elsewhere.
Not to pick on Albert, but I remember reading that particular piece, and I emailed Albert my notes on things that were just plain wrong, and I did not hear back from him at all.
There was another piece in a recent sports journal, which again had something plainly wrong. I was then invited to peer review other articles from that journal. So, that’s a good thing. (And when I did my first review, there were some major problems with an article. Whether some other peer reviewer would have caught it, who knows.)
These are just two instances in a long line of instances.
Perhaps peer review works in other fields, where the experts are actually part of that circle. But, when it comes to sports-related analysis, there are enough missing experts from academics that I wouldn’t consider being published in a journal with peer review in the same light as other fields.
As for the specific discussion on age, it’s pretty clear that the longer your career, the more chance you have to put up big numbers later (whether by true talent, or sample size). I’m pretty sure that the guy who comes into the league at age 21, and is out of the league at age 25 did not peak anywhere between age 27 and 31.
Steve Walters
April 23, 2007
Good points all, Guy, especially your focus on the differing methods used. The key thing that struck me in comparing them was the inter-temporal variation Albert found. It was, probably, just something that never occurred to James to test for, and the fact that they got similar peak ages for the same decade they both looked at leads me to suspect that James’s conclusion was sample-sensitive. But you’re right that saying he’s flat-out wrong may be excessive.
Tango also correctly points out that peer-review when the peers lack topic-specific training can be a damned poor filter. Agreed–and I tried to make clear that profs are very far from perfect. Tom, BTW, is an excellent and prolific sabermetrician with a fine book, available here:
http://www.insidethebook.com/
A lot of what he and other dedicated sabermetricians do is, in fact, subjected to an informal review process in that it gets amply chewed over in cyberspace and in print.
What’s problematic, to me at least, is that the review process can be uneven, both in academic circles and outside it. Ergo, the “caveat emptor” warning still applies, in my view: consider the source of what you’re reading, and take your time in judging almost anything to be a “settled issue.”
tangotiger
April 23, 2007
Thanks for the kind words.
I’d be happy if some of the academics would actually walk over to the non-academic forums. In terms of exchanging ideas with academics, the non-academics can teach as much as, if not more than, they can learn.
However, the authors of WoW actually wrote this in a BEpress journal:
“In the most recent issue of this Journal, Ronald Beech reviewed our book… Had this review simply appeared on his website… we would have been inclined to either ignore his comments or respond on our own website… By placing his comments – most of which are inaccurate and misrepresent our analysis – within an academic journal, though, we feel compelled to respond in the same forum.”
And that, in a nutshell, is how some non-academics see some academics. That they actually said they’d possibly ignore what thousands would deem thoughtful commentary from one of the leaders in the basketball analysis group. I’ve been told as much on another occasion.
In short, Mr. Thornton Melon has a lot to teach Dr. Phillip Barbay.
tangotiger
April 23, 2007
Here’s another thought to consider. As we know, speed peaks earlier than other attributes. If you didn’t, this might help:
http://www.tangotiger.net/agepatterns.txt
Power peaks a bit later, and has a higher-slope up toward that peak.
If you have a league filled with Vince Colemans and Willie Wilsons, it’s possible that this league’s hitting will peak earlier. And if you have a league filled with Dave Kingman and Cecil Fielder, it’s possible that this league’s hitting will peak later.
If each type of player has his own aging path, and the league’s mix of player types doesn’t stay proportionate over time, then it’s possible to get a later or earlier peak simply by altering the profile of a league.
I don’t know if this has any impact. My point is simply that there are many considerations, some of which I go through on my blog that links to this blog:
http://www.insidethebook.com/ee/index.php/site/comments/peak_offensive_age/#3
And baseball is not only about hitting. A player is hitting+fielding+baserunning.
Guy
April 23, 2007
Having had a chance to look at the Albert paper a bit more closely, two other observations about its limitations:
1) Albert doesn’t adjust for changing offensive levels in the game. This means that hitters whose post-30 years come in a high-scoring era will appear to age better, and have a higher peak, and the reverse effect will appear if you turned 30 in, say, 1961. This probably accounts for his finding that players born in the 1960s have a very late peak — they got to play their declining years during the post-1993 offensive explosion. Many players born in the 1930s, in contrast, played their twilight years during the pitching-dominant 1960s, yielding an early peak estimate.
2) Just as importantly, Albert himself notes that the longer a player’s career, the flatter the curve, i.e. the longer you play, the less steep is the decline from peak. But he misses the profound consequences for his study: by looking only at players with substantial longevity (5000 PAs), he is looking only at players with flatter than average curves. And his sample must be heavily weighted toward very-high longevity players. Someone should correct me if I’m misinterpreting the consequences for his quadratic curves, but I think fitting his aging curves to these unusually long and flat post-peak performances must result in estimating peak age later than would be true had he looked at a wider sample of players.
This strikes me as a problem with the quadratic model approach. Presumably, what players do when they are 33 or 37 or 40 can’t tell us if they were at their peak at 27 or 29. Yet in this methodology, the late-years data will impact the shape of the curve, and thus the estimate of peak. Seems like a problem.
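Here is a rough sketch of that concern, assuming a plain least-squares quadratic fit, which may differ from the model Albert actually used. The career below is invented: it tops out at age 27 and then declines slowly. The fit is run twice, once on ages 23-32 and once with the long, flat tail through age 38 included.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    # Sums needed for the 3x3 normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    r = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], r[2]],
        [s[3], s[2], s[1], r[1]],
        [s[2], s[1], s[0], r[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def fitted_peak_age(ages, values, center=30.0):
    """Vertex of the fitted parabola, i.e. the model's estimate of peak age."""
    # Ages are centered before fitting to keep the arithmetic well behaved.
    a, b, _ = fit_quadratic([age - center for age in ages], values)
    return center - b / (2 * a)


# Hypothetical performance index by age; the true best season is at age 27.
ages = list(range(23, 39))
values = [90, 96, 100, 103, 104, 103, 102, 101,
          100, 99, 98, 97, 96, 95, 94, 93]

short_peak = fitted_peak_age(ages[:10], values[:10])  # ages 23-32 only
long_peak = fitted_peak_age(ages, values)             # ages 23-38

print(f"fitted peak, ages 23-32: {short_peak:.1f}")
print(f"fitted peak, ages 23-38: {long_peak:.1f}")
```

For this one invented career the fitted peak moves from roughly 28.2 to roughly 29.7 once the late seasons are included, even though nothing about ages 23-32 changed. That is only an illustration of the mechanism, not evidence about what happens in Albert's actual sample.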
* *
Steve: thanks for your thoughtful reply. However, I have to object to your observation that “The key thing that struck me in comparing them was the inter-temporal variation Albert found. It was, probably, just something that never occurred to James to test for.” You seem to have missed an important point: players born in the 1950s and 1960s, and even some born in the 1940s, had not finished playing when James wrote in 1982 — indeed, many hadn’t even begun their careers. The players James selected were the most recent generation available to him for study.
tangotiger
April 23, 2007
Guy, your comments I seem to remember were some of what I wrote to Albert.
A few points: when he does his individual comps, he makes no adjustment for era. So, the Rose/Raines comps aren’t good, nor are the others.
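A quick sketch of what an era adjustment can do, using invented numbers. The simplest version, assumed here, divides the player's rate by the league rate for the same season; real adjustments (park effects and so on) are more involved.

```python
# Hypothetical seasons: (age, player runs per PA, league runs per PA).
# Invented numbers: the league's offense rises as this player ages.
seasons = [
    (29, 0.130, 0.110),
    (30, 0.132, 0.112),
    (33, 0.138, 0.125),
    (34, 0.140, 0.130),
]


def era_adjusted(player_rate, league_rate):
    """Player rate relative to league average (1.0 means league average)."""
    return player_rate / league_rate


# Peak age using the raw rate versus the league-relative rate.
raw_peak = max(seasons, key=lambda s: s[1])[0]
adjusted_peak = max(seasons, key=lambda s: era_adjusted(s[1], s[2]))[0]

print("raw peak age:     ", raw_peak)
print("adjusted peak age:", adjusted_peak)
```

In this made-up case the raw rate keeps climbing through age 34, but relative to the league the player was at his best at 29.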
Look at a guy like Ozzie Smith:
http://www.baseball-reference.com/s/smithoz01.shtml
Up until 1990, the Albert model would have shown him to peak say around age 30 or so. However, after the 1992 season, where his 91 and 92 seasons were not only above his previous peak, but way above what the extended trajectory line would have him, the new trajectory line must be best-fitted. The result is a likely higher peak, and probably at a little later age for peak.
Now, who knows if Ozzie actually got better at ages 36-37, only to bounce back to where he should be afterward, or if it was simply a sampling issue (uncertainty level, even with 1200 PA). But, the point is that the longer you are allowed to pile up PA, the more likely it is that you actually performed rather well (whether by talent or luck) and the more likely that you get to shift the aging trajectory to an older age.
What we are missing are the Gary Carters, the guys who were a shell of their former self, yet managed to continue to play. Most players wouldn’t have this luxury to keep playing. In order to best-fit Carter, his peak would have to be shown as coming earlier, so that the downward trajectory can be best-fitted with his crappy years as well.
With Carlton Fisk, you have the opposite.
Now, if all such players are in your pool, then great. But, Jim Rice was gone after one crappy year and no longer allows us to best-fit his performance at ages 37-42, where he’d be really low.
Matt Festa
April 23, 2007
Prof. Walters,
Hope all is well. Quick Question: Is there any research that suggests the top flight stars are aging more slowly or have a second peak later in their career (Bonds) relative to earlier stars (Mantle, T. Williams)?
TDDG
April 23, 2007
Prof. Walters: I fear I may have once been one of those brown nosers who always answered the questions first…
I think the sample set in such analyses will always be a problem. Players who remain in the league beyond age 32 or so tend to be above average players. It is also true that players with above average skills tend to be called up to the Majors at a younger age than other players. So your data sets of 32-year-olds and 23-year-olds are both biased. Sure you can try to compare peak years player by player, but can you account for players who are called up at age 23 and get hurt? Mark Prior has likely peaked as a player. Does that say anything about what his physical peak might have been had he stayed healthy?
You can try to deal with this with regression, but I think you have an extrapolation problem. You can’t know if players with below average skills peak earlier or later than other players. Before you scoff and say “Who cares?” teams still need to fill out their utility positions. Or know when to cut bait on minor leaguers who are underperforming expectations.
Steve Walters
April 23, 2007
Awright… some favorite old students are taking time out from getting rich playing the markets to think about sports econ. Good to hear from you guys…
On the “2nd peak” issue, Matt, I did write something about this for TSN a while back:
http://findarticles.com/p/articles/mi_m1208/is_39_229/ai_n15634177
And TG gets at some bias issues that do, indeed, make this question hard to answer definitively. Albert’s mathematical approach dealt with some of it, but his sample was good players, and there is surely a question as to whether bad players peak sooner than good ones (or, OTOH, whether they peak at the same time, on average, but just make the majors later and exit sooner).
Guy and Tango also raise add’l interesting questions. One thing that’s related to the point about different abilities peaking at different times is that savvy teams might exploit the possibility that some skills decay at slower rates than others. E.g., the A’s were willing to acquire David Justice a few years ago despite the fork sticking out of his back because they felt (supposedly, anyway) that OBP skills persisted after others had mostly gone.
TDDG
April 24, 2007
Bill James used to talk about “old man skills” vs. “young man skills.” If I remember correctly, his theory was that big hulking power hitters aged poorly compared with more athletic players. I can’t remember off the top of my head if he ever did a scientific study on the subject, but it’s logical that some skills would persist better than others. OBP makes sense. The A’s signed Piazza this year and I suspect there is a similar theory…
Bill Compton
June 4, 2007
Hi Jim. Photos i received. Thanks