The following is a quick guest post from WoW Journal reader Erich Doerr. Erich used the Win Score metric to investigate the top prospects for the 2007 NBA Draft. He also provided similar analysis in a first glance at the 2008 draft. In this post, Erich provides an analysis of who should be the favorites in the Big Dance.
Now that the seeds have been announced, what are the odds on a national championship? One way to assess the probabilities is to simulate the tournament using basketball statistics. By running 10,000 simulations (a Monte Carlo method), we can generate a plausible list of championship odds for each tournament entrant.
The two strongest public NCAA metrics are the Sagarin Ratings and Ken Pomeroy’s Pythagorean Ratings. The statistics used by the Wages of Wins parallel Pomeroy’s approach, as both incorporate offensive and defensive efficiency. Individual game outcomes can be modeled via methods like log5 analysis and Sean Forman’s Sagarin approach.
With this approach, we should expect favorites to prevail most of the time, but given enough trials, we’ll occasionally see a low-seeded team go all the way.
Which #1 got the easiest bracket? This analysis suggests Kansas has the easiest path by either metric. The Sagarin ratings suggest UCLA got the toughest draw, while Pomeroy’s stats point to North Carolina.
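The log5 method mentioned above can be sketched in a few lines of Python. The function name and the example ratings are hypothetical and chosen only for illustration; the inputs are each team's expected winning percentage against an average opponent, which is how Pomeroy's Pythagorean rating is usually read.

```python
def log5(p_a: float, p_b: float) -> float:
    """Chance that team A beats team B under the log5 formula.

    p_a and p_b are each team's winning percentage against an
    average opponent, expressed as a value between 0 and 1.
    """
    numerator = p_a * (1.0 - p_b)
    denominator = numerator + p_b * (1.0 - p_a)
    if denominator == 0.0:
        raise ValueError("log5 is undefined when both ratings are 0 or both are 1")
    return numerator / denominator


# Hypothetical ratings, not actual 2008 values.
favorite, underdog = 0.95, 0.60
print(f"Favorite wins {log5(favorite, underdog):.1%} of the time")
```

Two properties make a quick sanity check: evenly matched teams come out at 50%, and a team facing an exactly average opponent (rating 0.5) gets its own rating back.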
The following tables report my analysis:
Table One: The Path to the Final Four
Table Two: Projecting the National Champ
Outside of statistics, there is another approach I find very fruitful. I enjoy following the prediction markets and their National Championship expectations. One example of this is at http://www.tradesports.com. If we average the bid and ask prices, we get an implied percentage chance of winning.
As of writing this, the tradesports.com prediction market has Kansas trading at around 13, which indicates the market believes they have a 13% chance of winning the tournament.
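As a small sketch of that bid/ask averaging, assuming contracts are quoted on a 0 to 100 scale: the function name and the quotes below are hypothetical, since the post only reports that Kansas was trading near 13.

```python
def implied_probability(bid: float, ask: float) -> float:
    """Midpoint of the bid and ask on a 0-100 contract, as a probability."""
    if not 0.0 <= bid <= ask <= 100.0:
        raise ValueError("expected 0 <= bid <= ask <= 100")
    return (bid + ask) / 2.0 / 100.0


# Hypothetical quotes bracketing a last trade of about 13.
print(f"{implied_probability(12.5, 13.5):.1%}")
```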
The prediction market can also answer one other question: whose tournament chances improved given their seeding? For this question, I look at the Change column, indicating the price swing for today (Selection Sunday). The biggest winners & losers are as follows:
Team Last Trade Change
North Carolina 16.0 +1.6
UCLA 14.0 +1.5
Kansas 13.0 +1.2
Pittsburgh 1.7 +0.8
Georgetown 4.5 -0.8
Tennessee 4.9 -1.7
Duke 5.6 -2.0
Given the Sagarin, Pomeroy, and prediction market information, you have three powerful tools to approach your bracket game decisions. Enjoy, and keep your eyes on the prediction markets and this comment thread for updates throughout the tournament.
Update: With the results from a Monte Carlo analysis, we can construct the model bracket for each statistical measure. These brackets advance the team that appeared most often in each game situation. The numbers by the school name indicate the number of advancements out of 10,000 iterations. (Numbers may differ from the tables, as these statistics have been updated for Sunday’s games.)
NCAA Brackets based on KenPom Simulation
NCAA Brackets based on Sagarin Simulation
– Erich Doerr
Notes:
Sagarin stats are as of March 16
Pomeroy stats are as of March 15
For simplicity I assumed Coppin State would lose in the play-in game
Additional Links
Monte Carlo
http://en.wikipedia.org/wiki/Monte_Carlo_method
Sean Forman on Sagarin Stats and Monte Carlo Simulation
http://www.sju.edu/~sforman/research/talks/NCAA/
Sean Forman’s current day job
http://www.baseball-reference.com/
Ken Pomeroy on Log5 and his statistics
http://www.basketballprospectus.com/article.php?articleid=202
Math behind Log5
http://www.diamond-mind.com/articles/playoff2002.htm
Ken Pomeroy’s Day Job
http://www.basketballprospectus.com/
Christopher
March 17, 2008
Nice read. How did you run the MC simulations? And did you run the whole tourney at once or round by round?
Animal
March 17, 2008
Excellent column! nice job.
Erich
March 17, 2008
Thanks for the compliments. Each iteration of the Monte Carlo generated 63 random numbers and simulated the entire tournament based on those results.
The simulation was run in Excel, and I used a macro loop with “Calculate” to generate a new set of random numbers.
The Graphical Brackets show how many times each team advanced to that spot. Thus, summing the two opponents’ first-round numbers will always equal 10,000, while the second round has four possible winners splitting the 10,000 iterations. Kansas’s easy bracket is a significant reason they are strongly favored in both simulations.
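For readers who would rather not use Excel, the per-iteration procedure described above could be sketched in Python roughly as follows. This is not the macro used for the post: the log5 matchup model, the function names, and the four-team field with made-up ratings are all assumptions for illustration. A full 64-team field would consume 63 random draws per iteration, one per game.

```python
import random
from collections import Counter


def log5(p_a, p_b):
    """Chance that A beats B given each team's win % vs. an average team."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))


def simulate_tournament(bracket, ratings, rng):
    """Play one single-elimination tournament and return the champion.

    `bracket` lists teams in bracket order, so adjacent teams meet first.
    One random number is drawn per game.
    """
    alive = list(bracket)
    while len(alive) > 1:
        winners = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            a_wins = rng.random() < log5(ratings[a], ratings[b])
            winners.append(a if a_wins else b)
        alive = winners
    return alive[0]


def championship_odds(bracket, ratings, iterations=10_000, seed=0):
    """Share of simulated tournaments won by each team."""
    n = len(bracket)
    if n < 2 or n & (n - 1):
        raise ValueError("bracket size must be a power of two")
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    titles = Counter(
        simulate_tournament(bracket, ratings, rng) for _ in range(iterations)
    )
    return {team: titles[team] / iterations for team in bracket}


# Hypothetical four-team field with made-up ratings.
ratings = {"A": 0.95, "B": 0.60, "C": 0.85, "D": 0.80}
odds = championship_odds(["A", "B", "C", "D"], ratings)
for team, share in sorted(odds.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {share:.1%}")
```

Counting how often each team reaches each round, rather than only the title, would give the kind of numbers shown in the graphical brackets.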
Matt Koidin
March 17, 2008
Erich –
Very cool analysis. Some classmates from Stanford and I started a site a while back called TeamRankings.com. We just released our BracketBrains (www.bracketbrains.com) tool. It helps automate a lot of the analysis you’ve identified above. Certainly would be interested in getting your thoughts. I’d be happy to set you up with a free account so you can check out all the detail…
Enjoy the tournament and happy bracketing!
-Matt
Tommy_Grand
March 17, 2008
Thanks! Does your simulation incorporate the substantial homecourt advantage enjoyed by teams such as Texas (playing in Houston or San Antonio), Stanford, or UNC? (Sorry if this was explained in a link I missed.)
Erich
March 18, 2008
Matt,
Hmm, looks like a cool site. I’ll contact the support address for team rankings and share some notes.
Tommy_Grand,
I’m glad you enjoyed the analysis. Home team advantage was not factored in. For the Sagarin rating, a full home court advantage is worth 4 points, though I’d believe that may be mitigated by playing on a different court with a team crowd that may not even be the majority (Since the sites have 8 teams on day one and 4 on day 2.) It is a valid concern, however, and one that you may explore via Matt’s BracketBrains site.
I will try to run a simulation with a full home court advantage for the rounds and teams you listed and post the impact here.
Thanks again for your comments,
Erich
Rue Des Quatre Vents
March 18, 2008
Wow, Erich thanks for the insightful analysis. And Matt, wow, incredible webpage. Very useful.
Thanks!!
Erich
March 19, 2008
Tommy_Grand,
I gave full Home Court advantage to Texas (Rnds 3-6), UNC (Rnds 1-4), and Stanford (Rnds 1-2) and simulated the Sagarin bracket 10,000 times (since the adjustment for KenPom is not as easy). The results were as follows:
UNC’s Championship Odds increase from 16% to 22%
Texas’s Championship Odds increase from 1.4% to 6.5%
Stanford’s Championship Odds increase from 1.1% to 1.2%
Note that, for several reasons, I feel that giving full home court advantage is excessive. This is only for illustrative purposes.
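To illustrate how a points-based home court bump can feed into a win probability, here is a rough Python sketch. The 4-point edge is the figure quoted earlier in this thread. Everything else is an assumption: the normal model for game margins, the 11-point standard deviation, the function name, and the two ratings are illustrative and are not necessarily what the simulation above used.

```python
import math

HOME_EDGE = 4.0   # points; the full home court figure quoted in the thread
MARGIN_SD = 11.0  # ASSUMPTION: spread of actual margins around the predicted one


def win_probability(rating_a, rating_b, home_edge_a=0.0, margin_sd=MARGIN_SD):
    """Chance that A wins, treating the final margin as normally distributed
    around the rating difference plus any home court edge for A."""
    expected_margin = rating_a - rating_b + home_edge_a
    z = expected_margin / (margin_sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Hypothetical ratings two points apart.
neutral = win_probability(90.0, 88.0)
at_home = win_probability(90.0, 88.0, home_edge_a=HOME_EDGE)
print(f"neutral court: {neutral:.1%}, home court: {at_home:.1%}")
```

Because the bump applies in every round a team plays near home, a modest per-game gain compounds across several games.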
Tommy_Grand
March 19, 2008
Erich,
Thank you very much! I am sure you’re right about the 4-point homecourt being excessive. Yet, I do think geography is a non-trivial factor. I appreciate your running the simulation.
Erich
March 20, 2008
“You’ve got to ask yourself one question: ‘Do I feel lucky?'” -Clint Eastwood
Over 25% of the tournament is complete. For my personal bracket, I used the KenPom model bracket included in this post. As of tonight, it is 16-0.
Prior to today’s games, the KenPom statistics would have assigned the following probabilities-
Odds of going 16-0
1.9075926%
For comparison purposes: odds of Belmont going 1-0
2.5867162%
From here on out, the chances are as follows:
Going 16-0 on Friday
1.2530207%
Going 32-0 through Sunday
0.1111549%
Going 47-0?
0.0000024276% (1 in 41 million)
The cumulative chance?
Going 63-0 from the bracket posted:
0.000000046308214%
That’s roughly 1 in 2.1 Billion. With a B.
So, whether Clint Eastwood is talking about how many bullets you fired or the perfect bracket, the correct answer to ‘Do I feel lucky?’ is “No”.
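The cumulative figure above is just a product of probabilities, which can be checked with a short Python sketch. The two inputs are the percentages quoted in this comment (the first 16 games, then the remaining 47); the function names are made up for illustration, and the product treats the stages as independent.

```python
import math


def chance_all_correct(probabilities):
    """Chance that every pick hits, assuming independent outcomes."""
    return math.prod(probabilities)


def one_in(probability):
    """Express a probability as '1 in N'."""
    return 1.0 / probability


first_16_games = 1.9075926e-2  # 1.9075926%, from the comment above
remaining_47 = 2.4276e-8       # 0.0000024276%, from the comment above

perfect = chance_all_correct([first_16_games, remaining_47])
print(f"1 in {one_in(perfect):,.0f}")
```

The same function works on a list of 63 per-game probabilities, though for very long lists summing logarithms is the more numerically careful route.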
JTapp
March 21, 2008
Using the KenPom #’s, I calculate that the odds of Villanova, Siena, and San Diego all winning their games in a single tournament is about 0.6% (by multiplying the probabilities of their winning each game together). I doubt in reality the event is that rare.
Erich
March 22, 2008
JTapp,
Your approach sounds correct, though my calculation puts it at 0.9969771% (refer to math behind log5 & KenPom pythag win % as of 3/17). While not significantly bigger than your assessment, I do not understand your disbelief.
Is it because you are skeptical of log5 or Pomeroy’s win percentages?
If it is skepticism, you could always try your hand in the betting markets. Most lines are set close to what Ken Pomeroy or Sagarin ratings suggest. I believe sites like tradesports even offer free “play money” versions where you can create your own account and test for an efficient betting market & lines.
Overall, though, I’d be hesitant to scrutinize cherry-picked results, and I’d lean toward the mountain of data behind the stats when assessing an event that occurs in roughly 1 in 100 trials.
Apparently, Siena+Villanova+San Diego would have given Dirty Harry an unwelcome surprise.
JTapp
March 22, 2008
I know it’s cherry-picked, and my doubt is purely normative. Have you ever read the book Fooled by Randomness?
To calculate the odds I simply looked at the Monte Carlo results and saw what % of games each of those teams won and multiplied the percentages together to get the probability of all 3 winning in the same tourney. I have a friend who has only missed 2 games so far (UConn and Arizona).
I keep trying to explain to him that the odds of doing that are ridiculously low but he showed me that the most he’s missed in the past 4 years were 3 games in the first round. Either he’s a great handicapper (he is betting on them), or he’s really lucky.
Erich
March 22, 2008
A similar perfect bracket analysis: Another analysis
Note that the 1 in ~3 billion scenario is the “Most Likely” perfect bracket by this approach. Picking underdogs pushes the odds out further. Much further.
This link also looks at the chances if you took something like a Siena/Villanova/San Diego parlay, which pushes the perfect bracket odds into the trillions.
JTapp
March 22, 2008
And yes, I do doubt the numbers. The Pythag numbers you’re using are Pomeroy’s Adjusted numbers. I find it more reliable to use the raw efficiency numbers, especially because it’s not clear what Pomeroy’s formula is for adjusting the data (or Sagarin’s formulas for that matter). I don’t believe Kansas (or any team) would win 99.15% of its games against an “average” team (which is what the Pythag using Adjusted numbers is essentially saying). The numbers make the upset appear more unlikely than it should.
Besides, there are so many underlying things that aren’t taken into account — games played with currently injured players, games played with guys that are healthy now but were injured then. How do you truly compare a San Diego and a UConn? In-game injuries and other human factors make any forecast extremely unreliable.
Erich
March 22, 2008
I own the book, but haven’t read it yet (I enjoyed the Black Swan and so I bought Fooled by Randomness).
Re: your friend
Don’t neglect a third possibility- your friend may be lying. If he has been that successful, that is wonderful and I expect he’d bet larger and larger amounts over subsequent years, with his increasing bets having expanding power to move the betting markets towards more accurate lines.
Financial markets are generally efficient due to active traders pushing them that way. Sports betting markets do not currently have the same level of efficiency, which entices exploiters to enter, which in the long run creates greater market efficiency.
JTapp
March 22, 2008
I also like to look at TBeck’s Spread tracker that posts every “computer” predicted spread (Sagarin, Greenfield, etc.) and averages them. Any score more than 2 standard deviations beyond the mean I consider to be “rare.” But, during the season I notice that there are too many “rare” events. The distribution simply isn’t normal. Looks that way in the tournament, too. I don’t have any good explanations for these phenomena.
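One way to put a number on "too many rare events" is to compare the observed share of outcomes beyond 2 standard deviations with what a normal distribution would produce, which is about 4.6%. The Python sketch below assumes you have a list of prediction errors (actual margin minus predicted spread); the function names and the sample data are hypothetical.

```python
import math
import statistics


def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard normal variable."""
    return math.erfc(k / math.sqrt(2.0))


def share_beyond(errors, k=2.0):
    """Fraction of errors more than k standard deviations from their mean."""
    mean = statistics.fmean(errors)
    sd = statistics.pstdev(errors)
    return sum(abs(e - mean) > k * sd for e in errors) / len(errors)


# Made-up errors: twenty small misses and one blowout.
sample_errors = [-1.0, 1.0] * 10 + [10.0]
print(f"observed beyond 2 SD: {share_beyond(sample_errors):.1%}")
print(f"expected if normal:   {normal_two_sided_tail(2.0):.1%}")
```

Which standard deviation is used matters a great deal here: the spread among computer predictions for one game is not the same thing as the spread of actual results around those predictions.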
Tommy_Grand
March 23, 2008
“I have a friend who has only missed 2 games so far (UConn and Arizona). ”
How many of the sweet 16 did he (or she) predict correctly?
Erich
March 23, 2008
Updated simulation
Seed Team Sagarin KenPom
1M Kansas 36.1% 36.0%
1S Memphis 12.3% 10.5%
1W UCLA 15.9% 23.3%
3M Wisconsin 5.6% 10.7%
1E North Carolina 16.2% 3.8%
2S Texas 1.4% 2.9%
3E Louisville 1.9% 2.4%
4E Washington State 1.2% 3.9%
3S Stanford 2.0% 2.5%
2E Tennessee 1.2% 1.2%
5S Michigan State 0.9% 0.9%
3W Xavier 1.3% 1.0%
7W West Virginia 3.7% 0.8%
10M Davidson 0.3% 0.0%
12M Villanova 0.0% 0.1%
12W Western Kentucky 0.0% 0.0%
Stats are through Saturday games. 10,000 iterations
JTapp
March 24, 2008
My friend’s picks all went bust in the next round. His prior success was purely random.
I’ve been looking at TBeck’s posting and averaging of predicted point spreads (Sagarin et al., and looking at Tradesports as well). The actual point spreads of the games the past 2 days have been several standard deviations from the mean prediction. “Rare” events. Random and unpredictable.
Even the above posted results from the Monte Carlo simulation show that the odds of this tournament unfolding as it has (like all others before it) are approaching astronomical numbers.
What do we call it when an astronomically improbable event happens? Randomness.
I’ve done more detailed regression analysis on previous tournament outcomes than probably anyone in America. You can fit the curve to explain/predict previous tournaments, but it’s all pretty useless in predicting future ones. The upsets are more explainable by randomness than by any actual data.
The data helps identify which teams “should” be the strongest, but there are so many human and unobservable/unquantifiable factors involved in a basketball game that even the best analysis or picking what the Pythag data says “should” be the right outcome is also fruitless.
Enjoy the randomness of the tournament.
Erich
April 7, 2008
I certainly hope you played that KenPom bracket!
Keep your eyes here in May for an assessment on how these kids shape up for the next level in the NBA draft preview, and be sure to drop on by next year for another NCAA tourney preview.