Earlier, I posted my current model for predicting the NCAA tournament. Since the whole thing is probabilistic, I figured that I would test it out against the current NCAA standings. I considered four models:
- The one that I described
- A random selection of which team would win (50/50 chance)
- Always picking the top seeded team
- A model suggested by a colleague at work
For each model, I ran 10,000 tests and compared them to the current NCAA tournament results, counting the scores for each test. Results are:
The X axis is the score (0-64 at this point), and the Y axis is the number of test runs (out of 10k) that achieved that score. The number in the legend is the expected value (score) for each model. As you can see, my model had the [second] highest expected value. Choosing the top seeded team was ~~the worst (guaranteed 10 points)~~ the best [see the 2nd update]. Choosing randomly was ~~better than selecting the top seed~~ the worst [see update], and my colleague’s model (cyan) was between my model and the random model. Not bad. I’ll update after the next two rounds of the tournament.
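For anyone who wants to try the same kind of comparison, here is a minimal sketch of the test loop for the two simplest models (random pick and top seed). It is not my actual code: the `GAMES` list is made-up toy data rather than real tournament results, the function names are hypothetical, and the scoring (one point per correct pick) is an assumption that may not match the scoring behind the plot.

```python
import random
from collections import Counter

# Hypothetical toy data, NOT real results: (seed_a, seed_b, winning_seed).
GAMES = [(1, 16, 1), (8, 9, 9), (5, 12, 12), (4, 13, 4)]


def pick_random(seed_a, seed_b, rng):
    """50/50 coin flip between the two teams."""
    return rng.choice((seed_a, seed_b))


def pick_top_seed(seed_a, seed_b, rng):
    """Always pick the top seed, i.e. the *lowest* seed number."""
    return min(seed_a, seed_b)


def score_bracket(pick, games, rng):
    """Assumed scoring: one point for every correctly picked game."""
    return sum(1 for a, b, winner in games if pick(a, b, rng) == winner)


def simulate(pick, games, runs=10_000, seed=0):
    """Run the model `runs` times; return a histogram of score -> run count."""
    rng = random.Random(seed)
    return Counter(score_bracket(pick, games, rng) for _ in range(runs))


def expected_score(hist):
    """Mean score over all runs (the number shown in the plot legend)."""
    total = sum(hist.values())
    return sum(score * count for score, count in hist.items()) / total


def fraction_at_least(hist, threshold):
    """Share of runs that scored `threshold` or better."""
    total = sum(hist.values())
    return sum(c for s, c in hist.items() if s >= threshold) / total


random_hist = simulate(pick_random, GAMES)
top_seed_hist = simulate(pick_top_seed, GAMES)
```

Note that a deterministic model like the top seed pick gets the same score on every run, so its histogram is a single spike. `fraction_at_least` is the kind of calculation that tells you how many runs did as well as one particular bracket.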
Update: one interesting thing is that this suggests that there was still a lot of luck in my ESPN pick. Only about 0.5% of my model runs were as good as that one.
Update 2: So, I’m lying in bed when it occurs to me that I’m an idiot… the team with the *lowest* seed wins a game in Model 3. This is why I say I don’t really know basketball.