Since 2017, I have been maintaining a model that forecasts the DCI scores and results for Finals Week. Earlier in the 2019 season, I also posted a version of the model looking at historical seasons, stretching back to 1995. While the core of the model has been pretty constant through the years, there are always adjustments on the margins. Therefore, it makes sense to go over how the model works this year.
Overall, the model uses four steps:
Each corps has three skill curves, one for each caption. That's General Effect, Music, and Visual. This is based on fitting a curve of the form y = a + x^b to their data, where x is the day of the competitive season and y is the actual score. Typically, a is close to a corps' first score of the season, and b is somewhere between 0.5 and 1. That means the curves tend to start linear but level off as the season goes on. As an example, let's look at early-season Santa Clara Vanguard's skill curves:
This is a screengrab from the "Corps Summary" tab on July 3, 2019. In the plot above, the points are the scores of the shows themselves, and the lines are the skill curves fit to them. But the line doesn't pass perfectly through each point, because there's always some uncertainty in curve fitting. This is what the dashed lines represent - they are the 95% confidence interval for each caption. The wider they are, the more uncertainty there is in the skill curves. Because this is very early season, SCV's curves are pretty uncertain.
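To make the curve fitting concrete, here is a minimal Python sketch of fitting y = a + x^b (x raised to the power b) to one caption's scores. It is not the model's actual code: the function name `fit_skill_curve`, the grid search over b, and the example scores are all assumptions for illustration, and the sketch leaves out the confidence intervals entirely.

```python
def fit_skill_curve(days, scores, weights=None):
    """Fit y = a + x**b by scanning b on a grid and solving for a directly.

    Hypothetical helper: the real model's fitting routine is not shown in
    this article, so a simple grid search stands in for it here.
    """
    if weights is None:
        weights = [1.0] * len(days)
    total_weight = sum(weights)
    best = None
    # Scan b from 0.01 to 1.50 in steps of 0.01.
    for step in range(1, 151):
        b = step / 100
        powers = [x ** b for x in days]
        # For a fixed b, the weighted least-squares a is the weighted mean
        # of (score - day**b).
        a = sum(w * (y - p) for w, y, p in zip(weights, scores, powers)) / total_weight
        sse = sum(w * (y - a - p) ** 2 for w, y, p in zip(weights, scores, powers))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    return best


# Made-up caption scores that follow the curve exactly, with a = 20 and b = 0.7.
days = [1, 3, 5, 8, 10, 12, 15]
scores = [20.0 + d ** 0.7 for d in days]
a, b, sse = fit_skill_curve(days, scores)
print(f"a = {a:.2f}, b = {b:.2f}")
```

Because the example scores sit exactly on the curve, the fit recovers the a and b used to generate them. Real scores are noisy, which is where the uncertainty bands come from.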
The curve fitting algorithm weights scores differently, for two reasons. The first is that early-season scores don't tend to be as predictive of Finals Week as those later in the season. To capture this, the "base weight" for scores increases through time. All things being equal, the model weighs scores from July 29 more heavily than July 1. The second thing the model does is underweight recent shows. This is because we want a model that is somewhat skeptical when a corps gets a suddenly high or low score. Historically, unexpected results like that tend to be outliers. Before the model overreacts, it waits to see a corps sustain their success over several shows. This is especially important in the Open Class predictions because their scores tend to be more volatile. Through time, the shows' weights look somewhat like a pyramid - the model starts with a low weight for the early-season scores, which increases through time, but the most recent shows are discounted as well.
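As a rough illustration of that pyramid shape, the sketch below combines a base weight that grows through the season with a discount on the last few shows. The exact weighting formula isn't given in this article, so the linear base weight, the `recent_window` and `recent_discount` parameters, and the example days are all assumptions.

```python
def show_weights(days, recent_window=3, recent_discount=0.3):
    """Return one weight per show: rising through time, discounted at the end.

    Hypothetical formula, chosen only to reproduce the pyramid shape.
    """
    last_day = max(days)
    order = sorted(range(len(days)), key=lambda i: days[i])
    weights = [0.0] * len(days)
    for position, i in enumerate(order):
        base = days[i] / last_day  # later shows get a higher base weight
        shows_after = len(days) - 1 - position
        if shows_after < recent_window:
            # The newest show gets the full discount; it fades for older shows.
            penalty = recent_discount + (1 - recent_discount) * shows_after / recent_window
        else:
            penalty = 1.0
        weights[i] = base * penalty
    return weights


days = [1, 5, 10, 15, 20, 25, 30]
weights = show_weights(days)
for day, weight in zip(days, weights):
    print(f"day {day:2d}: weight {weight:.2f}")
```

With these made-up settings the weights climb until day 20 and then fall off for the two newest shows, so a single surprising score at the end of the list can't dominate the fit.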
There are two conditions which can cause a corps not to have skill curves. The first is just that the curve fitting didn't work - this will be pretty common early in the season when the model doesn't have much data to work with. The second is that corps can have too little data in the first place - the model excludes all corps with fewer than 7 shows. In either case, the model will proceed as though the corps doesn't exist at all. Generally speaking, this isn't a problem for World Class by mid-July, but Open Class can take a bit longer. Open Class predictions won't be live until at least 5 corps are in the model.
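Those two exclusion rules are simple enough to write down directly. In this sketch the thresholds (7 shows, 5 Open Class corps) come from the paragraph above, while the function names, the corps labels, and the show counts are placeholders.

```python
MIN_SHOWS = 7             # from the text: corps with fewer shows are excluded
MIN_OPEN_CLASS_CORPS = 5  # from the text: Open Class needs at least 5 corps


def corps_in_model(show_counts, fit_ok):
    """Keep corps that have enough shows and whose curve fit succeeded."""
    return [
        corps
        for corps, n_shows in show_counts.items()
        if n_shows >= MIN_SHOWS and fit_ok.get(corps, False)
    ]


def open_class_is_live(open_class_in_model):
    """Open Class predictions go live once enough corps are in the model."""
    return len(open_class_in_model) >= MIN_OPEN_CLASS_CORPS


# Placeholder Open Class data, not real show counts.
show_counts = {"Corps A": 9, "Corps B": 6, "Corps C": 8, "Corps D": 7}
fit_ok = {"Corps A": True, "Corps B": True, "Corps C": False, "Corps D": True}

included = corps_in_model(show_counts, fit_ok)
print(included, open_class_is_live(included))
```

Here Corps B drops out for having too few shows and Corps C drops out because its fit failed, which leaves too few corps for Open Class predictions to go live.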
The model uses the skill curves to predict how good each corps will be during Finals Week. Because it also tracks the uncertainty in the curve fitting, it can do this forecast assuming there's measurement error. The more uncertain the curve fitting, the more the model hedges its prediction by using a wide score distribution. Using the distribution of b coefficients for each corps, the model predicts Finals Week 10,000 times, assuming each time is independent. The end result is that each corps has 10,000 skill-based score predictions.
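A sketch of that simulation step, under some loud assumptions: b is drawn from a normal distribution centered on its fitted value, a is held fixed, and Finals Week falls on day 50 of the season. None of those details are stated in this article, and `simulate_skill` is a hypothetical name.

```python
import random
import statistics


def simulate_skill(a, b_hat, b_se, finals_day, n_sims=10_000, seed=0):
    """Draw b repeatedly and evaluate the skill curve on the Finals Week day.

    Assumes b ~ Normal(b_hat, b_se); the model's actual distribution for b
    is not specified here.
    """
    rng = random.Random(seed)
    return [a + finals_day ** rng.gauss(b_hat, b_se) for _ in range(n_sims)]


# Made-up curve parameters for one caption of one corps.
confident = simulate_skill(a=20.0, b_hat=0.7, b_se=0.01, finals_day=50)
uncertain = simulate_skill(a=20.0, b_hat=0.7, b_se=0.05, finals_day=50)

print(f"confident fit spread: {statistics.pstdev(confident):.2f}")
print(f"uncertain fit spread: {statistics.pstdev(uncertain):.2f}")
```

The less certain fit produces a visibly wider spread of predicted scores, which is the hedging behavior described above.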
The "random" part of the model is based on corps rank and the natural variability in DCI scores. To understand why the model needs to rank corps, it's important to understand how it defines "natural variability".
Natural variability is the variation in how judges assign scores from show to show, as the show score rarely matches the score predicted by the skill curves. This variability does not mean judges are biased or political. In fact, the model doesn't consider individual judges in the forecast at all, and there is no evidence in the historical data of judges being collectively biased against individual corps. Rather, this variability is just based on the fact that sometimes judges score a little low or a little high. They're pretty good on average.
Unlike the uncertainty in the skill-based forecast, natural variability is not independent from corps to corps. Historically, judge error tends to be consistent from corps to corps at any given show. If the judges score Bluecoats higher than expected, there's a good chance they will do the same for Santa Clara Vanguard at the same show. This correlation is pretty strong, and stronger for corps that perform back-to-back than those that perform farther apart.
Because the correlated error depends on performance order, the model needs to guess the Finals Week performing order so that it can create the correct correlation matrices. It ranks them as they are now and assumes they'll slot the same during Finals Week. Using the most recent scores gives an unfair advantage to corps who have performed more recently, so the model uses the skill curves instead. This also removes individual show variability in scores from the rankings.
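One way to picture those correlation matrices: let the correlation between two corps shrink with the number of performance slots between them, then draw errors that respect it. The sketch below uses a correlation of rho^|i - j| with rho = 0.8 and a Cholesky factorization to generate the draws. Both the functional form and the numbers are assumptions, not the model's historical estimates.

```python
import math
import random


def order_correlation(n_corps, rho=0.8):
    """Correlation between performance slots i and j decays as rho**|i - j|."""
    return [[rho ** abs(i - j) for j in range(n_corps)] for i in range(n_corps)]


def cholesky(matrix):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def correlated_errors(lower, sd, rng):
    """One draw of judge error per corps, correlated by performance order."""
    z = [rng.gauss(0, 1) for _ in lower]
    return [
        sd * sum(lower[i][k] * z[k] for k in range(i + 1))
        for i in range(len(lower))
    ]


corr = order_correlation(4)  # four corps, indexed by performance slot
lower = cholesky(corr)
rng = random.Random(1)
one_show = correlated_errors(lower, sd=1.0, rng=rng)
print([round(e, 2) for e in one_show])
```

In a single draw, neighboring corps tend to get errors of the same sign and similar size, while corps several slots apart are freer to differ.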
In 2017 and 2018, the model did these rankings on a per-caption basis. But performance order is determined by total score, so the 2019 model uses total score to rank the corps. Based on the rankings, the model once again predicts Finals Week 10,000 times, this time based on the gaps between each corps and their correlated random error. It's important to note that, while the rankings are based on total score, the forecast itself still takes place for individual captions.
Typically, the model assumes the mean error is 0 in these simulations, because the judges aren't biased against any corps. But there can be an exception. Later in the season, Open Class splits off into its own tour for a couple weeks before they join back up with World Class for Prelims. In this time, Open Class scores tend to get inflated, and then they drop back down on Prelims night. Historically, the average drop from Open Class finals to Prelims, just two days later, has been 2 points. Therefore, when the model is predicting the Indianapolis shows based on late-season data, it assumes the mean error for Open Class corps is -2, not 0, but keeps everything else the same.
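In code, that exception amounts to shifting the simulated errors for Open Class corps. This is a small sketch with hypothetical function and argument names; only the -2 and 0 values come from the paragraph above.

```python
OPEN_CLASS_PRELIMS_DROP = -2.0  # from the text: average drop going into Prelims


def mean_error(is_open_class, predicting_from_late_season):
    """Mean judge error to assume for a corps in the simulations."""
    if is_open_class and predicting_from_late_season:
        return OPEN_CLASS_PRELIMS_DROP
    return 0.0


def shift_errors(errors, is_open_class, predicting_from_late_season):
    """Apply the mean error to zero-centered simulated errors."""
    shift = mean_error(is_open_class, predicting_from_late_season)
    return [error + shift for error in errors]


# Made-up zero-centered errors for one corps.
errors = [0.5, -0.5, 1.0]
print(shift_errors(errors, is_open_class=True, predicting_from_late_season=True))
print(shift_errors(errors, is_open_class=False, predicting_from_late_season=True))
```

Only the center of the distribution moves. The spread and the correlations between corps stay the same.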
At this point, each corps has 10,000 skill-based simulations of Finals Week and 10,000 simulations based on natural variability and rank. All the model does is combine them, by taking a weighted average of the scores. The average is weighted based on the historical magnitude of overall variance in Finals Week scores (somewhere between 2 and 2.5 points) and the percent of this variance that comes from skill versus natural variability. Overall, the natural variability is weighted about twice as much as the skill-based simulations.
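Here is a minimal sketch of that averaging step. It hard-codes the "about twice as much" ratio as weights of 2/3 and 1/3, which is a simplification of the variance-based weighting described above, and the function name and scores are made up.

```python
def combine_simulations(skill_sims, variability_sims, variability_weight=2 / 3):
    """Weighted average of the two sets of simulated scores, pair by pair.

    variability_weight=2/3 approximates "natural variability counts about
    twice as much as skill"; the model's exact weights are not given here.
    """
    skill_weight = 1 - variability_weight
    return [
        skill_weight * skill + variability_weight * variability
        for skill, variability in zip(skill_sims, variability_sims)
    ]


# Three made-up simulations for one corps.
skill_sims = [90.0, 91.5, 89.0]
variability_sims = [93.0, 90.0, 92.0]
combined = combine_simulations(skill_sims, variability_sims)
print([round(score, 2) for score in combined])
```

Each combined score lands between its two inputs, pulled twice as hard toward the natural-variability simulation.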
In this forecasting and averaging process, the interpretability of the raw score predictions breaks down. The winning corps' score can vary from less than 90 to 100 points as the season progresses, but we know that's unrealistic. What the model maintains, though, is the gaps between each corps. This is why the predictions the model makes are never greater than 0 - it assigns 0 to the winner and calculates the gaps for everyone else.
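Converting scores to gaps is just a subtraction. The sketch below does it within each simulation, which is an assumption about where the model applies it, and the corps labels and scores are invented.

```python
def to_gaps(sims):
    """Subtract the winning score from every corps, simulation by simulation.

    `sims` maps a corps name to its list of simulated scores; all lists must
    be the same length. The winner of each simulation ends up at 0.
    """
    names = list(sims)
    n_sims = len(sims[names[0]])
    gaps = {name: [] for name in names}
    for i in range(n_sims):
        top_score = max(sims[name][i] for name in names)
        for name in names:
            gaps[name].append(sims[name][i] - top_score)
    return gaps


# Two made-up simulations for two placeholder corps.
sims = {"Corps A": [95.0, 90.0], "Corps B": [94.0, 91.5]}
gaps = to_gaps(sims)
print(gaps)
```

Corps A wins the first simulation and Corps B wins the second, so each has a 0 in one simulation and a negative gap in the other.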
In order to convert the 10,000 average simulations into percentages, the model just counts. For example, if Santa Clara Vanguard wins in 6,000 of the 10,000 simulations, the model gives them a 60% chance of winning. The model tracks odds of getting each medal, making finals, and making semifinals.
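The counting step can be sketched as ranking the corps within each simulation and tallying how often each one lands at or above a cutoff. The function name, corps labels, and scores are hypothetical, ties are broken arbitrarily, and a cutoff of 1 corresponds to winning. The same call with a larger cutoff would cover making finals or semifinals.

```python
def placement_odds(sims, cutoff):
    """Share of simulations in which each corps places in the top `cutoff`.

    `sims` maps a corps name to its list of simulated scores (or gaps).
    """
    names = list(sims)
    n_sims = len(sims[names[0]])
    counts = {name: 0 for name in names}
    for i in range(n_sims):
        ranked = sorted(names, key=lambda name: sims[name][i], reverse=True)
        for name in ranked[:cutoff]:
            counts[name] += 1
    return {name: counts[name] / n_sims for name in names}


# Five made-up simulations for three placeholder corps.
sims = {
    "Corps A": [95.0, 94.0, 96.0, 93.0, 95.0],
    "Corps B": [94.0, 95.0, 95.0, 94.0, 93.0],
    "Corps C": [90.0, 91.0, 90.5, 89.0, 92.0],
}
win_odds = placement_odds(sims, cutoff=1)
top_two_odds = placement_odds(sims, cutoff=2)
print(win_odds)
print(top_two_odds)
```

Corps A has the top score in three of the five simulations, so its win odds come out to 3/5, the same arithmetic as the 6,000-out-of-10,000 example.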
That's really about it. Do you have any other questions? Does it seem like something should be in this article that isn't? Do you have a problem with my methodology? Reach out! You can track me down on reddit as u/Overthink_DCI_Scores, on Github as EMurray16, and via email at firstname.lastname@example.org.