Due to COVID-19, the 2020 season has been cancelled. This sucks because, quite frankly, I don’t know what to do with my hands or summer evenings. Normally I would be working on my DCI forecast model right now, except there’s nothing to forecast and there will be no scores to collect. But that got me thinking: what if I used my model to simulate a DCI season, rather than just forecast one?
The DCI model I’ve been running with for the last few years is optimized specifically to predict what happens during Finals Week. In fact, it was written to predict only 5 shows: Open Class Prelims, Open Class Finals, World Class Prelims, DCI Semifinals, and DCI Finals. But there’s no reason, in theory, that it can’t be used to predict other shows1. Indeed, I left the option of forecasting any show open when I originally wrote the code2.
In theory, this means we can put the pieces together to simulate each would-be show on each day of the 2020 season, following a process that looks something like this:
The process is simple, but each step requires some work - be it changes to my model or some data wrangling.
If we’re going to simulate each day of this DCI season, we need to have the full 2020 schedule. DCI’s schedule page redirects to their announcement that the season has been canceled, but I was able to go through each corps’ DCI schedule using pages like this one and piece together the whole schedule. You can find the resulting schedule on the simulation branch of my DCI model, as a csv file3.
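To simulate day by day, the schedule needs to be organized so that we can look up every show (and every competing corps) on a given date. Here’s a minimal sketch of that idea; the column names (`date`, `show`, `corps`) are my assumptions for illustration, not the actual schema of the csv in the repo, and the rows are stand-ins rather than the real 2020 schedule.

```python
import csv
from collections import defaultdict
from io import StringIO

# Stand-in for the real schedule csv on the simulation branch.
# These rows are made up for illustration.
SCHEDULE_CSV = """date,show,corps
2020-06-25,Show A,Blue Devils
2020-06-25,Show A,Bluecoats
2020-06-27,Show B,Cavaliers
"""

def shows_by_date(csv_text):
    """Map each date to {show_name: [corps, ...]} for easy day-by-day lookup."""
    by_date = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(StringIO(csv_text)):
        by_date[row["date"]][row["show"]].append(row["corps"])
    return by_date

schedule = shows_by_date(SCHEDULE_CSV)
print(schedule["2020-06-25"]["Show A"])  # ['Blue Devils', 'Bluecoats']
```

With the schedule in this shape, the simulation loop just iterates over dates and simulates every show it finds.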
The code I use to fit skill curves doesn’t need any changing - the trick with this step is coming up with the pre-existing data. Things get interesting here because we don’t have any data from 2020. But this also presents an interesting opportunity, because we are forced to use past data instead. For example: who might have won if Cavaliers 2006 had competed against Blue Devils 2009?
Because the DCI model assumes data is relatively noisy, we don’t necessarily have to worry about the trouble with comparing scores between seasons - the fact that a 96.25 in one season would actually be a 95.75 in another tends to wash out over the course of a full simulation4.
Therefore, the pre-existing data that we can use is any caption-specific data we have for any show over the course of a full season. Using a full season’s worth of data to fit the original curves gives us the advantage of sample size. Because the sample is large, the curve-fitting will be robust to outlier scores or early season bias5. Any two shows can compete against each other as long as we have enough caption-specific data to fit the skill curves. This robustness also means we can get away with some sparseness in the data - if we don’t have every score from a season, that’s okay. We just need enough to fit a realistic skill curve.
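As a rough illustration of what “fitting a skill curve” means here: given a handful of caption scores scattered across a season, fit a smooth score-vs-day curve and project from it. The real model is more sophisticated than this; the quadratic fit, the made-up scores, and the function names below are all my own stand-ins.

```python
import numpy as np

def fit_skill_curve(days, scores, weights=None):
    """Fit a simple quadratic 'skill curve' to one corps' caption scores.
    days: day-of-season numbers; scores: caption scores on those days."""
    return np.polyfit(days, scores, deg=2, w=weights)

def predict(curve, day):
    """Evaluate the fitted curve on a given day of the season."""
    return float(np.polyval(curve, day))

# Sparse, noisy data is fine, as long as there's enough to pin the curve down.
days = np.array([5, 12, 20, 33, 41, 50])
scores = np.array([17.1, 17.6, 18.0, 18.6, 18.9, 19.2])  # e.g. one GE caption
curve = fit_skill_curve(days, scores)
print(round(predict(curve, 55), 2))  # projected late-season caption score
```

Because the fit uses every score at once, a single weird number has limited pull - which is the robustness-to-outliers point above.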
This part is relatively straightforward. Using past data to fit the original skill curves, we just have to simulate each show on a given day. But there are some code changes to make.
First, the forecasting model simulates each show 10,000 times, to get the odds of each corps winning, coming in second place, and so on. However, in this simulation, we only want to get results from 1 of these 10,000 runs. Rather than aggregating and averaging the scores, the new code picks one simulation and says that’s the one that happened. If there are two shows on the same day, the model will pick a different simulation for each one to make sure they’re independent like they would be in real life.
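The change amounts to something like this sketch: run the show many times, then keep exactly one draw instead of averaging. The toy `simulate_show` below (normal noise around a made-up mean) is a stand-in for the real model, not how it actually scores shows.

```python
import random

def simulate_show(corps_means, n_runs=10_000, rng=random):
    """Simulate a show n_runs times; each run is a {corps: score} dict."""
    return [
        {corps: rng.gauss(mean, 0.5) for corps, mean in corps_means.items()}
        for _ in range(n_runs)
    ]

def one_realized_result(corps_means, rng):
    """Forecasting would aggregate all runs into odds; the season
    simulation instead keeps exactly one run, drawn at random."""
    runs = simulate_show(corps_means, rng=rng)
    return rng.choice(runs)

rng = random.Random(2020)
# Two shows on the same day get independent draws, like real life.
show_a = one_realized_result({"Blue Devils": 90.0, "Bluecoats": 89.8}, rng)
show_b = one_realized_result({"Cavaliers": 88.5, "Boston Crusaders": 88.4}, rng)
```

Keeping a single run is what lets upsets happen: the draw that becomes “reality” might be the one where the underdog had a great night.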
We also have to change the model output. Because we need to use the caption scores to fit skill curves later on, the model needs to output all the scores by caption. The original forecasting model only produced total scores. This is an easy change that I thought about implementing in the 2019 forecast, and it may be one that I use for forecasts in future years.
Whenever a show takes place in our simulation, we will replace the real-world historical score with the fictional simulated one. Doing this makes sure that we largely preserve each corps’ general pace of improvement. By the time we get to the end of the 2020 season, though, the results are entirely simulated. This should strike a reasonable balance between being true to history while having enough uncertainty to make things interesting.
The noisy simulations make it possible for weird things to happen. For example, I did 500 simulations of the 2019 season, and The Cavaliers ended up winning five of them, despite getting 5th place last year. This happens because replacing the real-world scores with simulated ones creates a positive feedback loop. If a corps gets a simulated score that is abnormally high, the model will apply that as though they are getting better really fast, rather than treating it like an outlier. This doesn’t mean the simulations are inaccurate (Blue Devils won more simulations than anyone else while the Bluecoats were a close second), but simply that the random variation is larger than we would see in real life.
In the case of simulating a season, I think this is a feature6, because it means that things are more uncertain. When we’re trying to forecast a real-world event like the 2019 DCI Finals, we want the correct amount of noise so the model reflects the real world. But in a simulation, I think we want the opposite because more variation means more chaos! The corps that has the highest scores in the historical data is not necessarily likely to win.
More uncertainty = more fun!
Now we can put all the pieces together. Our simulated season will work like this:
We start with a bunch of historical data for each corps, from past seasons. This can be any season and any corps, as long as there is enough caption-specific data from that season’s publicly available recaps to fit a skill curve.
Now we start to walk through each day of the 2020 season, simulating each show. Each day, any historical data up to that point is removed and replaced with the simulated data. Rather than predicting the total score, I’ve changed the model to output results by caption.
Once the shows have been simulated and the scores added, the model will re-fit the skill curves based on the new data. To make sure the simulated data impacts the curve fitting, we will weight the simulated scores twice as much as the historical ones.
Once this is done, we move on to the next day and repeat the process.
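The whole loop, stripped down to its skeleton, looks something like the sketch below. Everything in it is a simplified stand-in: the scores are invented, the “curve fit” is just a weighted average with a fixed pace of improvement, and the real model is far richer. What it does show is the structure: each day, the historical score for today’s show is dropped, the curves are re-fit with simulated scores weighted 2x, and one realized run replaces the real-world result.

```python
import random

rng = random.Random(42)

# Toy stand-in: a past season's caption scores mapped onto 2020 dates.
historical = {
    "Blue Devils": {d: 85.0 + 0.1 * d for d in (1, 3, 5, 7, 9)},
    "Bluecoats":   {d: 84.8 + 0.1 * d for d in (1, 3, 5, 7, 9)},
}
simulated = {corps: {} for corps in historical}

def fitted_level(corps, day):
    """Weighted-average stand-in for re-fitting a skill curve on all
    scores before `day`; simulated scores get twice the weight."""
    rows = [(d, s, 1.0) for d, s in historical[corps].items() if d < day]
    rows += [(d, s, 2.0) for d, s in simulated[corps].items() if d < day]
    total_w = sum(w for _, _, w in rows)
    # crude projection: assume a fixed 0.1-point/day pace of improvement
    return sum(w * (s + 0.1 * (day - d)) for d, s, w in rows) / total_w

schedule = {7: ["Blue Devils", "Bluecoats"], 9: ["Blue Devils", "Bluecoats"]}

for day in sorted(schedule):
    for corps in schedule[day]:
        # the real-world score for today's show is replaced by a
        # single simulated run drawn around the fitted curve
        historical[corps].pop(day, None)
        simulated[corps][day] = rng.gauss(fitted_level(corps, day), 0.4)
```

Note how the feedback loop described earlier falls out of this structure: a lucky draw on day 7 is weighted double when fitting the curve for day 9.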
In all, the concept is pretty simple. You can see the code on Github if you’d like, and stay tuned to see how you can be involved! Specifically, we need to choose which shows will compete against each other, and for this I plan on turning to the community!
I hope you are enjoying Drum Corona International! People who do open-source projects like this often ask for donations to help cover expenses (like server costs) but luckily I’ve got that covered. Instead, if you’re enjoying any of my projects, please consider donating to the Michigan Drum Corps Scholarship Fund.
The Michigan Drum Corps Scholarship Fund is a 501(c)3 nonprofit that I cofounded to help support members of the Michigan band community who are interested in DCI. The organization offers scholarships to Michigan students marching or auditioning with any DCI drum corps. The economic implications of COVID-19 are broad, and I think scholarships programs like this are especially important for keeping drum corps accessible to everyone. Every dollar helps!
In practice, things are a little more complicated. The parameters that I specify in the model are optimized to predict the Finals Week shows, and it’s likely that I sacrificed the model’s ability to predict other shows in the process. This is basically an example of the no free lunch theorem.↩︎
This is a bit of a lie. Technically, the model needs enough data to be able to start guessing how good corps are and how fast they’ll improve, which normally takes 6-10 scores. So my model can’t forecast shows very early in the season, like the DCI Premier at Ford Field or Drum Corps At the Rose Bowl.↩︎
I’m fairly certain that it’s accurate, but I also acknowledge that I probably wouldn’t have noticed if I’m missing something here or there. If you look at the schedule and something looks amiss, please let me know!↩︎
I did some runs to verify this, and it is mostly true. The effect still exists a little bit, but it is much smaller.↩︎
I’m referring here to the famous debate as to whether East or West coast scores are inflated early in the season. For what it’s worth, the East coast scores have been inflated the last two seasons relative to the West coast.↩︎
I called this same tendency a bug in early versions of my forecast model. It would overreact to randomly high scores that a corps got - typically on a weekday. The model assumed their score at the Saturday regional would reflect the outlier, but judges normally brought the corps back down to earth.↩︎