Tracking COVID-19 in Chicago: Release Notes

2020-10-19

Chicago Test Out is a project that uses the same datasets as the city's Daily Dashboard but features more detailed line charts, mobile compatibility, and much faster load times. This week, the project is undergoing several major changes, and I want to take the opportunity to explain them to users.

Problems with the original testing tab

When I originally wrote Chicago Test Out, the city was using the COVID-19 Daily Testing - By Person dataset, which was at the time just called COVID-19 Daily Testing. This dataset counts PCR tests in the city among unique individuals. A person will show up in the dataset once for every time the person tests negative until that person tests positive. That positive test will appear in the dataset once, and then all subsequent tests of that individual will be filtered from the dataset going forward. This is the dataset that runs the "Testing" tab in Chicago Test Out.
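To make that rule concrete, here is a minimal sketch of the filtering as described above. The record layout (person id, date, result) and the name `by_person_filter` are invented for illustration; this is my reading of the rule, not the city's actual pipeline.

```python
from typing import Iterable, List, Tuple

# Hypothetical record: (person_id, ISO date string, result), result is "pos" or "neg".
Test = Tuple[str, str, str]


def by_person_filter(tests: Iterable[Test]) -> List[Test]:
    """Keep each negative test until a person's first positive,
    keep that first positive, and drop that person's later tests."""
    already_positive = set()
    kept = []
    # Process in date order so "first positive" is well defined.
    for person, date, result in sorted(tests, key=lambda t: t[1]):
        if person in already_positive:
            continue  # this test would not appear in a By Person count
        kept.append((person, date, result))
        if result == "pos":
            already_positive.add(person)
    return kept


# Made-up tests, not real data.
tests = [
    ("a", "2020-04-01", "neg"),
    ("a", "2020-04-10", "pos"),
    ("a", "2020-04-20", "neg"),  # dropped: person "a" already tested positive
    ("b", "2020-04-05", "neg"),
]
print(len(by_person_filter(tests)))  # 3 of the 4 tests are kept
```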

The city added the By Test dataset

However, later, the city added a second, similarly-structured dataset called COVID-19 Daily Testing - By Test and added the "By Person" phrase to the original dataset. The newer By Test dataset includes almost every PCR test, even if the person had previously tested positive. According to the city, this conforms to the way that other jurisdictions are measuring case volumes and the test positivity rate, so the 7-day rolling average of the By Test dataset is the official metric used by the city's daily dashboard.
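For readers who want to see what a 7-day rolling positivity rate looks like in practice, here is a rough sketch. It assumes the rate is the sum of positive tests over the window divided by the sum of all tests over the same window; that is a common way to compute it, but I can't confirm it is exactly the city's formula. The function name and the numbers are made up.

```python
from typing import List, Optional, Sequence


def rolling_positivity(
    positive: Sequence[int], total: Sequence[int], window: int = 7
) -> List[Optional[float]]:
    """Percent positive over a trailing window: sum(positives) / sum(tests)."""
    out: List[Optional[float]] = []
    for i in range(len(total)):
        if i + 1 < window:
            out.append(None)  # not enough days yet to fill the window
            continue
        pos = sum(positive[i + 1 - window : i + 1])
        tot = sum(total[i + 1 - window : i + 1])
        out.append(100.0 * pos / tot if tot else None)
    return out


# Made-up daily counts, not real city data.
positive = [50, 60, 55, 70, 65, 80, 90, 100]
total = [1000] * 8
print(rolling_positivity(positive, total))
```

The first six entries are `None` because a full seven days of data aren't available yet.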

(Important side note: I have precisely zero qualifications in epidemiology and medicine. I have summarized my understanding here because I believe that is useful for users of the site, but I strongly urge the reader to follow the above links to the official sources and let the city speak for itself.)

The raw data chart becomes less useful

The other main problem is that the line chart I initially designed in the spring has outlived its usefulness. My original chart plotted the raw (By Person) positive and total PCR tests. This allowed the viewer to pretty easily grok both the rough change in percent positive over time (by watching the space between those lines grow or shrink) and the absolute level of cases.

The problem is that, fortunately, we're now doing enough tests that the total test count dwarfs the positive test count. The two lines are now, quite consistently, very far apart, making the change in the ratio between them difficult to discern. It also makes the absolute change in cases difficult to read. For example, if new cases double from 200/day to 400/day, the chart will still mostly just show a very flat line pretty close to zero because the y-axis is so distorted by the high test volume.
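A quick bit of arithmetic shows how little room the cases line gets. The daily test volume of 10,000 below is a hypothetical round number picked for illustration, not a figure from the city.

```python
def fraction_of_axis(value: float, axis_max: float) -> float:
    """How far up the y-axis a value sits, as a fraction of chart height."""
    return value / axis_max


# Hypothetical daily test volume that sets the top of the y-axis.
axis_max = 10_000

for cases in (200, 400):
    print(cases, f"{fraction_of_axis(cases, axis_max):.0%}")
```

Under that assumption, a doubling of cases moves the line from 2% to 4% of the chart's height, which is easy to miss.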

This is an excellent problem for Chicago to have! But it does require some revision in the way I display these charts if they're supposed to be useful at a high test volume.

The new "Testing v2" tab

Because of these issues, I've made a few judgement calls and have decided to completely rewrite the testing tab with new datasets and newly organized charts.

Switching to the By Test dataset

The decision to switch to the By Test dataset was an easy one. It's the dataset that drives the official decisions by the city and, according to the city, it's the dataset that's most comparable to other testing datasets around the country. To me, it appears that the old By Person dataset is mainly maintained for backwards compatibility purposes (for which I have been grateful so far).

Because both datasets report positive tests, total tests, and a positive percentage, they are easy to confuse, so I'm going to leave the old By Person tab up for a time so that readers can see the difference.

Dividing up the charts

Under this By Test dataset though, I would still face the same charting challenges if I tried to move the old line chart as-is, so I'm taking the opportunity to redo the charting as well.

The new By Test tab will still try to show both the positive percent and the volume of new cases, but will do so on different charts. The first chart will be a line chart of the percent itself, over time, and the second will show just cases.

However, I'm taking the opportunity to make the cases chart just a little bit more useful as well. Instead of showing the raw number of cases, I'm going to move to the case rate per 100,000 Chicagoans. This is a more useful metric because a per capita case rate can be compared across populations of different sizes. For example, Chicago's travel quarantine order applies to states with a case rate over 15/100k, and the COVID Tracking Project from The Atlantic can show data per million (which is an easy conversion).
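The conversions involved are simple enough to sketch. The population constant below is a rough assumption (Chicago has about 2.7 million residents), the function names are hypothetical, and the 405 cases figure is made up so that the result lands on the 15/100k threshold.

```python
def rate_per_100k(cases: float, population: int) -> float:
    """Cases per 100,000 residents."""
    return cases * 100_000 / population


def per_100k_to_per_million(rate: float) -> float:
    """A per-million rate is ten times the per-100k rate."""
    return rate * 10


# Rough assumed population, not an official figure.
CHICAGO_POPULATION = 2_700_000

rate = rate_per_100k(405, CHICAGO_POPULATION)  # 405 is a made-up case count
print(rate, per_100k_to_per_million(rate))
```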

Unfortunately, I think I am bound here to show the 7-day rolling average case rate because I can't find a daily case rate per 100k. I don't love having one metric on a 7-day rolling average while the others are not, but I hope that the way I have labeled the data will keep it from being too confusing.

Even more changes to come

My next move will be to better explain the data with a quick sentence and a link to the city data portal dataset from which each chart draws its data. Readers can't, and most certainly shouldn't, take pandemic information as gospel from some rando on the Internet like myself. I'm also planning on linking to the source code (available on GitLab), either in the footer or on a new "About" tab.

After the data have been sufficiently contextualized by official sources, I'm going to go ahead and delete the old By Person Testing v1 tab. This will have the added benefit of a faster load time since the app will have one fewer dataset to fetch.

Then the next priority will be an important stability enhancement--namely, a proper JSON parser for the Hospitalizations tab, the code for which is a horrible, hacky mess that fails if even one single number from the city is improperly formatted. (Don't read that code. It will burn your eyes.) This will be an important stability improvement as well as a good opportunity to write a new post on JSON parsing. If you can't tell by the content of this site so far, I absolutely love writing parsers.
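To give a flavor of what "doesn't fail on one bad number" could mean, here is a small sketch in Python (not necessarily the language the app itself uses). The field names, the payload, and both function names are invented for this example; the idea is simply to skip a row whose number can't be read rather than abandon the whole response.

```python
import json
from typing import Dict, List, Optional


def parse_count(raw: object) -> Optional[int]:
    """Return an int for values like 12, "12", or "1,234"; None if unusable."""
    if isinstance(raw, bool):
        return None  # bool is a subclass of int in Python, so rule it out first
    if isinstance(raw, int):
        return raw
    if isinstance(raw, str):
        try:
            return int(raw.replace(",", "").strip())
        except ValueError:
            return None
    return None


def parse_rows(payload: str, field: str) -> List[Dict[str, object]]:
    """Keep rows whose `field` parses; skip the rest instead of failing."""
    rows = []
    for row in json.loads(payload):
        count = parse_count(row.get(field))
        if count is not None:
            rows.append({"date": row.get("date"), field: count})
    return rows


# Hypothetical payload; "date" and "beds" are invented field names.
payload = (
    '[{"date": "2020-10-01", "beds": "12"},'
    ' {"date": "2020-10-02", "beds": "n/a"},'
    ' {"date": "2020-10-03", "beds": 15}]'
)
print(parse_rows(payload, "beds"))
```

Whether to skip a bad row, substitute a placeholder, or surface a warning to the reader is a design choice; skipping is just the simplest option to show here.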

I write to learn, so I welcome your constructive criticism. Report issues on GitLab.
