“I spent a few hours correlating the census tract data with the ‘neighborhood’ COVID map”

interactive map also embedded below
Many thanks to Molly for sharing a view on the district’s neighborhood coronavirus map:
“I’m following up with more detail and explanation of my DC COVID map, which is weighted by population. There had been some talk in the comments on your daily data posts this week about wanting to see a version of the map that controlled for population, and I was really curious, so I spent a few hours correlating census tract data with the “neighborhood” COVID map which the city began publishing earlier this week. I’d say I’m semi-professional when it comes to stuff like that: I’m professionally trained, but that’s not what I usually get paid for these days. So I would call it a citizen data science effort!
The city has been reporting coronavirus data by ward throughout the crisis. They recently added a report of positive cases by “neighborhood”. The city’s “neighborhoods” are made up of groupings of DC census tracts (standard geographic areas) and therefore may not correspond to what we generally consider to be neighborhood boundaries. If you want to take a closer look at the city’s neighborhood classifications, zoom in on this map pdf.
The city reports the total number of positive cases by neighborhood in its daily reports. However, some neighborhoods have a much higher population than others. If 200 people are sick, it’s important to know whether that is 200 out of 2,000 or 200 out of 20,000. To give that perspective, I’ve created a map that shows the rate of cases in each neighborhood, with an interactive version here:
This way isn’t necessarily “better” than the way the city shows it, but I personally find it useful to understand the extent of the outbreak relative to population.
How I did it: The 2018 American Community Survey (ACS) provides census tract level data for population and certain demographic characteristics such as age and income. Using the data available on Open Data DC, I matched every census tract to one of the city’s COVID neighborhoods to determine the population of each “neighborhood”, which then allowed me to calculate the case rate by neighborhood. I report this statistic as positive cases per thousand residents.
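The matching step described above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet behind the map: the function name, the tract IDs, the neighborhood labels, and all of the numbers below are made up for the example, and the real inputs would come from the ACS and Open Data DC.

```python
from collections import defaultdict


def neighborhood_case_rates(tract_population, tract_to_neighborhood, cases_by_neighborhood):
    """Return positive cases per 1,000 residents for each neighborhood.

    All three arguments are plain dicts; the names are hypothetical.
    """
    # Sum tract populations into the neighborhood each tract belongs to.
    population = defaultdict(int)
    for tract, people in tract_population.items():
        population[tract_to_neighborhood[tract]] += people

    # Divide cases by population, skipping neighborhoods with no known residents.
    return {
        hood: round(1000 * cases / population[hood], 1)
        for hood, cases in cases_by_neighborhood.items()
        if population.get(hood, 0) > 0
    }


# Invented inputs for illustration only (not real ACS or DC data).
tract_population = {"T1": 3000, "T2": 2000, "T3": 4000}
tract_to_neighborhood = {"T1": "A", "T2": "A", "T3": "B"}
cases_by_neighborhood = {"A": 50, "B": 20}

rates = neighborhood_case_rates(tract_population, tract_to_neighborhood, cases_by_neighborhood)
print(rates)
```

In this toy example both neighborhoods would look similar on a count-based map only if their case totals were close; dividing by the summed tract populations is what separates them.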
Why this matters: Let’s take an example. Tenleytown (neighborhood N44) and Shepherd Park (N40) had a similar number of positives in the May 9 data, at 112 and 114 respectively. As such, they are the same color on the city map. But Tenleytown’s population (18,099) is more than double that of Shepherd Park (8,696). So proportionally, someone in Shepherd Park (13.1 cases per thousand) is about twice as likely to have tested positive as someone in Tenleytown (6.2 cases per thousand). This is true of a number of “neighborhoods”: similar total case counts but different populations, which means different rates of illness.
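The arithmetic behind that comparison can be checked with a short Python snippet. The case counts and populations are the May 9 figures quoted above; the helper name `cases_per_thousand` is just an illustrative choice.

```python
def cases_per_thousand(cases, population):
    """Positive cases per 1,000 residents, rounded to one decimal place."""
    return round(1000 * cases / population, 1)


# May 9 figures quoted in the text.
tenleytown = cases_per_thousand(112, 18099)
shepherd_park = cases_per_thousand(114, 8696)

print(tenleytown, shepherd_park)
# How many times higher the Shepherd Park rate is than the Tenleytown rate.
print(round(shepherd_park / tenleytown, 1))
```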
One of my first reactions when I crunched the numbers and saw the rate map, and I bet I’m not the only one: “What’s going on in ‘Stadium Armory’?!” The map reveals that the case rate in Stadium Armory (65.2 cases per thousand on May 9) is around triple that of the next hardest hit neighborhood, and more than seven times the median rate for the District (8.9 cases per thousand). On the city map, Stadium Armory doesn’t particularly stand out, as its total case count for May 9 (173) was not unusual among neighborhoods. The two versions of the map tell very different stories about what is happening in this neighborhood.
The DC jail is located in Stadium Armory, which I think is the reason for the high rate of positive cases. But my map doesn’t tell you why the case rate is so much higher there than anywhere else. This could (as in any neighborhood) reflect the level of testing as much as the level of illness, or it could be a number of other factors.
Analyzing the data by population made this outlier and other facts visible. Now we are able to ask questions about the “why” that we might not otherwise have realized we needed to ask. We also have another layer of information that can help assess individual and collective risk. Good data visualization informs decision-making, and I hope my efforts here can contribute to the COVID conversation in DC.
I created a spreadsheet which I hope to keep updated regularly with new data and graphs. I am open to any comments or collaboration with other dataviz people. Everyone is welcome to download a copy of the spreadsheet and do their own work with it. I’m no magician and I’m sure others can do more advanced things.
Thank you essential workers! Everyone else, stay home! I live in a basement and I want to see the light again one day.