Tracking the spread of COVID-19

Due to the rapidly expanding outbreak of coronavirus disease 2019 (COVID-19), researchers have looked to news and social media networks to gather epidemiological data.

Epidemiology is the study of the cause, distribution, and patterns of health and disease conditions in defined populations. Studying epidemiological data in real time can help to increase situational awareness and inform intervention strategy. Researchers from the US National Institutes of Health searched a health-orientated social network that streams news from Chinese health agencies to track and analyse the COVID-19 outbreak.

Looking at cases reported between 13th and 31st January 2020, the researchers collected data for 507 patients with COVID-19. Of these patients, 55% were male, and the average age was forty-six years. Only 3% of patients were under the age of fifteen. The majority of patients came from mainland China, with 143 patients from outside China. The analysis examined outbreak progression and assessed delays between symptom onset and seeking healthcare, as well as delays in the reporting of cases.

The analysis, published in The Lancet Digital Health, showed that reporting delays in mainland China decreased from five days to two days after 18th January 2020. This date is significant because media attention and outbreak awareness became more pronounced at this time. Delays between symptom onset and seeking care were longer in Hubei province.

The age distribution of patients skewed toward an older age group, with only 3% of patients presenting under the age of fifteen. There are multiple schools of thought as to why this may be. The pattern identified could indicate a difference in susceptibility between age groups. However, the majority of the patients were travellers, who are most commonly adults. The data captured comes from the health system and is therefore biased towards more severe cases. Severe cases of COVID-19 are associated with chronic conditions more frequently found in adults. The data suggests that, if the epidemic continues to grow globally, there would be increased respiratory mortality in those aged over thirty. This presents a different picture to that seen in the 2009 influenza outbreak, where people under the age of sixty-five were most at risk.

Although the data was collected from a small sample of patients, the results aligned with official reports from the Chinese authorities. The novel approach of sourcing data from social media and news reports provided robust epidemiological information. The researchers stress that, although this approach is a useful data source, it should not replace official statistics. Compiling data from social media and news sources is time-consuming and therefore not sustainable once cases reach the thousands. However, in emerging outbreaks, social media and news sources are a useful tool for monitoring the situation when little other data is available.

The researchers note that this is an early report on a continually evolving situation. They plan to continue tracking the COVID-19 outbreak with data from social media and news sources.


Written by Helen Massy, BSc.



Sun, K., Chen, J. and Viboud, C. (2020). Early epidemiological analysis of the coronavirus disease 2019 outbreak based on crowdsourced data: a population-level observational study. The Lancet Digital Health.

