Study: Early warnings of COVID-19 outbreak identified from Twitter posts

Increasing concerns about pneumonia cases were identified in posts published on Twitter in seven countries between the end of 2019 and the beginning of 2020.


Even before the first cases of COVID-19 in Europe were publicly announced at the end of January 2020, signals that something strange was happening were already circulating on social media.

According to a study conducted at the IMT School for Advanced Studies Lucca and published in Scientific Reports, growing concern about pneumonia cases could be detected in Twitter posts from seven countries between the end of 2019 and the beginning of 2020. The analysis of the posts shows that the “whistleblowing” came precisely from the geographical regions where the primary outbreaks later developed.

How was the study conducted?

For the study, the authors first created a unique database of all messages posted on Twitter from December 2014 until 1 March 2020 that contained the keyword “pneumonia” in the seven most spoken languages of the European Union: English, German, French, Italian, Spanish, Polish, and Dutch. The word “pneumonia” was chosen because pneumonia is the most severe condition induced by SARS-CoV-2, and because the 2020 flu season was milder than previous ones, so there was no reason to think the flu was responsible for all the mentions and worries.

The researchers then applied a number of adjustments and corrections to the posts in the database to avoid overestimating the number of tweets mentioning pneumonia between December 2019 and January 2020. Specifically, all tweets and retweets containing links to news about the emerging virus were removed from the database, so that mass media coverage of the emerging pandemic was excluded from the count.
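The filtering step described above can be illustrated with a minimal sketch. It is not the authors' code: the function name, the post format (a dictionary with a "text" field), and the sample posts are all hypothetical, and the sketch simplifies the rule by dropping every post that contains any link, rather than only links to news about the virus.

```python
import re

# Matches any http(s) link. Assumption: a post with a link is treated
# as possible news coverage, which is stricter than the study's rule.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def keep_organic_mentions(posts, keyword="pneumonia"):
    """Return posts that mention the keyword and contain no link."""
    kept = []
    for post in posts:
        text = post["text"]
        if keyword.lower() not in text.lower():
            continue  # does not mention the keyword at all
        if URL_PATTERN.search(text):
            continue  # contains a link, so it is excluded from the count
        kept.append(post)
    return kept


# Invented sample posts, for illustration only.
sample = [
    {"text": "My uncle is in hospital with pneumonia"},
    {"text": "New pneumonia cluster reported https://example.com/news"},
    {"text": "Nice weather today"},
]
organic = keep_organic_mentions(sample)
```

Only the first sample post survives: the second contains a link and the third never mentions the keyword.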

The results

The authors' analysis shows an increase in tweets mentioning the keyword “pneumonia” in most of the European countries included in the study as early as January 2020.

In Italy, for example, where the first lockdown measures were introduced on February 22, 2020, the rate of increase in mentions of pneumonia during the first few weeks of 2020 differs substantially from the rate observed in the same weeks of 2019. In other words, potentially hidden infection hotspots were identified several weeks before the announcement of the first local source of a COVID-19 infection (February 20, Codogno, Italy). France exhibited a similar pattern, whereas Spain, Poland, and the UK showed it with a delay of two weeks.
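One simple way to picture the year-over-year comparison described above is to compute the week-over-week growth in mentions for each year and then take the difference. The sketch below is only an illustration under that assumption; the study's actual statistical method is not detailed here, the function names are hypothetical, and the weekly counts are invented numbers, not data from the study.

```python
def weekly_growth(counts):
    """Relative week-over-week change, (current - previous) / previous.

    Assumes every weekly count is greater than zero.
    """
    return [(cur - prev) / prev for prev, cur in zip(counts, counts[1:])]


def growth_gap(counts_this_year, counts_last_year):
    """Difference in weekly growth between the same weeks of two years."""
    this_year = weekly_growth(counts_this_year)
    last_year = weekly_growth(counts_last_year)
    return [a - b for a, b in zip(this_year, last_year)]


# Invented weekly mention counts for the same four calendar weeks.
counts_2019 = [100, 100, 100, 100]
counts_2020 = [100, 110, 132, 165]

gap = growth_gap(counts_2020, counts_2019)
```

A gap that stays positive and widens from week to week, as in this invented example, is the kind of departure from the previous year's baseline that the article describes.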

The authors also geo-localized over 13,000 pneumonia-related tweets from this same period and discovered that they came exactly from the regions where the first cases of infection were later reported, such as the Lombardia region in Italy, Madrid in Spain, and Île-de-France in France.

The researchers then repeated the process with the keyword “dry cough”, another of the symptoms later associated with COVID-19. Here too, they observed the same pattern: an abnormal and statistically significant increase in the number of mentions of the term during the weeks leading up to the surge of infections in February 2020.

On the record

“Our study adds on to the existing evidence that social media can be a useful tool of epidemiological surveillance. They can help intercept the first signs of a new disease, before it proliferates undetected, and also track its spread,” said Massimo Riccaboni, Full Professor of Economics at the IMT School.


It is suggested that in a subsequent phase of the pandemic, monitoring social media could help public health authorities mitigate the risk of a resurgence of contagion. For example, they could adopt stricter social distancing measures where infections appear to be increasing or, conversely, relax them in other regions. These tools could also pave the way to an integrated epidemiological surveillance system managed globally by international health organizations.