Daily COVID-19 data and Benford’s law
Which countries’ daily coronavirus disease (COVID-19) data satisfy Benford’s law?
Abstract
This short article highlights the countries whose reported daily new confirmed cases of coronavirus disease do not satisfy Benford’s law. Benford’s law can be used to test whether a set of numbers may be artificial or manipulated, so a deviation could indicate problems in the tracking of new COVID-19 cases or manipulation of the reported data.
Introduction
To verify whether Benford’s law is satisfied by the daily new COVID-19 case counts reported by each country, I started from the dataset available on ourworldindata.org. The dataset contains the new confirmed cases for each country; at the time of writing, it is updated to 2020/10/22.
The idea is to detect anomalies in each country’s data: for each country, I tested whether the distribution of the reported data differs significantly from Benford’s distribution. A set of numbers satisfies Benford’s law if the leading digit d (d ∈ {1, …, 9}) occurs with probability P(d) = log10(1 + 1/d), so 1 is the leading digit about 30.1% of the time and 9 only about 4.6% of the time.
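To make the expected distribution concrete, here is a minimal sketch that computes the Benford probabilities for the digits 1 to 9 and extracts the leading digit of a number. The function names `benford_probabilities` and `leading_digit` are illustrative and do not come from the original script.

```python
import math


def benford_probabilities():
    """Return {d: P(d)} for leading digits d = 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n):
    """Return the most significant digit of a positive integer."""
    if n <= 0:
        raise ValueError("leading digit is only defined here for positive integers")
    return int(str(n)[0])


# Print the expected share of each leading digit.
for digit, prob in benford_probabilities().items():
    print(f"{digit}: {prob:.3f}")
```

The nine probabilities sum to 1, because the sum of log10((d + 1)/d) for d = 1, …, 9 telescopes to log10(10).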
Methods
To perform this analysis quickly, I wrote a Python script. It uses the benfordslaw package, which tests whether an empirical (observed) distribution differs significantly from a theoretical (expected, Benford) distribution. The package can also plot the results.
Python 3 and the benfordslaw package are prerequisites for the following script.
The script can be run with the following command (it also works with other CSV data, e.g. new deaths):
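For readers who want to see what such a test involves, the sketch below runs a chi-square goodness-of-fit test of leading digits against Benford’s distribution using only the standard library. It is an assumption that a chi-square test is a fair stand-in here; it does not reproduce the internals of the benfordslaw package, and the names `chi2_sf_df8` and `benford_chi2_test` are hypothetical.

```python
import math
from collections import Counter


def chi2_sf_df8(x):
    """P(X > x) for a chi-square variable with 8 degrees of freedom.

    For an even number of degrees of freedom the survival function has a
    closed form, so SciPy is not needed.
    """
    h = x / 2.0
    return math.exp(-h) * (1 + h + h**2 / 2 + h**3 / 6)


def benford_chi2_test(values, alpha=0.05):
    """Compare the leading digits of `values` with Benford's distribution.

    Non-positive values are skipped because they have no leading digit 1..9.
    """
    digits = [int(str(int(v))[0]) for v in values if int(v) > 0]
    n = len(digits)
    if n == 0:
        raise ValueError("no positive values to test")
    observed = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (observed.get(d, 0) - expected) ** 2 / expected
    p_value = chi2_sf_df8(stat)  # 9 digit classes -> 8 degrees of freedom
    return {"chi2": stat, "p_value": p_value, "anomaly": p_value <= alpha}


# Leading digits of 100..999 are uniform, so they should be flagged.
print(benford_chi2_test(range(100, 1000), alpha=0.005)["anomaly"])
# Powers of 2 are a classic sequence that follows Benford's law closely.
print(benford_chi2_test([2**k for k in range(200)], alpha=0.005)["anomaly"])
```

A smaller alpha makes the test stricter about what counts as an anomaly, so fewer series are flagged.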
python3 covid_benford.py <URI_OR_CSV_FILE_PATH> <ALPHA_VALUE>
The script accepts two arguments: the URI or path of the CSV file to use, and the alpha value, which is the significance level used to report only statistically significant results. For example:
python3 covid_benford.py https://covid.ourworldindata.org/data/ecdc/new_cases.csv 0.005
By default, the benfordslaw package uses alpha = 0.05.
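The data-handling step such a script needs can be sketched as follows. This is not the original covid_benford.py: it assumes the CSV is in wide format, with a `date` column followed by one column per country, and the helper names and the toy sample are made up for illustration.

```python
import csv
import io
from collections import Counter


def load_country_series(csv_text, date_column="date"):
    """Parse wide-format CSV text into {country: [daily values]}.

    Assumes one column per country plus a date column; empty cells
    (days without a report) are skipped.
    """
    series = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for country, cell in row.items():
            if country is None or country == date_column:
                continue
            if cell is None or cell.strip() == "":
                continue
            series.setdefault(country, []).append(int(float(cell)))
    return series


def leading_digit_counts(values):
    """Count leading digits, ignoring zero and negative daily values."""
    return Counter(int(str(v)[0]) for v in values if v > 0)


# Toy data for illustration only, not real case counts.
SAMPLE = """date,Country A,Country B
2020-03-01,12,
2020-03-02,105,3
2020-03-03,0,27
"""

for country, values in load_country_series(SAMPLE).items():
    print(country, dict(leading_digit_counts(values)))
```

Each country’s digit counts can then be passed to a goodness-of-fit test, keeping only the countries whose p-value falls below the chosen alpha.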
Results
For an alpha value of 0.005, an anomaly was detected for 79 of 211 countries. The graphs below compare Benford’s distribution with the data reported by Tajikistan and Switzerland, the countries with the lowest and highest value respectively. As shown below, Tajikistan’s data do not satisfy Benford’s law, so there could be problems in the tracking of new COVID-19 cases, or the reported data may have been manipulated.
The file containing all the results can be found here.
Conclusion
This simple analysis shows that about 37% of countries (79 of 211) reported new confirmed COVID-19 case data that do not satisfy Benford’s law.