Daily COVID-19 data and Benford’s law

Which countries' daily coronavirus disease (COVID-19) data satisfy Benford’s law?

Lorenzo D'Isidoro
3 min read · Oct 22, 2020

Abstract

This short article highlights the countries whose reported daily new confirmed cases of coronavirus disease do not satisfy Benford’s law. Benford’s law can be used to test whether a set of numbers may be artificial or manipulated, so a deviation could indicate problems in the tracking of new COVID-19 cases or manipulation of the reported data.

Introduction

To verify whether Benford’s law is satisfied by the daily new COVID-19 cases reported by each country, I started from the dataset available on ourworldindata.org at this link. This dataset contains the new confirmed cases for each country; at the time of writing, the data are updated to 2020/10/22.

New COVID-19 cases in the last two months for each country (ourworldindata.org)

The idea is to detect anomalies in the countries' data: for each country, I tested whether the distribution of the reported data differs significantly from Benford’s distribution. A distribution satisfies Benford’s law if the leading digit d (d ∈ {1, …, 9}) occurs with the following probability:

P(d) = log10(1 + 1/d)

The probability function (source: en.wikipedia.org)
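To make the expected frequencies concrete, here is a minimal Python sketch that evaluates the standard Benford formula, log10(1 + 1/d), for each leading digit. The variable name `benford` is only illustrative.

```python
import math

# Expected probability of each leading digit d under Benford's law:
# P(d) = log10(1 + 1/d), for d = 1..9
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.3f}")
```

Under this formula, digit 1 is expected to lead about 30.1% of the time and digit 9 only about 4.6%.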

Methods

To perform this analysis quickly, I wrote a Python script. It uses the benfordslaw package, which tests whether an empirical (observed) distribution differs significantly from a theoretical (expected, Benford) distribution. The package can also plot the results.
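As a rough illustration of this kind of comparison, the sketch below hand-rolls a chi-square goodness-of-fit check using only the standard library. It is neither the benfordslaw implementation nor the script used for this article: the function names and the sample data are hypothetical, and the critical values are the usual table values for 8 degrees of freedom.

```python
import math
from collections import Counter

# Chi-square critical values for 8 degrees of freedom (9 digits - 1),
# taken from standard chi-square tables. Only these alphas are supported.
CRITICAL_DF8 = {0.05: 15.507, 0.01: 20.090, 0.005: 21.955}


def leading_digit(n):
    """Return the first digit of a positive integer."""
    return int(str(n)[0])


def benford_chi2(values):
    """Chi-square statistic of observed leading digits against Benford."""
    digits = [leading_digit(v) for v in values if v > 0]  # skip zeros and negatives
    total = len(digits)
    observed = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        stat += (observed.get(d, 0) - expected) ** 2 / expected
    return stat


def is_anomalous(values, alpha=0.05):
    """True when the statistic exceeds the critical value for alpha."""
    return benford_chi2(values) > CRITICAL_DF8[alpha]


# Hypothetical sample data: powers of 2 follow Benford closely,
# while the integers 100..999 have uniformly distributed leading digits.
powers = [2 ** k for k in range(1, 200)]
uniform = list(range(100, 1000))
print(is_anomalous(powers, alpha=0.005))
print(is_anomalous(uniform, alpha=0.005))
```

The first series should not be flagged and the second should, since a uniform spread of leading digits is far from Benford’s distribution.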

To use the following script, Python 3 and benfordslaw are prerequisites.

The script can be run with the following command (it also works with other CSV data, e.g. new deaths):

python3 covid_benford.py <URI_OR_CSV_FILE_PATH> <ALPHA_VALUE>

The script accepts two arguments: the URI or path of the CSV file to use, and the alpha value, which is the significance threshold used to report only statistically significant results. For example:

python3 covid_benford.py https://covid.ourworldindata.org/data/ecdc/new_cases.csv 0.005
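A minimal sketch of how the first argument might be handled, using only the standard library: the text is read from either an http(s) URI or a local path and then split into one series per country. The helper names are hypothetical, and the code assumes a wide CSV layout with a `date` column followed by one column per country; the inline sample is made up for illustration.

```python
import csv
import io
from urllib.request import urlopen


def read_text(source):
    """Return CSV text from an http(s) URI or a local file path."""
    if source.startswith(("http://", "https://")):
        with urlopen(source) as response:
            return response.read().decode("utf-8")
    with open(source, encoding="utf-8") as handle:
        return handle.read()


def cases_by_country(csv_text):
    """Map each country column to its list of positive daily counts.

    Assumes a wide layout: a 'date' column plus one column per country.
    Empty cells, zeros and negative corrections are skipped because they
    have no meaningful leading digit.
    """
    series = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for country, cell in row.items():
            if country in (None, "date") or not cell:
                continue
            value = int(float(cell))
            if value > 0:
                series.setdefault(country, []).append(value)
    return series


# Made-up sample in the assumed layout (not real data).
sample = "date,A,B\n2020-01-01,12,\n2020-01-02,0,7\n2020-01-03,305.0,41\n"
print(cases_by_country(sample))
```

Each resulting list can then be tested against Benford’s distribution with the chosen alpha.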

By default, the benfordslaw package uses alpha = 0.05.

Results

With alpha equal to 0.005, an anomaly was detected for 79 of 211 countries. The graphs below compare Benford’s distribution with the data provided by Tajikistan and Switzerland, the countries with the lowest and highest value respectively. As shown below, Tajikistan’s data do not satisfy Benford’s law, so there could be problems in the tracking of new COVID-19 cases, or the data provided may be manipulated.

Tajikistan’s result
Switzerland’s result

The file containing all results can be found here.

First rows of the CSV results; the complete file is available at this link

Conclusion

This simple analysis shows that about 37% of countries (79 of 211) provided data on new confirmed COVID-19 cases that do not satisfy Benford’s law.
