John Graunt, the first data analyst

31 May 2021

Did you ever fancy living through a pandemic? I did not before March 2020. The COVID-19 pandemic reminded me that modern statistics has its roots in another pandemic: the plague. Data collected by the parishes of London were used by the authorities to decide which measures to take against the plague. These data were published weekly from 1604 onwards. John Graunt (1620-1674) was the first to analyse such data using scientific inquiry. He is therefore regarded as the founder of demography, the statistical study of populations, and more generally as the first statistician. Nowadays, we would probably call him the first data scientist or data analyst. To fully appreciate John Graunt’s accomplishments, we will look at the data he analysed. Before doing so, here are a few things we know about his life (more can be found on the Wikipedia page about John Graunt and, in particular, in the first reference given there).

John Graunt’s life

John Graunt was born in 1620 and died at the age of 53. He was married to Mary Scott, with whom he had four children. For many years he worked in the shop of his father, a draper. Graunt was a respected London citizen who held several offices, for example on the Council of the City. He seems to have run into financial trouble after the Great Fire of London in 1666, and his conversion to Roman Catholicism appears to have made matters worse. In any case, none of this would have earned him a Wikipedia page had he not published his “Natural and Political Observations Made upon the Bills of Mortality” in 1662. The book was a great success, as shown by the fact that a second edition appeared in the same year. Its importance is also evident from the fact that it led to his election to the Royal Society.

The data: The bills of mortality

Starting in 1604 (more precisely, on December 29, 1603) the parishes of London published weekly mortality statistics, called the bills of mortality. Strictly speaking, the bills did not report the number of deaths but the number of burials. They also reported christenings. Until 1629 the parishes recorded only the total number of deaths and the number of plague deaths. From 1629 onwards the bills also reported causes of death other than the plague, and they started to record burials and christenings separately for females and males. However, the bills did not record the age of the deceased before 1728. This matters because it means that John Graunt had to analyse the data without knowing the ages of the deceased.

John Graunt’s analysis of the data

The main questions Graunt hoped to answer by analysing the data were:

1. How many inhabitants are there in London?
2. How many fighting men (men between 16 and 56 years of age) are there in London?

Before tackling these questions he discussed the reliability of the data. Even today, many statistical analyses start by checking how plausible the data are. But let’s see what John Graunt did to study the trustworthiness of the bills of mortality.

Here comes the first example. The following table gives, for five years, the number of burials and the number of deaths classified as plague deaths between March and December, according to the bills of mortality.

Year	Burials	Plague deaths	Plague deaths (% of burials)
1592	25,886	11,503	44
1593	17,844	10,662	60
1603	37,294	30,561	82
1625	51,758	35,417	68
1636	23,359	10,400	45
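
The last column is simply plague deaths divided by burials, rounded to a whole percent. The short Python sketch below recomputes it from the other two columns; the dictionary and the function name `plague_share` are our own illustrative choices, not anything taken from the bills or from Graunt.

```python
# Burials and plague deaths, March to December, as given in the table.
bills = {
    1592: (25_886, 11_503),
    1593: (17_844, 10_662),
    1603: (37_294, 30_561),
    1625: (51_758, 35_417),
    1636: (23_359, 10_400),
}


def plague_share(burials: int, plague: int) -> int:
    """Plague deaths as a whole-number percentage of all burials."""
    return round(100 * plague / burials)


for year, (burials, plague) in bills.items():
    print(year, plague_share(burials, plague))
```

For 1636 the figures give 10,400 / 23,359 ≈ 44.5%, which rounds to 45.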

John Graunt then argued that the plague mortalities in 1603 and 1625 were the greatest and about the same. While we can easily agree with the first part of this statement, you may wonder what led him to the second. Graunt looked at the non-plague years preceding 1625 (not given in the table above but in the bills of mortality). In these years the annual number of burials ranged between 7,000 and 8,000, so he concluded that in ordinary years before 1625 the number of burials was roughly stable over time. Now reconsider 1625: the number of non-plague deaths equals 54,265 − 35,417 = 18,848, where 54,265 is the total number of deaths in the whole of 1625 (the table only gives the deaths between March and December). John Graunt argued that this number is highly inconsistent with the preceding years and concluded that about 10,000 plague deaths had been reported as non-plague deaths. A potential reason could be that the searchers who classified the deaths had been bribed (maybe early ancestors of FIFA officials 🙂). With this correction the percentage of plague deaths becomes 85%, which explains why he found 1625 to have plague mortality similar to 1603.
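
The correction for 1625 is plain arithmetic, sketched below in Python. It rests on one assumption of ours, not necessarily Graunt’s exact procedure: that 1625 would have seen about 8,000 non-plague deaths, the upper end of the 7,000 to 8,000 burials of the preceding ordinary years. The constant names are hypothetical.

```python
TOTAL_DEATHS_1625 = 54_265     # all deaths in the whole of 1625
REPORTED_PLAGUE_1625 = 35_417  # deaths classified as plague
ASSUMED_NORMAL_DEATHS = 8_000  # assumption: upper end of 7,000-8,000 in ordinary years

# Deaths the bills attributed to causes other than the plague.
reported_other = TOTAL_DEATHS_1625 - REPORTED_PLAGUE_1625
# Excess over an ordinary year, treated as misclassified plague deaths.
hidden_plague = reported_other - ASSUMED_NORMAL_DEATHS
corrected_plague = REPORTED_PLAGUE_1625 + hidden_plague
corrected_share = 100 * corrected_plague / TOTAL_DEATHS_1625

print(reported_other)          # 18848
print(hidden_plague)           # 10848
print(round(corrected_share))  # 85
```

Under this assumption roughly 10,800 deaths are reassigned to the plague, in line with “about 10,000”, and the share comes out at about 85%. Assuming 7,000 ordinary deaths instead would raise it to about 87%.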

Here comes a second example of John Graunt’s critical appraisal of the data. For the two non-plague years 1631 and 1659 the bills reported the following:

Year	Deaths in childbed	Ordinary burials	Christenings
1631	112	8,288	8,524
1659	226	14,720	5,670

The low number of christenings in 1659 compared with 1631 is apparent. The reason was a change in religious opinion. As mentioned, John Graunt’s goal was to answer questions 1 and 2 above. As long as almost every child is baptized, the number of christenings is almost identical to the number of births. When this is no longer the case, one has to draw inferences from the given, incomplete, data. John Graunt proceeded as follows to draw a conclusion about the number of births. He assumed that the chance of dying in childbed is roughly constant from year to year. We encountered a similar assumption in the previous example (a stable number of burials in ordinary years), and this is exactly what we call stationarity in time series analysis. Based on this assumption he concluded that the number of births in 1659 should be 11,500 (which is still rather low if the chance of dying in childbed were truly constant).

After these reliability checks (nowadays we would probably call this data cleaning) Graunt tried to answer the two questions above by constructing the first life table. If you would like to learn how he went about it, see the book “A History of Probability and Statistics and Their Applications before 1750” by Anders Hald, which also contains the examples presented above.

You may wonder whether John Graunt achieved his two goals. Although he was the first to construct a meaningful life table, he did not fully grasp how to use it to calculate the number of inhabitants or the number of fighting men. It was Edmond Halley (1656-1742) who, some years later, understood how to use life tables to calculate population sizes. Yes, that is the same Halley who was the first to calculate the periodicity of the comet now known as Halley’s comet.
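
To see why 11,500 births for 1659 counts as rather low under a strictly constant chance of dying in childbed, one can apply the 1631 ratio of christenings to childbed deaths directly to 1659. The Python sketch below does exactly that. It is a naive ratio estimate of our own that treats the 1631 christenings as equal to births; it is not a reconstruction of Graunt’s actual calculation, and the function name `births_from_childbed` is hypothetical.

```python
# Figures from the bills for the two non-plague years.
childbed_1631, christenings_1631 = 112, 8_524
childbed_1659 = 226


def births_from_childbed(childbed_deaths: int, ref_childbed: int, ref_births: int) -> float:
    """Estimate births, assuming a constant ratio of births to deaths in childbed."""
    births_per_childbed_death = ref_births / ref_childbed
    return childbed_deaths * births_per_childbed_death


estimate_1659 = births_from_childbed(childbed_1659, childbed_1631, christenings_1631)
print(round(estimate_1659))  # 17200
```

This strict ratio gives roughly 17,200 births, well above both the 5,670 recorded christenings and the figure of 11,500.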
