Data Scientist - Benjamin Tovar

What can we say about world fertility, life expectancy and population size?

17 Mar

Introduction

The purpose of this post is not to “reinvent the wheel”, but rather to serve as a playground for a somewhat difficult Data Science problem, one where the variables are either highly correlated or not correlated at all, and useful new conclusions are hard to draw. I did not put much effort into programming the following analyses, so I hope my code can at least serve as a starting guide for something more productive.

Materials and methods

Description of datasets

Fertility dataset

  • Official name: Fertility rate, total (births per woman)
  • Details: Total fertility rate represents the number of children that would be born to a woman if she were to live to the end of her childbearing years and bear children in accordance with current age-specific fertility rates.
  • Sources: (1) United Nations Population Division. World Population Prospects, (2) United Nations Statistical Division. Population and Vital Statistics Report (various years), (3) Census reports and other statistical publications from national statistical offices, (4) Eurostat: Demographic Statistics, (5) Secretariat of the Pacific Community: Statistics and Demography Programme, and (6) U.S. Census Bureau: International Database.
  • Source

Life expectancy dataset

  • Official name: Life expectancy at birth, total (years)
  • Details: Life expectancy at birth indicates the number of years a newborn infant would live if prevailing patterns of mortality at the time of its birth were to stay the same throughout its life.
  • Sources: (1) United Nations Population Division. World Population Prospects, (2) United Nations Statistical Division. Population and Vital Statistics Report (various years), (3) Census reports and other statistical publications from national statistical offices, (4) Eurostat: Demographic Statistics, (5) Secretariat of the Pacific Community: Statistics and Demography Programme, and (6) U.S. Census Bureau: International Database.
  • Source

Population dataset

  • Official name: Population, total
  • Details: Total population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship–except for refugees not permanently settled in the country of asylum, who are generally considered part of the population of their country of origin. The values shown are midyear estimates.
  • Sources: (1) United Nations Population Division. World Population Prospects, (2) United Nations Statistical Division. Population and Vital Statistics Report (various years), (3) Census reports and other statistical publications from national statistical offices, (4) Eurostat: Demographic Statistics, (5) Secretariat of the Pacific Community: Statistics and Demography Programme, and (6) U.S. Census Bureau: International Database.
  • Source

 

Data filtering
For each dataset, we removed all entries (countries) containing NA (missing) values. For pairwise analyses (e.g., comparing fertility versus population and versus life expectancy), we included only the countries and years shared by all three datasets, so that every comparison is made over the same country-year pairs. After filtering, the fertility, life expectancy and population matrices all have dimensions |country| × |year|.
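A minimal sketch of this filtering step in Python, assuming each dataset has already been loaded as a nested dictionary country -> {year: value} with None marking NA. That layout, the helper names drop_na_countries and align, and the toy numbers are illustrative assumptions, not the post's downloadable code.

```python
from typing import Dict, List, Optional, Tuple

# Assumed in-memory layout: country -> {year: value}, None marks NA.
Dataset = Dict[str, Dict[int, Optional[float]]]
Matrix = List[List[float]]


def drop_na_countries(data: Dataset) -> Dataset:
    """Remove every country (row) that has at least one NA value."""
    return {
        country: row
        for country, row in data.items()
        if all(value is not None for value in row.values())
    }


def align(*datasets: Dataset) -> Tuple[List[str], List[int], List[Matrix]]:
    """Keep only the countries and years present in every cleaned dataset.

    Returns the shared countries, the shared years and one
    |country| x |year| matrix (rows = countries, columns = years)
    per input dataset, in the same order as the inputs.
    """
    cleaned = [drop_na_countries(d) for d in datasets]
    countries = sorted(set.intersection(*(set(d) for d in cleaned)))
    if not countries:
        return [], [], [[] for _ in cleaned]
    # A year is kept only if every shared country has it in every dataset.
    years = sorted(
        set.intersection(*(set(d[c]) for d in cleaned for c in countries))
    )
    matrices = [[[d[c][y] for y in years] for c in countries] for d in cleaned]
    return countries, years, matrices


# Toy values, not real data: B has an NA, D is missing from two datasets.
fertility = {"A": {1960: 6.1, 1961: 6.0}, "B": {1960: 2.5, 1961: None},
             "C": {1960: 3.0, 1961: 2.9}}
life_expectancy = {"A": {1960: 40.0, 1961: 41.0}, "C": {1960: 60.0, 1961: 61.0}}
population = {"A": {1960: 1.0e6, 1961: 1.1e6}, "C": {1960: 5.0e5, 1961: 5.2e5},
              "D": {1960: 7.0e5, 1961: 7.1e5}}

countries, years, (fert_m, life_m, pop_m) = align(fertility, life_expectancy, population)
print(countries, years)  # ['A', 'C'] [1960, 1961]
```

Dropping a whole country on any NA (rather than only the affected years) mirrors the wording of the filtering rule above; a less aggressive variant would drop country-year cells instead.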

Results

Boxplots of datasets over time and by world region

This section includes boxplots of each dataset over time (from the 1960s to the early 2010s). We also show the distribution of values by world region.
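Each box in a per-year boxplot summarises one column of a |country| × |year| matrix. The sketch below, which assumes that matrix is a plain list of rows, computes the five-number summary for each year with Python's statistics.quantiles (Python 3.8+). The helper names are hypothetical, and the quartile method is a choice: different tools interpolate quartiles slightly differently, and many plotting tools cap the whiskers at 1.5 × IQR instead of at the minimum and maximum.

```python
import statistics
from typing import Dict, List


def five_number_summary(values: List[float]) -> Dict[str, float]:
    """Minimum, quartiles and maximum of the values in one box."""
    if len(values) < 2:
        raise ValueError("need at least two values per box")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


def summaries_by_year(matrix: List[List[float]],
                      years: List[int]) -> Dict[int, Dict[str, float]]:
    """One box per year: the values of all countries in that column."""
    return {year: five_number_summary([row[j] for row in matrix])
            for j, year in enumerate(years)}


# Toy 3-country x 2-year matrix (not real data).
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
boxes = summaries_by_year(toy, [1960, 1961])
print(boxes[1960])  # {'min': 1.0, 'q1': 2.0, 'median': 3.0, 'q3': 4.0, 'max': 5.0}
```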

Fertility

We show that fertility is decreasing over time:

fig1
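One hedged way to attach a number to "decreasing over time" is the ordinary least-squares slope of the yearly medians (the centre lines of the boxes) against year. This is an illustration added here, not an analysis from the original post; trend_slope is a hypothetical helper and the toy medians are not real data.

```python
from typing import Sequence


def trend_slope(years: Sequence[float], medians: Sequence[float]) -> float:
    """Ordinary least-squares slope of the yearly medians against year.

    A negative slope means the typical value falls over time; the
    units are (units of the data) per year. Years must not all be equal.
    """
    n = len(years)
    if n != len(medians) or n < 2:
        raise ValueError("need two equally long sequences with >= 2 points")
    mean_x = sum(years) / n
    mean_y = sum(medians) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, medians))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


print(trend_slope([1960, 1961, 1962], [6.0, 5.0, 4.0]))  # -1.0
```

A single straight-line slope summarises the overall direction only; it says nothing about whether the decline speeds up or slows down over the decades.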

Below we show the distribution per region across all years. A few remarks stand out. For example, countries in Sub-Saharan Africa (SSA) have a median of ~6 children per woman (the highest of all regions), although some outliers have fewer than ~4. In contrast, regions such as Europe and Central Asia (ECA) and North America (NAM) generally have the lowest fertility. These differences might be caused by cultural and economic factors.

fig2
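The per-region boxes pool every country-year value of the countries in a region. The sketch below shows that pooling plus the common 1.5 × IQR rule for flagging outliers such as the low-fertility SSA points mentioned above. The country-to-region mapping, the helper names and the toy values are assumptions for illustration; the region codes simply reuse the post's abbreviations.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def pool_by_region(matrix: List[List[float]], countries: List[str],
                   region_of: Dict[str, str]) -> Dict[str, List[float]]:
    """Pool every country-year value into its country's region."""
    pooled: Dict[str, List[float]] = defaultdict(list)
    for country, row in zip(countries, matrix):
        pooled[region_of[country]].extend(row)
    return dict(pooled)


def median_and_outliers(values: List[float]) -> Tuple[float, List[float]]:
    """Median plus the points lying beyond 1.5 * IQR from the quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return median, sorted(v for v in values if v < low or v > high)


# Toy data (not real values).
countries = ["A", "B", "C", "D"]
region_of = {"A": "SSA", "B": "SSA", "C": "SSA", "D": "ECA"}
toy = [[6.0, 6.0], [6.0, 2.0], [6.0, 6.0], [1.5, 1.6]]
pooled = pool_by_region(toy, countries, region_of)
print(median_and_outliers(pooled["SSA"]))  # (6.0, [2.0])
```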

Life expectancy

The following figure shows that life expectancy increases over time. This result might suggest that, in general, improvements in living conditions (e.g., education and medical care) have raised life expectancy in all countries over time.

fig3

Below we show that, for example, countries in Sub-Saharan Africa (SSA) and South Asia (SAS) have lower life expectancies across all years, while countries in North America (NAM) and Europe and Central Asia (ECA) have higher life expectancies across all years.

fig4

Pairwise comparisons of values by time and by region

In this section, we include pairwise comparisons of all three datasets (e.g., fertility versus population and fertility versus life expectancy). We performed year-by-year comparisons (the mean of each column, i.e. each year, in each dataset) and region-by-region comparisons (the mean of each row, i.e. each country, in each dataset). The former captures differences between years and the latter captures differences between world regions.
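The two aggregations described above can be sketched as column means (one value per year) and row means (one value per country). The post does not spell out how per-country means become per-region values, so averaging them within each region is an assumption here, as are the helper names and the toy matrix; statistics.fmean needs Python 3.8+.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List


def yearly_means(matrix: List[List[float]]) -> List[float]:
    """Mean of each column: one value per year."""
    return [fmean(column) for column in zip(*matrix)]


def country_means(matrix: List[List[float]]) -> List[float]:
    """Mean of each row: one value per country."""
    return [fmean(row) for row in matrix]


def regional_means(matrix: List[List[float]], countries: List[str],
                   region_of: Dict[str, str]) -> Dict[str, float]:
    """Average the per-country means within each region (assumed step)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for country, mean in zip(countries, country_means(matrix)):
        groups[region_of[country]].append(mean)
    return {region: fmean(means) for region, means in groups.items()}


# Toy 2-country x 2-year matrix (not real data).
toy = [[1.0, 3.0], [5.0, 7.0]]
print(yearly_means(toy))   # [3.0, 5.0]
print(country_means(toy))  # [2.0, 6.0]
print(regional_means(toy, ["A", "B"], {"A": "X", "B": "X"}))  # {'X': 4.0}
```

Because the filtered matrices share the same countries and years, applying the same function to two datasets yields two equally long vectors that can be compared pairwise.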

This figure shows that fertility and life expectancy have a correlation of -0.88 across all regions, indicating that these factors change together regardless of the country of origin. In other words, the decline in fertility and the rise in life expectancy appear to be a global trend affecting all countries. Population, by contrast, shows no correlation with either fertility or life expectancy.
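The post does not say which correlation coefficient produced the -0.88, so the sketch below assumes the Pearson coefficient, written out with the standard library only. The inputs would be two aligned vectors, such as yearly mean fertility and yearly mean life expectancy; the helper name is hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with >= 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


print(pearson([1, 2, 3], [3, 2, 1]))  # -1.0
```

Pearson's coefficient measures linear association only; two series that both trend steadily over time can be strongly correlated without one driving the other.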

In addition, the distribution of fertility values is bimodal, with two peaks at ~2 and ~6 children per woman.

fig8
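A simple way to check for the two peaks is to bin the fertility values and report bins whose count exceeds both neighbours. This is a rough sketch with a hypothetical helper and toy values; the number of peaks found depends on the bin width, and a kernel density estimate is a common, smoother alternative.

```python
import math
from collections import Counter
from typing import List


def histogram_peaks(values: List[float], bin_width: float = 0.5,
                    min_count: int = 1) -> List[float]:
    """Centres of histogram bins whose count beats both neighbouring bins.

    Bins holding fewer than min_count values are ignored, which
    suppresses isolated single observations. Flat-topped peaks (two
    adjacent bins with equal counts) are not reported by this simple rule.
    """
    counts = Counter(math.floor(v / bin_width) for v in values)
    peaks = []
    for b in range(min(counts), max(counts) + 1):
        c = counts[b]  # a Counter returns 0 for empty bins
        if c >= min_count and c > counts[b - 1] and c > counts[b + 1]:
            peaks.append((b + 0.5) * bin_width)
    return peaks


# Toy fertility values (not real data) clustered near 2 and 6.
toy = [1.9, 2.1, 2.2, 2.0, 4.0, 5.9, 6.1, 6.2, 6.0]
print(histogram_peaks(toy, bin_width=0.5, min_count=2))  # [2.25, 6.25]
```

With the default min_count=1 the lone value 4.0 would also be reported as a peak (at 4.25), which is why a minimum count, or smoothing, is useful on real data.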

Code and data

You can download the materials (code and datasets) by clicking here.
