An Analytical View on World Happiness with Unsupervised Machine Learning

RESEARCH ARTICLE
European Journal of Humanities and Social Sciences
www.ej-social.org
DOI: http://dx.doi.org/10.24018/ejsocial.2022.2.3.216
Vol 2 | Issue 3 | May 2022
Andrew Zhu
ABSTRACT
This analysis covers the underlying behavior and meaning of the data
provided by the World Happiness Report (WHR). The data includes six
parameters that the WHR uses to calculate world rankings. From the
data analysis, several issues with the growth of the world as a whole and
the lack of resources in the trailing countries can be inferred. From the
trends in bar graphs and histograms, world growth can be categorized as
a “spearhead” type of happiness growth, with developing countries that
are behind developed countries and a few trailing countries that lag
behind the rest of the world. A heatmap was generated to show the
correlation between the 6 variables, showing that corruption and
generosity have no correlation with any other variable including the
happiness score. Using hierarchical clustering, an unsupervised machine
learning model, 3 clusters of countries were found, which supports the
results of the heatmap and shows that the poorest cluster, while ranking
high in generosity, still ranks much lower than the other two groups of
countries.
Keywords: Corruption, Data analysis, Generosity, Hierarchical clustering,
Machine learning, World happiness
Published Online: May 05, 2022
ISSN: 2736-5522
DOI: 10.24018/ejsocial.2022.2.3.216
A. Zhu*
Holy Trinity School, Richmond Hill, ON,
CA.
(e-mail: zhuandrew07@gmail.com)
*Corresponding Author
I. INTRODUCTION
The World Happiness Report (WHR) scores 146 countries and territories across the globe. They are
ranked according to GDP per capita, Social Support, Healthy Life Expectancy, Freedom to make life
choices, Generosity, and Perceptions of Corruption, using data obtained by Gallup. The WHR is a resource
used by governments, specialized individuals, and social scientists to determine the overall state of
happiness of a country's population. The WHR bases its rankings on data that can be analysed individually
to infer trends in the specific categories evaluated. Using unsupervised machine learning, the countries
of the world can also be grouped into clusters, which can help specialists analyse trends.
II. DEFINITIONS
A. Gross Domestic Product per capita
GDP per capita is the Gross Domestic Product that a country generates yearly divided by its
population.
B. Social Support
Social support is measured by the Gallup World Poll through questionnaires. It consists of family,
community, and government support in a country.
C. Healthy Life Expectancy
Healthy Life Expectancy is based on the expected life duration of each generation within a country. This
aspect does not consider unexpected diseases or rare cases, which removes the weight of individual cases
from this variable.
D. Freedom to Make Life Choices, Generosity, and Perceptions of Corruption
These three rankings are obtained by questionnaires/census, where the responses of those questioned in a
country become data used by the WHR. These aspects take politics and war into consideration. These 3
variables are prone to change, as they are closely tied to political change in a country. They are also
the most difficult to measure, as a wide variety of citizens’ responses are needed to obtain an accurate
measurement.
III. PURPOSE
Social studies involving these 6 variables are typically convoluted, as the variables collectively affect a
country and individually affect each other. My interest in psychology led me to world happiness as a subject.
Analyzing the world's happiness data has the potential to have useful applications in the real world.
Especially in our current world, citizens' morale is crucial to a healthy economy and, in turn, a healthy
country. In my opinion, understanding world happiness can benefit government officials in this
post-pandemic recovery process. The analysis of world happiness trends from the WHR has the potential to
aid leaders in policy making and governing and, as a result, to benefit countries of all scales in the
coming months or years of recovery from the pandemic.
IV. DATA
The WHR (World Happiness Report) is the most significant and recognized publication that records
and measures world happiness across countries. The WHR measures happiness in all countries across the
globe except 5 states. The WHR receives its data from the Gallup World Poll, which polls over 1000
adult citizens in each country annually with over 100 consistent questions. The World Happiness
Report’s main goal has been to measure and use subjective perceptions of well-being to track and explain
the quality of lives all over the globe. With the data given by the World Happiness Report from 2019,
which includes the scores in 6 main categories for determining a country's happiness level, exploratory
graphs can be made using the Python programming language and a machine learning model. Using code, we
can visualize the data in graphs from which we can ascertain valuable information in the form of
categorization, outliers, and trends.
V. DATA ANALYSIS
Fig. 1. Collage of each of the six main variables including overall score and rank.
Fig. 2. Histogram of GDP per capita in relation to number of countries, color-coded by overall score.
Fig. 1 shows the histograms for each of the variables. The histograms visualize the distribution of the
number of countries associated with each range of Overall rank, Happiness score, GDP per capita,
Social support, Healthy life expectancy, Freedom, Generosity, and Corruption.
1. The Overall rank graph only shows the distribution of ranks across the number of countries, and it
shows that in the high, middle, and low rankings there are countries with almost identical scores.
2. The Score graph shows that the growth of the world is not evenly distributed, that there will be more
high-scoring countries than low-scoring countries, and that the majority of the world will be attaining
higher scores in the near future.
3. The GDP graph depicts the relation between GDP per capita and the number of countries. It illustrates
that a large number of countries have some of the lowest GDP per capita values, a group just as large as
the one in the middle of the pack. This trend can also be seen in the data from Our World in Data.
4. The Social support graph depicts the number of countries associated with each social support score
given by the WHR. Unlike GDP per capita, social support does not seem to have any relation to
geographical situation, but rather depends on politics and culture, which can be difficult to change and
advance.
5. The Healthy life expectancy graph illustrates that there are a large number of countries in all life
expectancy categories. As in the rest of the graphs, a select few countries are far behind the rest of
the world.
6. The Freedom to make life choices graph follows a similar trend to the social support graph, with a
small number of outliers in the lowest scores. As in the social support graph, there is a significant
outlier at the lowest score, which is most likely related to culture or government. This graph, in
combination with social support, shows that there is a certain group of countries that are outliers in
contrast to the majority of the world.
7. The Generosity graph depicts a completely different picture compared to the previous graphs.
Generosity either has an inverse relationship with the other variables or is not related to them at all.
8. The Perceptions of Corruption graph appears at first to align itself with the generosity graph.
Similar to Freedom, the factors that have led to this data are most likely based on culture and
government, but there does not seem to be any correlation between any of the graphs.
Fig. 2 shows us not only the distribution of countries in each GDP per capita range, but also the final
happiness score of each country. For the majority of the graph the happiness scores follow the “teardrop”
shape of the number of countries, where generally the higher the GDP per capita, the higher the score the
countries receive. However, if we look at the highest GDP per capita range, the countries with the
highest happiness scores are not present. This tells us that for the happiest countries GDP is not as
closely related to their happiness as it is for the rest of the world.
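The counting behind a color-coded histogram of this kind can be sketched in a few lines of Python. This is only a minimal illustration: the country values, the bin width, and the helper names gdp_bin and score_band are assumptions made for the example and are not taken from the WHR data or from the code used for Fig. 2.

```python
from collections import Counter

# Hypothetical (GDP per capita value, happiness score) pairs, not WHR data.
countries = [
    (0.35, 3.2), (0.48, 4.1), (0.70, 4.4), (0.95, 5.2), (1.05, 5.9),
    (1.25, 6.3), (1.33, 7.5), (1.38, 7.2), (1.50, 6.4), (1.62, 6.9),
]

BIN_WIDTH = 0.4  # assumed width of one GDP per capita column


def gdp_bin(gdp):
    """Index of the histogram column that contains this GDP value."""
    return int(gdp // BIN_WIDTH)


def score_band(score):
    """Happiness score rounded down to a whole number (the bar color)."""
    return int(score)


# Count countries for every (GDP column, score band) combination.
counts = Counter((gdp_bin(g), score_band(s)) for g, s in countries)

for (column, band), n in sorted(counts.items()):
    low = column * BIN_WIDTH
    print(f"GDP {low:.1f} to {low + BIN_WIDTH:.1f}, score band {band}: {n}")
```

In this made-up sample the top GDP column contains no country from the highest score band, which is the kind of pattern the paragraph above describes.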
Fig. 3 shows a color-coded histogram of social support against the number of countries. Social support is
similar to GDP in the sense that generally the higher the social support, the higher the happiness score;
however, there are a significantly larger number of outlier countries. Unlike in the GDP graph, the
countries with a score of 3 are scattered around the social support columns, rather than generally being
on the lower side of the graph. While the majority of the scores of 5 and 6 follow the trend, there are
some outliers with lower social support scores. This relative inconsistency in the relation between
social support and happiness score shows that social support is not as influential a determinant of
happiness.
Fig. 4 compares the scores of each individual category with each other so we can see which categories
are closely related. From the heatmap we can determine visually that GDP per capita, Social support,
Healthy life expectancy, and overall score are closely related to each other because they have a
correlation above 0.5, shown in green. We can also see that Freedom, Generosity, and Perceptions of
Corruption are not only unrelated to the first three variables, but also unrelated to each other.
Of these three, only Freedom to make life choices is correlated with overall score, with a correlation
of 0.57, which means there is a moderate relation between the two.
Fig. 3. Histogram of Social Support in relation to number of countries, color-coded by overall score.
Fig. 4. Heatmap of each of the six main variables and their correlation with each other.
From the results of the heatmap we can determine that the most influential factors for overall score are
GDP per capita, Social support, Healthy life expectancy, and Freedom, in descending order. We can also
determine that Generosity has an inverse relationship with GDP per capita, Social support, and Healthy
life expectancy because it has negative values associated with these 3 variables. This heatmap shows
us that while the WHR says that these 6 variables are the main variables used to determine world
happiness, only 3 or 4 of them have a meaningful impact on a country's happiness score.
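The values shown in a correlation heatmap are pairwise correlation coefficients. The sketch below assumes the Pearson coefficient, which the article does not name explicitly, and uses made-up columns rather than WHR data. The function name pearson and the variable names are illustrative only.

```python
import math

# Hypothetical values for six countries, not WHR data.
data = {
    "score": [3.2, 4.1, 4.8, 5.6, 6.3, 7.1],
    "gdp": [0.4, 0.7, 0.9, 1.1, 1.3, 1.4],
    "generosity": [0.25, 0.15, 0.20, 0.10, 0.22, 0.18],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


names = list(data)
# Full symmetric matrix, the same layout a heatmap would color.
matrix = {a: {b: pearson(data[a], data[b]) for b in names} for a in names}

for a in names:
    print(a.ljust(12), " ".join(f"{matrix[a][b]:+.2f}" for b in names))

THRESHOLD = 0.5  # the cut-off the text uses for "closely related"
related = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if matrix[a][b] > THRESHOLD
]
print("closely related pairs:", related)
```

Reading the matrix row by row gives the same information as the heatmap colors: values near +1 or -1 indicate a strong linear relation, and values near 0 indicate little or none.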
Fig. 5 shows the pairwise relation between each variable in scatter plot form, along with the
color-coded histograms (by happiness score) of the individual variables on the diagonal. We are able to
analyze how the score is distributed in each of these variables. We can see that GDP per capita, Social
support, Healthy life expectancy, and Freedom have the most noticeable and consistent trend with the
happiness score, while Generosity and Perceptions of Corruption lack this distinctiveness, which
supports the results of the heatmap of the 6 variables above.
Fig. 5. Pairplot of all variables color-coded by happiness score.
Fig. 6. Hierarchical clustering dendrogram.
VI. HIERARCHICAL CLUSTERING
The dendrogram in Fig. 6 illustrates the process of splitting the data to produce clusters after the PCA
dimension reduction. The color distinction in the dendrogram shows that 3 clusters can be meaningful
when separating countries based on the data.
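To make the clustering step concrete, the following is a minimal sketch of bottom-up (agglomerative) hierarchical clustering that stops once 3 clusters remain. The article does not state which linkage or distance was used, so average linkage with Euclidean distance is an assumption here, and the 2-D points are hypothetical stand-ins for PCA-reduced countries.

```python
import math
from itertools import combinations

# Hypothetical 2-D points standing in for PCA-reduced countries.
points = [
    (0.1, 0.2), (0.3, 0.1), (0.2, 0.4),
    (2.0, 2.1), (2.2, 1.9), (1.9, 2.3),
    (4.0, 0.2), (4.2, 0.4), (3.9, 0.1),
]


def average_linkage(a, b):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(n_points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(n_points)]  # every point starts alone
    while len(clusters) > n_clusters:
        # combinations() yields index pairs with i < j.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return clusters


clusters = agglomerate(len(points), 3)
for label, group in enumerate(clusters):
    print(f"cluster {label}: points {sorted(group)}")
```

A dendrogram records the order and distance of each merge in this loop. Cutting it at a chosen height is equivalent to stopping the loop at the corresponding number of clusters.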
Using Principal Component Analysis, we can see the projected data points separated nicely according
to the three clusters, as shown in Fig. 7.
From Fig. 8 we can see that the red cluster (0) includes certain parts of Central America, Middle
Africa, parts of the Middle East, and South-Eastern Asia. The yellow cluster (1) consists of South
America, Mexico, USA, North and South Africa, Southern Europe, part of the Middle East, and North-
Eastern Asia. The violet cluster (2) consists of Northern European countries, Canada, and Oceania. It is
important to note that this clustering is made after PCA dimension reduction, which can alter similarities
between two countries, but overall should keep a generally accurate result.
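As an illustration of what the PCA projection does, the sketch below centers a small table, builds its covariance matrix, and finds the first two principal components by power iteration with deflation. It is a standard-library-only sketch under stated assumptions: the rows are hypothetical, the helper names are invented for the example, and the article does not say which PCA implementation was used.

```python
import math

# Hypothetical rows (GDP, social support, generosity), not WHR data.
rows = [
    [0.4, 0.8, 0.25], [0.7, 1.0, 0.15], [0.9, 1.2, 0.20],
    [1.1, 1.3, 0.10], [1.3, 1.5, 0.22], [1.4, 1.6, 0.18],
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def center(data):
    """Subtract each column's mean so every variable averages to zero."""
    means = [sum(col) / len(data) for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance(data):
    """Sample covariance matrix of already-centered rows."""
    cols = list(zip(*data))
    return [[dot(a, b) / (len(data) - 1) for b in cols] for a in cols]


def leading_component(cov, iterations=500):
    """Power iteration: returns (eigenvalue, unit eigenvector)."""
    v = [1.0 / math.sqrt(len(cov))] * len(cov)
    for _ in range(iterations):
        w = matvec(cov, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, matvec(cov, v)), v


centered = center(rows)
cov = covariance(centered)
value1, pc1 = leading_component(cov)
# Deflation removes the first component so the next call finds the second.
size = len(cov)
deflated = [[cov[i][j] - value1 * pc1[i] * pc1[j] for j in range(size)]
            for i in range(size)]
value2, pc2 = leading_component(deflated)

# Each country becomes a 2-D point that can be plotted and clustered.
projected = [(dot(row, pc1), dot(row, pc2)) for row in centered]
for point in projected:
    print(f"{point[0]:+.3f} {point[1]:+.3f}")
```

Because only the two directions of largest variance are kept, distances between countries in the projection can differ from their distances in the full six-variable space, which is the caveat noted in the paragraph above.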
Fig. 7. Principal component analysis projection of three clusters.
Fig. 8. World map color-coded by three clusters.
Fig. 9. All variables in relation to countries in the three clusters.
With the averages of the 3 clusters, we can compare their graphs. As the correlation analysis shows, GDP,
Social Support, Life expectancy, and Freedom are positively correlated with each other and with the
happiness score, which is reflected in the bar plots. Cluster 2 shows high performance in every variable.
Interestingly, the lower-happiness cluster 0 shows more "generosity" than cluster 1, which has higher
average happiness; the same goes for the perception of corruption, where cluster 0 is slightly higher
than cluster 1. This aligns with the previous analysis taken from the heatmaps and the histograms,
where we learned that both Corruption and Generosity are not closely related to overall happiness. This
can be beneficial for countries across the globe, especially countries in the lower cluster, because it
shows that while they excel in Perceptions of Corruption and Generosity, their overall Happiness on the
international spectrum does not reflect that advantage. Considering the nature of these two categories, it
is likely that lower-happiness countries have achieved high generosity and low perceptions of corruption
due to culture and community. It is important that national leaders, especially in this cluster, take
advantage of their countries' communities and focus their resources on GDP per capita, Social support,
and Healthy life expectancy, which have some of the highest positive correlations with the overall score.
This data is also very revealing for the WHR, as it shows the insignificance of 2 of their 6 main variables
in calculating happiness.
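The per-cluster comparison can be sketched as a simple group-and-average step. The rows below are hypothetical values chosen only to mirror the pattern described in the text (cluster 0 lower in score but higher in generosity than cluster 1). They are not WHR data, and the variable names are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (cluster label, values) rows, not WHR data.
rows = [
    (0, {"score": 4.1, "gdp": 0.45, "generosity": 0.24}),
    (0, {"score": 4.5, "gdp": 0.55, "generosity": 0.20}),
    (1, {"score": 5.6, "gdp": 1.00, "generosity": 0.14}),
    (1, {"score": 6.0, "gdp": 1.10, "generosity": 0.16}),
    (2, {"score": 7.2, "gdp": 1.35, "generosity": 0.28}),
    (2, {"score": 7.4, "gdp": 1.45, "generosity": 0.30}),
]

# Group the countries by their cluster label.
members = defaultdict(list)
for label, values in rows:
    members[label].append(values)

# Average every variable within each cluster.
averages = {
    label: {name: mean(v[name] for v in group) for name in group[0]}
    for label, group in members.items()
}

for label in sorted(averages):
    summary = ", ".join(f"{k}={v:.2f}" for k, v in averages[label].items())
    print(f"cluster {label}: {summary}")
```

Plotting these averages as one bar per cluster and variable gives a comparison of the kind shown in Fig. 9.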
VII. CONCLUSION
Some key takeaways from this deep analysis of the data are that while the World Happiness
Report states that its happiness rankings are based on the six variables of GDP per capita, Social
support, Life Expectancy, Freedom, Generosity, and Corruption, not all 6 variables are weighted equally
in the final score given to a country. For countries to increase their score, the most effective factors
will be, in descending order, GDP per capita, Social support, Life Expectancy, and Freedom. These 4 major
variables taken into consideration by the WHR have the highest correlation with the final happiness score.
Perceptions of Corruption and Generosity, on the other hand, have very little correlation with the
happiness score, and the clustering reflects this, as the lower-happiness cluster is shown to have a
similar or higher Generosity and Perceptions of Corruption rating.
Fig. 10. Count plot of countries in the three clusters.
The growth of world happiness is not even. The general pattern that the globe follows in all
6 major happiness variables is a small number of “leading countries” followed by the majority of “middle
class” countries and a minuscule number of “trailing countries” which are far behind in all categories.
This is reflected in Fig. 1. This is essential for all countries to realize, as the gap by which these
trailing countries are left behind is almost impossible for the individual countries to close on their
own. Unlike the majority of the “middle class”, the trailing countries are small in number and resources
and are several increments of score behind. The standard of living of these countries is already well
known by the majority of the world; what the analysis adds is that not only are the standards of living
low in these countries, so is their happiness. Happiness is a relatively undervalued measure of a
country's success, but because it is related to standard-of-living measures such as GDP per capita and
Life expectancy, the happiness of the “trailing countries” can be majorly improved along with these
factors.
Overall, based on the rankings from the World Happiness Report’s yearly report and the analysis of
its data, the public and experts can deduce global trends relating to happiness. It is essential at this
time to consider the happiness of countries and how to improve it. It is also important to use world
happiness as a method to illustrate the progress of countries across the globe relative to each other.
VIII. REFERENCES
China CDC weekly. (2021, July 9). Major trends in population growth around the world. Gu, D., Andreev, K., & Dupre, M. E.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8393076/.
Gallup, I. (2021, November 20). Global research. https://www.gallup.com/analytics/318875/global-research.aspx.
Our World in Data. (2013, May 9). World population growth. https://ourworldindata.org/world-population-growth.
Our World in Data. (2022, Jan 8). GDP per capita. https://ourworldindata.org/grapher/gdp-per-capita-worldbank.
The World by Income and Region. (2019, Feb 23). The world by income and region.
https://datatopics.worldbank.org/world-development-indicators/the-world-by-income-and-region.html.
Vancouver School of Economics at the University of British Columbia. (2019, Mar 20). Changing world happiness. John F.
Helliwell. https://worldhappiness.report/ed/2019/changing-world-happiness/.
World Happiness Report 2019. World happiness report 2019. (2019). https://worldhappiness.report/ed/2019/.
World Happiness Report 2020. World happiness report 2020. (2020). https://worldhappiness.report/ed/2020/.
World Happiness Report 2021. World happiness report 2021. (2021). https://worldhappiness.report/ed/2021/.
World Population Review. (2022, Jan 8). Happiest Countries in the World 2021.
https://worldpopulationreview.com/country-rankings/happiest-countries-in-the-world.