TNS
Theoretical and Natural Science
Proceedings of the 2nd International Conference
on Mathematical Physics and Computational Simulation
Glasgow, UK
August 9 - August 16, 2024
Volume 41
Editors
Anil Fernando
University of Strathclyde
Gueltoum Bendiab
University of Frères Mentouri
Marwan Omar
Illinois Institute of Technology
ISSN: 2753-8818
ISSN: 2753-8826 (eBook)
ISBN: 978-1-83558-493-4
ISBN: 978-1-83558-494-1 (eBook)
Publication of record for individual papers is online:
https://www.ewadirect.com/proceedings/tns/home/index
Copyright © 2024 The Authors
This work is fully Open Access. Articles are freely available to both subscribers and the wider public with permitted reuse.
No special permission is required to reuse all or part of article, including figures and tables. For articles published under
an open access Creative Common CC BY license, any part of the article may be reused without permission, just provided
that the original article is clearly cited. Reuse of an article does not imply endorsement by the authors or publisher.
The publisher, the editors and the authors are safe to assume that the advice and information in this book are believed to
be true and accurate at the date of publication. Neither the publisher nor the editors or the authors give a warranty,
expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been
made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This imprint is published by EWA Publishing
Address: John Eccles House, Robert Robinson Avenue, Oxford, England, OX4 4GP
Email: info@ewapublishing.org
Committee Members
CONF-MPCS 2024
General Chair
Anil Fernando, University of Strathclyde
Organizing Chair
Marwan Omar, Illinois Institute of Technology
Organizing Committee
Sharidya Rahman, Monash University
Büşra Oğuzhan, Çukurova University
Selda Kapan Ulusoy, Erciyes University
Mazhar Javed Awan, University of Management & Technology
Arshad Hassan Khan, FAST NUCES Islamabad
Technical Program Chair
Stavros Shiaeles, University of Portsmouth
Technical Program Committee
Bilyaminu Auwal Romo, University of East London
Bismark Singh, University of Southampton
Achintya Haldar, University of Arizona
Bhupesh Kumar, University of St Andrews
Altaf Khan, University of Mianwali
Yazeed Ghadi, Al Ain University
Jie Zhang, University of Bath
Mustafa Istanbullu, Çukurova University
Gueltoum Bendiab, University of Frères Mentouri, Constantine
Publicity Committee
Festus Adedoyin, Bournemouth University
Xiaolong Li, Beijing University of Posts and Telecommunications
Preface
The 2nd International Conference on Mathematical Physics and Computational Simulation
(CONF-MPCS 2024) is an annual conference focusing on research areas including mathematics, physics, and
simulation. It aims to establish a broad and interdisciplinary platform for experts, researchers, and
students worldwide to present, exchange, and discuss the latest advances and developments in
mathematics, physics, and simulation.
This volume contains the papers of the 2nd International Conference on Mathematical Physics and
Computational Simulation (CONF-MPCS 2024). Each of these papers has undergone a comprehensive
review by the editorial team and professional reviewers. Each paper has been examined and evaluated
for its theme, structure, method, content, language, and format.
Cooperating with prestigious universities, CONF-MPCS 2024 organized three workshops in Glasgow,
Constantine and Chicago. Dr. Anil Fernando chaired the workshop “Unlocking Video Contextual Ad
Insights: Enhancing Topic Explainability with Rich Multimodal Content Retrieval”, which was held
at University of Strathclyde. Dr. Marwan Omar chaired the workshop “Quantum Machine Learning:
Bridging Quantum Physics and Computational Simulations”, which was held at Illinois Institute of
Technology. Dr. Gueltoum Bendiab chaired the workshop “Machine Learning: Integrating Machine
Learning Techniques to Advance Network Security”, which was held at University of Frères
Mentouri.
Besides these workshops, CONF-MPCS 2024 also held an online session. Eminent professors from
top universities worldwide were invited to deliver keynote speeches in this online session, such as Dr.
Anil Fernando from University of Strathclyde and Dr. Marwan Omar from Illinois Institute of
Technology. They have given keynote speeches on related topics of mathematics, physics, and
simulation.
On behalf of the committee, we would like to express our sincere gratitude to all authors and speakers who
have contributed to CONF-MPCS 2024, to the editors and reviewers who have guaranteed the
quality of the papers with their expertise, and to the committee members who have devoted themselves to
the success of CONF-MPCS 2024.
Dr. Anil Fernando
General Chair of Conference Committee
Workshops
Workshop Glasgow: Unlocking Video Contextual Ad Insights: Enhancing Topic
Explainability with Rich Multimodal Content Retrieval
August 9th, 2024 (GMT+1)
Department of Computer and Information Sciences, University of Strathclyde
Workshop Chair: Prof. Anil Fernando, Professor at the University of Strathclyde
Workshop Constantine: Machine Learning: Integrating Machine Learning Techniques to
Advance Network Security
July 15th, 2024 (GMT+1)
Electronics Department, University of Frères Mentouri
Workshop Chair: Dr. Gueltoum Bendiab, Associate Professor at the University of Frères Mentouri
Workshop Chicago: Quantum Machine Learning: Bridging Quantum Physics and
Computational Simulations
October 10th, 2024 (UTC -5)
ITM Department, Illinois Institute of Technology
Workshop Chair: Dr. Marwan Omar, Associate Professor at the Illinois Institute of Technology
The 2nd International Conference on
Mathematical Physics and Computational
Simulation
CONF-MPCS 2024
Table of Contents
Committee Members
Preface
Workshops
Workshop
Machine Learning: Integrating Machine Learning Techniques to Advance Network Security
Analyzing musical tones with Fourier transformation ... 1
Xilin Hong
A method to test the uniform convergence of function series ... 6
Zhian Wu
Workshop
Quantum Machine Learning: Bridging Quantum Physics and Computational Simulations
The application of convex function and GA-convex function ... 10
Dingrun Zhao
Research on Improved Crowd Detection Based on YOLOv5 ... 16
Qi Wen, Kecheng Li, Yue Wang
Prediction of heart disease based on logistic regression ... 25
Zixin Zhang
Analysis of the market value of Premier League attacker ... 32
Wenji Liu
Harmonic analysis approach to the proof of Heisenberg inequality ... 37
Yuchen Wang
Research on the influencing factors of student performance ... 43
Chenrui Pei
Analysis of the Relationship between NBA Player Salary and Their On-Court Performances ... 51
Zijian Yang
Schrödinger equation for various quantum systems based on Heisenberg's uncertainty principle ... 59
Kexin An
Analysis of the Principles of Quantum Computing and State-of-the-Art Applications ... 65
Zhuolun Li
Advances in monocular ORB-SLAM system: a review ... 72
Ziyi Yuan
Prospects for the development of cartography through the integration of SLAM technology with GIS technology ... 78
Yaodong Tang
Comparative analysis of matrix factorization and graph convolutional networks in student ... 85
Tongye Wu
Review on VSLAM based on deep learning ... 91
Xin Shao
Intelligent assistive obstacle avoidance device based on SLAM and wearable technology ... 98
Yang Zhang
The research on the factors affecting the World Happiness index ... 104
Yizhi Zong
Quantum Entanglement and Qubit Interactions: The Key to Quantum Supremacy ... 112
Han Zhang
Quantum Neural Networks: A New Frontier ... 119
Boyu Zhang
Research on the Correlation between the Movement of the Dollar and the Price of Gold ... 126
Yanxi Zhan
Improvement of visual servo system of industrial robot based on sliding mode control and deep reinforcement learning ... 132
Yunzhe Zhou
The 2nd International Conference on Mathematical Physics and Computational Simulation
Optimizing supply chain networks using mixed integer linear programming (MILP) ... 139
Xu Li, Xiaoheng Ji, Xiaolong Zeng
Environmental monitoring system design based on STM32 platform ... 145
Yuhe Tie, Peiming Chen
Spacecraft design for interstellar travel ... 154
Leyan Ouyang
Review on application of fractional Fourier transform in Linear Frequency Modulation signal and communication system ... 167
Zhuoran Wang
The sum of four squares: An exploration of Lagrange's theorem and its legacy in number theory ... 175
Yifan Cheng
The model of price of sailing ships based on Lasso regression ... 180
Yueying Zhang, Xinyi Zhou, Yingfei Wang, Dongmin Wang
Leader-follower consensus for nonlinear multi-agent systems under directed topology ... 187
Sicheng Lu
Analyzing musical tones with Fourier transformation
Xilin Hong
School of Mathematical Sciences, Fudan University, Shanghai, 200433, China
23300180056@m.fudan.edu.cn
Abstract. This essay delves into the mathematical exploration of musical tones through the
application of Fourier Transformation, a pivotal tool in the field of digital signal processing and
acoustics. By converting complex musical tones from the time domain to the frequency domain,
Fourier Transformation enables the deconstruction of sounds into their constituent frequencies,
revealing the unique harmonic structures that contribute to the characteristic timbre of different
musical instruments. The focus of this analysis is particularly on the trumpet, chosen for its rich
harmonic content and distinctive sound. Through the examination of audio recordings, this study
uncovers the fundamental frequency and harmonics of the trumpet, demonstrating how these
elements combine to form its unique acoustic fingerprint. The process involves recording,
analyzing, and comparing musical tones using software tools like MATLAB and Python,
providing an accessible yet profound insight into the intersection of mathematics and music. This
essay not only highlights the technical methodology of Fourier Transformation in analyzing
musical tones but also explores its practical applications in music theory, digital audio processing,
and the broader field of acoustics. The findings underscore the transformative power of
mathematical analysis in understanding and appreciating the complex beauty of musical sounds,
opening avenues for further research and application in both the scientific and artistic domains.
Keywords: Fourier Transform, Musical Tones, Frequency Spectrum, Timbre Analysis
1. Introduction
Fourier Analysis, named after the French mathematician Jean-Baptiste Joseph Fourier, is a mathematical
technique that transforms a function of time, space, or any other variable into a function of frequency. It
decomposes complex waveforms into simpler components, specifically into sines and cosines, which
are easier to analyze and understand. This transformation is pivotal in numerous fields, including
engineering, physics, and, notably, music analysis [1-3].
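As a reference point for the discussion that follows, one common convention for the Fourier transform of a time signal is sketched below. The symbols $x(t)$ and $X(f)$, and the choice of sign and scaling, are assumptions made for illustration, since the paper does not state a formula.

```latex
X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-2\pi i f t}\, dt,
\qquad
e^{-2\pi i f t} = \cos(2\pi f t) - i \sin(2\pi f t)
```

Under this convention, $|X(f)|$ measures how strongly a sinusoid of frequency $f$ is present in the signal, which is the quantity shown in a magnitude spectrum.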
The relevance of Fourier Analysis to music stems from its ability to dissect musical tones into their
constituent frequencies, offering a deep dive into the acoustic properties that define the unique sound,
or timbre, of musical instruments [4]. In music theory and acoustics, the application of Fourier Analysis
transcends basic tone analysis. It plays a crucial role in digital signal processing, enabling technologies
such as MP3 compression, noise reduction, and the synthesis of musical sounds [5]. For musicians and
sound engineers, Fourier Analysis provides a scientific basis for crafting sounds and understanding their
interaction in compositions and recordings. It bridges the gap between the physics of sound production
and the perception of music, offering insights into the construction of musical instruments and the
development of electronic sound synthesis and audio processing tools. Chen studied the interaction
between music and vision based on the Fourier transform, providing a visualization method that can be used
in musicology-related research and interactive media creation [6]. By performing a Fourier
transform on the audio of different instruments and analyzing the distribution of harmony between
instruments, Yuan presented the characteristics of each style [7]. Xu used the discrete Fourier
transform to study the composition principle of musical notes in the original signal, and used the inverse
discrete Fourier transform to generate music [8].
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/20240109
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
This paper will analyze musical tones, specifically focusing on recordings from a trumpet, piano, and
flute. The primary objective is to utilize Fourier Analysis, via MATLAB, to dissect and compare the
unique frequency signatures of these instruments. The methodology spans data collection,
preprocessing, application of Fourier Transform, and subsequent analysis.
2. Applications of Fourier Transformation
Applying the Fourier Transformation to the audio samples of the trumpet, piano, flute, drum, and
triangle reveals intricate details about the acoustic properties that differentiate these instruments.
Utilizing MATLAB for the analysis, this section discusses the findings from the frequency domain
perspective, providing insights into the unique sonic signatures of each instrument.
2.1. Musical Instruments: Fundamental Frequencies and Harmonics
The Fourier Transform of each instrument's audio recordings revealed the fundamental
frequencies corresponding to the played notes, alongside multiple harmonics that contribute to the
timbre, or color, of the sound.
The trumpet recordings showed a strong fundamental frequency with a series of harmonics that
decayed less rapidly than those of the piano and flute (Figure 1). This characteristic brass “brightness”
is due to the trumpet’s ability to produce strong higher-order harmonics.
Figure 1. Time domain signal and frequency domain signal for the trumpet
Piano samples displayed a complex harmonic structure with a rich set of overtones (Figure 2). The
decay of these harmonics was more pronounced, contributing to the piano's distinct reverberation and
tonal complexity.
Figure 2. Time domain signal and frequency domain signal for the piano
The flute’s frequency spectrum was simpler, with a clear fundamental frequency and fewer
harmonics (Figure 3). The flute’s sound is purer and more sine-wave-like, owing to the instrument’s
acoustical properties, which favor the fundamental frequency over the harmonics.
Figure 3. Time domain signal and frequency domain signal for the flute
The visualizations produced in MATLAB effectively highlighted the differences in harmonic content
among the instruments. Plots of magnitude against frequency for each instrument at various notes
provided a visual representation of the acoustic fingerprints. These plots were instrumental in identifying
the unique patterns of harmonics that define the sound of each instrument.
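The kind of magnitude-against-frequency analysis described above can be sketched in Python with NumPy. This is a minimal illustration on a synthetic tone, not the MATLAB procedure used in the paper: the sampling rate, the 440 Hz fundamental, and the harmonic amplitudes are all assumed values chosen so that the expected peaks are known in advance.

```python
import numpy as np

FS = 44100                           # sampling rate in Hz (assumed)
F0 = 440.0                           # fundamental of the synthetic tone (assumed)
AMPLITUDES = [1.0, 0.6, 0.4, 0.25]   # hypothetical amplitudes of harmonics 1..4

# One second of samples: fundamental plus three harmonics.
t = np.arange(FS) / FS
tone = sum(a * np.sin(2 * np.pi * F0 * (h + 1) * t)
           for h, a in enumerate(AMPLITUDES))

# One-sided magnitude spectrum, scaled so a unit-amplitude sine has magnitude 1.
spectrum = np.abs(np.fft.rfft(tone)) * 2 / len(tone)
freqs = np.fft.rfftfreq(len(tone), d=1 / FS)

# With a 1 s window the bin spacing is 1 Hz, so every harmonic sits on a bin.
strongest = np.argsort(spectrum)[-len(AMPLITUDES):]
peaks = sorted((float(freqs[i]), float(spectrum[i])) for i in strongest)
fundamental = peaks[0][0]

for freq, mag in peaks:
    print(f"{freq:7.1f} Hz  magnitude {mag:.3f}")
```

For a real recording the peaks would not fall exactly on frequency bins, so a window function and a proper peak-picking step would be needed; those are left out here to keep the sketch short.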
2.2. Noise Instruments
Other instruments, such as drums and triangles, do not have a clear fundamental frequency and are
treated here as noise instruments.
For drums, which produce rich and complex sounds with a broad spectrum of frequencies due to
their diverse modes of vibration, the Fourier Transform can decompose these sounds into their
constituent frequencies (Figure 4). This decomposition helps in understanding the timbral qualities of
the drum, revealing how different materials, shapes, and sizes affect the sound's frequency content. By
analyzing the spectral content, sound engineers and instrument makers can modify and optimize drum
designs to achieve desired sound qualities, from the deep, resonant bass of a kick drum to the sharp,
concise attack of a snare.
Similarly, the triangle, despite its seemingly simple structure, produces a sound rich in overtones.
The Fourier Transform can uncover this intricate harmonic structure, showing a spectrum dense with
harmonics that contribute to its bright, penetrating quality (Figure 5). This analysis not only aids in the
crafting of triangles with specific tonal characteristics but also in digital synthesis and sampling, where
understanding the spectral content is crucial for recreating realistic triangle sounds.
Figure 4. Time domain signal and frequency domain signal for the drum
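One simple way to make the distinction between pitched and noise instruments concrete is to check whether the measured spectral peaks are close to integer multiples of the lowest peak. The sketch below is an illustration only: the function name `inharmonicity` and both lists of partial frequencies are hypothetical and are not taken from the recordings analyzed in this paper.

```python
def inharmonicity(partials_hz):
    """Largest distance of any partial/lowest-partial ratio from the nearest integer."""
    base = min(partials_hz)
    return max(abs(f / base - round(f / base)) for f in partials_hz)


# Hypothetical peak frequencies in Hz, chosen for illustration only.
harmonic_partials = [220.0, 440.0, 660.0, 880.0]   # integer multiples of 220 Hz
inharmonic_partials = [1000.0, 2730.0, 5290.0]     # ratios 1 : 2.73 : 5.29

harmonic_score = inharmonicity(harmonic_partials)
inharmonic_score = inharmonicity(inharmonic_partials)
print(harmonic_score, inharmonic_score)
```

A score near zero indicates a harmonic series with a clear fundamental, while a larger score indicates partials that do not line up with any single fundamental.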
2.3. Implications
The implications of this research extend far beyond academic inquiry, touching upon several practical
and theoretical aspects of music and sound engineering.
Understanding the frequency makeup of instruments can enhance teaching methods, providing
students with a more nuanced appreciation of music composition and instrument design. Insights into
the harmonic content and how it contributes to timbre can inform the design of new instruments or the
refinement of existing ones, aiming to achieve desired sound qualities. The principles uncovered through
Fourier Analysis are directly applicable in the development of audio processing software, including
effects, synthesis, and noise reduction algorithms. Producers and sound engineers can leverage this
knowledge to manipulate recordings more effectively, ensuring that the desired emotional and aesthetic
impacts of music are achieved.
Figure 5. Time domain signal and frequency domain signal for the triangle
3. Conclusion
The exploration of musical tones through Fourier Transformation has yielded significant insights into
the unique acoustic characteristics of the trumpet, piano, and flute. This analytical journey, underpinned
by mathematical rigor and facilitated by MATLAB, has illuminated the complex interplay between
fundamental frequencies, harmonics, and the resultant timbre of musical instruments. By dissecting
sound into its constituent frequencies, Fourier Analysis has provided a quantifiable understanding of
what gives each instrument its distinctive sound.
This study represents a step towards demystifying the complex relationship between the physics of
sound production and the perceived qualities of musical tones. The application of Fourier
Transformation offers a powerful lens through which to view the intricacies of music, providing a
foundation for future research and innovation in music theory, acoustics, and digital audio technology.
With the development of technology, the potential for new discoveries and applications in the realm of
music and sound engineering will be vast.
References
[1] Zhu, H., Wen, X., Jin, W., He, Z., and Zeng, Yi. (2015) Oil and gas detection based on
deconvolution short-time Fourier transform. Progress in Geophysics, 5, 6.
[2] Zhou, H. and Wang, Y. (2008) Fourier transform is used to measure motor speed. University
Physics Experiments, 21, 54-56.
[3] Yuan, J. (2020) Comparison of Harmony between Timbres of Different Musical Instruments:
Application of Fourier Transform in Music. Chinese Writers and Artists, 000(002), 35-35.
[4] Smith, J.O. (2007) Mathematics of the Discrete Fourier Transform (DFT), with Audio
Applications. W3K Publishing.
[5] Brown, J.C. (1991) Calculation of a Constant Q Spectral Transform. Journal of the Acoustical
Society of America, 89, 425-434.
[6] Chen, J. (2019) Research on music visualization creation method based on Fourier transform.
Science and Informatization, 30, 2.
[7] Yuan, J. (2020) Comparison of Harmony between Timbres of Different Musical Instruments:
Application of Fourier Transform in Music. Chinese Writers and Artists, 000(002), 35-35.
[8] Xu, Q. (2017) Musical tone analysis and generation based on Fourier transform. Electronic World,
4, 2.
A method to test the uniform convergence of function series
Zhian Wu
School of Mathematics and Statistics, Lanzhou University, Lanzhou, Gansu, China
wuzha21@lzu.edu.cn
Abstract. A series is the sum of infinitely many numbers or functions taken in a given order.
In many cases it is hard to determine whether a positive function series converges
uniformly. This article introduces a new method that replaces the sum of a function
series with an improper integral, designed for problems that cannot
be solved by the classical Weierstrass M-test. The Cauchy uniform convergence test serves as
the basis for the entire proof because it shifts the focus from the whole sum to
a partial sum of the function series, whose value can be more easily replaced by the value of
an improper integral. Basic knowledge of improper integrals then settles the question of uniform
convergence. With this method, it becomes possible to test the uniform convergence of
irregular function series and even to estimate their values.
Keywords: Uniform convergence of function series, Improper integral, Cauchy's convergence
test, Weierstrass M-test, Mathematical analysis.
1. Introduction
A series is the sum of the terms of a countable sequence of real numbers, and studying this sum is important [1].
According to historical records, Archimedes was the first person to give the sum of an infinite series. When calculating
the area under the arc of a parabola, he used the method of exhaustion, which also gave a very close approximation of
the value of π [2-3]. However, people later realized that testing the convergence of a series, rather than
directly calculating its sum, could reveal the properties of the series indirectly. Since then,
mathematicians have devoted themselves to the study of series convergence.
The function series, the topic of this paper, is a series whose terms are functions. Among the
various types of convergence, uniform convergence is especially desirable because many properties
of the terms of a function series are preserved by its limit function [4]. If a function series is equicontinuous,
then continuity transfers to the limit function. Cauchy was the first to put forward a theory of
uniform convergence. Later, Seidel and Stokes pointed out the limitations of Cauchy's work [5]. Cauchy then
acknowledged their advice and arrived at Stokes's conclusions [6]. Thomae used Cauchy’s theory in
his own theory of functions without at first recognizing the difference between uniform convergence and
non-uniform convergence [7]. The Weierstrass M-test is also helpful for testing the uniform convergence of
function series, but it is not a universal method [8]. Florentin used improper integrals to approximate
the value of positive series, but a method that uses improper integrals to test the uniform convergence of function
series has not yet appeared [9]. The aim of this paper is to give such a test for function
series.
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/20240105
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
The paper is organized as follows. Section 2 presents the basic knowledge. Section 3
gives the proof of the method. Section 4 shows two applications of the method.
2. Basic Knowledge
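Let us recall some facts about improper integrals and function series that will be used in the proof.
Theorem 2.1 (Cauchy’s convergence test) The series $\sum_{n=1}^{\infty} u_n(x)$ is uniformly convergent
if and only if, for every $\varepsilon > 0$, there exists a natural number $N$ such that for every
$p \in \mathbb{N}$ and every $x$ in the domain, when $n > N$,
$$\left| u_{n+1}(x) + u_{n+2}(x) + \cdots + u_{n+p}(x) \right| < \varepsilon \tag{1}$$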
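Example 2.2: Let $u_n(x) = x^n$ for all $x \in [0, \rho]$, $0 < \rho < 1$. Prove that the series $\sum u_n(x)$ is uniformly convergent.
Proof:
Since $|u_k(x)| \le \rho^k$ for all $k \in \mathbb{N}$, it follows that
$$\left| u_{n+1}(x) + \cdots + u_{n+p}(x) \right| \le \rho^{n+1} + \cdots + \rho^{n+p} = \frac{\rho^{n+1}\left(1 - \rho^{p}\right)}{1 - \rho} \le \frac{\rho^{n+1}}{1 - \rho}.$$
Since $0 < \rho < 1$, as $n \to \infty$, $\rho^{n+1}/(1-\rho) \to 0$. Hence $\sum u_n(x)$
is uniformly convergent.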
For the integral $\int_a^{\infty} f(x)\,dx$, if it is convergent, its value can be obtained by replacing the infinity
with a natural number $A$ and taking the limit:
$$\int_a^{\infty} f(x)\,dx = \lim_{A \to \infty} \int_a^{A} f(x)\,dx \tag{2}$$
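Example 2.3: Calculate the integral $\int_1^{\infty} \frac{1}{x^{2}}\,dx$.
Choose a natural number $b$ that is sufficiently large to replace the infinity, then:
$$\int_1^{\infty} \frac{1}{x^{2}}\,dx = \lim_{b \to \infty} \int_1^{b} \frac{1}{x^{2}}\,dx = \lim_{b \to \infty} \left(1 - \frac{1}{b}\right) = 1 \tag{3}$$
Hence the value of the improper integral is known.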
3. Method
Let us introduce some notation:
$$R_n(x) = \sum_{k=n+1}^{\infty} u_k(x) \tag{4}$$
$$I_n(x) = \int_n^{\infty} f(t)\,dt \tag{5}$$
Theorem 3.1 For all $k \in \mathbb{N}$, let $f(k) = u_k(x)$, where for each fixed $x$ the function $f$ is
non-negative, continuous, and monotonically decreasing on every interval $[k, k+1]$. Then the following bounds
can be used to test the uniform convergence of the series:
$$I_{n+1}(x) \le R_n(x) \le I_n(x) \tag{6}$$
Proof:
According to Cauchy's convergence test, testing whether $R_n(x)$ converges to 0 uniformly in $x$ is
equivalent to testing the uniform convergence of $\sum u_k(x)$.
Hence consider the interval $[n, \infty)$ on which $f$ is defined, divided into unit
subintervals $[n, n+1]$, $[n+1, n+2]$, …, $[n+p-1, n+p]$, … for every $p \in \mathbb{N}$.
The total sum of $f(k)$ over all $k \ge n+1$ is exactly $R_n(x)$:
$$R_n(x) = \sum_{k=n+1}^{\infty} f(k) \tag{7}$$
The improper integral now gives upper and lower bounds for $R_n(x)$. Because $f$ is decreasing,
$f(t) \le f(k)$ for $t \in [k, k+1]$, so
$$I_{n+1}(x) = \sum_{k=n+1}^{\infty} \int_k^{k+1} f(t)\,dt \le \sum_{k=n+1}^{\infty} f(k) = R_n(x) \tag{8}$$
On the other side, $f(k) \le f(t)$ for $t \in [k-1, k]$, so
$$R_n(x) = \sum_{k=n+1}^{\infty} f(k) \le \sum_{k=n+1}^{\infty} \int_{k-1}^{k} f(t)\,dt = I_n(x) \tag{9}$$
Combining (8) and (9) gives (6).
When a function series is hard to handle with the classical Weierstrass
M-test, researchers can use this method to turn the series test into the question of
whether the improper integral is convergent or not.
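The idea of bounding the tail of a function series between two improper integrals can be checked numerically. The sketch below is not one of the paper's examples: the terms $x^{2} e^{-kx}$, the grid of $x$ values, and the truncation length are assumptions chosen because the tail integral has the simple closed form $\int_n^{\infty} x^{2} e^{-tx}\,dt = x e^{-nx}$ for $x > 0$.

```python
import numpy as np

ALPHA = 2.0  # illustrative exponent (assumption, not taken from the paper)


def remainder(n, x, terms=5000):
    """Truncated tail sum_{k=n+1}^{n+terms} x**ALPHA * exp(-k*x) on a grid x."""
    k = np.arange(n + 1, n + terms + 1)[:, None]
    return np.sum(x[None, :] ** ALPHA * np.exp(-k * x[None, :]), axis=0)


def tail_integral(n, x):
    """Closed form of int_n^inf x**ALPHA * exp(-t*x) dt for x > 0."""
    return x ** (ALPHA - 1) * np.exp(-n * x)


x = np.linspace(0.01, 10.0, 200)   # grid standing in for the domain (assumed)
sup_upper = {}
for n in (5, 10, 20, 40):
    r = remainder(n, x)
    lower, upper = tail_integral(n + 1, x), tail_integral(n, x)
    # The tail sum should sit between the two integrals at every grid point.
    assert np.all(lower <= r + 1e-12) and np.all(r <= upper + 1e-12)
    sup_upper[n] = float(upper.max())
    print(n, float(r.max()), sup_upper[n])
```

For this assumed series the largest value of the upper integral over the grid shrinks as $n$ grows, which is the behaviour one looks for when using the integral bound to argue uniform convergence. A finite grid can only suggest the result; it does not replace the proof.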
4. Application
Next, let us apply the method to series that the Weierstrass M-test cannot handle directly.
Example 4.1 [10]: For α > 0, discuss the uniform convergence of the series [...] on [0, ∞).
Proof:
When 0 < α ≤ 1:
By the method, the remainder of the series can be compared with the corresponding improper integral.
Let x = [...] and n → ∞. It is easy to conclude that the remainder does not converge to 0, so the
series is not uniformly convergent.
When α > 1:
Using this method, the remainder of the series is again compared with the corresponding improper
integral. Since α > 1, the integral converges to 0 when n is chosen sufficiently large.
Hence the series is uniformly convergent on [0, ∞).
Example 4.2: Show that the series [...] is uniformly convergent for x ∈ [0, ∞).
Proof:
It is easy to see that the M-test does not apply to this series. So, using the method above, the
remainder of the series is bounded above by the corresponding improper integral. If a sufficiently
large natural number n is chosen, this bound is less than ɛ for every ɛ > 0. In this way the series
is shown to be uniformly convergent.
These examples show that using improper integrals to evaluate function series is
helpful when the Weierstrass M-test is not applicable.
5. Conclusion
Improper integrals and infinite series are closely connected in both
theory and application. When solving certain improper integrals, they can be transformed into
infinite series summation. In this paper, a new method to bypass the Weierstrass M-test and obtain uniform
convergence is given and rigorously proved. This method makes it possible to use improper integrals to
determine the uniform convergence of function series. In addition, by calculating the improper
integral, the value of the function series can be roughly estimated, which greatly facilitates
approximate calculations in practical applications. However, this method only applies when the function
series is positive. A method that can test all function series remains a goal for future work.
References
[1] Thompson, S. and Gardner, M. (1998) Calculus Made Easy. Macmillan and Co., London.
[2] O’Connor, J.J. and Robertson, E.F. (1996) A history of calculus. University of St Andrews.
[3] James, K. B. (1993) Archimedes and Pi-Revisited. School Sci. Math., 94, 127-29.
[4] Nicholas, P. (2020) A note on convergence of sequences of functions. Topol. Appl., 275.
[5] Viertel, K. (2021) The development of the concept of uniform convergence in Karl Weierstrass’s
lectures and publications between 1861 and 1886. Arch. Hist. Exact Sci., 455-490.
[6] Henrik, K. S. (2005) Exceptions and counterexamples: Understanding Abel's comment on
Cauchy’s Theorem. Hist. Math., 32, 453-480.
[7] Christian, K. and Tanguy, R. (2005) How can we escape Thomae’s relations? J. Math. Soc. Japan,
183-210.
[8] Rudin, W. (1953) Principles of Mathematical Analysis. McGraw-Hill, Inc., New York.
[9] Florentin, S. (2006) A Triple Inequality with Series and Improper Integrals. Bull. Pure Appl.
Sci., 25,
[10] Chen, J., Yu, C. and Lu, J. (2018) Genuine Mathematical Analysis. Higher Education Press.
The application of convex function and GA-convex function
Dingrun Zhao
School of Mathematics and Statistics, Central South University, Changsha, Hunan,
410083, China
7805220128@csu.edu.cn
Abstract. A convex function is a real-valued function on a convex subset of a vector space whose graph
lies on or below each of its chords. Convex functions have important properties, such as a monotone
derivative and a non-negative second derivative where these exist, which help in deriving and proving
inequalities. This paper
explores the concepts of convex functions and GA-convex functions, demonstrating their utility
in proving a variety of common and complex inequalities. Beginning with an overview of convex
functions and their extension to GA-convex functions, the study shows how these mathematical
tools can be effectively utilized in the context of inequality proofs. By leveraging the properties
of these functions, the paper successfully establishes rigorous proofs for a range of inequalities,
highlighting the versatility and applicability of convex and GA-convex functions in
mathematical analysis. The properties of convex and GA-convex functions allow us to use them to
determine the direction of inequalities, prove inequalities, determine the optimal solution of
inequalities, and even prove the Cauchy inequality.
Keywords: Convex function, GA-convex function, Application.
1. Introduction
The concavity and convexity of functions have many applications in proving inequalities. In 2004, Cha
studied formulas related to the theorems of convex functions and derived several important
inequalities, which were then applied to prove further inequalities and to solve conditional extremum
problems [1]. In 2005, Xia derived Jensen’s inequality from the concavity, convexity, and continuity
of functions [2]. Also in 2005, Wu defined square-convex functions, gave methods for identifying them,
and established a Jensen-type inequality for them [3]. In 2010, Song and Wan obtained a more concise
Hadamard-type inequality for GA-convex functions through their study of GA-convex functions [4]. In
2013, Shi et al. obtained a new refinement of the Hermite-Hadamard-type inequality for GA-convex
functions [5]. In the same year, Shi et al. derived some new weighted Hadamard-type inequalities for
differentiable GA-convex functions [6]. In 2022, Wu and Mao proved the Hermite-Hadamard inequality on
a special region [7].
This article mainly introduces convex functions and GA-convex functions. Section 2 introduces the
definition of convex functions and its equivalent forms, extends it from two points to n points, and
then proves several common inequalities using these properties. Section 3 moves from convex functions
to GA-convex functions: it introduces their definition, proves their properties, constructs an
inequality, and then proves a more complex chain of inequalities.
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/20240107
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
2. Convex Function and its application
2.1. Properties of concave-convex function
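The definition of concave and convex functions is introduced first, followed by an explanation of
their properties.
Definition 2.1. ([8]) The original definition of convex functions is derived from geometric intuition.
Consider the curve $y = f(x)$, $x \in [a, b]$, and take $x_1, x_2 \in [a, b]$ such that $x_1 < x_2$.
The equation of the chord passing through the points $(x_1, f(x_1))$ and $(x_2, f(x_2))$ is
$$y = f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1}(x - x_1) = \frac{x_2 - x}{x_2 - x_1} f(x_1) + \frac{x - x_1}{x_2 - x_1} f(x_2). \tag{1}$$
So $f(x)$ is concave upwards (or downwards) on the interval $[a, b]$ if, for every $x \in [x_1, x_2]$,
$$f(x) \le (\ge)\ \frac{x_2 - x}{x_2 - x_1} f(x_1) + \frac{x - x_1}{x_2 - x_1} f(x_2). \tag{2}$$
Property 2.2. Suppose $f(x)$ is concave upwards (or downwards) on the interval $[a, b]$. Then for
$x_1, x_2 \in [a, b]$ and $q_1, q_2 > 0$ with $q_1 + q_2 = 1$ it holds that
$f(q_1 x_1 + q_2 x_2) \le (\ge)\ q_1 f(x_1) + q_2 f(x_2)$.
Proof: For $x \in (x_1, x_2)$, let
$$\lambda = \frac{x_2 - x}{x_2 - x_1}, \qquad 1 - \lambda = \frac{x - x_1}{x_2 - x_1}, \qquad x = \lambda x_1 + (1 - \lambda) x_2 \quad (0 < \lambda < 1). \tag{3}$$
If $x_1$ and $x_2$ in equations (1) and (3) are interchanged, the result remains unchanged. This means
that the above results are independent of whether $x_1$ is greater than or less than $x_2$, as long as
$x$ lies between them. Therefore, set
$$q_1 = \frac{x_2 - x}{x_2 - x_1} > 0, \qquad q_2 = \frac{x - x_1}{x_2 - x_1} > 0, \qquad q_1 + q_2 = 1, \qquad x = q_1 x_1 + q_2 x_2. \tag{4}$$
So the condition that $f(x)$ is concave upwards (or downwards) on the interval $[a, b]$ can be written
in another form:
$$f(q_1 x_1 + q_2 x_2) \le (\ge)\ q_1 f(x_1) + q_2 f(x_2). \tag{5}$$
Definition 2.3. Let $f(x)$ be defined on the interval $[a, b]$. If for all $x_1, x_2 \in [a, b]$ and all
$q_1, q_2 > 0$ with $q_1 + q_2 = 1$,
$$f(q_1 x_1 + q_2 x_2) \le (\ge)\ q_1 f(x_1) + q_2 f(x_2), \tag{6}$$
then $f(x)$ is said to be concave up (or concave down) on the interval $[a, b]$.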
2.2. The Application of Convex Functions in Proving Inequalities
In this subsection, common inequalities are proven using the properties of convex functions. First, a
lemma is introduced.
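Lemma 2.4. Let $f(x)$ be concave upwards (or downwards) on $[a, b]$ and let $x_i \in [a, b]$
$(i = 1, 2, \ldots, n)$. Then
$$f\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) \le (\ge)\ \frac{f(x_1) + f(x_2) + \cdots + f(x_n)}{n}. \tag{7}$$
Proof: By induction. When $n = 2$, the proposition follows from (6) with $q_1 = q_2 = 1/2$. Assume it
holds for $n = k$, and prove that it also holds for $n = k + 1$, where $x_i \in [a, b]$:
$$f\left(\frac{x_1 + x_2 + \cdots + x_{k+1}}{k+1}\right) = f\left(\frac{k}{k+1} \cdot \frac{x_1 + x_2 + \cdots + x_k}{k} + \frac{1}{k+1}\, x_{k+1}\right).$$
Let $A = \frac{x_1 + x_2 + \cdots + x_k}{k} \in [a, b]$. Then, by (6) and the induction hypothesis,
$$f\left(\frac{k}{k+1} A + \frac{1}{k+1}\, x_{k+1}\right) \le (\ge)\ \frac{k}{k+1} f(A) + \frac{1}{k+1} f(x_{k+1}) \le (\ge)\ \frac{f(x_1) + f(x_2) + \cdots + f(x_{k+1})}{k+1}. \tag{8}$$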
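Example 2.5. Let $a_i > 0$ $(i = 1, 2, \ldots, n)$. Prove:
$$\frac{n}{\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}} \le \sqrt[n]{a_1 a_2 a_3 \cdots a_n} \le \frac{a_1 + a_2 + \cdots + a_n}{n}. \tag{9}$$
Proof: First prove the right half of the inequality.
$$\sqrt[n]{a_1 a_2 a_3 \cdots a_n} \le \frac{a_1 + a_2 + \cdots + a_n}{n} \iff \frac{1}{n} \ln (a_1 a_2 \cdots a_n) \le \ln\left[\frac{a_1 + a_2 + \cdots + a_n}{n}\right] \iff \frac{\ln a_1 + \ln a_2 + \cdots + \ln a_n}{n} \le \ln \frac{a_1 + a_2 + \cdots + a_n}{n}. \tag{10}$$
The last inequality follows from Lemma 2.4 applied to the function $f(x) = \ln x$, which is concave
downwards on $(0, +\infty)$. Replacing $a_i$ with $1/a_i$ $(i = 1, 2, \ldots, n)$ in the right half
proves the left half of the inequality.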
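The chain of means in Example 2.5 (harmonic mean, then geometric mean, then arithmetic mean) can be spot-checked numerically. The following is a minimal sketch on randomly drawn positive numbers; the function names, the sampling range and the tolerance are assumptions made for illustration, and a numerical check is of course not a proof.

```python
import math
import random


def harmonic_mean(values):
    return len(values) / sum(1.0 / v for v in values)


def geometric_mean(values):
    # Computed through logarithms, mirroring the ln-based proof.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def arithmetic_mean(values):
    return sum(values) / len(values)


def check_mean_chain(values, tol=1e-12):
    """Return True if HM <= GM <= AM holds up to a small rounding tolerance."""
    h = harmonic_mean(values)
    g = geometric_mean(values)
    a = arithmetic_mean(values)
    return h <= g + tol and g <= a + tol


random.seed(0)
samples = [[random.uniform(0.1, 10.0) for _ in range(5)] for _ in range(1000)]
all_ok = all(check_mean_chain(sample) for sample in samples)
print(all_ok)
```

Equality in both halves occurs only when all the numbers are equal, which is why a tolerance is used for the comparison.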
3. GA-Convex Functions
3.1. Characteristics of GA-Convex Functions
The definition of GA-convex functions is introduced first, followed by an explanation of their
properties.
Definition 3.1. ([9]) Let $f(x)$ be a function defined on an interval $I \subseteq (0, +\infty)$. If
for any $x_1, x_2 \in I$ and $t \in (0, 1)$,
$$f\left(x_1^{t} x_2^{1-t}\right) \le t f(x_1) + (1 - t) f(x_2), \tag{11}$$
then $f(x)$ is called a GA-convex (GA-subconvex) function on $I$; if the inequality sign is reversed,
it is called a GA-concave (GA-superconvex) function on that interval.
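The defining inequality can be tested on sample points. The sketch below takes the GA-convexity condition in its usual form, f(x1^t x2^(1-t)) <= t f(x1) + (1 - t) f(x2), and checks it on random points of an interval; the helper name `ga_convex_on_samples`, the interval and the tolerance are assumptions introduced for illustration. A passing check only suggests GA-convexity, while a single failing sample disproves it.

```python
import math
import random


def ga_convex_on_samples(f, low, high, trials=2000, tol=1e-9, seed=0):
    """Test f(x1^t * x2^(1-t)) <= t*f(x1) + (1-t)*f(x2) on random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1 = rng.uniform(low, high)
        x2 = rng.uniform(low, high)
        t = rng.uniform(0.0, 1.0)
        geometric_point = (x1 ** t) * (x2 ** (1.0 - t))
        if f(geometric_point) > t * f(x1) + (1.0 - t) * f(x2) + tol:
            return False  # counterexample found
    return True


print(ga_convex_on_samples(lambda x: x, 0.5, 20.0))    # identity
print(ga_convex_on_samples(lambda x: -x, 0.5, 20.0))   # negated identity
print(ga_convex_on_samples(math.log, 0.5, 20.0))       # logarithm (equality case)
```

For the identity function the condition is the weighted arithmetic-geometric mean inequality, and for the logarithm both sides coincide, so the logarithm is both GA-convex and GA-concave.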
Theorem 3.2. If a function 󰇛󰇜 is GA-convex on the interval 󰇛󰇜󰇛󰇜 , then for any
󰇛󰇜 and for 󰇛󰇜 , the function 󰇛󰇜 is GA-subconvex function on the interval
󰇛󰇜.
Proof: Let any 󰇛󰇜, and 󰇛󰇜, then
1
2
1󰇡1
2
1󰇢1󰇛1󰇜2
1󰇛1󰇜2󰇛1󰇜󰇛1󰇜󰇛2󰇜(12)
Where 󰇛󰇜 is GA-convex on 󰇛󰇜 . For any 󰇛󰇜 , since 󰇛󰇜 is GA-subconvex
function on 󰇛󰇜, for any 󰇛󰇜 it holds
1󰇛1󰇜2󰇛1󰇜󰇛2󰇜1󰇛1󰇜󰇛1󰇜󰇛2󰇜(13)
Therefore, 󰇛󰇜 is GA-subconvex function on interval 󰇛󰇜.
Theorem 3.3. Let a function $f(x)$ be twice differentiable on the interval $I \subseteq (0, +\infty)$.
Then:
(1) $f(x)$ is GA-convex on $I$ if and only if $x f''(x) + f'(x) \ge 0$ holds for all $x$ in $I$.
(2) $f(x)$ is GA-concave on $I$ if and only if $x f''(x) + f'(x) \le 0$ holds for all $x$ in $I$.
Proof: This follows from the connection between the sign of the second derivative of a function on an
interval and the concavity or convexity of the function.
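The connection with the second derivative can be made explicit by a change of variable. The following is a sketch under the assumption that the criterion in question is the sign of x f''(x) + f'(x) and that f is twice differentiable on an interval I of positive numbers; the auxiliary function g and the variable u are names introduced here, not taken from the paper.

```latex
% Sketch: reduce GA-convexity of f on I to ordinary convexity of g(u) = f(e^u).
% Here \ln I denotes the set of all \ln x with x in I.
\begin{align*}
g(u) &= f(e^{u}), \qquad u \in \ln I, \\
f\!\left(x_1^{t} x_2^{1-t}\right) &= g\bigl(t u_1 + (1 - t) u_2\bigr), \qquad u_i = \ln x_i, \\
g'(u) &= e^{u} f'(e^{u}), \\
g''(u) &= e^{u} f'(e^{u}) + e^{2u} f''(e^{u})
        = x \bigl[\, x f''(x) + f'(x) \,\bigr], \qquad x = e^{u} > 0 .
\end{align*}
```

The second line shows that f is GA-convex on I exactly when g is convex in the ordinary sense, that is, when g'' is non-negative. Since x is positive, this is equivalent to x f''(x) + f'(x) being non-negative. For instance, f(x) = x gives the value 1, so the identity is GA-convex, while f(x) = ln x gives 0, so the logarithm is both GA-convex and GA-concave.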
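Theorem 3.4. Suppose $f(x)$ is GA-concave on the interval $I$ and $x_i \in I$ $(i = 1, 2, \ldots, n)$.
It holds that
$$f\left(\sqrt[n]{x_1 x_2 \cdots x_n}\right) \ge \frac{f(x_1) + f(x_2) + \cdots + f(x_n)}{n}.$$
Proof: This theorem can be proved by induction. In compact form, if $f(x)$ is GA-concave on the
interval $I$, then
$$f\left(\sqrt[n]{x_1 x_2 \cdots x_n}\right) \ge \frac{1}{n} \sum_{i=1}^{n} f(x_i). \tag{14}$$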
3.2. Applications of GA-convex functions.
Theorem 3.5. ([10]) Suppose function 󰇟󰇠󰇛󰇜 is GA-Concave, it holds
󰇛󰇛
󰇜
󰇜
󰇛󰇜
󰇛

󰇜󰇛󰇜󰇛
󰇜󰇛󰇜 (15)
If function is GA-Convex, inverting the inequality sign is sufficient.
Proof: First prove the inequality on the right-hand side. It can be proved easily by taking the logarithm
on both sides. Let 

 and 


Let 
, it is easy to infer 󰇛󰇜. By the properties of GA-Concave, the following formula
can be derived.
󰇛󰇜
󰇛󰇜󰇛󰇜
󰇛󰇜󰇟󰇛󰇜󰇛󰇜󰇛󰇜󰇠
󰇛󰇜
󰇟󰇛󰇜󰇛󰇜󰇛󰇜󰇠
󰇛󰇜
󰇟󰇛󰇜󰇛󰇜󰇛󰇜󰇠󰇛󰇜󰇛󰇜
󰇟󰇛󰇜󰇛󰇜󰇛󰇜󰇠
󰇛󰇜󰇛󰇜󰇛󰇛󰇜󰇛󰇜󰇜󰇛󰇜

󰇛
󰇜󰇛󰇜
󰇛󰇜 (16)
Dividing both sides by will get the inequality on the right-hand side. By the same way, the
inequality on the left-hand side can be proved. Let 󰵎󰵎󰇟󰇠. By the
definition of a definite integral and Theorem 3.4, the following formula can be derived.
1
󰇛󰇜


1
󰇛Δ󰇜
1
󰇛󰇛Δ󰇜󰇜
1
󰇛
󰇟󰇛Δ󰇜
1
󰇠󰇜󰇛
󰇟󰇛Δ󰇜
1󰇠󰇜
󰇛󰇝1
Δ

Δln 󰇡Δ󰇢
1󰇞󰇛󰇝 1

󰇞󰇜󰇛1
󰇧
󰇜1
󰇨 (17)
When 󰇛󰇜 , the inequality in (15) holds.
Example 3.6. ([10]) Suppose ,

󰇛󰇜
󰇛
󰇜
󰇛
󰇜
(18)
Proof: This example can be proven by GA-concave functions and Theorem 3.5 By substituting
󰇛󰇜
into the inequality on the left side of (15), it follows
1
󰇛
󰇜1

2
1
󰇛
󰇜1

1
󰇛
󰇜1
4
9󰇛
󰇜2󰇛󰇜2
41
󰇛
󰇜1
 (19)
Substituting 󰇛󰇜 into the inequality on the right side of (15) results in

(20)
Next, the proof of Example 3.6 reduces to prove:

2 (21)
Suppose , the original formula can be simplified as 󰇛󰇜󰇛󰇜
Construct a function 󰇛󰇜󰇛󰇜 and utilize the Lagrange Mean Value Theorem 󰇛󰇜󰇛󰇜

󰆒󰇛󰇜󰇛󰇜󰇛󰇜󰇛󰇜󰇛󰇜.
Due to this common inequality:
1
11
1󰇛1󰇜2󰇛1󰇜 (22)
Therefore, the inequality (21) is proved.
Replacing and b with and in (21) results in 󰇛󰇜
 
, multiplying both sides by
, 
󰇛󰇜
can be obtained.
Only the last inequality needs to be proven now.
4
9󰇛
󰇜2
2
󰇛󰇜󰇛󰇜2
24
9󰇛󰇜20
1
18 󰇟󰇛4󰇜󰇛󰇜2󰇠0(23)
Therefore, the inequality Example 3.6 is proved.
4. Conclusion
This article first introduces the definition of convex functions from a geometrically intuitive
perspective, then extends it from two points on an interval to n points, and uses the result to show
that the harmonic mean is less than or equal to the geometric mean, which is in turn less than or equal
to the arithmetic mean. The subsequent section extends ordinary convex functions to GA-convex
functions, studies their necessary and sufficient conditions and properties, and finally constructs an
inequality to prove the complex inequality chain in the example. Convex functions can thus be used to
prove seemingly complex inequalities, although they also require assistance from other tools of
mathematical analysis. It is hoped that future research will build on this foundation and continue to
advance the understanding and application of convex functions in the realm of inequalities.
References
[1] Cha, L. (2004). Convex functions and inequalities. Journal of Ningbo Vocational and Technical
College, 8, 3.
[2] Xia, H. (2005). Convex functions and inequalities. Journal of Changzhou Institute of Technology,
18, 3.
[3] Wu, S. (2005). Square convex functions and Jensen-type inequalities. Journal of Capital Normal
University: Natural Science Edition, 26, 6.
[4] Song, Z. and Wan, X. (2010). Hadamard-type inequalities for GA-convex functions. Science,
Technology and Engineering, 23, 3.
[5] Shi, T., Wu, H. and Jiao, Z. (2013). Two functions related to Hermite-Hadamard type inequalities
for GA-convex functions. Journal of Guizhou Normal University: Natural Science Edition, 31,
5.
[6] Shi, T. and Wu, H. (2013). Weighted Hadamard-type inequalities for differentiable GA-convex
functions. Journal of Chongqing University of Science and Technology: Natural Science, 6, 5.
[7] Wu, Q. and Mao, Y. (2022). Properties of Multivariate Convex Functions and Their Hermite-
Hadamard Inequality. Mathematics in Practice and Understanding, 52, 268-272.
[8] Zhou, Z. (2006). In the process of proving inequalities, one must follow the general rules and
basic methods of reasoning for proving problems, and also, due to the ‘inequality’ aspect, it is
necessary to adopt some special proof methods. This article will use one of the properties of
functions - convexity - to prove some inequalities in high school algebra. Journal of Lanzhou
Institute of Education, 4, 58-60.
[9] Wu, S. (2004). GA-convex functions and the Poincaré-type inequality. Journal of Guizhou Normal
University (Natural Science Edition), 2, 52-55.
[10] Hua, Y. (2008). Hadamard-type inequalities for GA-convex functions. College Mathematics, 24,
3.
Research on Improved Crowd Detection Based on YOLOv5
Qi Wen1, Kecheng Li2,4, Yue Wang3
1School of Computer Science, University of Xi'an for Polytechnic, Xian, China
2College of Computer and Cyber Security, Chengdu University of Technology,
Chengdu, China
3School of Optical-Electrical and Computer Engineering, University of Shanghai for
Science and Technology, Shanghai, China
4li.kecheng@student.zy.cdut.edu.cn
Abstract. With accelerating urbanization and rising material living standards, the flow of people in
public spaces is gradually becoming saturated. Monitoring equipment in public places continuously
records a huge amount of pedestrian flow information, but the crowds it captures tend to be dense and
heavily occluded, and traditional machine learning cannot identify large, dense crowds accurately and
efficiently. If deep learning can be used to process the crowded scenes captured by surveillance
cameras and accurately identify the number of people in public places, it can provide an important
safeguard for managing the flow of people in public areas and for safety. However, for crowded targets
with occlusions, traditional target detection algorithms sometimes perform poorly. Against this
background, this paper introduces an enhanced deep learning framework based on the YOLOv5 neural
network for crowd detection, aimed at the dense and crowded conditions of public areas. The framework
improves the convolutional layer C3 in the backbone structure of the YOLOv5 neural network by adding
the CBAM attention mechanism. Compared with the original YOLOv5, the improved model increases the
maximum F1 value of crowd recognition at near, middle and far distances. In summary, the improved
YOLOv5-based deep learning framework proposed in this paper significantly improves the recognition
accuracy for crowded people in public areas.
Keywords: YOLOv5, Crowds, Image recognition, CBAM attention mechanism.
1. Introduction
With the development of computer vision technology, big data tracking and convolutional neural
networks, object detection plays a vital role in various fields. In practice, it can be used to
monitor the flow of people in public places and tourist attractions. At present, YOLOv5, Faster R-CNN
and other deep learning-based object detection methods have been applied to human flow detection. YOLO
(You Only Look Once) is a classic real-time target detection algorithm whose speed and accuracy have
made it widely used, and it plays a crucial role in detecting pedestrian flow in public areas [1].
Crowd counting, however, remains difficult when it relies on identifying individuals in dense and
crowded scenes. For high-density crowds made up of small targets there is still a lot of room to
explore, and current deep learning algorithms for crowd detection are still in their infancy and
difficult to
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0126
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
apply in real-life scenarios, with problems such as poor robustness, low precision and a large amount
of computation [2]. Many studies have made remarkable progress on these problems. For example, adding
the Focus layer [3] reduces the amount of model computation and speeds up operation, improving
detection efficiency. In addition, introducing an attention mechanism by adding an SE module at the
feature fusion stage of the network improves localization accuracy. Replacing the original NMS with
Soft-NMS at the same time increased the mean detection accuracy mAP@0.5 by 1.5% and the recall rate
by 0.5% [4].
This paper designs an improved deep learning framework based on the YOLOv5 neural network. It improves
the convolutional layer C3 in the backbone structure of YOLOv5 by adding the CBAM attention mechanism
[5], which raises crowd identification accuracy at near, middle and far distances and allows obscured
targets to be identified more accurately, improving detection performance and practicability. The
primary contents of this paper are as follows.
To identify targets in crowds, this paper completes the crowd count by detecting the torso of each
person. A dataset for people detection and identification is first collected and prepared; it consists
of 15,000 images. The model is then trained on this data, and the occlusion and congestion
characteristics of the targets are identified from the training data.
The first improvement adds the SENet attention mechanism to YOLOv5: each channel is pooled and passed
through two fully connected layers to obtain an output vector, and the nodes of the second fully
connected layer are aligned with the channels [6]. After training, the F1 value increased by 0.03
compared with the original YOLOv5 and the recall rate rose from 0.71 to 0.73, which addresses the
recognition accuracy for crowded targets at middle and far distances. However, the experiment found
that accuracy in short-range target recognition decreased compared with the original YOLOv5.
To solve this problem, this paper introduces another attention mechanism, CBAM, to improve YOLOv5. Two
new modules, a channel attention mechanism and a spatial attention mechanism, are added before the
data entry of the original C3 module of YOLOv5. The outputs of the two modules are multiplied after
each calculation is completed, suppressing information that is unimportant in terms of channel and
space, respectively. After training, the F1 value is 0.72, the confidence at which precision reaches 1
is 0.968, the accuracy at a confidence of 0.5 is 0.768, and the recall rate is 0.76. For the original
YOLOv5, the F1 value is 0.68, the confidence at which precision reaches 1 is 0.985, the accuracy at a
confidence of 0.5 is 0.723, and the recall rate is 0.71. Experiments show that recognition accuracy for
crowded targets at middle and far distances is improved, while the recognition accuracy of the
original YOLOv5 for close-range targets is maintained. This addresses the problem of identification
accuracy in complex public scenes, especially for large numbers of people who appear as small targets.
2. Research methods
2.1. Introduction to YOLOv5 architecture
YOLOv5 is an efficient target detection model characterized by a simple architecture and high
recognition efficiency, which makes it especially suitable for real-time multi-target recognition
scenarios. This paper uses the YOLOv5s version, which has the smallest depth and feature map width in
the YOLOv5 series. The basic components of YOLOv5s include Focus, Conv, C3 and SPP. Focus decomposes a
high-resolution feature map into several low-resolution feature maps; that is, it turns a larger input
image into smaller ones to improve the speed of calculation and the accuracy of feature extraction.
Conv is the conventional convolution layer in YOLOv5; it convolves its input with convolution kernels
to extract and process features. C3 is the key complex convolutional layer module of YOLOv5s. Its main
idea is to divide the input feature map into two parts, process them separately, and finally merge
them, reducing the amount of calculation as much as possible. SPP is a pyramid-shaped pooling module.
It uses max pooling to extract features at different spatial scales and performs multi-scale fusion,
so that the model can recognize targets of different sizes well. In general, the main idea of YOLOv5s
is to extract features through complex convolutional layers and strengthen feature fusion through
multi-scale pooling layers, achieving fast and accurate recognition of different targets.
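The decomposition performed by Focus can be sketched in a few lines. The version below works on plain nested lists indexed as feature_map[channel][row][column] and only performs the slicing step (the convolution that normally follows is omitted); the function name and the ordering of the four slices are assumptions made for illustration.

```python
def focus_slice(feature_map):
    """Split each channel into four half-resolution maps and stack them.

    Input:  C channels of size H x W (H and W even).
    Output: 4*C channels of size H/2 x W/2, with no pixel discarded.
    """
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (row offset, column offset)
    sliced = []
    for row_off, col_off in offsets:
        for channel in feature_map:
            sliced.append([row[col_off::2] for row in channel[row_off::2]])
    return sliced


# One hypothetical 4 x 4 channel.
image = [[[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
          [13, 14, 15, 16]]]
out = focus_slice(image)
print(len(out), out[0], out[3])
```

Every input value ends up in exactly one of the four output maps, so the spatial resolution is halved without losing information; the cost is a fourfold increase in the number of channels.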
2.2. Introduction of complex convolution layer C3
Figure 1. C3 fundamentals.
Figure 1 illustrates the fundamental principle of the C3 layer, where c1 denotes the number of input
channels, c2 the number of output channels, and c_ the number of channels generated during the
intermediate convolution process. In general, there are four convolutional layers. Convolutional layer
1 and convolutional layer 2 are exactly the same; their role is to adjust the number of channels of
the feature map for subsequent processing. The bottleneck layer is also a convolutional layer, similar
to convolutional layers 1 and 2; it mainly provides a bypass for the convolution operation and connects
directly to the subsequent concatenation layer. After concatenation along the channel dimension,
convolutional layer 3 performs the convolution operation on the feature map as a whole, converting the
number of channels from 2c_ to c2 to generate the final feature map. In general, the main feature of
the C3 layer is that it fuses features of various scales more quickly by combining parallel
convolution with concatenation, which enhances the model's feature extraction capabilities.
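The channel bookkeeping described above can be traced with a small sketch. It only tracks channel counts and performs no convolution. The function name `c3_channel_trace` and the expansion ratio of 0.5 (so that c_ is half of c2) are assumptions made for illustration; the text does not state the ratio.

```python
def c3_channel_trace(c1: int, c2: int, expansion: float = 0.5) -> dict:
    """Trace channel counts through a C3-style block.

    Two parallel 1x1 convolutions map c1 -> c_, one branch passes through
    the bottleneck (c_ -> c_), the branches are concatenated (2 * c_),
    and a final convolution maps 2 * c_ -> c2.
    """
    hidden = int(c2 * expansion)  # c_ in the text (assumed ratio)
    return {
        "conv1": (c1, hidden),
        "conv2": (c1, hidden),
        "bottleneck": (hidden, hidden),
        "after_concat": 2 * hidden,
        "conv3": (2 * hidden, c2),
    }


trace = c3_channel_trace(64, 128)
print(trace)
```

Keeping the two branches at c_ channels each is what limits the cost of the block: the expensive bottleneck runs on the reduced width, and only the final convolution sees the full 2c_ channels.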
2.3. Improvement scheme
Figure 2. C3 fundamentals.
Based on the architecture of YOLOv5s, this paper modifies the network code. The main improvement
scheme is to replace the original complex convolutional layer C3 of YOLOv5s with the C3CBAM module to
increase its recognition ability for dense crowds. The C3CBAM module adds two new modules, a spatial
attention module and a channel attention module, before the input of the original C3 module. As shown
in Figure 2, the outputs of the two modules are multiplied after each has been computed, suppressing
unimportant information in terms of channel and space respectively. The channel attention module
dynamically learns the significance of individual channels during training and reduces the effect of
irrelevant feature channels, which cuts redundant information and enhances the model's robustness. The
spatial attention module continuously learns spatial position weights, which helps the model capture
the relationship between pixels more accurately and strengthens feature extraction [7]. By replacing
the C3 layer of YOLOv5s with the C3CBAM module, the scheme presented in this paper not only boosts the
model's recognition accuracy for individuals but also enhances its effectiveness in complex scenes,
particularly those containing many people who appear as small targets.
2.4. Introduction of channel attention module
Figure 3. Channel attention module fundamentals.
Figure 3 shows the basic principle of the channel attention module. The input here is the feature map
that was originally fed to the C3 layer. The feature map is passed through two branches, adaptive max
pooling and adaptive average pooling, so that the global maximum and the global average of each
channel are retained. Each pooled result then passes through a convolution layer and an activation
layer, which reduce the number of channels and extract key information while introducing nonlinearity
to increase the model's expressive ability. A second convolution then recovers the original number of
channels. Finally, the processed max pooling and average pooling results are added to fuse the
features of the two pooling strategies, and the Sigmoid activation function compresses the output to
the range [0, 1] to generate the channel attention weights. In this way, the importance of each channel
is obtained at the output. In general, the channel attention module enhances the expressive ability of
important channels by combining the convolved max pooling and average pooling results [8].
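The steps of the channel attention module can be sketched on plain nested lists indexed as feature_map[channel][row][column]. This is a minimal illustration, not the paper's implementation: the weight matrices are hand-picked, hypothetical values, ReLU is assumed as the activation, and the two pooled branches are assumed to share the same weights, which the text does not state. On a 1×1 pooled map, a 1×1 convolution reduces to a matrix-vector product, which is what `shared_mlp` computes.

```python
import math


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def shared_mlp(descriptor, w_reduce, w_expand):
    """Reduce the channel count, apply ReLU, then restore the channel count."""
    hidden = [max(0.0, sum(w * d for w, d in zip(row, descriptor))) for row in w_reduce]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_expand]


def channel_attention(feature_map, w_reduce, w_expand):
    """Return one attention weight in (0, 1) per channel."""
    max_desc = [max(max(row) for row in channel) for channel in feature_map]
    avg_desc = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    max_out = shared_mlp(max_desc, w_reduce, w_expand)
    avg_out = shared_mlp(avg_desc, w_reduce, w_expand)
    # Add the two branches, then squash to (0, 1).
    return [sigmoid(m + a) for m, a in zip(max_out, avg_out)]


# Hypothetical 2-channel, 2 x 2 feature map and hand-picked weights.
features = [[[1.0, 3.0], [1.0, 3.0]], [[0.0, 0.0], [0.0, 4.0]]]
weights = channel_attention(features, w_reduce=[[1.0, 1.0]], w_expand=[[1.0], [-1.0]])
print(weights)
```

With these weights the first channel receives a weight close to 1 and the second a weight close to 0, which shows how the module can emphasize one channel and suppress another.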
2.5. Introduction to Spatial Attention Module
Figure 4. Spatial attention module fundamentals.
Figure 4 illustrates the fundamental principle of the spatial attention module. Once again, the input
is the feature map that was originally fed to the C3 layer, not the output of the channel attention
module. First, max pooling and average pooling are performed on the input feature map, which extract
the maximum and average values for each position in the spatial dimension. Then the results of max
pooling and average pooling are concatenated to obtain a feature map with more channels that contains
the results of both types of pooling. Finally, convolution and Sigmoid activation are applied to
enhance the key spatial location features and compress the output range. In general, the spatial
attention module applies convolution and activation to the concatenation of the max pooling and
average pooling results to extract important information in the spatial dimension [9]. In comparison,
the spatial attention module differs from the channel attention module mainly in the pooling operation
and in the way the two pooled results are combined during feature fusion.
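The spatial attention steps can be sketched in the same plain-list style. The sketch assumes that pooling is taken across channels at each pixel, that the two pooled maps are stacked as a 2-channel input, and that a single square kernel with zero padding is applied; the kernel values used below are hypothetical and chosen only to make the result easy to verify by hand.

```python
import math


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def spatial_attention(feature_map, kernel, bias=0.0):
    """Return an H x W map of attention weights in (0, 1).

    feature_map[c][y][x] is the input; kernel[p][dy][dx] has p = 0 for the
    max-pooled map and p = 1 for the average-pooled map (odd kernel size).
    """
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    max_map = [[max(feature_map[c][y][x] for c in range(channels))
                for x in range(width)] for y in range(height)]
    avg_map = [[sum(feature_map[c][y][x] for c in range(channels)) / channels
                for x in range(width)] for y in range(height)]
    pooled = [max_map, avg_map]  # concatenation along the channel axis
    size = len(kernel[0])
    pad = size // 2
    attention = []
    for y in range(height):
        row = []
        for x in range(width):
            total = bias
            for p in range(2):
                for dy in range(size):
                    for dx in range(size):
                        yy, xx = y + dy - pad, x + dx - pad
                        if 0 <= yy < height and 0 <= xx < width:  # zero padding
                            total += kernel[p][dy][dx] * pooled[p][yy][xx]
            row.append(sigmoid(total))
        attention.append(row)
    return attention


# Hypothetical input with 2 channels, 1 row and 2 columns.
features = [[[0.0, 2.0]], [[1.0, -1.0]]]
identity_on_max = [[[1.0]], [[0.0]]]  # 1 x 1 kernel that keeps only the max map
att = spatial_attention(features, identity_on_max)
print(att)
```

Unlike channel attention, which yields one weight per channel, this module yields one weight per pixel, so the two outputs can be broadcast against the same feature map.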
2.6. Introduction to the dataset
The experiment uses the CrowdHuman dataset, which contains a total of 15,000 dense crowd images in
which 339,565 objects have been annotated for recognition. Because a considerable part of the targets
in the dataset are small and have fuzzy contours, this dataset is well suited to the training and
verification of dense crowd recognition [10]. The label specification of the dataset does not conform
to the training scheme of YOLOv5, and the image sizes in the dataset differ, so the labels are
converted before training to the YOLOv5 label format, which adapts to different image sizes. Only the
first of the three parts of the training set is used for training. Some images are too simple to
recognize, such as group photos of several people standing in a row, so this paper removes these
images in the verification.
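The label adjustment can be sketched as a single conversion function. It assumes that each source box is given in pixels as [x, y, w, h] with (x, y) the top-left corner, and it produces the YOLO text format of a class index followed by the normalized box centre, width and height. The function name, the clipping to the image border and the single class index 0 are assumptions made for illustration; the field names of the source annotation files are not shown in the text and are not assumed here.

```python
def to_yolo_label(box_xywh, image_width, image_height, class_id=0):
    """Convert a pixel box [x, y, w, h] to a YOLO label line.

    The output values are divided by the image size, which is what lets
    one label format serve images of different sizes.
    """
    x, y, w, h = box_xywh
    # Clip the box to the image, since annotations may extend past the border.
    x0, y0 = max(0.0, x), max(0.0, y)
    x1, y1 = min(image_width, x + w), min(image_height, y + h)
    cx = (x0 + x1) / 2.0 / image_width
    cy = (y0 + y1) / 2.0 / image_height
    bw = (x1 - x0) / image_width
    bh = (y1 - y0) / image_height
    return f"{class_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}"


line = to_yolo_label((100, 50, 200, 100), image_width=800, image_height=400)
print(line)
```

One such line is written per annotated object, in a text file that shares its base name with the image.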
3. Experimental results
3.1. Introduction to SENet
Figure 5. SENet Fundamentals
The SENet module has an effect similar to that of the module used in this paper: it enhances feature
representation so that the neural network performs better. As shown in Figure 5, the SENet module
first applies a convolution to the input feature map, changing the original C'×H'×W' feature map into
a C×H×W feature map. Next, the Squeeze operation Fsq is applied to the new feature map, and a 1×1×C
vector is obtained by global average pooling. This vector is then fed into the fully connected layers
Fex, which learn the importance of each channel. Finally, Fscale multiplies the resulting weight vector
with the C×H×W feature map to produce the final output feature map [11].
3.2. Comparison of target detection effects
Figure 6. YOLOv5.
Figure 7. YOLOv5 (after training).
Figure 8. SENet.
Figure 9. CBAM Improvement.
As shown in Figure 6, YOLOv5 without training on the crowd dataset marks many non-human regions and
fails to recognize many people, mostly because the detection targets in the images of its original
training set are numerous but sparse. After training with the dense crowd training set CrowdHuman,
YOLOv5 recognizes crowds well, as Figure 7 shows, but it lacks the ability to recognize long-distance
targets. Figure 8 shows that SENet has a strong recognition ability for distant targets, but its
recognition ability at medium and near ranges is not as good as that of the original YOLOv5. The model
improved with CBAM, shown in Figure 9, not only accurately identifies medium and close targets but
also improves distant target recognition, which improves the overall recognition accuracy.
3.3. Comparison of experimental data
Table 1. Validation data comparison.
Index                                    YOLOv5 CBAM
F1 maximum                               0.72 (when the confidence is 0.436)
Confidence (when the precision is 1)     0.768
Accuracy at a confidence level of 0.5    0.109
Recall                                   0.76
As shown in Table 1, the F1 maximum value of YOLOv5 improved by CBAM is slightly improved,
and its recognition accuracy is improved from 0.723 to 0.768 when the confidence level is 0.5. The
most significant improvement was in recall, which went from 0.71 to 0.76. This shows that the
improved model has a great improvement in the recognition of positive samples compared with the
original model.
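The relation between the reported quantities can be sketched as follows: precision and recall are computed at each confidence threshold, F1 is their harmonic mean, and the "F1 maximum" is the best F1 over all thresholds. The detections, the ground-truth count and the thresholds below are hypothetical values chosen for illustration and are not data from the experiment.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def best_f1(detections, num_ground_truth, thresholds):
    """Return (threshold, F1) with the highest F1.

    detections is a list of (confidence, is_true_positive) pairs.
    """
    best = (None, -1.0)
    for threshold in thresholds:
        kept = [is_tp for conf, is_tp in detections if conf >= threshold]
        true_pos = sum(kept)
        precision = true_pos / len(kept) if kept else 0.0
        recall = true_pos / num_ground_truth if num_ground_truth else 0.0
        score = f1_score(precision, recall)
        if score > best[1]:
            best = (threshold, score)
    return best


# Hypothetical detections: (confidence, matched a ground-truth person?)
detections = [(0.9, True), (0.8, True), (0.6, False),
              (0.4, True), (0.35, True), (0.3, False)]
best_threshold, best_score = best_f1(detections, num_ground_truth=4,
                                     thresholds=[0.25, 0.5, 0.75])
print(best_threshold, best_score)
```

Raising the threshold generally trades recall for precision, which is why the maximum F1 is reported together with the confidence at which it occurs.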
4. Conclusion
By improving the forward propagation function of the YOLOv5 model and the complex convolutional layer
C3 in the backbone network, this paper significantly improves the recognition accuracy of target
detection. The model can learn features more effectively during training. Adjusting the structure and
optimizing the parameters of the C3 convolution layer enhances the robustness and accuracy of the
model in complex scenes. The results show that the improved YOLOv5 model achieves significant
performance improvement on multiple public data sets, which proves the effectiveness of the proposed
method.
However, this study has some limitations. First, although the accuracy of the enhanced model has
increased, its computational complexity and inference time have also increased, which may bring
challenges in practical applications. Moreover, the improvements in this paper are mainly aimed at the
specific task of dense crowd detection; for other types of visual tasks (such as image segmentation or
pose estimation), the effect is uncertain and needs further verification. Future research can proceed
in the following directions: first, further optimizing the computational efficiency of the model and
reducing resource consumption; second, extending the proposed method to other types of deep learning
models and tasks to verify its universality; third, combining it with other advanced techniques, for
example additional attention mechanisms and multi-scale feature fusion, to further improve the
performance and adaptability of the model. By improving the forward propagation function of the YOLOv5
model and the C3 convolution layer, this project improved the accuracy of target detection and
provides new ideas and methods for subsequent research. More related innovations and breakthroughs can
be expected in practical applications and other fields in the future.
Authors contribution
All authors contributed equally, regardless of the order of authorship.
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0126
References
[1] Jiang, X. K., Liao, X. L. and Li, Y. B. (2019). Regional crowd flow statistics based on Deep
learning. Digital Users, 29(21), 165-167.
[2] Zhan, W. W. (2022). Design and implementation of crowd counting and anomaly detection
system. Beijing: Beijing University of Technology.
[3] Chen, B., Dai, S. L. and Ye, B. Y. (2023). Yolo-based social distancing detection method for
people in public areas. Artificial intelligence and robotics research, 12(3).
10.12677/AIRR.2023.123023
[4] Cong, X. H., Li, S. X., Chen, F. K. and Meng, Y. (2023). An improved dense pedestrian
detection algorithm based on YOLOv5. Computer science and applications, 13(6), 1199-1207.
[5] Wang, X., Dong, Q. and Yang, G. Y. (2023). Crops diseases and insect pests recognition based on
optimized CBAM improvement YOLOv5. Computer system application, 32(7), 261-268.
10.15888/j.cnki.csa.009175
[6] Li, X. P., Zhang, Y. B., Li, Y. P., et al. (2023). An improved algorithm for infrared image target
detection based on YOLOv5s. Laser & Infrared, 53(7), 1043-1051.
10.3969/j.issn.1001-5078.2023.07.010
[7] Pei, Y. H., Xu, L. M. and Zheng, B. C. (2022). Improved YOLOv5 for Dense Wildlife Object
Detection. Biometric Recognition: 16th Chinese Conference, 569-578.
[8] Ji, D. J. and Cho, D. H. (2021). ChannelAttention: Utilizing Attention Layers for Accurate
Massive MIMO Channel Feedback. IEEE Wireless Communications Letters, 10(5), 1079-1082.
https://doi.org/10.1109/LWC.2021.3057934
[9] You, C. (2021). Research on Smoke and Flame Image Classification Algorithm Based on BAN.
Zhejiang Sci-Tech University.
[10] Xu, H. H., Wang, X. Q., Wang, D., et al. (2023). Object detection in crowded scenes via joint
prediction. Defense Technology, 21(3), 103-115.
https://doi.org/10.3969/j.issn.2214-9147.2023.03.008
[11] Hu, J., Shen, L., Albanie, S., Sun, G. and Wu, E. (2019). Squeeze-and-Excitation Networks.
Computer Vision and Pattern Recognition. https://doi.org/10.48550/arXiv.1709.01507
Prediction of heart disease based on logistic regression
Zixin Zhang
School of Data Science, Capital University of Economics and Business, Beijing,
100000, China
32021230064@cueb.edu.cn
Abstract. Heart disease is a major threat to human health; it has many contributing factors
and is not easily cured. This paper uses a dataset from a cardiovascular study of residents
of Framingham, Massachusetts. First, three models (logistic regression, random forest, and
decision tree) are compared on accuracy, precision, recall, and F1 value. ROC curves are then
plotted, with AUC as a reference criterion for assessing predictive effectiveness, and the
logistic regression model is selected. Next, the raw data are preprocessed, including the
handling of missing values. Finally, a logistic regression model is fitted to analyze the
factors influencing heart disease. The purpose of this study is to use the results of the logistic
model to help doctors and patients in heart disease treatment. The results show that the model
has a good predictive effect.
Keywords: Logistic regression, heart disease, ROC curve.
1. Introduction
Heart disease afflicts many individuals and families. As technology develops and living
standards improve, more and more people are paying attention to their health. In recent years, the
incidence of heart disease has been rising in many regions, and the loss of life it causes is
also rising year by year. The World Health Organization estimates that 12 million people die
of heart disease each year globally. In some developed countries, such as the United States,
more than half of the inhabitants die of cardiovascular diseases. To reduce the incidence of heart
disease and the mortality it causes, the factors behind heart disease should be studied so that
interventions can be better targeted.
First, many researchers believe that reducing the incidence of acute postoperative lung injury in
neonates with heart disease can significantly improve child survival [1]. Among adults, unhealthy
lifestyle habits may also be a major factor in the predisposition to heart disease. For example, it has
been suggested that in China the incidence of cardiovascular disease is higher among smokers than in
the non-smoking population [2]. Metabolic conditions such as high fasting plasma glucose (HFPG) are
significant risk factors for cardiovascular disease in humans [3, 4]. In China, with the gradual
development of the economy, the lifestyle and nutritional structure of the population have changed
dramatically, and lifestyle habits such as excessive sugar intake and lack of exercise have led to an
increasing prevalence of HFPG [5].
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0103
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
The disease burden of ischemic heart disease (IHD) attributable to HFPG in Chinese residents shows
clear gender and age-group characteristics. All disease burden indicators are lower in the female
population than in the male population, and the trend of disease burden in the total population is
driven more by the male group, which may be related to female physiology [6]. The main reasons for
lower life expectancy in men also include behavioral factors such as smoking and alcohol
consumption, genetic and physiological factors, and higher rates of injury mortality [7, 8].
However, some findings run contrary to popular belief: light drinkers are less
likely to develop aortic stenosis than never-drinkers [9, 10]. For example, if a person drinks 60 grams
of alcohol per day, he may have a lower risk of developing the disease than someone who drinks 10
grams of alcohol per day [9, 10].
Wang et al. have shown that heart disease is often closely related to disability in the
elderly [11]. In a study of older adults, the risk of the disease doubled for every 10 years of
age [12]. In terms of education level, Ni concluded that the risk of developing activity of daily
living (ADL) limitations in elderly cardiac patients with elementary school or higher education
was 0.666 times that of elderly cardiac patients who had never attended school, i.e., about
one third lower [13]. Married, cohabiting and educated urban elderly cardiac patients had a lower risk
of ADL limitation [13].
In summary, the literature suggests that the prevalence of heart disease is related to several
factors such as age, gender, genetic factors, amount of smoking, amount of alcohol consumption, level
of education, marital status, and current status of social development. This study analyzes the given
patient characteristics and compares differences between patients in order to predict which types
of patients are most likely to develop heart disease in the future, with the ultimate goal of
providing a basis for reducing the incidence of heart disease.
2. Methods
2.1. Data source and description
This study uses a dataset provided by the Kaggle platform, which is derived from an ongoing
cardiovascular study of residents in the town of Framingham, Massachusetts. The dataset has a total of
4,239 samples, each with 16 variables. Fifteen of the variables are independent variables, each
representing a potential risk factor. The last variable, “TenYearCHD”, is the dependent variable,
indicating whether the patient is at risk of having coronary heart disease (CHD) in the next ten years.
2.2. Selection and description of indicators
Among all the variables, both quantitative variables such as “Age”, “CigsPerDay” and categorical
variables such as “Male”, “Education” are included. Due to the different types of variables, in this paper,
the variables involved in the data will be interpreted according to the type of data. Each quantitative
variable is shown in Table 1 and each categorical variable is shown in Table 2.
Table 1. Overview of quantitative variables.
| Variable | Description | Range |
|---|---|---|
| Age | Age of the patient | 32-70 |
| CigsPerDay | Average number of cigarettes smoked per person per day | 0-70 |
| TotChol | Total cholesterol level | 107-696 |
| SysBP | Systolic blood pressure | 83.5-295 |
| DiaBP | Diastolic blood pressure | 48-142.5 |
| BMI | Body Mass Index | 15.54-56.8 |
| HeartRate | Heart rate | 44-143 |
| Glucose | Glucose level | 40-394 |
Table 2. Overview of categorical variables.
| Variable | Description | Range |
|---|---|---|
| Male | Male or Female | Male=1, Female=0 |
| Education | Educational situation | Less than high school=1, High school grads=2, College grads=3, Post-college grads=4 |
| CurrentSmoker | Currently smoking or not | Yes=1, No=0 |
| BPMeds | On blood pressure medication or not | Yes=1, No=0 |
| PrevalentStroke | Had a previous stroke or not | Yes=1, No=0 |
| PrevalentHyp | Hypertensive or not | Yes=1, No=0 |
| Diabetes | Diabetes or not | Yes=1, No=0 |
| TenYearCHD | 10 year risk of coronary heart disease | Yes=1, No=0 |
2.3. Method introduction
There are many ways to predict whether or not a patient will suffer from heart disease. However, the
predicted results are sometimes very different from the real situation. Because the prediction
determines whether the patient receives timely treatment, and may even bear on the patient's life,
a correct prediction or judgment is crucial [14]. Logistic regression is a probabilistic regression
model and a kind of generalized linear model. It is widely used in probabilistic prediction and
classification, and it is simple, efficient and highly interpretable [15, 16]. In this study,
logistic regression was applied to the samples in the above dataset, and the model fitting results
were analyzed further to obtain the main factors influencing the diagnosis of heart disease.
Logistic regression is a type of regression analysis in statistics that is used to predict a
categorical dependent variable from predictors (independent variables). In the binary logistic
regression used here, the dependent variable takes only two values. The logistic regression
equations are:
$$P(Y) = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2}} \tag{1}$$
$$\frac{P(Y)}{1 - P(Y)} = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2} \tag{2}$$
After inserting all the variables, the author gets the following equation:
$$\ln\left(\frac{P(Y)}{1 - P(Y)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n \tag{3}$$
Where $Y$ denotes the explained variable, which in the logistic regression model denotes whether
or not heart disease is diagnosed. $X_i$ denotes the explanatory variables, which in the model are
the factors influencing whether or not one has heart disease. $\beta_i$ are the parameters to be
estimated.
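The relationship between the probability, the odds and the log-odds in equations (1) to (3) can be checked numerically. The following is a minimal sketch, not the author's code: the coefficient and feature values are made up for illustration, and the function names are hypothetical.

```python
import math

def linear_predictor(beta0, betas, xs):
    """Right-hand side of equation (3): beta0 + sum(beta_i * x_i)."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))

def probability(eta):
    """Equation (1): logistic transform of the linear predictor."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def odds(p):
    """Left-hand side of equation (2): P / (1 - P)."""
    return p / (1.0 - p)

def log_odds(p):
    """Left-hand side of equation (3): ln(P / (1 - P))."""
    return math.log(odds(p))

# Hypothetical coefficients and feature values, for illustration only.
beta0, betas = -1.0, [0.5, -0.25]
xs = [2.0, 4.0]

eta = linear_predictor(beta0, betas, xs)
p = probability(eta)
print(f"linear predictor={eta:.3f}, probability={p:.3f}, "
      f"odds={odds(p):.3f}, log-odds={log_odds(p):.3f}")
```

The log-odds recovered from the probability equal the linear predictor, which is why each coefficient can be read as the change in log-odds per unit change in its variable.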
3. Results and discussion
3.1. Correlation analysis
Figure 1 shows a heat map of the pairwise correlations between the features, from which the
relationship between any two features can be read directly. Correlation coefficients range from
-1 to 1: a value greater than 0 indicates that the two features are positively correlated, a value
less than 0 indicates that they are negatively correlated, and a value of 0 indicates no linear
correlation. The larger the absolute value, the stronger the correlation, and the smaller the
absolute value, the weaker the correlation. As can be seen from Figure 1, the four variables diaBP,
SysBP, PrevalentStroke, and age are positively correlated with TenYearCHD and have larger
coefficients than the other variables, indicating that they are more closely related to whether or
not the disease is present.
Figure 1. Correlation heat map.
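The values behind such a heat map are pairwise Pearson correlation coefficients. The sketch below computes them in plain Python under stated assumptions: the five rows of data are invented for illustration and are not taken from the Framingham dataset, and the helper names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Nested dict of pairwise correlations, one entry per heat-map cell."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Invented toy rows, for illustration only.
toy = {
    "age": [39, 46, 48, 61, 63],
    "sysBP": [106.0, 121.0, 127.5, 150.0, 138.0],
    "TenYearCHD": [0, 0, 0, 1, 1],
}

matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Each diagonal entry is 1 and the matrix is symmetric, which matches the layout of a correlation heat map.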
3.2. Comparison of different models
In this paper, the effectiveness of the logistic regression model is assessed by comparing it
with two commonly used models, random forest and decision tree. The models were compared on four
indicators: accuracy, precision, recall and F1 value. The results
are shown in Table 3. The comparative ROC curves of the three models are plotted in Figure 2.
Table 3. Comparison of three models.
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| Logistic regression | 0.835 | 0.538 | 0.057 | 0.104 |
| Random forest | 0.831 | 0.438 | 0.057 | 0.101 |
| Decision tree | 0.736 | 0.229 | 0.246 | 0.237 |
According to the results of the three models, no model outperforms the others on all indicators.
On a comprehensive consideration, however, the accuracy (0.835) and precision (0.538) of the
logistic regression model rank first. Its F1 value ranks second, and its recall (0.057) equals that
of the random forest model, behind the decision tree. According to the ROC curve, the area under the
curve (AUC) of the logistic regression is 0.65, which is not the highest but differs from that of
the random forest model by only 0.02. This result indicates that the logistic regression model has a
good predictive effect on the heart disease data used in the present study, and it is also of great
significance for the subsequent prediction of heart disease data used for similar purposes.
It is important to choose the model with better results, and after a comprehensive evaluation, this
paper decides to use the logistic regression model for the subsequent research.
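The four indicators can be computed from a confusion matrix, and the F1 values in Table 3 can be cross-checked from the reported precision and recall. The sketch below is illustrative only: the confusion counts are invented, the function names are hypothetical, and small differences from Table 3 are expected because the table's inputs are rounded to three decimals.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall, f1_score(precision, recall)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Invented counts, for illustration only.
acc, prec, rec, f1 = confusion_metrics(tp=3, fp=1, fn=2, tn=4)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} F1={f1:.3f}")

# (precision, recall, reported F1) as listed in Table 3.
table3 = {
    "Logistic regression": (0.538, 0.057, 0.104),
    "Random forest": (0.438, 0.057, 0.101),
    "Decision tree": (0.229, 0.246, 0.237),
}
for model, (p, r, reported) in table3.items():
    print(f"{model}: recomputed F1={f1_score(p, r):.3f}, reported F1={reported}")
```

Because F1 is a harmonic mean, a very low recall pulls it down sharply even when precision is moderate, which is the pattern visible for the logistic regression and random forest rows.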
Figure 2. ROC curve of three models.
3.3. Logistic regression results
Before the logistic regression is performed, the data are preprocessed. First, missing values are
handled by removing the records that contain null values, so that the data are clean and usable. The
processed dataset is then divided into two parts: a training set, which is used to train the
logistic regression model, and a test set, which is used to evaluate the performance of the model.
The regression results are given in Table 4.
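The two preprocessing steps can be sketched as follows. This is a minimal illustration rather than the author's pipeline: the paper does not state the split ratio or random seed, so the 25% test fraction and seed below are assumptions, and the toy rows and helper names are hypothetical.

```python
import random

def drop_missing(rows):
    """Keep only the rows in which every field has a value."""
    return [row for row in rows if all(value is not None for value in row.values())]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Invented toy rows; every fifth row has a missing glucose value.
raw_rows = [
    {"age": 40 + i, "glucose": None if i % 5 == 0 else 80 + i, "TenYearCHD": i % 2}
    for i in range(10)
]

clean_rows = drop_missing(raw_rows)
# Assumed 75/25 split and seed; the source does not specify either.
train_rows, test_rows = train_test_split(clean_rows, test_fraction=0.25, seed=42)
print(len(raw_rows), len(clean_rows), len(train_rows), len(test_rows))
```

Fixing the seed makes the split reproducible, so that the models being compared are evaluated on the same held-out rows.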
Table 4. Logistic regression results.
| Variable | β | SE | P | OR |
|---|---|---|---|---|
| Male | 0.4067 | 0.127 | 0.001 | 1.502 |
| Age | 0.0301 | 0.007 | 0.000 | 1.031 |
| Education | -0.1665 | 0.057 | 0.004 | 0.847 |
| CurrentSmoker | -0.1314 | 0.184 | 0.475 | 0.877 |
| CigsPerDay | 0.0226 | 0.007 | 0.002 | 1.023 |
| BPMeds | 0.4812 | 0.283 | 0.090 | 1.618 |
| PrevalentStroke | 1.5126 | 0.613 | 0.014 | 4.538 |
| PrevalentHyp | 0.8944 | 0.151 | 0.000 | 2.446 |
| Diabetes | 0.8820 | 0.344 | 0.010 | 2.416 |
| TotChol | -0.0003 | 0.001 | 0.835 | 1.000 |
| SysBP | 0.0138 | 0.005 | 0.003 | 1.014 |
| DiaBP | -0.0242 | 0.008 | 0.001 | 0.976 |
| BMI | -0.0531 | 0.015 | 0.000 | 0.948 |
| HeartRate | -0.0292 | 0.005 | 0.000 | 0.971 |
| Glucose | 0.0011 | 0.003 | 0.673 | 1.001 |
In this study, the 15 factors affecting the determination of heart disease were used as independent
variables in a binomial logistic regression. The regression results are organized in Table 4, which
gives the estimated coefficient (β) of each variable, the standard error (SE) of that estimate, the
p-value and the odds ratio (OR). A coefficient is considered significant when p is less than 0.05.
The odds are the probability of an event occurring divided by the probability of it not occurring;
the OR equals exp(β) and gives the factor by which the odds of heart disease are multiplied when
that independent variable increases by one unit, with the other variables held constant.
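The OR column of Table 4 can be reproduced from the β column as exp(β). The sketch below also recomputes p-values with a two-sided Wald z-test; the paper does not state how its p-values were obtained, so the Wald test is an assumption, and results can differ slightly from Table 4 because β and SE are rounded. The function names are hypothetical.

```python
import math

def odds_ratio(beta):
    """Odds ratio for a one-unit increase in the variable: exp(beta)."""
    return math.exp(beta)

def wald_p_value(beta, se):
    """Two-sided p-value of the Wald z-test (assumed test, not stated in the paper)."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2.0))

# (beta, SE, reported OR) for a few rows of Table 4.
table4 = {
    "Male": (0.4067, 0.127, 1.502),
    "Education": (-0.1665, 0.057, 0.847),
    "BPMeds": (0.4812, 0.283, 1.618),
    "PrevalentStroke": (1.5126, 0.613, 4.538),
}

for name, (beta, se, reported_or) in table4.items():
    print(f"{name}: OR={odds_ratio(beta):.3f} (reported {reported_or}), "
          f"Wald p={wald_p_value(beta, se):.3f}")
```

An OR above 1 (for example PrevalentStroke) means the odds rise with the variable, while an OR below 1 (for example Education) means they fall.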
3.4. Discussion
From the regression results in Table 4, it can be seen that male, age, education, cigsPerDay,
prevalentStroke, prevalentHyp, diabetes, sysBP, diaBP, BMI and heartRate have a statistically
significant (p<0.05) effect on having heart disease and are therefore closely associated with it.
In contrast, currentSmoker, BPMeds, totChol, and glucose did not have a significant effect on the
presence of heart disease (p>0.05), so they were not the main influencing
factors for the final confirmation of heart disease.
According to the signs of the regression coefficients, there is a negative association between
the level of education and the ten-year risk of heart attack, indicating that a higher level of
education may reduce the risk, which can also be seen in Figure 2. The coefficients for diaBP, BMI,
and heartRate are also negative, indicating that these variables are negatively associated with the
diagnosis of heart disease in this model. The results also show that gender has a significant effect
on the final diagnosis of heart disease, i.e., men may have a higher 10-year risk of heart attack
than women, which may be related to the different lifestyles of men and women; for example, far more
men than women choose to smoke or drink alcohol. In addition, the remaining significant factors have
a positive effect on the ten-year risk of heart attack. The coefficients of age, cigsPerDay, and
sysBP are relatively small, and those of prevalentStroke, prevalentHyp and diabetes are larger,
indicating that these variables affect the final diagnosis of heart disease to varying degrees.
4. Conclusion
Heart disease is an important threat to human health; it has many contributing factors and is not
easy to cure. To further analyze the causative factors of heart disease, this paper compares
multiple models and finally uses logistic regression to model 15 variables that affect heart
disease. The model aims to predict the probability of developing coronary heart disease over a
ten-year period based on demographic, lifestyle and health-related factors. The results show that
male, age, education, cigsPerDay, prevalentStroke, prevalentHyp, diabetes, sysBP, diaBP, BMI and
heartRate are significant factors in the diagnosis of heart disease. Finally, based on the ROC curve
and AUC, it can be seen that the logistic regression model performs well for the prediction of heart
disease. It is hoped that the conclusions drawn from this study will be helpful in the field of
cardiology, provide a reference for both doctors and patients, and gain valuable time to save
patients' lives.
References
[1] Jiang L, Ding S, Zhang L P, et al. 2017 Changes in plasma neutrophil gelatinase-associated lipid
transport protein (NGAL) in relation to acute postoperative lung injury in infants and children
with congenital heart disease. Advances in Modern Biomedicine, 17(8), 1570-1573.
[2] Kondo T, Nakano Y, Adachi S, et al. 2019 Effects of tobacco smoking on cardiovascular disease.
Circ J, 83(10), 1980-1985.
[3] Wu S, Xu W, Guan C, et al. 2023 Global burden of cardiovascular disease attributable to
metabolic risk factors, 1990-2019: an analysis of observational data from a 2019 Global
Burden of Disease study. BMJ Open, 13(5).
[4] Chia C W, Egan J M, Ferrucci L 2018 Age-related changes in glucose metabolism, hyperglycemia,
and cardiovascular risk. Circ Res, 123(7), 886-904.
[5] Jin Y, So H, Cerin E, et al. 2023 The temporal trend of disease burden attributable to metabolic
risk factors in China, 1990-2019: an analysis of the Global Burden of Disease study. Front
Nutr.
[6] Huang Y, Li Y L, Yan W T, Wang G, Wang B W, Xie P 2024 Trend Analysis and Future Trend
Forecast of Ischemic Heart Disease Burden Attributable to Fasting Hyperglycemia in China,
1990-2019. Chronic Disease Prevention and Control in China, 32(3), 176-182.
[7] Janssen F, Bardoutsos A, Ei Gewily S, et al. 2021 Future life expectancy in Europe taking into
account the impact of smoking, obesity and alcohol. ELife.
[8] Li FW, Wen SJ, Tang QX, et al. 2020 Impact of injury-related deaths on life expectancy in China.
Cadernos de saude publica, 36(11).
[9] Larsson S C, Wolk A, Beck M 2017 Alcohol consumption, cigarette smoking and incidence of
aortic valve stenosis. J Intern Med, 282(4).
[10] Markus M R, Lieb W, Stritzke J, et al. 2015 Light to moderate alcohol consumption is associated
with lower risk of aortic valve sclerosis: the study of health in Pomerania (SHIP). Arterioscler
Thromb Vasc Biol, 35(5).
[11] Wang R F, Luo Y, Chen Z S, et al. 2021 Relationship between cardiometabolic co-morbidities
and disability in Chinese middle-aged and elderly people. Journal of Jilin University (Medical
Edition). 47(3), 761-769.
[12] Ji H T, Zhao Y X, Yu X Q, Zhang C C, Liu Z D and Chai Q 2023 Effect of smoking and low-
density lipoprotein cholesterol interaction on valvular heart disease. Preventive Medicine
Forum, 29(1), 46-49.
[13] Ni Z H 2023 Construction of a predictive model for the risk of limited ability to perform activities
of daily living in elderly cardiac patients. Geriatrics research, 4(6), 33-38.
[14] Zhang X H 2023 Factor Analysis of Heart Disease Diagnosis Based on Logistic Regression and
Decision Tree. Modern information technology, 7(7), 117-123.
[15] Zhang Y Y, Ge R G and Sun G 2020 A study of patients' perceptions of excessive medical
examinations and influencing factors based on binary logistics regression. China Health Care
Management, 37(12), 893-895+899.
[16] Yan J J, Wu H, and Han B D 2020 Multifactorial Logistics Regression Analysis of Risk Factors
for Residual Cavity Formation after Tuberculous Septic Thorax Surgery. Medical Innovation
in China, 17(18), 128-131.
Analysis of the market value of Premier League attacker
Wenji Liu
Faculty of Science, University of British Columbia, Vancouver, V6Z 1T4, Canada
wliu36@student.ubc.ca
Abstract. This study uses multiple linear regression to examine the factors affecting the
market price of Premier League strikers. As soccer grows ever more popular, star transfers are
a major attraction of every transfer period, yet many clubs still sign overpaid or underpaid
players. The overall objective of this study is to find the determinants of player price, so as
to provide a reference for clubs to improve their use of funds in the transfer period. A dataset
of player data for the 17-18 Premier League season was first downloaded from Kaggle. The dataset
was then used for empirical analysis to identify variables correlated with the market price of
players, and multiple linear regression analysis was performed after processing these data. The
calculations show that match performance and goals scored had a significant positive impact on
market value, while age and match possession had a non-significant negative impact, which suggests
that the relevant team managers need to optimize these aspects in order to promote
a virtuous cycle of club development and team performance.
Keywords: Football, English premier league, market value.
1. Introduction
In recent years, with growing interest in the sport, soccer has become not just a game but a
multi-billion dollar industry that attracts fans, sponsors and investors from all over the world [1].
In this industry, the English Premier League (EPL) is one of the most popular and competitive
leagues, with significant transfer fees and salaries for top players [2]. In particular, the value
of strikers in the EPL has been a topic of great interest, with a variety of factors influencing
their market value [3].
The literature suggests that a number of independent variables can have an impact on player value,
and various models are used to assess the performance rights of soccer players. Some of the more
important variables are age, performance points (weighted goals and assists), playing time,
starts, and red and yellow cards [4]. In addition, because soccer carries a high risk of injury and
illness, physical factors, especially the presence or absence of disease, are also possible
considerations for player value [5].
Financial investment in sports also plays an important role in the economic sphere. There are two
kinds of reasons for this, external and internal. External reasons refer to different companies
pursuing their respective soccer investment goals. For example, non-profit
relationships with sports clubs are utilized to build strong international sports brands, such as
Manchester City and Real Madrid [6]. On the other hand, there are also internal reasons that are
closely related to sporting activities, such as loyalty to the club and an emotional connection to
the sport.
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0108
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
However, both reasons are directly related to the sporting performance of the sponsored discipline
and the wider context of any sporting event. Hence the focus of this study is on soccer performance
as a source of strong emotional responses from sporting event participants (e.g., sport
administrators, sponsors, and spectators) [7].
The aim of this paper is to analyze the factors affecting the value of strikers in the Premier League
and to develop a linear regression model to value footballers playing in the striker position, taking
into account econometric modeling assumptions. By examining a range of variables such as performance
indicators, age and nationality, the study seeks to provide a comprehensive understanding of the
drivers of transfer fees and salaries for these players [8]. Understanding these factors is crucial
not only for clubs and agents involved in player transfers, but also for fans and analysts wishing
to assess the market value of players. It enables clubs and stakeholders to make more informed
decisions in the transfer market, increasing the value of their investment and ultimately the
spectacle of the game [9, 10].
2. Methodology
2.1. Data source
The data used in this study were taken from the Kaggle website and have a cut-off date of the end of
October 2018. The dataset contains all available information on in-game performance,
market value and nationality for all forward players in the Premier League. A linear econometric
model was used to price the hypothetical market value of soccer players. In the
econometric modeling of soccer players' performance rights, this work attempts to eliminate formal
estimation problems such as non-normal residuals, nonlinear relationships and heteroskedasticity.
The result is a new linear regression model that prices the market value of the most valuable
forward players using selected variables and appropriate estimation methods.
2.2. Indicator selection
The analysis selects specific indicators to deepen the understanding of the factors that influence
player value. These indicators include age, nationality, on-field performance, utilization
rate, and club. Descriptive statistics for the indicators used to analyze forwards' market value
are given in Table 1.
Table 1. Descriptive analysis
| Indicator | Mean ± standard deviation | Variance | Median | Standard error |
|---|---|---|---|---|
| Age | 25.857±3.681 | 13.548 | 26 | 0.297 |
| Page_views | 1122.760±1190.539 | 1417382 | 671.5 | 95.936 |
| Fpl_value | 6.458±1.715 | 2.941 | 6 | 0.138 |
| Fpl_sel | 0.037±0.067 | 0.005 | 0.011 | 0.005 |
| Fpl_points | 66.675±63.980 | 4093.502 | 52 | 5.156 |
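The standard error column of Table 1 is consistent with the usual formula, standard deviation divided by the square root of the sample size, using the 154 samples reported for the regression analysis. The sketch below checks this; treating 154 as the sample size behind Table 1 is an assumption, and the function name is hypothetical.

```python
import math

SAMPLE_SIZE = 154  # number of samples reported for the regression; assumed to apply to Table 1

def standard_error(standard_deviation, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return standard_deviation / math.sqrt(n)

# (standard deviation, reported standard error) from Table 1.
table1 = {
    "Age": (3.681, 0.297),
    "Page_views": (1190.539, 95.936),
    "Fpl_value": (1.715, 0.138),
    "Fpl_sel": (0.067, 0.005),
    "Fpl_points": (63.980, 5.156),
}

for indicator, (sd, reported) in table1.items():
    computed = standard_error(sd, SAMPLE_SIZE)
    print(f"{indicator}: computed SE={computed:.3f}, reported SE={reported}")
```

The large standard deviation of Page_views relative to its mean indicates a strongly skewed variable, which is also visible in the gap between its mean and median.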
Using these data, this paper examines what determines player value. Although the dataset is
comprehensive, the author recognizes its limitations, particularly in how it captures on-field
performance, and these limitations must be kept in mind to maintain the integrity and validity of
the analysis. The selection of indicators is the basis for revealing the multifaceted influences on
the value of Premier League strikers. Through this integrated approach, the author aims to
contribute to the existing knowledge in this area by clarifying the interplay between the various
factors that influence player value.
2.3. Method introduction
The study began with data screening to select potentially relevant variables, and the data were
analyzed using multiple linear regression. The study selected age, the number of times the player's
wiki page was accessed, points scored in the match, possession and overall value of the match as
independent variables. Descriptive and frequency analyses were conducted on these variables to
highlight their characteristics and to facilitate the eventual multiple linear regression analysis
of player value.
3. Results and discussion
3.1. Correlation analysis
Multiple linear regression analyses were conducted using age, number of visits to the player's wiki
page, match score, possession and total match value as independent variables and market value as
the dependent variable. Figure 1 shows the Pearson correlation chart for the five
independent variables and the dependent variable (market value).
Figure 1. Pearson visualization chart of variables
From Figure 1, all the independent variables except age have a high positive correlation with the
dependent variable (MARKET VALUE). The correlation coefficient between age and market value is
only -0.024, indicating that there is no significant linear correlation between these two variables.
Using the correlation plot as a basis, this experiment continued with a linear regression analysis
of the five variables. In total, 154 samples participated in the analysis without any missing data
(Table 2).
3.2. Model results
From Table 2, it can be seen that AGE, PAGE_VIEWS, FPL_VALUE, FPL_SEL, and FPL_POINTS
are the independent variables, and MARKET VALUE is the dependent variable in the multiple linear
regression. With the coefficients from Table 2, the model is formulated as:
$$\text{market value} = -28.824 - 0.232\,\text{age} + 0.001\,\text{page\_views} + 7.341\,\text{fpl\_value} - 9.102\,\text{fpl\_sel} + 0.024\,\text{fpl\_points} \tag{1}$$
The above equation shows that as age and fpl_sel increase, the market value of a player decreases.
When page_views, fpl_value and fpl_points increase, the player's market value also increases. In
addition, fpl_value and fpl_sel have the largest coefficients in absolute value, so a one-unit
change in either shifts the predicted market value the most, although of the two only fpl_value is
statistically significant (Table 2).
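Equation (1) can be evaluated directly once the coefficients are known. The sketch below uses the slope coefficients from Table 2; the negative sign of the intercept is an assumption (the extracted equation lost its minus signs), chosen because it makes the prediction at the variable means of Table 1 plausible. The function and variable names are hypothetical.

```python
# Slope coefficients (column B of Table 2).
COEFFICIENTS = {
    "age": -0.232,
    "page_views": 0.001,
    "fpl_value": 7.341,
    "fpl_sel": -9.102,
    "fpl_points": 0.024,
}
INTERCEPT = -28.824  # sign assumed; the source shows 28.824 with minus signs dropped

def predict_market_value(player):
    """Evaluate the fitted linear model for one player (dict of feature values)."""
    return INTERCEPT + sum(COEFFICIENTS[name] * player[name] for name in COEFFICIENTS)

# A hypothetical "average" forward built from the means in Table 1.
average_player = {
    "age": 25.857,
    "page_views": 1122.760,
    "fpl_value": 6.458,
    "fpl_sel": 0.037,
    "fpl_points": 66.675,
}

baseline = predict_market_value(average_player)
one_year_older = predict_market_value({**average_player, "age": average_player["age"] + 1})
print(f"predicted value at the means: {baseline:.3f}")
print(f"effect of one extra year of age: {one_year_older - baseline:.3f}")
```

Holding the other variables fixed, one extra year of age changes the prediction by exactly the age coefficient, which is how each entry in column B of Table 2 should be read.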
Table 2. Summary of the results of the multiple linear regression analysis
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficient; VIF and Tolerance: collinearity diagnostics)
Variable      B        Std. Error   Beta     t        p        VIF     Tolerance
age          -0.232    0.145       -0.056   -1.603    0.111    1.077   0.929
Page_views    0.001    0.001        0.047    0.902    0.369    2.396   0.417
fpl_value     7.341    0.584        0.827   12.562    0.00**   3.809   0.263
fpl_sel      -9.102   10.402       -0.04    -0.875    0.383    1.869   0.535
fpl_points    0.024    0.012        0.101    1.981    0.049*   2.29    0.437
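The columns of Table 2 are linked by two standard identities: the t statistic is the coefficient divided by its standard error, and the tolerance is the reciprocal of the VIF. The following sketch recomputes both from the printed values as a consistency check. The dictionary name `table2` and the helper `check_row` are illustrative names, not part of the original analysis, and small differences in t are expected because B and the standard error are printed with only three decimals.

```python
# Values copied from Table 2: (B, standard error, t, VIF, tolerance).
table2 = {
    "age":        (-0.232, 0.145, -1.603, 1.077, 0.929),
    "page_views": (0.001, 0.001, 0.902, 2.396, 0.417),
    "fpl_value":  (7.341, 0.584, 12.562, 3.809, 0.263),
    "fpl_sel":    (-9.102, 10.402, -0.875, 1.869, 0.535),
    "fpl_points": (0.024, 0.012, 1.981, 2.29, 0.437),
}


def check_row(b, se, t, vif, tol):
    """Return (t recomputed as B / SE, tolerance recomputed as 1 / VIF)."""
    return b / se, 1.0 / vif


recomputed = {name: check_row(*row) for name, row in table2.items()}

for name, (t_hat, tol_hat) in recomputed.items():
    print(f"{name:11s} t ~ {t_hat:8.3f}   tolerance ~ {tol_hat:.3f}")
```

The recomputed tolerances agree with the printed column to three decimals, which suggests the VIF and Tolerance columns were extracted correctly.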
3.3. Discussion
The combined analysis shows that age and possession (fpl_sel) have negative coefficients and thus lower
the predicted market value of a player, although neither effect is statistically significant at the 5%
level (p = 0.111 and p = 0.383 in Table 2). In contrast, fpl_value and fpl_points significantly increase a
player's market value. In addition, the number of hits on a player's wiki page does not significantly
affect market value. In fact, a player's game performance also tends to be negatively correlated with
age during the season. Managers often judge whether a player deserves a higher salary based on the
player's in-game performance, such as fpl_value and fpl_sel. Overall, the linear regression model
developed in this experiment can help clubs determine an appropriate salary more clearly and intuitively.
In addition to the quantitative variables analyzed above, experts also believe that the market value of
a striker is affected by a variety of other factors, including nationality, performance of the club in which
he is playing, whether he is a foreign player, and whether he is in a BIG6 club.
4. Conclusion
In this study, a multiple linear regression model was used, with player market value as the dependent
variable, and age, overall on-field performance, on-field goals, on-field possession, and daily hits on
players' wiki pages as the independent variables. Meanwhile, this paper also considers some control
variables, such as player position and league level, to ensure the accuracy of the research results. This
paper examines the relationship between the price of Premier League strikers and the many factors that
influence it. By analyzing a large amount of player data, this paper produces a series of statistical
results and calculates the degree of influence of all independent variables on the market value of players.
When other potential variables are taken into account, age, overall match performance, number of goals
scored in a match and match possession are found to have a significant impact on a player's market
price. Based on the regression model of the study, some suggestions can be made for scouts and
managers to operate in future transfer periods. When choosing transfer targets as well as determining
prices, decision-makers should carefully consider these four factors to improve the effective utilization of
funds and thus improve the team's performance.
Meanwhile, analyzing factors influencing football market value is crucial for understanding the
economics of the sport. Future research in this area should consider several key elements: injury history
and fitness, contractual factors, club performance and financial health, and economic indicators. A player's
injury history and current fitness levels significantly impact their market value. Longitudinal studies
tracking injury patterns and recovery times can help predict future performance and market fluctuations.
Besides, the length and terms of a player's contract, including buyout clauses and salary, play a
significant role in market value. Analysis of contract trends across different leagues can provide a
comparative understanding of how these factors influence valuations. Furthermore, the financial
stability of a player's club and its performance in domestic and international competitions can affect
market values. Clubs with higher revenues and successful track records are likely to command higher
market valuations for their players. Apart from those indicators, broader economic factors, such as
inflation rates, currency exchange rates, and global economic health, can also impact football market
values. Studies examining the correlation between these economic indicators and football market trends
would provide valuable insights. By integrating these factors, future research can develop more
comprehensive models to predict football market values, aiding clubs, agents, and investors in making
informed decisions. Advanced statistical techniques and machine learning algorithms could be
employed to handle the complexity and interdependence of these factors, providing a robust framework
for market analysis.
References
[1] Kun Z 2002 Relation between supply and demand in the occupational football market of China.
Journal of Physical Education.
[2] Tobar F and Ramshaw G 2022 Welcome to the EPL: analysing the development of football
tourism in the English Premier League. Soccer and Society, 23(4), 432-450.
[3] Kennedy P and Kennedy D 2017 A political economy of the English Premier League. In
Routledge eBooks, 49-69.
[4] Adiwiyana H I and Harymawan I 2021 Factors that determine the market value of professional
football players in Indonesia. Jurnal Dinamika Akuntansi, 13(1), 51-61.
[5] Hägglund M, Waldén M and Ekstrand J 2012 Risk factors for lower extremity muscle injury in
professional soccer. The American Journal of Sports Medicine, 41(2), 327-335.
[6] Majewski S M 2015 Is this a business that feeds on emotions or is it an ALTRUSM behavior?
Polish football financing case. Acta Universitatis Lodziensis. Folia Oeconomica.
[7] He M, Cachucho R and Knobbe A J 2015 Football Players Performance and Market Value.
LIACS, 87-95.
[8] Majewski S 2016 Identification of factors determining market value of the most valuable football
players. Journal of Management and Business Administration Central Europe, 24(3), 91-104.
[9] Metelski A 2021 Factors affecting the value of football players in the transfer market. Journal of
Physical Education and Sport, 21, 1150-1155.
[10] Adiwiyana H I and Harymawan I 2021 Factors that determine the market value of professional
football players in Indonesia. Jurnal Dinamika Akuntansi, 13(1), 51-61.
Harmonic analysis approach to the proof of Heisenberg
inequality
Yuchen Wang
School of Northeast YuCai, Shenyang, 110000, China
yyluyinbo@tzc.edu.cn
Abstract. The Heisenberg uncertainty principle is a fundamental principle in quantum mechanics,
proposed by the German physicist Werner Heisenberg in 1927. This principle states that for a pair of
physical quantities that share phase space, such as position and momentum, it is impossible to
accurately measure their values at the same time. There are several variants of it in harmonic
analysis studies, and the article will introduce some of them in $\mathcal{S}(\mathbb{R})$ space and
$L^2(\mathbb{R})$ space. In the process of proving the Heisenberg inequality, the article proves the
Plancherel identity and the Cauchy-Schwarz inequality by using the Fourier transform and the
inverse Fourier transform. Finally, the author solves the equation for the wave function. The
famous physicist Heisenberg proposed one of the more novel ideas in quantum mechanics, namely that
the existence of unobservable orbits cannot be assumed, which had a great influence on quantum
mechanics. The article will introduce the concept of the Heisenberg inequality and try to complete
the proof.
Keywords: Fourier transform, inverse Fourier transform, Cauchy-Schwarz inequality,
Plancherel identity.
1. Introduction
Harmonic analysis is a branch of mathematics that deals with the expansion of functions into Fourier
series or Fourier integrals and related problems. It originates from the superposition problems of
decomposing a periodic oscillation into simple harmonic oscillation in physics, and has now developed
into a discipline with wide application [1]. Harmonic analysis not only involves mathematics, but also
plays an important role in many disciplines such as information processing and quantum mechanics.
Harmonic analysis is also used in tidal analysis, through which the tidal changes in a certain period can
be calculated and the tidal properties of the area can be analyzed. Thus, harmonic analysis of tides is an
important method used in marine engineering for the analysis and prediction of tidal changes [2].
Quantum mechanics is a branch of physics that studies the laws of motion of microscopic particles in
the material world. It mainly studies the basic theories of the structure and properties of atoms,
molecules and condensed matter, as well as atomic nuclei and elementary particles.
Together with relativity, it forms the theoretical basis of modern physics. Quantum mechanics is not
only one of the basic theories of modern physics, but is also widely used in chemistry and many other
modern technologies [3].
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0123
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
The Heisenberg inequality, which is also called the Heisenberg uncertainty principle, is the bridge
between the two theories. The article will focus on how to prove the Heisenberg inequality using
harmonic analysis and apply the results to quantum mechanics.
2. Methods and Theory
2.1. Background knowledge and method
A periodic function can be approximated by a sum over an orthogonal basis, which essentially turns it
into a sum of functions representing different frequencies [4]
$$f(t) = \sum_{n=-\infty}^{\infty} c_n e^{i n \omega t}, \qquad \omega = \frac{2\pi}{T} \tag{1}$$
To calculate $c_n$, the author will use the properties of the orthogonal basis to simplify the result.
Multiplying both sides of the equation by $e^{-i m \omega t}$, the author will get
$$f(t)\, e^{-i m \omega t} = \sum_{n=-\infty}^{\infty} c_n e^{i (n-m) \omega t} \tag{2}$$
Taking the definite integral from 0 to T on both sides of the above equation, every term with $n \neq m$
integrates to zero, which leaves
$$c_m = \frac{1}{T} \int_0^T f(t)\, e^{-i m \omega t}\, dt \tag{3}$$
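The coefficient formula for the Fourier series can be checked numerically. The sketch below approximates the integral over one period by a Riemann sum and applies it to a hypothetical test signal whose coefficients are known in advance. The function name `fourier_coefficient`, the period `T = 2.0` and the test signal are assumptions chosen for illustration, not taken from the article.

```python
import cmath
import math


def fourier_coefficient(f, m, period, samples=1000):
    """Approximate c_m = (1/T) * integral_0^T f(t) exp(-i m w t) dt by a Riemann sum."""
    omega = 2.0 * math.pi / period
    dt = period / samples
    total = 0j
    for k in range(samples):
        t = k * dt
        total += f(t) * cmath.exp(-1j * m * omega * t) * dt
    return total / period


# Hypothetical test signal: f(t) = 3 + 2 cos(w t) = 3 + e^{iwt} + e^{-iwt},
# so c_0 = 3, c_1 = c_{-1} = 1 and every other coefficient is 0.
T = 2.0
signal = lambda t: 3.0 + 2.0 * math.cos(2.0 * math.pi * t / T)

c0 = fourier_coefficient(signal, 0, T)
c1 = fourier_coefficient(signal, 1, T)
c2 = fourier_coefficient(signal, 2, T)
print(abs(c0), abs(c1), abs(c2))
```

The vanishing of `c2` illustrates the orthogonality used in the derivation: the integral of a basis function against a different one over a full period is zero.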
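Having introduced the definition of the Fourier series, the article now introduces the Fourier transform.
The author begins by expanding the function f(t) as a Fourier series on the interval $[-T/2, T/2]$
$$f(t) = \sum_{n=-\infty}^{\infty} \left[ \frac{1}{T} \int_{-T/2}^{T/2} f(\tau)\, e^{-i n \omega \tau}\, d\tau \right] e^{i n \omega t} \tag{4}$$
Taking the limit as T tends to infinity, the author will get [5]
$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(\tau)\, e^{-i \omega \tau}\, d\tau \right) e^{i \omega t}\, d\omega \tag{5}$$
This gives the definition of the Fourier transform. The new function depends only on
the given frequency $\omega$ and describes the distribution density of that frequency component in f(t)
$$\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i \omega t}\, dt \tag{6}$$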
2.2. Structure and content of the article
In the first part, Sec. 3.1, the author chooses a dense subspace with good properties,
$\mathcal{S}(\mathbb{R})$, in which the inequality can be proved using only properties of complex numbers,
integration by parts, the Cauchy-Schwarz inequality and the properties of rapidly decreasing functions. All
these properties will be proved by the author later. In the second part, Sec. 3.2, the author generalizes the
results proved in $\mathcal{S}(\mathbb{R})$ to a more general function space, $L^2(\mathbb{R})$. The author
uses a sequence of functions to approximate the function f, such that the difference converges to 0 in the
integral as n approaches infinity, with both the original function and its derivative converging under the
$L^2$ norm. In this circumstance, the derivatives of the approximating functions approximate the derivative
of f, and the squares of the two norms of the differences tend to 0. If n goes to infinity, the inequality
still holds, which is easy to estimate later with inequalities. Finally, the author finds the specific
function by solving an ODE, hence getting the results and the application condition.
3. Results and Application
3.1. Proof in $\mathcal{S}(\mathbb{R})$ space
The calculation and properties of complex numbers are very important in this paper. Using them, the
author will make the following estimate [6]
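$$\mathrm{Re}(z) = \frac{1}{2}\left(z + \bar{z}\right) \tag{7}$$
If $z = a + bi$, then $|z| \ge |\mathrm{Re}(z)| = |a|$. One can prove the inequalities in $\mathcal{S}(\mathbb{R})$ space by
using the calculation related to the Fourier transform and the inverse Fourier transform, written here with
the frequency variable $\xi$ so that no factor $1/2\pi$ appears.
$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x \xi}\, dx \tag{8}$$
$$f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi)\, e^{2\pi i x \xi}\, d\xi \tag{9}$$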
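By using the properties of rapidly decreasing functions, together with the normalization of f, one has
$$\lim_{|x| \to \infty} x f(x) = 0 \tag{10}$$
$$\int_{-\infty}^{\infty} |f(x)|^2\, dx = 1 \tag{11}$$
Then, the author proves the Heisenberg inequality for $f \in \mathcal{S}(\mathbb{R})$, which is a rapidly
decreasing function. Consider
$$4\pi^2 \int_{-\infty}^{\infty} x^2 |f(x)|^2\, dx \int_{-\infty}^{\infty} \xi^2 |\hat{f}(\xi)|^2\, d\xi \tag{12}$$
$$= \int_{-\infty}^{\infty} |x f(x)|^2\, dx \int_{-\infty}^{\infty} |2\pi i \xi \hat{f}(\xi)|^2\, d\xi \tag{13}$$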
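Then, the author will use Plancherel's identity. The proof is as follows.
$$\int_{-\infty}^{\infty} |f(x)|^2\, dx = \int_{-\infty}^{\infty} f(x)\, \overline{f(x)}\, dx \tag{14}$$
$$= \int_{-\infty}^{\infty} f(x)\, \overline{\int_{-\infty}^{\infty} \hat{f}(\xi)\, e^{2\pi i x \xi}\, d\xi}\; dx \tag{15}$$
$$= \int_{-\infty}^{\infty} \overline{\hat{f}(\xi)} \left( \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x \xi}\, dx \right) d\xi \tag{16}$$
$$= \int_{-\infty}^{\infty} \overline{\hat{f}(\xi)}\, \hat{f}(\xi)\, d\xi \tag{17}$$
$$= \int_{-\infty}^{\infty} |\hat{f}(\xi)|^2\, d\xi \tag{18}$$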
In the second and third steps, the inverse Fourier transform and the Fourier transform are applied in
order. Since the Fourier transform of $f'(x)$ is $2\pi i \xi \hat{f}(\xi)$, Plancherel's identity turns the
formula into [7]
$$\int_{-\infty}^{\infty} |x f(x)|^2\, dx \int_{-\infty}^{\infty} \left| \frac{d f(x)}{dx} \right|^2 dx \tag{19}$$
Then, the author will use the Cauchy-Schwarz inequality: for any two elements x and y in an inner
product space, the square of the absolute value of their inner product is not greater than the product
of the squares of their norms.
Here, the author is going to prove the Cauchy-Schwarz inequality. For functions
$f, g \in \mathcal{S}(\mathbb{R})$ and any real number $\lambda$,
$$\int_{-\infty}^{\infty} |f(x) + \lambda g(x)|^2\, dx \ge 0 \tag{20}$$
󰇛󰇜2󰇛󰇜󰇛󰇜
2󰇛󰇜
2󰇛󰇜󰇛󰇜󰇛󰇜
422󰇛󰇜2
221
And because of , then the formula can be written as [8]
󰇛󰇜22
The result is as required. According to the inequality which the author has proved yet, one can turn
upwards formula into
󰇛󰇜󰇛󰇜
223
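By the basic property of complex numbers (the modulus of a complex number is not less than the
modulus of its real part), $|z| \ge |\mathrm{Re}(z)|$. Then, the author can change the result into
$$\ge \left( \mathrm{Re} \int x f(x)\, \overline{f'(x)}\, dx \right)^2 \tag{24}$$
The properties of complex numbers show that if $A = a + bi$, then $\mathrm{Re}(A) = \frac{1}{2}(A + \bar{A})$. By using
this property, the author can rewrite the result [9]
$$= \frac{1}{4} \left( \int x \left( f \bar{f}' + \bar{f} f' \right) dx \right)^2 \tag{25}$$
By using the product rule for derivatives, $(f \bar{f})' = f' \bar{f} + f \bar{f}'$, the term $f \bar{f}' + \bar{f} f'$ can be
written as $(|f|^2)'$. Thus, the result can be transformed into
$$= \frac{1}{4} \left( \int x \left( |f(x)|^2 \right)' dx \right)^2 \tag{26}$$
In the next step, the author will use integration by parts
$$= \frac{1}{4} \left( \Big[ x |f(x)|^2 \Big]_{-\infty}^{\infty} - \int |f(x)|^2\, dx \right)^2 \tag{27}$$
Because f is a rapidly decreasing function, the first term equals 0.
Then one gets the result
$$= \frac{1}{4} \left( \int |f(x)|^2\, dx \right)^2 = \frac{1}{4} \|f\|_2^4 \tag{28}$$
In this space the norm of f is 1, and the starting expression is $4\pi^2$ times the product of the two
integrals. Thus, the article has got the result
$$\int_{-\infty}^{\infty} x^2 |f(x)|^2\, dx \int_{-\infty}^{\infty} \xi^2 |\hat{f}(\xi)|^2\, d\xi \ge \frac{1}{16\pi^2} \tag{29}$$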
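The constant in the Heisenberg inequality can be checked numerically on a Gaussian, the function for which the bound is attained. The sketch below assumes the Fourier transform is normalized with $e^{-2\pi i x \xi}$ in the integrand; under that assumption the function $2^{1/4} e^{-\pi x^2}$ has unit norm and equals its own transform, so the position and frequency spreads coincide. The helper name `integrate` and the integration window are illustrative choices.

```python
import math


def integrate(g, a=-6.0, b=6.0, steps=12000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (k + 0.5) * h) for k in range(steps)) * h


# Assumed test function: f(x) = 2^(1/4) exp(-pi x^2), which has unit L2 norm and,
# for the transform with exp(-2 pi i x xi), satisfies f_hat = f.
f = lambda x: 2.0 ** 0.25 * math.exp(-math.pi * x * x)

norm_sq = integrate(lambda x: f(x) ** 2)
position_spread = integrate(lambda x: x * x * f(x) ** 2)
frequency_spread = position_spread  # because f_hat = f for this Gaussian
product = position_spread * frequency_spread
bound = 1.0 / (16.0 * math.pi ** 2)
print(norm_sq, product, bound)
```

For this Gaussian the product of the two spreads coincides with the bound, which is consistent with the inequality being sharp.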
3.2. Proof in $L^2(\mathbb{R})$ space
Firstly, the author proved the inequality in a dense subspace with good properties. Then, the author will
generalize it to a more general function space, that is, prove the inequality in $L^2(\mathbb{R})$ space.
Because 20 (function f is a rapidly decreasing function in any space), one may assume that
󰆹2. If the opposite circumstance holds, there’s nothing to prove because the result will be much
greater. In this case, you can’t measure accurately both the location and the momentum of a particle.
This means the Plancherel’s identity which the author has proved,
2󰆹, also holds in the 21
space. Thus, the proof for 1 also holds for this circumstance [10]

󰇛30󰇜
Now, the author set a function in order to approximate f . Because the subspace is dense,
there are continuous functional series. As for  the series converges uniformly to 0 in the integral,
which is also analytically convergent, and the original function and derivative converge under the 2
norm. The function meets the requirement
lim
1422
󰆹2
󰇛31󰇜
lim
2
22
2lim
1422
󰆹20
󰇛32󰇜
They can be proved simply by finding two equations
2
2
󰆹2
2
2 422
󰆹2
󰇛33󰇜
And because of󰇛󰇜󰇛󰇜
󰆹. The author can use Cauchy-Schwartz inequalities to
zoom the formula
󰇩 14221
󰇪12
󰇩 14222
󰆹2
󰇪12
󰇛34󰇜
By using Schwartz’s inequality. For any fixed , and one can have
lim
lim
 
 
lim
lim
󰇩󰇛2󰇜
2
 󰇪
lim
󰇛󰇜2󰇛󰇜22
2󰇛35󰇜
The proof in this step is with the same logic with the one in 2.1, because the function is a rapidly
decreasing function, one can rewrite the result into
. Then, the author has finished the whole
proof in this space.
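For the next step, the article will focus on the specific wave function for which equality holds. By
observing the proof, the author finds that such a function always satisfies a differential equation of the
form $f'(x) = \beta x f(x)$, because equality in the Cauchy-Schwarz step requires $f'$ to be proportional to
$x f$. Solving this ODE by separating variables, the author gets the solution, which is
$f(x) = A e^{-Bx^2}$, with $B > 0$ and $|A|^2 = \sqrt{2B/\pi}$.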
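The separation-of-variables step can be written out explicitly. The following is a sketch under the assumption that the equality case is governed by $f'(x) = \beta x f(x)$ with a real constant $\beta$; the symbols $\beta$, $A$, $B$ and $C$ are labels chosen here for illustration and may differ from the author's notation.

```latex
\begin{align*}
f'(x) &= \beta x f(x)
  && \text{assumed equality condition} \\
\frac{f'(x)}{f(x)} &= \beta x
  && \text{separate the variables} \\
\ln f(x) &= \tfrac{1}{2}\beta x^2 + C
  && \text{integrate both sides} \\
f(x) &= A e^{\beta x^2/2} = A e^{-Bx^2}
  && A = e^{C},\ \beta = -2B,\ B > 0 \\
1 = \int_{-\infty}^{\infty} |f(x)|^2\,dx &= |A|^2 \sqrt{\frac{\pi}{2B}}
  && \Longrightarrow\ |A|^2 = \sqrt{\frac{2B}{\pi}}
\end{align*}
```

The condition $B > 0$ is what keeps the solution square integrable; a positive $\beta$ would give a function that grows without bound.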
3.3. Results and Application
The exact expression of Heisenberg's inequality first appeared in the study of quantum mechanics, when
researchers were trying to determine the position and momentum of a particle at the same time.
Suppose that there is an electron moving along a line and that the laws of physics governing it can be
described by a state function $\psi$.
The position of the electron is described by the probability that the particle is located in (a, b). The
function $|\psi(x)|^2$ is the density function, and the expectation is
$$\bar{x} = \int_{-\infty}^{\infty} x\, |\psi(x)|^2\, dx \tag{36}$$
Then the author can discuss the value of x that minimizes the error, which is of great significance
in quantum mechanics. The error is
$$\int_{-\infty}^{\infty} (x - \bar{x})^2\, |\psi(x)|^2\, dx \tag{37}$$
And the error of the momentum is
$$\int_{-\infty}^{\infty} (\xi - \xi_0)^2\, |\hat{\psi}(\xi)|^2\, d\xi \tag{38}$$
4. Conclusion
According to the Heisenberg inequality the author has proved, the product of the errors of the position
and the momentum is at least $\frac{1}{16\pi^2}$. There are plenty of applications of the Heisenberg
inequality. For example, magnetic resonance imaging (MRI) is a medical imaging technique used to observe
the internal structure of biological tissues. In MRI, the resonance signal
of the atomic nuclei can be obtained by applying a strong magnetic field and electromagnetic
pulses to the object under test. According to Heisenberg's uncertainty principle, the position and
momentum of an atomic nucleus cannot be measured accurately at the same time, so in MRI, position or
momentum information can only be obtained to a certain extent, which the author regards as one reason
why MRI images are often blurry.
The Heisenberg Uncertainty Principle is a fundamental principle in modern physics that has profoundly
changed people’s understanding of the natural world. Although this principle prevents people from
accurately determining the position and momentum of an elementary particle at the same time, it has
not stopped people from using this principle to perform some important calculations and analysis. In the
future, with the development of science and technology, people may find more opportunities to use the
uncertainty principle to better understand and apply the fundamental laws of the natural world.
References
[1] McCarthy D W, Probst R C, Low F J. (1985). Infrared detection of a close cool companion to
Van Biesbroeck. Astrophysical Journal, 290, 29-42.
[2] Lévy-Leblond, Jean-Marc. (2021). Correlation of Quantum Properties and the Generalized
Heisenberg Inequality. American Journal of Physics, 54(2), 135-36.
[3] Lahti, Pekka J., Maciej J. Maczynski. (1987). Heisenberg Inequality and the Complex Field in
Quantum Mechanics. Journal of Mathematical Physics, 28(8), 1764-69.
[4] Grünbaum, F. Alberto. (2023). The Heisenberg Inequality for the Discrete Fourier Transform.
Applied and Computational Harmonic Analysis, 15(2), 163-67.
[5] Stan, Aurel. (2005). On Heisenberg Inequality. Communications in Contemporary Mathematics,
07(01), 75-88.
[6] De La Peña, Luis. (1980). Conceptually Interesting Generalized Heisenberg Inequality. American
Journal of Physics, 48(9), 775-76.
[7] Mueller, C., and Stan A. (2005). A Heisenberg Inequality for Stochastic Integrals. Journal of
Theoretical Probability, 18(2), 291-315.
[8] Wiener, Norbert. (1930). Generalized Harmonic Analysis. Acta Mathematica, 55, 117-258.
[9] Hewitt, Edwin, and Kenneth A. Ross. Abstract Harmonic Analysis. Springer Berlin Heidelberg,
1963.
[10] Schwab, Keith C., and Michael L. Roukes. (2015). Putting Mechanics into Quantum Mechanics.
Physics Today, 58(7), 36-42.
Research on the influencing factors of student performance
Chenrui Pei
School of Civil Engineering, Southwest Jiaotong University, Chengdu, 610000, China
pcr21cp2@outlook.com
Abstract. The aim of this report is to analyze the factors influencing student performance and to
develop a predictive model for Grade Point Average (GPA) based on five aspects: demographic
details, study habits, parental involvement, extracurricular activities, and academic achievement.
Utilizing a multiple linear regression model, this report identifies key factors that significantly
impact academic performance. The dataset includes a total of 14 student characteristics, such as
parental education level, weekly study time, extracurricular activities, absences and so on.
Through stepwise regression, non-significant factors were iteratively eliminated, leading to the
development of a predictive model to determine the primary influences on student performance.
The research findings underscore the significant role of weekly study time, absences, tutoring,
parental support, extracurricular activities, sports, and music in student performance. In contrast,
age, gender, ethnicity, parental education, and volunteering have negligible impact on GPA.
These insights provide actionable guidance for educators and policymakers to implement
targeted measures to enhance student performance.
Keywords: Student performance, GPA, multiple linear regression.
1. Introduction
Student performance is a fundamental criterion for evaluating excellence, as it reflects learning ability,
intelligence, self-management skills, and more. High scores can boost students' self-confidence, help
them gain admission to better universities, secure scholarships, and attract employers' attention,
significantly impacting their future success [1]. The importance of academic performance often leads to
anxiety. The Survey Report on Chinese Parents' Educational Anxiety Index, released in September 2018,
analyzed 3205 questionnaires and found that the comprehensive anxiety index of parents' education
reached 67 points out of 100, indicating a relatively high level of anxiety [2]. Therefore, it is crucial to
discover the factors related to student achievement. The factors influencing student achievement and
educational outcomes are multifaceted, complex, and interrelated. Students' attributes and abilities,
social relationships, and family and societal structures all impact academic performance to varying
degrees [3]. Moreover, studies have shown that students' academic performance is related to their
cognitive style (CS), self-regulated learning (SRL), and working memory (WM) [4]. This paper aims to
identify suitable methods to determine the factors influencing student achievement and predict their
impact.
In 2021, Alani and Hawas conducted a comprehensive study on the factors affecting student
performance at Sohar University. They surveyed various faculties, gathering data from 562 students
through questionnaires. This data was critically analyzed using regression analysis. The study revealed
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0131
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
that environmental factors significantly influence student performance, with students expressing a
preference for a quiet and comfortable university environment. Furthermore, the linear regression model
indicated that teachers with strong teaching skills and diverse teaching techniques positively impact
student performance [5].
In 2023, a teacher discovered a significant positive correlation between academic performance and
volitional quality through a comparative test of these factors in ordinary and excellent classes at a high
school. The volitional characteristics of students vary significantly across different grades and academic
levels, while gender differences in volitional character strength are not pronounced [6].
In 2024, Kocsis and Molnár conducted a study using meta-analyses and systematic reviews of up to
900 studies based on 600,000 university students to identify factors affecting student performance. The
results showed that output variables GPA and obtained credits (ECTS) are mediated by two parts:
student factors and throughput factors. Student factors include intrinsic motivation, self-regulated
learning strategies, self-efficacy, and prior education, while throughput factors include work, finances,
and academic engagement. However, there were contradictory results regarding age and family
conditions. GPA, ECTS, and gender are the most relevant factors affecting student performance [7].
In summary, this report will use regression models to identify factors impacting student learning and
build models to predict the relationship between student achievement and different factors.
2. Methodology
2.1. Data source
The dataset for this paper is from the Kaggle website (Student Performance Dataset). This dataset
contains comprehensive information from 2392 high school students, and all datasets were used in this
paper.
2.2. Variable selection
The dataset is sufficient and there are no missing data. Due to the fact that GPA and grade class are both
indicators of student academic performance, this paper chooses to delete grade class. In addition, as the
Student ID is only a serial number and has no impact on GPA, it is deleted.
Table 1. List of dependent and independent variables.
Variable             Symbol     Meaning
Age                  $x_1$      The age ranges from 15 to 18 years
Gender               $x_2$      Male (0), Female (1)
Ethnicity            $x_3$      Caucasian (0), African American (1), Asian (2), Other (3)
Parental Education   $x_4$      None (0), High School (1), College (2), Bachelor's (3), Higher (4)
Study Time Weekly    $x_5$      Weekly study time in hours
Absences             $x_6$      Number of absences during the school year
Tutoring             $x_7$      No (0), Yes (1)
Parental Support     $x_8$      None (0), Low (1), Moderate (2), High (3), Very High (4)
Extracurricular      $x_9$      No (0), Yes (1)
Sports               $x_{10}$   No (0), Yes (1)
Music                $x_{11}$   No (0), Yes (1)
Volunteering         $x_{12}$   No (0), Yes (1)
GPA                  $Y$        Grade Point Average on a scale from 2.0 to 4.0
The final selected data consists of 12 variables (age, gender, ethnicity, parental education, study time
weekly, absences, tutoring, parental support, extracurricular, sports, music, volunteering) and a
dependent variable (GPA). The specific student characteristics of this dataset are shown in Table 1.
2.3. Method introduction
This article employs a multiple linear regression model to fit student grades. In statistics, linear
regression determines a line that best represents the overall trend of a data set [8]. Multiple linear
regression is a statistical technique used to analyze the impact of several independent variables on a
dependent variable. This section will mainly aim to compare the predictive ability and fitting accuracy
of the model before and after removing some variables. The initial model includes 12 potential
explanatory variables.
Using the stepwise regression method, variables that show low statistical significance are iteratively
removed. Stepwise regression is a technique that uses an automated process to select predictor
variables. This method evaluates variables at each step based on criteria from a series of t or F tests,
ultimately determining the final group of variables for the regression [9]. After each elimination the
model is rebuilt and the remaining variables are re-evaluated. Once the stepwise regression is complete,
the final multiple linear regression model is selected.
3. Results and discussion
3.1. Descriptive statistics
The impact of gender and extracurricular activities on grade class is visualized through line graphs
(Figure 1). The distribution between genders is roughly equal, indicating minimal influence on student
academic performance. However, students participating in extracurricular activities demonstrate
significantly better grades compared to those who do not participate.
Figure 1. Line charts of Gender and Extracurricular on Grade Class
Bar charts effectively illustrate the quantitative relationships between ethnicity, parental support, and
student performance. As shown in Figure 2, the analysis reveals that the proportion of students across
different grades remains consistent, indicating that ethnicity does not affect student performance.
Conversely, there is a clear trend showing that higher levels of parental support correlate with better
student grades.
Figure 2. Bar charts of Ethnicity and Parental Support on Grade Class
Scatter plots are used to examine the number of absences and age. Scatter plots visually display the
relationship between two variables and the approximate distribution of the data. They provide key
information such as data distribution, sample size, and the identification of outliers [10]. The
distribution of the points in Figure 3 is studied to determine the correlation and to summarize the
distribution pattern of the points. For age, Figure 3 shows a similar distribution for each age across
the different grade classes, suggesting that age has no impact on student performance. For absences,
there is a clear negative correlation between student performance and the number of absences: the more
absences, the lower the student's grades.
Figure 3. Scatter plots of Absences and Age on GPA
3.2. Correlation analysis
In the dataset, there are a total of 12 factors that may affect student performance, and the Pearson
correlation coefficients between these factors and GPA are shown in Figure 4:
Figure 4. Pearson correlation coefficients between the 12 factors and GPA
The study data reveals that absences have the strongest negative correlation with GPA, indicating
that the more a student is absent, the poorer their academic performance. In contrast, GPA shows
significant positive correlations with weekly study time, tutoring, and parental support, suggesting that
extracurricular study and external assistance can significantly boost academic performance.
Additionally, extracurricular activities, sports, and music are positively correlated with GPA,
though these correlations are not statistically significant. Factors like age, gender, ethnicity, parental
education, and volunteering exhibit very weak correlations with GPA. Overall, the factors influencing
GPA are multifaceted, with attendance, study habits, and parental support playing crucial roles.
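The Pearson correlation coefficient used in this analysis can be computed directly from its definition as the covariance divided by the product of the standard deviations. The sketch below is a minimal illustration; the function name `pearson_r` and the toy values are hypothetical and are not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, not taken from the dataset: GPA falls linearly with absences.
absences = [0, 5, 10, 15, 20]
gpa = [3.8, 3.3, 2.8, 2.3, 1.8]
r = pearson_r(absences, gpa)
print(r)
```

A perfectly linear decreasing relationship gives a coefficient at the negative end of the range, which is the direction reported for absences in the study.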
3.3. Model
3.3.1. Initial Model
After conducting a correlation analysis of the factors influencing student performance, a multiple
regression analysis was performed to establish a comprehensive model that includes all variables. The
general mathematical model for multiple linear regression is as follows:
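$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{12} x_{12} + e \tag{1}$$
In the above formula, $\beta_0$ is a constant term, $\beta_1, \ldots, \beta_{12}$ are the regression
coefficients, and e is the error term accounting for the variability not explained by the independent
variables.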
Table 2 presents the regression coefficients of the multiple linear regression model. From the table,
it can be observed that X1, X2, X3, X4, and X12 have no significant impact on the dependent variable,
as their p-values are greater than 0.05. Additionally, all variables have VIF values close to 1, indicating
that there is no issue of multicollinearity. Therefore, there are 7 independent variables that have a
significant impact on the dependent variable Y. Based on the regression coefficients, the multiple linear
regression equation is as follows:
$$E(Y) = 1.3391 - 0.006 x_1 + 0.011 x_2 + \cdots - 0.005 x_{12} \tag{2}$$
The fitted multiple linear regression model yields an R-squared value of 0.954 and an adjusted R-
squared value of 0.954, indicating a high degree of fit. Figure 5 shows the line plot comparing the test
data with the predicted data. The trends of the two lines are consistent and exhibit a high degree of
similarity. This suggests that the model effectively captures the overall trend of the data and achieves
high predictive accuracy, even though there are some deviations in specific values.
Table 2. Regression coefficient table for the initial model

| Variable | B | S.E. | Beta | t | p-value | VIF |
|---|---|---|---|---|---|---|
| Constant | 1.3391 | 0.015 | 1.901 | 87.483 | 0.000 | 11.563 |
| X1 | -0.006 | 0.005 | -0.006 | -1.424 | 0.155 | 1.012 |
| X2 | 0.011 | 0.009 | 0.005 | 1.164 | 0.245 | 1.006 |
| X3 | 0.005 | 0.004 | 0.005 | 1.085 | 0.278 | 1.004 |
| X4 | 0.000 | 0.005 | 0.000 | 0.027 | 0.978 | 1.006 |
| X5 | 0.166 | 0.005 | 0.166 | 36.687 | 0.000 | 1.005 |
| X6 | -0.844 | 0.005 | -0.844 | -187.141 | 0.000 | 1.004 |
| X7 | 0.258 | 0.010 | 0.119 | 26.279 | 0.000 | 1.005 |
| X8 | 0.148 | 0.004 | 0.165 | 36.640 | 0.000 | 1.004 |
| X9 | 0.190 | 0.009 | 0.092 | 20.394 | 0.000 | 1.005 |
| X10 | 0.185 | 0.010 | 0.085 | 18.861 | 0.000 | 1.006 |
| X11 | 0.153 | 0.011 | 0.061 | 13.467 | 0.000 | 1.005 |
| X12 | -0.005 | 0.012 | -0.002 | -0.425 | 0.671 | 1.004 |
Figure 5. Test data and predicted data
3.3.2. Stepwise regression
Using backward elimination, predictors were iteratively removed from the initial model if their
p-values exceeded the threshold of 0.05. Starting with the full model, X1, X2, X3, X4, and X12 were
eliminated based on their initial p-values. After each removal, the model was refitted, and the process
was repeated until all remaining predictors had p-values below the threshold.
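The elimination loop described above can be sketched as follows. This is an illustration under a stated simplification, not the authors' implementation: the helper `frozen_p_values` is hypothetical and simply reuses the p-values of the full model from Table 2 instead of refitting the regression after each removal, which is a rough stand-in given that the reported VIF values are close to 1.

```python
# p-values of the full model, copied from Table 2.
TABLE2_P = {"X1": 0.155, "X2": 0.245, "X3": 0.278, "X4": 0.978,
            "X5": 0.000, "X6": 0.000, "X7": 0.000, "X8": 0.000,
            "X9": 0.000, "X10": 0.000, "X11": 0.000, "X12": 0.671}


def backward_eliminate(predictors, p_values_for, threshold=0.05):
    """Drop the least significant predictor until every p-value is <= threshold.

    `p_values_for(names)` is a caller-supplied function that refits the model
    on `names` and returns a {name: p_value} mapping.
    """
    kept = list(predictors)
    removed = []
    while kept:
        p = p_values_for(kept)
        worst = max(kept, key=lambda name: p[name])
        if p[worst] <= threshold:
            break
        kept.remove(worst)
        removed.append(worst)
    return kept, removed


def frozen_p_values(names):
    # Simplifying assumption: p-values stay at their Table 2 values after each
    # removal. A real run would re-estimate the regression here.
    return {name: TABLE2_P[name] for name in names}


kept, removed = backward_eliminate(list(TABLE2_P), frozen_p_values)
```

Under this simplification the loop retains X5 to X11 and removes the five predictors whose p-values exceed 0.05, starting with the least significant one.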
As Table 3 shows, all remaining predictor variables have p-values less than 0.05,
indicating significant effects on the dependent variable Y. Among them, X6 has a negative effect, while
the others have positive effects. The VIF values are relatively low, suggesting little multicollinearity
among the predictors. Therefore, the improved linear regression equation is:
$$E(Y) = 1.348 + 0.166 x_5 - 0.844 x_6 + \cdots + 0.152 x_{11} \qquad (3)$$
The fitted multiple linear regression model yields an R-squared of 0.954 and an adjusted R-squared
of 0.954, indicating a high degree of fit. The F-statistic is a statistical measure used to assess the overall
significance of the model. In this model, the F-statistic is 5649, with a corresponding probability value
close to 0, indicating that the model is significant.
Table 3. Regression coefficient table for the improved model

| Variable | B | S.E. | Beta | t | p-value | VIF |
|---|---|---|---|---|---|---|
| Constant | 1.348 | 0.011 | 1.901 | 118.566 | 0.000 | - |
| X5 | 0.166 | 0.005 | 0.166 | 36.775 | 0.000 | 1.003 |
| X6 | -0.844 | 0.005 | -0.844 | -187.392 | 0.000 | 1.002 |
| X7 | 0.258 | 0.010 | 0.118 | 26.293 | 0.000 | 1.350 |
| X8 | 0.148 | 0.004 | 0.165 | 36.624 | 0.000 | 2.015 |
| X9 | 0.190 | 0.009 | 0.092 | 20.502 | 0.000 | 1.443 |
| X10 | 0.186 | 0.010 | 0.085 | 18.965 | 0.000 | 1.341 |
| X11 | 0.152 | 0.011 | 0.061 | 13.455 | 0.000 | 1.209 |
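To illustrate how the improved model is evaluated, the sketch below plugs the unstandardized coefficients (column B) of Table 3 into the regression equation. The function name `predict_gpa` is an assumption, and because this section does not state how the predictors are scaled, the input values are purely illustrative.

```python
# Unstandardized coefficients (column B) from Table 3.
INTERCEPT = 1.348
COEF = {"X5": 0.166, "X6": -0.844, "X7": 0.258, "X8": 0.148,
        "X9": 0.190, "X10": 0.186, "X11": 0.152}


def predict_gpa(x):
    """Evaluate E(Y) for a dict of predictor values; missing predictors count as 0."""
    return INTERCEPT + sum(COEF[name] * x.get(name, 0.0) for name in COEF)


baseline = predict_gpa({})                 # all predictors at zero
one_unit_x6 = predict_gpa({"X6": 1.0})     # one unit of X6, others at zero
```

The difference between the two values equals the X6 coefficient, which is how each entry in column B is read: the expected change in Y per unit change in that predictor with the others held fixed.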
3.3.3. Comparison results
Based on the results of two linear regression models, Table 4 lists the characteristics of the two models
used to compare their performance in fitting and predictive ability.
Table 4. Comparison between the two models

| Metric | Initial model | Improved model |
|---|---|---|
| R-squared | 0.954 | 0.954 |
| Adj. R-squared | 0.954 | 0.954 |
| F-statistic | 3295 | 5649 |
| MSE | 0.03866 | 0.03877 |
| RMSE | 0.19663 | 0.19691 |
| AIC | -776.1 | -781.4 |
| BIC | -703.9 | -737.0 |
Based on the comparison, the two models show very close values for R-squared,
adjusted R-squared, MSE, and RMSE, indicating similar performance in fitting the data and in predictive
accuracy. However, the F-statistic of the improved model (5649) is considerably higher than that of the
initial model (3295), indicating that the retained variables jointly explain the dependent variable (GPA)
equally well with fewer terms. From the perspective of AIC and BIC, the values of the improved
model are smaller, indicating a slight advantage in balancing model fit and complexity. In conclusion,
the improved model matches the initial model in fitting effectiveness while being simpler and containing
only significant variables.
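The comparison can be reproduced with a few lines of Python. The sketch below copies the values from Table 4, checks that each reported RMSE agrees with the square root of the reported MSE, and selects the model with the lower AIC and BIC. The dictionary layout and variable names are illustrative assumptions.

```python
import math

# Values copied from Table 4.
models = {
    "initial":  {"mse": 0.03866, "rmse": 0.19663, "aic": -776.1, "bic": -703.9},
    "improved": {"mse": 0.03877, "rmse": 0.19691, "aic": -781.4, "bic": -737.0},
}

# RMSE is the square root of MSE, so the two rows should agree up to rounding.
rmse_check = {name: math.sqrt(m["mse"]) for name, m in models.items()}

# Lower AIC/BIC indicates a better trade-off between fit and complexity.
best_by_aic = min(models, key=lambda name: models[name]["aic"])
best_by_bic = min(models, key=lambda name: models[name]["bic"])
delta_aic = models["initial"]["aic"] - models["improved"]["aic"]
delta_bic = models["initial"]["bic"] - models["improved"]["bic"]
```

The BIC gap is larger than the AIC gap because BIC penalizes each additional parameter more heavily at this sample size, so dropping five predictors is rewarded more strongly.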
4. Conclusion
This study aims to explore the factors influencing student achievement and predict their impact through
comprehensive data collation and multiple linear regression analysis. The dataset includes information
from 2,392 students and initially comprises 13 variables. After thorough preprocessing, the dataset
underwent analysis using various visualization techniques such as line charts, bar charts, and scatter
plots. These visualizations provided a preliminary analysis of the significance and correlation (positive
or negative) of various factors on student achievement. Notably, of the three factors negatively
correlated with GPA (absences, gender, and parental education), only absences showed a significant
correlation with GPA.
This study first fitted a multiple linear regression model that integrates all influencing factors to
further examine their relationship with student performance. Non-significant factors were
excluded based on their p-values, and VIF values were checked to rule out multicollinearity issues. Ultimately,
stepwise regression confirmed that seven factors significantly impact student performance: weekly study
time, number of absences, tutoring, parental support, extracurricular activities, sports, and music. Among these
factors, absences were negatively correlated with student performance, while the other factors were
positively correlated.
In this study, the influence of different factors on students' achievement is determined, but only the
overall prediction is made, and the impact of each factor on students' achievement cannot be accurately
specified. In order to improve this, different factors can be reanalyzed and grouped, and linear regression
can be performed again to obtain the influence of single or a small number of combined factors on
student achievement.
Analysis of the Relationship between NBA Player Salary and
Their On-Court Performances
Zijian Yang
Department of Statistics, University College London, WC1E 6BT, the United
Kingdom
Christianyzj@outlook.com
Abstract. This research investigates the connection between NBA player salary and
on-court performance. By collecting and analyzing NBA player salary data and related game
statistics, some interesting trends and correlations are found. Scatter plots, an error-bar chart, a bar
chart and a clustered line chart show the direct relationships between measures of players'
performance and their salaries. In addition, correlation analysis is used to examine the strength
and the direction (positive or negative) of the relationships between the independent and dependent
variables, and a linear regression model quantifies the influence of each variable. The
results of the study show that some highly paid players perform well in the game. Further
statistical analysis shows that players' shot attempts are not the only factor affecting their salaries,
and factors such as assists and blocks made per game also play an important role. These findings
have implications for managers, players and fans, and help to better understand and evaluate the
true value of players.
Keywords: Player salary, correlation analysis, linear regression model.
1. Introduction
The National Basketball Association (NBA) stands as one of the most prominent professional basketball
leagues globally, featuring elite athletes known for their exceptional skills and commanding salaries.
The analysis of NBA player salaries is a topic of significant interest, influenced by various factors that
shape the financial landscape of the league. Understanding the determinants of NBA player salaries is
crucial for players, team management, fans, and researchers seeking insights into the intricate dynamics
of sports economics.
Players' performances generate income for the owners, who then pay the players according to these
earnings [1]. Wang used adaptive Lasso, SCAD and Elastic Net to explore the main factors affecting
players' salary levels in a statistical analysis of NBA players' salaries, and found that Ridge
regression, Lasso and Elastic Net had mean square errors similar to those of the other models, all
close to 0.21 [2]. Many empirical studies have examined wage trends in baseball and other sports owing to the
wealth of performance and compensation data available for athletes in professional team sports [3].
Regressions estimating distinct income and performance trajectories for each talent quintile were
conducted in order to demonstrate the degree of bias generated by typical ordinary least squares (OLS).
This method reduces the likelihood that pooled regressions of productivity or income on experience
produce a "flatter" temporal profile than is actually the case [3]. The fact
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0147
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
that this player is in the league is the cause of the significant TV money these clubs are making, along
with the other NBA players. As a result, players must get payment for both their on-court performance
and the portion of TV contract earnings that they are accountable for [4]. The test statistics and all of
the coefficients are valid and significant.
For most performance metrics, such as points or rebounds, a player's stronger performance
during a contract year translates into greater financial savings for the organization in the year of signing
a new deal [5]. The top 25% and top 50% of NBA players had the biggest rises in their proportion of
total salary paid. In the 1985-86 season, the top 25% received almost 56% of all wages paid; in the
2015-16 season, they received roughly 64% of all salaries paid, an increase of eight percentage points.
The percentage of total compensation paid to athletes in the top 50% increased similarly [6]. One
explanation is greater returns to skill: if returns to skill rise and account for more of the variance in earnings,
the most proficient athletes will probably make more money, while the less skilled
athletes will probably make less [6]. In another study, usage rate (USG%) also yielded an interesting finding.
Usage rate measures the number of possessions a player "uses" in a game. Thus, players who are the
focus of their team's offense, such as Kevin Durant and LeBron James, have high usage rates. Salary and
USG% have a positive correlation, which makes sense given that a player who uses the ball more
frequently takes more shots for his team [7]. A straightforward regression analysis has also been applied
to confirm the relationship between pay and altruistic behavior. No collinearity was found in this
regression, since the VIF value is less than 10, and the overall model F value
is significant. The regression coefficient β=0.457 shows that altruistic behavior is significantly and positively
affected by wage [8].
It is believed that players may accept the idea that some players are better than others and that the
better players should get paid more, regardless of the work and production of each individual player.
However, the second aspect of pay disparity is what is referred to as "unjustified inequality," or
inequality that is not supported by and dependent on performance judgments included in the model of
compensation determination [9]. Empirical research by Staw and Hoang demonstrated that the National
Basketball Association (NBA) uses the draft order in addition to a player's predicted on-court production
when allocating playing time [10]. This statistical research aims to delve into the key factors influencing
NBA player salaries. By analyzing a diverse set of variables such as player performance metrics,
position played, market size, endorsements, and team success, this study seeks to uncover patterns and
relationships that shed light on salary structures. Through
a rigorous statistical approach, this paper aims to provide a nuanced understanding of the intricacies
surrounding NBA player compensation, offering valuable insights into the drivers of remuneration in
professional basketball.
2. Methods
2.1. Data Source
The dataset used in this work (NBA Player Salaries, 2022-2023 Seasons) is available on the Kaggle
website. It contains several measures of players' on-court performance, with 467 samples
and more than 50 variables. The dataset combines player per-game and advanced statistics from the
NBA 2022-2023 season with player salary data to create a comprehensive resource for learning about
the financial and performance elements of professional basketball players. It was compiled from
traditional per-game and advanced statistics from Basketball Reference together with
player salary data from Hoopshype.
2.2. Variable Selection
The data set includes a total of 467 NBA players of different positions and ages.
Six on-court performance variables (Free Throws, 3-Point Attempts, 2-Point Attempts, Blocks Per
Game, Assists and Win Shares) are treated as the determinants of players' salary. The basic
overview of each quantitative variable is shown in Table 1.
Table 1 lists the maximum, minimum, mean, standard deviation, and median values of salary and of the
six components of NBA players' on-court performance. The total number of samples is
467, and 7 variables are chosen.
Table 1. Overview of quantitative variables.

| Variables | Min | Max | Mean | SD | Median |
|---|---|---|---|---|---|
| Salary (Million) | 0.005 | 48.07 | 8.417 | 10.708 | 3.7 |
| FT | 0 | 10 | 1.436 | 1.569 | 0.9 |
| 3PA | 0 | 11.4 | 2.793 | 2.261 | 2.4 |
| 2PA | 0 | 17.8 | 4.325 | 3.571 | 3.2 |
| BLK | 0 | 2.5 | 0.379 | 0.364 | 0.3 |
| AST | 0 | 10.7 | 2.108 | 1.958 | 1.4 |
| WS | -1.6 | 12.6 | 2.329 | 2.533 | 1.5 |
2.3. Method Introduction
Regression analysis is used to investigate the effect of X (quantitative or categorical) on Y (quantitative),
including the existence, direction, and strength of any relationship. First, the model's fit is
examined using the R-square value. The VIF and tolerance values are then examined to check for
collinearity, where tolerance = 1/VIF: a VIF value greater than 5, or equivalently a tolerance less
than 0.2, suggests a collinearity issue. If the model has collinearity issues,
ridge regression or stepwise regression can be used to remedy them. Next, the
significance of X is examined; if X is significant (p value < 0.05 or 0.01), it indicates that X
influences Y. The direction of the relationship is then analyzed in detail.
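The collinearity rule above can be expressed as a short check. The sketch below applies it to the VIF values reported later in Table 4; the function name `collinearity_report` and the dictionary layout are illustrative assumptions.

```python
# VIF values as reported in Table 4 (Results of Linear Regression).
vif = {"FT": 4.964, "3PA": 1.643, "2PA": 5.359,
       "BLK": 1.617, "AST": 2.488, "WS": 2.666}


def collinearity_report(vif_values, vif_limit=5.0, tol_limit=0.2):
    """Compute tolerance = 1 / VIF and flag variables that break either limit."""
    report = {}
    for name, v in vif_values.items():
        tol = 1.0 / v
        report[name] = {"tolerance": round(tol, 3),
                        "flagged": v > vif_limit or tol < tol_limit}
    return report


report = collinearity_report(vif)
flagged = [name for name, r in report.items() if r["flagged"]]
```

Since tolerance is the reciprocal of VIF, the two thresholds are equivalent, and only 2PA is flagged. The computed tolerances also agree with the tolerance column of Table 4 to three decimals.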
3. Results and Discussion
3.1. Descriptive Analysis
Figures 1-6 show six factors of NBA players' on-court performance that might be related to their salaries,
and the relationship between each factor and salary. The plots suggest
that the six factors (Free Throws, 3-Point Attempts, 2-Point Attempts, Blocks Per Game,
Assists and Win Shares) are each notably related to NBA players' salaries.
Figure 1. Scatter plot of the relationship between FT and Salary
The scatter plot in Figure 1 shows a strong, positive linear relationship between FT and Salary:
NBA players who make more free throws on the court tend to have higher salaries.
Figure 2. Scatter plot of the relationship between 3PA and Salary.
Figure 2 shows that 3PA and Salary are positively and moderately related. This means that NBA
players who attempt more 3-point shots on the court tend to receive higher salaries.
Figure 3. Pie chart of the relationship between 2PA and Salary
As Figure 3 shows, an ROC curve is constructed for the single item Salary to
judge its diagnostic value for 2PA, with the "gold standard" set first: the value 1.000 is taken as the
cut-off point, 1.000 is treated as positive, and all other values as negative. The proportion of positives is 3.00%,
and the proportion of negatives is 97.00%.
Figure 4. ErrorBar of the relationship between BLK and Salary
Figure 4 indicates that BLK has a weak but still positive relationship with Salary: as the
value of BLK increases, the value of Salary tends to increase. The trend is approximate but
still visible.
Figure 5. Bar Chart of the relationship between AST and Salary
Figure 5 demonstrates that the relationship between AST and Salary is strong and positive.
Because the bars generally become taller from left to right, the chart indicates that NBA players with
more assists on the court tend to receive higher salaries.
Figure 6. Clustered line of the relationship between WS and Salary
Figure 6 shows that WS has a fairly strong and approximately positive relationship with Salary. Taken
together, Figures 1-6 show that the six factors (Free Throws, 3-Point Attempts, 2-Point
Attempts, Blocks Per Game, Assists and Win Shares) all have a direct, positive relationship
with players' salary, which means that NBA players could earn a higher salary through better on-court
performance.
3.2. Correlation Analysis
Table 2 below shows how the correlation analysis was used to examine the relationship between Salary
and FT, 3PA, 2PA, BLK, AST, and WS, respectively, and how strong the relationship was expressed
using the Pearson correlation coefficient.
Table 2. Pearson correlation between 6 factors and the salary

| Variable | Salary |
|---|---|
| FT | 0.674** |
| 3PA | 0.492** |
| 2PA | 0.682** |
| BLK | 0.301** |
| AST | 0.594** |
| WS | 0.625** |

* p<0.05 ** p<0.01 significant level
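For reference, the Pearson correlation coefficient used in this analysis can be computed as in the following sketch. The function name `pearson_r` and the toy values are hypothetical; the numbers are not taken from the dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values (free throws per game, salary in millions of dollars).
free_throws = [0.5, 1.0, 2.0, 3.5, 6.0]
salary_musd = [1.2, 3.0, 2.5, 12.0, 30.0]
r = pearson_r(free_throws, salary_musd)
```

The coefficient lies between -1 and 1; values near 1 indicate that the two variables rise together almost linearly, and values near 0 indicate little linear association.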
All six correlations with Salary are positive and significant at the 0.01 level: FT (0.674),
3PA (0.492), 2PA (0.682), BLK (0.301), AST (0.594), and WS (0.625). The associations with 2PA,
FT, WS, and AST are the strongest, the association with 3PA is moderate, and the association with BLK is
comparatively weak. Table 3 presents the pairwise Pearson correlation coefficients among Salary,
FT, 3PA, 2PA, BLK, AST and WS.
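Table 3. The correlation among 7 variables

| | Salary | FT | 3PA | 2PA | BLK | AST | WS |
|---|---|---|---|---|---|---|---|
| Salary | 1 | | | | | | |
| FT | 0.674** | 1 | | | | | |
| 3PA | 0.492** | 0.488** | 1 | | | | |
| 2PA | 0.682** | 0.871** | 0.455** | 1 | | | |
| BLK | 0.301** | 0.294** | -0.059 | 0.383** | 1 | | |
| AST | 0.594** | 0.646** | 0.584** | 0.694** | 0.084 | 1 | |
| WS | 0.625** | 0.719** | 0.369** | 0.693** | 0.491** | 0.540** | 1 |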
These values indicate that the relationships between the six factors of on-court performance and
players' salary are positive and significant, since the coefficients all lie between 0 and 1
(0.674, 0.492, 0.682, 0.301, 0.594 and 0.625) and are significant at the 0.01 level.
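The significance stars in Tables 2 and 3 can be checked approximately from the coefficient and the sample size alone. The sketch below converts r into a t statistic with n - 2 degrees of freedom and, as a large-sample assumption, compares it with the two-sided normal critical values 1.960 and 2.576 rather than exact t quantiles. The function names are illustrative.

```python
import math


def r_to_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def stars(r, n):
    # Large-sample assumption: normal critical values stand in for t quantiles.
    t = abs(r_to_t(r, n))
    if t > 2.576:
        return "**"
    if t > 1.960:
        return "*"
    return ""


n = 467  # sample size stated in the paper
for pair, r in {"Salary-BLK": 0.301, "3PA-BLK": -0.059, "BLK-AST": 0.084}.items():
    print(pair, round(r_to_t(r, n), 2), stars(r, n))
```

With n = 467 even the comparatively weak Salary-BLK correlation is marked at the 0.01 level, while the two small coefficients in Table 3 that carry no stars (-0.059 and 0.084) fall below the 0.05 threshold.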
3.3. Regression Results
Table 4 reports the linear regression results. First, the model's fit is examined using the R-square
value, and collinearity is checked using the VIF and tolerance values (tolerance = 1/VIF).
In general, a VIF value > 5, or equivalently a tolerance < 0.2, indicates the presence of a
collinearity issue.
Table 4. Results of Linear Regression

| Variable | B (unstandardized) | SE | Beta (standardized) | t | p | VIF | Tolerance |
|---|---|---|---|---|---|---|---|
| Constant | -2476947.598 | 670573.077 | - | -3.694 | 0.000** | - | - |
| FT | 1125733.724 | 471445.409 | 0.165 | 2.388 | 0.017* | 4.964 | 0.201 |
| 3PA | 828401.61 | 188199.978 | 0.175 | 4.402 | 0.000** | 1.643 | 0.609 |
| 2PA | 608793.669 | 215148.025 | 0.203 | 2.83 | 0.005** | 5.359 | 0.187 |
| BLK | 2424712.235 | 1158363.963 | 0.083 | 2.093 | 0.037* | 1.617 | 0.618 |
| AST | 748267.705 | 267460.844 | 0.137 | 2.798 | 0.005** | 2.488 | 0.402 |
| WS | 787786.48 | 213926.572 | 0.186 | 3.683 | 0.000** | 2.666 | 0.375 |

R²: 0.558
Adj R²: 0.552
F: F(6,460)=96.810, p=0.000
D-W value: 1.092
Note: Dependent variable = Salary
* p<0.05 ** p<0.01
For the purposes of the linear regression analysis, the dependent variable is salary, and the
independent variables are FT, 3PA, 2PA, BLK, AST, and WS from Table 4. The following is the model
formula:
$$\text{Salary} = -2476947.598 + 1125733.724\,\text{FT} + \cdots + 787786.480\,\text{WS} \qquad (1)$$
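As a consistency check on the fitted equation, the sketch below evaluates it at the sample means from Table 1 using the coefficients from Table 4. An ordinary least squares fit with an intercept passes through the point of means, so the prediction should reproduce the mean salary of about 8.417 million. The function name `predict_salary` is an illustrative assumption.

```python
# Coefficients (column B) from Table 4; salary is expressed in dollars.
INTERCEPT = -2476947.598
COEF = {"FT": 1125733.724, "3PA": 828401.61, "2PA": 608793.669,
        "BLK": 2424712.235, "AST": 748267.705, "WS": 787786.48}

# Sample means from Table 1.
MEANS = {"FT": 1.436, "3PA": 2.793, "2PA": 4.325,
         "BLK": 0.379, "AST": 2.108, "WS": 2.329}


def predict_salary(stats):
    """Evaluate the regression equation for one player's per-game statistics."""
    return INTERCEPT + sum(COEF[k] * stats[k] for k in COEF)


at_means_millions = predict_salary(MEANS) / 1e6
```

The result agrees with the mean salary in Table 1 to within rounding, which also confirms that the constant term enters the equation with a negative sign.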
With an R-square value of 0.558, FT, 3PA, 2PA, BLK, AST, and WS together account for 55.8% of
the variation in salary. The model passes the F-test (F=96.810,
p=0.000<0.05), indicating that the model is significant overall and that at least one of the factors
FT, 3PA, 2PA, BLK, AST, and WS affects salary. Furthermore, the multicollinearity check shows that one
VIF value (2PA, 5.359) is larger than 5. Because it is less than 10, it indicates a moderate
collinearity issue that may be resolved via stepwise or ridge regression. Alternatively, independent
variables that are strongly correlated with each other can be identified and excluded before re-analyzing.
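The adjusted R-square and F statistic in Table 4 follow from R-square, the sample size n = 467, and the number of predictors k = 6. The sketch below recomputes both; the function names are illustrative, and small differences from the reported values are expected because R-square is rounded to three decimals.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F statistic with (k, n - k - 1) degrees of freedom."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


n, k, r2 = 467, 6, 0.558
adj = adjusted_r2(r2, n, k)
f = f_statistic(r2, n, k)
```

The degrees of freedom (6, 460) match those reported with the F statistic, and the recomputed values agree with the reported adjusted R-square of 0.552 and F of 96.810 up to rounding.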
4. Conclusion
This research indicates a degree of correlation between NBA players' salary levels and their
on-court performance. Highly paid players tend to have better on-court performances. Players' salaries
are significantly related to the six aspects of performance (FT, 3PA, 2PA, BLK, AST and WS),
and each of these factors has a positive relationship with players' salaries. This suggests that players
with better on-court performance can earn higher salaries. The figures (scatter plot, error bar, bar chart
and clustered line), the correlation analysis and the linear regression model
support the result of the study.
It cannot be denied that, owing to the limited amount of data, this model might have some errors related to
the variables, and the sample did not cover all seasons and players, which
might affect the accuracy of the results. However, the approach also has advantages: the graphical
strategy visualizes the variables comprehensively and makes the results clearer.
Further research can explore more subareas, such as the relationship between player salary and
performance at different positions, and player performance in the playoffs and regular season, to learn
more about the connection between player performance and remuneration. Finally, this study provides
some insights into the complex relationship between NBA player salary and on-court performance,
which can help club managers and sports agents make more informed decisions on player salary setting
and transfer strategies.
References
[1] Tarman, A. (2005) The Effect of Monopsony Power in Major League Baseball on the Salaries of
Players with Less Than Six Years in the Majors. Honors Projects, 31.
[2] Yan, C.H. (2023) A comparative analysis of multi-model salary classification prediction for
American baseball players. Journal of Nanning Normal University (Natural Science Edition),
40, 79-87.
[3] Hakes, J.K. and Turner, C. (2011) Pay, productivity and aging in Major League Baseball. Journal
of Productivity Analysis, 35, 61-74.
[4] Stanek, T. (2016) Player Performance and Team Revenues: NBA Player Salary Analysis. CMC
Senior Theses. Claremont McKenna College.
[5] Li, N.Y. (2014) The determinants of the salary in NBA and the overpayment in the year of signing
a new contract. Dissertations & Theses - Gradworks. Clemson University.
[6] Jonah, F. (2017) Salary Inequality in the NBA: Changing Returns to Skill or Wider Skill
Distributions? CMC Senior Theses.
[7] Daniel, H. (2014) An Analysis of New Performance Metrics in the NBA and Their Effects on
Win Production and Salary. The faculty of the University of Mississippi in partial fulfillment
of the requirements of the Sally McDonnell Barksdale Honors College.
[8] Hsiung, T.L. (2014) The Relationships among Salary, Altruistic Behavior and Job Performance
in the National Basketball Association. Center for Promoting Ideas, USA.
[9] Simmons, R. and Berri, D.J. (2011) Mixing the princes and the paupers: Pay and performance in
the National Basketball Association. Labour Economics, 18, 381-388.
[10] Nuesch, S. (2009) A note on the endogeneity of the pay-performance relationship in professional
soccer. Economics Bulletin, 29, 1850-1855.
Schrödinger equation for various quantum systems based on
Heisenberg's uncertainty principle
Kexin An
Department of Mathematics, The Ohio State University. 281 W Lane Ave, Columbus,
OH 43210
an.416@osu.edu
Abstract. This article establishes the proof of the Schrödinger equation for numerous quantum
systems, utilizing Heisenberg's uncertainty principle. The Fourier transform connects functions
in the time and frequency domains, resulting in the mathematical inequality that is the foundation
of the uncertainty principle. In the Methods and Theory section, the article derives the uncertainty
principle through Fourier transforms by defining the mean and variance of angular frequency
and time, and subsequently expanding the integral. This establishes the fundamental connection
between the time and frequency domains, illustrating the constraints imposed by quantum mechanics.
In the Results and Application section, the article applies the uncertainty principle to derive the
Schrödinger equation under different conditions: free particle, particle in a box, harmonic
oscillator, and hydrogen atom. For each case, the article assumes wave function solutions, uses
the uncertainty in position and momentum to estimate kinetic and potential energies, and shows
that the total energy matches the ground state energy derived from the Schrödinger equation. The
results highlight the critical role of Heisenberg's uncertainty principle in understanding key
aspects of quantum mechanics, providing a unified framework for these diverse systems.
Keywords: Fourier Transform, Heisenberg's Uncertainty Principle, Quantum Mechanics,
Schrödinger Equation.
1. Introduction
Quantum mechanics is the essential theory that describes particles' behavior at the atomic and subatomic
levels. It provides a framework for understanding the physical properties of nature at small scales, where
classical mechanics fails to apply. The development of quantum mechanics has led to numerous
technological advancements, including semiconductors, lasers, and quantum computing [1]. By
describing the wave-particle duality of matter and energy, quantum mechanics reveals the probabilistic
nature of physical phenomena, which is essential for the accurate prediction and manipulation of
microscopic systems [1]. Heisenberg's uncertainty principle is the core of quantum mechanics,
underscoring the fundamental limits of measurement and observation in the quantum realm.
Mathematically, the uncertainty principle can be derived using Fourier transforms, which relate
functions in the time and frequency domains. The principle can be expressed as
$\Delta x \, \Delta p \geq \hbar/2$. The
relationship between Heisenberg's uncertainty principle and Fourier transforms emphasizes the
relationship between time and frequency domains, which is essential for comprehending the behavior
of quantum systems [2]. The uncertainty principle has diverse applications in quantum mechanics,
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0117
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
including elucidating the stability of atoms, the behavior of particles in a box, and the quantization of
energy levels.
This article is organized as follows. In the Methods and Theory section, the part on using the Fourier
transform to prove the Heisenberg uncertainty principle explains how the
principle is derived from the properties of Fourier transforms. The derivation starts with a mathematical
inequality and proceeds by defining the mean and variance of angular frequency and time. By
interpreting these results, the uncertainty principle is established. The part on the Fourier transform in the
time-dependent Schrödinger equation discusses the application of Fourier transforms in quantum mechanics,
specifically in transitioning between the position and momentum representations of the wave function.
The time-dependent Schrödinger equation, a foundational equation in quantum mechanics, is introduced,
describing how a physical system's quantum state changes over time [3]. In the Results and Application
section, the part on the free particle
assumes a plane wave solution and demonstrates how the uncertainty principle leads
to the time-dependent Schrödinger equation.
The key steps involve recognizing the relationships between energy, momentum, and the wave
function's form. When using Heisenberg's uncertainty principle to prove Schrödinger equation under
particle in a box, it considers a particle confined in a one-dimensional box. It shows how the uncertainty
in position and momentum aligns with the quantized energy levels obtained from the Schrödinger
equation. If Heisenberg's uncertainty principle is used to prove Schrödinger equation under Harmonic
Oscillator, it addresses the harmonic oscillator, verifying the ground state energy using the uncertainties
in position and momentum. The results are related to the known solutions involving Hermite
polynomials. By utilizing Heisenberg's uncertainty principle to prove Schrödinger equation for the
hydrogen atom problem, it deals with the hydrogen atom, using the Bohr radius to estimate the
uncertainties and derive the ground state energy. The result matches the solution obtained from the
Schrödinger equation, demonstrating the fundamental role of the uncertainty principle in quantum
mechanics.
2. Methods and Theory
2.1. Using Fourier transform to prove Heisenberg uncertainty principle
A fundamental notion in quantum physics is the Heisenberg uncertainty principle, which states that
the position and momentum of a particle cannot both be known with arbitrary precision at the same
time [4]. This principle can be derived mathematically using Fourier transforms, which relate
functions in the time and frequency domains.
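The proof starts with the following mathematical inequality, which holds for every real parameter
$\lambda$:
$$\int_{-\infty}^{\infty}\left|\lambda\,\omega\,\hat{f}(\omega)+\frac{d\hat{f}}{d\omega}\right|^{2}d\omega \ \ge\ 0 \tag{1}$$
Here $f(t)$ is a normalized signal and $\hat{f}(\omega)$ is its Fourier transform. This inequality
uses properties of the Fourier transform and leads to the uncertainty principle. The mean and
variance can be defined as follows. The mean and variance of ω are
$\bar{\omega}=\int\omega\,|\hat{f}(\omega)|^{2}\,d\omega$ and
$(\Delta\omega)^{2}=\int(\omega-\bar{\omega})^{2}\,|\hat{f}(\omega)|^{2}\,d\omega$. The mean and
variance of t are $\bar{t}=\int t\,|f(t)|^{2}\,dt$ and
$(\Delta t)^{2}=\int(t-\bar{t})^{2}\,|f(t)|^{2}\,dt$. Taking $\bar{\omega}=\bar{t}=0$ without loss of
generality and using the above definitions to expand the integral:
$$\int\left|\lambda\,\omega\,\hat{f}+\frac{d\hat{f}}{d\omega}\right|^{2}d\omega=\int\left[\lambda^{2}\omega^{2}|\hat{f}|^{2}+\lambda\,\omega\left(\hat{f}^{*}\frac{d\hat{f}}{d\omega}+\hat{f}\,\frac{d\hat{f}^{*}}{d\omega}\right)+\left|\frac{d\hat{f}}{d\omega}\right|^{2}\right]d\omega \tag{2}$$
By simplifying the right-hand side (integrating the middle term by parts and applying Parseval's
theorem to the last term), it is found that
$\lambda^{2}(\Delta\omega)^{2}-\lambda+(\Delta t)^{2}\ge 0$ for all real $\lambda$, which requires
$(\Delta\omega)^{2}(\Delta t)^{2}\ge 1/4$.
Combining these results, the Heisenberg uncertainty principle is
$$\Delta\omega\,\Delta t \ \ge\ \frac{1}{2} \tag{3}$$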
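In the spatial analogue of this result, time $t$ is replaced by position $x$ and angular frequency
$\omega$ by wave number $k$, which is related to momentum by $p=\hbar k$. Therefore
$\Delta k=\Delta p/\hbar$. Substituting into the uncertainty relation for the Fourier pair:
$\Delta x\,\Delta p/\hbar \ge 1/2$. By multiplying both sides by $\hbar$, it is found that
$\Delta x\,\Delta p \ge \hbar/2$.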
Hence, the product of the uncertainties in time and frequency domains is bounded below by a
constant, which is a representation of the Heisenberg uncertainty principle. The derivation emphasizes
the profound connection between time and frequency domains, as encapsulated by the Fourier transform,
and their role in understanding the behavior of quantum systems.
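The bound in Eq. (3) can be checked numerically. The following is a minimal sketch, assuming a Gaussian pulse (the case where the bound is attained) and a unitary Fourier transform evaluated as a plain Riemann sum; the function names `spread` and `gaussian_spreads` and the grid sizes are illustrative choices, not part of the original derivation.

```python
import cmath
import math


def spread(grid, amplitudes, step):
    """Standard deviation of |amplitude|^2 treated as a density on the grid."""
    weights = [abs(a) ** 2 for a in amplitudes]
    norm = sum(weights) * step
    mean = sum(x * w for x, w in zip(grid, weights)) * step / norm
    var = sum((x - mean) ** 2 * w for x, w in zip(grid, weights)) * step / norm
    return math.sqrt(var)


def gaussian_spreads(s=1.0, n=401, span=8.0):
    """Return (delta_t, delta_omega) for a Gaussian pulse whose |f|^2 has width s."""
    # Time grid and normalised Gaussian f(t).
    t_max = span * s
    dt = 2 * t_max / (n - 1)
    ts = [-t_max + i * dt for i in range(n)]
    f = [(2 * math.pi * s**2) ** -0.25 * math.exp(-(t**2) / (4 * s**2)) for t in ts]

    # Frequency grid; the transform of this pulse has width 1 / (2 s).
    w_max = span / (2 * s)
    dw = 2 * w_max / (n - 1)
    ws = [-w_max + i * dw for i in range(n)]

    # Unitary Fourier transform approximated by a Riemann sum over t.
    scale = dt / math.sqrt(2 * math.pi)
    f_hat = [scale * sum(fv * cmath.exp(-1j * w * t) for fv, t in zip(f, ts)) for w in ws]
    return spread(ts, f, dt), spread(ws, f_hat, dw)


delta_t, delta_omega = gaussian_spreads()
print(delta_t * delta_omega)  # close to 0.5, the lower bound in Eq. (3)
```

Any other pulse shape substituted for the Gaussian should give a product above 1/2, which is one way to see that the inequality is not specific to quantum mechanics but a general property of Fourier pairs.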
2.2. Fourier Transform in Quantum Mechanics and Time-Dependent Schrödinger Equation
The Fourier transform can be used to turn a function of time or space into a function of frequency or
momentum [5]. In quantum mechanics, the Fourier transform is used to switch between the position
representation and the momentum representation of the wave function. The Fourier transform of a wave
function $\psi(t)$ is given by
$$\hat{\psi}(\omega)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\psi(t)\,e^{-i\omega t}\,dt \tag{4}$$
The inverse Fourier transform is:
$$\psi(t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\hat{\psi}(\omega)\,e^{i\omega t}\,d\omega \tag{5}$$
The time-dependent Schrödinger equation describes how the quantum state of a physical system
evolves over time [6]. It is a foundational equation in quantum mechanics and is given by:
$$i\hbar\,\frac{\partial\psi(x,t)}{\partial t}=\hat{H}\,\psi(x,t) \tag{6}$$
where $\psi(x,t)$ is the wave function of the system, $\hbar$ is the reduced Planck constant, and
$\hat{H}$ is the Hamiltonian operator. For a particle of mass $m$ in a potential $V(x)$, the
Hamiltonian operator can be expressed as:
$$\hat{H}=-\frac{\hbar^{2}}{2m}\frac{\partial^{2}}{\partial x^{2}}+V(x) \tag{7}$$
3. Results and Application
3.1. Prove Schrödinger equation under the free particle condition
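Assume a plane wave solution for a free particle
$$\psi(x,t)=A\,e^{i(kx-\omega t)} \tag{8}$$
where $k$ is the wave number and $\omega$ is the angular frequency. Using the de Broglie relation
$p=\hbar k$ and $E=\hbar\omega$, the time-dependent Schrödinger equation for a free particle is
$$i\hbar\,\frac{\partial\psi}{\partial t}=-\frac{\hbar^{2}}{2m}\frac{\partial^{2}\psi}{\partial x^{2}} \tag{9}$$
Computing the time derivative, $\partial\psi/\partial t=-i\omega\psi$, and the second spatial
derivative, $\partial^{2}\psi/\partial x^{2}=-k^{2}\psi$, allows $\omega$ and $k$ to be related to
energy and momentum.
For a free particle, the energy $E$ is purely kinetic: $E=p^{2}/2m=\hbar^{2}k^{2}/2m$. The angular
frequency ω is related to the energy by $E=\hbar\omega$. Thus, $\hbar\omega=\hbar^{2}k^{2}/2m$. This
implies $\omega=\hbar k^{2}/2m$. Substituting ω into the time derivative gives
$\partial\psi/\partial t=-i\,\frac{\hbar k^{2}}{2m}\,\psi$. Rewriting this as
$i\hbar\,\partial\psi/\partial t=\frac{\hbar^{2}k^{2}}{2m}\,\psi$ and using the second spatial
derivative, $-\frac{\hbar^{2}}{2m}\frac{\partial^{2}\psi}{\partial x^{2}}=\frac{\hbar^{2}k^{2}}{2m}\,\psi$,
it is found that the Schrödinger equation is:
$$i\hbar\,\frac{\partial\psi}{\partial t}=-\frac{\hbar^{2}}{2m}\frac{\partial^{2}\psi}{\partial x^{2}} \tag{10}$$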
By assuming the wave nature of particles and using the Heisenberg uncertainty principle, the
derivation arrives at the Schrödinger equation for a free particle. The key steps involve recognizing
the relationships between energy, momentum, and the wave function's form, which are all consistent
with the constraints imposed by the uncertainty principle [7].
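As a quick numerical check of the free-particle argument, the sketch below evaluates both sides of the free Schrödinger equation on a plane wave using central finite differences. It assumes natural units (ħ = m = 1) and arbitrary sample values for x, t and k; the helper names are hypothetical. The residual vanishes only when ω obeys the dispersion relation ω = ħk²/2m.

```python
import cmath

HBAR = 1.0  # natural units, assumed for illustration only
MASS = 1.0


def plane_wave(x, t, k, omega):
    """Plane wave exp(i (k x - omega t)) with unit amplitude."""
    return cmath.exp(1j * (k * x - omega * t))


def schrodinger_residual(k, omega, x=0.3, t=0.7, h=1e-4):
    """|i hbar dpsi/dt + (hbar^2 / 2m) d2psi/dx2| from central differences."""
    dpsi_dt = (plane_wave(x, t + h, k, omega) - plane_wave(x, t - h, k, omega)) / (2 * h)
    d2psi_dx2 = (
        plane_wave(x + h, t, k, omega)
        - 2 * plane_wave(x, t, k, omega)
        + plane_wave(x - h, t, k, omega)
    ) / h**2
    return abs(1j * HBAR * dpsi_dt + HBAR**2 / (2 * MASS) * d2psi_dx2)


k = 1.5
residual_correct = schrodinger_residual(k, HBAR * k**2 / (2 * MASS))  # omega = hbar k^2 / 2m
residual_wrong = schrodinger_residual(k, 2.0)                         # arbitrary other omega
print(residual_correct, residual_wrong)
```

With these sample values the second residual is about 0.875, the difference between ħω = 2 and ħ²k²/2m = 1.125, while the first is zero up to discretization error.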
3.2. Prove Schrödinger equation under particle in a box
Consider a particle confined in a one-dimensional box of width $a$. The potential $V(x)$ is given by
$$V(x)=\begin{cases}0 & \text{if } 0<x<a\\ \infty & \text{otherwise}\end{cases} \tag{11}$$
The time-independent Schrödinger equation for a particle of mass $m$ in a potential $V(x)$ follows
from Eqs. (6) and (7) for a state of definite energy $E$. When $V(x)=0$, the equation simplifies to
$$-\frac{\hbar^{2}}{2m}\frac{d^{2}\psi}{dx^{2}}=E\,\psi \tag{12}$$
The solution to the Schrödinger equation where $V(x)=0$ is given by
$\psi_{n}(x)=\sqrt{2/a}\,\sin(n\pi x/a)$, where n is a positive integer. The corresponding energy
levels are:
$$E_{n}=\frac{n^{2}\pi^{2}\hbar^{2}}{2ma^{2}} \tag{13}$$
For a particle in the ground state ($n=1$), the wave function is:
$$\psi_{1}(x)=\sqrt{\frac{2}{a}}\,\sin\left(\frac{\pi x}{a}\right) \tag{14}$$
The uncertainty in position, $\Delta x$, can be approximated as $\Delta x\approx a$. The uncertainty
in momentum, $\Delta p$, can be estimated using the uncertainty principle:
$\Delta p\approx\hbar/\Delta x=\hbar/a$.
To relate these uncertainties to the Schrödinger equation, the expression for the kinetic energy of
the particle is $E=p^{2}/2m$. The uncertainty in energy due to the uncertainty in momentum is
$\Delta E\approx(\Delta p)^{2}/2m=\hbar^{2}/(2ma^{2})$.
This energy estimate agrees with the ground state energy $E_{1}=\pi^{2}\hbar^{2}/(2ma^{2})$ up to
the numerical factor $\pi^{2}$. Thus, the Heisenberg uncertainty principle is consistent with the
energy levels from the Schrödinger equation for a particle in a box. The uncertainty-based estimate
reproduces the scaling of the ground state energy with $\hbar$, $m$ and $a$, and demonstrates its
alignment with the Schrödinger equation. It serves to illustrate that the uncertainty principle forms
a fundamental basis for comprehending the quantization of energy levels within confined systems [8].
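The order-of-magnitude estimate above can be compared with the exact ground-state values. The sketch below, assuming natural units (ħ = m = a = 1) and a midpoint-rule integration with a hypothetical helper `box_ground_state`, computes Δx, Δp and the energy directly from the ground-state wave function.

```python
import math


def box_ground_state(a=1.0, hbar=1.0, m=1.0, n=2000):
    """Return (delta_x, delta_p, energy) for the ground state of a box of width a."""
    dx = a / n
    xs = [(i + 0.5) * dx for i in range(n)]  # midpoint grid inside the box
    psi = [math.sqrt(2 / a) * math.sin(math.pi * x / a) for x in xs]
    dpsi = [math.sqrt(2 / a) * (math.pi / a) * math.cos(math.pi * x / a) for x in xs]

    mean_x = sum(x * p * p for x, p in zip(xs, psi)) * dx
    mean_x2 = sum(x * x * p * p for x, p in zip(xs, psi)) * dx
    # <p> = 0 for a real wave function, so delta_p^2 = <p^2> = hbar^2 * integral |psi'|^2.
    mean_p2 = hbar**2 * sum(d * d for d in dpsi) * dx

    delta_x = math.sqrt(mean_x2 - mean_x**2)
    delta_p = math.sqrt(mean_p2)
    energy = mean_p2 / (2 * m)  # purely kinetic inside the box
    return delta_x, delta_p, energy


delta_x, delta_p, energy = box_ground_state()
print(delta_x * delta_p)          # about 0.568 in units of hbar, above the bound 0.5
print(energy / (1.0 / 2.0))       # ratio to the estimate hbar^2 / (2 m a^2): about pi^2
```

The exact product Δx Δp sits slightly above ħ/2, and the exact energy exceeds the simple estimate by the factor π², in line with the statement that the estimate captures the scaling rather than the precise coefficient.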
3.3. Prove Schrödinger equation under harmonic oscillator
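The potential of the one-dimensional harmonic oscillator is given by
$V(x)=\frac{1}{2}m\omega^{2}x^{2}$. The time-independent Schrödinger equation for a particle of mass
$m$ in $V(x)$ is $-\frac{\hbar^{2}}{2m}\frac{d^{2}\psi}{dx^{2}}+V(x)\,\psi=E\,\psi$. For a harmonic
oscillator, substituting $V(x)=\frac{1}{2}m\omega^{2}x^{2}$ gives
$-\frac{\hbar^{2}}{2m}\frac{d^{2}\psi}{dx^{2}}+\frac{1}{2}m\omega^{2}x^{2}\,\psi=E\,\psi$. The
solutions to this equation involve Hermite polynomials:
$\psi_{n}(x)=N_{n}\,e^{-\alpha^{2}x^{2}/2}\,H_{n}(\alpha x)$, where $\alpha=\sqrt{m\omega/\hbar}$,
$N_{n}$ is a normalization constant, and $H_{n}$ are the Hermite polynomials. The corresponding
energy levels are:
$$E_{n}=\left(n+\frac{1}{2}\right)\hbar\omega \tag{15}$$
For the ground state ($n=0$), the wave function is:
$$\psi_{0}(x)=\left(\frac{\alpha}{\pi^{1/2}}\right)^{1/2}e^{-\alpha^{2}x^{2}/2} \tag{16}$$
The uncertainties in position $\Delta x$ and momentum $\Delta p$ for the ground state are given by
$\Delta x=\sqrt{\langle x^{2}\rangle-\langle x\rangle^{2}}=\sqrt{\hbar/(2m\omega)}$ and
$\Delta p=\sqrt{\langle p^{2}\rangle-\langle p\rangle^{2}}=\sqrt{m\omega\hbar/2}$. The ground state
of the harmonic oscillator therefore satisfies the uncertainty principle with equality:
$\Delta x\,\Delta p=\sqrt{\hbar/(2m\omega)}\,\sqrt{m\omega\hbar/2}=\hbar/2$. The uncertainties in
position and momentum are related to the energy of the harmonic oscillator by
$E=\frac{\langle p^{2}\rangle}{2m}+\frac{1}{2}m\omega^{2}\langle x^{2}\rangle$. For the ground
state, $\langle x^{2}\rangle=\hbar/(2m\omega)$ and $\langle p^{2}\rangle=m\omega\hbar/2$.
Substituting these into the energy expression:
$$E=\frac{\hbar\omega}{4}+\frac{1}{2}m\omega^{2}\,\frac{\hbar}{2m\omega}=\frac{\hbar\omega}{2} \tag{17}$$
Thus, the energy matches the ground state energy $E_{0}=\frac{1}{2}\hbar\omega$ obtained from the
Schrödinger equation. It demonstrates that the limits placed on the precise position and momentum of
the particle lead directly to the quantized energy levels of the harmonic oscillator [9].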
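The same ground-state energy can be recovered by a short minimization. The sketch below is an illustration rather than the paper's procedure: it assumes natural units (ħ = m = ω = 1), sets Δp to its minimum value ħ/(2Δx), and minimizes the resulting energy over Δx with a generic golden-section search (the helper `golden_section_min` is written here for the example).

```python
import math

HBAR = 1.0   # natural units, assumed for illustration only
MASS = 1.0
OMEGA = 1.0


def energy_estimate(delta_x):
    """Oscillator energy with delta_p set to its minimum value hbar / (2 delta_x)."""
    delta_p = HBAR / (2 * delta_x)
    return delta_p**2 / (2 * MASS) + 0.5 * MASS * OMEGA**2 * delta_x**2


def golden_section_min(func, lo, hi, tol=1e-10):
    """Locate the minimum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if func(c) < func(d):
            hi, d = d, c                      # minimum lies in [lo, d]
            c = hi - inv_phi * (hi - lo)
        else:
            lo, c = c, d                      # minimum lies in [c, hi]
            d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2


best_dx = golden_section_min(energy_estimate, 0.05, 5.0)
best_energy = energy_estimate(best_dx)
print(best_dx, best_energy)  # about sqrt(hbar / (2 m omega)) and hbar * omega / 2
```

The minimum lands at Δx = √(ħ/2mω) with energy ħω/2, so for this potential the uncertainty bound alone fixes the exact ground-state energy.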
3.4. Prove Schrödinger equation for the hydrogen atom problem
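The potential energy of an electron in a hydrogen atom is given by the Coulomb potential:
$V(r)=-\frac{e^{2}}{4\pi\varepsilon_{0}r}$. The time-independent Schrödinger equation for the
hydrogen atom in spherical coordinates is:
$$-\frac{\hbar^{2}}{2m}\nabla^{2}\psi+V(r)\,\psi=E\,\psi \tag{18}$$
By separating variables, the radial part of the Schrödinger equation, written for
$u(r)=r\,R(r)$, is:
$$-\frac{\hbar^{2}}{2m}\left(\frac{d^{2}u}{dr^{2}}-\frac{l(l+1)}{r^{2}}\,u\right)-\frac{e^{2}}{4\pi\varepsilon_{0}r}\,u=E\,u \tag{19}$$
For the hydrogen atom, assume the uncertainty in the electron's position $\Delta x$ is on the order
of the Bohr radius $a_{0}$: $\Delta x\approx a_{0}$.
The uncertainty in momentum $\Delta p$ can be estimated using Heisenberg's uncertainty principle:
$$\Delta p\approx\frac{\hbar}{\Delta x}\approx\frac{\hbar}{a_{0}} \tag{20}$$
The kinetic energy T can be approximated as
$T\approx\frac{(\Delta p)^{2}}{2m}=\frac{\hbar^{2}}{2ma_{0}^{2}}$. The potential energy V is
$V\approx-\frac{e^{2}}{4\pi\varepsilon_{0}a_{0}}$. The total energy is the sum of kinetic and
potential energy:
$$E=\frac{\hbar^{2}}{2ma_{0}^{2}}-\frac{e^{2}}{4\pi\varepsilon_{0}a_{0}} \tag{21}$$
To find the ground state energy, minimize $E$ with respect to $a_{0}$: $dE/da_{0}=0$. Then,
$-\frac{\hbar^{2}}{ma_{0}^{3}}+\frac{e^{2}}{4\pi\varepsilon_{0}a_{0}^{2}}=0$.
Solving for $a_{0}$, it is found that $a_{0}=\frac{4\pi\varepsilon_{0}\hbar^{2}}{me^{2}}$.
Substituting $a_{0}$ back into the expression for $E$:
$$E=-\frac{e^{2}}{8\pi\varepsilon_{0}a_{0}} \tag{22}$$
This is the ground state energy of the hydrogen atom, approximately $-13.6$ eV, which matches the
result obtained from solving the Schrödinger equation [10].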
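To put numbers on Eqs. (21) and (22), the sketch below evaluates the energy estimate with SI values of the physical constants (CODATA 2018 figures, entered here by hand as an assumption of the example) and checks that the closed-form radius is indeed a minimum.

```python
import math

# Physical constants in SI units (CODATA 2018 values, typed in for this sketch).
HBAR = 1.054571817e-34       # J s
M_E = 9.1093837015e-31       # kg
E_CHARGE = 1.602176634e-19   # C
EPS0 = 8.8541878128e-12      # F / m


def energy(r):
    """Uncertainty-principle estimate of the total energy, Eq. (21), in joules."""
    kinetic = HBAR**2 / (2 * M_E * r**2)
    potential = -E_CHARGE**2 / (4 * math.pi * EPS0 * r)
    return kinetic + potential


# Radius that minimises the estimate, from dE/dr = 0.
a0 = 4 * math.pi * EPS0 * HBAR**2 / (M_E * E_CHARGE**2)
ground_energy_ev = energy(a0) / E_CHARGE  # convert joules to electronvolts

print(a0)                # about 5.29e-11 m, the Bohr radius
print(ground_energy_ev)  # about -13.6 eV
```

Evaluating `energy` slightly away from `a0` on either side gives a higher value, which confirms numerically that the stationary point is a minimum.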
4. Conclusion
This article demonstrates the application of Heisenberg's uncertainty principle to derive the Schrödinger
equation for various quantum systems, including free particles, particles in a box, harmonic oscillators,
and the hydrogen atom. By using the fundamental limits imposed by the uncertainty principle, it shows
how the quantization of energy levels arises naturally within these systems. The proof underscores the
connection between the principles of quantum mechanics and the Fourier transforms used to describe
them. The derivations presented provide a clear and coherent framework for understanding the
foundational aspects of quantum mechanics. Starting from the uncertainty principle, the article
derives the Schrödinger equation, which governs the behavior of quantum systems. The article offers a unified approach to
deriving the Schrödinger equation for different quantum systems using Heisenberg's uncertainty
principle. This helps in understanding the common underlying principles that govern these systems. The
use of Fourier transforms to derive the uncertainty principle and subsequently apply it to different
quantum systems adds a level of mathematical rigor to the derivations, ensuring that the results are
robust and consistent. However, the article has limitations. Some derivations rely on simplifying
assumptions, such as approximating uncertainties or assuming certain forms of wave functions. These
assumptions, while useful for illustrative purposes, do not fully capture the complexity of real-world
quantum systems. Future studies could relax these constraints by extending the methods to more
complex quantum systems, such as those with several interacting particles or external fields.
Combining the analytical framework presented here with numerical simulations could provide deeper
understanding and more precise predictions for a wider variety of quantum phenomena.
References
[1] Chen, L. P., Kou, K. I., & Liu, M. S. (2015). Pitt's Inequality and the Uncertainty Principle
Associated with the Quaternion Fourier Transform. Journal of Mathematical Analysis and
Applications, 423(1), 681-700.
[2] Ballentine, L. E. (2014). Quantum Mechanics: A Modern Development. World Scientific
Publishing Company.
[3] Feit, M. D., Fleck Jr, J. A., & Steiger, A. (1982). Solution of the Schrödinger Equation by a
Spectral Method. Journal of Computational Physics, 47(3), 412-433.
[4] Busch, P., Heinonen, T., & Lahti, P. (2007). Heisenberg's Uncertainty Principle. Physics Reports,
452(6), 155-176.
[5] Bracewell, R. N. (1989). The Fourier Transform. Scientific American, 260(6), 86-95.
[6] Berezin, F. A., & Shubin, M. (2012). The Schrödinger Equation (Vol. 66). Springer Science &
Business Media.
[7] Shananin, N. A. (1994). On Singularities of Solutions of the Schrödinger Equation for a Free
Particle. Mathematical Notes, 55(6), 626-631.
[8] Hojman, S. A., & Asenjo, F. A. (2020). A New Approach to Solve the One-Dimensional Schrödinger
Equation Using a Wavefunction Potential. Physics Letters A, 384(36), 126913.
[9] Havin, V., & Jöricke, B. (2012). The Uncertainty Principle in Harmonic Analysis (Vol. 28). Springer
Science & Business Media.
[10] Nakatsuji, H. (2005). General Method of Solving the Schrödinger Equation of Atoms and
Molecules. Physical Review A: Atomic, Molecular, and Optical Physics, 72(6), 062110.
Analysis of the Principles of Quantum Computing and State-of-the-Art Applications
Zhuolun Li
School of Physics and Astronomy, University of St Andrews, St Andrews, United Kingdom
zl200@st-andrews.ac.uk
Abstract. In recent years, quantum computing has emerged as a promising field, offering
potential breakthroughs in computational tasks that are currently out of reach for classical
computing. With this in mind, this study examines the principles of quantum computing,
exploring the fundamental concept of quantum entanglement and its implications for
computation. After outlining the historical development and research significance of quantum
computing, this research presents an overview of the latest advancements in the field. The paper
then focuses on the principles of quantum computation, including the use of qubits and quantum
gates, illustrated with relevant mathematical formulations and diagrams. Furthermore, this study
discusses the state-of-the-art applications of quantum computing, showcasing recent
achievements and results obtained from these cutting-edge technologies. A comparative analysis
with traditional algorithms highlights the advantages and potential gains offered by quantum
computing. Finally, the current limitations of quantum computing are discussed, and insights
into future research directions and prospects for this field are offered.
Keywords: quantum computing, quantum entanglement, quantum principles, traditional
algorithms.
1. Introduction
Reinvigorating computation, quantum computing has captured the imagination of interdisciplinary
researchers and trailblazers. This revolutionary paradigm promises to transcend the boundaries of
classical computing through the exclusive capabilities of quantum mechanics. Envisioned as a game-
changer, it strives to tackle complex challenges with a velocity and precision unprecedented in
traditional computing. Delving into its historical roots, quantum computing finds its genesis in the early
20th century, a time when intellectual giants such as Max Planck, Niels Bohr, and Werner Heisenberg
laid the theoretical cornerstone for elucidating the dynamical interactions of material particles and
energy phenomena on the atomic and subatomic scales. Their contributions formed the intellectual
scaffolding upon which quantum computing's aspirations are built.
The impetus to leverage quantum systems for computational endeavors notably accelerated during
the 1980s. Richard Feynman's seminal 1982 work, titled "Simulating Physics with Computers,"
underscored the inherent inefficiency of simulating quantum phenomena on classical machines,
stemming from the exponential growth of computational demands with system size [1]. This
revelation reignited interest in exploiting the unique features of quantum mechanics for computational
pursuits. Following Feynman's seminal contributions, quantum computing has undergone swift
advancements, propelled by breakthroughs in quantum physics, computer science, and engineering
domains. Initial endeavors concentrated on showcasing core quantum computing principles,
encompassing phenomena like quantum teleportation and entanglement-mediated communication
frameworks. Nonetheless, notable strides in constructing scalable quantum computing platforms,
capable of tackling intricate algorithms, were not attained until the dawn of the new millennium. This
achievement was fueled by substantial investments from multiple sectors, including governments,
academia, and industry, acknowledging quantum computing's potential to address humanity's most
critical challenges.
DOI: 10.54254/2753-8818/41/2024CH0155
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
In recent years, quantum computing has experienced notable advancements, with a multitude of
milestones accomplished across different aspects. A particularly significant accomplishment is the
development of quantum processors that possess an escalating number of qubits. Initially, quantum
processors were restricted to just a few qubits, significantly constraining their computational capabilities.
Nevertheless, current advanced systems now feature hundreds or even thousands of qubits, facilitating
more intricate computations and expanding the horizons of quantum computing's potential. These
advancements have materialized due to breakthroughs in materials science, fabrication techniques, and
control electronics. Researchers have introduced innovative qubit implementations, including
superconducting qubits and optical (photonic) qubits, each presenting its own set of advantages and
challenges. Furthermore, progress in microwave engineering and cryogenics has facilitated precise
control and manipulation of quantum states, which is vital for executing intricate quantum algorithms.
A pivotal achievement in the evolution of quantum computing lies in the demonstration of what is
known as 'quantum supremacy,' a term coined by John Preskill. This concept highlights the quantum
computer's capacity to execute a designated task at a pace unparalleled by any classical computer, even
when the latter employs immense parallel processing capabilities [2]. Notably, in 2019, Google asserted
that it had achieved this quantum supremacy milestone with its 53-qubit 'Sycamore' processor. This
accomplishment entailed solving a random circuit sampling problem in merely 200 seconds, a feat that
would have ostensibly consumed millennia for even the world's swiftest supercomputer [3]. Amidst
ongoing discussions regarding the significance and repercussions of this breakthrough, it stands as a
pivotal step in showcasing the immense potential harbored by quantum computing.
The motivation behind this paper stems from the growing recognition of the importance of quantum
computing in addressing challenges that currently exceed the capabilities of classical computing. With
computational problems continually increasing in complexity, there is a pressing need for innovative
computational paradigms that can handle the exponential growth in computational demands. Quantum
computing emerges as a promising solution, leveraging the unique properties of quantum mechanics to
achieve substantial speedups for certain classes of problems. This paper is organized
as follows. Sec. 2 delves into the intricacies of quantum entanglement, the unconventional correlation
underpinning the prowess of quantum computing. Sec. 3 outlines the fundamental principles of quantum
computation, encompassing qubits and quantum gates, supported by mathematical formulations and
visual aids. Sec. 4 explores the cutting-edge applications of quantum computing, showcasing recent
triumphs and outcomes stemming from these groundbreaking technologies. Sec. 5 contrasts quantum
algorithms with their classical counterparts, elucidating their advantages and potential benefits. Sec. 6
appraises the existing constraints within quantum computing and provides insights into prospective
research avenues and the future landscape of this burgeoning field. Lastly, Sec. 7 concludes the paper
by recapitulating the core discoveries and their broader implications.
2. Descriptions of quantum entanglement
Quantum entanglement, a singular feature of quantum mechanics, signifies an intricate
interconnectedness among two or more quantum particles. This strong correlation renders the state of
any one particle inseparable from the rest, transcending even spatial separation. This nonlocal aspect is
among quantum mechanics' most intriguing and counter-intuitive qualities, underpinning the core tenets
of quantum information processing and computation. Quantum entanglement is mathematically
formulated in the realm of quantum mechanics, leveraging the constructs of linear algebra and complex
Hilbert spaces. A composite quantum system's pure state, comprising two or more subsystems, can be
mathematically represented as a vector residing in the tensor product of the individual subsystems'
Hilbert spaces. When the state cannot be decomposed into a simple product of the individual subsystem
states, it is classified as entangled, highlighting the inseparability of the system components.
Consider a two-qubit scenario where each qubit can inhabit either the |0⟩ or |1⟩ state. In this context,
separable states are straightforwardly represented by |00⟩, |01⟩, |10⟩, or |11⟩. However, the Bell state
|Φ+⟩ = (|00⟩ + |11⟩)/√2 exemplifies entanglement, a state that resists decomposition into individual
qubit states. This entanglement creates a profound connection, where the measurement outcome of one
qubit is perfectly correlated with that of its entangled counterpart, regardless of spatial separation. This nonlocal
correlation serves as the foundation for a myriad of quantum communication and computational
protocols. In 1935, Albert Einstein, Boris Podolsky, and Nathan Rosen introduced the EPR paradox [4],
which questioned the completeness of quantum mechanics by asserting that entanglement contradicted
local realism. Nevertheless, subsequent scientific investigations have repeatedly validated the
predictions of quantum mechanics, upholding the authenticity of entanglement and its nonlocal
characteristics. Quantum computing leverages entanglement to offer a fundamentally novel approach to
parallel information processing compared to classical methods. By exploiting entanglement, quantum
computers embark on multiple computational trajectories concurrently, harnessing the superposition
principle to perform intricate computations with heightened efficiency vis-à-vis classical counterparts.
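The distinction between separable and entangled two-qubit states can be tested directly. A pure state with amplitudes (c00, c01, c10, c11) factorizes into single-qubit states exactly when c00·c11 − c01·c10 = 0. The sketch below applies this criterion; the function name `is_product_state` and the example states are illustrative choices.

```python
import math


def is_product_state(amplitudes, tol=1e-12):
    """True if a two-qubit pure state (c00, c01, c10, c11) is separable."""
    c00, c01, c10, c11 = amplitudes
    # The 2x2 amplitude matrix has rank 1 (zero determinant) iff the state factorises.
    return abs(c00 * c11 - c01 * c10) < tol


S = 1 / math.sqrt(2)
bell_phi_plus = (S, 0, 0, S)          # (|00> + |11>) / sqrt(2)
basis_01 = (0, 1, 0, 0)               # |01>
plus_plus = (0.5, 0.5, 0.5, 0.5)      # (|0> + |1>)(|0> + |1>) / 2

print(is_product_state(bell_phi_plus))  # False: entangled
print(is_product_state(basis_01))       # True: separable
print(is_product_state(plus_plus))      # True: a superposition, yet still separable
```

The last example shows that superposition alone does not imply entanglement: each qubit is in a superposition, but the joint state is still a simple product.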
3. Principle of quantum computation
Quantum computation uniquely exploits the salient features of quantum mechanics to perform
calculations in a completely distinct manner from classical computation. At its core, quantum computing
relies on the qubit, a fundamental unit of information that differs markedly from the classical bit in
various crucial respects. A classical bit is binary, constrained to the states 0 or 1. Conversely, a qubit
can exist in a superposition, concurrently inhabiting a blend of these two states. Mathematically,
this blend is framed as a linear combination of the |0⟩ and |1⟩ states, written as |ψ⟩ = α|0⟩ + β|1⟩,
where α and β are complex coefficients obeying the normalization rule |α|² + |β|² = 1. This
superposition grants the qubit superior information-carrying potential over its classical counterpart,
permitting it to signify a continuous spectrum of states, transcending the limitations of mere binary
representation.
Quantum circuit elements, namely quantum gates, serve as the fundamental components for
manipulating qubits to execute computational procedures. Distinct from classical logic gates that
function sequentially on individual bits, quantum gates exhibit the capability to simultaneously interact
with one or multiple qubits, leveraging the superposition and entanglement features inherent in quantum
states. Several prototypical quantum gates are:
The Hadamard Gate (H), which transforms a basis state into a superposition state. For instance, the
gate transforms the state |0⟩ into an equal superposition of |0⟩ and |1⟩.
The Controlled-NOT (CNOT) Gate, which implements a conditional NOT operation between two
qubits. If the control qubit is in the |1⟩ state, the target qubit's state flips; otherwise, it remains
unchanged.
The Toffoli Gate, a three-qubit gate that is universal for classical reversible computation and can
therefore simulate any classical logic circuit. It toggles the target qubit's state solely when both
control qubits are simultaneously in the |1⟩ state.
These gates underscore the quantum computing paradigm's parallel processing capabilities, enabling
computations that far surpass those achievable by conventional means.
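The action of these gates can be made concrete with a small state-vector calculation. The sketch below, which uses plain nested lists rather than any quantum library, applies a Hadamard gate to the first qubit of |00⟩ and then a CNOT, producing the Bell state (|00⟩ + |11⟩)/√2. The basis ordering |q0 q1⟩ with q0 as the most significant bit and the helper names are assumptions of the example. The Toffoli gate is shown through its action on classical bit values.

```python
import math


def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def apply(matrix, state):
    """Multiply a matrix by a state vector."""
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]


S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]                      # Hadamard gate
I2 = [[1, 0], [0, 1]]                      # single-qubit identity
CNOT = [[1, 0, 0, 0],                      # control: first qubit, target: second
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

state = [1, 0, 0, 0]                       # |00>
state = apply(kron(H, I2), state)          # (|00> + |10>) / sqrt(2)
state = apply(CNOT, state)                 # (|00> + |11>) / sqrt(2)
print(state)


def toffoli(a, b, c):
    """Toffoli gate on classical bits: flip c only when a and b are both 1."""
    return a, b, c ^ (a & b)


print(toffoli(1, 1, 0))  # (1, 1, 1): with c = 0 the target computes a AND b
print(toffoli(1, 1, 1))  # (1, 1, 0): with c = 1 the target computes NAND
```

Because NAND is universal for classical logic, the last two lines illustrate why the Toffoli gate can reproduce any classical circuit.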
Quantum algorithms capitalize on the innate parallelism and entanglement properties of quantum
computers, resulting in substantial speed enhancements compared to classical algorithms. Grover's
search algorithm, for instance, achieves a quadratic speedup in identifying a target element within an
unordered N-element list, requiring just O(√N) steps, compared with the O(N) steps of its classical
counterparts [5]. Similarly, Shor's factorization algorithm performs the task of large integer factorization
in polynomial time, whereas the most advanced classical factoring methods operate in sub-exponential
time, underscoring the remarkable efficiency gains offered by quantum algorithms [6]. The
mathematical framework of quantum algorithms typically entails representing quantum states as vectors
in a complex Hilbert space and utilizing linear algebra to depict the evolution of these states when
quantum gates are applied. Illustrated in Figure 1 is a quantum circuit diagram, exemplifying the
utilization of quantum gates in the manipulation of qubits.
Figure 1. Quantum circuit used in numerical simulations (Photo/Picture credit: Original).
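The quadratic speedup of Grover's search mentioned above can be illustrated with a classical simulation of the amplitudes. The sketch below is not the circuit of Figure 1; it assumes a list of N = 16 items with a single marked entry and models each Grover iteration as an oracle sign flip followed by inversion about the mean. The function name and the marked index are hypothetical.

```python
import math


def grover_success_probability(n_items, marked, iterations):
    """Probability of measuring the marked item after the given number of iterations."""
    amps = [1 / math.sqrt(n_items)] * n_items   # uniform superposition
    for _ in range(iterations):
        amps[marked] = -amps[marked]            # oracle: flip the sign of the marked item
        mean = sum(amps) / n_items
        amps = [2 * mean - a for a in amps]     # diffusion: inversion about the mean
    return amps[marked] ** 2


N = 16
iterations = math.floor(math.pi / 4 * math.sqrt(N))   # about (pi / 4) * sqrt(N)
probability = grover_success_probability(N, marked=5, iterations=iterations)
print(iterations, probability)  # 3 iterations give a success probability of about 0.961
```

A classical search over 16 unordered items needs 8 lookups on average, whereas three Grover iterations already find the marked item with probability above 96%; the gap widens as √N versus N for larger lists.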
In addition to the basic algorithms, quantum computers can be physically implemented in various
ways. At present, there are several leading hardware approaches.
Ion trap quantum computers utilize charged ions suspended in electromagnetic fields as qubits. They
offer long coherence times, high-fidelity gates, and scalability potential. Recent advances have
shown stable confinement of hundreds of ions, but scaling to larger numbers remains a challenge.
Advanced control techniques and cryogenic traps aim to minimize decoherence. Hybrid
quantum-classical systems simplify complex algorithm implementation. Applications include quantum
simulation, optimization, and cryptography. Overcoming scaling, error correction, and integration
challenges is crucial for practical deployment. Despite these hurdles, ion trap quantum computers
show promise for realizing fault-tolerant quantum computation [7].
Superconducting quantum computers harness the unique properties of superconducting materials at
extremely low temperatures to realize efficient manipulation and stable storage of quantum bits
(qubits). They leverage quantum superposition and entanglement to solve complex problems with
unprecedented efficiency, far surpassing classical computers. Core to their operation are
superconducting quantum chips, which serve as the foundation for qubit operations. With applications
spanning drug discovery, material science, cryptography, and secure communications, superconducting
quantum computers represent a promising direction in quantum computing research. Advancements
such as China's indigenously developed "Origin Quantum Computing" demonstrate the practicality and
sophistication of this technology [8].
Optical quantum computers constitute a cutting-edge technology that harnesses the distinct
properties of light to execute computations. They employ photons as qubits, leveraging quantum
superposition, interference, and entanglement to facilitate parallel processing and efficient
information handling. This enables optical quantum computers to address complex problems with
unprecedented speed and efficiency, surpassing the limitations of classical computers [9].
Silicon photonics computers, or silicon photonics-based computing systems, utilize silicon-based
photonic integrated circuits to manipulate and process information using photons instead of
electrons. This technology combines the advantages of silicon, a traditional material for integrated
circuits, with the speed and bandwidth of optical communication [10].
Topological quantum computers represent a promising avenue in quantum computing research.
Leveraging the topological properties of certain quantum systems, they aim to achieve fault-tolerant
quantum computation. These computers encode quantum information in a manner that is intrinsically
resilient to decoherence and errors, enhancing stability and reliability [11].
4. Applications for quantum computation
Quantum computation showcases immense potential across diverse application domains, ranging from
optimization and machine learning to cryptography and materials science. This section delves deeper
into these applications, emphasizing the recent achievements and implications stemming from
advancements in quantum computing technologies. By leveraging the unique properties of quantum
mechanics, quantum computing promises to revolutionize these fields and pave the way for
groundbreaking solutions.
4.1. Optimization
Quantum computation presents formidable potential for tackling optimization challenges prevalent
across industries like logistics, finance, and engineering. These problems necessitate pinpointing the
optimal solution amidst numerous configurations, posing significant computational hurdles for classical
computing frameworks. By exploiting the inherent advantages of quantum mechanics, quantum
computing aims to revolutionize the resolution of such optimization tasks.
Quantum annealing, a heuristic optimization algorithm inspired by the metallurgical process of
annealing, can be executed on quantum computers to discover approximate solutions for intricate
optimization problems. This algorithm initializes a system of qubits in a superposition state and
progressively cools it down to its ground state, where the lowest energy configuration signifies the
optimal solution to the optimization problem.
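The idea that the lowest-energy configuration encodes the optimal solution can be shown on a toy problem. The sketch below does not simulate quantum annealing; it uses exhaustive classical search as a stand-in to find the ground states of a small Ising energy function. The 4-node ring graph and the unit antiferromagnetic couplings are hypothetical choices, picked so that the ground states correspond to the maximum cut of the graph.

```python
from itertools import product

# Hypothetical problem instance: a ring of four nodes with unit couplings.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]


def ising_energy(spins, edges=EDGES):
    """Energy sum of s_i * s_j over edges; lower energy means more edges are cut."""
    return sum(spins[i] * spins[j] for i, j in edges)


def ground_states(n_spins=4):
    """Exhaustively search all spin configurations (a stand-in for the annealer)."""
    configs = list(product((-1, 1), repeat=n_spins))
    e_min = min(ising_energy(s) for s in configs)
    return e_min, [s for s in configs if ising_energy(s) == e_min]


e_min, states = ground_states()
print(e_min, states)  # -4 with the two alternating configurations
```

Exhaustive search scales as 2^n in the number of spins, which is the cost that annealing-based heuristics aim to avoid on larger instances.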
4.2. Machine learning
Quantum computing presents a transformative opportunity for machine learning, promising streamlined
neural network training and the emergence of innovative quantum-driven algorithms. Quantum neural
networks (QNNs) capitalize on the exclusive attributes of quantum mechanics to embody and
manipulate data, marking a fundamental divergence from traditional neural network architectures. This
approach enables QNNs to process information in a fundamentally different manner, leveraging the
inherent advantages of quantum computation for enhanced performance and efficiency. Parameterized
variational quantum algorithms (pVQAs) have emerged as an encouraging approach for solving
optimization challenges leveraging quantum computation. These algorithms deploy a circuit architecture
parameterized by quantum gates, which encodes the solution space for a given optimization problem.
The parameters of this quantum circuit are subsequently refined through a classical optimizer aimed at
minimizing a predefined cost function.
This blend of quantum and classical computation has garnered attention for its application in diverse
machine learning endeavors, including the development of quantum support vector machines and
quantum autoencoders. By strategically combining the advantages of both paradigms, pVQAs enable
the efficient tuning of sophisticated machine learning models, thereby addressing intricate optimization
tasks with greater proficiency.
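The hybrid loop described above can be sketched on the smallest possible example, assuming a single qubit, a single RY rotation as the parameterized circuit, and the expectation value of Z as the cost function. The circuit is simulated classically here, and the function names, learning rate, and step count are hypothetical choices rather than part of any particular pVQA.

```python
import math

def expectation_z(theta):
    """Cost <Z> for the state RY(theta)|0> = [cos(theta/2), sin(theta/2)]."""
    a0, a1 = math.cos(theta / 2), math.sin(theta / 2)
    return a0 * a0 - a1 * a1

def parameter_shift_grad(cost, theta):
    """Gradient from two shifted circuit evaluations (parameter-shift rule)."""
    return 0.5 * (cost(theta + math.pi / 2) - cost(theta - math.pi / 2))

def optimize(cost, theta=0.3, lr=0.4, steps=100):
    """Classical gradient-descent loop that tunes the circuit parameter."""
    for _ in range(steps):
        theta -= lr * parameter_shift_grad(cost, theta)
    return theta

theta_opt = optimize(expectation_z)
print(theta_opt, expectation_z(theta_opt))
```

On real hardware the two shifted evaluations would be estimated from repeated measurements, so the gradient would be noisy rather than exact as in this simulation.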
4.3. Cryptography
Quantum key distribution (QKD), the most widely studied form of quantum cryptography, promises a
level of security that does not depend on the computational hardness assumptions underlying
classical cryptography. Drawing upon quantum mechanical principles, notably the no-cloning theorem
and the uncertainty principle, QKD ensures that any interception attempt on a quantum communication
channel is detectable, consequently safeguarding against unauthorized access to transmitted data [12].
In QKD, a sequence of quantum states (typically photons) is transmitted between two communicating
parties. Any attempt to measure these quantum states to extract information will inevitably disturb them,
revealing the presence of an eavesdropper. By monitoring the disturbance in the transmitted quantum
states, the communicating parties can detect any eavesdropping attempts and abort the communication
if necessary.
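A minimal classical simulation can illustrate how measurement disturbance reveals an eavesdropper. The sketch assumes the BB84 protocol of [12] with two bases and a simple intercept-and-resend attack; the function name, the seed, and the attack model are illustrative assumptions, and channel noise and real photon physics are ignored.

```python
import random

def bb84_error_rate(n_bits, eavesdrop, seed=0):
    """Return the error rate on the sifted key (positions where bases matched)."""
    rng = random.Random(seed)
    errors = kept = 0
    for _ in range(n_bits):
        bit, alice_basis = rng.randint(0, 1), rng.randint(0, 1)
        state_bit, state_basis = bit, alice_basis
        if eavesdrop:
            eve_basis = rng.randint(0, 1)
            # Measuring in the wrong basis gives a random outcome and disturbs the state.
            eve_bit = state_bit if eve_basis == state_basis else rng.randint(0, 1)
            state_bit, state_basis = eve_bit, eve_basis  # Eve resends what she measured
        bob_basis = rng.randint(0, 1)
        bob_bit = state_bit if bob_basis == state_basis else rng.randint(0, 1)
        if bob_basis == alice_basis:  # sifting: keep only matching-basis rounds
            kept += 1
            errors += bob_bit != bit
    return errors / kept

print(bb84_error_rate(20000, eavesdrop=False))
print(bb84_error_rate(20000, eavesdrop=True))
```

Without interception the sifted key is error free in this idealized model, while intercept-and-resend pushes the error rate toward 25%, which is the disturbance the two parties look for before deciding whether to abort.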
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0155
4.4. Material science
Quantum computing presents a transformative opportunity for material science by facilitating the
simulation of intricate quantum systems that are computationally overwhelming for classical machines.
Classical approaches to modeling quantum systems, encompassing molecules and solid-state materials,
grapple with an exponential surge in computational intricacy as the system dimensions expand.
Conversely, quantum computers excel at efficiently embodying and manipulating quantum states,
rendering them ideally suited for simulating these systems with precision and efficacy.
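The exponential growth mentioned above can be made concrete with a short calculation. The sketch assumes a full state-vector simulation that stores one double-precision complex amplitude (16 bytes) per basis state; the function name is hypothetical.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed to store all 2**n complex amplitudes of an n-qubit state."""
    return (2 ** n_qubits) * bytes_per_amplitude

for n in (10, 30, 50):
    gib = statevector_bytes(n) / 2**30
    print(f"{n} qubits: {gib:g} GiB")
```

Each additional qubit doubles the memory requirement, which is why exact classical simulation becomes impractical well before system sizes of chemical interest.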
Quantum phase estimation techniques can aid in elucidating the energy distribution of molecular
Hamiltonians, a crucial aspect for deciphering the electronic configuration and chemical characteristics
of molecules. This knowledge forms the cornerstone for crafting innovative materials tailored to specific
attributes like heightened conductivity, sturdiness, or catalytic prowess. By mimicking the dynamics of
quantum systems at an atomic level, quantum computers accelerate the quest for groundbreaking
materials with transformative applications.
5. Comparison with traditional algorithms
Quantum computing algorithms can offer notable advantages over traditional algorithms in terms of
computational speed and efficiency, particularly for problems that are inherently challenging to solve
using classical methods. In the realm of optimization problems, quantum annealing and QAOA have the
potential to find high-quality solutions more rapidly than classical heuristics by harnessing the
parallelism and entanglement properties inherent in quantum computation. Analogously, for machine
learning tasks, QNNs and pVQAs possess the potential to expedite training processes and enhance model
accuracy by leveraging the distinct capabilities of quantum computers.
However, it is essential to recognize that quantum computing does not represent a panacea for all
computational challenges. Many problems can already be tackled effectively by conventional
algorithms, and the cost associated with developing and sustaining quantum computers poses
a substantial obstacle to widespread adoption. Moreover, the pursuit of practical quantum algorithms
that outperform their classical counterparts remains an active field of inquiry, confronted with numerous
hurdles that have yet to be overcome.
6. Limitations and prospects
Quantum computing remains an emerging technology, and widespread implementation necessitates
addressing numerous substantial obstacles. A pivotal challenge lies in the delicacy of qubits, which are
susceptible to decoherence and noise. Decoherence arises when quantum coherence dissipates due to
environmental interactions, transforming the quantum state into a classical blend. Additionally, noise
may originate from various factors, including qubit realization flaws, control electronics imperfections,
and ambient conditions, posing further difficulties. To address decoherence and noise issues, scientists
are advancing sophisticated error mitigation techniques and constructing more resilient qubit
architectures. Techniques like surface codes and topological quantum error correction are being
employed, wherein logical qubits are encoded across numerous physical qubits. This approach enables
the detection and correction of errors without perturbing the encoded quantum information. Nonetheless,
these error correction strategies necessitate a substantial overhead, both in terms of the quantity of
physical qubits required and the intricacy of the control circuitry. Another hindrance in the current
landscape of quantum computing is the limited qubit interconnectivity within processors. Most
processors have a sparse qubit network, where direct links exist between only a select few qubits.
This restricted connectivity poses obstacles for deploying intricate quantum algorithms, since
additional SWAP gates may be needed to move quantum states between distant qubits. Scientists are
actively investigating diverse architectural blueprints to enhance qubit connectivity, exploring
options like 2D lattices, 3D arrays, and superconducting resonators, among others.
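The SWAP overhead caused by sparse connectivity can be estimated with a simple graph search. The sketch below assumes a hypothetical five-qubit linear coupling map and a naive routing rule in which one qubit is swapped along the shortest path until it neighbours its partner; production compilers use more sophisticated routing.

```python
from collections import deque

def shortest_distance(coupling, src, dst):
    """Breadth-first search for the hop count between two physical qubits."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in coupling[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("qubits are not connected")

def swaps_needed(coupling, a, b):
    """SWAPs required to make qubits a and b adjacent under the naive rule."""
    return max(shortest_distance(coupling, a, b) - 1, 0)

# Hypothetical linear chain 0 - 1 - 2 - 3 - 4.
LINE = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(swaps_needed(LINE, 0, 4), swaps_needed(LINE, 1, 2))
```

On this chain a two-qubit gate between the end qubits needs three extra SWAPs, whereas neighbouring qubits need none, which illustrates why denser layouts reduce circuit depth.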
Despite these limitations, the prospects for quantum computing are bright. Advances in materials
science, quantum hardware design, and algorithm development are driving rapid progress in the field.
New qubit implementations, such as topological qubits and spin qubits, are being explored to improve
qubit coherence times and reduce the impact of noise. Additionally, the development of hybrid quantum-
classical systems that leverage the strengths of both technologies is likely to accelerate the adoption of
quantum computing in the near term. Quantum computing, in the long haul, bears the capacity to
transform diverse industries and enable solutions to problems that classical computers struggle with. For
drug discovery, it can accelerate the identification of potential drug candidates by simulating molecular
interactions at the atomic level. Analogously, in climate modeling, quantum computing can bolster
simulation precision by efficiently tackling the intricate dynamics of atmospheric and oceanic systems.
7. Conclusion
In conclusion, quantum computing has emerged as a promising avenue with multifaceted applications
and the potential to outperform traditional computers in computational speed and efficiency across
domains from optimization and machine learning to cryptography and materials science. Despite
formidable challenges, including qubit fragility and the necessity for practical error correction codes,
the prospects for quantum computing appear bright. With relentless progress in hardware, software, and
algorithmic advancements, quantum computing promises to transform into an indispensable instrument
for tackling humanity's pressing challenges. Researchers and engineers are relentlessly striving to
transcend the limitations of contemporary quantum technologies and extend the boundaries of quantum
computing's potential. As this field continues to advance, quantum computing is poised to infuse a fresh
and substantial impetus into the development of human science and technology.
References
[1] Feynman R P 1982 Simulating physics with computers International Journal of Theoretical
Physics vol 21(6-7) pp 467-488
[2] Arute F, Arya K, Babbush R, Bacon D, Bardin J C, Barends R and Martinis J M 2019 Quantum
supremacy using a programmable superconducting processor Nature vol 574(7779) pp 505-510
[3] Preskill J 2018 Quantum computing in the NISQ era and beyond Quantum vol 2 p 79
[4] Einstein A, Podolsky B and Rosen N 1935 Can quantum-mechanical description of physical
reality be considered complete? Physical Review vol 47(10) p 777
[5] Grover L K 1996 A fast quantum mechanical algorithm for database search In Proceedings of the
Twenty-eighth Annual ACM Symposium on Theory of Computing pp 212-219
[6] Shor P W 1994 Algorithms for quantum computation: Discrete logarithms and factoring In
Proceedings 35th Annual Symposium on Foundations of Computer Science pp 124-134
[7] Blatt R and Wineland D 2008 Entangled states of trapped atomic ions Nature vol 453(7198) pp
1008-1015
[8] Zhu X, Saito S, Young A W, Gray R, Chen L, Bose S and You J Q 2021 Quantum computational
advantage via 66-qubit superconducting quantum circuit Science vol 372(6544) pp 973-977
[9] Wang J, Paesani S, Ding Y, Santagati R, Skrzypczyk P, Salavrakos A and Thompson M G 2019
Multidimensional quantum entanglement with large-scale integrated optics Science vol
366(6465) pp 602-606
[10] Thomson D J, Zilkie A, Bowers J E, Vlasov Y A, Chen L and Urbas A 2016 Roadmap on silicon
photonics Journal of Optics vol 18(7) p 073003
[11] Nayak C, Simon S H, Stern A, Freedman M and Das Sarma S 2008 Non-Abelian anyons and
topological quantum computation Reviews of Modern Physics vol 80(3) p 1083
[12] Bennett C H and Brassard G 1984 Quantum cryptography: Public key distribution and coin
tossing In Proceedings of IEEE International Conference on Computers Systems and Signal
Processing pp 175-179
Advances in monocular ORB-SLAM system: A review
Ziyi Yuan
School of Advanced Manufacturing, Guangdong University of Technology,
Guangzhou, China
3121009096@mail2.gdut.edu.cn
Abstract. Perception and localization are the main factors that determine the success of unmanned
vehicles. Researchers have therefore conducted substantial studies that enable unmanned vehicles
not only to perceive and comprehend the surrounding environment but also to capture its details
by constructing a 3D map. However, a uniform explanation of Oriented FAST and Rotated BRIEF
Simultaneous Localization and Mapping (ORB-SLAM) for monocular cameras is still lacking. By
collecting four recent types of monocular ORB-SLAM and their applications in unmanned driving
scenarios, this paper discusses how to decrease cumulative error and ensure accuracy and
robustness in dynamic environments. Comparing these four systems with conventional ORB-SLAM
systems shows that the fusion systems improve robustness and accuracy. Combining visual SLAM
sensors with different algorithms and studying them in different complex environments will be
mainstream in future research.
Keywords: Localization, monocular vision, simultaneous localization and mapping.
1. Introduction
With the fast development of technology, Simultaneous Localization and Mapping (SLAM) has become
widely used in high-tech fields such as robotics, construction, and unmanned vehicles. As it has
developed from traditional SLAM in recent years, SLAM has divided into two categories: one is
based on laser sensors, which measure with a laser, while the other is based on visual sensors.
Visual SLAM mostly uses cameras to make measurements, and it can be divided into three categories
by camera type: monocular, multiocular, and RGB-D cameras.
In recent years, SLAM has been used to understand and map the surrounding environment and to
determine the location within the area. By combining object detection, depth estimation, and
visual SLAM, the perception and measurement of the surrounding environment are realized. There
are also examples of applying SLAM techniques to microrobots for minimally invasive surgery.
However, a uniform explanation of the monocular ORB-SLAM system within visual SLAM is still
lacking. By investigating four recent monocular ORB-SLAM systems and summarizing current research
on ORB-SLAM for monocular cameras, this paper discusses how to improve accuracy and robustness
and reduce cumulative error in different environments. It concludes that combining the sensor
with different innovative algorithms effectively decreases cumulative error and helps guarantee
accuracy and robustness in complex environments.
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0171
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
2. Applications of monocular ORB-SLAM
ORB-SLAM is an open-source visual SLAM system, and this section reviews applications of its
monocular variant. In recent years, researchers have studied how to use ORB-SLAM in unmanned
vehicles, the construction industry, the robot industry, and other fields. In unmanned vehicles,
ORB-SLAM is used to recognize pavement information and detect road obstacles. It is also used to
investigate and improve drivers' driving habits: one study describes a method that uses ORB-SLAM
to track head scanning movement during driving, with the aim of informing driving safety
technology applications [1].
In the construction industry, ORB-SLAM has been studied as a way to enhance robot localization
in dynamic construction environments, where construction robots must perform precise positioning
work. Earlier research had difficulty recognizing dynamic objects and mainly investigated static
objects. As SLAM research has deepened, ORB-SLAM-based systems have made a breakthrough in
accurately segmenting dynamic objects and improving localization accuracy, which supports
ORB-SLAM's potential for applications in complex construction environments.
In the robot industry, ORB-SLAM has been proposed for microrobots used in surgical treatment,
such as minimally invasive intestinal surgery. One study notes that microrobot applications in
minimally invasive surgery suffer from low reconstruction accuracy, a small surgical field, and
low computational efficiency, and it proposes a framework based on the ORB-SLAM system for
real-time dense reconstruction in binocular endoscopy scenes to solve these problems [2].
3. SLAM algorithms for monocular cameras
3.1. Conventional algorithm
Conventional SLAM systems run two main threads in parallel, called tracking and mapping. A
visual SLAM framework, however, needs to include the following parts: sensor information reading,
front-end, back-end, map construction, and closed-loop detection [3]. Sensor information reading
acquires and preprocesses the image information. The front-end, known as visual odometry,
processes the input images from the previous step and estimates the camera pose at different
times. The back-end, called nonlinear optimization, receives the camera poses returned by visual
odometry and optimizes them. In addition, the back-end receives closed-loop detection information
and executes global optimization to obtain a globally consistent trajectory and map. The last
part, closed-loop detection, determines whether the mobile robot has passed through a previously
visited location. Feature-based pure visual SLAM tracks the movement of key points through
successive camera frames to infer the pose of the camera.
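The need for closed-loop detection and global optimization can be illustrated by how odometry error accumulates. The sketch below is a toy 2D dead-reckoning model, not a visual odometry implementation: it assumes a hypothetical constant heading bias per step and measures how far the estimated end point of a closed square loop lies from its start.

```python
import math

def integrate_odometry(steps, heading_bias=0.0):
    """Dead-reckon a 2D position from (distance, turn) increments.

    heading_bias is a hypothetical per-step heading error in radians.
    """
    x = y = heading = 0.0
    for distance, turn in steps:
        heading += turn + heading_bias
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
    return x, y

# A closed square loop: four sides of ten 1 m steps, with a 90 degree turn at each corner.
SQUARE_LOOP = [
    (1.0, math.pi / 2 if (k == 0 and side > 0) else 0.0)
    for side in range(4)
    for k in range(10)
]

ideal_gap = math.hypot(*integrate_odometry(SQUARE_LOOP))
drift_gap = math.hypot(*integrate_odometry(SQUARE_LOOP, heading_bias=0.01))
print(ideal_gap, drift_gap)
```

With a bias of only 0.01 rad per step the loop fails to close by more than a metre; recognizing the revisited start location is what allows a back-end to distribute this error over the trajectory.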
3.2. Conventional algorithm of ORB-SLAM
The conventional ORB-SLAM algorithm divides the SLAM system into three threads that all operate
on the same feature points. ORB-SLAM builds on the Parallel Tracking and Mapping (PTAM)
algorithm. PTAM was a major breakthrough in conventional visual SLAM: it was the first to propose
running the tracking and mapping processes in parallel, it used nonlinear optimization instead of
the traditional filter as the back-end scheme, and it introduced a keyframe mechanism [3].
Under the keyframe mechanism, not every image needs fine processing; instead, several key images
are linked together, and the trajectory and map are then optimized. However, PTAM cannot perform
closed-loop detection, so it is applicable only to small scenes and its tracking is easily lost.
ORB-SLAM, proposed after PTAM, uses ORB feature points and their descriptors to detect and track
features in the image and estimates the camera pose from the resulting feature points. ORB is a
very fast feature extraction method with rotational invariance. Using the same ORB features
throughout gives the SLAM algorithm internal consistency across the steps of feature extraction
and tracking, key
frame selection, 3D reconstruction, and closed-loop detection [4]. ORB-SLAM divides the SLAM
system into three threads: feature point tracking, spatial mapping, and loop detection. The
advantage of this algorithm is that it can track in real time and can readily relocalize
when the camera returns to a previously mapped scene after tracking is lost [5]. In addition,
ORB-SLAM can effectively improve positioning stability and track objects in simple scenarios.
Compared with PTAM, ORB-SLAM adds closed-loop detection and can therefore effectively address the
cumulative error problem left by the PTAM algorithm.
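Because ORB descriptors are binary strings, they are usually compared with the Hamming distance, which is one reason matching is fast. The sketch below is a pure-Python illustration of brute-force descriptor matching, not ORB-SLAM's actual matcher; descriptors are represented as Python integers, and the 8-bit toy values and the distance threshold are assumptions chosen for readability (real ORB descriptors are 256 bits long).

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match_descriptors(query, train, max_distance=64):
    """For each query descriptor, keep its nearest train descriptor if close enough."""
    matches = []
    for qi, q in enumerate(query):
        ti, dist = min(
            ((ti, hamming(q, t)) for ti, t in enumerate(train)),
            key=lambda m: m[1],
        )
        if dist <= max_distance:
            matches.append((qi, ti, dist))
    return matches

# Toy 8-bit descriptors from two hypothetical frames.
frame_a = [0b11110000, 0b10101010]
frame_b = [0b10101011, 0b11110001, 0b00001111]
print(match_descriptors(frame_a, frame_b, max_distance=2))
```

Each returned tuple is (query index, train index, distance); feature tracking between frames and loop detection both rest on correspondences of this kind.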
4. Optimization of the ORB-SLAM algorithm
As the ORB-SLAM algorithm has been continuously refined in recent years, this section introduces
four derivative algorithms based on ORB-SLAM. Each integrates monocular ORB-SLAM with a different
method to improve its robustness and accuracy in different environments.
4.1. A graph recovery algorithm
Building on ORB-SLAM, monocular visual SLAM has progressed in several directions. In the SLAM
graph recovery algorithm based on subgraphs and an undirected connection graph, the system uses
the mapping connections to re-initialize after tracking is lost and to reconstruct the individual
parts of the map. Evaluation on drone image simulations and on ground and indoor datasets shows
that, when tracking fails, this algorithm preserves the integrity of the map better than other
mainstream SLAM methods, which helps ensure map integrity in unmanned driving under tracking
failures [6].
The main breakthrough is that, after a tracking failure occurs in unmanned driving, the missing
parts of the map can be repaired by creating subgraphs. The integrity of the subgraphs is then
guaranteed by a new selection method. Finally, the undirected connection graph maintains the
connection relationships between the subgraphs. In the UAV environment, the proposed system
retains about four times as many keyframes as the original ORB-SLAM 2. In an outdoor street
environment, the proposed system can effectively reconstruct a more complete scene map [6].
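The bookkeeping role of the undirected connection graph can be sketched as a small data structure. This is a hypothetical illustration and not the implementation of [6]: the class name, the method names, and the rule for when two subgraphs are linked are all assumptions. Each subgraph keeps its own keyframes, and connected components indicate which subgraphs can currently be merged into one map.

```python
from collections import defaultdict

class SubmapGraph:
    """Undirected graph whose nodes are subgraphs (submaps) of the SLAM map."""

    def __init__(self):
        self.keyframes = {}            # subgraph id -> list of keyframe ids
        self.links = defaultdict(set)  # undirected adjacency between subgraphs

    def add_submap(self, submap_id, keyframes):
        """Start a new subgraph, e.g. after tracking was lost and re-initialized."""
        self.keyframes[submap_id] = list(keyframes)

    def connect(self, a, b):
        """Record that two subgraphs were linked (the linking criterion is assumed)."""
        self.links[a].add(b)
        self.links[b].add(a)

    def components(self):
        """Groups of subgraphs that are mutually reachable and can be merged."""
        seen, result = set(), []
        for start in self.keyframes:
            if start in seen:
                continue
            stack, group = [start], []
            seen.add(start)
            while stack:
                node = stack.pop()
                group.append(node)
                for nxt in self.links[node]:
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
            result.append(sorted(group))
        return result

graph = SubmapGraph()
graph.add_submap("A", [0, 1, 2])
graph.add_submap("B", [3, 4])   # created after a tracking failure
graph.add_submap("C", [5, 6])   # created after a second failure
graph.connect("A", "B")         # A and B were later linked
print(graph.components())
```

Keyframes in an unconnected subgraph such as C are retained rather than discarded, which is consistent with the higher keyframe retention reported above.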
4.2. A semi-direct monocular SLAM with three-level parallel optimization
Conventional visual SLAM methods can be divided into feature-based methods and direct methods.
The feature-based method extracts feature points from the image data received by the camera and
analyzes them to estimate the camera pose. The direct method instead uses the photometric error
to estimate the camera pose. To combine the advantages of the two methods and achieve more
accurate camera pose estimation, a study proposed a semi-direct monocular SLAM with three-level
parallel optimization, a new SLAM framework called DO-SLAM [7]. The first half of the DO-SLAM
system uses direct methods to track the camera pose quickly and robustly. The second half uses a
feature-based approach to refine the keyframe poses, perform loop closure, and construct a
reusable, globally consistent, long-term, sparse feature map. Evaluation on two benchmark datasets
shows that this method achieves higher accuracy and robustness in motion estimation for unmanned
driving.
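The two error terms that the feature-based and direct methods minimize can be contrasted in a few lines. The sketch assumes a pinhole camera with hypothetical intrinsics, a 3D point already expressed in the current camera frame, and nearest-pixel intensity lookup; it illustrates the residuals only and is not the DO-SLAM optimization.

```python
import math

# Hypothetical pinhole intrinsics (focal lengths and principal point, in pixels).
FX = FY = 100.0
CX = CY = 2.0

def project(point):
    """Project a 3D point (x, y, z) in the camera frame to pixel coordinates (u, v)."""
    x, y, z = point
    return FX * x / z + CX, FY * y / z + CY

def reprojection_error(point, observed_pixel):
    """Feature-based residual: distance between projected and matched pixel."""
    u, v = project(point)
    return math.hypot(u - observed_pixel[0], v - observed_pixel[1])

def photometric_error(point, ref_intensity, image):
    """Direct residual: intensity at the projected pixel minus the reference intensity."""
    u, v = project(point)
    return image[round(v)][round(u)] - ref_intensity

point = (0.5, 0.0, 50.0)                    # projects to pixel (3, 2)
image = [[100] * 5 for _ in range(5)]       # toy 5x5 grayscale image
image[2][3] = 120
print(reprojection_error(point, (3.0, 2.5)))
print(photometric_error(point, 100, image))
```

The feature-based residual needs an explicit feature match, whereas the direct residual needs only image intensities, which is why a direct front end can track quickly while a feature-based back end refines the result.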
4.3. Optimization of 3D points
Because of the limitations of monocular cameras, the scale recovered by a monocular system is
ambiguous. It is difficult to measure accurately the depth of the target scene and its distance
from the camera, which degrades measurement accuracy during unmanned driving. A study proposed a
scale estimation method for monocular visual odometry in unmanned driving scenarios [8]. Its
approach is to use two consecutive keyframes
to reconstruct the 3D ground points and then to use further processed frames to increase the
number of 3D ground points, yielding a more precise estimate of the camera height [8]. The method
is reported to achieve a mean motion error of 1.19% on the KITTI dataset in comparison with
state-of-the-art monocular SLAM methods.
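The link between camera height and scale can be sketched as follows. The example assumes a known mounting height of the camera above the road, a flat road, a level camera whose y axis points down, and a median as the robust estimator; these are simplifying assumptions for illustration and not the estimation procedure of [8].

```python
from statistics import median

def estimate_scale(ground_points, true_camera_height):
    """Scale factor = known camera height / height estimated in SLAM units.

    ground_points are (x, y, z) points on the road in the camera frame,
    so under the stated assumptions each y value approximates the camera height.
    """
    estimated_height = median(y for _, y, _ in ground_points)
    return true_camera_height / estimated_height

def rescale(translation, scale):
    """Convert an up-to-scale translation into metric units."""
    return tuple(scale * t for t in translation)

# Hypothetical reconstructed ground points; the last one is an outlier.
ground = [(0.1, 0.49, 2.0), (-0.2, 0.50, 2.5), (0.3, 0.51, 3.0),
          (0.0, 0.50, 3.5), (0.4, 0.90, 4.0)]
scale = estimate_scale(ground, true_camera_height=1.65)
print(scale, rescale((0.0, 0.0, 1.0), scale))
```

Collecting ground points from more frames makes the height estimate less sensitive to outliers, which matches the motivation given above for accumulating 3D ground points.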
4.4. HFNet-SLAM
ORB-SLAM 3 is a feature-based visual SLAM system with high robustness and accuracy. To address
the vulnerability of traditional algorithms in complex environments, a study proposed the
HFNet-SLAM system [9], an accurate real-time monocular SLAM system based on ORB-SLAM 3 and
combined with a deep convolutional neural network (CNN). HFNet-SLAM differs from conventional
ORB-SLAM 3 in that its local and global features are extracted by a deep CNN, the HF-Net system.
Experiments show that the local features of the deep CNN remain highly repeatable in complex
environments and outperform traditional feature extraction. The system has been validated on
public datasets against other state-of-the-art algorithms, and the results show that HFNet-SLAM
achieves the lowest error among the systems available in the literature.
4.5. Comparison between optimized ORB SLAM algorithms and conventional SLAM
The four fusion systems based on ORB-SLAM introduced above are compared with traditional visual
SLAM in this section. First, compared with traditional visual SLAM when system tracking fails,
the graph recovery algorithm based on subgraphs and the undirected connection graph preserves the
integrity of the map and can recover previously missing parts of the map by creating subgraphs.
Second, the method based on three-level parallel optimization combines the direct approach with
the advantages of the feature-based approach and obtains a more accurate and more robust camera
pose estimate than previous visual SLAM methods. Third, the scale estimation approach collects 3D
ground feature points from multiple processed frames and applies robust scale estimation, which
effectively reduces scale drift compared with traditional visual SLAM. Finally, integrating a deep
convolutional neural network (CNN) with the ORB-SLAM 3 system yields higher robustness and
accuracy than the conventional ORB-SLAM system. The fusion system is twice as accurate as the
ORB-SLAM3 system in medium and large environments in the TUM-VI dataset [9].
5. Limitations of ORB-SLAM
A monocular visual sensor is cheaper than a multiocular visual sensor or a depth camera. In
application scenarios, however, it often cannot obtain the absolute depth of the environment. A
monocular camera cannot recover the true scale of the trajectory and the map, so the measured
results deviate from the true values. Multiocular cameras and depth cameras can measure the scene
depth, so the monocular camera still has certain limitations as a sensor in application scenes.
For example, Kinect Fusion proposed the use of Kinect cameras for 3D reconstruction, and an RGB-D
camera is used in the ORB-SLAM2 system [10].
In the current experimental literature, most studies concentrate on static and low-speed
environments, while studies in highly dynamic environments are still scarce. The robustness and
accuracy of monocular vision systems in highly dynamic or complex environments still cannot be
guaranteed [10]. One example is the method of scale estimation by acquiring 3D ground points [8]:
the researchers found that on a curved road or slope, only a few 3D ground points can be
collected, so new scale factors have to be introduced to correct the camera pose, which leads to
measurement error. Therefore, the lack of research on high-speed dynamic and complex scenarios
is one of the reasons why the application of monocular vision systems remains limited in
different scenarios.
Although much research has addressed feature extraction, monocular visual systems still have
some limitations in extracting feature points. One example is the integration of a deep
convolutional neural network with ORB-SLAM [9]. Although it uses a deep learning method,
experiments found that the system fails under extreme rotation, and the neural network requires
the support of external equipment, which increases the burden on mobile robots.
6. Tendency and improvement
The research and analysis above indicate the future development trends and research directions
of monocular visual systems. One direction proposed in this survey is to extend the existing
ORB-SLAM algorithm to support multiocular visual SLAM and RGB-D SLAM, because both are more robust
and accurate than monocular visual SLAM. A deep convolutional neural network can serve as the
feature extraction method, and methods similar to those above can then address the drift problem,
ensuring the stability of the whole system and reducing its error. Such a system would combine
the methods reviewed above and some of their advantages.
Experimental studies in highly dynamic and complex scenarios are scarce, so future research
needs to examine visual SLAM systems in various dynamic scenarios. The work reviewed here on
complex scenes and highly dynamic environments suggests that robustness can be achieved by
repeatedly creating subgraphs or improving the algorithm, while also improving the accuracy of
the visual SLAM system. Combining visual SLAM systems with emerging algorithms to achieve a much
lower systematic error will be a future research direction.
7. Conclusion
This paper discusses and analyzes four recent ORB-SLAM systems combined with other innovative
algorithms, based on their differences from the conventional visual SLAM system. It proposes that,
in complex conditions, different algorithms and sensors should be integrated with monocular
ORB-SLAM, such as the graph recovery algorithm based on subgraphs and an undirected connection
graph, the three-level parallel optimization method, and the method for constructing 3D ground
points. An emerging system that combines ORB-SLAM with a deep convolutional neural network can
markedly improve the accuracy and robustness of visual SLAM and also decrease the cumulative
error. The review also found that current research on ORB-SLAM systems in highly dynamic
environments is scarce and that studies applying monocular ORB-SLAM systems in complex
environments are lacking. In subsequent studies, researchers can focus on visual SLAM systems in
highly dynamic environments and progressively improve the ability to extract feature points in
complex environments, in order to improve the accuracy and robustness of the monocular ORB-SLAM
system and reduce the cumulative error.
References
[1] Wang, S., Li, J., Yang, P., Gao, T., Bowers, A. R., & Luo, G. (2020). Towards Wide Range
Tracking of Head Scanning Movement in Driving. International journal of pattern recognition
and artificial intelligence, 34(13), 2050033. https://doi.org/10.1142/s0218001420500330
[2] Huo, J., Zhou, C., Yuan, B., Yang, Q., & Wang, L. (2023). Real-Time Dense Reconstruction with
Binocular Endoscopy Based on Stereo Net and ORB-SLAM. Sensors (Basel, Switzerland),
23(4), 2074. https://doi.org/10.3390/s23042074
[3] Zhu P, Zhou H, Zhang H, Lu S & Wei R. Visual simultaneous localization and mapping method
for a mobile robot. Journal of Tianjin University of Technology, 1-10.
[4] Tourani, A., Bavle, H., Sanchez-Lopez, J. L., & Voos, H. (2022). Visual SLAM: What Are the
Current Trends and What to Expect? Sensors (Basel, Switzerland), 22(23), 9297.
https://doi.org/10.3390/s22239297
[5] R. Mur-Artal, J. M. M. Montiel and J. D. Tardós, "ORB-SLAM: A Versatile and Accurate
Monocular SLAM System," in IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147-1163,
Oct. 2015, doi: 10.1109/TRO.2015.2463671.
[6] Z. Zhan, W. Jian, Y. Li and Y. Yue, "A SLAM Map Restoration Algorithm Based on Submaps
and an Undirected Connected Graph," in IEEE Access, vol. 9, pp. 12657-12674, 2021, doi:
10.1109/ACCESS.2021.3049864
[7] S. Lu, Y. Zhi, S. Zhang, R. He and Z. Bao, "Semi-Direct Monocular SLAM With Three Levels
of Parallel Optimizations," in IEEE Access, vol. 9, pp. 86801-86810, 2021,
doi: 10.1109/ACCESS.2021.3071921
[8] M. Fan, S. -W. Kim, S. -T. Kim, J. -Y. Sun and S. -J. Ko, "Simple But Effective Scale Estimation
for Monocular Visual Odometry in Road Driving Scenarios," in IEEE Access, vol. 8, pp.
175891-175903, 2020, doi: 10.1109/ACCESS.2020.3026347
[9] Liu, L., & Aitken, J. M. (2023). HFNet-SLAM: An Accurate and Real-Time Monocular SLAM
System with Deep Features. Sensors (Basel, Switzerland), 23(4), 2113.
https://doi.org/10.3390/s23042113
[10] Bala, J. A., Adeshina, S. A., & Aibinu, A. M. (2022). Advances in Visual Simultaneous
Localisation and Mapping Techniques for Autonomous Vehicles: A Review. Sensors (Basel,
Switzerland), 22(22), 8943. https://doi.org/10.3390/s22228943
Prospects for the development of cartography through the
integration of SLAM technology with GIS technology
Yaodong Tang
School of Information Engineering, China University of Geosciences Beijing, Beijing,
China
1004215119@email.cugb.edu.cn
Abstract. With the continued development of Simultaneous Localization and Mapping (SLAM)
and Geographic Information Systems (GIS) technologies, their application scenarios have
become increasingly complex. Combining these technologies can significantly enhance
operational efficiency in challenging environments. This paper presents an analysis of existing
cases where SLAM and GIS technologies have been integrated, demonstrating that such a
merger facilitates the consolidation and complementarity of spatial data. This integration
allows robots or systems to simultaneously utilize the global information provided by GIS and
the dynamic local data captured by SLAM for a more comprehensive and detailed
environmental analysis, which is highly beneficial for the field of cartography. Further research
has developed a series of operational procedures for integrating SLAM and GIS, utilizing
MATLAB as a tool. This study also reviews several existing technical challenges, including
real-time performance and computational capacity, environmental complexity and dynamic
changes, and multi-scale data processing, and proposes potential solutions. The paper
concludes by predicting that the integration of SLAM and GIS will play a crucial role in areas
such as smart city management and disaster emergency response, indicating that this research
area will become a hot topic in future cartographic technology.
Keywords: Cartography, Simultaneous Localization and Mapping, Geographic Information
Systems.
1. Introduction
In recent years, with the rapid development of mobile robotics and autonomous driving technologies,
Simultaneous Localization and Mapping (SLAM) technology has gradually become a research hotspot.
The core of SLAM technology lies in a robot's ability to construct maps in real-time and self-localize
in an unknown environment, addressing critical issues in autonomous navigation. Andréa Macario
Barros et al. have provided a detailed introduction to the current fundamental functions of SLAM
technology [1]. Geographic Information Systems (GIS), a technology for collecting, storing, analyzing,
and displaying geospatial data, has played a significant role in urban planning, environmental
monitoring, and resource management.
The integration of SLAM and GIS technologies not only compensates for the real-time and
precision deficiencies in traditional GIS data acquisition but also provides SLAM technology with rich
geospatial information support. This greatly expands the application scope and potential of both
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0172
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
technologies. Dorra Larnaout and her team attempted in 2012 to use DEM data constraints commonly
used in GIS for bundle adjustment, resulting in a threefold increase in positioning accuracy, thereby
demonstrating the complementary nature of combining GIS data with SLAM technology [2].
Moreover, the integrated application of SLAM and GIS technologies is gaining increasing attention
and has found applications in various areas, especially in the construction of smart cities. K. Ghosh
and K. S. S. Musti proposed a framework for developing a GIS-based intelligent traffic system for
energy-aware smart cities by combining GIS and SLAM technologies [3].
According to Chen Chen and Cheng Yinhang, who introduced and demonstrated various SLAM
algorithms on the MATLAB platform, it can be inferred that MATLAB's mobile robot SLAM
simulator is easier to operate compared to commercial simulators [4]. This is because the MATLAB
language is widespread and easy to program, with numerous built-in functions supporting matrix
operations. Additionally, MATLAB offers various toolboxes to address issues in signal processing,
image processing, fuzzy logic, etc., allowing users to focus on SLAM algorithms and theory. On this
basis, the integrated application of SLAM and GIS can fully leverage MATLAB's powerful data
processing and algorithm implementation capabilities, providing new opportunities and solutions for
the development of the cartography field.
This paper aims to explore the background and significance of the integration of SLAM and GIS
technologies, analyze their prospects for development in the field of cartography, and discuss the
feasibility and advantages of combining the two through MATLAB, as well as potential application
directions and prospects. By delving into these topics, this research aims to offer new insights
and references for the advancement of cartography.
2. Overview of SLAM and GIS technologies
2.1. Overview of SLAM
SLAM and GIS are crucial technological concepts in modern science and technology, playing pivotal
roles in their respective domains.
SLAM technology refers to the ability of mobile devices (such as robots, drones, smartphones, etc.)
to autonomously locate themselves and construct maps in an unknown environment. The core idea of
SLAM is to use sensors (such as cameras, LiDAR, etc.) to observe and determine the device's position
and orientation during movement, then incrementally build a map based on this positional information.
Several widely used algorithms are common in SLAM technology. Firstly, LiDAR-based SLAM
algorithms, such as Hector SLAM, Gmapping, and Cartographer, primarily rely on LiDAR sensors to
obtain environmental information through laser scanning, thereby achieving localization and mapping.
Another category is vision-based SLAM algorithms, such as ORB-SLAM (Oriented FAST and
Rotated BRIEF SLAM), LSD-SLAM (Large-Scale Direct Monocular SLAM), and PTAM (Parallel
Tracking and Mapping), which mainly depend on cameras to extract environmental features through
image processing techniques, enabling localization and mapping. Additionally, there are algorithms
like EKF-SLAM (Extended Kalman Filter SLAM) and FastSLAM, which employ different
mathematical models and optimization strategies to adapt to various environments and application
requirements [5]. SLAM technology is crucial for realizing truly autonomous mobile robots, allowing
them to explore and understand their surroundings and achieve autonomous navigation and task
execution without prior knowledge.
2.2. Overview of GIS
GIS is a technology for capturing, storing, managing, analyzing, and displaying geographic
data. It is based on geospatial data and employs geographic modelling analysis methods
to provide various spatial and dynamic geographic information. It can transform tabular data into
geographic graphical displays, facilitating user browsing, operation, and analysis. The primary data
types handled by GIS technology include vector data, raster data, terrain data, topological data, and
address data. Vector data, composed of geometric elements like points, lines, and polygons, represents
specific locations and shapes on a map. Raster data, consisting of pixels where each pixel represents
an area on the map, is often used for continuous data such as remote sensing images and digital
elevation models. Terrain data primarily describes surface morphology and elevation information,
serving as the main source for digital elevation models. Topological data emphasizes spatial
relationships between geographic features, such as adjacency and connectivity. Address data links
geographic coordinates with postal addresses, supporting location queries and positioning services.
GIS technology has extensive applications in various fields such as urban planning, environmental
monitoring, disaster management, and agricultural resource management. It helps individuals better
understand and interpret geographic phenomena, supporting decision-making and problem-solving. By
integrating SLAM and GIS technologies, the strengths of both fields can be leveraged to develop
innovative solutions for various applications.
3. The necessity of integrating SLAM and GIS
As the application scope of SLAM technology continues to expand in various aspects of daily life,
certain limitations of SLAM have been exposed in specific environments. For instance, the
autonomous monitoring of water supply and sewage pipeline networks presents significant challenges.
When robots operate within underground water supply and sewage pipelines, they often cannot receive
GPS signals to estimate their positions accurately. The interiors of these pipelines are
complex and difficult to navigate. Deep within, the pipes are pitch dark and perpetually filled with
water. Although the water level in supply pipes is relatively stable, the sewage level in sewer pipes
fluctuates over time. More critically, sensors within the sewer may be obstructed by various types of
waste. The dirty environment not only increases the risk of sensor contamination but also heightens
the likelihood of sensor failure. Therefore, in such scenarios, integrating GIS data images, which
feature distinct and clear attributes, could significantly enhance the stability and efficiency of robotic
operations [6].
Although GIS data contain extensive global geographic information and can accurately describe
critical aspects of the environment such as topography and landmarks, this information is typically
static. Conversely, maps generated by SLAM technology are often localized and real-time. By
combining the two, spatial data integration and complementarity can be achieved. This allows robots
or systems to utilize both the global information provided by GIS and the dynamic local data acquired
through SLAM, enabling more comprehensive and detailed environmental analysis. This integration is
also crucial in the research and development of smart cities and autonomous driving technologies.
Therefore, the integration of SLAM and GIS technologies is highly necessary [3].
4. Feasibility analysis of integrating SLAM and GIS technologies
In a study conducted by D. Larnaout, S. Bourgeois, V. Gay-Bellile, and M. Dhome in 2012, the
integration of SLAM and GIS technologies was successfully realized by incorporating DEM (Digital
Elevation Model) constraints into the BA (Bundle Adjustment) optimization process. The results of
the BA optimization with added DEM constraints were significantly superior to those without such
constraints. The data indicated that the median error for SLAM with DEM constraints was
approximately 3.16 meters, whereas the median error for classical SLAM exceeded 9 meters. This
implies that adding DEM constraints reduces the median positioning error by roughly a factor of three [2].
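As a quick check on the reported figures, and taking the classical-SLAM median error at its stated lower bound of 9 meters, the ratio of the two median errors can be worked out as follows. The symbols e_classical and e_DEM are notation introduced here for illustration and do not appear in [2].

```latex
\frac{e_{\text{classical}}}{e_{\text{DEM}}} \;\gtrsim\; \frac{9\ \text{m}}{3.16\ \text{m}} \approx 2.85
```

The "factor of three" is therefore a rounded figure based on the lower bound of the classical error.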
In 2017, a team consisting of Manhui Sun, Shaowu Yang, Xiaodong Yi, and Hengzhu Liu
proposed a method for autonomous large-scale environmental navigation based on GIS and SLAM.
Utilizing real urban spatial road network information and leveraging the storage and computational
capabilities of GIS spatial databases, they developed a comprehensive system that includes a spatial
database, SLAM, and navigation algorithms. This system demonstrated good reusability and
scalability, making it suitable for real-life scenarios and capable of guiding robots in navigation and
mapping activities under extensive conditions [7].
In 2020, a team comprising F-J Serrano, V Moreno, B Curto, and R Álves proposed a new
approach to global localization for mobile robots by storing GIS map data in a PostGIS database. This
method involved using GIS map data as an information source and initializing the filter based on the
probability distributions generated from sensor readings. The proposed solution, termed
Environmental Stimulus Localization (ESL), helps mitigate the impact of measurement errors and
allows for quicker recovery from localization failures [8].
5. Technical methodology for integrating SLAM and GIS (using MATLAB as an example)
The integration of SLAM data with GIS data involves a multi-step process aimed at leveraging the
strengths of both systems to achieve more accurate environmental perception and localization.
5.1. Data fusion
5.1.1. Data preprocessing. The data generated by the SLAM system, such as robot trajectories and
environmental maps, needs to be cleaned and organized through noise removal, error correction, and
other preprocessing steps to ensure its accuracy and reliability. Relevant geospatial data, such as
topographic maps, road networks, and building information, can be acquired from GIS platforms. The
SLAM-generated map data is then converted into GIS-compatible formats. Depending on the
requirements, GIS data may also need to be converted or clipped to align with SLAM data in the
same or similar coordinate systems.
5.1.2. Coordinate system unification. It is necessary to ensure that SLAM data and GIS data use the
same coordinate system. This typically requires coordinate transformation or calibration to enable
seamless fusion of the two data sets.
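As a minimal sketch of this step, assume the offset and heading of the SLAM frame relative to a projected GIS coordinate system are already known, for example from a surveyed starting pose. Local trajectory points can then be mapped to easting/northing with a planar rigid transform. The function name `slam_to_gis`, the origin coordinates, and the heading are hypothetical, and Python with NumPy is used for illustration rather than MATLAB.

```python
import numpy as np

def slam_to_gis(points_local, origin_en, heading_rad):
    """Map SLAM-frame (x, y) points to GIS (easting, northing).

    origin_en   : GIS coordinates of the SLAM origin (assumed known).
    heading_rad : counter-clockwise angle from GIS east to the SLAM x-axis
                  (assumed known).
    """
    c, s = np.cos(heading_rad), np.sin(heading_rad)
    rotation = np.array([[c, -s], [s, c]])
    # Row-vector form of p_gis = R @ p_local + origin
    return np.asarray(points_local, float) @ rotation.T + np.asarray(origin_en, float)

# Hypothetical trajectory in metres; the SLAM x-axis is assumed to point north.
trajectory_local = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]])
trajectory_gis = slam_to_gis(trajectory_local,
                             origin_en=(500000.0, 4400000.0),
                             heading_rad=np.pi / 2)
```

A real pipeline would also need a datum and map projection step for geographic (latitude/longitude) data, which this sketch leaves out.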
5.1.3. Data registration and alignment. SLAM data must be registered and aligned with GIS data
using known landmarks or feature points. This can be achieved through feature extraction and
matching algorithms, ensuring spatial consistency between the two data sets.
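One common way to carry out such a registration, once landmark correspondences are available, is a least-squares rigid fit (the Kabsch method). The sketch below is an illustration in Python with NumPy under that assumption, not the procedure of any cited work. The function name `estimate_rigid_2d` and the landmark coordinates are hypothetical, and the correspondences are assumed to be outlier-free.

```python
import numpy as np

def estimate_rigid_2d(src, dst):
    """Least-squares rotation R and translation t such that dst ~= src @ R.T + t."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    cross_cov = (src - src_c).T @ (dst - dst_c)        # 2x2 cross-covariance
    u, _, vt = np.linalg.svd(cross_cov)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, d]) @ u.T
    translation = dst_c - rotation @ src_c
    return rotation, translation

# Hypothetical matched landmarks: SLAM frame vs. GIS frame.
angle = np.deg2rad(30.0)
true_rotation = np.array([[np.cos(angle), -np.sin(angle)],
                          [np.sin(angle),  np.cos(angle)]])
true_translation = np.array([100.0, 200.0])
landmarks_slam = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [3.0, 2.0]])
landmarks_gis = landmarks_slam @ true_rotation.T + true_translation

est_rotation, est_translation = estimate_rigid_2d(landmarks_slam, landmarks_gis)
```

With noisy or mismatched features, a robust wrapper such as RANSAC around this fit would usually be needed.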
5.1.4. Data fusion. After completing data preprocessing, coordinate system unification, and
registration, SLAM data can be fused with GIS data. This can be achieved through overlaying,
merging, or other fusion
algorithms, depending on the application context and requirements. For example, local maps generated
by SLAM can be overlaid on the global maps provided by GIS to obtain more comprehensive
environmental information.
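The overlay case can be sketched as follows, assuming the local SLAM map and the global GIS layer are both rasters with the same cell size and orientation, and that the local map fits entirely inside the global one. The function name `overlay_local_on_global`, the `-1` code for unknown cells, and the grid values are hypothetical choices for this Python/NumPy illustration.

```python
import numpy as np

def overlay_local_on_global(global_grid, local_grid, row0, col0, unknown=-1):
    """Copy the known cells of a local SLAM grid into a copy of a global raster.

    (row0, col0) is the global index of the local grid's top-left cell.
    Cells marked `unknown` in the local grid leave the global value unchanged.
    """
    fused = global_grid.copy()
    height, width = local_grid.shape
    window = fused[row0:row0 + height, col0:col0 + width]  # view into `fused`
    known = local_grid != unknown
    window[known] = local_grid[known]
    return fused

global_grid = np.zeros((5, 5), dtype=int)        # e.g. a GIS layer with no obstacles
local_grid = np.array([[1, -1],
                       [-1, 1]])                 # 1 = occupied, -1 = not observed
fused = overlay_local_on_global(global_grid, local_grid, row0=1, col0=2)
```

Letting the SLAM observation overwrite the GIS value is only one possible policy; a weighted or probabilistic merge may suit other applications better.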
5.2. Data visualization
After data fusion, MATLAB can be used to process the point cloud data generated by SLAM,
including filtering, registration, and feature extraction. Both SLAM and GIS data can then be
visualized in MATLAB. This step involves the graphical representation of the
fused data to facilitate analysis and interpretation.
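The filtering step can be illustrated with a voxel-grid filter, which thins a point cloud by replacing all points in each cubic cell with their centroid. The sketch below uses Python with NumPy as a stand-in for the MATLAB workflow described above. The function name `voxel_downsample` and the sample cloud are hypothetical.

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same cubic voxel by their centroid."""
    points = np.asarray(points, float)
    keys = np.floor(points / voxel_size).astype(np.int64)   # integer voxel index per point
    _, inverse, counts = np.unique(keys, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.ravel()                                # keep 1-D across NumPy versions
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)                         # accumulate points per voxel
    return sums / counts[:, None]

cloud = np.array([[0.1, 0.1, 0.1],
                  [0.2, 0.2, 0.2],
                  [1.5, 0.1, 0.1]])
filtered = voxel_downsample(cloud, voxel_size=1.0)
```

The filtered cloud can then be passed to any plotting routine for display alongside the GIS layers.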
6. Technical challenges and solutions
6.1. Real-time performance and computational efficiency
SLAM requires real-time processing of sensor data (e.g., LiDAR, cameras) to update positional
information and maps, whereas GIS data is often large-scale and complex, demanding considerable
processing time. In this case, high-performance computing and parallel processing techniques can be
employed to enhance data processing efficiency. Additionally, the development of incremental update
algorithms can allow GIS data to be updated in real-time based on SLAM outputs.
6.2. Accuracy and robustness
The accuracy and robustness of SLAM systems are affected by sensor noise and environmental
changes, while GIS data requires high precision and stability. However, integrating multiple sensor
data sources (such as IMU, GPS, LiDAR, and vision) can improve the accuracy and robustness of
SLAM [9]. Moreover, using existing high-precision GIS data for calibration and error correction can
enhance the overall system accuracy.
6.3. Environmental complexity and dynamic changes
SLAM technology is prone to errors in complex and dynamically changing environments (e.g., urban
settings with moving crowds and vehicles), whereas GIS systems typically assume static geographic
information [10]. Dynamic object detection and tracking technologies can be
used to isolate the impact of moving objects on SLAM, ensuring the accuracy of the static parts of the
map. Machine learning methods can also be employed to enhance the system’s adaptability to
environmental changes.
6.4. Multi-scale data processing
SLAM data is usually local and fine-grained, while GIS data can cover large areas with various
resolutions. Converting and processing data across different scales is required. Developing multi-scale
data fusion algorithms can enable seamless transitions and integrations from local details to global
maps.
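A very simple form of scale conversion is to aggregate a fine SLAM occupancy grid into the coarser cell size of a GIS raster. The sketch below is a Python/NumPy illustration under the assumption that the coarse cell size is an integer multiple of the fine one. The function name `coarsen_grid` and the grid values are hypothetical.

```python
import numpy as np

def coarsen_grid(fine, factor):
    """Downsample an occupancy grid by taking the maximum over factor x factor blocks.

    A coarse cell is marked occupied if any fine cell inside it is occupied.
    Rows and columns that do not fill a whole block are cropped.
    """
    height, width = fine.shape
    h2, w2 = height - height % factor, width - width % factor
    blocks = fine[:h2, :w2].reshape(h2 // factor, factor, w2 // factor, factor)
    return blocks.max(axis=(1, 3))

fine = np.zeros((4, 4), dtype=int)
fine[0, 1] = 1
fine[3, 3] = 1
coarse = coarsen_grid(fine, factor=2)
```

Taking the maximum is a conservative choice for obstacles; a mean over each block would instead give an occupancy fraction per coarse cell.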
6.5. Loop closure and global optimization
SLAM requires loop closure and global optimization to improve the overall consistency of the map.
These processes can become complex and computationally expensive when dealing with large-scale
GIS data. Efficient graph optimization algorithms and feature-based loop closure detection methods
can be utilized to reduce computational complexity and enhance global map consistency.
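The effect of a loop closure on global consistency can be seen in a toy one-dimensional pose graph solved by linear least squares. This is a Python/NumPy illustration with made-up measurements and equal weights on all constraints. Real systems use 2D or 3D poses and nonlinear graph optimizers.

```python
import numpy as np

# Each constraint is (i, j, measured x_j - x_i). Pose 0 is fixed at the origin.
constraints = [(0, 1, 1.1), (1, 2, 1.0), (2, 3, 1.1),   # odometry
               (0, 3, 3.0)]                              # loop closure
n_poses = 4

# Build the linear system A @ x = b for the unknown poses x_1 .. x_3.
A = np.zeros((len(constraints), n_poses - 1))
b = np.zeros(len(constraints))
for row, (i, j, measurement) in enumerate(constraints):
    if i > 0:
        A[row, i - 1] = -1.0
    if j > 0:
        A[row, j - 1] = 1.0
    b[row] = measurement

x_free, *_ = np.linalg.lstsq(A, b, rcond=None)
poses = np.concatenate(([0.0], x_free))
```

Odometry alone would place the last pose at 3.2. The loop closure pulls it to 3.05 and spreads the 0.2 discrepancy evenly over the four constraints.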
7. Future prospects
The integration of SLAM and GIS technologies is a growing trend. This section outlines several
prospective applications to guide future practice and research.
7.1. Intelligent city management
The integration of SLAM and GIS technologies can be applied to intelligent city management systems
to achieve refined management of urban infrastructure.
Urban management departments can use drones equipped with SLAM technology for aerial
inspections, generating real-time 3D map data of the city and integrating this data into GIS systems.
By comparing new and old map data, issues such as road damage and building violations can be
quickly identified, thereby improving city management efficiency.
7.2. Disaster emergency response and rescue
Combining SLAM's rapid mapping capabilities with GIS's global data management can enhance the
response speed and accuracy of disaster emergency response and rescue operations.
After disasters like earthquakes or floods, rescue teams can use robots or drones equipped with
SLAM technology to quickly generate real-time 3D maps of the affected areas. By integrating these
maps with existing geographic information data in GIS systems, rescue teams can swiftly formulate
rescue plans and identify optimal rescue routes.
7.3. Augmented reality and virtual reality applications
Integrating SLAM technology with GIS data can be applied in the fields of Augmented Reality (AR)
and Virtual Reality (VR) to achieve more realistic scene reconstruction and interactive experiences.
At tourist sites, AR glasses or mobile devices can use SLAM technology for real-time positioning
and environmental perception, overlaying virtual information onto the real world. For instance, visitors
can see virtual reconstructions of historical buildings and real-time guide information, all managed and
updated through GIS systems.
7.4. Precision agriculture
Utilizing SLAM's precise positioning and real-time environmental sensing in combination with GIS
technology can enhance agricultural production efficiency and precision management.
Agricultural robots can use SLAM technology for autonomous navigation in fields, generating
real-time 3D maps and uploading this data to GIS systems for analysis and management. Farmers can
then use the real-time map data for precise irrigation, fertilization, and pest control, thereby improving
crop yield and quality.
7.5. Indoor navigation and management
Applying SLAM technology to indoor environments in conjunction with GIS systems can achieve
high-precision indoor navigation and management.
In complex indoor environments such as large shopping malls, hospitals, and airports, users can use
mobile devices equipped with SLAM technology for indoor navigation [11]. These indoor map data,
integrated with other information such as shop locations and emergency exits via GIS systems, can
provide more accurate and comprehensive navigation services.
8. Conclusion
This study primarily focuses on SLAM and GIS technologies, delving deeply into the current state of
both fields. It proposes the idea of integrating SLAM technology with GIS for cartographic
applications. Through further exploration, this study infers the necessity and feasibility of combining
these two technologies in the present era. On this basis, the research also discusses and analyzes the
technical challenges that may arise during the integration of SLAM and GIS technologies in the
cartographic domain, proposing potential solutions to these issues. The findings suggest that the
integration of SLAM and GIS technologies in mapping is both meaningful and achievable. Although
there are certain technical difficulties at present, viable solutions can enhance the combined mapping
effects of these technologies. Finally, this study presents some prospects for the integration of SLAM
and GIS technologies in the field of mapping, including intelligent city management, disaster
emergency response and rescue, augmented reality and virtual reality applications, precision
agriculture, and indoor navigation and management.
References
[1] Macario Barros, A.; Michel, M.; Moline, Y.; Corre, G.; Carrel, F. A Comprehensive Survey of
Visual SLAM Algorithms. Robotics 2022, 11, 24. https://doi.org/10.3390/robotics11010024
[2] D. Larnaout, S. Bourgeois, V. Gay-Bellile and M. Dhome, "Towards Bundle Adjustment with
GIS Constraints for Online Geo-Localization of a Vehicle in Urban Center," 2012 Second
International Conference on 3D Imaging, Modeling, Processing, Visualization &
Transmission, Zurich, Switzerland, 2012, pp. 348-355, doi: 10.1109/3DIMPVT.2012.38.
[3] K. Ghosh and K. S. S. Musti, "Integration of SLAM with GIS to model sustainable urban
transportation system: A smart city perspective," 2020 12th International Conference on
Computational Intelligence and Communication Networks (CICN), Bhimtal, India, 2020, pp.
261-267, doi: 10.1109/CICN49253.2020.9242571.
[4] Chen Chen and Yinhang Cheng, "MATLAB-based simulators for mobile robot Simultaneous
Localization and Mapping," 2010 3rd International Conference on Advanced Computer
Theory and Engineering (ICACTE), Chengdu, 2010, pp. V2-576-V2-581, doi:
10.1109/ICACTE.2010.5579471.
[5] T.J. Chong, X.J. Tang, C.H. Leng, M. Yogeswaran, O.E. Ng, Y.Z. Chong, Sensor Technologies
and Simultaneous Localization and Mapping (SLAM), Procedia Computer Science, Volume
76, 2015, Pages 174-179, ISSN 1877-0509, https://doi.org/10.1016/j.procs.2015.12.336.
[6] Aitken, J.M., Evans, M.H., Worley, R., Edwards, S., Zhang, R., Dodd, T.J., Mihaylova, L.S., &
Anderson, S.R. (2021). Simultaneous Localization and Mapping for Inspection Robots in
Water and Sewer Pipe Networks: A Review. IEEE Access, 9, 140173-140198.
[7] S. Zheng, J. Wang, C. Rizos, W. Ding and A. El-Mowafy, Simultaneous Localization and
Mapping (SLAM) for Autonomous Driving: Concept and Analysis, Remote Sensing, 2023,
Vol. 15, Issue 4, 1156, doi:10.3390/rs15041156,
https://www.mdpi.com/2072-4292/15/4/1156
[8] Serrano F-J, Moreno V, Curto B, Álves R. Semantic Localization System for Robots at Large
Indoor Environments Based on Environmental Stimuli. Sensors. 2020; 20(7):2116.
[9] Xu X, Zhang L, Yang J, Cao C, Wang W, Ran Y, Tan Z, Luo M. A Review of Multi-Sensor
Fusion SLAM Systems Based on 3D LIDAR. Remote Sensing. 2022; 14(12):2835.
https://doi.org/10.3390/rs14122835
[10] Zheng S, Wang J, Rizos C, Ding W, El-Mowafy A. Simultaneous Localization and Mapping
(SLAM) for Autonomous Driving: Concept and Analysis. Remote Sensing. 2023;
15(4):1156. https://doi.org/10.3390/rs15041156
[11] Serrano F-J, Moreno V, Curto B, Álves R. Semantic Localization System for Robots at Large
Indoor Environments Based on Environmental Stimuli. Sensors. 2020; 20(7):2116.
https://doi.org/10.3390/s20072116
Comparative analysis of matrix factorization and graph
convolutional networks in student
Tongye Wu
Schools of Engineering, Arizona State University, Tempe, Arizona, 85281, USA
Tongyewu@asu.edu
Abstract. For educators, predicting students’ grades and understanding students’ levels in depth
are important for improving teaching methods. Several studies have predicted students’ grades
using different models, such as Matrix Factorization (MF) and Graph Convolutional Networks
(GCNs). This essay compares these two models, MF and GCNs, and highlights the differences
between them. By comparing their performance in predictive accuracy, interpretability, and
computational efficiency, their strengths and areas for improvement can be identified. The essay
first introduces the two models, then presents their performance in different experiments from
past research and compares the results, listing the advantages and disadvantages of each. The
comparison shows that MF performs better in handling large-scale sparse datasets and providing
meaningful interpretations, whereas GCNs are good at capturing complex dependencies and
integrating multiple data sources.
Keywords: Matrix Factorization, Graph Convolutional Networks, Student Grade Prediction,
Predictive Models, Educational Data.
1. Introduction
Student grade prediction, one of the most active research topics in education, is important not only
for teachers but also for students, as it provides information for their course choices in the next
semester [1]. It thus serves as an efficient way for students to make well-informed judgments that
are in line with their academic skills and interests. Additionally, it enables the
creation of more personalized Degree Pathways, which can guide students through a tailored educational
experience that maximizes their potential [1]. Consequently, grade prediction is a valuable tool for
students to assess their academic performance, pinpoint areas that need work, and strategize for a more
successful future.
Many researchers have invested time and energy in developing models for predicting grades.
Examples include Additive Latent Effect (ALE) models, which are based on MF, Restricted Boltzmann
Machines (RBM), and Graph Convolutional Networks (GCNs). GCNs are a specialized type of neural
network designed to process data represented as graphs [2, 3]. Graph-structured data is prevalent
in diverse disciplines, including social networks, biological networks, and recommendation
systems. There are several key steps in the GCN process.
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0177
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
During the initialization phase, every node in the graph is associated with a feature vector.
These feature vectors might represent various features or characteristics of the nodes, depending on the
application [4]. The convolution process, which is the fundamental component of GCNs, is derived from
convolutional neural networks (CNNs). Its objective is to collect and combine information from
nearby nodes [5]. There are also other steps, such as stacking layers and producing task-specific
output [6]. These models utilize various computational techniques and data analysis methods to
predict student performance with a high degree of accuracy. The development of such models involves
a significant investment of time and resources for both teachers and students. Studies have shown
that ALE models can accurately predict student grades by capturing latent factors that influence
performance [7]. Additionally, research by Brown and Davis highlights that integrating these models
into educational systems enhances personalized learning, allowing educators to tailor their teaching
strategies to individual student needs [1]. These developments not only enhance the accuracy of
grade predictions but also provide a more comprehensive understanding of the underlying factors that
influence student achievement. Failure to graduate on time is a crucial problem for most
universities, and new educational applications are being sought to ensure that students complete
their studies on time [8]. Delayed graduation results from a variety of factors, including poor
course selection, lack of academic support, and inadequate performance tracking. Predicting student
grades can address these issues by identifying students who are at risk of falling behind and
enabling timely interventions. Universities are increasingly seeking innovative educational
applications that can help students complete their coursework within the expected timeframe. Precise
grade prediction models can play a crucial role in this endeavor by offering timely alerts and
support systems to assist students in staying on course.
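The GCN steps outlined above (node feature vectors, neighbourhood aggregation, and stacked layers) can be sketched in Python with NumPy. The text does not specify a propagation rule, so the sketch assumes one widely used choice, the symmetric-normalized rule with self-loops and a ReLU activation. The function name `gcn_layer`, the three-node graph, and the random weights are hypothetical.

```python
import numpy as np

def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W)."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)   # aggregate neighbours, then ReLU

# Path graph 0 - 1 - 2, with one-hot initial node features (initialization phase).
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
features = np.eye(3)

rng = np.random.default_rng(0)
w1 = rng.normal(size=(3, 4))                        # untrained, illustrative weights
w2 = rng.normal(size=(4, 2))

hidden = gcn_layer(adj, features, w1)               # first layer
output = gcn_layer(adj, hidden, w2)                 # stacked second layer
```

In a grade-prediction setting the final layer would feed a task-specific output, such as a regression head, and the weights would be learned from data rather than drawn at random.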
This paper evaluates the advantages and disadvantages of two different models in practical
applications and their performance. Through research, the effectiveness of these two models can be
verified in different educational settings. First, the study collects and analyses data from multiple
sources, including previous research papers and case studies. These sources provide a basis for
understanding the practical application of different performance prediction models. Then, the study
assesses and scrutinizes the models’ performance so that their merits and drawbacks can be identified.
2. Introduction to Matrix Factorization (MF)
2.1. Definition of MF
Matrix Factorization (MF) is a commonly employed technique, typically deployed in recommendation
systems and data mining. The process involves decomposing a huge matrix into several smaller matrices,
thereby uncovering concealed characteristics and connections.
2.2. Basic concepts
Matrix Decomposition: Matrix Factorization decomposes a large matrix into the product of two or
more smaller matrices. In recommendation systems, this large matrix is usually the user-item
rating matrix, which can be broken down into two separate matrices: a user-feature matrix and an
item-feature matrix [6].
Hidden Features: The elements within the decomposed matrices correspond to hidden
characteristics that can explain the underlying connections between users and items. For example, in
a movie recommendation system, hidden features may correspond to movie genres or users’
preferences [9].
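In symbols, the decomposition described above is commonly written as follows. The notation is an assumption introduced here for illustration: R is the m-by-n user-item rating matrix, P the m-by-k user-feature matrix, Q the n-by-k item-feature matrix, and k the number of hidden features, with k much smaller than m and n.

```latex
R \approx P Q^{\top}, \qquad
\hat{r}_{ui} = p_u^{\top} q_i = \sum_{f=1}^{k} p_{uf}\, q_{if}
```

Here p_u and q_i are the rows of P and Q for user u and item i, and the predicted rating is their inner product.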
2.3. Functions
Recommendation systems: Matrix factorization (MF) is broadly used in recommendation systems, such
as Netflix and Amazon's recommendation engines. It can predict user ratings for items that
have not been reviewed yet, which enables personalized recommendations [6].
Data Compression: By decomposing matrices, MF can compress large-scale data into smaller matrix
forms, reducing storage space and computational complexity [10].
Dimensionality Reduction: Matrix factorization (MF) is often used to reduce the dimensionality
of data, because it can transform high-dimensional data into a lower-dimensional space.
This process allows for the identification of hidden structures and patterns within the data [10].
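Compression and dimensionality reduction can both be illustrated with a truncated singular value decomposition, one standard way of obtaining a low-rank factorization. The sketch below is a Python/NumPy illustration on synthetic data. The function name `low_rank_approx` and the matrix sizes are hypothetical.

```python
import numpy as np

def low_rank_approx(matrix, k):
    """Return the rank-k factors (U_k, s_k, Vt_k) of a truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]

rng = np.random.default_rng(0)
# Synthetic 6 x 5 data matrix built from 2 hidden factors, so its rank is 2.
data = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))

u_k, s_k, vt_k = low_rank_approx(data, k=2)
reduced = u_k * s_k              # each of the 6 rows described by 2 coordinates
reconstructed = reduced @ vt_k   # back to the original 5-dimensional space
```

The factors hold 6×2 + 2 + 2×5 = 24 numbers instead of the 30 in the original matrix, and the saving grows quickly with matrix size. On real data the reconstruction is approximate rather than exact.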
2.4. Advantages of matrix factorization
Dimensionality Reduction: Matrix Factorization decreases the number of dimensions in the data by
expressing the original matrix as the result of multiplying two matrices with fewer dimensions. This
simplification is beneficial in handling large datasets, which makes computations more efficient and
less memory-intensive. A low-rank approximation can enhance the feasibility and efficiency of
filtering and statistical analysis by reducing computational complexity [11].
Data Imputation: One of the notable benefits of matrix factorization (MF) is its capacity to effectively
manage missing data. By approximating the original matrix, MF
can make predictions and fill in the missing elements. This capability is of great importance in
several practical applications, including collaborative filtering in recommendation systems. For
instance, missing entries in data tables are frequently estimated by employing low-rank
approximations [11].
Noise Reduction: Matrix Factorization is effective in denoising data. By focusing on the most
significant latent factors, MF can filter out the noise and retain the essential information. This
attribute is particularly useful in improving the quality of the data before applying more complex
algorithms. These strategies are essential for numerous algorithms in recommender systems and can
enhance causal inference from survey data [11].
Scalability: MF techniques, especially those based on stochastic gradient descent, are highly scalable
and can handle large-scale datasets efficiently. This scalability makes MF suitable for modern
applications dealing with big data, such as Netflix’s recommendation engine. Chen et al. report
that the new ENMF approaches consistently and considerably outperform the current leading
methods on the Top-K customized recommendation task, while also retaining the advantageous
characteristic of not requiring compositional parameters [3].
Interpretability: The latent factors obtained from MF often have meaningful interpretations. For
example, in a user-item rating matrix, the latent factors might represent user preferences and item
characteristics [11]. Interpretability can offer useful insights into the fundamental structure of the
data.
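As a minimal sketch of the ideas in this section, the following Python code factorizes a small rating matrix into a user-feature matrix and an item-feature matrix using stochastic gradient descent on the observed entries only, and then uses the product of the two factors to fill in the missing entries. The toy ratings, the rank, the learning rate, and the regularization weight are illustrative assumptions and are not taken from the cited works.

```python
import numpy as np

def factorize(R, mask, rank=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Fit R ~= P @ Q.T on the observed entries only, using plain SGD."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, rank))  # user-feature matrix
    Q = 0.1 * rng.standard_normal((n_items, rank))  # item-feature matrix
    observed = np.argwhere(mask)                    # (user, item) index pairs
    for _ in range(epochs):
        rng.shuffle(observed)                       # visit ratings in random order
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]
            p_u = P[u].copy()                       # keep old value for the Q update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q

# Toy rating matrix (made-up values); 0 marks a missing rating.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0

P, Q = factorize(R, mask)
R_hat = P @ Q.T  # dense prediction, including the previously missing entries
train_rmse = np.sqrt(np.mean((R_hat[mask] - R[mask]) ** 2))
print(np.round(R_hat, 2))
print("RMSE on observed entries:", round(float(train_rmse), 3))
```

The two factors together store (5 + 4) x 2 numbers instead of 5 x 4, which is the compression and dimensionality-reduction effect described above; on realistic data the saving is far larger.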
3. Grade anticipation experiment
3.1. Introduction of the experiment
Agoritsa Polyzou and George Karypis, of the University of Minnesota, focus on
predicting students’ future grades from their performance in earlier terms [1]. Their
approach relies on utilizing sparse linear models and low-rank matrix factorizations, specifically
customized for each course or student-course combination, to improve the accuracy of predictions.
Several models were employed, including Course-Specific Regression (CSR), Matrix Factorization
(MF), and Student-Specific Regression (SSR).
3.2. Experimental results for MF
CSMF showed improved accuracy over standard MF models when using denser, course-specific data.
However, sparse linear regression models like CSR-RC still outperformed MF-based methods in this
context. The authors state that the CSR-RC scheme outperformed other methods with an RMSE of 0.632
compared to the best competing method's RMSE of 0.661 across various courses [1]. This demonstrates
the efficacy of sparse linear regression in dealing with student-course
historical data whose entries are not missing at random. By focusing on course-specific regression, particularly with
GPA-centered data, CSR-RC leverages the specific contribution of prior courses to the target course,
providing more accurate predictions. This finding underscores the robustness and reliability of CSR-RC
for grade prediction tasks.
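To illustrate the idea of a course-specific regression, the sketch below predicts a target-course grade as a weighted combination of grades in prior courses and reports the RMSE. It uses ordinary least squares on synthetic, noise-free data as a stand-in; the sparsity constraints and GPA-centering of the actual CSR-RC scheme in [1] are not reproduced, and the weights and grade ranges are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 students, grades in 3 hypothetical prior courses.
n_students, n_prior = 50, 3
prior_grades = rng.uniform(2.0, 4.0, size=(n_students, n_prior))

# Made-up contribution of each prior course to the target course.
true_w = np.array([0.5, 0.3, 0.2])
target_grades = prior_grades @ true_w

# Fit one weight per prior course for this specific target course.
w, *_ = np.linalg.lstsq(prior_grades, target_grades, rcond=None)

# Root mean square error between predicted and actual grades.
pred = prior_grades @ w
rmse = float(np.sqrt(np.mean((pred - target_grades) ** 2)))
print("weights:", np.round(w, 3), "RMSE:", rmse)
```

Each fitted weight can be read as the contribution of one prior course to the target course, which is the property the text attributes to course-specific models.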
3.3. Key processes in GCNs
GCNs are a specialized type of neural network designed to process data represented as graphs.
Graph-structured data is prevalent in diverse disciplines, including social networks, biological networks, and
recommendation systems. There are several key steps in the GCN process.
During the initialization phase, every node in the graph is associated with a feature vector.
These feature vectors might represent various features or characteristics of the nodes, depending on the
application [4]. The convolution operation, which is the fundamental component of a GCN, is derived
from CNNs. Its objective is to collect and combine information from neighboring nodes [4]. There
are also other steps, such as stacking layers and producing a task-specific output.
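The steps above can be sketched in a few lines of NumPy. The code assumes the symmetric-normalized propagation rule that is commonly used for GCNs (self-loops added, then degree normalization, then a ReLU); the source does not state which variant the cited works use, and the graph, feature sizes, and random weights here are purely illustrative.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: aggregate neighbor features, then transform."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops so a node keeps its own features
    deg = A_hat.sum(axis=1)                   # node degrees (including the self-loop)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)    # linear transform followed by ReLU

# Initialization: a 3-node path graph 0-1-2 with one-hot node features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H0 = np.eye(3)

# Stacking layers: each extra layer lets information travel one more hop.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 4))
W2 = rng.standard_normal((4, 2))
H1 = gcn_layer(A, H0, W1)
H2 = gcn_layer(A, H1, W2)  # per-node embeddings for a task-specific output head
print(H2.shape)
```

In a trained model the weight matrices would be learned from data, and the final embeddings would feed a task-specific output such as a grade classifier.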
4. Performance in grade anticipation
The authors predict students' grades by combining a Heterogeneous
Knowledge Graph (HKG) with a GCN. The data is sourced from Georgia Tech's "GTX1301: Introduction
to Python" course, which is available in both traditional classroom and online formats. The dataset
comprises clickstreams collected from the edX platform, encompassing five instances of offline courses
and two instances of online MOOC courses spanning the years 2021 and 2022. The study creates a
heterogeneous knowledge graph that includes students, course videos, formative assessments, and their
interactions. It then employs a GCN model to forecast the success rates of students on a specific set of
questions, using the content consumed by students, course instances, and the method of delivery [11].
Findings from a separate study demonstrate that the Graph-based Exercise- and Knowledge-Aware Learning
Network (Graph-EKLN) surpasses existing models in accurately forecasting student performance. The
Graph-EKLN model, in particular, outperforms other models such as MF, Item Response Theory (IRT),
and NeuralCDM in terms of accuracy and root mean square error (RMSE). The study shows that by
integrating advanced collaborative signals and knowledge concepts into the predictive model, its
performance is improved. On the ASSIST dataset, Graph-EKLN achieved an accuracy of 0.7782 and an
RMSE of 0.3938, while on the KDDcup dataset, it reached an accuracy of 0.8271 and an RMSE of
0.3591 [12]. These results indicate that the proposed model can capture the intricate
relationships among students, exercises, and knowledge concepts, resulting in more precise predictions of
student performance.
Graph-EKLN is designed to predict student achievement. The model enhances prediction accuracy by
independently assessing students' proficiency in exercises and knowledge points and incorporating GCN
approaches to capture complex relationships among students, exercises, and knowledge points. The
study was validated on two real datasets: the ASSISTments 2009-2010 dataset and the
KDDcup 2005-2006 dataset. The empirical results demonstrate that the Graph-EKLN model has
strong performance on both datasets and surpasses other benchmark models to a significant degree.
On the ASSISTments 2009-2010 dataset, the Graph-EKLN
model attains an accuracy of 0.7782, a root mean square error (RMSE) of 0.3938,
and an area under the curve (AUC) of 0.8298. These results are better than those of other
models such as the MF model, which had an accuracy of 0.7399, an RMSE of 0.4205, and an AUC of
0.8105, and the Neural Cognitive Diagnosis Model (NeuralCDM), which had
an accuracy of 0.7249, an RMSE of 0.4329, and an AUC of 0.7561 [13].
These metrics indicate that the Graph-EKLN model outperforms the other
benchmark models in accuracy, RMSE, and AUC, supporting its effectiveness in predicting
student performance by utilizing both exercise and knowledge-point information and applying GCN
techniques [11].
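For readers unfamiliar with the three metrics quoted above, the following sketch shows one way to compute them, assuming the model outputs a probability that a student answers an exercise correctly and that the label is 1 for a correct answer and 0 otherwise. The five predictions are made-up values, and the 0.5 decision threshold is an assumption.

```python
import numpy as np

def accuracy(y_true, p, threshold=0.5):
    """Fraction of responses whose thresholded prediction matches the label."""
    return float(np.mean((p >= threshold) == (y_true == 1)))

def rmse(y_true, p):
    """Root mean square error between predicted probability and 0/1 label."""
    return float(np.sqrt(np.mean((p - y_true) ** 2)))

def auc(y_true, p):
    """Probability that a random positive is scored above a random negative."""
    pos = p[y_true == 1]
    neg = p[y_true == 0]
    diff = pos[:, None] - neg[None, :]               # all positive/negative pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()  # ties count as half
    return float(wins / (len(pos) * len(neg)))

# Made-up labels and predicted probabilities for five responses.
y_true = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.2, 0.6, 0.4, 0.5])

print(accuracy(y_true, p), rmse(y_true, p), auc(y_true, p))
```

Higher accuracy and AUC and lower RMSE are better, which is why the comparison above reads all three in the same direction for Graph-EKLN.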
In summary, all these studies show that by utilizing GCNs and higher-order collaborative information,
it is possible to effectively predict students' academic performance and identify at-risk students. This
provides strong support for personalized instruction and promotes the development of intelligent
tutoring systems.
5. Conclusion
This study examines the performance of two student performance prediction models: MF and GCNs.
Prediction accuracy, interpretability, and computational efficiency are compared,
exploring the benefits and drawbacks of the two models and analyzing their suitability in various
application contexts. In addition, it offers insights into future research prospects.
During these studies, researchers compare MF and GCNs in predicting students' grades across
various models, exploring their performance in prediction accuracy, interpretability, and computational
efficiency. By examining these critical factors, the studies aim to highlight the strengths and weaknesses
of each model. MF, known for its simplicity and effectiveness in handling large datasets, is evaluated
for its efficiency in producing accurate grade predictions, while GCNs, which capture complex
relationships and dependencies in data, are scrutinized for their ability to provide deeper insights and
more nuanced predictions. The analysis identifies scenarios where each model excels or falls short, such
as MF being more suitable for large-scale applications where computational efficiency is paramount,
and GCNs being more beneficial in settings requiring high interpretability and the modeling of intricate
student interactions. The paper concludes by summarizing the usefulness of Matrix Factorization (MF)
and Graph Convolutional Networks (GCNs) in various educational settings. It offers practical
suggestions for their implementation and presents a forward-thinking outlook on future research areas.
These include the exploration of hybrid models, incorporating a wider range of data sources, and
developing more advanced algorithms to improve interpretability and efficiency. These efforts aim to
advance the fields of educational data mining and personalized learning.
MF performs well in handling large-scale sparse datasets and providing meaningful interpretations.
MF simplifies the computation of large datasets and improves computational efficiency through
dimensionality reduction methods. In addition, MF can handle missing data and has a significant
advantage in data denoising. MF techniques are particularly suitable for modern big data applications,
such as Netflix's recommendation engine, and their scalability allows them to excel in handling large-
scale data.
Although both models have their advantages and disadvantages, their performance in different
scenarios proves their effectiveness in student achievement prediction. MF is suitable for scenarios that
need to handle large-scale data and provide interpretable results, while GCNs are suitable for
applications that deal with complex dependencies and require the integration of data from multiple
sources.
Future research can explore the following areas: model fusion, data diversity,
and interdisciplinary applications.
In conclusion, student performance prediction models hold immense potential for transforming the
educational landscape. The applications of these models are vast, ranging from identifying at-risk
students early to tailoring educational content to individual learning needs. By continuously refining the model
architecture, integrating data from diverse sources, and developing systems capable of
providing real-time feedback, researchers can substantially enhance prediction accuracy. This, in
turn, will facilitate the advancement of personalized education, and ensure that each student receives
support tailored to their unique learning trajectory.
Future research should not only delve deeper into the integration of various predictive models, but
also investigate the diversification of data inputs and the enhancement of real-time prediction
capabilities. Doing so can equip educators with more robust, data-driven tools that
empower them to make informed decisions and foster an environment where every student can thrive.
Furthermore, as researchers advance in this domain, it is essential to acknowledge the
ethical implications of data privacy and the fair use of these prediction technologies to provide equitable
benefits for all students, free of any bias.
References
[1] Polyzou, A., & Karypis, G. (2016). Grade prediction with course and student specific models. In
J. Bailey, L. Khan, T. Washio, G. Dobbie, J. Huang, & R. Wang (Eds.), Advances in
knowledge discovery and data mining. PAKDD 2016. Lecture Notes in Computer Science
(Vol. 9651). Springer, Cham.
[2] Iqbal, Z., Qureshi, S., & Khan, A. (2017). Machine learning based student grade prediction: A
case study. arXiv.
[3] Udell, M., & Townsend, A. (2019). Why are big data matrices approximately low rank? arXiv.
[4] Trask, T., Johnson, M., & Lee, H. (2024). A comparative analysis of student performance
predictions in online courses using heterogeneous knowledge graphs. arXiv.
[5] Chen, C., Zhang, M., Xiang, Y., Liu, Y., & Ma, S. (2020). Efficient neural matrix factorization
without sampling for recommendation. ACM Transactions on Information Systems, 38(2), 1-28.
[6] Smith, R., Johnson, T., & Lee, K. (2021). Predicting student performance using additive latent
effect models. Educational Data Mining Review, 19(1), 4055.
[7] Brown, M., & Davis, E. (2022). Personalized learning through advanced predictive models.
Journal of Educational Technology, 32(2), 7.
[8] Ren, Z., Xu, Y., Chen, L., Zhao, P., & Wang, Z. (2018). ALE: Additive latent effect models for
grade prediction. arXiv.
[9] Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender
systems. Computer, 42(8), 30-37.
[10] Takács, G., Pilászy, I., Németh, B., & Tikk, D. (2008). Investigation of various matrix
factorization methods for large recommender systems. In Proceedings of the 2008 ACM
Conference on Recommender Systems (pp. 155-162).
[11] Khemani, B., Agarwal, S., Chakraborty, T., & Gupta, A. (2024). A review of graph neural
networks: Concepts, architectures, techniques, challenges, datasets, applications, and future
directions. Journal of Big Data, 11(1), 1843.
[12] Khemani, B., Agarwal, S., Chakraborty, T., & Gupta, A. (2024). A review of graph neural
networks: Concepts, architectures, techniques, challenges, datasets, applications, and future
directions. Journal of Big Data, 11(1), 1843.
[13] Liu, M., Zhang, X., & Chen, Y. (2021). Graph-based exercise- and knowledge-aware learning
network for student performance prediction. arXiv.
Review on VSLAM based on deep learning
Xin Shao
School of Control Science and Engineering, Shandong University, Shandong, China
201518220103@mail.sdu.edu.cn
Abstract. Visual simultaneous localization and mapping technology (VSLAM) provides a
theoretical basis for the operation of unmanned equipment such as autonomous vehicles and
sweeping robots in unfamiliar environments. Although traditional VSLAM systems have
achieved great success after long-term development, it is still difficult to maintain good
performance in challenging environments. Deep learning, as a newly developed technology in
the field of vision in recent years, has shown outstanding advantages in image processing.
Combining deep learning with VSLAM is a hot topic. Deep learning can help traditional
VSLAM systems compensate for the lack of scale information in dynamic environments by improving
their performance in depth estimation, pose estimation, and loop closure
detection. It can not only reduce the scale of the network model but also improve the accuracy
of trajectory estimation. Specifically, in terms of the fusion of VSLAM method flow and deep
learning, many researchers have proposed deep learning fusion methods based on visual
odometry, loop detection and mapping. This work studies the trend and combination of VSLAM
with deep learning algorithms, hoping to provide help for the real autonomy of future mobile
robots, and finally puts forward prospects for the development of VSLAM.
Keywords: Visual simultaneous localization and mapping technology, deep learning, end-to-end.
1. Introduction
Simultaneous localization and mapping (SLAM) has been an increasingly popular field of study
in recent years. Some solutions are based on lidar and sonar, while others are based on visual
sensors, mainly cameras. The former sensors are expensive and bulky, while the latter are lightweight,
portable, and low-cost, and are widely used in industry. VSLAM uses visual sensors to perceive the
surrounding environment, build maps of complex three-dimensional spaces, and achieve autonomous
navigation. In domains like intelligent robotics, autonomous vehicles, unmanned aerial vehicles,
augmented reality (AR), and virtual reality (VR), VSLAM technology is crucial. Unmanned
vehicles in smart car factories can automatically pick and match auto parts and cooperate with the
information system of the production line to achieve fully automated production. Rescue robots and
underwater vehicles in complex working environments (such as electromagnetic interference and failure
of GPS positioning systems) can achieve long-distance autonomous cruising, tunnel detection, and deep-water
rescue tasks through VSLAM technology. In addition, the emerging technologies AR and VR
enable interaction between the virtual and the real. The three-dimensional map reconstructed by VSLAM
can accurately render virtual images in the geometric position of the real scene, making the overall
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0187
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
91
virtual space look more real. With the development of these fields, more novel methods and technologies
will emerge in VSLAM, and VSLAM technology has become a field worthy of active research [1].
Visual odometry (VO) and loop closure principles serve as the foundation for VSLAM, which
adheres to the front-end, back-end, loop detection, and mapping architecture of classic SLAM
algorithms. By analyzing the variations between video frames, the front-end determines the
camera pose and the structure of the surrounding environment, which is generally achieved by feature-based
methods and direct methods. Because the inter-frame estimate
takes into account only two consecutive frames, there is inevitably a margin of error in the motion between
each pair of images. Propagating the error estimated between successive frames leads to
error buildup and trajectory drift. Therefore, in order to reduce the accumulated errors, it is necessary
to implement back-end optimization and loop detection. A map is subsequently generated according to the
front-end processing method and the task requirements [2].
Convolutional neural networks are extensively utilised in image recognition to extract image
information, making deep learning algorithms more prevalent in this sector. The feature extraction
technique is highly efficient and robust. An input layer, one or more hidden layers, and an output layer are the typical
components of a neural network. The training methods of neural networks are generally
divided into supervised, semi-supervised, and unsupervised. The supervised method uses data with
labeled information to train the network, the unsupervised method provides unlabeled raw data to the
network for training, and the semi-supervised method is between the two, using both labeled and
unlabeled data to train the network [3].
With the rapid development of deep learning and some urgent unsolved problems in VSLAM, fusing
deep learning with VSLAM has become an active challenge for researchers. Much of the literature only
describes the methods from the perspective of combining deep learning with individual VSLAM modules. For
example, Liu Ruijun et al. introduced the combination of deep learning and VSLAM from the
perspectives of odometry and loop closure detection and compared it with traditional methods, but did
not outline the combination of deep learning and VSLAM from a holistic perspective [4]. This paper
summarizes the latest VSLAM methods based on deep learning in recent years by outlining three
methods of integrating deep learning models into traditional VSLAM systems: auxiliary modules based
on deep learning, replacement modules based on deep learning, and using end-to-end neural networks
to replace the overall VSLAM architecture. It can help relevant personnel better understand the current
research progress and future development direction of VSLAM based on deep learning.
2. Overview of VSLAM technology
2.1. VSLAM principles
VSLAM technology comprises four essential components: visual odometry, optimization, loop closure
detection, and mapping. Front-end visual odometry entails extracting distinctive features from
sequences of images and matching them across frames to determine the incremental movement of the
camera, resulting in real-time positioning. However, it is susceptible to the gradual
accumulation of errors over time, leading to drift. Back-end optimization minimizes the discrepancy
between the predicted and observed feature positions within a specific time window to reduce
accumulated drift. This process is known as pose optimization. Loop closure detection identifies
previously visited areas upon repeat visits. It then imposes constraints between the current and earlier
poses to correct the accumulated drift. The mapping module integrates visual input and optimized poses to
progressively construct a map of the unknown area [5]. Figure 1 depicts the standard sequence of tasks
and the interrelationship between these components.
Figure 1. Basic principles of VSLAM.
2.2. Front-end visual odometry
There are two primary techniques for visual odometry: feature-based and direct. The feature-based
approach detects and matches feature points across consecutive frames of an image sequence in order to
establish correspondences between image features and calculate the relative motion between the camera and the
surroundings. The feature-based approach is commonly used in visual odometry. The recent ORB-SLAM3
algorithm can be implemented by using information captured by monocular cameras, stereo
cameras, RGB-D cameras, and inertial measurement unit (IMU) sensors. Compared with other
algorithms, it has higher robustness, accuracy, and versatility [6]. However, the feature-based method performs
poorly in the absence of obvious texture and when the pixel difference is small. The direct method
calculates the relative motion by comparing the photometric difference between the previous and next
frames of the image sequence. It can work in areas with unclear textures, but it does not involve the global features
of the image, resulting in poor loop closure detection. In general, the challenges faced by visual odometry
include lighting changes, motion blur, occlusion, and dynamic objects in the surrounding environment.
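As a simplified illustration of how relative motion can be recovered from matched features, the sketch below assumes that the matched points are already available as 3D coordinates in both frames (as an RGB-D camera could provide) and estimates the rotation and translation between them with the SVD-based Kabsch method. This is a swapped-in textbook technique, not the procedure of ORB-SLAM3 or any other cited system; monocular pipelines rely on epipolar geometry instead, and real pipelines add outlier rejection.

```python
import numpy as np

def estimate_rigid_motion(src, dst):
    """Least-squares R, t such that dst ~= src @ R.T + t (Kabsch method)."""
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t

# Synthetic matched points: frame 2 is frame 1 rotated about z and shifted.
rng = np.random.default_rng(0)
points_1 = rng.uniform(-1.0, 1.0, size=(20, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.05])
points_2 = points_1 @ R_true.T + t_true

R_est, t_est = estimate_rigid_motion(points_1, points_2)
print(np.round(R_est, 3), np.round(t_est, 3))
```

Chaining such frame-to-frame estimates gives the camera trajectory, and any small error in each estimate accumulates along the chain, which is the drift that the back-end and loop detection are meant to correct.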
2.3. Back-end optimization
The back-end mainly uses filtering methods and nonlinear optimization methods to process and optimize
the noisy data obtained from the front-end to obtain more accurate motion trajectories and spatial point
positions. Filtering techniques, such as the extended Kalman filter, continuously update the estimated
state at the present time by combining the motion model, the state at the previous time
step, and the current observation. Because the memory occupied by the algorithm grows as the square of the state dimension, it performs
well in small spaces, but its application in large scenes is limited. The optimization method is based on
the idea of graph optimization and uses all states to estimate the current situation. Although filtering
methods are computationally efficient, optimization (smoothing) methods improve accuracy at the expense of higher
computational costs.
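The filtering idea can be sketched with a linear Kalman filter, shown here for a hypothetical one-dimensional constant-velocity model with made-up noise values. An extended Kalman filter follows the same predict and update pattern but replaces the matrices F and H with Jacobians of nonlinear motion and observation models.

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Propagate the state and covariance through the motion model."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z, H, R):
    """Correct the prediction using the current observation z."""
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# State = [position, velocity]; only the position is observed.
x = np.array([0.0, 1.0])
P = np.eye(2)
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])             # constant-velocity motion, time step 1
Q = 0.01 * np.eye(2)                   # assumed process noise
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])                  # assumed measurement noise
z = np.array([1.2])                    # made-up position measurement

x_pred, P_pred = kf_predict(x, P, F, Q)
x_new, P_new = kf_update(x_pred, P_pred, z, H, R)
print(x_new, P_new)
```

The covariance matrix P has one row and one column per state variable, so its storage grows with the square of the state dimension; this is the memory cost mentioned above that limits filtering in large scenes with many landmarks.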
2.4. Loop detection
During the movement of the VSLAM system, there will be cumulative errors between the estimated
pose and the environmental position. The loop detection module can identify scenes that appear
repeatedly during the movement and use this recognition result to correct the map to ensure the global
consistency of the map. The loop detection algorithm can effectively eliminate the cumulative error.
The primary approach for loop identification is to use the bag-of-words model to extract local
features from the image and construct a vocabulary consisting of k words. Each image of the scene can be
represented as a k-dimensional vector based on the vocabulary. The similarity between vectors can then be used to
ascertain whether distinct images depict the same scene.
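A minimal sketch of this bag-of-words comparison is given below. It assumes a vocabulary of k = 8 random "words" in a 16-dimensional descriptor space and uses cosine similarity between normalized word histograms; the vocabulary, the descriptors, and the choice of similarity measure are illustrative assumptions, whereas practical systems learn the vocabulary from real feature descriptors.

```python
import numpy as np

def bow_vector(descriptors, vocabulary):
    """Map local descriptors to a normalized k-dimensional word histogram."""
    # Distance from every descriptor to every vocabulary word.
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)       # nearest word for each descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / np.linalg.norm(hist)

def similarity(a, b):
    """Cosine similarity between two normalized histograms."""
    return float(a @ b)

rng = np.random.default_rng(0)
vocabulary = rng.standard_normal((8, 16))  # k = 8 hypothetical visual words

def noisy(word_ids):
    """Synthetic descriptors: vocabulary words plus a little noise."""
    return vocabulary[word_ids] + 0.01 * rng.standard_normal((len(word_ids), 16))

scene = bow_vector(noisy([0, 1, 1, 2]), vocabulary)
revisit = bow_vector(noisy([0, 1, 2, 2]), vocabulary)  # same place, new view
elsewhere = bow_vector(noisy([5, 6, 7, 7]), vocabulary)

sim_revisit = similarity(scene, revisit)
sim_elsewhere = similarity(scene, elsewhere)
print(sim_revisit, sim_elsewhere)
```

A loop candidate would be declared when the similarity exceeds a chosen threshold, after which the detected loop is used to correct the map.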
[Figure 1 diagram labels: Camera, Sensors, Data association, Initial estimation of body motion, Initial estimation of environment information, Loop closing, Nonlinear optimization, Body motion, Mapping; grouped into Front end and Back end.]
2.5. Mapping
According to different front-end processing methods and different task requirements, it is necessary to
construct maps with corresponding forms and complexity, which can not only accurately describe
environmental features, but also reduce the complexity of the map while ensuring accuracy [1]. Based
on varying dimensions, map representation can be categorized as two-dimensional or three-dimensional.
Two-dimensional maps can be categorised into three types: geometric maps, grid maps, and topological
maps. Geometric maps utilise a limited number of landmarks, such as points, line segments, and curves,
to represent the characteristics of the scene environment. The grid map divides the environment into
many equal-sized cells and assigns each cell a probability value indicating the presence of an object.
Each cell can be classified into one of three states: occupied, free, or unknown. These states
are used to differentiate between areas that can be traversed and areas that are obstructed. The
topological map uses the connection lines between nodes to form a topological structure diagram to
represent the scene, where the nodes are locations in the actual environment, and the connection lines
between nodes represent the relationship between different locations.
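The grid-map representation described above can be sketched as follows. Each cell holds an occupancy probability, and two thresholds split the cells into the three states; the threshold values and the example probabilities are assumptions chosen only for illustration.

```python
import numpy as np

UNKNOWN, FREE, OCCUPIED = -1, 0, 1

def classify_cells(prob, free_below=0.35, occupied_above=0.65):
    """Turn per-cell occupancy probabilities into three discrete states."""
    states = np.full(prob.shape, UNKNOWN, dtype=int)
    states[prob < free_below] = FREE          # confidently empty, traversable
    states[prob > occupied_above] = OCCUPIED  # confidently blocked
    return states

# Made-up 3x3 grid; 0.5 means no information has been gathered yet.
prob = np.array([[0.1, 0.5, 0.9],
                 [0.2, 0.5, 0.8],
                 [0.1, 0.3, 0.5]])
states = classify_cells(prob)
traversable = states == FREE
print(states)
```

A path planner would then restrict its search to the cells marked free and treat unknown cells according to how cautious the application needs to be.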
Among three-dimensional maps, point cloud maps are the most widely used maps. Although point
cloud maps retain detailed information about the original environment, point cloud maps are generally
large in scale, and many details that are not required for many tasks take up a lot of space. An octree
map, commonly referred to as a three-dimensional grid map, can be created using the octree structure.
Compared with a two-dimensional grid map, an octree map is more effective in describing the
environment, has less ambiguity, and saves a lot of space compared to a point cloud map. However, the
corresponding computational complexity is large, so real-time path search and planning are difficult. In
addition, depending on the specific task requirements and the front-end processing methods, other
types of maps include feature maps, Euclidean signed distance field (ESDF) maps, truncated signed
distance field (TSDF) maps, semantic maps, etc.
3. VSLAM algorithm based on deep learning
Since 2010, deep learning and reinforcement learning have been actively combined with VSLAM. There
are three prevalent approaches to combination: adding auxiliary modules based on deep learning,
replacing individual modules with deep learning modules, and substituting the entire architecture with end-to-end
deep neural networks.
3.1. Deep learning algorithms
At present, the methods of monocular depth estimation using machine learning can be divided into two
types, namely, the method of combining traditional machine learning with image geometric features and
the method of monocular depth estimation using a convolutional neural network [7]. The former uses
depth clues in the image, such as linear perspective, focus, defocus, atmospheric scattering, shadow, etc.,
to construct parameter equations such as Markov random field and conditional random field for training
[7]. This method often does not meet the needs of actual scenes and has low prediction accuracy. An alternative is
the method based on similarity search, which looks for similar images that have appeared in a known data set.
The limitations of the data set lead to the low generalization ability of this method, which is only
applicable to specific scenes. At the same time, the retrieval time is long and cannot meet real-time
requirements. The latter type relies on deep learning. It uses convolutional neural
networks that have been trained with large amounts of data to produce dense image depth
information. Two main types of deep learning methods exist: supervised learning and unsupervised
learning. Supervised learning requires a large amount of ground-truth depth as a supervision signal for training. The
training accuracy is high, but the difficulty lies in the acquisition of real depth in the data set.
Unsupervised learning does not require real depth values to train the network. It uses stereo image
pairs or video sequences as input and realizes supervision during network training by designing a
reasonable loss function.
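One common way to obtain supervision without ground-truth depth is a photometric reconstruction loss on a rectified stereo pair: the predicted disparity is used to rebuild the left image from the right one, and the difference between the rebuilt and the real left image is penalized. The sketch below illustrates this with nearest-pixel sampling on a synthetic image pair; it is an assumption about the general form of such a loss rather than the loss of any specific cited method, and trainable networks use differentiable bilinear sampling instead.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Rebuild the left view by sampling the right image at column x - d."""
    h, w = right.shape
    cols = np.arange(w)[None, :] - np.rint(disparity).astype(int)
    cols = np.clip(cols, 0, w - 1)     # clamp samples that fall outside the image
    rows = np.arange(h)[:, None]
    return right[rows, cols]

def photometric_loss(left, right, disparity):
    """Mean absolute difference between the left image and its reconstruction."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))

# Synthetic rectified pair: every pixel has a true disparity of 3 columns.
rng = np.random.default_rng(0)
left = rng.uniform(0.0, 1.0, size=(8, 32))
right = np.roll(left, -3, axis=1)      # right[x] = left[x + 3]

loss_correct = photometric_loss(left, right, np.full(left.shape, 3.0))
loss_wrong = photometric_loss(left, right, np.zeros(left.shape))
print(loss_correct, loss_wrong)
```

The loss is smaller for the correct disparity than for a wrong one, so minimizing it pushes the network toward geometrically consistent depth; the residual at the correct disparity comes only from the clamped border columns.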
3.2. Module analysis based on deep learning
By substituting one of the four modules of standard VSLAM, namely front-end, back-end, loop
detection, and map drawing, with an independently trained neural network, the overall performance of
VSLAM can be enhanced. This is referred to as an auxiliary method that relies on deep learning.
LIFT-SLAM relies on the process of optimizing feature extraction [8]. The system utilizes the
learned invariant feature transform (LIFT) to extract features from images. The conventional VSLAM
pipeline, based on ORB-SLAM, then incorporates these features for applications involving monocular
cameras. Using learned features at the front end of the VSLAM system can provide advantages by
enabling the acquisition of denser and more accurate matches. Furthermore, the uniform distribution of
these features throughout the image results in a more consistent motion estimation. Several studies
have confirmed the resilience and high efficiency of this VSLAM algorithm. Utilizing VO sequence
images for training deep neural networks (DNN) can result in the extraction of more effective task-specific
features. Transfer learning can enhance the performance of the overall system on cross-datasets
by fine-tuning these networks using VO/VSLAM datasets. Furthermore, a method has been successfully
developed to dynamically modify the matching threshold based on the number of outliers throughout
the execution of the visual odometry (VO) pipeline. This method enables the removal of the
predetermined value of the matching threshold without the need for dataset adjustment.
TransPoseNet is an optimization technique that relies on pose recognition [9]. The suggested method
efficiently detects geometric information in low-light images, unaffected by the indistinct texture caused
by inadequate illumination. The fundamental structure involves conducting initial identification,
followed by subsequent refinement, which is accomplished via deep learning and keypoint-based
geometric alignment. The initial stage of identification entails simultaneously performing depth
completion and pose regression to mitigate the visual alterations caused by occlusion in the depth
image. During the refinement stage, the ICP alignment framework uses keypoints instead of full depth
image points to improve localization efficiency. Weakly supervised pose regression identifies keypoints
on the depth feature map. The authors proved that their method works better than common keypoint
detectors like SIFT and SURF by using the 7-Scenes dataset, which is made up of a collection of RGB-
D frames.
DRM-SLAM is an optimization technique based on map reconstruction [10]. A Convolutional
Neural Network (CNN) built on the ResNet architecture enables real-time, dense and accurate
depth estimation and scene reconstruction. A deep fusion method, based on the deep
reconstruction model, combines the sparse depth samples that ORB-SLAM generates with the depth
map that the CNN infers to reconstruct the scene densely and accurately.
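To make the idea of combining sparse SLAM depth with a dense CNN depth map concrete, the following is a deliberately simple sketch. It is not the deep fusion method of DRM-SLAM: it swaps in a median-ratio scale alignment, and the function name `fuse_depth` and the data layout (a 2D list plus a dictionary of sparse samples) are assumptions for illustration.

```python
from statistics import median

def fuse_depth(cnn_depth, sparse_samples):
    """Align a dense predicted depth map to sparse SLAM depth samples.

    cnn_depth:      2D list of predicted depths (arbitrary scale).
    sparse_samples: dict mapping (row, col) -> depth from the SLAM front end.
    Returns the fused map and the estimated scale factor.
    """
    # Ratio between trusted sparse depth and predicted depth at each sample.
    ratios = [d / cnn_depth[r][c]
              for (r, c), d in sparse_samples.items()
              if cnn_depth[r][c] > 0]
    # The median is robust to a few bad samples.
    scale = median(ratios) if ratios else 1.0
    fused = [[v * scale for v in row] for row in cnn_depth]
    # Where a sparse sample exists, keep the measured value.
    for (r, c), d in sparse_samples.items():
        fused[r][c] = d
    return fused, scale
```

The sparse samples fix the metric scale, while the network prediction fills in the pixels that the feature-based front end never observes.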
PlaceNet is an optimization technique for loop closure detection [11]. It is a multi-scale deep
autoencoder network that incorporates a semantic fusion layer to improve scene understanding.
The primary idea behind PlaceNet is to learn which areas of a dynamic environment should be
disregarded because they contain moving objects. In other words, it aims to avoid the
distractions caused by dynamic objects and to concentrate instead on significant features within
the scene. PlaceNet is trained to identify dynamic objects by learning a grayscale semantic map
that indicates the positions of both static and moving objects in an image. It produces deep
features that are semantically aware and robust to changes in scale and scene dynamics.
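Once each keyframe has a descriptor, loop closure detection reduces to a similarity search. The sketch below assumes the descriptors are already available as plain lists of floats (in PlaceNet they would come from the network) and uses cosine similarity with a hypothetical threshold; the function names are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def detect_loop(query, keyframes, threshold=0.9):
    """Return the index of the most similar stored keyframe, or None.

    A loop closure candidate is reported only if the best similarity
    reaches the (assumed) threshold.
    """
    best_idx, best_sim = None, threshold
    for idx, desc in enumerate(keyframes):
        sim = cosine(query, desc)
        if sim >= best_sim:
            best_idx, best_sim = idx, sim
    return best_idx
```

In a full system the candidate would still be verified geometrically before the pose graph is corrected.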
DeepSLAM is a recently developed visual SLAM framework based on end-to-end learning [12].
The system takes sequences of color stereo images as input and jointly learns the robot's pose
and the three-dimensional representation of the surrounding environment in a fully unsupervised
manner. Because the system requires only RGB input at test time, it can be applied in a variety
of environments, both indoor and outdoor.
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0187
95
4. Conclusion
VSLAM is a fast-advancing field that has attracted significant interest from researchers who
develop and apply deep learning models. Recent advances in deep learning have significantly
improved many stages of VSLAM processing, including data processing, pose estimation,
trajectory estimation, mapping, and loop closure. This paper organizes fundamental information
on visual SLAM and deep learning, and presents the current state of deep learning applications
in four key areas of visual SLAM: visual odometry, backend optimization, loop detection, and
mapping. Finally, a typical case of end-to-end neural networks in VSLAM is discussed. End-to-end
learning can directly optimize all VSLAM modules at the same time, providing a model that is
more resilient to noise and uncertainty, and end-to-end deep neural networks show significant
potential for improving the performance of VSLAM algorithms. These architectures are built on
self-supervised learning and reinforcement learning, which enable adaptation to real dynamic
environments. Combining traditional methods such as Kalman filters or Savitzky-Golay filters
with end-to-end deep models can further improve results. End-to-end DNN are flexible and can be
used in many fields, including surgery, drone pose estimation, control of autonomous underwater
vehicles, drone navigation, and altitude mapping. Constructing a complete end-to-end learning
framework is a complex task, since the connections between modules must be managed carefully
and kept differentiable to enable learning through backpropagation. Deep learning models also
have inherent constraints. For example, they are unable to analyze inertial data together with
color, depth, and LiDAR data. Consequently, further thorough investigation is required.
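As a small illustration of pairing a classical filter with a learned model, the sketch below smooths a sequence of scalar estimates (for example, one pose coordinate predicted per frame by a network) with a one-dimensional Kalman filter. The constant-state model and the two variance values are assumptions chosen for illustration.

```python
def kalman_smooth(measurements, process_var=1e-3, meas_var=1e-1):
    """Filter a sequence of noisy scalar estimates.

    Uses a constant-state model: the true value is assumed to change
    slowly (process_var) while each measurement is noisy (meas_var).
    """
    x, p = measurements[0], 1.0     # initial state and its variance
    out = [x]
    for z in measurements[1:]:
        p += process_var            # predict: uncertainty grows
        k = p / (p + meas_var)      # Kalman gain
        x += k * (z - x)            # update with the new measurement
        p *= (1 - k)                # uncertainty shrinks after update
        out.append(x)
    return out
```

A larger `meas_var` trusts the network output less and produces a smoother but more delayed trajectory.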
In general, deep learning models offer the possibility of processing visual data in real time and
with high efficiency, although challenges remain in integrating data from different sensor types.
References
[1] Zhang Yao, Wu Yiquan & Chen Huixian. (2023). Research progress of visual simultaneous
localization and mapping based on deep learning. Journal of instruments and meters (07),
214-241. doi: 10.19650/j.cnki.cjsi.J2311081.
[2] Favorskaya, M. N. (2023). Deep learning for visual SLAM: The state-of-the-art and future trends.
Electronics, 12(9), 2006. doi:https://doi.org/10.3390/electronics12092006
[3] Sun H. (2023). VSLAM system based on monocular depth estimation (Master's dissertation,
Hangzhou Dianzi University). doi: 10.27075/d.cnki.ghzdc.2023.000809.
[4] Liu Ruijun, Wang Shangxiang, Zhang Chen, et al. Visual SLAM based on deep learning review
[J]. Journal of system simulation, 2020, 32 (7): 1244-1256. doi:
10.16182/j.issn1004731x.joss.19-VR0466.
[5] Chen, S.; Zhou, B.; Jiang, C.; Xue, W.; Li, Q. A lidar/visual slam backend with loop closure
detection and graph optimization. Remote Sens. 2021, 13, 2720.
[6] Campos, C., Elvira, R., Gómez Rodríguez, J. J., Montiel, J. M. M., & Tardós, J. D. (2021).
ORB-SLAM3: An accurate open-source library for visual, visual-inertial and multi-map SLAM.
doi: https://doi.org/10.1109/TRO.2021.3075644
[7] Shang Guangtao, Chen Weifeng, Ji Aihong, et al. VSLAM review based on neural network [J].
Journal of nanjing information engineering university, 2024 (03): 352-363. doi:
10.13878/j.cnki.jnuist.20220420001.
[8] Li, Q.; Cao, R.; Zhu, J.; Fu, H.; Zhou, B.; Fang, X.; Jia, S.; Zhang, S.; Liu, K.; Li, Q. Learn then
match: A fast coarse-to-fine depth image-based indoor localization framework for dark
environments via deep learning and keypoint-based geometry alignment. ISPRS J.
Photogramm. Remote Sens. 2023, 195, 169-177.
[9] Bruno, H. M. S., & Colombini, E. L. (2021). LIFT-SLAM: A deep-learning feature-based
monocular visual SLAM method. doi: https://doi.org/10.1016/j.neucom.2021.05.027
[10] Ye, X.; Ji, X.; Sun, B.; Chen, S.; Wang, Z.; Li, H. DRM-SLAM: Towards dense reconstruction
of monocular SLAM with scene depth fusion. Neurocomputing 2020, 396, 76-91.
[11] Hussein Osman, Nevin Darwish, AbdElMoniem Bayoumi, PlaceNet: A multi-scale semantic-
aware model for visual loop closure detection, Engineering Applications of Artificial
Intelligence, Volume 119, 2023, 105797, ISSN 0952-1976, https://doi.org/10.1016/j.engappai.
2022.105797.
[12] R. Li, S. Wang and D. Gu, "DeepSLAM: A Robust Monocular SLAM System With Unsupervised
Deep Learning," in IEEE Transactions on Industrial Electronics, vol. 68, no. 4, pp. 3577-3587,
April 2021, doi: 10.1109/TIE.2020.2982096.
Intelligent assistive obstacle avoidance device based on SLAM
and wearable technology
Yang Zhang
School of Engineering, Zhengzhou University of Aeronautics, Zhengzhou, China
ssyyp6@nottingham.edu.cn
Abstract. This paper presents a conceptual design for an intelligent assistive device that
combines visual simultaneous localisation and mapping (SLAM) with wearable technology. The
design addresses the safety of visually impaired individuals when they are outside the home,
with the objective of providing a dependable, real-time and resilient solution that is
user-friendly and can be used in intricate and variable indoor and outdoor settings. The main
components of the proposed system are visual SLAM, intelligent wearable devices (as carriers),
and a comfortable and straightforward user feedback system that provides haptic, auditory or
visual signals to the wearer. Safety and comfort are considered throughout because visually
impaired individuals use such devices frequently and for prolonged periods. This paper considers
the latest advances in SLAM algorithms, improvements in wearable sensors and the latest
developments in robotics for assisting visually impaired people. It also discusses the potential
of these technologies in the future development of assistive devices. The aim is to provide a
feasible, comfortable and safe solution that will enhance the safety and autonomy of visually
impaired people when they are outside.
Keywords: Visual simultaneous localisation and mapping, wearable technology, obstacle
avoidance, robot, intelligent devices.
1. Introduction
For those with visual impairments, ensuring personal safety while travelling is of paramount
importance. In an unfamiliar environment, it is a significant challenge for visually impaired
individuals to stay safe and to reach their intended destination in a timely manner. The
challenges include recognising and avoiding obstacles, comprehending their spatial orientation,
and navigating securely and consistently. Conventional aids, such as canes and guide dogs, have
been shown to be effective for some individuals in certain contexts. However, they often prove
less reliable in complex, unfamiliar and dynamic environments that are influenced by a multitude
of factors. Because neither aid can provide comprehensive real-time feedback about the
surroundings, it is difficult to guarantee the safety of visually impaired individuals when
travelling or to ensure accurate real-time navigation.
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0192
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
98
The advent of vision-based SLAM technology offers a promising new avenue for assisting visually
impaired individuals in navigating unfamiliar environments and avoiding obstacles. This technology
enables devices to map these environments in real-time while also tracking and positioning themselves
on the map. This addresses the necessity for visually impaired individuals to travel securely while
facilitating obstacle avoidance and navigation. The integration of SLAM with wearable technology
presents novel avenues for the advancement of devices designed to assist visually impaired individuals
[1].
2. Traditional scenarios
2.1. Traditional aid methods for the visually impaired
It has been demonstrated that traditional aids for the visually impaired, such as canes and guide dogs,
can be beneficial in certain situations for certain groups of people. However, the reliability of these
assistive technologies is often constrained when confronted with complex, unfamiliar and dynamic
environments that are influenced by multiple factors. The primary limitation of these devices is their
inability to provide comprehensive and real-time feedback on the surrounding environment. This
presents a significant challenge in ensuring the safety of visually impaired individuals on the road and
in providing real-time and accurate navigation. Furthermore, there is a dearth of comprehensive
legislation, regulations, and associated safeguards to guarantee that visually impaired individuals and
guide dogs are not inconvenienced by other individuals or vehicles in their daily lives. These
shortcomings not only impact the safety of visually impaired individuals while travelling but also restrict
their overall quality of life and social integration [2].
2.2. The role of SLAM in the industry
Service robots, exemplified by cleaning robots, have become a ubiquitous feature of modern life.
The autonomous movement and route planning abilities of these robots serve as crucial
performance indicators. Integrating visual SLAM into cleaning robots allows them to make full
use of visual feedback, thereby obtaining higher-quality environmental information, enhancing
perception to improve intelligent decision-making, and incorporating odometry to address issues
such as missing light points and weak ambient light [3].
Ground robots, automated guided vehicles (AGVs) and aerial robots have gradually become
widespread in manufacturing centres over recent decades. The interior of a factory is a dynamic
environment characterised by a high density of facilities, workers and robots. Various
successful techniques have been proposed for visual-inertial odometry and visual SLAM.
Combining visual SLAM with these robots allows them to adapt to various environments in the
factory, improving efficiency and reducing personnel costs [4].
3. A system combining wearable devices and visual SLAM
3.1. Visual SLAM on wearable devices
A growing range of wearable assistive devices for the visually impaired is now available on the
market and is receiving increasing attention. These devices are of great practical significance:
they assist the visually impaired in recognising textual information and traffic signals and in
avoiding obstacles. Conventional wearable assistive devices rely on ultrasound, GPS, inertial
odometry and other positioning methods, which have inherent limitations and struggle to meet the
real-time, high-precision requirements of visually impaired individuals for navigation and
obstacle avoidance in unfamiliar environments.
In light of the difficulties visually impaired individuals face in recognising unfamiliar and
complex environments, it is imperative that these devices are able to determine their position,
gait and trajectory in real time. This enables them to assist visually impaired people in
travelling safely and independently. Furthermore, the construction of a real-time map of the
surrounding environment is essential for the
purpose of navigation and assisted obstacle avoidance. Initial attempts to integrate SLAM were
based on the simplest position sensors, but the large size of these devices made them
impractical to wear.
Visual SLAM employs vision sensors, including monocular, stereo (binocular) and depth cameras,
to obtain environmental data. It is notably robust and has made significant advances in
automated vehicle navigation and autonomous mobile robotics. Several SLAM schemes have reached
a level of maturity, including Oriented FAST and Rotated BRIEF Simultaneous Localization and
Mapping (ORB-SLAM), Large-Scale Direct Simultaneous Localization and Mapping (LSD-SLAM) and
Semi-Direct Visual Odometry (SVO). Furthermore, integrating visual SLAM with wearable
technology offers enhanced adaptability and notable advantages [5].
3.2. System workflow
The combination of wearable devices and visual SLAM to help visually impaired people avoid
obstacles consists of three parts: the visual SLAM system (the core), wearable devices (as a
medium and carrier) and a user-friendly feedback system, as shown in Figure 1.
Figure 1. Workflow diagram of the system combining wearable devices with visual assistance for
visually impaired people.
This obstacle avoidance system for the visually impaired works as follows. First, the visual SLAM
module uses a camera (typically a monocular or stereo camera) to capture detailed images of the
environment. These images are processed to identify key features and landmarks, which are then
used to build a real-time map. The SLAM algorithm simultaneously tracks the device's position on
this map, constantly updating the user's location [6]. Secondly, the integration of wearable
devices ensures that SLAM modules are embedded in form factors that are comfortable and
convenient for the user to wear. Such devices may include smart glasses, helmets, or other forms
of wearable technology that do not impede the user's typical activities. The design must strike a
balance between the need for sophisticated sensing and processing capabilities and the paramount
importance of comfort and ease of use. Thirdly, the obstacle detection and avoidance component
employs sophisticated algorithms to identify potential hazards in the surrounding environment.
The algorithms process data from the visual SLAM module to detect obstacles and predict their
movement. Subsequently, the system generates pertinent feedback to alert the user and assist them
in safely navigating around the obstacle.
The incorporation of user feedback mechanisms is of paramount importance for the efficacy of assistive
devices. Vibration, for instance, is a form of haptic feedback that can be employed to provide users with
immediate and direct feedback, thereby alerting them to potential hazards such as nearby obstacles.
Auditory feedback can provide more detailed information, such as the location and distance of an
obstacle. The use of visual displays, such as augmented reality overlays on smart glasses, allows for the
provision of real-time visual cues without obstructing the user's view of the surrounding
environment [7].
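The feedback step described above can be sketched as a small mapping from one detected obstacle to a haptic intensity and a spoken cue. The function name `make_feedback`, the distance limits and the 15-degree bearing bands are hypothetical values chosen for illustration; a real device would tune them with users.

```python
def make_feedback(distance_m, bearing_deg, near_m=1.0, far_m=5.0):
    """Map one obstacle to (vibration intensity in 0..1, spoken message).

    bearing_deg is the obstacle direction relative to the user's heading:
    negative values are to the left, positive values to the right.
    """
    if distance_m >= far_m:
        # Too far away to be worth an alert.
        return 0.0, None
    # Linear ramp: 1.0 at near_m or closer, falling to 0.0 at far_m.
    intensity = min(1.0, (far_m - distance_m) / (far_m - near_m))
    if bearing_deg < -15:
        side = "left"
    elif bearing_deg > 15:
        side = "right"
    else:
        side = "ahead"
    return intensity, f"Obstacle {side}, {distance_m:.1f} metres"
```

For example, `make_feedback(3.0, -30)` yields a half-strength vibration together with the message "Obstacle left, 3.0 metres".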
3.3. System advantages
The field of wearable technology has witnessed a significant advancement in recent years, characterised
by the miniaturisation, optimisation and energy efficiency of sensors and processing units. Wearable
devices, including smart glasses, wristbands and even smart clothing, are now capable of integrating
advanced computing and sensing capabilities. These devices are able to collect and process data about
the user's environment and activities in real time, which makes them an ideal medium for implementing
SLAM-based navigation aids. By integrating SLAM with wearable technology, it is possible to create
assistive devices that provide continuous and real-time feedback about the user's environment, which
enables safe, real-time navigation and localisation.
The primary advantage of the system that employs visual SLAM in conjunction with wearable
devices is its capacity to detect and avoid obstacles in real time. The system is capable not
only of recognising a multitude of obstacles in the surrounding environment but also of
prioritising those that are closer. For example, the lightweight YOLOv5 (You Only Look Once
version 5) vision model enables accurate detection of obstacles within a range of 20 metres and
allows them to be prioritised according to their proximity to the user and the potential danger
they pose. Furthermore, the system is furnished with an audio feedback mechanism that is
triggered when an obstacle is detected and alerts the user in a timely manner, helping the
visually impaired to avoid potential collisions safely. Such a system has the potential to
markedly enhance the autonomy and security of visually impaired individuals in traversing
unfamiliar and intricate environments, fostering greater confidence and peace of mind [7].
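Prioritising detections by proximity and danger can be sketched as a simple ranking step applied to the detector output. The `Detection` type, the danger weights per class and the risk formula below are assumptions for illustration only; the 20-metre range is the figure quoted above.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # object class reported by the detector
    distance_m: float   # estimated distance to the user

# Hypothetical danger weights; unknown classes default to 1.0.
DANGER = {"car": 3.0, "bicycle": 2.0, "person": 1.5}

def prioritise(detections, max_range_m=20.0):
    """Drop detections beyond max_range_m and sort the rest by risk."""
    in_range = [d for d in detections if d.distance_m <= max_range_m]

    def risk(d):
        # Higher danger weight and shorter distance both raise the risk.
        return DANGER.get(d.label, 1.0) / max(d.distance_m, 0.1)

    return sorted(in_range, key=risk, reverse=True)
```

The first element of the returned list is the obstacle the feedback system should announce first.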
The second advantage is a notable enhancement in the mobility and independence of visually
impaired individuals. The combination of visual SLAM technology with wearable device technology
enables the detection of unfamiliar, complex environments in real-time and the generation of detailed
maps of these environments in real-time. This not only provides visually impaired individuals with
accurate and immediate navigation data, but also assists them in identifying potential obstacles and
hazards in their surroundings, thereby enabling them to navigate with greater confidence in a range of
environments. In both familiar and unfamiliar settings, this technology provides navigational assistance
based on the routes planned by visually impaired individuals, avoiding obstacles and enhancing the
safety and efficiency of their actions. The implementation of this technology markedly enhances the
autonomy and dignity of visually impaired individuals, facilitating their participation in social activities
and daily life with greater independence [8].
The third advantage is that integrating SLAM technology with multiple sensors yields a more
comprehensive and accurate understanding of the unfamiliar and complex environment in which the
system operates. For example, using a camera together with an ultrasonic sensor enables the
system to detect and recognise obstacles in the user's environment more accurately. This fusion
exploits the distinctive capabilities of the different sensors, enabling the system to adapt to a
broader spectrum of environments and to perform well across a diverse range of settings. The
camera captures detailed image information about the obstacle, while the ultrasonic sensor
provides accurate distance measurements. The fusion of multiple sensors not only enhances the
reliability and accuracy of the system but also increases data redundancy, which can effectively
mitigate uncertainty and risk in the detection process. For example, even if a single sensor
malfunctions, the system can still maintain high-precision obstacle detection and environmental
sensing by using data from the other sensors. This prevents a single failure from leaving the
user without any assistance. The combination of these multiple sources of information enables
the system to provide more reliable and accurate navigation guidance, helping the visually
impaired to avoid obstacles more safely and enhancing their autonomy of movement. The benefit of
this technology is that it not only furnishes
detailed environmental data in real-time, but also enhances the stability and reliability of the system
through data fusion and redundancy mechanisms. This enables visually impaired individuals to receive
reliable assistance and navigation support in a variety of complex environments [9].
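The redundancy argument can be made concrete with a minimal fusion sketch for two distance estimates. It uses inverse-variance weighting, a standard way to combine independent measurements, and falls back to whichever sensor is still reporting. The function name and the variance values are assumptions for illustration.

```python
def fuse_distance(camera_m, ultrasonic_m,
                  camera_var=0.25, ultrasonic_var=0.01):
    """Fuse two distance readings; a reading of None means sensor failure.

    Each valid reading is weighted by the inverse of its assumed variance,
    so the more precise sensor dominates the result.
    """
    readings = [(camera_m, camera_var), (ultrasonic_m, ultrasonic_var)]
    valid = [(z, v) for z, v in readings if z is not None]
    if not valid:
        return None                      # both sensors failed
    weights = [1.0 / v for _, v in valid]
    total = sum(w * z for w, (z, _) in zip(weights, valid))
    return total / sum(weights)
```

With both sensors working, the result lies close to the ultrasonic reading; if the ultrasonic sensor drops out, the camera estimate is returned unchanged, so the user still receives a distance.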
3.4. System limitations and possible improvements
Although SLAM technology is capable of providing real-time environmental awareness, the
computation needed to process intricate environmental data can introduce delays that degrade
the real-time performance of the system. For example, in a complex or dynamic environment, the
system must process a substantial quantity of sensor data, including images, depth information
and inputs from additional sensors. Fusing and processing this information requires substantial
computational resources. If the processing speed cannot keep up with the rate at which the
environment changes, the system's response time increases, which degrades both real-time
performance and the user experience [7].
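A short back-of-the-envelope calculation shows why latency matters. The sketch below estimates how far the user walks while a frame is being processed and whether an alert would still arrive in time. The walking speed of 1.4 m/s and the reaction time of 1.0 s are illustrative assumptions, not measurements from the cited studies.

```python
def distance_travelled_m(latency_s, walking_speed_mps=1.4):
    """Distance the user covers during the given delay."""
    return latency_s * walking_speed_mps

def warning_is_timely(obstacle_distance_m, latency_s,
                      reaction_time_s=1.0, walking_speed_mps=1.4):
    """True if the user can still stop before reaching the obstacle.

    The distance used up is the ground covered during processing latency
    plus the ground covered while the user reacts to the alert.
    """
    used = distance_travelled_m(latency_s + reaction_time_s,
                                walking_speed_mps)
    return used < obstacle_distance_m
```

Under these assumptions, a processing delay of 0.5 s already consumes about 0.7 m of the available stopping distance before the user has even heard the alert.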
In particular, some systems may require longer response times to detect and prioritise
obstacles. This is because the system must not only detect all potential obstacles, but also
determine which obstacles pose the greatest threat to the user, based on factors such as their
distance, size and direction of movement, and provide feedback accordingly [2]. This complex
computation is prone to latency when running on heavily loaded processors. Meeting the real-time
requirements of these systems therefore often involves a trade-off between computational speed
and battery life: high-performance computing hardware can significantly increase power
consumption, which in turn shortens the operating time of the device and affects its portability
and usefulness [7].
Visual SLAM techniques may not perform well in certain complex or dynamically changing
environments, such as crowded spaces or scenes with dramatically changing lighting, where the
accuracy and reliability of sensor data can fluctuate. In these environments, sensors may receive large
amounts of noisy data or incomplete information, resulting in increased errors in map construction and
localisation. For example, in crowded places, fast-moving people and other dynamic objects can
interfere with the sensor's data collection, making it difficult for the visual SLAM system to accurately
locate and map the environment. Scenes with drastic changes in lighting, such as moving from bright
outdoor areas to dimly lit indoor areas, can also affect the performance of cameras and other optical
sensors, leading to inaccuracies, omissions and loss of data [6].
While the integration of SLAM technology with wearable devices offers significant advantages, there
are potential challenges associated with user adaptation. For visually impaired individuals, specialized
training may be necessary to fully leverage the capabilities of these technologies. As the systems are
complex to operate, users must invest time and receive instruction to master their use. Furthermore,
these devices necessitate periodic updating and maintenance to ensure optimal performance.
Consequently, these learning and adaptation processes may impede the extensive adoption of these
systems by visually impaired users [9].
4. System optimisation and development trends
Future work includes developing more efficient algorithms and using more advanced hardware,
such as dedicated processing units (e.g. GPUs) and low-power processors. These improvements
can increase the computational speed of the system and reduce latency, thereby improving the
performance and reliability of the system in real-time operation [2,7].
Multi-sensor fusion methods, such as the combined use of cameras, Light Detection and Ranging
(LiDAR) and ultrasonic sensors, should be used to improve the accuracy and reliability of the
data. This can compensate for the shortcomings of a single sensor in different environmental
conditions. For example, LiDAR can provide highly accurate distance measurements and work well
in low-light conditions, while ultrasonic sensors are good at detecting nearby obstacles. It is
also important to develop smarter algorithms that can dynamically adapt to more complex
environmental changes and correct sensor data accordingly. For example, using deep learning
techniques to process sensor data can significantly improve a system's performance in complex
environments, enabling it to detect
and filter noisy data more effectively and improve the reliability and accuracy of environmental sensing
and obstacle detection [7, 10].
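The paragraph above proposes learned models for filtering noisy sensor data. As a much simpler stand-in that shows the same goal, the sketch below applies a sliding median filter to a stream of range readings so that isolated spikes are rejected. This is a classical technique, not a deep learning method, and the window size is an assumption.

```python
from statistics import median

def median_filter(readings, window=3):
    """Replace each reading with the median of its neighbourhood.

    Isolated outliers (for example, a spurious echo) are suppressed,
    while slow changes in the true distance are preserved.
    """
    half = window // 2
    out = []
    for i in range(len(readings)):
        lo = max(0, i - half)
        hi = min(len(readings), i + half + 1)
        out.append(median(readings[lo:hi]))
    return out
```

Applied to `[2.0, 2.1, 9.9, 2.2, 2.3]`, the spike of 9.9 is removed and every filtered value stays close to 2 metres.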
5. Conclusion
This paper examines how a visual SLAM system can be combined with wearable devices to help
visually impaired people avoid obstacles in unfamiliar and complex environments, and to help
identify and eliminate potential hazards by combining the user's own path planning with analysis
of the data detected by the system. To contribute to the design of a travel and obstacle
avoidance navigation system for the visually impaired based on visual SLAM and wearable devices,
the paper examines the feasibility of the combination, describes the workflow of the system, and
examines its benefits in helping visually impaired people to travel and avoid obstacles. It also
presents the system's limitations in terms of algorithms and sensors in complex and changing
environments, and proposes future development trends and corresponding areas of optimisation.
References
[1] Bai J, Liu Z, Lin Y, Li Y, Lian S, Liu D. Wearable Travel Aid for Environment Perception and
Navigation of Visually Impaired People. Electronics. 2019; 8(6):697. https://doi.org/10.3390/
electronics8060697
[2] Chen Z, Liu X, Kojima M, Huang Q, Arai T. A Wearable Navigation Device for Visually
Impaired People Based on the Real-Time Semantic Visual SLAM System. Sensors. 2021;
21(4):1536. https://doi.org/10.3390/s21041536
[3] Z. Wang, H. Liao, Z. Jia and J. Wu, "Semantic Mapping Based on Visual SLAM with Object
Model Replacement Visualization for Cleaning Robot, " 2022 IEEE International Conference
on Robotics and Biomimetics (ROBIO), Jinghong, China, 2022, pp. 569-575, doi: 10.1109/
ROBIO55434.2022.10011717.
[4] Francisco J. Perez-Grau, J. Ramiro Martinez-de Dios, Julio L. Paneque, J. Joaquin Acevedo,
Arturo Torres-González, Antidio Viguria, Juan R. Astorga, Anibal Ollero, Introducing
autonomous aerial robots in industrial manufacturing, Journal of Manufacturing Systems,
Volume 60, 2021, Pages 312-324, ISSN 0278-6125, https://doi.org/10.1016/j.jmsy.2021.06.
008.
[5] Xu, P., Van Schyndel, R., & Song, A. (2023, June). Smart Head-Mount Obstacle Avoidance
Wearable for the Vision Impaired. In International Conference on Computational Science (pp.
417-432). Cham: Springer Nature Switzerland.
[6] Ou, W., Zhang, J., Peng, K., Yang, K., Jaworek, G., Müller, K., & Stiefelhagen, R. (2022, July).
Indoor navigation assistance for visually impaired people via dynamic SLAM and panoptic
segmentation with an RGB-D sensor. In International Conference on Computers Helping
People with Special Needs (pp. 160-168). Cham: Springer International Publishing.
[7] Asiedu Asante, B. K., & Imamura, H. (2023). Towards robust obstacle avoidance for the visually
impaired person using stereo cameras. Technologies, 11(6), 168.
[8] Rahman M, Khadem M, Siddiquee MM, et al. SLAM for Visually Impaired People: a Survey.
arXiv. Published online December 9, 2022. Available at: https://arxiv.org/abs/2212.04745.
Accessed July 27, 2024.
[9] Joseph AM, Kian A, Begg R. State-of-the-Art Review on Wearable Obstacle Detection Systems
Developed for Assistive Technologies and Footwear. Sensors. 2023; 23(5):2802.
https://doi.org/10.3390/s23052802
[10] Zhang Z, Lin F, Wu T. Multi-Sensor Fusion for SLAM in Dynamic Environments: A Survey.
Sensors. 2022;22(4):1356. doi:10.3390/s22041356.
The research on the factors affecting the World Happiness
index
Yizhi Zong
Cardiff Sixth Form College, Cardiff, CF24 0AA, United Kingdom
katherine.zong@ccoex.com
Abstract. The purpose of this study was to use data from the World Happiness Report to conduct
an in-depth analysis of the factors that influence the World Happiness Index (WHI) and to look
for other factors that may influence happiness. The correlation graph shows that, among the
original six variables, Social Support has the strongest correlation with the ladder score,
which suggests that Social Support has the greatest impact on the happiness index, followed by
GDP per capita; Generosity, on the other hand, is the most weakly associated with the ladder
score and has the least effect on happiness. Linear regression and scatter plots are then used
to support this conclusion. It is therefore worth considering whether Generosity should be
removed as an influencing factor. A map is then used to show happiness levels and their
geographical distribution across countries, and the distribution of the Gini coefficient is
shown on a regional distribution map. The Gini coefficient and education level also show a
certain positive correlation with the happiness index and are likely to be potential influencing
factors. From a global map perspective, the happiest countries are mostly located in Europe,
North America and Oceania, while the least happy countries are mostly located in Africa. Because
the study's sample size was not large enough, the analysis is prone to error and raises
questions about the happiness scores of some countries.
Keywords: World happiness, happiness index, positive correlation, GDP per capita, social
support.
1. Introduction
In the rapidly changing world, understanding the factors that influence happiness has become an
increasingly important and complex research topic. As for happiness itself, it is a pluralistic and
relatively subjective concept, with different interpretations from different disciplines or different
theories. Happiness presupposes an evaluative stance concerning one period of one's life or one's own
life as a whole [1]. Since the 1960s, many philosophers, thinkers and scientists have carried out relevant
research on happiness, and people's understanding of happiness has become deeper and deeper.
The World Happiness Report is a publication that contains reports and rankings on the happiness of
countries and the correlation between various factors. The first World Happiness Report was published
in 2012, and as of 2024 Finland has been named the happiest country in the world seven times in a row.
The report is based on data from the Gallup World Poll, in which people are asked to rate their quality
of life on a scale of 0-10, as well as questions related to their happiness assessment. These scores were
used to produce a happiness index for each country. Gallup also looked at a variety of quality-of-life
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0202
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
factors and analyzed how they correlated with happiness [2]. So far, the World Happiness Report has
adopted six variables, namely GDP per capita, Social support, Healthy life expectancy, Freedom to make
life choices, Generosity and Perceptions of corruption. However, the World Happiness Index is not
calculated from these six variables: the 2024 index is the average of the happiness scores for
2021-2023, which can seem counterintuitive.
According to the World Happiness Report 2024, overall subjective happiness has improved and
increased in countries around the world. However, regional differences in happiness remain large, with
large gaps between developed and developing countries [3]. Since 2006-2010, happiness has declined
in the Middle East and North Africa region and increased in Central and Eastern Europe and East Asia.
In North America and South Asia, happiness declined among young people and increased among older
people. The report also highlights that happiness inequality has increased across all regions, especially
in sub-Saharan Africa. In addition, the report discusses the challenges faced by older people, such as
dementia, as well as methods and measures to improve well-being, such as improving the environment and
behavioural strategies. Compared with previous years, this year's World Happiness Report adds an
analysis of the impact of climate change, social justice and digitalization on happiness, which means
that there are still many factors affecting happiness to consider in addition to the original six variables.
Meanwhile, according to Gallup, unemployment, one of the most widely cited statistics, is surprisingly
absent from the World Happiness Index, which is calculated not from the six known variables but by
averaging happiness over the previous three years [4]. This approach is questionable because
unexpected events such as the outbreak of COVID-19 in 2019 can lead to a sharp increase in
unemployment, a sharp reduction in income, and a decrease in people's self-confidence, all of which
sharply reduce happiness [5]. However, according to the World Happiness Report 2020, the happiest
countries, such as Finland, were even happier in 2020 (7.809) than in 2024 (7.741) [6]. The two most
affected economic powers, the United States and China, also appear happier in 2020 than in 2024. This
seems unreasonable. According to Gallup and the Alliance for Happiness, Finland is not the happiest
country in the world, and the Alliance for Happiness considers more factors than the World Happiness
Report [7, 8]. Therefore, based on the World Happiness Report 2024, this paper will analyze the degree
of influence of the original six variables on the happiness index by using data analysis methods such
as line charts, scatter charts and bar charts, and propose new influencing factors by using scatter
charts and map data visualization [9]. Finally, it will evaluate the calculation method and sample
size of the happiness index and suggest improvements.
2. Methodology
2.1. Data source
The data set on the World Happiness Index and the 6 variables used in this article comes mainly from
the official website of the World Happiness Report. The data are from 2024. The data set for the World
Happiness Report 2024 contains 144 records covering all countries surveyed, and the raw data set is
saved in .xls format. The Gini coefficient is based on a data set from the Our World in Data website,
which contains 2325 records covering countries from 1963 to 2023 (some data are incomplete); the
original data set is saved in .csv format. Data sets on education level/literacy come from Kaggle and
the Our World in Data website; the original data set downloaded from Kaggle contains 193 records and
is saved in .csv format. The mortality data set is from the World Health Organization website. It
contains 6269 records covering all countries from 1987 to 2024 (some data are incomplete), was last
updated on February 21, 2024, and is saved in .xls format [10].
2.2. Variable selection
Based on the original data, this paper will adopt the original 6 variables (GDP per capita, Social support,
Healthy life expectancy, Freedom to make life choices, Generosity and Perceptions of corruption), as
well as variables such as Income Inequality (The Gini Coefficient), Education level and Regions. The
specific descriptions of these variables are shown in Table 1:
Table 1. List of Variables.
Variable                       | Type of data | Range
GDP per capita                 | float        | [0.0, 2.141]
Social support                 | float        | [0.0, 1.617]
Healthy life expectancy        | float        | [0.0, 0.857]
Freedom to make life choices   | float        | [0.0, 0.863]
Generosity                     | float        | [0.0, 0.401]
Perceptions of Corruption      | float        | [0.0, 0.575]
Gini Coefficient               | float        | [0.178, 0.658]
Education level                | float        | [2.207, 12.938]
Region                         | float        | -
2.3. Method introduction
In this paper, scatter charts, bar charts and a linear regression model are used to compare the impact
of the original six variables on the happiness index and to find the variable with the greatest impact.
Scatter plots, linear regression and map data visualization are then used to analyze the correlation
between other potential factors and the happiness index. The general mathematical model for multiple
linear regression is:
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + e \tag{1} \]
In the above formula, $\beta_0$ is a constant term, $\beta_1, \dots, \beta_n$ are the coefficients of
the explanatory variables $x_1, \dots, x_n$, and $e$ is a residual term. In addition, this paper uses
map data to visualize the comprehensive happiness index of various regions. The general formula for
the R-square value is:
\[ R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \tag{2} \]
where $SS_{res}$ is the residual sum of squares and $SS_{tot}$ is the total sum of squares.
3. Results and discussion
3.1. Correlation analysis
According to Figure 1, Social support has the highest correlation coefficient with Ladder Score, while
Generosity has the lowest, far below the average correlation coefficient (0.59). The correlations of
Generosity with Log GDP per capita and with Healthy life expectancy are low and negative. Log GDP per
capita, Social support and Healthy life expectancy are highly correlated with one another. It can be
inferred that Generosity has a small effect on the happiness index, and whether it should be removed
from the six variables is a question worth considering.
Figure 1. Correlation results.
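As a minimal sketch of how a correlation matrix like the one in Figure 1 could be computed, the following Python code evaluates the Pearson correlation coefficient for every pair of columns. The function names and the toy values are illustrative assumptions; they are not taken from the World Happiness Report data or from the analysis code behind this paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict holding the pairwise Pearson correlations of all columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values for illustration only; NOT World Happiness Report data.
toy_columns = {
    "Ladder score": [7.1, 6.0, 5.2, 4.1, 3.5],
    "Social support": [1.5, 1.3, 1.2, 0.8, 0.6],
    "Generosity": [0.10, 0.25, 0.05, 0.20, 0.15],
}

matrix = correlation_matrix(toy_columns)
for name, row in matrix.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Each diagonal entry equals 1 (up to rounding) because every variable is perfectly correlated with itself, and the matrix is symmetric, which is why correlation plots usually show only one triangle.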
3.1.1. GDP and happiness index. According to Figure 2, the linear fitting formula for the scatter data
is: Ladder score = 2.586 + 2.135 × Log GDP per capita, and the R-square value is 0.591. There is a
strong positive linear correlation between per capita GDP and happiness score: the higher the GDP, the
higher the happiness level.
Figure 2. Scatter plot of Log GDP per capita and Ladder score.
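A linear fit and R-square value of the kind reported for Figure 2 can be obtained with ordinary least squares. The sketch below covers the single-predictor case of the regression model and computes R-square as one minus the ratio of the residual sum of squares to the total sum of squares. The function names and the toy numbers are assumptions made for illustration; they are not the report's data and do not reproduce the coefficients quoted in the text.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """R^2 = 1 - SS_res / SS_tot for the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only; NOT World Happiness Report data.
toy_x = [0.5, 1.0, 1.5, 2.0]   # stands in for Log GDP per capita
toy_y = [3.9, 4.8, 6.1, 6.9]   # stands in for Ladder score

intercept, slope = fit_line(toy_x, toy_y)
print("intercept:", intercept, "slope:", slope)
print("R-square:", r_squared(toy_x, toy_y, intercept, slope))
```

An R-square close to 1 means that the fitted line explains most of the variation in the ladder score, while a value close to 0 means that the predictor explains very little of it.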
3.1.2. Social support and happiness. As shown in Figure 3, the linear fitting formula for the scatter
data is: Ladder score = 2.260 + 2.883 × Social support, and the R-square value is 0.662. Social
support has the highest linear positive correlation with happiness score. The higher the Social
support, the higher the happiness level.
Figure 3. Scatter plot of Social support and Ladder score.
3.1.3. Generosity and happiness index. As can be seen from Figure 4, the scatter data for Generosity
are much more dispersed than those for the other factors, with an R-square value of only 0.017. This
indicates a low correlation between Generosity and Ladder Score, so replacing Generosity with other
factors is worth considering.
Figure 4. Scatter plot of Generosity and Ladder score.
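For a least-squares fit with a single predictor, the R-square value equals the square of the Pearson correlation coefficient, so the three R-square values reported above can be converted into correlation magnitudes and compared with one another. The short sketch below does this conversion; the dictionary name is an assumption, and only the magnitude (not the sign) of the correlation can be recovered from R-square.

```python
import math

# R-square values quoted in Sections 3.1.1 to 3.1.3.
reported_r_squared = {
    "Log GDP per capita": 0.591,
    "Social support": 0.662,
    "Generosity": 0.017,
}

# |r| = sqrt(R^2) holds for simple (single-predictor) linear regression.
abs_correlation = {name: math.sqrt(value) for name, value in reported_r_squared.items()}

for name, value in abs_correlation.items():
    print(f"{name}: |r| = {value:.3f}")
# Log GDP per capita: |r| = 0.769
# Social support: |r| = 0.814
# Generosity: |r| = 0.130
```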
3.2. The potential factors
3.2.1. Regions and happiness index. As can be seen from Figure 5, the countries with a high overall
happiness index are mostly located in Europe, Oceania and North America. South America and Asia are
in the middle of the pack, while Africa's overall well-being is low. It may be inferred that factors such
as geography and political system also affect happiness.
Figure 5. The World Happiness Index on the global map.
3.2.2. Gini coefficient and happiness index. As Figure 6 shows, inequality is highest in South Africa
and high in Africa as a whole. The inequality coefficient in South America is also high. The regions
with high inequality largely coincide with the regions of lower happiness in Figure 5, which suggests
that income inequality is also potentially related to the happiness index.
Figure 6. Gini coefficient on the global map in 2019 [7].
3.2.3. Education level and happiness index. According to Figure 7, the average length of schooling in
Africa is four years, while in Asia and South America it is between eight and ten years, and the
regions with the highest happiness index (Europe, Oceania, North America) have the longest schooling,
averaging between 10 and 12 years. This suggests that longer education is associated with a higher
happiness index, and that the level of education may also be one of the factors affecting the
happiness index.
Figure 7. Years of schooling on the global map [8].
3.3. Objectivity and accuracy
According to the World Happiness Report website, the typical annual sample for each country is 1,000
people. If a typical country is surveyed once a year, the sample size over the three-year averaging
period is 3,000 people. However, for populous countries such as China, a sample of 3,000 people may
not be representative enough and may carry a large error. According to the World Happiness Rankings
2024, China ranks 60th on the happiness index, a result that seems questionable for the world's second
largest economy.
Some countries' happiness figures look less than reasonable. According to research, Finland has
almost the highest rate of mental disorders in the European Union, with one in six Finns suffering from
mental health problems. Depression, anxiety and substance abuse are the most common of these
problems [9]. However, as of 2024, Finland has been named the happiest country in the world seven
times. Finland may therefore not be the happiest country in the world; rather, its high Ladder Score
may reflect the fact that Finns are more likely to report feeling satisfied [10].
4. Conclusion
Based on the World Happiness Report 2024 and related data, this study analyzes the impact of the
original six variables on the happiness index, compares and summarizes the distribution of the
happiness index in different regions, and finally evaluates the sample and questions the happiness
index of some countries.
In the analysis stage, this paper uses scatter plots and a correlation visualization to find the degree
of correlation between the 6 variables and the happiness index. By comparison, Social Support and GDP
per capita had the greatest impact on happiness, while Generosity had the weakest association with
happiness. To learn more, the study also visualized the distribution of happiness across different
regions using map
data and found that countries with high happiness were more likely to be in Europe, North America, and
Oceania, while countries with low happiness were more likely to be in Africa.
This study provides a deeper understanding of the World Happiness Report. However, it still has many
shortcomings: the variables are not analyzed in enough detail, too few factors are considered, and the
data are insufficient. To address these issues, future work should draw on more reports and data on
world happiness and use controlled-variable methods and linear regression to study possible
relationships between happiness and different factors.
References
[1] Laura M, et al. 2017 Happiness Index Methodology, Journal of Sustainable Social Change, 9, 4-31.
[2] Helliwell J F, et al. 2024. World Happiness Report 2024. University of Oxford: Wellbeing
Research Centre.
[3] Helliwell J F, et al. 2020 World Happiness Report 2020. New York: Sustainable Development
Solutions Network.
[4] Jon C and Blind S 2022 The Global Rise of Unhappiness and How Leaders Missed It, ISBN.
[5] Musikanski L and Bradbury J 2024 The Happiness Report Card 2024. Happiness Alliance happy
counts.
[6] Laura M, et al. 2017 Happiness Index Methodology. Journal of Sustainable Social Change, 9, 4-31.
[7] Joe H 2023 Measuring inequality: what is the Gini coefficient. Working paper.
[8] Filmer D P, et al. 2018 Learning-Adjusted Years of Schooling (LAYS): Defining A New Macro
Measure of Education. Journal of Sustainable Social Change.
[9] OECD, European Observatory on Health Systems and Policies 2023. Finland: Country Health
Profile 2023.
[10] Swanson A 2015 How we provoked the wrath of some of the world's most perfect people.
Washington Post, 10.
Quantum Entanglement and Qubit Interactions: The Key to
Quantum Supremacy
Han Zhang
Muir College, University of California San Diego, CA, USA
Haz043@ucsd.edu
Abstract. Quantum computing operates in a fundamentally different way from classical
computing by harnessing the principles of quantum mechanics to process information. Quantum
supremacy is achieved when a quantum computer can solve problems that are beyond the
capabilities of classical systems, including the human brain, showcasing its superior processing
power. To attain quantum supremacy, quantum entanglement and qubit interactions play a
pivotal role. Quantum entanglement occurs when qubits are interconnected in a manner where
the state of one qubit directly influences the state of others, enabling the quantum computer to
perform multiple operations simultaneously. Moreover, effective interaction between qubits is
essential for the performance of complex calculations in quantum systems, highlighting the
significance of coherence and error correction. Understanding the importance of coherence in
preventing and rectifying errors in quantum computations is crucial. This paper aims to explore
the critical aspects of quantum entanglement and qubit interactions, which are foundational to
the operation of quantum computers. By delving into these key concepts, the paper aims to
elucidate their significant roles in achieving quantum supremacy. The discussion will center on
how quantum entanglement, which allows enhanced computational parallelism through qubit
interconnection, and efficient qubit interactions vital for complex computations, contribute to
surpassing the capabilities of classical computers. Comprehending these principles is crucial for
advancing quantum computing technology and overcoming the challenges to unleash its full
potential.
Keywords: Quantum Computing, Quantum Supremacy, Quantum Entanglement.
1. Introduction
Quantum information science is dedicated to comprehending and achieving quantum supremacy, the
phenomenon where quantum computers consistently outpace classical ones. This idea, in turn, is based
on the premise that it is impossible, or at least highly inefficient for classical systems to simulate
quantum systems. In the past couple of decades, there has been a surge of interest in solving the
"quantum control problem." Efforts to develop sufficiently large, controllable, macroscopic systems that
exhibit purely quantum behaviours have intensified. But the logic behind these investigations follows
directly from their assumption that achieving quantum supremacy is a worthwhile goal. Indeed, it could
push the frontiers of physics into realms that have yet to be explored. This paper investigates the
essential features of quantum entanglement and qubit interactions, which form the basis of quantum
computing. The first topic this paper covers is quantum entanglement. When qubits are connected in such a way that
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0156
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
the state of one qubit directly affects the state of the others, they are said to be entangled.
Entanglement is a good starting point for understanding the principles of quantum mechanics [1,2].
From there, this paper goes into the principle of superposition, which, together with entanglement and
interaction, allows a quantum computer to perform operations in parallel and solve certain problems at
remarkable speed. Quantum computing systems require a delicate balance [1,2]. The interactions
between qubits must be controlled as precisely as the players in a well-coordinated orchestra. Each
qubit must "do its part" without error, and all must "stay coherent" long enough to perform a
calculation that even the fastest classical computers would find impractical. This precision is
necessary not just for meaningful calculations, but for the performance of any calculations at all.
Next, the paper will examine what qubit interactions mean for the "hardness" of calculations performed
in a quantum system. After that, it will discuss various types of qubits, some of which are more
promising than others for producing a precise computing system.
This paper will also underscore the need for clear signalling and "error-free" messaging for quantum
supremacy to be achieved. For any two qubits to maintain their "next door neighbour" relationship, they
must be coherent; that is, they must interact in a controlled fashion over a range of distances and over
an adequate number of time steps, or computational "depth." This is not an easy requirement to meet
and is arguably the most significant barrier to building large-scale quantum computers [3].
2. Fundamentals of Quantum Entanglement
The study of quantum non-locality began in 1935 with early thought experiments and theoretical
developments. A significant milestone was the formulation of the Einstein-Podolsky-Rosen (EPR)
paradox. In their 1935 paper, Einstein, Podolsky, and Rosen used entanglement to question the
completeness of quantum mechanics. They devised a thought experiment in which the measurement of an
entangled particle's position or momentum would instantaneously determine the position or momentum
of another entangled particle, no matter how far apart the two had been separated. This led to their
argument against "spooky action at a distance" and their conclusion that quantum mechanics must be
incomplete. Since then, the EPR paradox has been a major driver of the study of entangled systems [4,5].
The local hidden variable concept was subsequently challenged from an unexpected quarter by John Bell
in the 1960s. Bell's theorem expressed in simple but incisive terms what had long been suspected: any
theory based on local hidden variables must satisfy certain inequalities, whereas quantum mechanics
predicts that entangled systems violate them, so the two cannot both describe the world correctly. In
the 1970s and '80s, experimenters following Bell's work conducted a kind of mock trial. They pitted
local hidden variables against quantum mechanics itself, a courtroom drama that invariably had the
same outcome: the jury of experimenters declared for quantum mechanics [4,6].
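One widely used form of Bell's inequality is the CHSH inequality, which the text above does not name explicitly; it is introduced here as an assumption to make the argument concrete. Any local hidden variable model must satisfy |S| <= 2 for a particular combination S of four correlations, whereas quantum mechanics predicts |S| = 2*sqrt(2) for a maximally entangled pair. The sketch below evaluates S for the two-qubit singlet state using only the standard library; the function names and the measurement angles (a standard textbook choice) are illustrative.

```python
import math


def spin_observable(theta):
    """2x2 observable cos(theta)*Z + sin(theta)*X, with eigenvalues +1 and -1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]


def kron(a, b):
    """Kronecker product of two 2x2 matrices, giving a 4x4 matrix."""
    return [[a[i // 2][j // 2] * b[i % 2][j % 2] for j in range(4)] for i in range(4)]


def expectation(state, op):
    """<state| op |state> for a real state vector and a real matrix."""
    return sum(state[i] * op[i][j] * state[j] for i in range(4) for j in range(4))


# Singlet state (|01> - |10>)/sqrt(2) in the basis order 00, 01, 10, 11.
SINGLET = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]


def correlation(angle_a, angle_b):
    """Correlation of the two measurement outcomes for the given analyser angles."""
    return expectation(SINGLET, kron(spin_observable(angle_a), spin_observable(angle_b)))


def chsh(a, a_prime, b, b_prime):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return (correlation(a, b) - correlation(a, b_prime)
            + correlation(a_prime, b) + correlation(a_prime, b_prime))


s_value = chsh(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
print(abs(s_value))  # about 2.828, i.e. 2*sqrt(2), above the classical bound of 2
```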
Entangled states are defined in such a way that measuring the position of one particle gives you the
position of the other particle with pinpoint accuracy. Measuring the momentum of one particle gives
you the momentum of the other particle with close to 100% certainty. Quantum non-locality is intimately
connected to entanglement and is one of the key principles used to illustrate the "weirdness" of quantum
behavior. Yet, at the same time, quantum non-locality is closely related to the concept of superposition,
which is probably the most fundamental principle of quantum mechanics. Indeed, a quantum system
exists in a kind of "schizophrenic" state, in which it simultaneously occupies several states or
configurations, until the system is measured. And this principle is crucial for understanding the behavior
of entangled particles [4,5].
The superposition collapses when the property of one particle is measured, defining the other
particle's property instantaneously, no matter how far apart they are. An analogy may help: particles
with correlated properties in a state of superposition behave somewhat like waves in a fountain. When
you measure one wave and find a certain height, your measurement fixes the water in a certain fountain
state, defining changes in height all across the fountain from your "this wave, not that wave"
measurement. The elegant equations of quantum mechanics can tell you not only that certain states are
superposed but also the specific "height" your wavefunction is likely to take when you make that
measurement [4,5].
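The collapse described above can be illustrated with a two-qubit Bell state, used here as a simple stand-in (an assumption, since the text speaks of position and momentum) for an entangled pair. The sketch computes the outcome probabilities from the amplitudes and then the state that remains after the first qubit has been measured. The names BELL, outcome_probabilities and collapse_first_qubit are hypothetical helpers.

```python
import math

# Amplitudes of (|00> + |11>)/sqrt(2); each key lists the first and second qubit.
BELL = {"00": 1 / math.sqrt(2), "01": 0.0, "10": 0.0, "11": 1 / math.sqrt(2)}


def outcome_probabilities(state):
    """Born rule: the probability of each joint outcome is |amplitude|^2."""
    return {bits: abs(amp) ** 2 for bits, amp in state.items()}


def collapse_first_qubit(state, outcome):
    """State after measuring the first qubit and obtaining `outcome` ('0' or '1')."""
    kept = {bits: amp for bits, amp in state.items() if bits[0] == outcome}
    norm = math.sqrt(sum(abs(amp) ** 2 for amp in kept.values()))
    return {bits: (kept[bits] / norm if bits in kept else 0.0) for bits in state}


print(outcome_probabilities(BELL))
after = collapse_first_qubit(BELL, "1")
print(outcome_probabilities(after))
```

Before the measurement the outcomes 00 and 11 are equally likely and 01 and 10 never occur; once the first qubit is found in state 1, the only remaining possibility is 11, so the second qubit's result is fixed as well.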
An electron might have two distinct velocities or exist in two separate locations at the same time.
This is what quantum mechanics tells us, and it is what quantum correlations are all about:
correlations that are stronger than the strongest classical correlations and that can be produced only
by entangled states. In classical mechanics, the correlation between two particles can be explained by
saying that they have a shared history or a direct interaction. But in quantum mechanics, especially
in a world of entangled states, it is no longer valid to persist with the idea that superposition
states are either "real" or "not real." Correlations between particles in an entangled state cannot be
understood from our classical intuition of the physical world. The particles do not possess definite
states until the state is measured. When the state of one of the entangled particles is measured, the
state of the other particle, regardless of the distance between the two, is instantaneously
determined. Experiments have shown us that "spooky action at a distance" is a real phenomenon. How
this can happen is one of the great unsolved mysteries of physics. For practical applications,
understanding this phenomenon is crucial to the development of new technologies based on quantum
mechanics.
Matthew Hayward discusses in his paper "Quantum Computing and Shor's Algorithm" the important part
that entanglement plays in quantum computing. He explains that entanglement is a crucial ingredient
not just in various quantum algorithms but also in error correction methods. These are quantum
computing's "baking a cake" moments, and they are well beyond the fundamentals of quantum physics. Yet
even in the early 1990s, Hayward notes, it was evident that quantum computers could do certain things
much better than classical computers. Thus, the potential of quantum computing became clear [6,7].
In 1994, Peter Shor, a researcher at Bell Labs, brought forth an astounding development: a
polynomial-time quantum algorithm for factoring large numbers. The key element here is
"polynomial-time." A classical computer would take what may as well be an eternity to factor the
numbers Shor worked with. Conversely, a quantum computer can use "superposition," the basic principle
of quantum mechanics that allows a particle (an electron, say) to exist in multiple states
simultaneously, to arrive at the answer more efficiently. Once a quantity can be factored, the
computer employing Shor's algorithm can use the factor or factors to reconstruct the original
problem's solution. The best known classical approach to factoring a number n (with a representation
of $\log n$ bits) operates in $O(\exp(c (\log n)^{1/3} (\log \log n)^{2/3}))$, which is
super-polynomial in the length of the input. In contrast, Shor's algorithm runs in
$O((\log n)^2 \log \log n)$ on a quantum computer and requires an additional $O(\log n)$ steps of
post-processing on a classical computer. In summary, the algorithm operates in polynomial time, which
is a significant advancement. Shor's algorithm has thus renewed interest in quantum computing,
considering that it could upend not only encryption but also a whole range of computational problems
that require a similar sort of number-crunching prowess (they are mostly about multiplying and
dividing large numbers), whereas classical computing amounts to using a finite number of bits of
memory and a finite number of steps to do that work. ... Two numbers are coprime if their greatest
common divisor is 1. A classical computer can calculate many such values only at a snail's pace
because it cannot evaluate them in parallel the way a quantum computer can, using a superposition over
a very large set of states to carry out the same series of operations in fewer steps [8].
Since $F(a) = x^a \bmod n$, with x an integer coprime to n, is periodic, it has a period r, and
$x^0 \bmod n = 1$ (because $x^0 = 1$), and thus $x^r \bmod n = 1$, $x^{2r} \bmod n = 1$, and so forth.
Given this periodicity and through algebraic manipulation:
\[ x^r \equiv 1 \pmod{n}, \tag{1} \]
\[ (x^{r/2})^2 = x^r \equiv 1 \pmod{n}. \tag{2} \]
If r is an even number:
\[ (x^{r/2} - 1)(x^{r/2} + 1) \equiv 0 \pmod{n}. \tag{3} \]
The product $(x^{r/2} - 1)(x^{r/2} + 1)$ is an integer multiple of n. So long as
$x^{r/2} \not\equiv \pm 1 \pmod{n}$, at least one of $(x^{r/2} - 1)$ or $(x^{r/2} + 1)$ shares a
nontrivial factor with n. Shor's algorithm does this cleverly, using a quantum memory register that
has two parts. It first places a superposition of integers 0 to q-1 in the left
side of the register, where q is a power of 2 such that $n^2 \le q < 2n^2$ (this makes the register
large enough for the period to be recovered from the Fourier transform later on). The 0s and 1s in the
left register correspond to the a's in the function $x^a \bmod n$. The right side of the register is
set up to hold the result of whatever function is calculated from the a's in the left side of the
register [8].
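The number-theoretic step described above, going from the period r to a factor of n, is purely classical and can be sketched in a few lines of Python. In this sketch the period is found by brute force, which stands in for the quantum part of Shor's algorithm and is only feasible for tiny numbers. The function names and the example values n = 15 and x = 7 are assumptions chosen for illustration.

```python
from math import gcd


def find_period(x, n):
    """Smallest r > 0 with x**r % n == 1 (brute force; classical stand-in)."""
    if gcd(x, n) != 1:
        raise ValueError("x must be coprime to n")
    r, value = 1, x % n
    while value != 1:
        value = (value * x) % n
        r += 1
    return r


def factors_from_period(x, r, n):
    """Use (x^(r/2) - 1)(x^(r/2) + 1) = 0 mod n to extract factors of n.

    Returns None when r is odd or x^(r/2) = -1 mod n; in those cases another
    x has to be tried.
    """
    if r % 2 != 0:
        return None
    half_power = pow(x, r // 2, n)
    if half_power == n - 1:
        return None
    return gcd(half_power - 1, n), gcd(half_power + 1, n)


n, x = 15, 7
r = find_period(x, n)
print(r)                             # 4, since 7**4 = 2401 = 160 * 15 + 1
print(factors_from_period(x, r, n))  # (3, 5)
```

For x = 14 the period is 2 and 14 is congruent to -1 modulo 15, so the function returns None; this is the excluded case in which neither bracket shares a nontrivial factor with n.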
The algorithm proceeds to calculate $x^a \bmod n$ and keep the result in the second part of the
register. The number n is represented by a string of $\log n$ bits. $x^a \bmod n$ has to be
calculated for a number of values of a that is exponentially large relative to the length of the input
but polynomial in n. After that, the second register is measured and collapses to a specific value k.
This measurement also projects the first register into a state consistent with that value. After
measurement, the second register holds k, and the first one holds a superposition of the base states a
for which $x^a \bmod n = k$ [8].
Thanks to the periodic nature of $x^a \bmod n$, the first part of the quantum register holds
probability amplitudes for the numbers c, c+r, c+2r, and so on, where c is the smallest integer such
that $x^c \bmod n = k$. The next step is to apply the Fourier transform to the first part of the
register. The Fourier transform amplifies the probability amplitudes for integer multiples of q/r,
where q is the size of the first part of the register. When the first part of the register is measured
after the Fourier transform, it will likely yield a value close to a multiple of q/r. A classical
computer then post-processes this measured value to obtain the period r, and from that, the factors of
n [8].
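The effect of the Fourier transform on the first register can be checked numerically for a tiny case. The sketch below simulates the register classically, which is an illustration only and not how a quantum computer stores the state. It uses the assumed toy values n = 15, x = 7 and q = 256, builds the state that remains after the second register has been measured as k, applies a discrete Fourier transform, and lists the values with non-negligible probability. The final helper uses a best rational approximation to measured/q as a period candidate; the function names are hypothetical.

```python
import cmath
import math
from fractions import Fraction


def first_register_state(x, n, q, k):
    """Amplitudes of the first register after the second register is measured as k."""
    support = [a for a in range(q) if pow(x, a, n) == k]
    amplitude = 1 / math.sqrt(len(support))
    state = [0.0] * q
    for a in support:
        state[a] = amplitude
    return state


def fourier_probabilities(state):
    """Probability of measuring each value c after a discrete Fourier transform."""
    q = len(state)
    nonzero = [(a, amp) for a, amp in enumerate(state) if amp != 0.0]
    probabilities = []
    for c in range(q):
        total = sum(amp * cmath.exp(2j * cmath.pi * a * c / q) for a, amp in nonzero)
        probabilities.append(abs(total) ** 2 / q)
    return probabilities


def period_candidate(measured, q, n):
    """Denominator of the best fraction (denominator < n) approximating measured/q."""
    return Fraction(measured, q).limit_denominator(n - 1).denominator


x, n, q = 7, 15, 256  # toy values; 15**2 <= 256 < 2 * 15**2
state = first_register_state(x, n, q, k=7)
probabilities = fourier_probabilities(state)
peaks = [c for c, p in enumerate(probabilities) if p > 1e-6]
print(peaks)                                        # [0, 64, 128, 192]
print([period_candidate(c, q, n) for c in peaks])   # [1, 4, 2, 4]
```

Here the period r = 4 divides q exactly, so the peaks fall precisely on the multiples of q/r = 64, each with probability 1/4. The candidates 1 and 2 are divisors of r rather than r itself, which is why the measurement may need to be repeated before the correct period is obtained.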
3. Qubit Interactions and Their Importance
In quantum computation, qubit interaction is fundamental to information processing. This is best
understood by comparing qubit interaction to the two basic ways human beings can communicate. Direct
interaction is like two people talking face-to-face, influencing one another directly, in the way that
two qubits can couple directly through the electric or magnetic fields they generate. Indirect
interaction is like two people talking with a friend in between. The friend can tell either of the two
what the other has said without any direct influence from one talker on the other. In indirect
interaction, two qubits influence one another without direct physical contact. These are the two basic
ways qubits can interact in what are called "quantum circuits" [9-11].
Superconducting qubits have long been viewed as a leading platform for large-scale quantum
computing due to their favorable coherence properties, ease of fabrication, and potential for integration
within a highly scalable architecture. They are at heart Josephson junctions, which are non-linear
inductors, and rely on the quantization of magnetic flux in superconducting loops. Despite these
advantages, the primary challenge to building a fault-tolerant quantum computer with them is that
their operational speed, which physicists refer to as the reset time, has not kept pace with the ability
to implement error correction, a strictly required feature of any large-scale quantum processor.
Individual ions that are confined and manipulated by electromagnetic fields can also serve as qubits,
the basic units of information in a quantum computer. In terms of operations and coherence (that is, the
ability of the system to maintain a superposition of quantum states), trapped ions come very close to
perfection, and Häffner's group is working to make them the most robust, error-correctable system of
qubits. This promise has led to a sharp increase in the number of research groups working with trapped
ions. According to the review by Häffner et al., "Quantum Computing with Trapped Ions," about six
groups were doing this work in 2000, and over 25 by 2008 [9-11].
In 1995, Cirac and Zoller suggested using groups of ions as the basis of a quantum computer. They
introduced what one might call the "user manual" for establishing a quantum logic gate using an
elementary operation similar to one performed in an ordinary logic gate: a conditional phase shift. This
is like saying "if…then…" in a digital operation. Their ideas were notional, involving the motion of ions
in potential energy wells created by carefully managed electromagnetic fields. But these ideas were put
to work in what is essentially an industrial-strength laboratory, the one belonging to David Wineland and
his group at the National Institute of Standards and Technology. They were able to demonstrate a couple
of elementary two-qubit operations and entangle up to four ions. The development of ion trap
technologies has been greatly assisted by targeted research projects in Europe. These microfabricated
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0156
devices are becoming more and more sophisticated and could soon surpass the state-of-the-art
superconducting qubit technologies. Meanwhile, quantum dots (tiny semiconductor particles that
confine electrons) also hold great promise. The early 1980s saw the advent of a new form of lithography
that allowed scientists to create structures that confine electrons in very small spaces. These structures
are so small that they exhibit "quantum" behaviour. Quantum dots are tiny in size but are robust enough
in their electronic behaviour to serve as the basic building blocks of several proposed quantum
computing architectures [9-11].
Texas Instruments created the first quantum dots, 250 nm in size, using lithography. AT&T Bell Labs
and Bell Communications Research later produced even smaller dots, 30-45 nm in diameter. Because
the confined electrons in these dots behave similarly to those in atoms, scientists refer to them as
"artificial atoms." In quantum dots, scientists have precise control over shape, size, and number of
confined electrons, making these nanostructures highly valuable for studying complicated physical
phenomena and for observing quantum effects in crystals. Researchers are particularly interested in the
optical and electrical properties of quantum dots. Fundamental research and technological advances
stand to benefit greatly from the use of quantum dots. Reed and his colleagues developed the original
method for producing them. This technique involves using a structure whose essential component is a
two-dimensional electron gas. The process starts with a sample that has one or more quantum wells. A
polymer mask covers the sample's surface and is then partially exposed to an electron or ion beam;
visible light is not used because high resolution is required. The mask is removed in the exposed areas,
and a thin metal layer is then deposited over the whole surface. When the remaining mask is lifted off
together with the metal on top of it, the sample surface is clean except in the previously exposed areas,
where the metal layer remains [9-11].
Areas unprotected by the metal mask are removed using chemical etching, which undercuts the last
quantum well and the buffer layer. The pillars left behind are 10 to 100 nanometres in diameter, and
contain the fragments of a quantum well. A base of chromium-doped GaAs beneath the last quantum
well feeds in carriers. The carriers flow into the twenty GaAs quantum wells above. The etching creates
a structure in which the flow of carriers is well controlled, with a remaining gold mask acting as an
electrode. By applying a voltage between the mask and the base, one can control the number of carriers
in the structure.
Finally, photons make very effective qubits for quantum communication. This is because photons
exhibit very little interaction with their environment; in other words, they are very "quiet" in that the
states they occupy do not change much when they are subjected to various environmental conditions.
For this reason, photons are able to maintain their "quantum-ness" for a long time and travel long
distances with minimal decoherence. This characteristic makes photons and fiber optics very attractive
for constructing quantum networks, because signals carried by light (that is, by photons) can be made
very secure and very hard to eavesdrop on undetected.
In addition, photonic networks can be readily joined with current fiber-optic infrastructures, enabling
the actual deployment of quantum networks. Optical technologies that people have in hand, and know
well how to use, can efficiently create, manipulate, and detect photons, the quantum carriers of the
information. These are "light" networks in a very real sense; the quantum states of the photons are used to
encode and process the information. And the properties of the photons themselves allow us to think of
new ways to encode that information, which supports the development of protocols for much more
complex, and much more powerful, quantum networks.
To conclude, employing photons as qubits in photonic systems builds a strong and scalable basis for
quantum communication and offers a potential future for sending information that is both secure and
efficient.
4. Achieving Quantum Supremacy
The key moment when a quantum computer surpasses a classical computer in terms of raw capability
occurs when quantum phenomena like entanglement, superposition, and interference are used to create
speed, capacity, and error-protected pathways that enable solving extremely difficult problems. This is
what is termed quantum supremacy: the moment when a quantum computer can solve a computationally
hard problem in a significantly shorter time and with far fewer steps than a classical computer can. For
instance, Google's 53-qubit Sycamore processor completed a hard sampling computation in about 200
seconds. When you consider that the same task was estimated to take 10,000 years on a classical
supercomputer, you gain a sense of what might be referred to as a quantum moment. Decoherence is one
of the primary barriers to achieving quantum supremacy [12,13].
Using the two-slit experiment, scientists can demonstrate interference in the type of systems that
might potentially achieve quantum supremacy. However, in attempting to create more complex systems
that can perform computations, scientists must be wary of decoherence impeding their efforts. The
two-slit experiment also serves as a useful metaphor for considering how much progress has been made
toward true quantum computers. In the two-slit experiment, a beam of particles passes through a first
screen with two slits and is detected at a second screen. Using classical physics, one might expect a
distribution of particles on the second screen that resembles the two-slit setup itself: two bands, one
behind each slit. Quantum mechanically, an interference pattern of many fringes appears instead, and
decoherence washes this pattern out. The duration for which a quantum sensor can maintain coherence
determines its sensitivity. The nature of decoherence and its impact on the operation of quantum devices
runs parallel to the processes that ordinary sensors go through. When you build an ordinary sensor for a
specific measurement, you have to work hard to find and fix the errors that the device makes over its
working life. The same goes for quantum devices, except that the errors must be found and fixed at the
quantum level, before they can affect the classical results that are read out [12,13].
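The contrast between the classical and quantum predictions for the two-slit experiment can be written out explicitly. As a sketch, let ψ_1 and ψ_2 denote the amplitudes for reaching a point x on the second screen through slit 1 and slit 2, and let γ, with 0 ≤ γ ≤ 1, be a hypothetical coherence factor used here to model decoherence (these symbols are illustrative assumptions and do not appear in the source):

```latex
\begin{align}
P_{\text{classical}}(x) &= |\psi_1(x)|^2 + |\psi_2(x)|^2 \\
P_{\text{quantum}}(x) &= |\psi_1(x) + \psi_2(x)|^2
  = |\psi_1(x)|^2 + |\psi_2(x)|^2 + 2\,\mathrm{Re}\!\left[\psi_1^*(x)\,\psi_2(x)\right] \\
P_{\gamma}(x) &= |\psi_1(x)|^2 + |\psi_2(x)|^2 + 2\gamma\,\mathrm{Re}\!\left[\psi_1^*(x)\,\psi_2(x)\right]
\end{align}
```

With γ = 1 the full interference term survives; as γ approaches 0 the fringes vanish and the classical distribution is recovered, which is the sense in which decoherence erases the quantum behaviour a computation relies on.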
Quantum devices are currently protected from decoherence in two broad ways: by isolating the
qubits in an environment where decoherence is suppressed, and by actively controlling the qubits so
that noise is cancelled. When sensing using a qubit, you're working under an umbrella that keeps noise
from the outside world from affecting the qubit. The more such strategies are applied, the more the
noise from inside the generally noisy quantum circuit and from the outside world is cancelled out.
5. Conclusion
Achieving quantum supremacy is not just a matter of placing a number of superconducting qubits at a
frigid temperature and hoping for the best. You have to make those qubits interact in a very particular
way. Entanglement is key. Where a classical computer performs a calculation step by step, a quantum
computer can act on a superposition of many inputs at once, with entangled qubits sharing a joint state.
That's the working theory, anyway. Google's Sycamore processor reportedly achieved this with 53
superconducting qubits in 2019, in a landmark demonstration of quantum supremacy. The processor took
about 200 seconds on a problem that a classical supercomputer was estimated to need 10,000 years to
solve.
The challenges of realizing the full potential of quantum computing are the sorts of problems that
physics departments live to solve. They are hard, and they require imaginative solutions. Imagine, for
instance, trying to ensure that the basic unit of quantum computing, the qubit, remains in the fragile
quantum state needed to perform a long series of calculations. If it can hardly be done with the five or
six atoms that some groups have used to represent a single qubit, how is it going to be done with the
tens or hundreds of qubits needed for any useful computation?
There are many different strategies to create and manipulate qubits, with each offering distinct
advantages and facing its own unique challenges. For example, superconducting qubits are known for
their rapid operation speed, whereas qubits made from trapped ions afford much greater precision and
much longer coherence times. On the other hand, quantum dots have the potential for extremely high
integration densities, and qubits based on photons might be the best bet for building a quantum
communicator, given how weakly they interact with their environment.
To conclude, the quest for quantum supremacy is both impressive and intimidating. It has seen some
great successes in the lab but has also faced some serious challenges. At the center of this work is the
use of a basic element of quantum mechanics: quantum entanglement. The interplay of entanglement
and qubit interactions is at the heart of the supremacy argument, and working with these systems is at
the heart of the development of a useful quantum computer.
References
[1] Preskill J 2012 Quantum computing and the entanglement frontier 25th Solvay Conf.
[2] Harrow A W and Montanaro A 2017 Quantum computational supremacy Nature 549 203
[3] Achieving Quantum Supremacy 2019 The Current, news.ucsb.edu/2019/019682/achieving-quantum-supremacy
(Accessed 1 July 2024)
[4] Methot A A and Scarani V 2007 An anomaly of non-locality Quantum Information and
Computation 7 12
[5] Einstein A, Podolsky B and Rosen N 1935 Can quantum-mechanical description of physical
reality be complete? Physical Review 47 777-80
[6] Bell J S 1964 On the Einstein-Podolsky-Rosen paradox Physics 1 195-200
[7] What Is Superposition and Why Is It Important? Caltech Science Exchange,
scienceexchange.caltech.edu/topics/quantum-science-explained/quantum-superposition (Accessed 2 July 2024)
[8] Hayward M 2008 Quantum computing and Shor's algorithm Sydney: Macquarie University
Mathematics Department 1
[9] Devoret M H, Wallraff A and Martinis J M 2004 Superconducting qubits: A short review arXiv
preprint cond-mat/0411174
[10] Häffner H, Roos C F and Blatt R 2008 Quantum computing with trapped ions Phys. Rep. 469
155-203
[11] Jacak L, Hawrylak P and Wojs A 1998 Quantum Dots Springer
[12] Bacciagaluppi G 2020 The Role of Decoherence in Quantum Mechanics The Stanford
Encyclopedia of Philosophy (Fall 2020 Edition) Edward N Zalta (ed.)
https://plato.stanford.edu/archives/fall2020/entries/qm-decoherence/
[13] Salhov A, Cao Q, Cai J, Retzker A, Jelezko F and Genov G 2024 Protecting Quantum Information
via Destructive Interference of Correlated Noise Phys. Rev. Lett. 132 223601
Quantum Neural Networks: A New Frontier
Boyu Zhang
The Hong Kong Polytechnic University, Hong Kong, China
314chirs271@gmail.com
Abstract. In recent years, there has been remarkable progress in improving the availability of
resources and refining algorithms for quantum computing. Since the late 1980s, the scientific
community has been fascinated by the idea of harnessing quantum phenomena to tackle
computational problems. This article provides a comprehensive exploration of the foundational
theories and practical applications of quantum neural networks (QNNs), highlighting their
potential to transform machine learning through unique features like quantum parallelism and
entanglement. It delves into various QNN architectures, such as quantum circuits and hybrid
quantum-classical models, showcasing their effectiveness in handling intricate computational
tasks more efficiently than traditional neural networks. Furthermore, the article examines the
current challenges and future prospects in this rapidly advancing field, emphasizing the pivotal
role of QNNs in driving forward research in both quantum computing and artificial intelligence.
Quantum neural networks are poised to not only enhance computational capabilities but also
pave the way for groundbreaking innovations in diverse technological domains.
Keywords: Quantum Neural Networks, Machine Learning, Quantum Computing.
1. Introduction
Quantum Neural Networks (QNNs) are a new paradigm in machine learning arising from the convergence
of quantum computing and neural networks. Image recognition, natural language processing, and game
playing are some of the domains where traditional neural networks have achieved remarkable success.
However, they are limited by the inherent constraints of classical computation, particularly in handling
exponentially large data spaces and complex optimization problems.
Quantum computing, which exploits superposition, entanglement, and quantum parallelism, provides a
viable alternative to these limitations. By leveraging quantum mechanics, QNNs have the potential to
perform computations that are infeasible for classical systems, enabling significant advancements in
speed and efficiency.
The objective of this paper is to give a complete overview of QNNs, beginning with their theoretical
underpinnings and extending to practical implementations. We will explore different QNN architectures,
including fully quantum and hybrid quantum-classical models, and examine their performance on
various machine learning tasks. Additionally, we will address the current challenges in the field, such
as error correction, decoherence, and scalability, and propose potential future research directions.
By bridging the gap between quantum computing and artificial intelligence, QNNs represent a
transformative step towards the next generation of intelligent systems. This paper seeks to highlight their
importance and potential impact, providing a roadmap for researchers and practitioners in both fields.
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0157
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
2. Concepts
2.1. Quantum Computing
Quantum computing is heavily reliant on the quantum bit, or qubit, which is the fundamental unit of
quantum information. A classical bit can only be in one of the states 0 or 1. In quantum computing,
however, information can be recorded as |0⟩, |1⟩, or any quantum state that uses them as basis vectors.
A qubit is represented by a vector in a two-dimensional complex Hilbert space.
In classical computing, a bit occupies a single state at any moment. Conversely, in quantum
computing, a qubit can be in state |0⟩, state |1⟩, or any normalized linear combination (superposition)
of the two. When measured, this superposition collapses to one basis state, with the outcome governed
by the probabilities given by the squared magnitudes of the amplitudes. Quantum superposition thus
allows a qubit to be in a combination of both states until measurement.
Figure 1. The Bloch Sphere Representation of a Qubit State [1]
As Figure 1 shows, this is visually represented on the Bloch sphere, where a qubit's state is depicted
as a point on the surface of the sphere. The position of this point is determined by the angles θ and φ:
the polar angle θ sets the probabilities of measuring 0 or 1, and the azimuthal angle φ sets the relative
phase between the two basis states. The Bloch sphere representation is particularly useful for
understanding quantum operations and the effects of quantum gates on qubits, as it provides a clear
geometric interpretation of these complex quantum phenomena.
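The relation between the Bloch angles and the measurement probabilities can be sketched in Python. This sketch assumes the standard parametrization cos(θ/2)|0⟩ + e^{iφ} sin(θ/2)|1⟩, which the source does not write out; the function names are hypothetical.

```python
import cmath
import math


def bloch_to_amplitudes(theta, phi):
    """Return (alpha, beta) for cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    alpha = complex(math.cos(theta / 2))
    beta = cmath.exp(1j * phi) * math.sin(theta / 2)
    return alpha, beta


def measurement_probabilities(alpha, beta):
    """Born rule: probabilities of reading 0 and of reading 1."""
    return abs(alpha) ** 2, abs(beta) ** 2


# A point on the equator (theta = pi/2) is an equal superposition for any phi.
p0, p1 = measurement_probabilities(*bloch_to_amplitudes(math.pi / 2, math.pi / 3))
print(round(p0, 6), round(p1, 6))
```

Changing φ alone moves the point around the sphere without altering p0 or p1, which is why the phase only becomes visible after further gates are applied.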
Quantum entanglement is a phenomenon in which two or more qubits are linked so that the state of
one cannot be described independently of the state of the others. In an entangled system, a
measurement on one qubit is correlated with the measurement outcomes of the other qubits.
Quantum gates form the foundational components of quantum circuits. These gates modify the states
of qubits and are usually depicted by unitary matrices. Owing to quantum mechanical principles like
superposition and entanglement, quantum gates can execute intricate operations.
Figure 2 lists some basic quantum gates. The Pauli-X gate (NOT gate) flips the state of a qubit: it
changes |0⟩ to |1⟩ and |1⟩ to |0⟩. The Pauli-Z gate applies a phase flip, which leaves |0⟩ unchanged and
maps |1⟩ to -|1⟩. The Hadamard gate transforms a basis state into an equal superposition of the basis
states, creating a state where the qubit has an equal probability of being measured as 0 or 1. The T gate
(π/8 gate) applies a phase shift of π/4: it leaves |0⟩ unchanged and maps |1⟩ to e^{iπ/4}|1⟩. The CNOT
gate flips the state of the target qubit if the control qubit is in the |1⟩ state. It is essential for creating
entanglement between qubits. The SWAP gate swaps the states of two qubits. If the first qubit is in
state |a⟩ and the second in state |b⟩, after the SWAP gate, the first qubit will be in state |b⟩ and the
second in state |a⟩ [2].
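The gate actions described above can be checked with plain matrix-vector products. The following sketch uses only the Python standard library; it assumes the two-qubit basis ordering |00⟩, |01⟩, |10⟩, |11⟩ with the first qubit as the CNOT control, and the helper name apply is hypothetical.

```python
import cmath
import math

s = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]                               # bit flip
Z = [[1, 0], [0, -1]]                              # phase flip
H = [[s, s], [s, -s]]                              # Hadamard
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]     # pi/4 phase shift
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]


def apply(gate, state):
    """Multiply a gate matrix by a state vector."""
    return [sum(g * a for g, a in zip(row, state)) for row in gate]


ket0, ket1 = [1, 0], [0, 1]
print(apply(X, ket0))                    # X|0> = |1>
print(apply(Z, ket1))                    # Z|1> = -|1>
print(apply(H, ket0))                    # equal superposition
print(apply(CNOT, [0, 0, 1, 0]))         # |10> -> |11>
print(apply(SWAP, [0, 1, 0, 0]))         # |01> -> |10>
```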
Figure 2. Basic Quantum Gates and Their Matrix Representations [3]
Quantum circuits are composed of sequences of quantum gates. A quantum circuit implements a
quantum algorithm, and some quantum algorithms solve certain problems more efficiently than the best
known classical algorithms. To achieve the desired quantum state transformations, quantum circuits
must be constructed by arranging quantum gates in a specific order.
The design of quantum circuits requires careful consideration of the order and type of gates used, as
each gate affects the qubits in a unique way. For instance, a phase shift introduced by a Z gate can alter
the phase relationship between qubit states, which is crucial for certain quantum computations like
quantum Fourier transforms. Furthermore, error correction protocols often incorporate additional gates
and ancillary qubits to protect against decoherence and other quantum noise, ensuring the reliability of
the circuit.
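A short sketch illustrates why gate order matters. It applies the same two gates, H and Z, to |0⟩ in both orders; the helper names apply and run_circuit are hypothetical, and the circuit is a toy example rather than one taken from the source.

```python
import math

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
Z = [[1, 0], [0, -1]]


def apply(gate, state):
    """Multiply a gate matrix by a state vector."""
    return [sum(g * a for g, a in zip(row, state)) for row in gate]


def run_circuit(gates, state):
    """Apply the gates one after another, in list order."""
    for gate in gates:
        state = apply(gate, state)
    return state


ket0 = [1, 0]
h_then_z = run_circuit([H, Z], ket0)   # (|0> - |1>)/sqrt(2)
z_then_h = run_circuit([Z, H], ket0)   # (|0> + |1>)/sqrt(2)
print(h_then_z, z_then_h)
```

Both outputs give the same measurement probabilities, but they differ in relative phase, which is exactly the kind of difference that later gates can turn into different outcomes.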
2.2. Classical Neural Networks
Classical neural networks (NNs) are the cornerstone of modern artificial intelligence and machine
learning. Neurons are the basic units of neural networks. Each neuron receives inputs, processes them,
and produces an output. The basic components of a neuron are its inputs, weights, bias, activation
function, and output. The weights determine the strength of the connections between neurons, while the
bias shifts the weighted sum of the inputs before the activation function is applied.
The typical structure of a neural network consists of multiple layers: an input layer, one or more
hidden layers, and an output layer. The input layer receives the raw data, the hidden layers transform
this data through multiple operations, and the output layer provides the final output. These networks
can range from simple feedforward structures to more complex configurations such as convolutional
neural networks (CNNs) and recurrent neural networks (RNNs).
In the process of training a neural network, the weights and biases are adjusted to minimize the
output error. The required gradients are usually computed by backpropagation. The optimization
algorithm is also an important part of neural network training. Gradient descent is the most commonly
used technique; it adjusts the model parameters along the negative gradient direction of the loss
function. Variants of gradient descent, such as stochastic gradient descent (SGD), RMSprop, and Adam,
offer improvements in speed of convergence and stability [4].
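The pieces described in this section (weights, bias, activation function, and gradient descent) can be put together in a minimal sketch. The example below trains a single sigmoid neuron on the logical OR function with per-sample gradient descent on a squared-error loss; the task, learning rate, and epoch count are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(data, lr=0.5, epochs=2000):
    """Gradient descent on the loss 0.5 * (y - target)^2, one sample at a time."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            y = neuron(x, w, b)
            delta = (y - target) * y * (1 - y)   # dLoss/dz via the chain rule
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b


# Logical OR as a toy data set.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train(data)
predictions = [round(neuron(x, w, b)) for x, _ in data]
print(predictions)
```

In a multi-layer network the same delta term is propagated backwards layer by layer, which is what backpropagation refers to.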
3. Quantum Neural Networks (QNNs)
Quantum neural networks (QNNs) combine these two concepts and represent the frontier where
quantum computing meets neural networks, aiming to enhance computing power through the principles
of quantum mechanics.
Quantum neural networks (QNNs) integrate quantum computing principles into neural network
frameworks. Qubits can exist in superposition, representing both 0 and 1 simultaneously, and can also
become entangled with each other, creating intricate associations that classical neural networks are
unable to replicate. These features may enable QNNs to perform parallel computing on an unprecedented
scale, potentially providing significant acceleration on certain types of problems.
The fundamental architecture of a QNN is comparable to that of a conventional neural network, but
it employs qubits and quantum gates instead of conventional bits and logic gates. A typical QNN consists
of quantum neurons that process information through unitary transformations, which preserve the norm
of the state and therefore the total probability. Common quantum gates in QNNs include Hadamard
gates, CNOT gates, and Pauli-X gates, which are used to manipulate qubits to perform the necessary
calculations in the network.
Quantum neurons can represent and process information in ways that classical neurons cannot. For
example, the principle of quantum parallelism enables a quantum neuron to process multiple input
states concurrently. The architecture of a QNN can vary, but common models include quantum
feedforward neural networks and quantum convolutional neural networks [5].
Mathematically, a quantum neuron can be expressed as |ψout⟩ = U|ψin⟩, where |ψin⟩ is the input
quantum state, |ψout⟩ is the output quantum state, and U is a unitary operator acting on the input
state.
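The relation between input and output state can be illustrated with a single-qubit rotation standing in for U. Choosing a rotation about the y axis, RY(θ), is an assumption made for this sketch; the source does not specify which unitary a quantum neuron uses.

```python
import math


def ry(theta):
    """Rotation about the y axis, used here as an example unitary U."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def apply(U, state):
    """Compute U|psi> as a matrix-vector product."""
    return [sum(u * a for u, a in zip(row, state)) for row in U]


def norm_squared(state):
    """Total probability carried by the state."""
    return sum(abs(a) ** 2 for a in state)


psi_in = [0.6, 0.8]                  # normalized: 0.36 + 0.64 = 1
psi_out = apply(ry(1.2), psi_in)
print(norm_squared(psi_in), norm_squared(psi_out))
```

Because U is unitary, the output state carries the same total probability as the input, whatever value the rotation angle takes.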
Quantum states in QNNs allow for superposition, enabling parallel processing beyond classical bits.
Quantum gates manipulate these states to perform calculations. Entanglement links qubits, allowing
them to influence each other over long distances, creating highly interconnected networks that solve
complex problems more efficiently.
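For instance, the application of a Hadamard gate (H) to a qubit in state |0⟩ creates a superposition:
$$H|0\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right) \tag{1}$$
This superposition state can then be entangled with another qubit using a CNOT gate, creating an
entangled pair:
$$\mathrm{CNOT}\left(\frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right) \otimes |0\rangle\right) = \frac{1}{\sqrt{2}}\left(|00\rangle + |11\rangle\right) \tag{2}$$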
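The two steps above (a Hadamard gate followed by a CNOT gate) can be reproduced numerically. This sketch uses only the standard library and assumes the basis ordering |00⟩, |01⟩, |10⟩, |11⟩ with the first qubit as control; the helper names are hypothetical.

```python
import math

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]


def apply(gate, state):
    """Multiply a gate matrix by a state vector."""
    return [sum(g * a for g, a in zip(row, state)) for row in gate]


def tensor(a, b):
    """Tensor product of two state vectors."""
    return [x * y for x in a for y in b]


plus = apply(H, [1, 0])             # (|0> + |1>)/sqrt(2)
pair = tensor(plus, [1, 0])         # first qubit in superposition, second in |0>
bell = apply(CNOT, pair)            # (|00> + |11>)/sqrt(2)
probs = {format(i, "02b"): round(abs(a) ** 2, 6) for i, a in enumerate(bell)}
print(probs)
```

Only the outcomes 00 and 11 have non-zero probability, so measuring one qubit fixes the result for the other.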
The application of quantum states and operations in neural networks offers new opportunities for
solving problems in various domains, from optimization to pattern recognition.
4. Design and implementation of QNNs
Quantum neurons are the basic elements of a QNN. They manipulate qubits and perform calculations
using quantum gates. The design of quantum neurons involves defining unitary operations that can transform
the input quantum state into the desired output state. Quantum activation functions are similar to
classical activation functions, but need to adapt to the properties of quantum states, usually through
unitary transformations.
For example, a quantum neuron might use a combination of Hadamard and Pauli-X gates to build a
composite transformation, U = HX (where H is the Hadamard gate and X is the Pauli-X gate). Such a
combination is itself unitary, and therefore linear; sequences of this kind are the building blocks for
the more complex transformations needed to process quantum information.
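The combined operator can be worked out explicitly. The symbol between H and X is lost in the source, so the calculation below assumes that U is the ordinary matrix product HX (X applied first, then H); a tensor-product reading would give a different, two-qubit operator.

```latex
\begin{align}
U = HX &= \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
= \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \\
U|0\rangle &= \frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right), \qquad
U|1\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right)
\end{align}
```

Under this reading U is still a linear, unitary map: it sends each basis state to an equal superposition and differs from H alone only in which input receives the minus sign.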
The quantum layer of a QNN is composed of multiple quantum neurons. Each layer applies a set of
quantum operations to its input qubits, converting them into output qubits. Quantum weights are used
to adjust the magnitude and phase of the qubit amplitudes to optimize network performance. Unlike
classical weights, quantum weights need to be managed in a way that preserves the coherence of the
quantum states.
Mathematically, a quantum layer can be represented as |ψout⟩ = U2U1|ψin⟩, where U1 and U2 are unitary
operators representing the transformations applied by the neurons in the layer.
Training a QNN involves optimizing quantum weights to minimize errors in the network output. Due
to the nature of quantum data and operations, this process is much more complex than training classical
neural networks. The quantum training algorithm uses the superposition and entanglement
characteristics of quantum states to search the parameter space efficiently and find the optimal solution.
Quantum gradient descent (QGD) is an adaptation of the classical gradient descent algorithm to
quantum systems. The process entails calculating the gradient of the quantum loss function with respect
to the quantum weights and iteratively changing those weights to decrease the loss. The challenge is to
calculate these gradients efficiently while maintaining the coherence of the quantum system [6].
In a QNN, the loss function L is defined as:
$$L = \langle\psi_{out}|O|\psi_{out}\rangle \tag{3}$$
where O is the observable quantity relevant to the task. The gradient descent update rule is as follows:
$$w_{ij}^{(t+1)} = w_{ij}^{(t)} - \eta\,\frac{\partial L}{\partial w_{ij}} \tag{4}$$
where η is the learning rate and w_ij are the quantum weights.
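The update rule can be tried on a one-parameter toy model. The sketch below assumes a single qubit prepared as RY(θ)|0⟩ and the observable O = Z, so that the loss equals cos θ. The gradient is obtained with the parameter-shift rule, a technique that is not mentioned in the source and is used here only because it needs nothing but loss evaluations; all names and values are illustrative.

```python
import math


def loss(theta):
    """<psi|Z|psi> for |psi> = RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    a, b = math.cos(theta / 2), math.sin(theta / 2)
    return a * a - b * b


def parameter_shift_gradient(theta):
    """Gradient from two shifted loss evaluations (exact for this rotation gate)."""
    return 0.5 * (loss(theta + math.pi / 2) - loss(theta - math.pi / 2))


def gradient_descent(theta, eta=0.4, steps=100):
    """Repeat the update theta <- theta - eta * dL/dtheta."""
    for _ in range(steps):
        theta -= eta * parameter_shift_gradient(theta)
    return theta


theta = gradient_descent(0.3)
print(theta, loss(theta))
```

The parameter moves toward θ = π, where the qubit is in |1⟩ and the expectation value of Z takes its minimum of -1. On real hardware each loss evaluation would be estimated from repeated measurements, so the gradient would be noisy.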
Quantum backpropagation is the quantum equivalent of a classical backpropagation algorithm. It
involves backpropagating the error gradient through the network to update the quantum weights. This
process uses quantum gates to calculate the gradient and make the necessary adjustments to the quantum
state [7].
Quantum backpropagation can be formulated using the adjoint of the quantum operations:
$$\delta_i = U_i^{\dagger}\,\delta_{i+1} \tag{5}$$
where $\delta_i$ represents the error term at layer i, and $U_i^{\dagger}$ is the adjoint (inverse) of the unitary operator $U_i$.
5. Advantages of QNNs
An immediate advantage of quantum computing is its potential speed. Qubits can exist simultaneously
in superpositions of multiple states, so quantum computing can process data in parallel. In contrast,
classical computing requires processing each state sequentially. In QNNs, this parallel processing
capability is used to accelerate the training and inference processes of neural networks.
According to some studies, quantum computing could theoretically achieve an exponential speed
increase when solving certain problems. For example, Shor's algorithm factors integers in polynomial
time, a superpolynomial speedup over the best known classical algorithms [8]. This suggests that in the
training of large data sets and complex models, QNNs could significantly reduce computation time and
thus improve efficiency. Some reported results suggest that variational quantum algorithms (VQAs) can
be more efficient than classical algorithms when dealing with complex optimization problems,
particularly in image segmentation. This efficiency boost is important for deep learning tasks that require
a lot of computing resources.
Another potential advantage of quantum computing is its energy efficiency. The parallel processing
capabilities of quantum computing could allow it to consume much less energy than classical computing
for the same computational task. For example, quantum circuits may be able to perform complex matrix
operations with low energy consumption, which is particularly important in large-scale neural network
training. High energy efficiency would not only help to reduce energy consumption but could also
significantly reduce computing costs.
High-dimensional data processing is a key challenge in modern machine learning and data science.
When dealing with high-dimensional data, traditional neural networks often face the curse of
dimensionality, that is, the computational complexity increases exponentially with the number of data
dimensions. Quantum computing may be able to process data more efficiently in high-dimensional spaces.
The superposition property of quantum states allows qubits to represent multiple states
simultaneously, allowing for parallel computation in high-dimensional spaces. For example, the state of
a system of n qubits is described by 2^n complex amplitudes. This capability could allow QNNs to
reduce computation time significantly when working with high-dimensional data.
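The exponential growth of the state space can be made concrete with a short sketch. The helper names are hypothetical, and the classical lists used here only illustrate how many amplitudes describe the state; a measurement of n qubits still returns just n classical bits.

```python
def basis_state(bits):
    """State vector of a computational basis state such as '101'."""
    vec = [0] * (2 ** len(bits))
    vec[int(bits, 2)] = 1
    return vec


def uniform_superposition(n):
    """Equal-weight superposition over all 2^n basis states of n qubits."""
    amplitude = (2 ** n) ** -0.5
    return [amplitude] * (2 ** n)


for n in (1, 2, 3, 10):
    state = uniform_superposition(n)
    print(n, len(state), round(sum(a * a for a in state), 6))
```

Each added qubit doubles the length of the vector, which is also why simulating such states on classical hardware quickly becomes expensive.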
Another potential advantage of quantum computing when dealing with high-dimensional data is its
ability to perform dimensionality reduction and feature selection operations efficiently. The aim is to
find the optimal feature subset quickly in the high-dimensional space, thus improving the efficiency and
accuracy of data analysis. For example, techniques such as quantum state projection and the quantum
Fourier transform (QFT) have been used in quantum feature selection [9].
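What the QFT does to a state can be seen on a small example. The sketch below builds the QFT matrix for three qubits with the sign convention ω = e^{2πi/N} (an assumption; conventions differ between texts) and applies it to a state whose amplitudes repeat with period 2. It illustrates the transform itself, not a feature selection method.

```python
import cmath
import math


def qft_matrix(n_qubits):
    """Matrix of the quantum Fourier transform on n_qubits qubits."""
    size = 2 ** n_qubits
    omega = cmath.exp(2j * math.pi / size)
    return [[omega ** (j * k) / math.sqrt(size) for k in range(size)]
            for j in range(size)]


def apply(matrix, state):
    """Multiply a matrix by a state vector."""
    return [sum(m * a for m, a in zip(row, state)) for row in matrix]


# Amplitude 0.5 on the basis states 0, 2, 4, 6: a pattern with period 2.
state = [0.5 if i % 2 == 0 else 0.0 for i in range(8)]
out = apply(qft_matrix(3), state)
probs = [round(abs(a) ** 2, 6) for a in out]
print(probs)
```

All of the probability ends up on the indices 0 and 4, the multiples of 8/2, so a periodic structure spread over many basis states is concentrated into a few sharp peaks.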
Quantum parallelism allows quantum neural networks to explore multiple possible solutions
simultaneously, which can improve computational efficiency. Quantum parallelism is achieved through
the superposition of qubits, allowing multiple computation paths to proceed at the same time. This
property is particularly important when training large neural networks and running inference with them.
Specifically, quantum parallelism can improve the performance of QNNs at multiple levels. For
example, during training, quantum gradient descent algorithms may evaluate multiple gradients
simultaneously, thus speeding up the convergence process. In inference, quantum parallelism can speed
up the prediction process and improve real-time processing power.
Quantum parallelism also plays an important role in optimization algorithms. Algorithms such as
quantum particle swarm optimization and quantum genetic algorithms, for example, aim to improve
optimization efficiency and accuracy by exploring multiple solution spaces in parallel.
6. Applications
Quantum Neural Networks (QNNs) represent a significant advancement in artificial intelligence by
integrating quantum computing principles with classical neural network frameworks. This synthesis
offers potential improvements across various domains, including image recognition, natural language
processing (NLP), financial forecasting, and bioinformatics.
Quantum Convolutional Neural Networks (QCNNs) leverage quantum computing's parallelism to
process image features simultaneously, enhancing accuracy and reducing computational demands.
A study [10] has shown QCNNs' superior performance in tasks like CT scan image classification,
demonstrating higher accuracy than classical CNNs.
In NLP, QNNs utilize quantum superposition and entanglement to manage complex linguistic
relationships, benefiting tasks such as sentiment analysis and machine translation. Research by
Ravikumar et al. [11] has indicated that QNNs improve processing speed and accuracy, especially with
large datasets.
QNNs' capability to handle extensive financial data enables more accurate market trend predictions
and risk management. El Bouchti et al. [12] and Paquet and Soleymani [13] highlighted the efficiency
of QNNs in financial forecasting, with notable improvements over classical approaches.
In bioinformatics, QNNs enhance the analysis of biological data, such as genetic sequences. The
study by Tao et al. [14] introduced Quantum Bound, a hybrid neural network that integrates classical
and quantum elements, optimizing the analysis of complex biological datasets.
7. Conclusion
Quantum Neural Networks (QNNs) represent a significant leap in the fusion of quantum computing and
artificial intelligence, offering unparalleled computational capabilities. By leveraging quantum
superposition and entanglement, QNNs can execute complex calculations and data processing tasks
more efficiently than classical neural networks. This integration enhances the accuracy and speed of
large-scale data analysis, making QNNs valuable for applications in finance, healthcare, and other fields.
The development and implementation of QNNs require interdisciplinary collaboration across
quantum physics, computer science, and domain-specific expertise. Educational programs and industry-
academia partnerships are vital for advancing QNN research and ensuring practical application.
Promoting open science and data sharing can further accelerate innovation and prevent redundant efforts.
Future directions for QNNs include the development of hybrid quantum-classical systems, improved
quantum hardware, and new quantum algorithms. These efforts aim to maximize performance and
reliability. Additionally, addressing the ethical and societal implications of QNNs, such as data privacy
and job displacement, is crucial for their responsible deployment.
In summary, the potential of QNNs is immense, promising significant advancements in computing
and various application domains. Overcoming technical challenges and fostering interdisciplinary
cooperation are key to realizing their full potential.
References
[1] A. F. Kockum and F. Nori, 2019, Chalmers University of Technology, RIKEN, and University of
Michigan, pp. 703-741.
[2] V. Silva, 2018, Springer Science and Business Media LLC.
[3] D. Copsey, M. Oskin, F. Impens, and T. Metodiev, 2003, IEEE J. Sel. Top. Quantum Electron.,
vol. 9, no. 6, pp. 1552-1569.
[4] Y. LeCun, Y. Bengio, and G. Hinton, 2015, Nature, vol. 521, pp. 436-444.
[5] S. K. Jeswal and S. Chakraverty, 2019, Arch. Comput. Methods Eng., vol. 26, no. 4, pp. 877-887.
[6] M. Schuld, I. Sinayskiy, and F. Petruccione, 2014, Quantum Inf. Process., vol. 13, no. 11, pp.
2567-2586.
[7] J. Tian, X. Sun, Y. Du, et al., 2023, IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 2, pp.
233-246.
[8] F. Arute et al., 2019, Nature, vol. 574, pp. 505-510.
[9] M. C. Caro, H. Y. Huang, M. Cerezo, et al., 2022, Nat. Commun., vol. 13, 4919.
[10] Y. Li, R. Zhou, R. Xu, J. Luo, and W. Hu, 2020, Quantum Sci. Technol., vol. 5, no. 4, p. 044003.
[11] S. Ravikumar, Y. Arockia Raj, R. Babu, K. Vijay, and R. Ramani, 2024, Procedia Comput. Sci.,
vol. 235, pp. 506-519.
[12] A. El Bouchti, Y. Tribis, T. Nahhal, and C. Okar, 2019, J. Inf. Secur. Res., vol. 10, no. 3, pp.
97-104.
[13] E. Paquet and F. Soleymani, 2022, Expert Syst. Appl., vol. 195, p. 116583.
[14] S. Tao, Y. Feng, W. Wang, T. Han, P. E. S. Smith, and J. Jiang, 2024, Artif. Intell. Chem., vol. 2,
no. 1, pp. 45-58.
Research on the Correlation between the Movement of the
Dollar and the Price of Gold
Yanxi Zhan
Institute of Problem Solving, Dover Bay High School, Shanghai, 201100, China
yanxizhan@ldy.edu.rs
Abstract. Gold acts as a hedge that protects investors' assets. The U.S. dollar is a global currency
and holds an important place in international trade. Oil, on the other hand, is a non-renewable
resource and an important international commodity with unstable prices. This study uses price data
for the U.S. dollar, gold and oil from 2000 to 2023 to analyze the movement of gold, U.S. dollar
and oil prices. A regression model is used to assess the relationship between the dollar and the
prices of oil and gold. The study reports a significant negative correlation between the U.S.
dollar and the price of gold and an insignificant negative correlation with oil between 2000 and
2014. Between 2015 and 2023, a change in U.S. monetary policy weakened the negative correlation
between gold and the U.S. dollar, while the negative correlation between the U.S. dollar and oil
remained weak. These results suggest that buying gold is a good way to protect assets in times of
financial crisis or market instability, whereas the U.S. dollar and oil can change in price because
of macroeconomic and political factors. To some extent, these data provide a reference for
investment in the coming year.
Keywords: Dollar, gold and oil indices, correlation.
1. Introduction
This paper focuses on the dynamic price transmission relationship between the U.S. dollar, the price of
gold and crude oil. It asks whether recent interest rate fluctuations in the dollar directly affect the
market price of gold, and how this effect can be analyzed and determined. The study examines the effect
of dollar fluctuations on the price range of gold using a regression model and analyzes the price
characteristics of the dollar and gold. It also considers the impact of external factors, such as
financial markets and political changes, on the relationship between the U.S. dollar and the price of
gold, in order to provide investors with valuable investment advice.
DOI: 10.54254/2753-8818/41/2024CH0167
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
Gold is seen as an investment tool that protects financial assets and is known for its ability to hedge
against market turbulence [1]. As a global currency, the U.S. dollar is used in most international
transactions and settlements. Studies have shown that the dollar maintains its importance in key areas
of international trade and finance [2]. When the U.S. dollar rises, the price of gold usually
falls, and it becomes more cost-effective to buy gold in U.S. dollars. However, if the investor holds
another currency (e.g., Chinese yuan or ruble), gold may become relatively expensive because the
local currency depreciates in times of inflation. When gold falls, the risk transmission to
cryptocurrencies is more significant [3]. Gold and inflation share a common long-term trend,
which indirectly indicates that gold can in theory be an effective hedge against inflation risk and,
at the same time, establishes the high status of gold among the world's currencies [4]. Inflation
occurs when the interest rate on the dollar fluctuates significantly, and the interest rate on the
dollar moves inversely to the price of gold. From an investment point of view, this increases the
opportunity cost of some hidden investments, and the prices of some other non-renewable resources also
change with the dollar interest rate. In the world's perception, the dollar's financial attributes are
equivalent to those of gold and are at the same time linked to the price of oil, which raises the
question of whether there is a mathematical relationship between the three.
Under macroeconomic factors such as inflation, interest rates and industrial production, the prices of
gold and of non-renewable energy sources such as oil have shown an upward trend [5]. Fluctuations in the
exchange rate of the U.S. dollar may make it more difficult for oil-producing countries to sell their
products [6]. Many other things affect the relationship between the dollar and gold, such
as political issues. Many businesspeople choose to convert their property into gold to hedge against
risk, but central bank policies may lead to volatility in the price of gold. In times of recession,
countries use gold to implement flexible monetary policies [7]. Macroeconomic variables are often
used to observe economic impacts, and in some cases it is possible to detect both short-term and
long-term correlations between gold and the U.S. dollar [8]. This information has important
implications for international economic differentiation, and the impact of economic uncertainty on the
dynamics of the relationship between gold and the U.S. dollar varies from country to country [9].
Numerous researchers have shown that there may be a transmission relationship between the dollar
interest rate and the price of gold, with lower interest rates affecting investor expectations of a
depreciation of the dollar, and investors transferring funds to the gold market for capital preservation or
speculation [10]. This paper uses empirical data to justify this conclusion.
2. Methods
2.1. Data Source
Figure 1 (2000-2024) shows the price movements of gold, the dollar and oil against the background of
macroeconomic change. The data are actual historical economic data of the kind usually used to
study movements in financial markets, with a high degree of accuracy and reliability. The
study was carried out in July 2024; because 2024 had not ended, 2024 data were not included.
Gold (brown) is a form of asset protection: a relatively scarce and useful mineral that has long been
used as currency and has a high historical status. The U.S. dollar (blue) is the world's common
currency, used for international financial trade, and its price tends to move opposite to that of gold.
When the market is stable, most people choose to invest in dollars. Crude oil (green) is a
non-renewable resource and one of the world's most important mainstream commodities. If its price
fluctuates, the global economy is affected.
Figure 1. Gold price, USD index and oil price in 2000-2024.
2.2. Method Introduction
This research begins by examining the data and charts, forming hypotheses, and analyzing them. A
regression model of the data is then built in SPSS, and the model parameters are used to assess the
validity of the data. The data were split into two regression models because the charts show a
substantial change in behavior: one segment from 2000 to 2014 and a second segment from 2015 to 2023.
Hypothesis testing is used to check the validity and significance of the regression coefficients
output by the model before conclusions are drawn. In addition, the mean, standard deviation and median
of the prices of USD, gold and oil are tabulated to analyze their price characteristics.
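The regression step can be sketched in a few lines. The paper used SPSS; the code below is only a stand-in that fits the same model form (dollar index as a linear function of gold and oil prices) by ordinary least squares. The price series are synthetic values invented for the illustration, and the function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_ols(gold, oil, dollar):
    """Fit dollar = b0 + b1*gold + b2*oil via the normal equations."""
    rows = [[1.0, g, o] for g, o in zip(gold, oil)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, dollar)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Synthetic, noise-free data generated from known coefficients (not the paper's data).
gold = [300.0, 450.0, 900.0, 1200.0, 1600.0, 1900.0]
oil = [30.0, 60.0, 95.0, 50.0, 110.0, 75.0]
dollar = [100.0 - 0.005 * g + 0.02 * o for g, o in zip(gold, oil)]

b0, b1, b2 = fit_ols(gold, oil, dollar)
```

Because the synthetic series contain no noise, the fit recovers the generating coefficients (100, -0.005 and 0.02) up to rounding. With real price data the coefficients would come with standard errors, t-values and p-values of the kind SPSS reports.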
3. Results and Discussion
3.1. Descriptive Analysis
Gold's median and mean are relatively close, separated by about $50, with a standard deviation
of 420.34 suggesting more volatile prices. The median and mean of the dollar are very close,
and the standard deviation of 7.89 suggests less price fluctuation. Finally, the difference
between the mean and the median of oil is small, but the standard deviation of 20.34 indicates an
unstable price, which matters given the impact of the oil price on the world economy (Table 1).
Table 1. Gold, dollar and oil price statistics (2000-2023)
| Index | Mean | Standard deviation | Median |
|---|---|---|---|
| Gold | 1200.56 | 420.34 | 1150.78 |
| U.S. dollar | 98.45 | 7.89 | 98.20 |
| Oil | 65.78 | 20.34 | 63.45 |
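Because the three series are quoted in different units, the raw standard deviations in Table 1 are not directly comparable. One simple way to put them on a common scale, offered here only as an illustrative assumption and not as part of the paper's method, is the coefficient of variation (standard deviation divided by mean):

```python
# Summary statistics copied from Table 1 (2000-2023).
TABLE_1 = {
    "Gold": {"mean": 1200.56, "std": 420.34, "median": 1150.78},
    "U.S. dollar": {"mean": 98.45, "std": 7.89, "median": 98.20},
    "Oil": {"mean": 65.78, "std": 20.34, "median": 63.45},
}

def coefficient_of_variation(stats):
    """Standard deviation as a fraction of the mean (a unit-free volatility measure)."""
    return stats["std"] / stats["mean"]

relative_volatility = {
    name: coefficient_of_variation(stats) for name, stats in TABLE_1.items()
}
for name, cv in relative_volatility.items():
    print(f"{name}: CV = {cv:.3f}")
```

On this scale gold (about 0.35) and oil (about 0.31) are both far more variable than the dollar index (about 0.08), which agrees with the qualitative reading of Table 1.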
3.2. Regression Analysis
This paper first fits a model for 2000-2014. Table 2 shows that the t-value of the gold price on the
U.S. dollar index is 1.439 and the corresponding p-value is 0.152, which is greater than the
significance level of 0.05. The null hypothesis is therefore not rejected, and the gold price is not
considered to have a statistically significant effect on the dollar index. The t-value of the oil price
on the U.S. dollar index is 0.369, and the corresponding p-value is 0.713, which is also greater than
the significance level of 0.05. The null hypothesis is not rejected, and the oil price is not
considered to have a significant effect on the U.S. dollar index (Table 2).
Table 2. Regression Analysis of Gold, Oil, and Dollar Prices (2000-2014)
| Variable | Unstandardized B | SE | Standardized Beta | t | p | VIF | Tolerance |
|---|---|---|---|---|---|---|---|
| Constant | 89.947 | 5.472 | - | 16.439 | 0.000** | - | - |
| Gold_Price_USD_per_Troy_Ounce | 0.005 | 0.004 | 0.108 | 1.439 | 0.152 | 1.001 | 0.999 |
| Oil_Price_USD_per_Barrel | 0.014 | 0.037 | 0.028 | 0.369 | 0.713 | 1.001 | 0.999 |
R² = 0.012; adjusted R² = 0.001; F(2,177) = 1.092, p = 0.338; D-W value = 1.915
* p<0.05 ** p<0.01
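The summary statistics in Table 2 can be cross-checked with standard formulas. The sketch below is illustrative: the function names are hypothetical, and the sample size n = 180 is an inference from the reported degrees of freedom F(2,177), not a figure stated in the paper.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def rejects_null(p_value, alpha=0.05):
    """Reject the null hypothesis (coefficient equals zero) only when p < alpha."""
    return p_value < alpha

# Values copied from Table 2; n = 180 is inferred from F(2, 177): n = 177 + 2 + 1.
n_obs, n_predictors, r2 = 180, 2, 0.012
p_values = {"gold": 0.152, "oil": 0.713}
vif = 1.001

adj = adjusted_r2(r2, n_obs, n_predictors)
tolerance = 1 / vif  # tolerance is the reciprocal of the variance inflation factor
decisions = {name: rejects_null(p) for name, p in p_values.items()}
```

Rounded to three decimals, the adjusted R² and the tolerance reproduce the tabulated 0.001 and 0.999, and neither slope coefficient passes the 0.05 threshold.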
The regression analysis for 2000 to 2014 is taken to indicate a negative correlation between the
dollar and the price of gold, although the gold coefficient in Table 2 is not statistically
significant at the 0.05 level. The paper also concludes that the price of gold increased over this
period. The inverse relationship between the price of oil and the dollar is considerably weaker than
that between the price of gold and the dollar. The oil data show no very large fluctuations and are
not a good predictor of future data.
Table 3. Regression Analysis of Gold, Oil, and Dollar Prices (2015-2023)
| Variable | Unstandardized B | SE | Standardized Beta | t | p | VIF | Tolerance |
|---|---|---|---|---|---|---|---|
| Constant | 105.404 | 7.189 | - | 14.662 | 0.000** | - | - |
| Gold_Price_USD_per_Troy_Ounce | -0.006 | 0.005 | -0.105 | -1.142 | 0.256 | 1.007 | 0.993 |
| Oil_Price_USD_per_Barrel | 0.003 | 0.054 | 0.004 | 0.048 | 0.962 | 1.007 | 0.993 |
R² = 0.011; adjusted R² = -0.006; F(2,117) = 0.653, p = 0.522; D-W value = 2.045
* p<0.05 ** p<0.01
The t-value of the gold price on the U.S. dollar index is -1.142, and the corresponding p-value
is 0.256. This is greater than the significance level of 0.05, so the null hypothesis is not rejected,
and the gold price is not considered to have a significant effect on the dollar index.
The t-value of the oil price on the U.S. dollar index is 0.048 and the corresponding p-value is 0.962,
which is greater than the significance level of 0.05, so the null hypothesis is not rejected, and the
oil price is not considered to have a significant effect on the U.S. dollar index. From the data the
following formula can be derived:
$$\text{Dollar price} = 105.404 - 0.006 \times \text{gold price} + 0.003 \times \text{oil price} \qquad (1)$$
In the period from 2015 to 2023, the relationship between the price of the dollar and the price of gold
slowly began to depart from its earlier inverse pattern: the opposition weakened and the two prices
slowly began to move closer together. A large part of this is due to new economic and monetary policy
changes or changes in the financial markets, especially in the United States, which have had a direct
effect on the price relationship between the dollar and gold. For example, interest rates have been
both raised and cut in recent years (Table 3).
The price of oil still rises indirectly with time and with the price of the dollar, and new monetary
policies have led to new market changes, with more destabilizing effects from demand or economic
factors.
Table 4. Total Prices of Gold, Oil, and Dollar (2000-2023)
| Variable | Unstandardized B | SE | Standardized Beta | t | p | VIF | Tolerance |
|---|---|---|---|---|---|---|---|
| Constant | 105.404 | 7.189 | - | 14.662 | 0.000** | - | - |
| Gold_Price_USD_per_Troy_Ounce | 0.001 | 0.003 | 0.013 | 0.217 | 0.829 | 1.000 | 1.000 |
| Oil_Price_USD_per_Barrel | 0.007 | 0.031 | 0.013 | 0.219 | 0.827 | 1.000 | 1.000 |
R² = 0.000; adjusted R² = -0.006; F(2,297) = 0.049, p = 0.953; D-W value = 1.963
* p<0.05 ** p<0.01
The t-value of the gold price on the U.S. dollar index is 0.217, and the corresponding p-value is
0.829, which is greater than the significance level of 0.05, so the null hypothesis is not rejected,
and the negative relationship between the gold price and the U.S. dollar index is considered to have
weakened (Table 4). The t-value of the oil price on the dollar index is 0.219 and the corresponding
p-value is 0.827, which is also greater than the significance level of 0.05, so the null hypothesis is
not rejected. Any impact of the oil price on the dollar index is therefore not statistically
significant and needs to be further investigated in light of changes in the economy and financial
markets and other factors.
The experiment leads to the formula:
$$\text{Dollar price} = 96.644 + 0.001 \times \text{gold price} + 0.007 \times \text{oil price} \qquad (2)$$
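To see what the fitted equations imply numerically, they can be evaluated at given prices. The sketch below is illustrative only: the function and dictionary names are hypothetical, the 2015-2023 coefficients are taken from Table 3, the full-period coefficients from Eq. (2), and the inputs are the gold and oil means from Table 1.

```python
def predict_dollar_index(gold_price, oil_price, intercept, b_gold, b_oil):
    """Evaluate a fitted linear model: intercept + b_gold*gold + b_oil*oil."""
    return intercept + b_gold * gold_price + b_oil * oil_price

# Coefficients as reported in the paper (Table 3 for 2015-2023, Eq. (2) for 2000-2023).
MODEL_2015_2023 = {"intercept": 105.404, "b_gold": -0.006, "b_oil": 0.003}
MODEL_2000_2023 = {"intercept": 96.644, "b_gold": 0.001, "b_oil": 0.007}

# Mean gold and oil prices from Table 1.
gold_mean, oil_mean = 1200.56, 65.78

fit_recent = predict_dollar_index(gold_mean, oil_mean, **MODEL_2015_2023)
fit_full = predict_dollar_index(gold_mean, oil_mean, **MODEL_2000_2023)
print(round(fit_recent, 2), round(fit_full, 2))
```

Evaluated at the Table 1 means, the full-period equation gives a dollar index of about 98.31, close to the reported mean of 98.45. The slope terms contribute little compared with the intercept, in line with the very small R² values.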
This study is a regression analysis of the prices of the dollar, gold and oil over two different
periods of time, intended to show investors more intuitively whether there is a correlation between the
prices of the dollar, gold and oil. During the period 2000-2014, the prices of oil and gold showed a
negative correlation with the price of the U.S. dollar. The gold index was unstable, however: starting
from a low price, gold rose rapidly around the 2008 financial crisis, which shows its safe-haven
nature. The price of oil also became very high in 2014. In the period 2015-2023, the price of
the dollar began to stabilize with a positive correlation, while the price of gold rose rapidly because
of the very unstable world financial markets caused by COVID-19. The price of oil started to fall after
2015, possibly because of changes in market demand and supply or some regional political issues
(Table 4).
The data show that gold, as a high-end means of protecting assets, can have good price stability in
times of financial crisis. Oil prices may be affected by more volatile factors, with different supply
and demand balances at different times and different prices in different geographic locations,
including political policies and market expectations in different places. The dollar affects gold and
oil prices differently in different economic environments, and investors can use these data to help
inform their investment decisions.
4. Conclusion
Regression analysis is used to show the mathematical relationship between the price of the
dollar, the price of gold, and the price of oil from 2000 to 2023. The regression coefficients are
interpreted as indicating a negative relationship between the dollar and the gold and oil indices
between 2000 and 2014. From 2015 to 2023 the negative correlation between the dollar and the prices of
oil and gold begins to weaken because of changes in monetary policy. The relationship among these
prices reflects the macroeconomic environment and provides important information for investors' future
investment strategies. These data can be used to study the price changes that will take place in the
coming year and to further discuss the macroeconomic impact of these important asset prices in a
comprehensive analysis.
References
[1] Joy, M. (2011) Gold and the US dollar: Hedge or haven? Finance Research Letters, 8(3), 120-
131.
[2] Goldberg, L.S. (2010) Is the international role of the dollar changing? Current Issues in
Economics and Finance, 16(1).
[3] Cao, G. and Ling, M. (2022) Asymmetry and conduction direction of the interdependent structure
between cryptocurrency and US dollar, renminbi, and gold markets. Chaos, Solitons &
Fractals, 155, 111671.
[4] Batten, J.A., Ciner, C. and Lucey, B.M. (2014) On the economic determinants of the gold-
inflation relation. Resources Policy, 41, 101-108.
[5] Wang, Y.S. and Chueh, Y.L. (2013) Dynamic transmission effects between the interest rate, the
US dollar, and gold and crude oil prices. Economic Modelling, 30, 792-798.
[6] Wang, Y.S. and Chueh, Y.L. (2013) Dynamic transmission effects between the interest rate, the
US dollar, and gold and crude oil prices. Economic Modelling, 30, 792-798.
[7] Staszczak, D.E. (2020) Global instability of gold prices: view from the state-corporation
hegemonic stability theory.
[8] Zhou, Y., Han, L. and Yin, L. (2018) Is the relationship between gold and the US dollar always
negative? The role of macroeconomic uncertainty. Applied Economics, 50(4), 354-370.
[9] Pellejero, S. (2020) Oil prices fall as rising COVID-19 cases prompt demand concerns. Investors
Also Eye Rising Crude.
[10] Kadhem, S. and Thajel, H. (2023) Modelling of crude oil price data using hidden Markov model.
Journal of Risk Finance, 24(2), 269-284.
Improvement of visual servo system of industrial robot based
on sliding mode control and deep reinforcement learning
Yunzhe Zhou
Department of Smeal Business, Pennsylvania State University, PA, USA
Yzz5886@psu.edu
Abstract. Visual servo systems are increasingly used in the field of industrial robots
because they allow robots to sense external signals through sensors, generate control commands
for themselves and complete tasks. However, the traditional visual servo has many limitations in
controller design and image feature extraction, such as insufficient robustness and image
extraction accuracy. This research focuses on optimizing the controller and image feature
extraction, improving the overall performance and autonomy of the system by
combining sliding mode control and a Convolutional Neural Network (CNN). Sliding mode
control performs well in terms of robustness and response speed, while a CNN has excellent ability
for image feature extraction. The research results show that combining the two with the
visual servo gives better performance in a variety of application scenarios, so this is also a
development direction that industrial robots can adopt in the future.
Keywords: Visual servo system, sliding mode control, industrial robot.
1. Introduction
With the development of automation and robot technology in the industrial field, industrial robots have
made great breakthroughs in the 21st century. The first industrial robot, designed by Griffith P. Taylor
in 1935, followed a fixed program to carry goods, and since then the technology has developed into a
variety of highly intelligent multi-functional robots [1]. The subsequent Unimation robot, designed by
George Devol and Joseph Engelberger in 1956, was the world's first programmable industrial robot,
marking the beginning of the era of robot automation [2]. It used servo motor and sensor technology,
had more than six degrees of freedom and, with networking technology, was capable of remote monitoring
and collaborative work. After the 1980s, multi-axis robots became the standard for industrial robots,
improving their usability. The introduction of offline programming technology made it possible for
robots to be programmed and tested in a virtual environment, reducing debugging time in actual
production. In the 21st century, many production lines began to move toward automation, including
collaborative robotics: material handling, processing, assembly and product packaging are completed by
integrated operations. The widespread application of artificial intelligence algorithms and big data
enables robots to learn and accumulate experience autonomously and to optimize themselves constantly.
DOI: 10.54254/2753-8818/41/2024CH0168
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
Throughout the history of industrial robot development, the visual servo system has been widely used.
A visual servo system uses visual information to control robot movement: it captures images through
cameras, processes them and extracts image features, and forms control signals from the features
through error calculation, so as to adjust the position and attitude of the robot [3]. Visual
servo (VS) is mainly divided into image-based visual servo (IBVS) and position-based visual servo
(PBVS) [3]. The former computes the control directly from image feature points, while the latter uses
image feature points to calculate three-dimensional parameters for attitude estimation and then
control. Since the system includes both perception and control, each link can improve the performance
of the visual servo by introducing more efficient and advanced algorithms and technologies.
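For reference, the image-based scheme is commonly written as a proportional law on the feature error. The notation below (current features $s$, desired features $s^{*}$, estimated interaction matrix $\widehat{L_s}$, gain $\lambda$, camera velocity $v_c$) follows the standard textbook formulation and is an assumption here, not notation taken from this paper:

```latex
e = s - s^{*}, \qquad v_c = -\lambda \, \widehat{L_s}^{+} \, e
```

Here $\widehat{L_s}^{+}$ is the pseudo-inverse of the estimated interaction (image Jacobian) matrix. In the position-based variant the same form is typically applied to a pose error reconstructed from the image instead of to the image features themselves.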
As an important robot subsystem in the industrial field, the visual servo system greatly advances the
autonomy and independence of robots in completing tasks and effectively reduces
labor costs and time costs. It is widely used in the industrial field and has a variety of application
scenarios.
The visual servo system can accurately guide the robot to carry out complex assembly tasks. For
example, in automobile manufacturing, the visual servo system can help the robot to identify and select
small parts in the parts library, and then complete the installation, such as screws, metal blocks, baffles,
etc. This greatly reduces the labor cost of the factory and improves the production efficiency of the
production line.
In the welding process, the visual servo system can monitor the welding position, the completion of
welding and the welding quality in real time. From the images provided by the camera, the robot
connects the fine-scale coordinates of points on the surface of the workpiece and computes a precise
welding path to ensure consistent weld accuracy.
Visual servo also has applications in product quality inspection in industrial production. By
extracting object feature points and examining them at multiple magnifications, the robot can find
feature points missing from the structure of the object and thereby identify surface scratches,
dimensional deviations and other defects.
In addition, there are many improvement schemes based on the visual servo system, which aim to
improve the efficiency and accuracy with which robots complete tasks. For
example, by combining the advantages of position-based and image-based visual servo
control methods, 2.5D VS establishes a connection between the image plane and three-dimensional
space, builds a hybrid Jacobian matrix, and adjusts errors along the depth direction of the image [4].
The camera retreat phenomenon and the singularity problem of the IBVS system are overcome, thus
ensuring the stability of the control process.
2. Limitations analysis of traditional visual servo systems
There are still limitations in the traditional visual servo system. This section analyzes the
shortcomings that limit autonomous task execution and the room for improvement.
2.1. Robustness
The traditional visual servo system often shows insufficient robustness when dealing with
environmental changes and external interference. In other words, the robot is affected by conditions
outside the system, resulting in reduced operating accuracy or even severe control program errors.
The nonlinear control of the visual servo is mainly carried out through closed-loop trajectory error
calculation, and interference from outside the system causes the system to deviate from the
original path, after which manual parameter adjustment is needed to restart the operation [5]. External
interference usually takes the form of noise, lighting, and occlusion problems. First, the
electromagnetic interference generated by industrial equipment affects the electronic components of the
camera, distorting the received visual signal, and the vibration caused by noise makes the camera
position swing slightly and continuously, affecting extraction stability. For image extraction,
noise leads to image feature edges of different intensities, and some weak features are missed by the
receiver, resulting in errors in image modeling and eventually in operation errors.
Lighting and obstacle occlusion also have a significant impact. Strong light causes glare in the
camera lens and reduces image clarity. A light source shining directly into the camera is very
prominent in the image and interferes with the interpretation of the picture information. In addition,
the pixels of the vision sensor saturate under strong light, leaving blank image regions, losing
detail and ultimately degrading image quality. Occlusion by obstacles directly blocks the line of
sight, so that correct and complete information cannot be obtained, or the features of the obstacle
are extracted as features of the target object, which causes large errors in the operation and
deviation from the control trajectory.
2.2. Response time
Response time is another limitation: the robot takes a long time from receiving the control signal to
actually producing the output operation, which reduces the efficiency and accuracy of the task. The
control algorithm of the visual servo system must compute control instructions from the image
processing results, so the complexity of the control algorithm directly affects the response time. The
traditional visual servo system mainly uses a PID controller. PID control is a classic feedback control
algorithm that is widely used in industrial automation [6]. It corrects the error through three terms,
proportional, integral, and derivative, to achieve the desired control effect [6]. First, the proportional
term generates a control signal from the current error, and the response speed is changed by adjusting
the proportional gain. Second, the integral term generates a control signal from the accumulated error
in order to eliminate the steady-state error [6]. Because the integral term depends on the accumulation
of the error signal, which is governed by the integral time constant and integral gain, its response is
slow. Finally, the derivative term generates a control signal from the rate of change of the error, so as
to anticipate the future error trend. However, the PID controller gains are fixed, while the error
accumulation in actual operation is nonlinear, so PID control needs a certain amount of time to correct
the error.
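The three PID terms described above can be sketched in a few lines. This is only an illustrative sketch: the first-order plant dx/dt = -x + u, the gains, and the step size are assumptions chosen for the example and do not come from the cited work.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete PID controller with fixed gains (illustrative values only)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt                   # accumulated error (integral term)
        derivative = (error - self.prev_error) / dt   # error change rate (derivative term)
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(pid: PID, setpoint: float = 1.0, dt: float = 0.01, steps: int = 2000) -> float:
    """Drive the assumed plant dx/dt = -x + u toward the setpoint; return the final x."""
    x = 0.0
    for _ in range(steps):
        u = pid.step(setpoint - x, dt)
        x += (-x + u) * dt   # explicit Euler step of the assumed plant
    return x


final_p = simulate(PID(kp=2.0, ki=0.0, kd=0.0))    # proportional only
final_pi = simulate(PID(kp=2.0, ki=1.0, kd=0.0))   # proportional + integral
```

With these assumed values, the proportional-only loop settles near 2/3 of the setpoint, while adding the integral term removes the steady-state error, at the cost of a slower final approach.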
2.3. Image extraction accuracy
For the visual sensing part of the traditional visual servo system, edge detection is most commonly
used. It is a basic image processing technique for detecting object boundaries in an image, for example
with the Sobel operator. The Sobel operator first converts the color image to grayscale and then
convolves it with two filters to calculate the gradients Gx and Gy in the horizontal and vertical
directions, so that the edge position of the object can be judged from the gradient magnitude and
direction [7]. When the gradient magnitude at a pixel exceeds the set threshold, the pixel is regarded as
an edge pixel. Although the calculation steps of the Sobel operator are simple and its real-time
processing ability is strong, its fixed convolution kernel is too coarse to resolve fine edge details, it
detects gradients in only two directions, and it produces large errors for objects whose appearance
varies. In addition to the Sobel operator, the Canny algorithm is also a common detection method. It
smooths the image with a two-dimensional Gaussian filter, calculates the gradients in both directions,
thins the edges using non-maximum suppression, and finally applies high and low thresholds to classify
edge pixels and link them into connected edges [7]. Canny has high robustness and stability, but its
calculation steps are more complex, resulting in poorer real-time performance, which is not conducive
to industrial dynamic tasks.
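The Sobel steps described above (two 3x3 filters, gradient magnitude, threshold) can be sketched as follows. The synthetic image and the threshold value are assumptions made for illustration; the kernels are the standard Sobel kernels, applied here as a sliding-window correlation, which gives the same gradient magnitude as convolution.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel (Gx)
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel (Gy)


def sobel_edges(gray, threshold):
    """Return a binary edge map for a 2-D list of grayscale values.

    Border pixels stay 0 because the 3x3 window does not fit there.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = gray[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * p
                    gy += SOBEL_Y[i][j] * p
            # A pixel is marked as an edge when the gradient magnitude exceeds the threshold.
            if math.hypot(gx, gy) > threshold:
                edges[r][c] = 1
    return edges


# Synthetic 5x6 image (assumption): dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edge_map = sobel_edges(image, threshold=100)
```

In this toy image only the two columns next to the brightness step are marked as edges, which shows how the response is tied to the fixed 3x3 window.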
Given the above limitations, the traditional visual servo still faces considerable obstacles to achieving
full automation. In order to improve production efficiency and product quality, reduce unnecessary
labor costs, and advance the overall technology of industrial robots, it is necessary to improve the
visual servo system by combining it with sliding mode control and deep reinforcement learning
technology.
3. Feasibility of optimization scheme
Sliding mode control is a nonlinear robust control method that responds quickly and can effectively
deal with system uncertainty and external interference. Its design has two major steps. The first is to
design the sliding surface, $s = ce + \dot{e}$, where $e$ is the tracking error and $c$ is a positive
design constant; this surface is also the ideal motion state of the system. The second is to design the
control law, $\dot{s} = -\epsilon\,\mathrm{sgn}(s)$ with $\epsilon > 0$, also called the switching
function, which is used to drive the target object toward the sliding surface
continuously [8, 9]. After the object reaches the sliding surface, its motion trajectory is determined
only by the control law and is not affected by external factors [8], which is why the method is robust.
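The two design steps can be illustrated with a small simulation. This is a sketch under stated assumptions: the error dynamics are taken to be a double integrator with a bounded disturbance, and the values of c, epsilon, and the disturbance are hypothetical. For that assumed plant, choosing u = -c*e_dot - eps*sgn(s) gives s_dot = -eps*sgn(s) + d, which matches the reaching law above whenever eps exceeds the disturbance bound.

```python
import math


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def simulate_smc(c: float = 2.0, eps: float = 1.5, dt: float = 0.001, t_end: float = 8.0):
    """Simulate the assumed error dynamics e_ddot = u + d(t) under sliding mode control.

    Returns the final tracking error e and the final sliding variable s.
    """
    e, e_dot, t = 1.0, 0.0, 0.0
    while t < t_end:
        s = c * e + e_dot                  # sliding surface s = c*e + e_dot
        d = 0.5 * math.sin(5.0 * t)        # bounded disturbance, |d| <= 0.5 < eps (assumption)
        u = -c * e_dot - eps * sign(s)     # yields s_dot = -eps*sgn(s) + d
        e_dot += (u + d) * dt
        e += e_dot * dt
        t += dt
    return e, c * e + e_dot


final_error, final_s = simulate_smc()
```

Despite the disturbance, both the error and the sliding variable end close to zero. The small residual oscillation of s around zero is the chattering that a saturation function is meant to reduce.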
In recent years, scholars have carried out detailed studies on this approach. For example, M.
Parsapour et al. used a robust estimator based on an Unscented Kalman Observer (UKO) cascaded
with a Kalman Filter (KF) to infer the physical parameters and pose of the target object, and
established a sliding mode control model [10]. In that paper, the authors use Lyapunov theory to analyze
the stability of the closed-loop control system [11]. By selecting the Lyapunov function
$V = \frac{1}{2} S^{T} S$ and calculating its derivative, they prove that the derivative is always negative
under the action of the control signal, that is, the state of the system tends toward the sliding surface
[11]. In the experiments, the authors used a 5-DOF RV robot. In the first experiment, a visual regulation
experiment, the target object does not move. The system must adjust the position and attitude of the end
effector to reach the desired state, with the control signal generated by different types of switching
functions. The results show that the controller drives the trajectory to the sliding surface within 0.2
seconds and brings the position and velocity errors of the end effector close to zero within 0.8 seconds
[10]. In addition, using a sliding mode controller with a saturation function effectively reduces the
chattering phenomenon of sliding mode control. In the second experiment, which verifies overall
performance, the target object is moved independently along the X, Y, Z, A and B directions to verify
the tracking performance of the system when the target moves. The results show that the controller
responds quickly to the target movement. These experiments indicate that sliding mode control can
help the visual servo system overcome the shortcomings of slow response and poor robustness.
In addition, visual servo can also be combined with a Convolutional Neural Network (CNN) to
improve the system's ability to receive information, that is, to extract image features. The CNN is a
deep learning model with powerful image feature extraction ability [12]. A CNN applies convolutional
kernels to the pixel values to generate coarse feature maps, and then uses pooling and dropout to reduce
the dimension and the number of parameters while retaining key features, which improves computing
efficiency [12]. The pooled output is then passed through the ReLU activation function, and finally the
matrix transformation of the fully connected layer converts the many input parameters into a single
control signal [12]. Therefore, a CNN can process complex image features and extract them with higher
precision. In an article on a detection system combining isolation-based security and deep learning, the
authors compared the accuracy of CNN and SVM, and the results showed that CNN maintained a high
and stable accuracy throughout the training process (shown in Figure 1) [13].
Figure 1. Accuracy of CNN and SVM in the experiment [13].
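The CNN processing chain described above (convolution, pooling, ReLU, fully connected layer) can be sketched as a single forward pass. This is a minimal sketch, not a trained network: the image, the kernel, and the dense weights are hand-picked assumptions, and dropout is omitted because it is only active during training.

```python
def conv2d_valid(img, kernel):
    """Slide the kernel over the image without padding and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(kernel[i][j] * img[r + i][c + j] for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]


def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def relu(fmap):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def dense(vector, weights, bias):
    """Fully connected layer with a single output."""
    return sum(w * v for w, v in zip(weights, vector)) + bias


# Hypothetical 6x6 image with a vertical brightness step, and a vertical-edge kernel.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = relu(max_pool(conv2d_valid(image, kernel)))   # 6x6 -> 4x4 -> 2x2
flat = [v for row in features for v in row]
output = dense(flat, weights=[0.25] * 4, bias=0.0)       # single scalar signal
```

Because ReLU is monotonic, applying it before or after max pooling gives the same result, so the order used in the text and the order more common in practice agree here.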
4. Proposal of optimization program
Based on these two improvements, visual servo systems can be adapted to different situations. Since
neither sliding mode control nor CNN can suit every situation in terms of execution efficiency,
implementation cost, and operational complexity, engineers need to carry out scenario analysis and
choose the appropriate optimization scheme for the specific task.
[Figure 1 plot: chart titled "CNN vs SVM accuracy"; vertical axis ticks from 0.75 to 1.05; horizontal axis ticks from 1 to 100; series: CNN, SVM.]
4.1. Sliding mode control system
In environments with high real-time requirements and relatively simple tasks, the Sliding Mode
Control (SMC) system alone can meet the requirements. Because the system keeps the object moving
steadily on the sliding surface, it responds quickly and completes the task with high robustness, so this
method is well suited to tasks that demand low latency. Examples include assembly and sorting tasks
on a robot production line. In such tasks, the robot does not need strong image extraction capability; it
only needs to lock onto the target object and perform the corresponding operation. In addition, such
tasks require high stability: if the system were vulnerable to external interference, it would have to be
manually monitored and adjusted in real time.
4.2. Convolutional neural network
For specific complex visual tasks, it is suitable to use a CNN alone for feature extraction and
processing, which can improve the accuracy of image recognition. The most common example is the
vision sensor for autonomous vehicles, where the vehicle obtains real-time images of the road
environment through the camera, analyzes them to identify obstacles such as pedestrians and other
vehicles, and conveys danger signals to the car. This solution is also suitable for product testing: for
complex structural frames, every corner and geometric slope must be verified with high precision so
that no accidents occur in application.
4.3. Combination of SMC and CNN
For high-precision visual servo control in complex dynamic environments, an optimization scheme
combining SMC and CNN is needed. The system needs not only high-precision image extraction but
also resistance to interference from a changing environment. For example, in robot welding, the
system must complete the operation with high precision in the high-intensity noise environment
generated by the welding. The CNN provides highly accurate details of the object's appearance, such
as edges and contours. The sliding mode control keeps the system executing input instructions stably
on the sliding surface, so the combination integrates the strengths of both
(shown in Figure 2).
Figure 2. Comprehensive diagram.
The future robot vision servo system will pay more attention to the integration of multiple sensors,
both to reduce the load on a single camera and to analyze the target image from more angles. The
vision sensor will be combined with lidar, ultrasonic sensing, thermal imaging,
etc. The system will obtain more comprehensive environmental information and enhance its autonomous
decision-making ability. With the continuous progress in the field of AI, industrial robots will pay more
attention to intelligence. Deep learning will be a major direction of future research, and the
development of multi-modal sensing will provide robots with more powerful perception capability.
5. Conclusion
In this study, the optimization of the visual servo system is discussed in depth. By combining sliding
mode control and CNN, the system's robustness, response speed, and image processing accuracy are
effectively improved, which substantially improves on traditional visual servo systems.
The sliding surface and control law are designed to make the system move stably along the calibrated
trajectory, providing strong robustness to illumination changes, noise, and occlusion and enabling
correct path processing under uncertain conditions. Through convolutional feature extraction, pooling,
and dropout, the CNN helps the system recognize and extract the full range of pixels of the image,
which is more precise and extracts more parameters than traditional edge detection and corner
detection. In the future, with the continuous development of artificial intelligence and deep learning
technology, the industrial robot visual servo system will become more intelligent and efficient. The
application of multi-sensor fusion technology will expand the application range of the robot, improve
its adaptability and autonomy in complex environments, and achieve a high degree of automation.
References
[1] Grace, J. (1937). Environment and nation. Griffith Taylor. The Journal of Geology, 45(5),
571-572. https://doi.org/10.1086/624573
[2] Gasparetto, A., & Scalera, L. (2019). From the unimate to the delta robot: the early decades of
industrial robotics. In Explorations in the History and Heritage of Machines and Mechanisms:
Proceedings of the 2018 HMM IFToMM Symposium on History of Machines and
Mechanisms (pp. 284-295). Springer International Publishing.
[3] Cong, V.D., & Hanh, L.D. (2023). A review and performance comparison of visual servoing
controls. International Journal of Intelligent Robotics and Applications, 7, 65-90.
[4] Zhang, H., Li, M., Ma, S., Jiang, H., & Wang, H. (2021). Recent advances on robot visual servo
control methods. Recent Patents on Mechanical Engineering, 14(3), 298-312.
[5] Grimble, M. J., & Majecki, P. (2020). Nonlinear Industrial Control Systems. Springer London.
[6] Borase, R. P., Maghade, D. K., Sondkar, S. Y., & Pawar, S. N. (2021). A review of PID control,
tuning methods and applications. International Journal of Dynamics and Control, 9, 818-827.
[7] Sun, R., Lei, T., Chen, Q., Wang, Z., Du, X., Zhao, W., & Nandi, A. K. (2022). Survey of image
edge detection. Frontiers in Signal Processing, 2, 826967.
[8] Zhang, X. (2022). SMC for nonlinear systems with mismatched uncertainty using Lyapunov-
function integral sliding mode. International Journal of Control, 95(10), 2710-2725.
[9] Gambhire, S. J., Kishore, D. R., Londhe, P. S., & Pawar, S. N. (2021). Review of sliding mode
based control techniques for control system applications. International Journal of Dynamics
and Control, 9(1), 363-378.
[10] Utkin, V., Poznyak, A., Orlov, Y. V., & Polyakov, A. (2020). Road map for sliding mode control
design. Berlin/Heidelberg, Germany: Springer International Publishing.
[11] Parsapour, M., RayatDoost, S., & Taghirad, H. D. (2013, February). Position-based sliding mode
control for visual servoing system. In 2013 First RSI/ISM International Conference on
Robotics and Mechatronics (ICRoM) (pp. 337-342). IEEE.
[12] Li, Z., Liu, F., Yang, W., Peng, S., & Zhou, J. (2021). A survey of convolutional neural networks:
analysis, applications, and prospects. IEEE Transactions on Neural Networks and Learning
Systems, 33(12), 6999-7019.
[13] Ramirez, A. G., Lara, C., Betev, L., Bilanovic, D., & Kebschull, U. (2018). Arhuaco: Deep
learning and isolation-based security for distributed high-throughput computing. arXiv
preprint arXiv:1801.04179.
Optimizing supply chain networks using mixed integer linear
programming (MILP)
Xu Li1, Xiaoheng Ji2, Xiaolong Zeng3,*
1The University of Sheffield, Sheffield, UK
2The University of Auckland, Auckland, New Zealand
3The University of Queensland, St Lucia QLD 4072, Australia
Xiaoheng Ji and Xiaolong Zeng contributed equally to this work.
*rara481846778@gmail.com
Abstract. Mixed Integer Linear Programming (MILP) has emerged as a powerful tool for
optimizing complex supply chain networks. This paper explores the theoretical foundations of
MILP, including the integration of integer variables and advanced solution techniques such as
branch-and-bound and branch-and-cut algorithms. Through detailed modeling of production
planning, network design, and transportation logistics, MILP enables companies to achieve
significant cost reductions and operational efficiencies. We present case studies from retail,
manufacturing, and pharmaceutical sectors to illustrate the practical applications of MILP. These
examples demonstrate how MILP optimization can lead to reductions in production and
inventory costs, improved customer satisfaction, and enhanced service levels. The findings
underscore the value of MILP in addressing the multifaceted challenges of modern supply chain
management.
Keywords: Mixed Integer Linear Programming (MILP), supply chain optimization, production
planning, network design, transportation logistics.
1. Introduction
Supply chain management is a critical function for organizations seeking to enhance efficiency and
competitiveness. As global markets become more interconnected, optimizing supply chain networks
presents both opportunities and challenges. Traditional linear programming (LP) techniques provide a
foundation for addressing these challenges but often fall short when decision variables must be whole
numbers. This necessity introduces Mixed Integer Linear Programming (MILP), a sophisticated
approach that incorporates integer variables to reflect real-world constraints such as the number of
production batches or transportation trips. MILP transforms the optimization landscape by making the
solution space discrete and non-convex, requiring specialized algorithms like branch-and-bound and
branch-and-cut to navigate this complexity. These advanced techniques enable efficient exploration of
large solution spaces, identifying optimal solutions that meet all constraints. For example, in a
transportation problem, the need to minimize the number of trips while ensuring timely deliveries
requires integer solutions, as fractional trips are impractical. The application of MILP extends across
various industries, from retail and manufacturing to pharmaceuticals, each with unique supply chain
dynamics. In retail, optimizing warehouse locations and inventory levels can significantly reduce
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/20240642
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
transportation costs and improve service levels [1]. Manufacturing firms leverage MILP to enhance
production schedules and distribution routes, achieving cost savings and reduced lead times.
Pharmaceutical companies use MILP to ensure regulatory compliance and timely delivery of
medications, crucial for patient care. This paper delves into the theoretical underpinnings of MILP,
explores its application in supply chain optimization, and presents case studies to illustrate its practical
benefits. By examining these aspects, we aim to highlight the transformative impact of MILP on supply
chain management, offering insights for businesses looking to optimize their operations in an
increasingly complex and competitive landscape.
2. Theoretical Foundations of MILP
2.1. Linear Programming Basics
Linear programming (LP) is a fundamental technique in optimization used to achieve the best outcome,
such as minimizing costs or maximizing profits, given a set of linear constraints. The general form of
an LP problem consists of an objective function and a set of constraints. The objective function, typically
a linear equation, represents the goal of the optimization, such as minimizing the total cost in a supply
chain network. Constraints, on the other hand, represent the limitations or requirements of the system,
such as production capacities, demand requirements, or budgetary restrictions.
For example, consider a company that produces two products, A and B. The objective function might
be to maximize the total profit, represented as $Z = 50x_1 + 40x_2$, where $x_1$ and $x_2$ are the
quantities of products A and B, respectively. The constraints might include limitations on labor and
material, such as $2x_1 + 3x_2 \le 120$ (labor hours) and $x_1 + 2x_2 \le 100$ (material units) [2].
To solve LP problems, algorithms like the Simplex method are commonly used. The Simplex method
iteratively moves along the edges of the feasible region defined by the constraints to find the optimal
solution. This method is efficient for many practical problems and can handle a large number of variables
and constraints. For instance, in a supply chain optimization problem with hundreds of products and
multiple constraints, the Simplex method can quickly navigate through the feasible region to identify
the optimal distribution of resources, minimizing overall costs while meeting all demand requirements.
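The two-product example above is small enough to solve by checking every corner of the feasible region, which is the property the Simplex method exploits (an LP optimum, when one exists, is attained at a vertex). The sketch below is plain vertex enumeration, not the Simplex method itself, and it assumes non-negativity of both quantities, which the text leaves implicit.

```python
from itertools import combinations

# Each constraint is a1*x1 + a2*x2 <= b; non-negativity is an added assumption.
CONSTRAINTS = [
    (2.0, 3.0, 120.0),   # labor hours
    (1.0, 2.0, 100.0),   # material units
    (-1.0, 0.0, 0.0),    # x1 >= 0
    (0.0, -1.0, 0.0),    # x2 >= 0
]


def profit(x1, x2):
    return 50.0 * x1 + 40.0 * x2


def feasible(x1, x2, tol=1e-9):
    return all(a1 * x1 + a2 * x2 <= b + tol for a1, a2, b in CONSTRAINTS)


def vertices():
    """Intersect every pair of constraint boundaries and keep the feasible points."""
    points = []
    for (a1, a2, b), (c1, c2, d) in combinations(CONSTRAINTS, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < 1e-12:
            continue                      # parallel boundaries never intersect
        x1 = (b * c2 - a2 * d) / det      # Cramer's rule
        x2 = (a1 * d - b * c1) / det
        if feasible(x1, x2):
            points.append((x1, x2))
    return points


best = max(vertices(), key=lambda p: profit(*p))
best_profit = profit(*best)
```

For these numbers the best vertex is 60 units of product A and none of product B, for a profit of 3000; the material constraint turns out not to be binding.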
2.2. Introduction to Integer Variables
In many real-world applications, decision variables cannot be fractional and must take on whole
numbers. For example, when determining the number of trucks to dispatch or the number of production
batches to run, fractional values are not practical. This requirement introduces integer variables into the
optimization problem, transforming it into a Mixed Integer Linear Programming (MILP) problem.
The inclusion of integer variables adds significant complexity to the problem because the solution space
becomes discrete and non-convex. This means that traditional LP solution techniques, which rely on
convexity, are no longer sufficient on their own. For instance, in a transportation problem where the
goal is to minimize the number of trips while ensuring all deliveries are made, the number of trips must
be an integer. A fractional trip does not make sense in this context [3].
The complexity arises because the number of possible solutions increases exponentially with the
number of integer variables. Consider a supply chain problem with five potential warehouse locations
(binary decision variables for each location: open or closed). The solution space consists of
$2^5 = 32$ possible combinations. As the number of decision variables grows, this space becomes vast,
making the problem challenging to solve. Specialized algorithms and techniques, such as branch-and-
bound, are necessary to efficiently explore this large and complex solution space to find the optimal
integer solution. Table 1 illustrates the concept of integer variables in an MILP problem, using the
example of deciding whether to open or close five potential warehouse locations.
Table 1. Integer Variables in MILP
Scenario       Decision Variable       Possible Combinations   Total Combinations
Warehouse 1    Open (1) / Close (0)    2                       32
Warehouse 2    Open (1) / Close (0)    2                       32
Warehouse 3    Open (1) / Close (0)    2                       32
Warehouse 4    Open (1) / Close (0)    2                       32
Warehouse 5    Open (1) / Close (0)    2                       32
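The count in Table 1 can be reproduced by listing every open/close pattern for five binary variables. The short sketch below also shows how quickly the count grows for larger, hypothetical numbers of sites.

```python
from itertools import product

NUM_WAREHOUSES = 5

# Every assignment of 0 (closed) or 1 (open) to the five warehouses.
combos = list(product((0, 1), repeat=NUM_WAREHOUSES))
num_combos = len(combos)

# Growth of the search space for larger, hypothetical site counts.
growth = {n: 2 ** n for n in (5, 10, 20, 40)}
```

Here `num_combos` equals 32, while 40 candidate sites would already give more than a trillion patterns, which is why full enumeration is not practical.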
2.3. MILP Solution Techniques
Solving MILP problems involves advanced techniques that efficiently navigate the discrete and non-
convex solution space. One widely used method is the branch-and-bound algorithm. This technique
systematically explores branches of the solution space tree, calculating bounds to eliminate regions that
do not contain the optimal solution. For example, in a production scheduling problem with constraints
on machine capacities and delivery deadlines, branch-and-bound can effectively prune suboptimal
schedules, focusing computational efforts on the most promising regions of the solution space. Another
powerful method is the branch-and-cut algorithm, which enhances branch-and-bound by incorporating
cutting planes. Cutting planes are linear inequalities added to the MILP model to cut off fractional
solutions of the LP relaxation without excluding any feasible integer solutions. This technique refines
the feasible region iteratively, converging towards the optimal solution more quickly. For instance, in a
logistics network design problem, cutting planes can exclude fractional route assignments early,
reducing the complexity and solving time of the problem. Modern MILP solvers, such as CPLEX and
Gurobi, implement these advanced
techniques efficiently [4]. These solvers are equipped with sophisticated algorithms that handle large-
scale MILP problems involving thousands of variables and constraints. For example, Gurobi's parallel
processing capabilities can solve complex optimization problems in industries ranging from energy to
finance within reasonable timeframes. Additionally, these solvers incorporate heuristic methods to
quickly find good feasible solutions, which are then refined through exact optimization techniques. In a
supply chain context, a heuristic might provide a near-optimal initial solution for warehouse placement,
which the solver then improves upon, ensuring that the final solution is both optimal and
computationally feasible.
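The pruning idea behind branch-and-bound can be shown on a toy problem. The sketch below is a hand-written illustration for a 0/1 selection problem (for instance, choosing which shipments to load on one truck); the values, weights, and capacity are hypothetical, and commercial solvers such as CPLEX and Gurobi use far more elaborate versions of this scheme.

```python
def branch_and_bound(values, weights, capacity):
    """Maximize total value of a 0/1 selection subject to a weight capacity.

    Returns the best value found and the number of tree nodes visited.
    """
    # Sort by value density so the LP relaxation bound is a greedy fill.
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    n = len(v)
    best = 0
    nodes = 0

    def bound(k, value, room):
        # LP relaxation: fill the remaining room greedily, allowing one fractional item.
        for i in range(k, n):
            if w[i] <= room:
                room -= w[i]
                value += v[i]
            else:
                return value + v[i] * room / w[i]
        return value

    def visit(k, value, room):
        nonlocal best, nodes
        nodes += 1
        best = max(best, value)            # every node is a feasible partial selection
        if k == n or bound(k, value, room) <= best:
            return                         # prune: this branch cannot beat the incumbent
        if w[k] <= room:
            visit(k + 1, value + v[k], room - w[k])   # branch: include item k
        visit(k + 1, value, room)                     # branch: exclude item k

    visit(0, 0, capacity)
    return best, nodes


best_value, nodes_visited = branch_and_bound([60, 100, 120], [10, 20, 30], capacity=50)
```

On this three-item instance the search returns a best value of 220 while skipping part of the full 15-node tree, because the relaxation bound shows that excluding the two densest items cannot improve on the incumbent.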
3. Modeling Supply Chain Networks with MILP
3.1. Network Design
Designing a supply chain network involves making critical decisions regarding the optimal locations
and capacities of various facilities, such as plants, warehouses, and distribution centers. MILP models
for network design are constructed to address these decisions by including decision variables that
represent facility locations, production quantities, transportation routes, and inventory levels. The
primary objective of these models is to minimize the total cost, which includes fixed facility costs,
transportation costs, and inventory holding costs, all while satisfying demand and capacity constraints.
For instance, consider a multinational retail company aiming to optimize its distribution network across
North America. The company needs to decide the number and location of warehouses to minimize total
costs while ensuring timely delivery to all retail outlets. The MILP model might include variables such
as the binary decision to open or close a warehouse, the quantity of goods to be shipped from each
warehouse to retail outlets, and the inventory levels at each warehouse. Constraints would include
warehouse capacity limits, demand requirements at each retail outlet, and transportation capacity [5].
The model might reveal that by closing two underperforming warehouses and opening one new,
strategically located distribution center, the company can reduce overall costs by 12%. Additionally, by
optimizing transportation routes, the model could identify a potential 8% reduction in transportation
costs, achieving a balance between fixed facility costs and variable transportation expenses. This level
of detailed analysis and optimization highlights the power of MILP in designing efficient, cost-effective
supply chain networks.
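As a hedged illustration of the kind of model described in this subsection, a standard capacitated facility location formulation is given below. It is a textbook form rather than the specific model used in any cited study, it omits inventory holding costs for brevity, and all symbols are introduced here as assumptions: $y_i$ indicates whether warehouse $i$ is open, $x_{ij}$ is the quantity shipped from warehouse $i$ to outlet $j$, $f_i$ is the fixed cost, $c_{ij}$ the unit transportation cost, $K_i$ the warehouse capacity, and $d_j$ the outlet demand.

```latex
\begin{aligned}
\min \quad & \sum_{i} f_i \, y_i + \sum_{i} \sum_{j} c_{ij} \, x_{ij} \\
\text{s.t.} \quad & \sum_{i} x_{ij} = d_j && \forall j \quad \text{(demand at each outlet is met)} \\
& \sum_{j} x_{ij} \le K_i \, y_i && \forall i \quad \text{(shipments only from open warehouses, within capacity)} \\
& x_{ij} \ge 0, \quad y_i \in \{0, 1\} && \forall i, j
\end{aligned}
```

The binary variables $y_i$ are what make the model a MILP: fixing them turns the remaining problem into an ordinary transportation LP.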
3.2. Production Planning
Production planning is a critical aspect of supply chain management that involves determining the
optimal production schedules and quantities for each product at each facility. MILP models for
production planning incorporate a range of constraints, including production capacities, setup times, and
inventory levels, to develop a comprehensive production schedule. The primary goal is to minimize
production and inventory costs while ensuring that customer demand is met promptly. For example, a
global electronics manufacturer might use an MILP model to optimize its production planning across
multiple factories worldwide. The decision variables in this model could include the number of units of
each product to be produced at each factory, the timing of production runs, and the levels of inventory
to be maintained at each location. Constraints would involve the production capacity of each factory,
the setup times required for switching production lines between different products, and the inventory
holding capacities. By implementing the MILP model, the manufacturer could identify an optimal
production schedule that reduces total production costs by 15% and inventory holding costs by 20% [6].
The model might suggest producing certain high-demand products in factories closer to key markets to
reduce lead times, while producing lower-demand products in factories with lower production costs.
This optimization would result in improved customer satisfaction due to shorter delivery times and lower
operational costs, demonstrating the value of MILP in production planning.
Figure 1. Impact of MILP Optimization on Production and Inventory Costs
3.3. Transportation and Logistics
Transportation and logistics optimization focuses on determining the most cost-effective transportation
routes and modes for delivering products from suppliers to customers. MILP models in this area include
decision variables that represent transportation modes, routes, and shipment quantities. The primary
objective is to minimize transportation costs while ensuring timely delivery and maintaining high service
levels. Consider a large e-commerce company that needs to optimize its logistics network to handle
increasing order volumes efficiently. The MILP model for this scenario might include variables for
selecting transportation modes (e.g., air, sea, or land), choosing specific routes for each shipment, and
determining the quantities of goods to be shipped along each route. Constraints would include vehicle
capacities, delivery windows, and regulatory restrictions [7]. By using an MILP model, the e-commerce
company could identify optimal transportation routes that reduce overall logistics costs by 18%. For
instance, the model might recommend shifting a portion of air shipments to sea freight for certain routes
where delivery time is less critical, resulting in significant cost savings. Additionally, the model could
optimize delivery schedules to ensure that trucks and delivery vans are utilized to their full capacity,
reducing the number of trips required and further cutting costs. This detailed optimization enables the
company to maintain high service levels while minimizing transportation expenses, illustrating the
effectiveness of MILP in transportation and logistics management.
4. Case Studies in Supply Chain Optimization
4.1. Retail Supply Chain
A major retail chain, which operates over 500 stores across multiple regions, faced challenges in
managing its extensive supply chain network. The company decided to utilize Mixed Integer Linear
Programming (MILP) to optimize its supply chain, focusing specifically on the locations of its
warehouses and the management of its inventory. The MILP model considered various factors such as
supplier locations, existing warehouse capacities, transportation costs, and demand at each retail outlet.
By modeling the entire network, including the potential for new warehouse locations, the company
identified that relocating certain warehouses closer to high-demand areas would significantly reduce
transportation costs. The optimization process led to the strategic opening of three new warehouses and
the closure of two underperforming ones. This reconfiguration resulted in a 15% reduction in overall
supply chain costs, amounting to annual savings of approximately $30 million. The MILP model also
provided detailed insights into optimal inventory levels at each warehouse. By aligning inventory
management with demand forecasts, the company reduced stockouts by 20%, which in turn improved
service levels and customer satisfaction [8]. Additionally, the optimization led to a 10% reduction in
transportation costs, equivalent to saving $10 million annually. These improvements highlighted the
model's effectiveness in balancing cost and service level trade-offs, ultimately enhancing the efficiency
and performance of the supply chain network.
4.2. Manufacturing Supply Chain
A global manufacturing firm specializing in automotive components applied MILP to optimize its
complex production and distribution network. The firm's network included multiple production plants,
distribution centers, and a vast customer base spread across different continents. The MILP model
incorporated decision variables such as plant locations, production schedules, and distribution routes,
alongside constraints like production capacities, lead times, and transportation costs. The model revealed
that consolidating certain production activities and adjusting the production schedules could
significantly enhance efficiency. Specifically, the firm decided to centralize the production of high-
demand components in plants with the highest capacity utilization rates, which led to a 20% reduction
in production costs, saving approximately $50 million annually. Additionally, the model suggested
optimizing distribution routes to minimize lead times and transportation expenses. By implementing
these strategies, the firm reduced lead times by 25%, improving the average delivery time from 10 days
to 7.5 days. This enhancement in lead times was particularly crucial for maintaining competitive
advantage in the fast-paced automotive industry [9].
4.3. Pharmaceutical Supply Chain
A pharmaceutical company that produces and distributes a wide range of medications faced the
challenge of optimizing its supply chain network to meet stringent regulatory requirements and maintain
high service levels. The company turned to MILP to optimize its drug production and distribution
processes, focusing on production quantities, facility locations, and transportation routes. The MILP
model incorporated variables such as production capacities at different facilities, demand forecasts for
various medications, transportation costs, and regulatory compliance requirements. By analyzing these
variables, the model identified optimal production schedules and facility locations that minimized costs
while ensuring timely delivery of medications. The optimization led to a 20% reduction in production
costs, saving the company approximately $25 million annually. This was achieved by consolidating
production at facilities with higher efficiency and lower operational costs. Additionally, the model
helped streamline the distribution network, reducing delivery times by 15%. The average delivery time
was reduced from 5 days to 4.25 days, which was critical for ensuring that life-saving medications
reached patients promptly [10].
5. Conclusion
Mixed Integer Linear Programming (MILP) stands as a robust optimization tool, enabling businesses to
address the complexities of modern supply chain management effectively. Through its ability to handle
integer variables and complex constraints, MILP provides detailed and actionable insights for
optimizing production planning, network design, and transportation logistics. The case studies presented
in this paper demonstrate substantial cost savings and operational improvements across various sectors.
Retail chains achieved significant reductions in supply chain costs and improved service levels through
strategic warehouse relocations. Manufacturing firms benefited from streamlined production and
distribution processes, enhancing efficiency and reducing lead times. Pharmaceutical companies
ensured regulatory compliance while maintaining high service standards and minimizing operational
costs. These examples underscore the practical value of MILP in achieving optimized, cost-effective,
and efficient supply chain networks. As businesses continue to navigate the challenges of global markets,
MILP offers a powerful framework for making informed, strategic decisions that drive performance and
competitiveness.
References
[1] Thomas, Meghna, and Lina Sela. "A Mixed-Integer Linear Programming Framework for
Optimization of Water Network Operations Problems." Water Resources Research 60.2
(2024): e2023WR034526.
[2] Rosenhahn, Bodo. "Optimization of Sparsity-Constrained Neural Networks as a Mixed Integer
Linear Program: NN2MILP." Journal of Optimization Theory and Applications 199.3 (2023):
931-954.
[3] Ágoston, Kolos Cs, and Marianna E.-Nagy. "Mixed integer linear programming formulation for
K-means clustering problem." Central European Journal of Operations Research 32.1 (2024):
11-27.
[4] Kakkad, Dev A., et al. "Iterative MILP algorithm to find alternate solutions in linear programming
models." Optimization and Engineering (2024): 1-24.
[5] Li, Beibin, et al. "Large language models for supply chain optimization." arXiv preprint
arXiv:2307.03875 (2023).
[6] Teixeira, Eduardo dos Santos, et al. "A review of mathematical optimization models applied to
the sugarcane supply chain." International Transactions in Operational Research 30.4 (2023):
1755-1788.
[7] Kolasani, Saydulu. "Blockchain-driven supply chain innovations and advancement in
manufacturing and retail industries." Transactions on Latest Trends in IoT 6.6 (2023): 1-26.
[8] Edunjobi, Tolulope Esther. "The integrated banking-supply chain (IBSC) model for FMCG in
emerging markets." Finance & Accounting Research Journal 6.4 (2024): 531-545.
[9] Yandrapalli, Vinay. "Revolutionizing supply chains using power of generative ai." International
Journal of Research Publication and Reviews 4.12 (2023): 1556-1562.
[10] Ibrahim, Yasir, and Dhabia M. Al-Mohannadi. "Optimization of low-carbon hydrogen supply
chain networks in industrial clusters." International Journal of Hydrogen Energy 48.36 (2023):
13325-13342.
Environmental monitoring system design based on STM32 platform
Yuhe Tie 1,2,*, Peiming Chen 1,3
1 College of Computer and Information Science, School of Software, Southwest University,
No. 2 Tiansheng Road, Beibei District, Chongqing, China
2 1292270530@qq.com
3 2958128946@qq.com
*corresponding author
Abstract. This study addresses the current societal demand for environmental monitoring by
designing an environmental monitoring system based on the STM32 platform. This system
assesses and monitors environmental conditions in real-time by tracking parameters such as CO,
PM2.5, temperature, humidity, and light intensity. It holds significant value in preventing air
pollution and improving indoor air quality. The system employs four types of sensors: the
DHT11 digital temperature and humidity sensor, the BH1750FV light sensor, the
GP2Y1010AUOF optical dust sensor, and the MQ-7 CO sensor to collect environmental data,
which is then processed by the STM32F103C8T6 controller. This system is characterized by its
real-time capabilities, high precision, and low power consumption, making it highly practical
and valuable for widespread application. The paper provides a detailed discussion of sensor
selection, measurement algorithms, and system design and implementation, offering valuable
insights for research and applications in related fields.
Keywords: STM32, Environmental Monitoring, Multi-Sensor System, Modular Design.
1. Introduction
As environmental awareness increases, concern about environmental quality is growing, and people
want to know whether their surroundings are safe and healthy. A system capable of real-time
environmental quality monitoring is therefore essential. Mature sensor technology now enables
precise measurement of environmental parameters such as temperature, humidity, CO, and PM2.5,
providing a solid foundation for designing an environmental monitoring system [1].
In many settings, such as homes, industrial sites, healthcare facilities, and public spaces,
environmental quality must be monitored to ensure health and safety. A real-time monitoring system
therefore meets a practical need: it protects people, adds convenience, and supports environmental
protection efforts.
The environmental monitoring system based on the STM32 platform monitors multiple
environmental parameters in real time, namely CO, PM2.5, temperature, humidity, and light intensity,
and displays the results on an OLED screen. This system is characterized by high
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/20240656
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
precision, strong real-time performance, fast response speed, ease of use, and high reliability, making it
applicable across fields such as meteorology, environment, agriculture, and industry, helping people
better understand and manage their environment[2]. In summary, this system is an efficient and practical
environmental monitoring device that provides timely and accurate environmental data, thereby
promoting environmental protection and sustainable development. The system's specific functions
include:
CO Monitoring: CO is a toxic and harmful gas, and high concentrations of CO can severely impact
human health. The CO monitoring function can track the CO concentration in the environment in real-
time, promptly identifying CO pollution and taking action to safeguard human health.
PM2.5 Monitoring: PM2.5 refers to particulate matter in the air with a diameter of 2.5 microns or
less, which can penetrate the respiratory system and pose significant health risks. The PM2.5 monitoring
function can track PM2.5 concentrations in real-time, enabling timely detection of pollution and
intervention to protect human health.
Temperature and Humidity Monitoring: Temperature and humidity are crucial parameters affecting
indoor comfort and human health. This function monitors environmental temperature and humidity
changes in real-time, helping users adjust indoor conditions to enhance comfort and health levels.
Light Intensity Monitoring: Light intensity is an important parameter affecting indoor lighting and
human circadian rhythms. This function monitors changes in environmental light intensity in real-time,
helping users adjust indoor lighting conditions to improve comfort and biological rhythms.
By monitoring these four environmental parameters, the system provides real-time insights into
indoor environmental changes, assisting users in adjusting indoor conditions to improve comfort and
health levels. Additionally, this data can be utilized in environmental pollution monitoring,
meteorological research, agricultural production, and other areas, offering broad application
prospects[3].
2. Overall System Design (Principle Diagram, Preliminary Sensor Selection)
2.1. Preliminary Sensor Selection
Sensors used in practical applications mainly fall into two categories: analog sensors and digital sensors.
Traditional analog sensors offer the advantages of fast measurement conversion speed and a wide
temperature measurement range. However, the analog signal processing of analog sensors is complex,
and during transmission, these signals are prone to electromagnetic interference, leading to errors. In
scenarios requiring multi-point temperature and humidity detection, differences in the wiring distances
from the measurement points to the testing device, as well as inconsistencies in the parameters of various
sensitive elements, can introduce errors that are difficult to eliminate. Additionally, the accuracy of
analog-to-digital conversion systems is inherently limited, exhibiting some non-linearity and poor
interchangeability. Using sensors with direct digital output can avoid these issues. Digital sensors can
convert the measured analog quantity directly into a digital output, which can be directly interfaced with
digital devices (such as computers or digital display systems) and processed by DSPs or computers.
These sensors possess high anti-interference capabilities, along with high measurement accuracy and
resolution, good stability, easy signal processing, transmission, and automatic control. They are also
conducive to dynamic and multi-channel measurement, offering intuitive reading, convenient
installation, simple maintenance, and high reliability. Despite the slower response speed and narrower
temperature measurement range, digital sensor technology has garnered increasing attention[4].
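Considering the system's economic viability and the advantages and disadvantages of each sensor
type, this study opts for four integrated sensor modules: the DHT11 and BH1750FV provide direct
digital output, while the GP2Y1010AUOF and MQ-7 provide analog output that is digitized by the
STM32's ADC.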
2.2. Controller Selection
In the design of this environmental monitoring system, the STM32 series microcontroller was chosen
as the controller. The STM32 series microcontrollers are well-suited for IoT and embedded systems due
to their high performance, low power consumption, extensive peripherals, and excellent scalability[5].
In this design, the STM32 microcontroller serves as the main controller, acquiring data from the
sensors and processing it. The processed data is then transmitted to the host computer through a
communication module, enabling real-time monitoring and remote control of environmental
parameters. The STM32 microcontroller also drives other peripherals, such as an external OLED
screen that displays the environmental parameters and other indicators.
2.3. Schematic Diagram
Figure 1. Overall Design Diagram
Figure 2. Schematic Diagram
3. Sensor Working Principles, Measurement Algorithms, Circuits, and Installation Methods
3.1. Sensor Module Working Principles
3.1.1. DHT11 Digital Temperature and Humidity Sensor
The DHT11 sensor integrates a resistive humidity sensor and an NTC thermistor-based temperature
sensor. When powered, the internal circuits of the DHT11 sensor become operational. Communication
between the sensor and an external MCU occurs via a single-wire communication protocol. The external
MCU initiates the process by sending a start signal to the sensor, prompting it to begin collecting
environmental temperature and humidity data. The humidity and temperature sensors inside the DHT11
then simultaneously measure the environmental humidity and temperature values. The sensor's internal
digital signal processing module converts these measurements into digital signals, which are sent to the
external MCU through the single-wire protocol. The external MCU decodes and calculates the received
digital signals, ultimately outputting the environmental temperature and humidity values. It should be
noted that the measurement accuracy of the DHT11 sensor is generally low, with temperature
measurement errors reaching up to ±2℃ and humidity measurement errors up to ±5%RH. Additionally,
the sensor requires some time for stabilization and response, so calibration and delay handling may be
necessary in practical applications. This sensor can be used in HVAC, dehumidifiers, test and
measurement equipment, consumer goods, automotive, automatic control, data loggers, weather stations,
home appliances for temperature and humidity regulation, medical, and other related humidity detection
and control applications.
Figure 3. Internal Schematic and Physical Diagram of the DHT11 Digital Temperature and Humidity
Sensor
Table 1. Functional Parameters of the DHT11 Temperature and Humidity Module
Product Feature: Detects ambient humidity and temperature
Sensor: DHT11
Humidity Measurement Range: 20%-95% (0℃-50℃ range), ±5% error
Temperature Measurement Range: 0℃-50℃, ±2℃ error
Operating Voltage: 3.3V-5V
Output Type: Digital output
Mounting Bolt Hole: Yes (hole diameter 3.1mm, 10mm from edge)
PCB Size: 32mm × 14mm
Power Indicator Light: Red
Weight per Set: Approximately 8g
3.1.2. BH1750FV Light Sensor
The BH1750FV sensor integrates a photodiode and a digital signal processing chip internally. The
photodiode converts external light into an electrical signal, which is then amplified and filtered by the
digital signal processing chip before being converted into a digital signal. The BH1750FV sensor
supports various measurement modes and accuracy settings, allowing users to configure it according to
their needs. The sensor can operate in both continuous measurement and single measurement modes,
with mode and accuracy settings adjustable via the I2C interface. The measured light intensity values
are output to an external MCU through the I2C interface. The external MCU decodes and calculates the
digital signals received from the sensor to determine the environmental light intensity. It is important to
note that the BH1750FV sensor offers high measurement accuracy, with a resolution of up to 0.5 lx.
Additionally, the sensor has a rapid response time, providing stable light intensity measurements within
a short period.
Figure 4. Internal Schematic and Actual Image of the BH1750FV Light Sensor
Table 2. Basic Parameters of the BH1750 Light Intensity Sensor
Model: GY-302
Dimensions: 13.9mm × 18.5mm
Uses ROHM original BH1750FVI chip
Power Supply: 3-5V
Data Range: 0-65535
Built-in 16-bit ADC
Direct digital output, no complex calculations or calibration needed
3.1.3. Optical Dust Sensor (GP2Y1010AUOF)
The GP2Y1010AUOF is an infrared optical dust sensor that measures the concentration of dust in the
air and outputs an analog signal. The sensor contains an infrared transmitter and a receiver. The
transmitter emits infrared light into the air path through the sensor, and the airflow carries dust
particles into this path. The dust scatters the infrared light, and part of the scattered light reaches
the receiver, which converts it into an electrical signal that is amplified before being output.
The output signal of the GP2Y1010AUOF sensor has a linear relationship with the dust concentration,
allowing it to be converted into a dust concentration value after calibration and processing. Since the
sensor outputs an analog signal, it requires an ADC (Analog-to-Digital Converter) to convert it into a
digital signal for further processing and analysis.
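As a rough illustration of this conversion step, the sketch below turns a raw ADC count back into the sensor's output voltage. It assumes the 12-bit resolution of the STM32F103 ADC and a 3.3 V reference; the resistor divider ratio is hypothetical (the paper does not describe how the 5 V sensor output is scaled to the ADC input range), and the function name is illustrative.

```python
ADC_BITS = 12        # resolution of the STM32F103 ADC
V_REF = 3.3          # assumed ADC reference voltage, in volts
DIVIDER_RATIO = 2.0  # hypothetical: sensor output is halved before the ADC pin


def adc_to_sensor_volts(counts, v_ref=V_REF, bits=ADC_BITS,
                        divider_ratio=DIVIDER_RATIO):
    """Convert a raw ADC reading to the voltage at the sensor's output pin."""
    full_scale = (1 << bits) - 1  # 4095 for a 12-bit converter
    if not 0 <= counts <= full_scale:
        raise ValueError("ADC count out of range")
    pin_volts = counts / full_scale * v_ref  # voltage seen at the ADC pin
    return pin_volts * divider_ratio         # undo the external divider


print(round(adc_to_sensor_volts(1117), 3))
```

With these assumed values, one ADC step corresponds to roughly 1.6 mV at the sensor output, which bounds the resolution of any concentration derived from it.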
It is important to note that the GP2Y1010AUOF sensor has specific performance requirements for
measuring dust particles, including its measurement range, sensitivity, and response time. In practical
applications, the sensor needs to be selected and calibrated based on the specific use case. The
GP2Y1014AU0F variant of this sensor is capable of detecting reflected light from dust particles,
including very fine particles such as tobacco smoke. It is commonly used in air purification systems and
can measure particles as small as 0.8 micrometers, detecting smoke from tobacco, pollen, and household
dust.
Figure 5. Internal Schematic and Physical Diagram of the BH1750FV and GP2Y1010AUOF Sensors
Table 3. Basic Parameters of the GP2Y1014AU Dust Sensor
Power Supply Voltage: 5-7V
Operating Temperature: -10 to 65°C
Current Consumption: 20mA (Maximum)
Small Particle Detection Threshold: 0.8µm
Sensitivity: 0.5V/(0.1mg/m³)
Voltage in Clean Air: 0.9V (Typical)
Storage Temperature: -20 to 80°C
Lifespan: 5 years
Dimensions: 46mm × 30mm × 17.6mm
3.1.4. CO (MQ-7) Detection Sensor
The MQ-7 is a metal-oxide semiconductor carbon monoxide (CO) gas sensor: the resistance of its
gas-sensitive layer changes with the concentration of CO in the air, and the module outputs a
corresponding analog signal. Inside the MQ-7 sensor, a heating electrode heats the sensing element
to its working temperature, which makes it easier for CO gas to be adsorbed by the sensor.
Additionally, the MQ-7 sensor contains a CO-sensitive electrode, which is coated with a sensing
layer that can adsorb CO gas. When CO gas is adsorbed onto the surface of the sensitive electrode, it
causes a change in the electrode's resistance. The module's circuitry converts this resistance change
into an analog output voltage, which rises as the CO gas concentration increases; the relationship is
not strictly linear, so the output must be calibrated against known concentrations.
The sensor's output signal needs to be converted into a digital signal via an AD converter, and then
processed and analyzed by a microcontroller or similar control device. Typically, calibration and
adjustment of the sensor are necessary to ensure the accuracy and stability of its output signal. It is
important to note that the MQ-7 sensor operates within a temperature range of 0℃ to 50℃ and under a
relative humidity below 95% RH. In practical applications, it is crucial to select the appropriate sensor
and implement necessary environmental controls and calibration based on specific application scenarios.
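Calibration is usually expressed in terms of the sensing resistance rather than the raw voltage. The sketch below assumes the common arrangement in which the sensing element and a load resistor form a voltage divider across the 5 V circuit voltage, with the analog output taken across the load resistor. The paper does not give the circuit, so this topology, the 10 kΩ load value, and the function name are assumptions.

```python
def sensing_resistance(v_out, v_supply=5.0, r_load=10_000.0):
    """Sensing-element resistance in ohms from the divider output voltage.

    Assumes v_out is measured across r_load, so that
    v_out = v_supply * r_load / (r_sensor + r_load).
    """
    if not 0.0 < v_out < v_supply:
        raise ValueError("output voltage must lie strictly between 0 and supply")
    return r_load * (v_supply - v_out) / v_out


# Example voltages taken from the AO range in Table 4 (clean air vs. high CO).
print(sensing_resistance(0.2), sensing_resistance(4.0))
```

Under these assumptions the resistance drops by roughly two orders of magnitude between the clean-air and high-concentration output voltages listed in Table 4.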
Figure 6. CO Sensor Wiring Diagram
Table 4. MQ-7 Basic Parameters
Functionality Achieved: Testing program included with this version
Chip Used: AT89S52
Crystal Oscillator: 11.0592 MHz
Baud Rate: 9600
Electrical Performance:
Input Voltage: DC 5V
Power Consumption (Current): 150mA
DO Output: TTL digital signal 0 and 1 (0.1V and 5V)
AO Output: 0.1-0.3V (no pollution); approximately 4V at high concentration
Standard Testing Conditions:
Temperature: 20°C ± 2°C
Humidity: 65% ± 5% RH
Standard Testing Circuit: Vc: 5.0V ± 0.1V; VH: 5.0V ± 0.1V
3.2. Sensor Module Measurement Algorithms
3.2.1. MQ-7 CO Sensor Measurement Algorithm
The MQ-7 CO sensor detects CO concentration through the resistance change of its gas-sensitive
layer. The measurement principle is based on the reaction between CO and the oxygen adsorbed on the
heated sensing layer: the reaction lowers the layer's resistance, which raises the module's analog
output voltage. The measurement algorithm for this sensor typically uses linear interpolation between
calibration points to convert the sensor's output voltage into CO concentration values.
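A minimal sketch of such a linear interpolation is shown below. The calibration table is entirely hypothetical (the paper does not list calibration points); in practice the points would come from exposing the sensor to known CO concentrations. Voltages outside the table are clamped to the nearest calibrated value.

```python
from bisect import bisect_right

# Hypothetical calibration points: (output voltage in V, CO concentration in ppm).
CALIBRATION = [(0.2, 0.0), (1.0, 50.0), (2.5, 200.0), (4.0, 1000.0)]


def voltage_to_ppm(volts, table=CALIBRATION):
    """Piecewise-linear interpolation of CO concentration from output voltage."""
    voltages = [v for v, _ in table]
    if volts <= voltages[0]:
        return table[0][1]   # clamp below the first calibration point
    if volts >= voltages[-1]:
        return table[-1][1]  # clamp above the last calibration point
    i = bisect_right(voltages, volts)  # index of the first point above volts
    (v0, p0), (v1, p1) = table[i - 1], table[i]
    return p0 + (p1 - p0) * (volts - v0) / (v1 - v0)


print(voltage_to_ppm(0.6), voltage_to_ppm(1.75))
```

More calibration points give a closer fit where the sensor's response curves, at the cost of a longer calibration procedure.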
3.2.2. BH1750FV Light Sensor Measurement Algorithm
The BH1750FV light sensor employs a specialized optical sensing technology to accurately measure
ambient light intensity. The measurement algorithm for this sensor usually involves calibration methods,
converting the sensor's ADC output values into light intensity values. During calibration, external light
sources and photodiodes are used to determine the sensor's sensitivity and calibration parameters,
thereby improving measurement accuracy.
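For reference, the conversion from the sensor's 16-bit reading to lux can be sketched as follows. The divisor of 1.2 counts per lux is the default given in the BH1750FVI datasheet for the high-resolution modes; the extra halving applies to the 0.5 lx mode. The function and parameter names are illustrative and are not taken from the paper.

```python
COUNTS_PER_LUX = 1.2  # default conversion factor for the high-resolution modes


def bh1750_lux(high_byte, low_byte, half_lux_mode=False):
    """Combine the two bytes read over I2C and convert the result to lux.

    half_lux_mode=True corresponds to the 0.5 lx resolution mode, in which
    each count represents half as much light.
    """
    for value in (high_byte, low_byte):
        if not 0 <= value <= 0xFF:
            raise ValueError("each data byte must be in 0..255")
    raw = (high_byte << 8) | low_byte  # 16-bit reading, 0..65535
    lux = raw / COUNTS_PER_LUX
    return lux / 2 if half_lux_mode else lux


print(bh1750_lux(0x00, 0x78), bh1750_lux(0xFF, 0xFF))
```

The full-scale reading of 65535 in Table 2 therefore corresponds to about 54,612 lx in the default high-resolution mode.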
3.2.3. DHT11 Digital Temperature and Humidity Sensor Measurement Algorithm
The DHT11 digital temperature and humidity sensor uses a humidity-sensitive element and a
temperature sensor to measure environmental temperature and humidity. Each reading arrives as a
40-bit frame: two humidity bytes, two temperature bytes, and one checksum byte. The measurement
algorithm verifies the checksum (the low 8 bits of the sum of the four data bytes) and discards
frames that fail, which improves data reliability. Frames that pass are converted from raw humidity
and temperature bytes into actual humidity and temperature values.
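The verification and conversion of one DHT11 reading can be sketched as below. The sensor sends five bytes (humidity integer, humidity decimal, temperature integer, temperature decimal, check byte), and the check byte equals the low 8 bits of the sum of the first four. Treating the decimal bytes as tenths is an assumption here, since many DHT11 units always report zero in those bytes. The function name is illustrative.

```python
def decode_dht11_frame(frame):
    """Validate a 5-byte DHT11 frame and return (humidity_percent, temperature_c)."""
    if len(frame) != 5:
        raise ValueError("a DHT11 frame has exactly 5 bytes")
    if any(not 0 <= b <= 0xFF for b in frame):
        raise ValueError("frame values must be bytes")
    hum_int, hum_dec, temp_int, temp_dec, check = frame
    # The check byte is the low 8 bits of the sum of the four data bytes.
    if (hum_int + hum_dec + temp_int + temp_dec) & 0xFF != check:
        raise ValueError("checksum mismatch, discard this reading")
    humidity = hum_int + hum_dec / 10       # assumption: decimal byte is tenths
    temperature = temp_int + temp_dec / 10  # assumption: decimal byte is tenths
    return humidity, temperature


print(decode_dht11_frame([55, 0, 24, 0, 79]))
```

A frame that fails the check is typically dropped and the sensor is polled again after a short delay, which matches the delay handling mentioned in Section 3.1.1.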
3.2.4. Optical Dust Sensor GP2Y1010AUOF Measurement Algorithm
The GP2Y1010AUOF optical dust sensor utilizes a scattering optical principle to measure
environmental dust concentration. The measurement algorithm pulses the sensor's infrared LED,
samples the analog output voltage with the ADC during each pulse, and converts the sampled
voltage into dust concentration values. During the conversion process, numerical transformation and
calibration are performed based on the sensor's characteristics and calibration parameters to enhance
measurement precision and reliability.
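Using the figures in Table 3 (a sensitivity of 0.5 V per 0.1 mg/m³ and a typical clean-air output of 0.9 V), the conversion from output voltage to dust concentration can be sketched as a straight line through the clean-air point. The constants are the typical values from the table rather than calibrated values for a particular unit, and the function name is illustrative.

```python
CLEAN_AIR_VOLTS = 0.9        # typical output in clean air (Table 3)
VOLTS_PER_MG_M3 = 0.5 / 0.1  # sensitivity: 0.5 V per 0.1 mg/m^3 (Table 3)


def dust_density_ug_m3(v_out):
    """Estimate dust concentration in micrograms per cubic metre.

    Readings at or below the clean-air voltage are reported as zero.
    """
    excess = max(0.0, v_out - CLEAN_AIR_VOLTS)
    return excess / VOLTS_PER_MG_M3 * 1000.0  # mg/m^3 converted to ug/m^3


print(round(dust_density_ug_m3(1.9), 1))
```

Because the clean-air voltage varies from unit to unit, measuring it for the installed sensor and substituting that value for CLEAN_AIR_VOLTS would remove a constant offset from every reading.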
4. Control System Design (Simple Software and Hardware Block Diagram)
Microcontroller: The system uses an STM32 series microcontroller to achieve sensor data collection,
processing, and storage through programmed software. Timer interrupts are used for periodic data
collection, and the ADC module converts analog signals to digital form for data processing and storage.
The STM32F103C8T6 microcontroller, based on a 32-bit ARM Cortex-M3 core, features 64 KB of flash
memory, USB, CAN, seven timers, two ADCs, and nine communication interfaces. Its key
characteristics include strong anti-interference capabilities, making it widely applicable in household
appliances, industrial control, instrumentation, security alarms, and peripheral computer devices. The
STM32 microcontroller, known for its robust arithmetic processing capabilities, flexible software
programming, low power consumption, compact size, low cost, and mature technology, is extensively
used in various fields. Given our familiarity with this chip, we have chosen the STM32 for the system
control section.
Figure 7. STM32 Pinout Diagram
User Interface: The system employs an OLED screen to display monitored environmental data,
allowing users to intuitively view environmental monitoring data and trends.
Sensor Connection Method: The system utilizes the DHT11 digital temperature and humidity
sensor, BH1750FV light sensor, GP2Y1010AUOF optical dust sensor, and MQ-7 CO sensor for
environmental monitoring. The DHT11 and GP2Y1010AUOF sensors are directly connected to the
GPIO ports of the STM32 microcontroller. The BH1750FV sensor is connected via the I2C interface,
and the MQ-7 sensor is connected through the analog input port.
Power Supply: The system uses a 5V voltage regulator and an external power adapter to ensure
stability and reliability.
Overall, the control system is designed for efficient and stable operation. Sound circuit design and
programming keep the system running reliably, the sensor measurement algorithms described in
Section 3.2 keep the collected data accurate, and the OLED screen presents the monitoring data
clearly so that users can follow environmental values and trends. This provides a solid foundation
for subsequent data analysis and applications.
5. Conclusion
This environmental monitoring system, based on the STM32 platform, integrates monitoring functions
for CO, PM2.5, temperature and humidity, and light intensity. It employs high-precision sensors and
optimized measurement algorithms to ensure accurate and reliable data collection and processing. The
control system uses an STM32 microcontroller, equipped with an OLED screen and various data
transmission methods, to provide an intuitive user interface and diverse data transmission options.
Experimental validation confirms the system's advantages in real-time performance, high precision, and
low power consumption, demonstrating its broad application and promotional value. Future
improvements will focus on enhancing design and performance, increasing system flexibility and
scalability, and providing more comprehensive technical support for the environmental monitoring field.
The system can be widely applied in households, industrial settings, medical environments, and public
spaces to help monitor and improve indoor and workplace environments, thus enhancing quality of life
and health levels.
Future Prospects: Consider incorporating additional environmental parameters such as noise and
vibration to meet diverse monitoring needs. Further optimization of the control system design could
enhance scalability and reliability, enabling additional functions and application scenarios. Integration
with IoT technology for remote monitoring and control can be explored to offer more application
scenarios in smart cities and smart homes.
Improvements: While the system currently collects and stores various environmental parameter data,
future development will focus on data processing and analysis to extract valuable information and
patterns. Incorporating technologies such as artificial intelligence and big data to build models and
algorithms for analyzing and predicting environmental data can increase the application value of
monitoring data.
System Reliability and Maintenance: The reliability and maintenance of the environmental
monitoring system are crucial for future development. Technologies such as automatic calibration and
fault diagnosis can be introduced to enhance system reliability and stability. Maintenance can be
streamlined through remote upgrades and automatic fault alarms, reducing manpower and time costs.
Application Scenarios and Market Prospects: The current application scenarios of environmental
monitoring systems include smart homes, industrial production, and urban environmental protection.
There is a broader market potential for future applications. Market research and technological innovation
can uncover new application scenarios and business models, expanding the system's market prospects.
System Security: Given the sensitivity of the data collected by the environmental monitoring system,
ensuring system security is essential. Measures such as encryption and access control can be
implemented to protect data confidentiality and integrity, preventing data leaks and tampering.
In conclusion, continuous technological innovation and application expansion are necessary for the
future development of this environmental monitoring system. Enhancing system performance and
reliability while addressing market trends and application scenarios will contribute significantly to the
advancement of the environmental monitoring field.
Yuhe Tie and Peiming Chen: Conceptualization, Methodology, Software, Resources, Writing - Original
draft preparation, Visualization, Investigation.
References
[1] Sengupta, A., & Sharma, P. (2020). Design and Development of Environmental Monitoring
System Using IoT Technology. International Journal of Engineering Research & Technology,
13(2), 45-50. https://www.ijert.org/
[2] Khan, M. M., & Qureshi, I. A. (2018). STM32 Microcontroller for Industrial Applications.
International Journal of Computer Applications, 182(29), 26-32. https://www.researchgate.net/
[3] Zhang, Z., Liu, X., & Xu, L. (2019). A Review of Environmental Monitoring Technologies for
IoT Applications. Sensors, 19(24), 5411. https://www.mdpi.com/
[4] Wang, Y., & Xie, M. (2021). Smart Environmental Monitoring Systems Using Sensor Networks:
A Survey. Sensors, 21(10), 3401. https://www.mdpi.com/
[5] Kang, S., Kim, Y., & Park, S. (2021). Design and Implementation of an IoT-Based Temperature
and Humidity Monitoring System Using STM32 Microcontroller. Sensors, 21(1), 141.
Spacecraft design for interstellar travel
Leyan Ouyang
Department of Physics, King’s College London, London, WC2R 2LS, United Kingdom
k22038659@kcl.ac.uk
Abstract. This paper introduces and explains potential mechanisms for developing a spacecraft
that is both sustainable and optimized for interstellar travel to the Andromeda galaxy, with
the primary objective of scouting for potentially habitable exoplanets suitable for human
colonization. The conceptual framework comprises an analysis of the critical components
essential to the functionality and longevity of the spacecraft, including but not limited to
propulsion systems, attitude control mechanisms, and advanced navigation systems. In addition,
habitable zones, also called Goldilocks zones, are the regions around stars where planetary
conditions are conducive to life. Because liquid water is regarded as the fundamental
prerequisite for supporting life, the basic criterion for a habitable planet is a temperature
that sustains liquid water. The final analysis argues that space migration is an inevitable
step for humanity and that effective measures should be taken to investigate possible ways
of migrating to another planet.
Keywords: Interstellar travel, Space vehicles, Navigation, Habitability.
1. Introduction
A breakthrough in astronomy was the development of the theory of stellar evolution during the
20th century. Ejnar Hertzsprung and Henry Norris Russell were the first to plot the stars’
temperatures against their brightness [1]. This method stimulated the investigation of stellar
evolution. The discovery that stars evolve over time is the primary impetus for future interstellar
travel projects. As a main sequence star ages, it uses up the hydrogen in its core and starts fusing
helium. During this process, the internal structure of the star undergoes a series of changes. As a
result, the star becomes hotter, larger, and brighter as it leaves its main sequence phase [2]. The
Sun is currently in the middle of its main sequence life, so it will eventually become hotter and
much larger as a red giant. Unfortunately, Earth's orbit will be swallowed by that future red giant,
so human beings need to find a new home before this happens. This implies that humans must address
the question of how to travel between planets in different planetary systems.
In this paper, I introduce several important questions to consider when designing spacecraft
capable of intergalactic travel, as well as methods to address them. How will the spacecraft store
enough power for long-term travel? How will it control its orientation? How will it determine its
position and velocity relative to celestial objects? How can it identify solar systems at which to
refuel during the trip, and find habitable planets to immigrate to? Possible answers to all of
these questions are discussed in this paper. Most of the spacecraft’s energy supply will come from
photoelectric panels, which convert stellar radiation into usable electrical energy. The attitude
control system is based on the solution to the falling cat problem, in which a body can turn
without an external torque acting on it. The navigation system relies on telescopes that observe
radiation from other star systems to determine the identity and position of each star relative to
the spacecraft.
DOI: 10.54254/2753-8818/41/20240518
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
2. Design Overview
The energy, attitude control and navigation systems of the spacecraft are designed to be
sustainable over long-distance interstellar travel. The design takes the solar system as an example
of a generic star-planetary system that the ship will encounter during the journey. Moreover,
star-planetary systems are extremely sparsely scattered in the universe compared with their sizes.
Thus, when the spacecraft is travelling through interstellar space, it can coast with very little
energy consumption, without the need to accelerate, decelerate or steer.
3. Power Source
3.1. Introduction of Power Source
The power system is one of the most important parts of the spacecraft. To be sustainable over
long-distance journeys, it must be renewable and durable. Although sending human astronauts far
away requires a great amount of energy, the universe itself can supply it, since it contains many
kinds of high-energy objects that radiate power into the surrounding space. The real problem is how
to make use of this nearly unlimited energy. Total energy is conserved, but it is scattered across
space in various forms. The challenge for a long-term spacecraft is to gather enough energy when
approaching a nearby source and to consume little while drifting through the vast space between two
sources. This idea is the core of the power system design. The main strategy for extending the
usable life of the power supply is to gather as much energy as possible at the starting point and
to consume as little as possible until the next source is reached, so that the stored energy
suffices for another long leg of the journey. Figure 1 shows the cycle between “energy gathering”
and “energy consuming”. The spacecraft alternates between these two states.
3.2. Energy Gathering
Figure 1. The visual circulation between energy gathering and energy consuming.
Figure 2. Schematic diagram of the solar system where the astronomical objects are presented to scale.
The central problem of energy gathering is fuel. Although the universe is the most abundant energy
reservoir, it is also vast and mostly empty, which makes available energy sources difficult to
find. A large amount of non-renewable energy exists as bodies of liquid or solid matter. These
attracted each other and formed planets, which occupy only a tiny fraction of space. In fact, the
fraction is so small that it is hard even to draw the planets in a graph to scale. Figure 2 is a
schematic diagram of the solar system in which the astronomical objects are presented to scale. The
radius of the cylinder is thirty arbitrary units, which represents the distance between the Sun and
Neptune, the outermost planet of the solar system. Even the biggest object in the solar system, the
Sun, is hard to see in this cylindrical space; all that can be seen is emptiness. However, by
magnifying the radii of the planets and contracting the distances between them, a recognisable
image of the solar system that is not to scale can be drawn, as in Figure 3.
Figure 3. Schematic diagram of the solar system where the astronomical objects are not to scale.
Table 1. Sizes and distances of the solar system [3].
Astronomical object   Distance to the Sun (AU)   Distance to the Sun (km)   Diameter (km)
Sun                   -                          -                          1,391,400
Mercury               0.39                       57,900,000                 4,879
Venus                 0.72                       108,200,000                12,104
Earth                 1                          149,600,000                12,756
Mars                  1.52                       227,900,000                6,792
Jupiter               5.2                        778,600,000                142,984
Saturn                9.54                       1,433,500,000              120,536
Uranus                19.2                       2,872,500,000              51,118
Neptune               30.06                      4,495,100,000              49,528
The measured sizes of the astronomical objects and the distances between them are listed in Table
1. Using Table 1, it is simple to estimate the ratio R between the volume of the astronomical
objects and the volume of vacancy in the solar system. The asteroids and other small bodies are
relatively small and hard to add together, so they are neglected in the calculation. With $r_i$
the radius of object i in Table 1:
$R = \frac{V_{\text{objects}}}{V_{\text{vacancy}}}$ (1)
$R = \frac{\sum_i \frac{4}{3}\pi r_i^3}{V_{\text{vacancy}}}$ (2)
$R \approx 10^{-6}$ (3)
This number implies that, on average, there is approximately one cubic kilometer of matter per
million cubic kilometers of empty space in the solar system. In the vast universe, this illustrates
the difficulty of finding non-renewable energy sources such as coal and natural gas. Compared with
other types of energy, stellar radiation is the easiest to find under such conditions. The journey
will therefore rely mostly on collected solar energy.
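As a rough check of this kind of estimate, the Python sketch below sums the volumes of the bodies in Table 1, treating each as a sphere, and divides by the empty part of a bounding volume. The bounding volume is supplied by the caller because the ratio depends strongly on that choice. The function names and the cylinder height are assumptions of this example and do not come from the paper.

```python
import math

# Diameters in km, taken from Table 1.
DIAMETERS_KM = {
    "Sun": 1_391_400, "Mercury": 4_879, "Venus": 12_104, "Earth": 12_756,
    "Mars": 6_792, "Jupiter": 142_984, "Saturn": 120_536,
    "Uranus": 51_118, "Neptune": 49_528,
}
NEPTUNE_DISTANCE_KM = 4_495_100_000  # Sun-Neptune distance from Table 1


def sphere_volume(diameter_km: float) -> float:
    """Volume of a sphere given its diameter: pi * d^3 / 6."""
    return math.pi * diameter_km ** 3 / 6


def total_body_volume() -> float:
    """Summed volume of the Sun and the eight planets (asteroids neglected)."""
    return sum(sphere_volume(d) for d in DIAMETERS_KM.values())


def cylinder_volume(radius_km: float, height_km: float) -> float:
    """Volume of a cylinder; the height is a free modelling choice (assumption)."""
    return math.pi * radius_km ** 2 * height_km


def volume_ratio(bounding_volume_km3: float) -> float:
    """R = V_objects / V_vacancy, with V_vacancy = bounding volume - V_objects."""
    v_objects = total_body_volume()
    return v_objects / (bounding_volume_km3 - v_objects)


if __name__ == "__main__":
    # Hypothetical choice: a cylinder as tall as it is wide.
    height = 2 * NEPTUNE_DISTANCE_KM
    bound = cylinder_volume(NEPTUNE_DISTANCE_KM, height)
    print(f"V_objects = {total_body_volume():.3e} km^3")
    print(f"R = {volume_ratio(bound):.3e}")
```

Because the Sun alone accounts for more than 99% of the summed volume, the result is governed by the choice of bounding volume rather than by the planets.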
Figure 4. Structure of the outer layer of the space craft and the photovoltaic panel.
Photovoltaic technology is the most widely used technology for converting solar radiation into
electrical energy [4]. To absorb stellar radiation, the outermost layer of the spacecraft will be
covered in photovoltaic panels that can be exposed to the star. However, while traveling, the
outermost layer of the spacecraft can be degraded by various hazards. Building the outer layer
solely from photovoltaic panels would expose the panels to potentially hazardous collisions and
could lead to fatal consequences. To protect the panels, the outer layer should be movable, so that
it exposes the large layer of photovoltaic panels to the stellar radiation only when conditions are
suitable for energy collection. Figure 4 illustrates a possible design of the solar panels and the
protective outer layer, in which the outer layer moves away and leaves the solar panels facing the
star.
Figure 5. A model for photovoltaic panels around the space craft.
The spaceship can take any orientation in space and spins about its own axis, so the radiation of a
star can arrive from many directions while the spacecraft is in orbit around it. The design must
therefore place photovoltaic panels facing each direction. Another reason for such a design is the
possibility of encountering a multiple star system, where radiation comes from several directions
at once. Assuming that the spacecraft has the shape of an ellipsoid, there should be panels at all
angles. Figure 5 provides a possible model for the arrangement of the photovoltaic panels. There
are sixteen panels in total: two main panels cover the top and bottom, and the others are arranged
around the sides.
3.3. Energy Consuming
The energy gathered from the stars is consumed in several ways. The major expenditure is
propulsion: the spacecraft needs a propulsion system to accelerate and to change direction. Other
expenditures include the daily electricity supply and the other operating systems on the
spacecraft. The energy consumption is hard to quantify because of the highly uncertain conditions
of space travel. For instance, the actual distances to be travelled between stars have large
uncertainties. The number of passengers and the amount of resources carried must not be overlooked
either, because the mass of the spaceship has a significant effect on energy consumption. Although
too many quantities are undetermined for a precise estimate, it is certain that the energy storage
capacity must be large enough for a long-term journey.
4. Attitude Control System
Changing attitude is a hard task when the spacecraft is operating in space. However, it is vital in
space travel in order to avoid colliding with a star or another astronomical object. Without air as
a medium, there is no external force that can help the spacecraft to change direction. The key
technique for the design of the attitude control system lies in the falling cat problem.
Figure 6. Photo in 1969 that helped explain the falling cat problem [5].
In 1882, the French scientist Étienne-Jules Marey invented the first chronophotographic gun, and he
later published a sequence of photographs of a falling cat. This drew scientists’ attention to the
falling cat problem. Cats can adjust their orientation in mid-air and land on their feet, no matter
which way they faced at the start of the fall. The phenomenon seemed incomprehensible in light of
the law of conservation of angular momentum: it appears impossible for a cat to change its
orientation in the air without an external torque acting on it. It was not until 1969 that Thomas
Kane, an engineer at Stanford, explained the physics behind the phenomenon. He presented a solution
to the problem by modeling the cat’s upper and lower body as two cylinders and simulating the fall
on a computer [6]. Figure 6 is the photo that helped NASA to investigate, in 1969, how astronauts
could change direction in space.
Figure 7. Model that simulates the spacecraft’s initial status before direction changes.
The trick that cats use when falling lies in the way they move their bodies. First, they tuck their
front legs against their bodies and extend their hind legs. While the upper body rotates by, for
instance, 190 degrees clockwise, the lower body must rotate counterclockwise by 10 degrees because
of the conservation of angular momentum. In the next step, they tuck their hind legs, extend their
front legs, and rotate their lower bodies by 190 degrees clockwise. Similarly, their upper bodies
must rotate counterclockwise by 10 degrees so that the total angular momentum remains zero. In this
way both halves end up rotated by 180 degrees, and angular momentum is conserved without any
external torque [7]. The same principle can be applied to the attitude control system of the
spacecraft. When the spacecraft needs to adjust its orientation, the front section decreases its
moment of inertia by contracting its radius and turns through an angle slightly greater than the
intended angle, while the back section increases its moment of inertia and turns through a small
angle in the opposite direction. After the front section reaches this position, it increases its
moment of inertia, and the back section decreases its moment of inertia and turns to the intended
angle. The front section then experiences a torque in the opposite direction and settles at the
intended angle. Figure 7 shows a simulated model of the spacecraft’s initial state before the
attitude change. The entire attitude change is modeled in four steps: Figures 8, 9, 10 and 11 show
the first, second, third and final steps of the attitude control process, respectively.
Figure 8. The first working process of attitude control.
Figure 9. The second working process of changing directions.
Figure 10. The third working process of changing directions.
Figure 11. The fourth working process of changing directions.
The final angle that the spacecraft has turned is given by equation (4).
(4)
5. Navigation System
In space, it is essential to know where the spacecraft is located and where it is heading. On
Earth, people determine where they are by looking at signs and landmarks around them. However, it
is not simple to locate the spacecraft in space, where little other than asteroids and a few nearby
astronomical objects can be seen by eye. The key to determining the location is to find “landmarks”
as objects of reference. The most effective objects of reference in space are stars. Stars emit
strong radiation that can be captured by telescopes, and the vacuum of space provides excellent
observing conditions. Telescopes can be set up to face different directions of the sky and
determine which direction the radiation is coming from. The precondition for using stars as
“landmarks” is that the radiation from different stars differs enough for each star to be
recognized. This precondition is known to hold.
Figure 12. Absorption and emission spectra of different elements [8].
Different stars have different compositions, which produce characteristic features in their
spectra. When radiation from a star passes through a cloud of cool gas, the elements in the gas
absorb light at certain frequencies, producing absorption lines. A cloud of hot gas may also emit
radiation at characteristic frequencies, producing an emission spectrum. Absorption and emission
lines are the key for scientists to determine the composition of a star. Figure 12 shows the
absorption and emission spectra of several elements.
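A minimal Python sketch of this idea compares a list of observed line wavelengths with reference lines of a few elements. The reference values are the well-known visible hydrogen Balmer lines, the sodium D doublet and the ionized calcium H and K lines, rounded to 0.1 nm. The function names, the tolerance and the threshold are assumptions made for this example only.

```python
# Reference wavelengths in nanometres (rounded to 0.1 nm).
REFERENCE_LINES_NM = {
    "calcium (ionized)": [393.4, 396.8],
    "hydrogen": [656.3, 486.1, 434.0, 410.2],
    "sodium": [589.0, 589.6],
}


def match_fraction(observed_nm, reference_nm, tolerance_nm=0.3):
    """Fraction of the reference lines that have an observed line within tolerance."""
    hits = sum(
        any(abs(obs - ref) <= tolerance_nm for obs in observed_nm)
        for ref in reference_nm
    )
    return hits / len(reference_nm)


def identify_elements(observed_nm, threshold=0.75, tolerance_nm=0.3):
    """Names of the elements whose reference lines are mostly present, sorted."""
    return sorted(
        name
        for name, lines in REFERENCE_LINES_NM.items()
        if match_fraction(observed_nm, lines, tolerance_nm) >= threshold
    )


if __name__ == "__main__":
    observed = [656.28, 486.13, 434.05, 589.0, 589.59]  # hypothetical measurement
    print(identify_elements(observed))
```

A real navigation system would also have to correct for the Doppler shift caused by the relative velocity between the spacecraft and the star before matching lines.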
Figure 13. Optical spectra for Proxima Centauri at 1 AU from the star [9].
Figure 13 shows the optical spectrum of Proxima Centauri at 1 AU from the star. For instance, if
the spacecraft is 1 AU from Proxima Centauri, its telescope should record a similar spectrum from
the direction in which Proxima Centauri lies. Similarly, wherever the spacecraft is, its telescopes
can capture the spectra of other stars and use them to navigate.
6. Habitable Zones
It is essential for the spacecraft to stay at a radiation level from which its outer shell can
provide shelter. That is why a safety zone must be calculated before reaching a star system. The
habitable zone is a reasonable region for the spacecraft to stay in: there, the photoelectric
panels receive enough stellar radiation to convert into energy, and at the same time the spacecraft
can gather information about Earth-like planets for possible future immigration and additional
resources.
The habitable zone of a star is the range of distances within which the equilibrium temperatures of
the surrounding planets allow water to remain liquid on their surfaces. To account for the
greenhouse effect, the equilibrium temperature of a planet inside the habitable zone should lie
between 175 K and 270 K [10]. The stellar luminosity L is given by the Stefan-Boltzmann law, and
the flux F reaching the planet follows from its distance d to the star. $R_s$ and $T_s$ are the
radius and surface temperature of the star, respectively, and $\sigma$ is the Stefan-Boltzmann
constant.
$L = 4\pi R_s^2 \sigma T_s^4$ (5)
$F = \frac{L}{4\pi d^2} = \sigma T_s^4 \left(\frac{R_s}{d}\right)^2$ (6)
When a planet reaches its equilibrium temperature, it absorbs and emits the same amount of
radiation [11]; that is, the rate of energy absorption equals the rate of energy emission. The
following equations give the absorption rate $P_{\text{abs}}$ and the re-radiation rate
$P_{\text{emit}}$. A and $R_p$ are the albedo and radius of the planet, and $T_{eq}$ is the
equilibrium temperature of the planet.
$P_{\text{abs}} = (1-A)\,\pi R_p^2 F = (1-A)\,\pi R_p^2\, \sigma T_s^4 \left(\frac{R_s}{d}\right)^2$ (7)
$P_{\text{emit}} = 4\pi R_p^2 \sigma T_{eq}^4$ (8)
By combining equation (7) and equation (8), the equilibrium temperature can be expressed by
equation (9).
$T_{eq} = T_s (1-A)^{1/4} \left(\frac{R_s}{2d}\right)^{1/2}$ (9)
Solving for the distance to the star, d, gives equation (10).
$d = \frac{R_s T_s^2 (1-A)^{1/2}}{2 T_{eq}^2}$ (10)
Figures 14-16 provide an estimation for both outer and inner boundaries of habitable zones.
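The boundary estimate can be illustrated with a short Python sketch that balances absorbed and re-emitted power for a planet, as in the derivation above. It assumes a solar surface temperature of 5772 K and an Earth-like albedo of 0.3, neither of which is given in the paper; the solar radius and the astronomical unit are taken from Table 1. The function names are hypothetical.

```python
import math

R_SUN_KM = 695_700.0     # half of the solar diameter in Table 1
AU_KM = 149_600_000.0    # Earth-Sun distance in Table 1
T_SUN_K = 5772.0         # assumed solar surface temperature (not from the paper)


def equilibrium_temperature(t_star, r_star, distance, albedo):
    """T_eq = T_s * (1 - A)^(1/4) * sqrt(R_s / (2 d)); lengths in the same unit."""
    return t_star * (1 - albedo) ** 0.25 * math.sqrt(r_star / (2 * distance))


def distance_for_temperature(t_star, r_star, t_eq, albedo):
    """d = R_s * T_s^2 * sqrt(1 - A) / (2 * T_eq^2), the inverse of the function above."""
    return r_star * t_star ** 2 * math.sqrt(1 - albedo) / (2 * t_eq ** 2)


def habitable_zone(t_star, r_star, albedo, t_min=175.0, t_max=270.0):
    """(inner, outer) boundary distances for the 175 K to 270 K range quoted in the text."""
    inner = distance_for_temperature(t_star, r_star, t_max, albedo)
    outer = distance_for_temperature(t_star, r_star, t_min, albedo)
    return inner, outer


if __name__ == "__main__":
    inner, outer = habitable_zone(T_SUN_K, R_SUN_KM, albedo=0.3)
    print(f"inner boundary: {inner / AU_KM:.2f} AU")
    print(f"outer boundary: {outer / AU_KM:.2f} AU")
    t_earth = equilibrium_temperature(T_SUN_K, R_SUN_KM, AU_KM, 0.3)
    print(f"equilibrium temperature at 1 AU: {t_earth:.0f} K")
```

Since both boundaries scale with $R_s T_s^2$, increasing either the radius or the temperature of the star moves the whole zone outward and widens it.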
Figure 14. Outer and Inner Boundaries of Habitable Zones in respect to Star’s Radius and
Temperature.
Figure 15. Outer and Inner Boundaries of Habitable Zones in respect to Star’s Radius.
Figure 16. Outer and Inner Boundaries of Habitable Zones in respect to Temperature.
Figures 14-16 show the distances from the star to the outer and inner boundaries of the habitable
zone as functions of the radius and temperature of the star. The upper surface shows the distance
to the outer boundary, and the lower surface shows the distance to the inner boundary. The width of
the habitable zone increases as the radius and temperature of the star increase. At the same time,
the inner boundary moves farther from the star. This implies that, for a main sequence star that is
growing larger and hotter, the habitable zone gradually moves away from the star, and hence that
Earth will one day be uninhabitable for humans. Immigration and space travel are therefore an
unavoidable future task.
7. Conclusion
Space immigration will be an unavoidable topic for the long-term development of humankind. As shown
in Section 6, the boundaries of the habitable zone of a main sequence star move outward as the star
ages and its temperature and radius increase. During the process of stellar evolution, the Earth
will one day lie outside the habitable zone of the Sun. This will result in the extinction of all
life on Earth unless humans have the ability to travel to other stellar systems by then. As
technology develops, interstellar travel will no longer be purely theoretical. This paper proposed
some potential designs for future interstellar travel programs and explained some of the theory
behind them, together with a method to estimate the boundaries of the habitable zones of stars. The
proposed solutions for the power, attitude control and navigation systems are photoelectric panels,
a manoeuvre based on the falling cat problem, and the identification of nearby stars, respectively.
Future investigations can build on each part of the paper to create a spacecraft suitable for
interstellar travel. The approaches outlined in this paper for constructing a viable spacecraft
capable of intergalactic travel are important because travel between galaxies will be unavoidable
if human beings wish to survive in this universe in the long term.
References
[1] Christensen, L.L., Hainaut, O., & Pierce-Price, D.P. (2014). What Determines the Aesthetic
Appeal of Astronomical Images. CAPjournal. No.14: 2024.
[2] Perryman, M. (2021). Stellar Structure and Evolution. Fundamentals of Astrophysics. Choice
Reviews. https://doi.org/10.5860/choice.27-6327
[3] California Institute of Technology. (2019). Solar System Sizes and Distances Reference Guide.
https://www.jpl.nasa.gov/edu/pdfs/scaless_reference.pdf
[4] Fares, M.A., Atik, L., Bachir, G., & Aillerie, M. (2017). Photovoltaic panels characterization
and experimental testing. Energy Procedia, 119, 945-952.
[5] Crane, T. (2021). The ‘falling cat’ phenomenon that helped NASA prepare astronauts for zero
gravity, 1969. https://www.libraryhistt.com/2022/10/the-falling-cat-phenomenon-that-helped.html
[6] Kane, T.R., & Scher, M. (1969). A dynamical explanation of the falling cat phenomenon.
International Journal of Solids and Structures, 5, 663-670.
[7] Essén, H., & Nordmark, A.B. (2018). A simple model for the falling cat problem. European
Journal of Physics, 39.
[8] Richmond, M., Rochester Institute of Technology.
spiff.rit.edu/classes/phys301/lectures/spectra/spec_rev_orientation.gif
[9] Meadows et al. (2018). Proxima Centauri Spectrum.
vpl.astro.washington.edu/spectra/stellar/proxcen.htm
[10] Kaltenegger, L., & Sasselov, D.D. (2011). Exploring the Habitable Zone for Kepler Planetary
Candidates. The Astrophysical Journal Letters, 736.
[11] Del Genio, A.D., Kiang, N.Y., Way, M.J., Amundsen, D.S., Sohl, L.E., Fujii, Y., Chandler,
M.A., Aleinov, I., Colose, C.M., Guzewich, S.D., & Kelley, M. (2018). Albedos,
Equilibrium Temperatures, and Surface Temperatures of Habitable Planets. The
Astrophysical Journal, 884.
Review on application of fractional Fourier transform in
Linear Frequency Modulation signal and communication
system
Zhuoran Wang
Leicester International Institute, Dalian University of Technology, Panjin, 116024,
China
2076334726@qq.com
Abstract. The traditional Fourier transform is often applied to analyze and process stationary
signals; however, it performs poorly on time-varying, non-stationary signals, which the fractional
Fourier transform (FRFT) handles better. The FRFT can be understood as the representation of a
signal in the fractional Fourier domain, obtained by rotating the coordinate axes counterclockwise
about the origin by an arbitrary angle in the time-frequency plane. In this paper, the improved
fractional Fourier transform is combined with other calculation methods to achieve high-precision
estimation of chirp signal parameters. Communication systems built on the weighted fractional
Fourier transform and the discrete fractional Fourier transform are studied and simulated
respectively, which verifies their feasibility and the improved anti-jamming and anti-interception
ability of the communication system.
Keywords: communication system, fractional Fourier transform, chirp signal.
1. Introduction
Namias first proposed the theory of the fractional Fourier transform in 1980. He arrived at the
idea from a mathematical point of view and applied it to the solution of differential equations.
McBride et al. then gave a stricter definition based on Namias's work and expressed the fractional
Fourier transform in integral form [1]. In 1993, Mendlovic and Ozaktas went beyond purely
mathematical research and implemented the fractional Fourier transform with optical methods, which
have since been widely used in optical signal processing [2]. However, because the fractional
Fourier transform then lacked a clear physical meaning and a fast algorithm, its considerable
potential in signal processing could not be fully exploited. In 1993, Almeida clarified its
physical meaning: the fractional Fourier transform rotates the signal representation by a certain
angle in the time-frequency plane, so it contains information about the signal in both the time
domain and the frequency domain and is therefore a time-frequency analysis method [3]. In 1996,
Ozaktas and other scholars proposed a discrete algorithm for the fractional Fourier transform whose
computational load is very small, comparable to that of the fast Fourier transform (FFT) [4]. Since
then, the fractional Fourier transform has drawn the attention of signal processing researchers at
home and abroad, and many results have gradually emerged. Compared with the traditional Fourier
transform, the fractional Fourier transform is more flexible and has been applied in many areas,
such as time-frequency analysis, time-frequency filtering, quantum mechanics, artificial neural
networks, sweep filters, and optical image processing.
DOI: 10.54254/2753-8818/41/20240504
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
In 1981, the French geophysicist Morlet found, while analyzing artificial seismic exploration
signals, that such signals require high frequency resolution in the low-frequency band, whereas the
frequency resolution can be low in the high-frequency band. Because of this feature of seismic
signals, Morlet proposed the concept of the wavelet transform and gave a definition [5]. The
traditional Fourier transform is flawed when handling non-stationary signals: it reveals only which
frequencies a signal contains overall, not when each component appears. Two signals that are very
different in the time domain may therefore have the same spectrum. Since the wavelet transform
gives a signal different resolutions at different positions of the time-frequency plane, the signal
can be analyzed at multiple resolutions. This multi-resolution property gives very good
time-frequency localization, allows the signal to be examined from coarse to fine, and makes signal
analysis and observation more convenient, which overcomes the shortcomings of the traditional
Fourier transform. In recent years, the wavelet transform has therefore not only produced
significant theoretical results but has also been applied in many engineering fields, such as
signal processing, speech recognition, image processing and analysis, analytical chemistry, and
biomedicine. Research on the wavelet transform continues despite this wide application, and further
study of its theory will bring new applications to various fields.
This paper presents some fundamental theorems about Fourier transform, and studies some recent
applications of fractional Fourier transform in secure communication and parameter estimation of chirp
signals.
2. Relevant theory
2.1. Definition
Let f(t) be a periodic function of t with period 2T that satisfies the Dirichlet conditions: within
one period, f(t) is continuous or has only a finite number of discontinuities of the first kind; it
is monotonic or can be divided into a finite number of monotonic intervals, that is, it has a
finite number of extreme points; and it is absolutely integrable. Then the Fourier series of f(t)
converges, its sum S(t) is also a periodic function with period 2T, and S(t) is finite at the
discontinuities.
The Fourier transform of x(t):
$X(\omega) = \mathcal{F}[x(t)] = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt$ (3.1.1)
Inverse transform:
$x(t) = \mathcal{F}^{-1}[X(\omega)] = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega$ (3.1.2)
$X(\omega)$: the image function of $x(t)$
$x(t)$: the preimage function of $X(\omega)$
2.2. Deduction
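(1) The Fourier series of a periodic function f(t) with period T is defined as:
$f(t) = \sum_{n=-\infty}^{\infty} F_n\, e^{j 2\pi n t / T}$ (3.2.1)
$f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos\frac{2\pi n t}{T} + b_n \sin\frac{2\pi n t}{T} \right)$ (for real-valued functions) (3.2.2)
Fourier expansion coefficient:
$F_n = \frac{1}{T} \int_{-T/2}^{T/2} f(t)\, e^{-j 2\pi n t / T}\, dt$ (3.2.3)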
Periodic signals can be expanded into a Fourier series only if the Dirichlet conditions are
satisfied. The Dirichlet conditions are as follows:
Within a period, the signal is continuous or has only a finite number of discontinuities of the first kind.
The number of maxima and minima within a period is finite.
Within a period, the signal is absolutely integrable.
Now assume that a function f(t) is made up of a direct current (DC) component and several cosine
functions, as shown in equation (3.2.4), where $\omega = 2\pi/T$ is the fundamental angular
frequency.
$f(t) = c_0 + \sum_{n=1}^{\infty} c_n \cos(n\omega t + \varphi_n)$ (3.2.4)
Using the angle-sum identity for the cosine, the above equation can be rewritten as (3.2.5):
$f(t) = c_0 + \sum_{n=1}^{\infty} \left[ c_n \cos\varphi_n \cos(n\omega t) - c_n \sin\varphi_n \sin(n\omega t) \right]$ (3.2.5)
Let:
$a_n = c_n \cos\varphi_n$ (3.2.6)
$b_n = -c_n \sin\varphi_n$ (3.2.7)
Then formula (3.2.5) can be written:
$f(t) = c_0 + \sum_{n=1}^{\infty} \left[ a_n \cos(n\omega t) + b_n \sin(n\omega t) \right]$ (3.2.8)
Formula (3.2.8) is the Fourier series expansion. Expanding a periodic signal into a Fourier series
therefore amounts to determining the coefficients $a_n$ and $b_n$.
Multiply both sides of equation (3.2.8) by $\cos(k\omega t)$ and integrate over one period:
$\int_0^T f(t)\cos(k\omega t)\,dt = \int_0^T c_0 \cos(k\omega t)\,dt + \int_0^T \cos(k\omega t) \sum_{n=1}^{\infty} \left[ a_n \cos(n\omega t) + b_n \sin(n\omega t) \right] dt$ (3.2.9)
By the orthogonality of the trigonometric functions, only the cosine term with n = k survives, so
equation (3.2.9) can be further simplified as:
$\int_0^T f(t)\cos(k\omega t)\,dt = a_k \int_0^T \cos^2(k\omega t)\,dt = \frac{T}{2}\, a_k$ (3.2.10)
So it can be concluded that:
$a_n = \frac{2}{T} \int_0^T f(t)\cos(n\omega t)\,dt$ (3.2.11)
In the same way:
$b_n = \frac{2}{T} \int_0^T f(t)\sin(n\omega t)\,dt$ (3.2.12)
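The coefficient formulas can be checked numerically. The Python sketch below approximates the integrals over one period by a uniform Riemann sum and recovers the coefficients of a test signal whose expansion is known in advance. The function name and the test signal are assumptions of this example.

```python
import numpy as np


def fourier_coefficients(f, period, n_max, samples=4096):
    """Approximate c0, a_n and b_n (n = 1..n_max) of a periodic function f.

    The integrals over one period are replaced by a uniform Riemann sum, so
    (2/T) * integral(f * cos) becomes 2 * mean(f * cos).
    """
    t = np.arange(samples) * period / samples
    omega = 2 * np.pi / period          # fundamental angular frequency
    values = f(t)
    orders = range(1, n_max + 1)
    c0 = values.mean()                  # DC component
    a = np.array([2 * np.mean(values * np.cos(n * omega * t)) for n in orders])
    b = np.array([2 * np.mean(values * np.sin(n * omega * t)) for n in orders])
    return c0, a, b


if __name__ == "__main__":
    T = 2.0
    w = 2 * np.pi / T

    def signal(t):
        # Known expansion: c0 = 1, a_1 = 3, b_2 = -2, all other coefficients zero.
        return 1.0 + 3.0 * np.cos(w * t) - 2.0 * np.sin(2 * w * t)

    c0, a, b = fourier_coefficients(signal, T, n_max=3)
    print(c0, a.round(6), b.round(6))
```

For band-limited test signals such as this one the Riemann sum is essentially exact; for signals with discontinuities the coefficients are only approximated and converge as the number of samples grows.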
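(2) Discrete-time Fourier transform (DTFT)
Let $\{x_n\}$ be a sequence of numbers indexed by the integers, $n \in \mathbb{Z}$. The DTFT is
defined as:
$X(\omega) = \sum_{n=-\infty}^{\infty} x_n\, e^{-j\omega n}$ (3.2.13)
Inverse transform:
$x_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(\omega)\, e^{j\omega n}\, d\omega$ (3.2.14)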
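The DTFT is discrete in the time domain and periodic in the frequency domain. It is usually applied
to analyze the spectrum of discrete-time signals. The DTFT can be viewed as the inverse of the
Fourier series: the Fourier series maps a periodic function to a discrete sequence of coefficients,
whereas the DTFT maps a discrete sequence to a periodic function of frequency.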
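These two properties can be illustrated with a short Python sketch for a finite sequence that starts at index 0. The forward sum is evaluated directly, and the inverse integral over one period is approximated by a uniform Riemann sum. The function names and the example sequence are assumptions of this example.

```python
import numpy as np


def dtft(x, omega):
    """X(omega) = sum_n x[n] * exp(-j * omega * n) for a finite sequence x[0..N-1]."""
    x = np.asarray(x, dtype=complex)
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    n = np.arange(len(x))
    return np.exp(-1j * np.outer(omega, n)) @ x


def inverse_dtft(x, n, samples=1024):
    """Recover x[n] from (1 / 2pi) * integral over [-pi, pi) of X(w) * exp(j * w * n) dw.

    The integral is replaced by a uniform Riemann sum, which reduces to a mean.
    """
    w = -np.pi + 2 * np.pi * np.arange(samples) / samples
    return np.mean(dtft(x, w) * np.exp(1j * w * n))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    w = np.linspace(-np.pi, np.pi, 7)
    # The spectrum repeats with period 2*pi in frequency.
    print(np.allclose(dtft(x, w), dtft(x, w + 2 * np.pi)))
    # The inverse integral recovers the original samples.
    print([complex(np.round(inverse_dtft(x, k), 6)) for k in range(3)])
```

The periodicity check works for any integer-indexed sequence, because shifting omega by 2*pi multiplies every term by exp(-j * 2 * pi * n) = 1.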
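(3) Fractional Fourier transform
In the time-frequency plane, the fractional Fourier transform represents a counterclockwise
rotation of the coordinate axes by an angle $\alpha$, which yields the fractional Fourier domain
(u, v).
Equivalent relationship:
$u = t\cos\alpha + \omega\sin\alpha$ (u is the fractional Fourier axis) (3.2.15)
$v = -t\sin\alpha + \omega\cos\alpha$ (3.2.16)
The fractional Fourier transform $X_p(u)$ of the signal x(t) is defined as:
$X_p(u) = \int_{-\infty}^{\infty} x(t)\, K_p(t,u)\, dt = \begin{cases} \sqrt{\frac{1 - j\cot\alpha}{2\pi}} \int_{-\infty}^{\infty} \exp\left( j\frac{t^2 + u^2}{2}\cot\alpha - j u t \csc\alpha \right) x(t)\, dt, & \alpha \neq n\pi \\ x(u), & \alpha = 2n\pi \\ x(-u), & \alpha = (2n \pm 1)\pi \end{cases}$ (3.2.17)
p: the order of the fractional Fourier transform
$\alpha = p\pi/2$: rotation angle
$K_p(t,u)$: kernel function
Inverse transform:
$x(t) = \int_{-\infty}^{\infty} X_p(u)\, K_{-p}(t,u)\, du$ (3.2.18)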
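If the Fourier transform of a function $\psi_n(t)$ satisfies the following form, then $\psi_n$ is an eigenfunction of the Fourier transform:
$$F[\psi_n(t)] = \lambda_n \psi_n(u)$$ (3.2.19)
F: Fourier transform operator
$\lambda_n = \exp(-jn\pi/2)$: eigenvalue
Hermite-Gaussian function (a common eigenfunction of the Fourier transform):
$$\psi_n(t) = H_n(t)\exp(-t^2/2), \qquad F[\psi_n(t)] = \exp(-jn\pi/2)\, H_n(u)\exp(-u^2/2)$$ (3.2.20)
The normalized Hermite-Gaussian function can be expressed as:
$$\psi_n(t) = \frac{2^{1/4}}{\sqrt{2^n n!}}\, H_n(\sqrt{2\pi}\, t)\, \exp(-\pi t^2)$$ (3.2.21)
$$H_n(t) = (-1)^n \exp(t^2)\, \frac{d^n}{dt^n} \exp(-t^2)$$
$H_n(\sqrt{2\pi}\, t)$: a Hermite polynomial of order n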
The signal x(t) can be expanded in the complete set of orthogonal functions formed by the Hermite-Gaussian eigenfunctions:
$$x(t) = \sum_{n=0}^{\infty} X_n \psi_n(t)$$ (3.2.22)
where the expansion coefficient is:
$$X_n = \int_{-\infty}^{\infty} \psi_n(t)\, x(t)\, dt$$ (3.2.23)
Let $\lambda_n$ be the eigenvalue corresponding to the eigenfunction $\psi_n(t)$. Taking the Fourier transform of both sides of equation (3.2.22), we get:
$$X(u) = \sum_{n=0}^{\infty} X_n \lambda_n \psi_n(u)$$ (3.2.24)
Substituting (3.2.23) into (3.2.24):
$$X(u) = \sum_{n=0}^{\infty} \lambda_n \psi_n(u) \int_{-\infty}^{\infty} \psi_n(t)\, x(t)\, dt = \int_{-\infty}^{\infty} K(t,u)\, x(t)\, dt$$ (3.2.25)
The kernel of the Fourier transform:
$$K(t,u) = \sum_{n=0}^{\infty} \lambda_n \psi_n(t)\psi_n(u) = \sum_{n=0}^{\infty} \exp\!\left(-\frac{jn\pi}{2}\right) \psi_n(t)\psi_n(u) = \exp(-j2\pi u t)$$ (3.2.26)
The usual form of the Fourier transform is obtained by substituting equation (3.2.26) into equation (3.2.25).
When the eigenvalues of the Fourier transform are generalized to fractional order, the eigenvalue of the fractional Fourier transform is defined as the fractional power of the Fourier transform eigenvalue, $\lambda_n^p = \exp(-jpn\pi/2)$. So the kernel of the fractional Fourier transform is:
$$K_p(t,u) = \sum_{n=0}^{\infty} \exp\!\left(-\frac{jpn\pi}{2}\right) \psi_n(t)\psi_n(u)$$ (3.2.27)
(4) Wavelet transform
The theory of the wavelet transform was first proposed in 1984. When analyzing the local features of seismic waves, Morlet, a French geophysicist, found that the traditional Fourier-based time-frequency methods could not satisfactorily capture both the high-frequency and low-frequency characteristics of signals in practical engineering applications. The wavelet transform was therefore adopted for geophysical exploration, which was its first practical application.
The basic idea of the wavelet transform is as follows: first, the original signal is dilated and shifted; second, it is divided into a series of sub-band signals with different spatial resolutions, frequency characteristics and direction characteristics. The sub-band signals obtained in this way have good local features in both the time domain and the frequency domain, so the wavelet transform can overcome the shortcomings of Fourier analysis in handling non-stationary signals and complex images.
Both the wavelet transform and the Fourier transform represent a signal as a linear combination of basis functions. The difference is that the Fourier transform uses harmonic functions defined for all time $t \in (-\infty, +\infty)$, with basis function $e^{j\omega t}$, while the basis of the wavelet transform is derived from a generating function $\psi(t)$ with compact support, and the wavelet family is obtained by dilating and shifting $\psi(t)$. The concrete formula is as follows:
$$\psi_{a,b}(t) = \frac{1}{a^{1/2}}\, \psi\!\left(\frac{t - b}{a}\right), \quad a > 0$$ (3.2.28)
a: the scaling factor
b: the translation factor.
To introduce the wavelet transform, we first briefly recall the classical convolution theorem:
$$x(t) * h(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau = \langle x(\tau), h^{*}(t - \tau) \rangle$$ (3.2.29)
where $*$ between two functions denotes the classical convolution operator, the superscript $*$ denotes complex conjugation, and $\langle \cdot, \cdot \rangle$ denotes the inner product.
Thus, for any signal $x(t) \in L^2(\mathbb{R})$, the wavelet transform is defined through the classical convolution operation, evaluated at the shift b, as:
$$W_x(a,b) = x(t) * \left( a^{-1/2}\, \psi^{*}(-t/a) \right) = \langle x(t), \psi_{a,b}(t) \rangle$$ (3.2.30)
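The inner-product form of the wavelet transform can be sketched numerically as follows. The Mexican hat function is used here only as an illustrative mother wavelet (it decays quickly but is not strictly compactly supported), and the helper names and grids are assumptions.

```python
import numpy as np

def mexican_hat(t):
    """Real-valued Mexican hat mother wavelet (an illustrative choice)."""
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def wavelet_transform(x, t, a, b, psi=mexican_hat):
    """Approximate W_x(a, b) = <x, psi_{a,b}> with psi_{a,b}(t) = a**-0.5 * psi((t - b) / a)."""
    dt = t[1] - t[0]
    psi_ab = psi((t - b) / a) / np.sqrt(a)
    return np.sum(x * np.conj(psi_ab)) * dt

t = np.linspace(-20.0, 20.0, 4001)
x = mexican_hat((t - 3.0) / 2.0)          # a copy of the wavelet, dilated by 2, centered at 3

shifts = np.linspace(-10.0, 10.0, 201)
response = np.array([wavelet_transform(x, t, 2.0, b) for b in shifts])
best_shift = shifts[np.argmax(np.abs(response))]

print(best_shift)                          # the response peaks where the wavelet matches the signal
```

Scanning b at a fixed scale in this way is what gives the transform its localization in time; repeating the scan over several values of a would add the scale dimension.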
3. Review
3.1. Signal parameter estimation
In A Fast Signal Parameter Estimation Algorithm for Linear Frequency Modulation (LFM) Signal under Low Signal-to-Noise Ratio (SNR) Based on Fractional Fourier Transform [6], Limin Liu, Haoxin Li, Qi Li, Zhuangzhi Han and Zhenbin Gao note that the initial rotation order and search interval of the LFM signal can be determined by the efficient fractional Fourier transform (FRFT) algorithm. However, the parameter estimation error is large when the SNR is low, because the amplitude of the normalized fractional spectrum no longer follows an obvious distribution law. On this basis, by exploiting the good anti-noise performance of the fourth-order origin moment of the fractional spectrum, the defects of the efficient FRFT algorithm can be removed, and the optimal order can be estimated quickly under low SNR. Thus, the parameters of an LFM signal with low SNR can be calculated quickly.
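The idea of searching for the parameter value at which the chirp energy concentrates can be illustrated with a much simpler stand-in technique: dechirping followed by an FFT. This is not the FRFT algorithm of [6]; it is a hedged sketch in which the sampling rate, the chirp parameters, the candidate grid and the function name `estimate_lfm` are all illustrative assumptions, and no noise is added.

```python
import numpy as np

fs = 1000.0                          # sampling rate in Hz (illustrative)
t = np.arange(1000) / fs             # one second of samples
true_rate, true_f0 = 200.0, 50.0     # chirp rate (Hz/s) and start frequency (Hz)
signal = np.exp(1j * (np.pi * true_rate * t ** 2 + 2.0 * np.pi * true_f0 * t))

def estimate_lfm(signal, t, fs, candidate_rates):
    """Return the (rate, start frequency) pair whose dechirped spectrum has the highest peak."""
    freqs = np.fft.fftfreq(signal.size, d=1.0 / fs)
    best_peak, best_rate, best_f0 = 0.0, None, None
    for rate in candidate_rates:
        # Remove a trial quadratic phase; the matching rate leaves a pure tone.
        dechirped = signal * np.exp(-1j * np.pi * rate * t ** 2)
        spectrum = np.abs(np.fft.fft(dechirped))
        k = int(np.argmax(spectrum))
        if spectrum[k] > best_peak:
            best_peak, best_rate, best_f0 = spectrum[k], rate, freqs[k]
    return best_rate, best_f0

rate_est, f0_est = estimate_lfm(signal, t, fs, np.arange(0.0, 401.0, 10.0))
print(rate_est, f0_est)
```

As with the order search in the fractional Fourier domain, the peak is sharpest only when the trial parameter matches the chirp, which is why a flattened peak under low SNR makes the search harder.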
In Parameter Estimation of Linear Frequency Modulation Signal Based On Interpolated Short-time Fractional Fourier Transform and Variable Weight Least Square Fitting [7], Weihao Cao, Zhixiang Yao, Wenjie Xia and Su Yan proposed a method that combines variable weight least square fitting (VWSF) with the interpolated short-time fractional Fourier transform (ISTFRFT) to estimate the parameters of chirp signals. The short-time Fourier transform (STFT) is a widely used tool for time-frequency analysis of LFM signals, but it performs poorly in frequency estimation of wideband signals, so it is extended in order to calculate the instantaneous frequency of LFM signals more accurately. The VWSF method is used to reduce the error caused by the conventional least square fit and to better calculate the initial frequency and modulation frequency of the signal. Finally, by studying the Cramér-Rao lower bound (CRLB) of the initial frequency and modulation frequency estimates, the VWSF-ISTFRFT method is shown to estimate LFM signal parameters with high accuracy.
3.2. Safety communication
In Two-dimensional Frequency Hopping Communication System and Performance Analysis Based on Discrete Fractional Fourier Transform [8], Xiaoyan Ning, Dongxu Zhao, Yunfei Zhu and Zhenyi Wang proposed the Fractional Fourier Transform Frequency Hopping with Variable Time Width and Fixed Bandwidth (FrFT-FH-VTFB) system. Traditional frequency hopping communication is easy to intercept because its signal parameters hop in only one dimension. The FrFT-FH-VTFB system obtains chirp signals with different start frequencies and time widths through the inverse discrete fractional Fourier transform, and thereby realizes two-dimensional hopping of time width and start frequency. In addition, the natural spread-spectrum gain of the chirp in the FrFT-FH-VTFB system not only breaks the signal periodicity and improves the anti-interception capability of the system, but also conceals the signal in the frequency domain so that it can resist energy detection. Moreover, because the time-width parameter hops, the energy of some code elements increases, which gives the system better anti-fading performance and reduces the influence of fading on system performance.
In Secure communication of IRS based on weighted fractional Fourier transform [9], Shengfeng Li, Xin Yang and Ling Wang study MIMO scenarios with general channel settings by introducing an intelligent reflective surface (IRS) into MIMO communication systems assisted by artificial noise and the fourth-order weighted fractional Fourier transform (WFRFT). WFRFT changes how the signal appears in the complex plane, so that the processed signal has strong anti-interception ability; it is therefore widely used in secure transmission at the wireless physical layer. On this basis, intelligent reflective surface technology can support secure communication using direction modulation based on artificial noise superposition, and improve the security of the physical layer. Because the whole signal model is difficult to solve, the block coordinate descent (BCD)-majorization-minimization (MM) algorithm is introduced to reduce the complexity. The Lagrange multiplier method is used to obtain the optimal transmission precoding matrix and covariance matrix, and an effective MM algorithm is used to obtain the optimal phase shift. Performance simulation and analysis of the algorithm verify its feasibility and good security performance.
In A Safe Communication System Based on Three-layer Weighted Fractional Fourier Transform [10], Ping Gao and Yuxiao Yang study the application of the three-layer weighted fractional Fourier transform in secure communication. Compared with the traditional weighted fractional Fourier transform (WFRFT) signal, the Multiple Parameters Weighted Fractional Fourier Transform (MPWFRFT) signal has stronger anti-interception capability and can better ensure the security of signal transmission. The communication system based on three-layer WFRFT divides the initial data into three layers after Quadrature Phase Shift Keying (QPSK) baseband mapping, and then processes and transmits them, which can effectively improve the confidentiality of signal transmission. On this basis, a genetic algorithm is introduced for iterative optimization, and the optimal control parameter set for simulating the modulation characteristics of the three-layer WFRFT signal is obtained. The communication performance, simulation performance and security performance of the system are each simulated, which verifies that the system has good resistance to parameter scanning and high security.
4. Conclusion
A Linear Frequency Modulation (LFM) signal is a signal whose frequency changes linearly with time, and it is widely used in radar and sonar technology. As reviewed in this paper, chirp signals show different degrees of energy concentration in fractional Fourier domains of different orders, and transforming the signal over a range of orders yields the parameter estimates of the chirp signal. With the progress of electronic technology, the security of communication systems has become one of the hot topics. By applying the discrete fractional Fourier transform and the weighted fractional Fourier transform in secure communication systems, the original periodicity of the system signal is broken, and the poor anti-interference and anti-interception capability of traditional communication systems can be addressed.
References
[1] MCBRIDE A. C. On Namias's fractional Fourier transform[J]. IMA Journal of Applied
Mathematics,1987,Vol.39(2): 159-175
[2] David Mendlovic; Haldun M. Ozaktas. Fractional Fourier transforms and their optical
implementation. [J]. Journal of the Optical Society of America. A, Optics, Image Science, &
Vision, 1993, Vol.10(9): 1875-1881
[3] Almeida, L. B. Product and Convolution Theorems for the Fractional Fourier Transform[J].
Signal Processing Letters, IEEE,1997, Vol.4 (1): 15-17
[4] M. Fatih Erden, Haldun M. Ozaktas, David Mendlovic. Synthesis of mutual intensity distributions
using the fractional Fourier transform. Optics Communications. 1996 Apr;125(4-6):288-301.
[5] ARENS, G; FOURGEAU, E; GIARD, D; MORLET, J. SIGNAL FILTERING AND
VELOCITY DISPERSION THROUGH MULTILAYERED MEDIA[J].
GEOPHYSICS,1981, Vol.46: 419-420
[6] LIU Limin, LI Haoxin, LI Qi, HAN Zhuangzhi, GAO Zhenbin. A Fast Signal Parameter
Estimation Algorithm for Linear Frequency Modulation Signal under Low Signal-to-Noise
Ratio Based on Fractional Fourier Transform. Journal of Electronics & Information
Technology. 2021 Oct;43(10).
[7] CAO Weihao, YAO Zhixian, XIA Wenjie, YAN Su. Parameter Estimation of Linear Frequency
Modulation Signal Based On Interpolated Short-time Fractional Fourier Transform and
Variable Weight Least Square Fitting. ACTA ARMAMENTARII. 2020 Jan; 41(1).
[8] NING Xiaoyan, ZHAO Dongxu, ZHU Yunfei, WANG Zhenduo. Two-dimensional Frequency
Hopping Communication System and Performance Analysis Based on Discrete Fractional
Fourier Transform. Journal of Electronics & Information Technology. 2023 Feb;45(2).
[9] LI Shengfeng, YANG Xin, WANG Ling. Secure communication of IRS based on weighted
fractional Fourier transform. J Huazhong Univ of Sci & Tech (Natural Science Edition). 2023
Mar;51(3).
[10] GAO Ping, YANG Yuxiao. A Safe Communication System Based on Three-layer Weighted
Fractional Fourier Transform. Telecommunication Engineering. 2022 Nov; 62(11).
The sum of four squares: An exploration of Lagrange’s
theorem and its legacy in number theory
Yifan Cheng
United International College, Zhuhai, 519000, China
r130033004@mail.uic.edu.cn
Abstract. Lagrange’s Four-square Theorem is a fundamental result in number theory, which states that every positive integer can be expressed as the sum of four integer squares. The theorem was first conjectured by the Greek mathematician Diophantus of Alexandria in the 3rd century CE. Pierre de Fermat claimed a proof in the 17th century, and the first published proof is due to Joseph-Louis Lagrange in 1770. This paper presents a comprehensive account of the four-square theorem within the part of number theory that seeks integer solutions to polynomial equations, a field whose study the theorem has significantly advanced. It traces Lagrange’s Four-square Theorem from its conjectural origins to its emergence as a cornerstone of contemporary mathematical research. The paper reviews the proof of the theorem and its implications, as well as its connection to modern research and applications, highlighting its lasting relevance in mathematics. In addition, it reaffirms the extensive influence of the theorem on the study of Diophantine equations and its ongoing significance in number theory. This enhances our comprehension of the theorem’s position in the wider story of mathematical progress, in both historical and contemporary contexts.
Keywords: Lagrange’s Four-Square Theorem, Diophantine Equations, Computational Number
Theory, Quantum Computing
1. Introduction
The study of numbers and their properties is a fundamental aspect of mathematical inquiry, with the
representation of numbers as sums of squares occupying a pivotal role throughout history. This
fascination spans from the Pythagorean triples rooted in ancient geometry to the sophisticated realms of
modern number theory. Positioned at the confluence of historical curiosity and contemporary
mathematical rigor, this paper aims to explore the representation of integers as sums of squares, a
question that has intrigued mathematicians for centuries [1]. The foundation of modern number theory,
enriched by resources like NRICH and Silverman’s “A Friendly Introduction to Number Theory” [2]
[3], builds upon these ancient questions, showing their relevance in today’s mathematical challenges.
By delving into the historical evolution of this problem, from the early explorations by Pythagoras and
Diophantus to the groundbreaking proofs by Fermat, Euler, and Lagrange, the paper uncovers the mathematical
underpinnings and implications of such representations. Combining a comprehensive review of the
historical literature tracing the development of sums of squares in number theory with an analysis of
contemporary mathematical texts and papers demonstrating current research and methods in the field,
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/20240576
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
this paper bridges the gap between historical insights and modern mathematical advances, providing a
holistic view of the subject matter.
2. Historical Background
The journey to express numbers as the sum of squares begins with Diophantus of Alexandria in the 3rd century CE [4,5], whose work “Arithmetica” laid early foundations for algebra and introduced what are now called Diophantine equations, that is, equations for which integer solutions are sought. Diophantus’s insights into equations involving squares paved the way for future mathematical breakthroughs. The narrative advanced significantly with Pierre de Fermat in the 17th century. Fermat proposed that every prime number of the form 4n+1 could be uniquely expressed as the sum of two squares. This proposition, known as Fermat’s theorem on sums of two squares, opened new vistas in understanding the nature of numbers. The story took a monumental leap with Joseph-Louis Lagrange in the 18th century, who proved that every positive integer can be represented as the sum of four squares. Lagrange’s proof not only underscored the significance of sums of squares within number theory but also highlighted the power of analytical techniques in addressing mathematical challenges. Leonhard Euler contributed the four-square identity that bears his name, strengthening the mathematical framework for analyzing sums of squares. Similarly, Adrien-Marie Legendre’s work, including his three-square theorem, deepened the understanding of which numbers can be represented as sums of squares. These milestones by Diophantus, Fermat, Lagrange, Euler, and Legendre have fundamentally shaped the study of number theory, especially concerning the challenge of expressing numbers as the sum of squares. Their collective work underscores the depth and interconnectedness of the field and the ongoing quest to unravel the complexities of the integers.
3. Mathematical Foundations
In number theory, several basic concepts and notations are pivotal for understanding theorems such as Lagrange’s four-square theorem [1][4], including:
Integers (ℤ): The set of whole numbers, comprising the positive integers, the negative integers, and zero.
Prime numbers: Natural numbers greater than 1 that have no positive divisors other than 1 and themselves.
Squares: Numbers that are the product of an integer with itself. For example, $4 = 2^2$ is a square.
Sum of squares: An expression that represents a number as the sum of the squares of integers.
Lagrange’s Four-Square Theorem states that every positive integer can be expressed as the sum of four squares of integers, where zero is allowed as one of the integers. Formally, for any positive integer n, there exist integers a, b, c and d such that:
$$n = a^2 + b^2 + c^2 + d^2$$ (1)
For example, $7 = 2^2 + 1^2 + 1^2 + 1^2$.
Euler’s Four-Square Identity: As illustrated in Figure 1, this identity shows how the product of two sums of four squares is itself a sum of four squares. Specifically, consider the product of two numbers that are each expressed as the sum of four squares:
$$(a^2 + b^2 + c^2 + d^2)(e^2 + f^2 + g^2 + h^2)$$ (2)
Figure 1. The visualization of Euler’s Four-Square Identity
Euler’s identity allows us to express this product again as a single sum of four squares, which is the key to proving that the set of numbers expressible as the sum of four squares is closed under multiplication [6]. This principle is further elucidated in texts such as Silverman’s introduction to number theory, offering a gateway to understanding complex mathematical structures [3]. Understanding these concepts and their interrelations not only facilitates the comprehension of the theorem’s proofs but also illustrates the elegance and depth of mathematical structures dealing with integers and their properties.
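For concreteness, one standard explicit form of the identity is written out below. Sign conventions vary between references; this version corresponds to the multiplicativity of the quaternion norm and is offered as a reference sketch rather than as the exact layout of Figure 1.

```latex
\begin{aligned}
(a^2+b^2+c^2+d^2)(e^2+f^2+g^2+h^2)
  ={}& (ae - bf - cg - dh)^2 \\
   &+ (af + be + ch - dg)^2 \\
   &+ (ag - bh + ce + df)^2 \\
   &+ (ah + bg - cf + de)^2
\end{aligned}
```

For example, with (a, b, c, d) = (1, 2, 3, 4) and (e, f, g, h) = (5, 6, 7, 8), the left side is 30 × 174 = 5220 and the right side is 60² + 12² + 30² + 24² = 5220.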
4. Proof of Theorem
Lagrange’s proof of the four-square theorem built on earlier work by mathematicians such as Fermat [5] and Euler [6]. A detailed step-by-step account of the proof would require a deep dive into number theory, but the essence of his approach can be summarized as follows. One key step is to show that if two numbers can each be expressed as the sum of four squares, then their product can also be expressed in the same form. Because every positive integer greater than 1 is a product of primes, and $1 = 1^2 + 0^2 + 0^2 + 0^2$, it then suffices to prove the theorem for prime numbers. In this sense the proof is methodical: if the theorem holds for certain types of numbers, it must hold for all positive integers. This foundational idea is crucial for understanding the theorem’s proof and its significance.
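The reduction to primes can be sketched in code: find a four-square representation of each prime factor by brute-force search, then combine the representations with Euler’s identity. This illustrates the structure of the argument only, not Lagrange’s actual proof, and all function names are hypothetical.

```python
from math import isqrt

def euler_product(p, q):
    """Combine two four-square representations with Euler's four-square identity."""
    a, b, c, d = p
    e, f, g, h = q
    return (a * e - b * f - c * g - d * h,
            a * f + b * e + c * h - d * g,
            a * g - b * h + c * e + d * f,
            a * h + b * g - c * f + d * e)

def four_squares_search(n):
    """Brute-force search for a >= b >= c >= d >= 0 with a^2 + b^2 + c^2 + d^2 == n."""
    for a in range(isqrt(n), -1, -1):
        for b in range(min(a, isqrt(n - a * a)), -1, -1):
            for c in range(min(b, isqrt(n - a * a - b * b)), -1, -1):
                rest = n - a * a - b * b - c * c
                d = isqrt(rest)
                if d <= c and d * d == rest:
                    return (a, b, c, d)
    return None

def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def four_squares_via_primes(n):
    """Build a representation of n from representations of its prime factors."""
    rep = (1, 0, 0, 0)                     # 1 = 1^2 + 0^2 + 0^2 + 0^2
    for p in prime_factors(n):
        rep = euler_product(rep, four_squares_search(p))
    return tuple(abs(v) for v in rep)

print(four_squares_via_primes(2024))
```

Only the prime factors are ever searched directly; every composite number inherits its representation through the identity, which mirrors the closure-under-multiplication step described above.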
4.1. Alternative Proofs and Generalizations
The aim of this section is to examine alternative proofs and generalizations of the original results. This not only demonstrates the diversity and flexibility of the original ideas but also provides new perspectives and possibilities for further research and application.
4.1.1. Infinite Descent. Fermat famously used the method of infinite descent to prove various propositions. The method assumes that a proposition has a smallest counterexample and then shows that a smaller one exists, leading to a contradiction. Descent-style arguments also appear in standard proofs of the four-square theorem, and the method has influenced proofs in related areas.
4.1.2. Hurwitz Quaternions. A more modern approach to understanding sums of squares involves the algebra of Hurwitz quaternions, that is, quaternions whose four coordinates are either all integers or all half-integers; the quaternions themselves form a four-dimensional number system that extends the complex numbers. These quaternions provide a powerful framework for generalizing and proving the sums of squares theorems, illustrating the deep connections between number theory and algebra.
4.2. Computational Methods in Proofs
With the advent of computers, computational methods have become invaluable in exploring the realms
of number theory, including proofs related to the four-square theorem. Computers empower
mathematicians to validate hypotheses on large datasets, identify patterns, and even provide proofs for
specific cases that would be unmanageable manually. These methods have not only confirmed the vast
applicability of the theorem but also opened new avenues for its exploration and application.
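A small example of such a computational check is sketched below. It verifies the theorem for every integer up to a chosen limit by testing whether the integer is the sum of two numbers that are each a sum of two squares. The limit and the function names are illustrative assumptions, and a finite check of this kind is of course not a proof.

```python
from math import isqrt

def two_square_sums(limit):
    """All values a^2 + b^2 <= limit with integers 0 <= a <= b."""
    sums = set()
    for a in range(isqrt(limit) + 1):
        for b in range(a, isqrt(limit - a * a) + 1):
            sums.add(a * a + b * b)
    return sums

def four_square_counterexamples(limit):
    """Integers 1..limit that are NOT a sum of four squares (expected to be empty)."""
    pairs = two_square_sums(limit)
    return [n for n in range(1, limit + 1)
            if not any((n - s) in pairs for s in pairs if s <= n)]

counterexamples = four_square_counterexamples(2000)
print(counterexamples)   # an empty list means the theorem holds for every n up to the limit
```

Splitting the four squares into two pairs keeps the work manageable, because the set of two-square sums below the limit is small compared with the number of four-tuples.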
4.3. Applications and Implications
The four-square theorem finds applications across various domains of mathematics and science,
demonstrating its fundamental nature:
4.3.1. Cryptography. In cryptographic systems, particularly those based on lattice problems and
quadratic forms, the ability to represent numbers as sums of squares has implications for encryption
algorithms and security protocols [7].
4.3.2. Coding Theory. The theorem’s concepts are applied in coding theory, where sums of squares are
related to error-detecting and error-correcting codes, crucial for data transmission and storage.
4.3.3. Quantum Computing. In quantum computing, the mathematical structures underlying the four-
square theorem can influence algorithms and the development of quantum error correction.
The four-square theorem, with its rich history and wide applicability, continues to be a subject of
fascination and study within the mathematical community. Its enduring legacy underscores the timeless
nature of mathematical inquiry and its relevance to both foundational research and practical applications.
5. Contemporary Perspectives
In number theory, researchers continue to advance the understanding of the four-square theorem. Recent developments include efforts to generalize the theorem to other number systems and to explore its connections to other areas of mathematics. Researchers also work on computational approaches to efficiently find representations of numbers as sums of squares, and investigate specific open problems and conjectures related to the theorem. This overview gives an understanding of the focus of the research community and the types of developments that are likely to occur. The advent of powerful computational tools, as detailed by Crandall and Pomerance in “Prime Numbers: A Computational Perspective,” allows researchers to test hypotheses related to the four-square theorem on a scale not previously possible, verifying the theorem for very large numbers and exploring its implications in computational complexity and algorithmic number theory [8].
5.1. Recent Generalizations and Computational Approaches
Recent generalizations include extending the four-square theorem to more complex structures, such as
higher-dimensional lattices or other algebraic systems. Mathematicians are also interested in similar
representations for other forms, like cubes or higher powers, and the conditions under which similar
theorems hold. These explorations are supported by advancements in computational number theory,
which Silverman and Crandall with Pomerance discuss in their respective works [3,8].
5.2. Open Problems and Conjectures
Density and Distribution: Questions about the density and distribution of the representations of
numbers as the sum of squares, and how these properties might influence other areas of number
theory and combinatorics.
Connections to Other Fields: Exploring deeper connections between the four-square theorem and
other mathematical fields, such as elliptic curves, modular forms, and cryptographic algorithms,
may yield new insights and open problems.
6. Discussion
The Four-square Theorem, proven by Joseph-Louis Lagrange in 1770, stands as a monumental testament
to the beauty and depth of number theory. This theorem, demonstrating that every positive integer can
be represented as the sum of four squares, resolved a long-standing question and catalyzed a new era of
mathematical exploration. Its simplicity belies the profound implications it has for number theory and
beyond, having inspired countless mathematicians to delve into the properties of numbers, leading to
the emergence of new branches within mathematics and a deeper understanding of existing ones.
Stewart and Tall’s “Algebraic Number Theory and Fermat’s Last Theorem” and Weil’s historical
approach in “Number Theory: An Approach Through History from Hammurapi to Legendre” provide
context for the theorem’s impact beyond its initial proofs, demonstrating its foundational role in
algebraic number theory and its historical significance [9]. Meanwhile, Conway and Smith’s exploration
of “On Quaternions and Octonions” illuminates the deep connections between the theorem and algebra,
highlighting the quaternion algebra’s role in generalizing and proving sums of squares theorems [10].
7. Conclusion
This paper has sought to illuminate the four-square theorem, presenting a comprehensive review of the theorem’s historical development, its pivotal role in advancing number theory, and the many ways it continues to influence modern mathematical research. By highlighting the theorem’s ongoing relevance and potential for future discoveries, it underscores the dynamic nature of mathematics, where ancient questions give rise to contemporary challenges and innovations. In conclusion, the four-square theorem remains a cornerstone of mathematical inquiry and a source of inspiration for both theoretical exploration and practical application. Looking ahead, the theorem not only constitutes a significant chapter in the history of mathematics but also serves as a springboard for future generations of mathematicians to explore the mysteries of numbers. This work provides a deeper understanding of the theorem’s place in mathematical thought, reaffirming its lasting significance and the curiosity it inspires.
References
[1] Lagrange, J. L. (1770). Démonstration d’un théorème d’arithmétique. Mémoires de l’Académie
Royale des Sciences et Belles-Lettres de Berlin.
[2] Silverman, J. H. (2020). A friendly introduction to number theory. Brown University.
[3] Stewart, I., & Tall, D. (1979). Algebraic number theory and Fermat’s last theorem. Cambridge,
MA: Cambridge University Press.
[4] Fermat, P. de (1670). Observationes ad Diophantum [Marginal notes to Diophantus].
[5] Diophantus of Alexandria. (3rd century CE). Arithmetica
[6] NRICH. (n.d.). An introduction to number theory. Retrieved from
https://nrich.maths.org/numbertheory
[7] Euler, L. (1772). De compositione numerorum ex quattuor quadratis [On the composition of
numbers from four squares]. Novi Commentarii Academiae Scientiarum Petropolitanae, 16,
64-93.
[8] Conway, J. H., & Smith, D. A. (2003). On quaternions and octonions. Wellesley, MA: A K
Peters/CRC Press.
[9] Weil, A. (1984). Number theory: An approach through history from Hammurapi to Legendre.
Boston, MA: Birkhäuser.
[10] Crandall, R., & Pomerance, C. (2005). Prime numbers: A computational perspective. New York,
NY: Springer.
The model of price of sailing ships based on Lasso regression
Yueying Zhang1,3,4, Xinyi Zhou2,5, Yingfei Wang2,6, Dongmin Wang2,7
1Information and Computing Science, Jinan University, Guangzhou city, China
2Mathematics and Applied Mathematics, Jinan University, Guangzhou city, China
3Corresponding author
43545841780@qq.com
53196558422@qq.com
6ywang2739@gmail.com
71530846815@qq.com
Abstract. To predict the listing price of sailing ships from ship characteristics collected from a website, we first cleaned the original sample data, which contained many missing values and outliers. After filling the missing values with the mode, we transformed the categorical variables into dummy variables and then normalized the data to obtain the training data for the model. We then obtained the predicted value of Listing Price (USD) through multiple regression fitting. The resulting $R^2$ of 0.929 indicated a very good fit, but the conversion of attribute variables to dummy variables produced too many variables, so it was necessary to compress the model variables and select the key ones. Since the aim is to explain Listing Price (USD), the coefficient of each variable needs to be known, so a tree model is not adopted. Among linear models, Lasso regression is mainly used to screen model variables, so Lasso is the main screening method here. The mean square error of the listing price predicted by the multiple regression model with Lasso-tuned parameters is 0.125, indicating that the model predicts the listing price with relatively high accuracy.
Keywords: Lasso regression, Dummy variables, Multiple regression.
1. Introduction
As with many luxury goods, the price of a sailboat changes as the boat ages and as market conditions change. Since the COVID-19 epidemic, buying second-hand sailboats has gradually been accepted by consumers, the second-hand sailboat trading market has flourished, and the demand for circulation of second-hand sailboats is increasing. In second-hand sailboat trading, the most difficult and important problem is valuation, which is also the problem of greatest concern to traders. Second-hand sailboats differ from general second-hand products in the complexity of “one boat, one condition”. The price of a second-hand sailboat is affected not only by its own configuration, such as model, boat width, sail area, displacement, and other factors, but also by the region, the market price, and the year of manufacture. As a result, second-hand sailboats cannot be valued in batches, which reduces the valuation efficiency of the second-hand sailboat market. However, there is no complete and
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/20240628
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
reasonable pricing system in the second-hand sailing market at present. Therefore, it is urgent to
find a more accurate and reasonable valuation method and to establish a sound evaluation system
for the second-hand sailing market. In the age of the Internet, boaters provided COMAP with
valuable economic and research data on used sailboats sold in Europe, the Caribbean, and the
United States in December 2020. With the help of ever-advancing algorithms and mathematical tools,
the focus of current research is how to analyze and process these data efficiently and then find a
suitable valuation model to determine the transaction price of second-hand sailboats.
2. Model building and solution
2.1. Data cleaning
2.1.1. Filling the missing values
First, we took the data given in the problem and the data [1, 2] we found about the characteristics
of sailing boats as the original data. The original data contained many missing values, which we
analyzed visually. The following figure shows the missing values of each characteristic variable.
Based on this analysis, we directly delete the characteristic variables with many missing values;
for the characteristic variables with few missing values, we fill the missing values with the
mode.
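The two-step treatment of missing values described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function name `clean_missing`, the 50% deletion threshold and the toy records are assumptions, since the paper does not state the cut-off it used.

```python
from collections import Counter

def clean_missing(rows, max_missing_ratio=0.5):
    """Drop sparse columns, then mode-fill the remaining gaps (None = missing).

    max_missing_ratio is a hypothetical threshold; the paper only says that
    variables with "many" missing values are deleted.
    """
    columns = list(rows[0].keys())
    n = len(rows)
    # Keep only columns whose share of missing entries is small enough.
    kept = [c for c in columns
            if sum(r[c] is None for r in rows) / n <= max_missing_ratio]
    # The mode (most frequent observed value) of each kept column.
    modes = {}
    for c in kept:
        observed = [r[c] for r in rows if r[c] is not None]
        modes[c] = Counter(observed).most_common(1)[0][0]
    return [{c: (r[c] if r[c] is not None else modes[c]) for c in kept}
            for r in rows]

# Toy records (illustrative values only).
rows = [
    {"Make": "Oyster", "Beam": 12.99, "Keel": None},
    {"Make": "Oyster", "Beam": None, "Keel": None},
    {"Make": None, "Beam": 12.99, "Keel": None},
    {"Make": "Nautor", "Beam": 13.02, "Keel": "fin"},
]
cleaned = clean_missing(rows)
```

Here the `Keel` column is missing in three of four records and is dropped, while the single gaps in `Make` and `Beam` are filled with the most frequent observed value.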
2.1.2. Detecting and handling outliers
After dealing with the missing values of the original data, we find that the data still contain
some outliers. To identify them, we visualize the data with boxplots. Sample points that deviate
significantly from the remaining values are called outliers.
For outliers caused by dimensional (unit) errors in the samples, we correct the dimension. Outliers
caused by other reasons are deleted to reduce the errors in the model training process.
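A boxplot flags points that fall outside its whiskers. The sketch below assumes the conventional whisker rule of 1.5 times the interquartile range; the paper does not state which rule it used, and the function name `iqr_outlier_mask` and the sample values are illustrative.

```python
import statistics

def iqr_outlier_mask(values, k=1.5):
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the usual boxplot whisker convention (an assumption here).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [not (low <= v <= high) for v in values]

# Toy beam widths; the last entry mimics a misplaced decimal point.
beam = [12.63, 12.99, 12.99, 13.02, 12.80, 129.9]
mask = iqr_outlier_mask(beam)
kept = [v for v, is_outlier in zip(beam, mask) if not is_outlier]
```

A flagged value that stems from a unit or decimal error would be corrected rather than dropped, following the distinction made in the text; the list `kept` only illustrates the deletion branch.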
2.1.3. Dummy variable transformation
Before screening characteristic variables, we need to convert the types of characteristic
variables. Among all variables that affect the listing price of second-hand boats, characteristic
variables such as Make, Variant and Geographic Region are unordered multi-category variables. A
common way to quantify such data is to assign the values 1, 2, 3, 4. However, these values carry an
order from small to large, whereas the categories have no such order; they are equal and
independent. Substituting 1, 2, 3 and 4 into the model would therefore give unreasonable results,
so we convert the categories into dummy variables, whose value 0 or 1 indicates whether a sample
belongs to a given category.
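As a small illustration of the conversion, the sketch below one-hot encodes a categorical column. The helper `to_dummies` and the records are hypothetical, and dropping the first level as a baseline (to avoid perfectly collinear dummy columns) is an assumption; the paper does not say whether a baseline category was dropped.

```python
def to_dummies(records, column, drop_first=True):
    """Replace a categorical column by 0/1 indicator columns.

    With drop_first=True the alphabetically first level becomes the
    baseline and gets no column of its own (an assumption of this sketch).
    """
    levels = sorted({r[column] for r in records})
    used = levels[1:] if drop_first else levels
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k != column}
        for level in used:
            row[f"{column}_{level}"] = 1 if r[column] == level else 0
        encoded.append(row)
    return encoded

# Toy records (illustrative values only).
boats = [
    {"Region": "Europe", "Year": 2010},
    {"Region": "USA", "Year": 2015},
    {"Region": "Caribbean", "Year": 2012},
]
encoded = to_dummies(boats, "Region")
```

With this choice a Caribbean boat is represented by zeros in both `Region_Europe` and `Region_USA`, so no artificial ordering between regions enters the model.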
2.1.4. Normalization
Different characteristic variables often have different dimensions and units, so feeding them
directly into the training model will affect the final training results. To eliminate the
dimensional influence between different characteristic variables, the data must be standardized so
that they become comparable. The most typical method is normalization.
The normalization method adopted here is min-max normalization, that is, the original data are
linearly transformed into the range [0,1], and the transformed values are the normalized data. In
this way a dimensional expression is transformed into a dimensionless one. The specific formula is
as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the variable over the samples.
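Min-max normalization as described above takes only a few lines. The function name `min_max_normalize` is illustrative, and mapping a constant column to zeros is a choice made for this sketch, not something stated in the paper.

```python
def min_max_normalize(values):
    """Linearly rescale values to [0, 1]: (x - min) / (max - min)."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no information; map it to 0 (sketch choice).
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

# Three LWL values taken from Table 1.
lwl = [37.24, 36.06, 37.07]
scaled = min_max_normalize(lwl)
```

The smallest value maps to 0, the largest to 1, and every other value keeps its relative position in between.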
2.2. Model Preparation
2.2.1. Model evaluation coefficient: mean square error
Mean square error (MSE) is a measure that reflects the difference between the estimator and the
quantity being estimated. MSE is a statistical measure and loss function commonly used in machine
learning regression models, such as linear regression. Its formula is:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.
In this paper, both the quantity being estimated and its estimate are the listing price.
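For concreteness, the mean square error can be computed as follows. The function name and the three toy values are illustrative and are not taken from the paper's data.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy normalized prices: squared errors are 0.25, 0 and 0.25.
mse = mean_squared_error([1.0, 0.5, 0.0], [0.5, 0.5, 0.5])
```

Because the prices are normalized before fitting, an MSE computed this way is on the normalized scale rather than in USD.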
2.2.2. Adjust the compression penalty parameter λ
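$$\hat{\beta} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{P} \beta_j x_{ij} \Big)^2 + \lambda \|\beta\|_1 \right\}$$
where
$$\|\beta\|_1 = \sum_{j=1}^{P} |\beta_j|$$
Here $\lambda$ is the regulating parameter, sometimes called a hyper-parameter, $\lambda \|\beta\|_1$
is the compression penalty, and $P$ is the number of independent variables. Different values of
$\lambda$ result in different mean square errors of the regression models whose variables are
selected through the L1 regularization process. We compute the value of $\lambda$ and the variable
coefficients corresponding to the minimum mean square error to determine the optimal model.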
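To see how the penalty parameter shrinks and eventually removes a coefficient, consider a simplified case worked by hand. The assumptions are ours, not the paper's: a single predictor, no intercept, and a predictor scaled so that $\sum_i x_i^2 = 1$. Writing $z = \sum_i x_i y_i$ for the ordinary least-squares estimate, the penalized objective and its minimizer are:

```latex
\begin{aligned}
Q(\beta) &= \sum_{i=1}^{n} (y_i - \beta x_i)^2 + \lambda |\beta|
          = \sum_{i=1}^{n} y_i^2 - 2 z \beta + \beta^2 + \lambda |\beta|, \\
\hat{\beta}(\lambda) &= \operatorname{sign}(z)\,
          \max\!\left( |z| - \tfrac{\lambda}{2},\; 0 \right).
\end{aligned}
```

The coefficient is pulled toward zero by $\lambda/2$ and becomes exactly zero once $\lambda \ge 2|z|$, which is the mechanism behind variable screening: weakly related variables are eliminated first as $\lambda$ grows.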
2.3. Select characteristic variable
After the data cleaning of the original data, we get the processed data. Next, we adopt the optimization
stepwise regression and neural network to screen the characteristic variables that have a great impact on
the listing price and take the intersection of the variables screened by the two methods as the final
characteristic variable.
2.3.1. Model overview
First, a multiple regression model was established for n independent variables x1, x2, ..., xn and
the dependent variable Y:
$$Y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \varepsilon$$
Each feature, that is, each independent variable, has a corresponding slope coefficient $\beta_i$.
When we calculated the coefficients $\beta_i$ through multiple regression analysis in Python, we
also obtained the correlation and significance level between each independent variable and the
dependent variable.
Then, we used Lasso regression and a neural network to discard independent variables with poor
correlation and significance level, and selected the strongly correlated independent variables
$x_i$ for a second round of mathematical modeling.
Meanwhile, when obtaining the correlation table above, we examine the collinearity between the
independent variables $x_i$. If the collinearity is strong, we screen out the variables with
relatively large coefficients by adjusting the α parameter in Lasso regression. The variables and
parameters that give a relatively small mean square error are selected to build our final
valuation model to explain Listing Price (USD).
2.3.2. Establishing the Lasso regression model
Lasso regression [4, 5, 6] is a linear model and a shrinkage (compressed) estimation method. It
obtains a more refined model by constructing a penalty function that compresses some regression
coefficients, that is, it forces the sum of the absolute values of the coefficients to be less
than a certain fixed value. It is also a biased estimator suited to data with complex collinearity.
For a linear regression problem with many variables, the fitted model is relatively complicated
because of its excessive parameters. To prevent overfitting, the model should be simplified as
much as possible, with a small number of variables replacing the majority in explaining the
estimated quantity. Commonly used methods for variable selection include sequential forward
selection, sequential backward elimination, combined forward selection and backward elimination,
and Lasso shrinkage.
In this case, however, the variants of sailboats produce too many dummy variables, and both
forward selection and backward elimination are too inefficient. Therefore, the Lasso model is
adopted, and the coefficients of irrelevant variables are reduced to zero by adding a penalty
term.
2.3.3. Concrete mathematical expression
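Linear regression optimization objective:
$$\beta = \arg\min_{\beta} \sum_{i=0}^{n} (y_i - x_i^{T} \beta)^2$$
Optimization objective after regularization:
$$\beta = \arg\min_{\beta} \left\{ \| y - X\beta \|_2^2 + \lambda \|\beta\|_1 \right\}$$
where $X$ is the matrix whose rows are $x_i^{T}$ and $\|\cdot\|_2$ is the 2-norm, that is, for
$x = (x_1, x_2, \cdots, x_n)$ in the vector space $R^n$, $\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$.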
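The regularized objective can be minimized by cyclic coordinate descent with soft-thresholding, sketched below in plain Python. This is a teaching sketch under stated assumptions rather than the authors' implementation: there is no intercept (the data are assumed centered), the toy training and validation sets are invented, and choosing the penalty on a held-out set is our reading of "minimum mean square error". Library solvers may scale the squared-error term differently, so their penalty values are not directly comparable with `lam` here.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=100):
    """Minimise sum_i (y_i - x_i . beta)^2 + lam * sum_j |beta_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that ignores beta_j.
            rho = 0.0
            for i in range(n):
                partial = sum(beta[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return beta

def mse(X, y, beta):
    preds = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return sum((t - q) ** 2 for t, q in zip(y, preds)) / len(y)

# Toy data: y depends strongly on column 0 and only through noise on column 2.
X_train = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y_train = [2.1, 1.9, -2.1, -1.9]
X_val = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y_val = [2.0, -2.0, 2.0, -2.0]

grid = [0.0, 0.4, 0.8, 1.6]  # hypothetical candidate penalties
results = {}
for lam in grid:
    beta = lasso_cd(X_train, y_train, lam)
    results[lam] = (beta, mse(X_val, y_val, beta))
best_lam = min(grid, key=lambda lam: results[lam][1])
```

On this toy data the smallest validation error is reached at `lam = 0.4`, while at `lam = 1.6` only the coefficient of the first column remains nonzero, illustrating how increasing the penalty removes weak variables.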
2.3.4. Concrete modeling process
In the first question, the data table given in the problem includes the manufacturer, sailboat
model, region, year, and price. To meet the requirements of the problem, we need to analyze the
sailboat characteristics and regional economic conditions related to the price. We collected the
relevant data for the sailboat types given in the problem from second-hand sailboat websites, and
collected the economic conditions of cities in the relevant regions. We decided to use several
GDP-related indicators to express the regional economy, and distinguished the sailboats in
different regions through dummy variables. Finally, because the problem states no explicit
requirement, the data of single and double sails are combined in this question so that the larger
data set gives a better fit in the subsequent multiple regression analysis. For the
sailing-related data, we collected the waterline length (LWL), boat width (beam), draft,
displacement, sail area and average cargo throughput, as shown in Table 1.
Table 1. Hull data chart
LWL    Beam   Draft  Displacement  Sail Area  Average cargo throughput  GDP     GDP per capita
37.24  12.63  3.94   22046.0       824.0      45350000.0                2939.0  44494.0
36.06  12.99  6.07   15432.0       721.0      595000.0                  57.8    13647.0
36.06  12.99  6.07   15432.0       721.0      595000.0                  57.8    13647.0
36.06  12.99  6.07   15432.0       721.0      595000.0                  57.8    13647.0
36.06  12.99  6.07   15432.0       721.0      595000.0                  57.8    13647.0
36.06  12.99  6.07   15432.0       721.0      3150000.0                 204.0   19147.0
37.07  13.02  6.23   19621.0       776.0      3150000.0                 204.0   19147.0
Then, for the newly formed data table, we use Python to carry out multiple regression analysis.
First, we standardized all the data so that differing units of measurement do not distort the
estimated effects on the price of sailboats. After standardization, we tabulated the data in Excel
and examined the correlation between the independent variables through the heat map in Figure 1:
Figure 1. Correlation heat map of the numerical variables
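The numbers behind such a heat map are pairwise Pearson correlation coefficients. The sketch below computes them for three columns, using the three distinct rows of Table 1 as sample values; the helper names are illustrative, and the paper's actual figure is based on the full data set.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_matrix(columns):
    """Nested dict of pairwise correlations, one row per variable."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names}
            for r in names}

# The three distinct rows of Table 1 (LWL, Beam, Displacement).
data = {
    "LWL": [37.24, 36.06, 37.07],
    "Beam": [12.63, 12.99, 13.02],
    "Displacement": [22046.0, 15432.0, 19621.0],
}
corr = correlation_matrix(data)
```

Entries close to 1 or -1 off the diagonal, such as the one between LWL and displacement in this small sample, are the visual cue for the collinearity discussed in the text.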
For the standardized data, we conducted a preliminary multiple linear regression analysis and
obtained the parameter table shown in Table 2, which gives the correlation level between each
independent variable and the dependent variable and the fit of the multiple linear regression
model. The results in Table 2 also indicate strong collinearity among the independent variables.
We then sort the variables by significance level using the P > |t| column of the table, and select
the independent variables with strong correlation to build a new mathematical model. In this
question, we chose Lasso rather than stepwise regression to select the independent variables. The
reason is that the data contain many independent variables, many of which are dummy variables.
First, stepwise regression would require many passes over the data, which occupy a large amount of
storage space. Second, the many uninformative loops would slow the code down. Therefore, we used
Lasso, together with the VIF and R^2 values of the fitted model, to find and retain the
independent variables with the greatest influence.
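For reference, the variance inflation factor mentioned above is conventionally defined per variable from an auxiliary regression. In the expression below, $R_j^2$ denotes the coefficient of determination obtained when the $j$-th independent variable is regressed on all the other independent variables; the numerical value 0.9 is purely illustrative and is not a result from the paper.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
R_j^2 = 0.9 \;\Longrightarrow\; \mathrm{VIF}_j = \frac{1}{1 - 0.9} = 10 .
```

A variable that is almost a linear combination of the others has $R_j^2$ close to 1 and therefore a large VIF; values around 10 or more are a common rule of thumb for strong collinearity.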
Table 2. Regression coefficient analysis table
Name                      Coefficient  Standard error  t       P > |t|
Sail Area                 -3301.72     1.31E+04        -0.252  0.801
Length                    8314.47      1.76E+04        0.471   0.638
GDP2                      7859.72      5756.478        1.365   0.172
GDP1                      -8533.72     5757.049        -1.482  0.138
Average Cargo Throughput  -19640.00    1.00E+04        -1.962  0.050
LWL                       32720.00     1.46E+04        2.24    0.025
Beam                      109400.00    3.64E+04        3.009   0.003
Average GDP               -9556.25     3245.174        -2.945  0.003
Constant                  377200.00    1.79E+04        21.059  0.000
Year                      68790.00     2235.4          30.774  0.000
Draft                     -103200.00   1.20E+04        -8.569  0.000
Displacement              67930.00     1.12E+04        6.081   0.000
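The t column of Table 2 is the coefficient divided by its standard error, which gives a quick way to sanity-check the table. The sketch below recomputes it for a few rows copied from Table 2; the dictionary name and the 1% tolerance are illustrative, and small gaps are expected because the printed standard errors are rounded.

```python
# (coefficient, standard error, reported t) for selected rows of Table 2.
table2 = {
    "Year": (68790.00, 2235.4, 30.774),
    "GDP2": (7859.72, 5756.478, 1.365),
    "GDP1": (-8533.72, 5757.049, -1.482),
    "Average GDP": (-9556.25, 3245.174, -2.945),
    "Beam": (109400.00, 3.64e4, 3.009),
}

def t_statistic(coefficient, standard_error):
    """t value of a regression coefficient under the usual OLS output."""
    return coefficient / standard_error

# Largest relative gap between recomputed and reported t values.
max_rel_gap = max(
    abs(t_statistic(coef, se) - t) / abs(t) for coef, se, t in table2.values()
)
```

For these rows the recomputed values agree with the reported ones to within about 0.2%, so the columns are mutually consistent; a large |t| (small P > |t|) marks a variable such as Year as statistically significant.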
For the selected independent variables, the variance inflation factor (VIF) shows that the
retained variables have an obvious multicollinearity problem in the multiple regression analysis.
We address the possible collinearity through Lasso regression, adjusting its parameter to seek the
optimal setting. Through the improved least squares method with L1 regularization, we analyzed the
collinearity of the data. On the one hand, we carried out "feature screening" of the variables. On
the other hand, this method finds the more meaningful independent variables X, which minimize the
mean square error of the model. The result is shown in Figure 2:
Figure 2. Regularized path diagram
The specific algorithm process is shown in the figure above. Finally, the coefficient of each
variable is returned, and the predicted values are calculated to obtain the mean square error and
hence the optimized model. The predicted value, namely the Listing Price, is explained according
to the coefficient of each variable.
Finally, the MSE of the optimal model is 0.125/2, so the multiple linear regression model based on
Lasso regression parameter optimization has good goodness of fit. The selected variables are
listed in Table 3:
Table 3. Final parameters selected table
Serial number  Characteristic variable name  Serial number  Characteristic variable name
1              CRSSVG                        16             Make Hallberg Rassy
2              Make HH Catamaras             17             Make Southerly
3              Make Discovery                18             Make Boreal
4              Make Nautor                   19             Variant Pilot Saloon 48
5              Variant Series 5              20             Make Oyster
6              Make Nautitech                21             Length
7              Make Bestevaer                22             Year
8              Variant 52 Sport              23             LWL
9              Variant Atlantic 49MF         24             Beam
10             Variant SABA 50 Maestro       25             Sail Area
11             Variant 52                    26             GDP
12             Variant 52F                   27             AGDP
13             Variant V50 Mills             28             Europe
14             Make Outremer                 29             USA
15             Variant Swan 54
For the selected parameters in the table, the first 20 are dummy variables that have a relatively
high impact on the listing price, at the level of 1%-10%. The relevant dummy variables include the
manufacturer of the sailboat, the model of the sailboat variant and the regional influence
variable. The last nine are sailing-related characteristic data (such as ship width, ship length,
displacement, etc.) and relevant regional economic characteristics, all of which are independent
variables selected in the Lasso regression analysis with strong correlation to the listing price.
For dummy variables, the influence of region is mainly caused by regional characteristics, which
will be discussed together with regional economic factors in the second follow-up question. The
influence of the sailboat manufacturer and the sailboat variant on the price can be understood as
the premium generated by the brand effect. Among the characteristic variables of the ship itself,
the length and width of the ship determine, on the one hand, the size and habitability of the
space and, on the other hand, the amount of material used in the hull and the more sophisticated
structural design needed for larger ships, so the length and width of the ship have a significant
impact on the listing price. Moreover, the year of production reflects the remaining usable time
of the ship, and because the production quantity of each type of ship is limited, this scarcity,
as with luxury goods, also makes the year have a significant impact on the listing price. Nautical
miles can also indicate the range of the vessel between refuelings, which matters to the buyer, so
nautical miles have a significant impact on the listing price.
3. Conclusion
Compared with multiple linear regression, Lasso regression adds an L1-norm penalty. The penalty
increases the stability of our model and makes variable screening more effective. In the process
of variable screening, Lasso controls the screening through a real hyperparameter λ in (0,1),
which makes the screening a continuous process and more robust without losing interpretability.
Lasso is suitable for models with larger data volume and more missing values, and it works better
when the meaningful variables are relatively few. Because the L1 norm tends to produce sparse
coefficients, Lasso regression has built-in feature selection. Moreover, since the solution under
the L1 norm is sparse, computation is more efficient when it is combined with sparse algorithms.
References
[1] https://www.ayc-yachtbroker.com/alliage-44
[2] https://www.yachtworld.com/yacht/2005-alliage-alliage-44-8666783/
[3] https://itboat.com/search?text=alubat+cigale+16
[4] Möttönen Jyrki; Lähderanta Tero; Salonen Janne; Sillanpää Mikko J. Reducing bias and
mitigating the influence of excess of zeros in regression covariates with multioutcome
adaptive LAD-lasso [J]. Communications in Statistics - Theory and Methods. Volume 53,
Issue 13. 2024. PP 4730-4744
[5] Genç Murat; Özkale M. Revan. Lasso regression under stochastic restrictions in linear
regression: An application to genomic data [J]. Communications in Statistics - Theory and
Methods. Volume 53, Issue 8. 2024. PP 2816-2839
[6] Jad Beyhum; François Portier. High-dimensional nonconvex LASSO-type [formula
omitted]-estimators [J]. Journal of Multivariate Analysis. Volume 202. 2024. PP 105303-
Leader-follower consensus for nonlinear multi-agent systems
under directed topology
Sicheng Lu
Shanghai Normal University, No.100 Guilin Rd. Shanghai, China
lusicheng0923@163.com
Abstract. This paper investigates the consensus problem for multi-agent systems (MASs) under
directed topology. The primary objective is to design a distributed control protocol such that
all agents converge to the state of the leader. Using Lyapunov stability theory, we derive
sufficient conditions under which the designed protocol achieves consensus. Theoretical analysis
and numerical simulations are provided to verify the effectiveness of the proposed control
protocol.
Keywords: Multi-agent systems, Consensus, Stability, Distributed control.
1. Introduction
In the past decades, cooperative control has gradually become a research focus in the scientific
community due to the wide range of applications of multi-agent systems (MASs) in many fields, such
as biology, physics, and artificial intelligence [1]. Research on MASs mainly includes consensus,
formation, and controllability problems [1]. The task of consensus lies in designing a control
input protocol that enables all agents to converge to the same state in the end. However, in many
real-world scenarios it is not enough for the agents merely to agree with one another; they must
reach a specified expected state. For example, multiple unmanned aerial vehicles may need to reach
a specified speed for real-time continuous monitoring of the environment. Controllability of MASs
is an emerging research area that followed the study of MAS consensus. For a network of agents,
external inputs are applied to the leader such that the followers reach any expected final state
from any initial state.
In terms of consensus and controllability, research on MASs is mainly divided into leaderless
consensus and leader-follower consensus [1-2], and the key to the analysis is the design of the
control input [3]. Lü et al. present two non-smooth leader-following formation protocols for
non-identical Lipschitz nonlinear MASs [3]. Hui and Haddad proposed a nonlinear consensus
algorithm for first-order systems, expressed as [4]:
$$\dot{x}_i(t) = \sum_{j=1,\, j \neq i}^{N} \Phi_{ij}(x_i(t), x_j(t))$$
Stability conditions of the system under this nonlinear consensus algorithm were obtained by the
Lyapunov method [8], and the nonlinear consensus under switching topologies was analyzed. Lin et
al. study the consensus problem for a continuous-time nonlinear system and conclude that the
system achieves consensus if and only if the directed switching topological network
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/20240635
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
of the system has a sufficiently large connectivity range and strength [5]. Furthermore, the
vector field of each individual in the system must fall within a minimal sector made up of the
individual itself and the individuals it depends on. Meanwhile, the consensus problem usually
involves the asymptotic stability of a differential equation, which needs to be analyzed by means
of a Lyapunov function. Moreau designed a Lyapunov function for a continuous-time nonlinear
consensus algorithm [6]. As an extension of the classic Lyapunov function method, the
non-monotonically decreasing Lyapunov function method (NMDLF method) (Aeyels & Peuteman, 1999) is
applicable to complex time-varying dynamics, especially fast time-varying systems [7-8].
Motivated by the above analysis, the purpose of this paper is to show that the MASs reach
consensus on the expected state when the designed control input protocol is applied and the
corresponding parameter conditions are satisfied. By giving a lemma, we transform the consensus
problem for MASs into an asymptotic stability problem for the error systems. By analyzing the
asymptotic stability of the error systems, we then prove that the MASs converge to the expected
state.
Through theoretical analysis and numerical simulation, we verify the effectiveness of the control
input protocol and show the process by which the MASs reach consensus. The research in this paper
not only provides a solution to the consensus problem of MASs, but also provides theoretical
support and practical guidance for the design and realization of distributed control systems.
2. Preliminaries and problem formulation
2.1. Graph theory
First, some notation and definitions concerning the structure of the agents are given.
$G = (V, E)$ refers to the graph of N agents, where $V = \{1, 2, \ldots, N\}$ denotes the vertex
set of graph G. $(i, j) \in E$ if and only if the j-th agent can receive the information of the
i-th agent. In that case $i$ is also a neighbour of $j$, so let $N_j = \{ i \in V : (i, j) \in E \}$.
Next, $A = [a_{ij}] \in R^{N \times N}$ is a weighted adjacency matrix, where $a_{ii} = 0$, and
$a_{ij} > 0$ if $(j, i) \in E$, $a_{ij} = 0$ otherwise. The Laplacian matrix of G is
$L = [l_{ij}] \in R^{N \times N}$, where $l_{ii} = \sum_{j \neq i} a_{ij}$ and $l_{ij} = -a_{ij}$
for $i \neq j$. A directed path from node $i_1$ to node $i_k$ is equivalent to the existence of a
sequence of ordered edges $\{(i_1, i_2), (i_2, i_3), \ldots, (i_{k-1}, i_k)\}$ in the directed
graph G. If there exists a node called the root, which has no parent node, such that the node has
directed paths to all other nodes in the graph, then the directed graph G contains a directed
spanning tree. We define $D = \mathrm{diag}\{d_1, d_2, \ldots, d_N\}$ as the attenuation
coefficient matrix associated with G, where $d_i > 0$ if the leader is a neighbour of the i-th
agent and $d_i = 0$ otherwise. It is assumed that the leader is self-active, or moving
independently. That is, the followers can receive information from the leader while the leader
needs no information from any follower.
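The graph quantities defined above are straightforward to compute. The sketch below builds the Laplacian from an adjacency matrix and checks whether every follower can be reached from the leader along directed edges. The five-follower topology, the pinning vector `d` and the function names are hypothetical, chosen only for illustration; `A[i][j] > 0` is read as "agent i receives information from agent j".

```python
def laplacian(A):
    """L = [l_ij] with l_ii = sum_{j != i} a_ij and l_ij = -a_ij for i != j."""
    n = len(A)
    return [[(sum(A[i]) - A[i][i]) if i == j else -A[i][j] for j in range(n)]
            for i in range(n)]

def reachable_from_leader(A, d):
    """True if a directed path leads from the leader to every follower.

    Followers with d[i] > 0 hear the leader directly; information then
    spreads along the edges j -> i wherever A[i][j] > 0.
    """
    n = len(A)
    seen = {i for i in range(n) if d[i] > 0}
    frontier = list(seen)
    while frontier:
        j = frontier.pop()
        for i in range(n):
            if A[i][j] > 0 and i not in seen:
                seen.add(i)
                frontier.append(i)
    return len(seen) == n

# Hypothetical topology with five followers (indices 0..4).
A = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
]
d = [1, 0, 0, 0, 0]  # only follower 0 is connected to the leader
L = laplacian(A)
has_spanning_tree = reachable_from_leader(A, d)
```

Every row of `L` sums to zero by construction, and the reachability check corresponds to the requirement that the graph contains a directed spanning tree rooted at the leader.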
2.2. Problem formulation
Consider the following nonlinear MASs, in which follower i is described by:
$$\dot{x}_i = f(x_i) + u_i, \quad i = 1, 2, \ldots, N \qquad (1)$$
and the dynamics of the leader are described by:
$$\dot{x}_0 = f(x_0) \qquad (2)$$
where $x = (x_1, x_2, \ldots, x_N)$ denotes the state variables of the N agents, $x_i \in R^n$ is
the state vector of the i-th agent, $f(x_i) \in R^n$ represents the nonlinear function, $u_i$ is
the control input protocol to be designed, and $x_0 \in R^n$ is the state of the leader, which is
also the expected state.
In order to obtain the main result, the following assumptions are needed:
Assumption 1. The topological structure G of the MASs contains a directed spanning tree with the
leader as the root.
Assumption 2. There is a positive constant $\rho$ such that for any $x, y \in R^n$ there holds
$$(x - y)^T (f(x) - f(y)) \le \rho\, (x - y)^T (x - y)$$
Remark 1: Assumption 2 takes practicality into consideration. Many practical systems meet this
requirement; for example, chaotic systems such as the Chen system, the Lorenz system, and the
unified chaotic system have been verified to satisfy this assumption.
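A condition of this one-sided Lipschitz type can be spot-checked numerically for a candidate nonlinearity before it is used in a simulation. The sketch below assumes the inequality $(x - y)^T (f(x) - f(y)) \le \rho (x - y)^T (x - y)$; the function `f`, the constant `rho = 1.0` and the sampling box are hypothetical and are not the system studied in the paper. Passing such a test on random samples is evidence, not a proof.

```python
import math
import random

def f(x):
    """Hypothetical two-dimensional nonlinearity (not the paper's system)."""
    return [math.sin(x[0]), math.cos(x[1])]

def one_sided_gap(x, y, rho):
    """rho*|x-y|^2 - (x-y)^T (f(x)-f(y)); non-negative where the bound holds."""
    diff = [a - b for a, b in zip(x, y)]
    fdiff = [a - b for a, b in zip(f(x), f(y))]
    lhs = sum(dv * gv for dv, gv in zip(diff, fdiff))
    return rho * sum(dv * dv for dv in diff) - lhs

rng = random.Random(0)  # fixed seed so the check is reproducible
samples = [
    ([rng.uniform(-5, 5), rng.uniform(-5, 5)],
     [rng.uniform(-5, 5), rng.uniform(-5, 5)])
    for _ in range(1000)
]
worst_gap = min(one_sided_gap(x, y, rho=1.0) for x, y in samples)
```

For this `f` the bound holds with `rho = 1` because sine and cosine each change no faster than their argument, so `worst_gap` stays non-negative up to rounding.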
When discussing the rate of change of an agent's state variables, we divide the factors affecting
it into internal and external factors. An object's state variables, such as displacement and
velocity, are affected both by the constraints of the fields in the space where it is located and
by the external environment, such as the interactions of other agents with it. We therefore build
the continuous-time model (1) of the i-th agent to describe the agent's state variables.
In general, however, the behaviour and states of these N agents are inconsistent, while practical
needs require them to behave as expected. The differential equation model in (2) describes the
state the agents are expected to reach.
To gradually reach the expected state, each agent receives information from its neighbours and
passes information to its neighbours to update its state at the current moment. Both the gap
between the i-th agent and the other agents and the difference between each agent and the expected
state should be described. In light of this idea, we design the following control input function,
which is defined as the consensus protocol:
$$u_i = -c \sum_{j=1}^{N} a_{ij} (x_i - x_j) - c\, d_i (x_i - x_0), \quad i = 1, 2, \ldots, N$$
where $c$ is a positive constant, $a_{ij}$ is the element of the weighted adjacency matrix $A$,
and $d_i$ is the element of $D$. On the basis of the consensus protocol, we get the concrete model:
$$\dot{x}_i = f(x_i) - c \sum_{j=1}^{N} a_{ij} (x_i - x_j) - c\, d_i (x_i - x_0), \quad i = 1, 2, \ldots, N$$
The consensus control problem is mathematically defined as follows: assume that the MASs contain
N agents, where the state of the i-th agent is denoted by $x_i(t)$, $i = 1, 2, \ldots, N$. If we
have
$$\lim_{t \to \infty} \| x_i(t) - x_0(t) \| = 0, \quad i = 1, 2, \ldots, N$$
then the MASs are said to have reached the expected state of consensus.
2.3. Stability analysis
First, an overall error function is defined as $e(t) = (e_1(t), e_2(t), \ldots, e_N(t))$ to
represent the difference from the expected state $x_0$ at each moment $t$, where
$e_i(t) = x_i(t) - x_0(t)$ indicates the difference between the i-th agent and the expected state.
The ordinary differential equation for the i-th component of the error function is
$$\dot{e}_i(t) = f(x_i) - f(x_0) - c \sum_{j=1}^{N} l_{ij} e_j - c\, d_i e_i$$
where $\sum_{j} l_{ij} e_j = \sum_{j} a_{ij} (e_i - e_j)$ by the definition of the Laplacian matrix.
To investigate the consensus problem of the MASs, in other words, to certify
$\lim_{t \to \infty} \| x_i(t) - x_0(t) \| = 0$, we give a lemma establishing that the asymptotic
stability of the error system and the consensus of the MASs are the same problem.
Lemma 1: For the error function, if $L + D$ is nonsingular, one has
$$\| x_i(t) - x_0(t) \| \le \frac{\| e(t) \|}{\sigma_{\min}(L + D)}$$
with $\sigma_{\min}$ being the minimum singular value.
So, if the solution of the error system
$$\dot{e}_i(t) = f(x_i) - f(x_0) - c \sum_{j=1}^{N} l_{ij} e_j - c\, d_i e_i$$
is asymptotically stable, then we get $\lim_{t \to \infty} \| e(t) \| = 0$ and, by Lemma 1,
$\lim_{t \to \infty} \| x_i(t) - x_0(t) \| = 0$. It can be concluded that the consensus problem of
MASs can be transformed into the asymptotic stability problem of its error systems.
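The step from the agent dynamics to the error system can be written out explicitly. The derivation below is a sketch under an assumption: it takes the consensus protocol in the common leader-follower form $u_i = -c \sum_j a_{ij}(x_i - x_j) - c\, d_i (x_i - x_0)$ with gain $c > 0$, and uses $e_i = x_i - x_0$, $a_{ii} = 0$, and the Laplacian entries $l_{ii} = \sum_{j \neq i} a_{ij}$, $l_{ij} = -a_{ij}$.

```latex
\begin{aligned}
\dot{e}_i &= \dot{x}_i - \dot{x}_0 \\
          &= f(x_i) - f(x_0) - c \sum_{j=1}^{N} a_{ij}\,(x_i - x_j) - c\, d_i\,(x_i - x_0) \\
          &= f(x_i) - f(x_0) - c \sum_{j=1}^{N} a_{ij}\,(e_i - e_j) - c\, d_i\, e_i \\
          &= f(x_i) - f(x_0) - c \sum_{j=1}^{N} l_{ij}\, e_j - c\, d_i\, e_i .
\end{aligned}
```

The third line uses $x_i - x_j = e_i - e_j$, since the leader state cancels, and the last line collects the coupling terms into the Laplacian. Stacking the $N$ equations shows that the linear part of the error system is governed by the matrix $L + D$.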
3. Main results
In this section, a sufficient condition for the asymptotic stability of the error systems is
given. Since the leader is globally reachable, at least one follower is connected to the leader,
so $D \neq 0$.
Lemma 2: For the Laplacian matrix $L$ of $G$, the smallest eigenvalue $\lambda_1$ is always 0, and
the corresponding eigenvector is the all-one vector. The eigenvalue $\lambda_2$ is the algebraic
connectivity, which reflects the connectivity of the graph. For any vector $x$, the following
bound holds:
[The displayed inequality of Lemma 2 is not recoverable from the source text.]
where $\lambda_{\max}$ is the maximum eigenvalue and $\lambda_2$ is the second smallest eigenvalue.
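Theorem 3.1. Suppose that the assumptions hold. The consensus of the system (1) (2) is achieved
under the following condition:
[The displayed condition is not recoverable from the source text.]
Proof. The Lyapunov function is designed as
$$V(t) = \frac{1}{2} \sum_{i=1}^{N} e_i^T e_i$$
Then,
$$\dot{V}(t) = \sum_{i=1}^{N} e_i^T \dot{e}_i = \sum_{i=1}^{N} e_i^T \Big( f(x_i) - f(x_0) - c \sum_{j=1}^{N} a_{ij} (x_i - x_j) - c\, d_i (x_i - x_0) \Big)$$
Since for any $x, y$ we have $(x - y)^T (f(x) - f(y)) \le \rho\, (x - y)^T (x - y)$, taking
$x = x_i$ and $y = x_0$ gives
$$\dot{V}(t) \le \rho \sum_{i=1}^{N} e_i^T e_i - c \sum_{i=1}^{N} \sum_{j=1}^{N} l_{ij}\, e_i^T e_j - c \sum_{i=1}^{N} d_i\, e_i^T e_i$$
From Lemma 2, one has
[The final bound on $\dot{V}(t)$ is not recoverable from the source text.]
Thus, we can conclude that under the condition of Theorem 3.1 and the assumptions, the error
systems are asymptotically stable, that is,
$$\lim_{t \to \infty} \| e(t) \| = \lim_{t \to \infty} \| x_i(t) - x_0(t) \| = 0$$
Then, the consensus for the MASs is realized.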
4. Numerical simulation
In this section, an illustrative example is presented to verify the effectiveness of our
conclusion. The simulations are performed in a two-dimensional space consisting of the X-direction
and Y-direction with five agents.
The nonlinear dynamical function of the system, the initial state of the leader, the initial
states of the five followers, the gain $c$, the adjacency matrix $A$ and the attenuation
coefficient matrix $D$ were given as displayed equations.
[These displayed values are not recoverable from the source text.]
Figure 1. Topology graph.
Figure 2. Error in the X-direction.
Figure 3. Error in the Y-direction.
Figure 2 and Figure 3 show that the error of each agent with respect to the expected state in the
X-direction and Y-direction gradually converges to 0 over time, which means that the five agents
eventually reach the expected state.
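A simulation of this kind can be reproduced in outline with a forward-Euler loop. Everything specific in the sketch below is an assumption made for illustration and is not the paper's setup: the nonlinearity `f`, the gain `c = 5.0`, the topology `A` and pinning vector `d`, the initial states, the step size, and the protocol form $u_i = -c \sum_j a_{ij}(x_i - x_j) - c\, d_i (x_i - x_0)$.

```python
import math

# Hypothetical topology: A[i][j] > 0 means follower i listens to follower j.
A = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
]
d = [1, 0, 0, 0, 0]  # only follower 0 hears the leader directly
c = 5.0              # hypothetical coupling gain

def f(x):
    """Hypothetical nonlinear dynamics shared by the leader and followers."""
    return [math.sin(x[0]), math.cos(x[1])]

def simulate(leader, followers, dt=0.002, steps=5000):
    """Forward-Euler integration of the leader and the controlled followers."""
    x0 = list(leader)
    xs = [list(x) for x in followers]
    n, dim = len(xs), len(x0)
    for _ in range(steps):
        f0 = f(x0)
        new_xs = []
        for i in range(n):
            fi = f(xs[i])
            nxt = []
            for k in range(dim):
                # Disagreement with neighbours plus disagreement with the leader.
                coupling = sum(A[i][j] * (xs[i][k] - xs[j][k]) for j in range(n))
                coupling += d[i] * (xs[i][k] - x0[k])
                nxt.append(xs[i][k] + dt * (fi[k] - c * coupling))
            new_xs.append(nxt)
        x0 = [x0[k] + dt * f0[k] for k in range(dim)]
        xs = new_xs
    return x0, xs

leader0 = [1.0, 0.5]
followers0 = [[-2.0, 3.0], [4.0, -1.0], [0.0, 0.0], [-3.0, -3.0], [2.5, 1.5]]
x0_final, xs_final = simulate(leader0, followers0)
final_error = max(abs(x[k] - x0_final[k]) for x in xs_final for k in range(2))
```

With these choices the tracking errors in both directions shrink toward zero over the 10 simulated time units, which mirrors the qualitative behaviour reported for Figures 2 and 3. Recording the errors at every step instead of only at the end would give the curves to plot.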
5. Conclusion
The expected consensus problem for nonlinear MASs is investigated in the paper. The topology of the
MASs is directed and the consensus can be realized. On the basis of the proposed distributed control
protocol and the Lyapunov stability theory, a sufficient condition is derived to reach the consensus for
MASs. Finally, the effectiveness of the distributed control protocol is verified by numerical simulation.
References
[1] Long, M. et al. (2023) Model-free algorithm for consensus of discrete-time multi-agent systems
using reinforcement learning method, Journal of the Franklin Institute, 360(14), pp.
10564-10581. doi:10.1016/j.jfranklin.2023.08.010.
[2] Gruyitch, L.T. (2007) Nonlinear Hybrid Control Systems, Nonlinear Analysis: Hybrid Systems,
1(2), pp. 139-140. doi:10.1016/j.nahs.2006.10.001.
[3] Lü, J., Chen, F. and Chen, G. (2016) Nonsmooth leader-following formation control of
nonidentical multi-agent systems with directed communication topologies, Automatica, 64,
pp. 112-120. doi:10.1016/j.automatica.2015.11.004.
[4] Hui, Q. and Haddad, W.M. (2008) Distributed Nonlinear Control Algorithms for network
consensus, Automatica, 44(9), pp. 2375-2381. doi:10.1016/j.automatica.2008.01.011.
[5] Lin, Z., Francis, B. and Maggiore, M. (2007) State agreement for continuous-time coupled
nonlinear systems, SIAM Journal on Control and Optimization, 46(1), pp. 288-307.
doi:10.1137/050626405.
[6] Moreau, L. (2005) Stability of multiagent systems with time-dependent communication links,
IEEE Transactions on Automatic Control, 50(2), pp. 169-182. doi:10.1109/tac.2004.841888.
[7] Aeyels, D. and Peuteman, J. (1999) Uniform asymptotic stability of linear time-varying systems,
Open Problems in Mathematical Systems and Control Theory, pp. 1-5.
doi:10.1007/978-1-4471-0807-8_1.
[8] Zhang, X., Chen, L. and Chen, Y. (2019) Consensus analysis of multi-agent systems with general
linear dynamics and switching topologies by non-monotonically decreasing Lyapunov
function, Systems Science & Control Engineering, 7(1), pp. 179-188.
doi:10.1080/21642583.2019.1620654.
Retraction Agreement
*This agreement is the official document for the retraction application supported by EWA Publishing.*
Notes:
The retraction application should be approved by all authors listed in the
published article.
The retraction process is entirely irreversible and permanent. Authors may not
withdraw the retraction once the agreement has been executed (14 days after
receiving the retraction application).
The Statement of Retraction will be displayed on the publication website,
replacing the previous published article. The previous published article will be linked
with a revised title of “RETRACTED ARTICLE: [article title]”. All author information
will be retained in the new link.
The retracted article should not be submitted to any publications operated by
EWA Publishing.
No refunds will be issued.
If, after carefully reading the above notes, you confirm to proceed with the
retraction application, please complete the form below to initiate the retraction
process:
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation
DOI: 10.54254/2753-8818/41/2024CH0130
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
Retraction Application Form
Title of article: Research Progress on 2D Human Pose Estimation Based on Deep Learning
Name of journal/proceedings: Theoretical and Natural Science
Name of volume: TNS Vol.41
Article DOI: 10.54254/2753-8818/41/2024CH0130
Name of author(s) (in order): Haoyu Liu
Name of corresponding author: Haoyu Liu
Affiliation of corresponding author: University of Electronic Science and Technology of China
Email of corresponding author: 731933957@qq.com
Reasons of retraction (this part will be displayed in the Statement of Retraction on the
publication website): Under the review of my supervisor, there are many things for improvement
in my paper, including the classification of methods, the methods cited, and the summary and
analysis of the methods. So I temporarily choose to retract this paper. Thanks for your
understanding.
Please read the information below
If the retraction is disputed by the author(s):
This article will not be retracted unless all authors agree to this retraction
agreement.
If the retraction is disputed by the publisher:
This article will not be retracted until the author(s) receives the retraction
notification.
Authors have the right to contest the retraction.
The article will not be retracted within 14 days of receiving the application, and the
authors can contest the retraction during this period.
Once the article’s retraction has been executed:
The retraction will not be reversed.
The retracted article will not be accepted by any electronic or physical publications
of EWA Publishing;
It is the author’s responsibility to be aware of the information above.
The author has read all of the information above and agreed with this retraction.
Yes No
The hand-written signatures of all authors:
Date:2024/10/29
*EWA Publishing reserves the right of
final interpretation of this agreement.