TNS

Theoretical and Natural Science

Proceedings of the 2nd International Conference

on Mathematical Physics and Computational Simulation

Glasgow, UK

August 9 - August 16, 2024

Volume 41

Editors

Anil Fernando

University of Strathclyde

Gueltoum Bendiab

University of Frères Mentouri

Marwan Omar

Illinois Institute of Technology

ISSN: 2753-8818

ISSN: 2753-8826 (eBook)

ISBN: 978-1-83558-493-4

ISBN: 978-1-83558-494-1 (eBook)

Publication of record for individual papers is online:

https://www.ewadirect.com/proceedings/tns/home/index

This work is fully Open Access. Articles are freely available to both subscribers and the wider public with permitted reuse. No special permission is required to reuse all or part of an article, including figures and tables. For articles published under an open access Creative Commons CC BY license, any part of the article may be reused without permission, provided that the original article is clearly cited. Reuse of an article does not imply endorsement by the authors or publisher.

The publisher, the editors and the authors are safe to assume that the advice and information in this book are believed to

be true and accurate at the date of publication. Neither the publisher nor the editors or the authors give a warranty,

expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been

made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This imprint is published by EWA Publishing

Address: John Eccles House, Robert Robinson Avenue, Oxford, England, OX4 4GP

Email: info@ewapublishing.org

Committee Members

CONF-MPCS 2024

General Chair

Anil Fernando, University of Strathclyde

Organizing Chair

Marwan Omar, Illinois Institute of Technology

Organizing Committee

Sharidya Rahman, Monash University

Büşra Oğuzhan, Çukurova University

Selda Kapan Ulusoy, Erciyes University

Mazhar Javed Awan, University of Management & Technology

Arshad Hassan Khan, FAST NUCES Islamabad

Technical Program Chair

Stavros Shiaeles, University of Portsmouth

Technical Program Committee

Bilyaminu Auwal Romo, University of East London

Bismark Singh, University of Southampton

Achintya Haldar, University of Arizona

Bhupesh Kumar, University of St Andrews

Altaf Khan, University of Mianwali

Yazeed Ghadi, Al Ain University

Jie Zhang, University of Bath

Mustafa Istanbullu, Çukurova University

Gueltoum Bendiab, University of Frères Mentouri, Constantine

Publicity Committee

Festus Adedoyin, Bournemouth University

Xiaolong Li, Beijing University of Posts and Telecommunications

Preface

The 2nd International Conference on Mathematical Physics and Computational Simulation (CONF-MPCS 2024) is an annual conference focusing on research areas including mathematics, physics, and simulation. It aims to establish a broad and interdisciplinary platform for experts, researchers, and students worldwide to present, exchange, and discuss the latest advances and developments in mathematics, physics, and simulation.

This volume contains the papers of the 2nd International Conference on Mathematical Physics and Computational Simulation (CONF-MPCS 2024). Each of these papers has undergone a comprehensive review by the editorial team and professional reviewers. Each paper has been examined and evaluated for its theme, structure, method, content, language, and format.

Cooperating with prestigious universities, CONF-MPCS 2024 organized three workshops in Glasgow,

Constantine and Chicago. Dr. Anil Fernando chaired the workshop “Unlocking Video Contextual Ad

Insights: Enhancing Topic Explainability with Rich Multimodal Content Retrieval”, which was held

at University of Strathclyde. Dr. Marwan Omar chaired the workshop “Quantum Machine Learning:

Bridging Quantum Physics and Computational Simulations”, which was held at Illinois Institute of

Technology. Dr. Gueltoum Bendiab chaired the workshop “Machine Learning: Integrating Machine

Learning Techniques to Advance Network Security”, which was held at University of Frères

Mentouri.

Besides these workshops, CONF-MPCS 2024 also held an online session. Eminent professors from

top universities worldwide were invited to deliver keynote speeches in this online session, such as Dr.

Anil Fernando from University of Strathclyde and Dr. Marwan Omar from Illinois Institute of

Technology. They have given keynote speeches on related topics of mathematics, physics, and

simulation.

On behalf of the committee, we would like to express our sincere gratitude to all authors and speakers who have contributed to CONF-MPCS 2024, to the editors and reviewers who have ensured the quality of the papers with their expertise, and to the committee members who have devoted themselves to the success of CONF-MPCS 2024.

Dr. Anil Fernando

General Chair of Conference Committee

Workshops

Workshop – Glasgow: Unlocking Video Contextual Ad Insights: Enhancing Topic

Explainability with Rich Multimodal Content Retrieval

August 9th, 2024 (GMT+1)

Department of Computer and Information Sciences, University of Strathclyde

Workshop Chair: Prof. Anil Fernando, Professor in University of Strathclyde

Workshop – Constantine: Machine Learning: Integrating Machine Learning Techniques to

Advance Network Security

July 15th, 2024 (GMT+1)

Electronics Department, University of Frères Mentouri

Workshop Chair: Dr. Gueltoum Bendiab, Associate professor in University of Frères Mentouri

Workshop – Chicago: Quantum Machine Learning: Bridging Quantum Physics and

Computational Simulations

October 10th, 2024 (UTC -5)

ITM Department, Illinois Institute of Technology

Workshop Chair: Dr. Marwan Omar, Associate Professor in Illinois Institute of Technology

Table of Contents

Committee Members

Preface

Workshops

Workshop: Machine Learning: Integrating Machine Learning Techniques to Advance Network Security

Analyzing musical tones with Fourier transformation
Xilin Hong

A method to test the uniform convergence of function series
Zhian Wu

Workshop: Quantum Machine Learning: Bridging Quantum Physics and Computational Simulations

The application of convex function and GA-convex function
Dingrun Zhao

Research on Improved Crowd Detection Based on YOLOv5
Qi Wen, Kecheng Li, Yue Wang

Prediction of heart disease based on logistic regression
Zixin Zhang

Analysis of the market value of Premier League attacker
Wenji Liu

Harmonic analysis approach to the proof of Heisenberg inequality
Yuchen Wang

Research on the influencing factors of student performance
Chenrui Pei

Analysis of the Relationship between NBA Player Salary and Their On-Court Performances
Zijian Yang

Schrödinger equation for various quantum systems based on Heisenberg's uncertainty principle
Kexin An

Analysis of the Principles of Quantum Computing and State-of-the-Art Applications
Zhuolun Li

Advances in monocular ORB-SLAM system: a review
Ziyi Yuan

Prospects for the development of cartography through the integration of SLAM technology with GIS technology
Yaodong Tang

Comparative analysis of matrix factorization and graph convolutional networks in student
Tongye Wu

Review on VSLAM based on deep learning
Xin Shao

Intelligent assistive obstacle avoidance device based on SLAM and wearable technology
Yang Zhang

The research on the factors affecting the World Happiness index
Yizhi Zong

Quantum Entanglement and Qubit Interactions: The Key to Quantum Supremacy
Han Zhang

Quantum Neural Networks: A New Frontier
Boyu Zhang

Research on the Correlation between the Movement of the Dollar and the Price of Gold
Yanxi Zhan

Improvement of visual servo system of industrial robot based on sliding mode control and deep reinforcement learning
Yunzhe Zhou

Optimizing supply chain networks using mixed integer linear programming (MILP)
Xu Li, Xiaoheng Ji, Xiaolong Zeng

Environmental monitoring system design based on STM32 platform
Yuhe Tie, Peiming Chen

Spacecraft design for interstellar travel
Leyan Ouyang

Review on application of fractional Fourier transform in Linear Frequency Modulation signal and communication system
Zhuoran Wang

The sum of four squares: An exploration of Lagrange's theorem and its legacy in number theory
Yifan Cheng

The model of price of sailing ships based on Lasso regression
Yueying Zhang, Xinyi Zhou, Yingfei Wang, Dongmin Wang

Leader-follower consensus for nonlinear multi-agent systems under directed topology
Sicheng Lu

Analyzing musical tones with Fourier transformation

Xilin Hong

School of Mathematical Sciences, Fudan University, Shanghai, 200433, China

23300180056@m.fudan.edu.cn

Abstract. This essay delves into the mathematical exploration of musical tones through the

application of Fourier Transformation, a pivotal tool in the field of digital signal processing and

acoustics. By converting complex musical tones from the time domain to the frequency domain,

Fourier Transformation enables the deconstruction of sounds into their constituent frequencies,

revealing the unique harmonic structures that contribute to the characteristic timbre of different

musical instruments. The focus of this analysis is particularly on the trumpet, chosen for its rich

harmonic content and distinctive sound. Through the examination of audio recordings, this study

uncovers the fundamental frequency and harmonics of the trumpet, demonstrating how these

elements combine to form its unique acoustic fingerprint. The process involves recording,

analyzing, and comparing musical tones using software tools like MATLAB and Python,

providing an accessible yet profound insight into the intersection of mathematics and music. This

essay not only highlights the technical methodology of Fourier Transformation in analyzing

musical tones but also explores its practical applications in music theory, digital audio processing,

and the broader field of acoustics. The findings underscore the transformative power of

mathematical analysis in understanding and appreciating the complex beauty of musical sounds,

opening avenues for further research and application in both the scientific and artistic domains.

Keywords: Fourier Transform, Musical Tones, Frequency Spectrum, Timbre Analysis

1. Introduction

Fourier Analysis, named after the French mathematician Jean-Baptiste Joseph Fourier, is a mathematical

technique that transforms a function of time, space, or any other variable into a function of frequency. It

decomposes complex waveforms into simpler components, specifically into sines and cosines, which

are easier to analyze and understand. This transformation is pivotal in numerous fields, including

engineering, physics, and, notably, music analysis [1-3].
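As a point of reference, the transform described above can be written out explicitly. The symbols below ($x(t)$ for the signal, $X(f)$ for its spectrum, and $N$ samples $x_n$ taken at sampling rate $f_s$) are notation assumed here for illustration and do not appear in the original text.

```latex
% Continuous Fourier transform of a signal x(t)
X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-2\pi i f t}\, dt

% Discrete Fourier transform of N samples x_0, ..., x_{N-1}
X_k = \sum_{n=0}^{N-1} x_n\, e^{-2\pi i k n / N}, \qquad k = 0, 1, \ldots, N-1

% Bin k corresponds to the frequency
f_k = \frac{k\, f_s}{N}
```

Because $e^{-i\theta} = \cos\theta - i\sin\theta$, each coefficient measures how strongly a cosine and a sine of frequency $f_k$ are present in the signal, and $|X_k|$ is the magnitude plotted in a frequency spectrum.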

The relevance of Fourier Analysis to music stems from its ability to dissect musical tones into their

constituent frequencies, offering a deep dive into the acoustic properties that define the unique sound,

or timbre, of musical instruments [4]. In music theory and acoustics, the application of Fourier Analysis

transcends basic tone analysis. It plays a crucial role in digital signal processing, enabling technologies

such as MP3 compression, noise reduction, and the synthesis of musical sounds [5]. For musicians and

sound engineers, Fourier Analysis provides a scientific basis for crafting sounds and understanding their

interaction in compositions and recordings. It bridges the gap between the physics of sound production

and the perception of music, offering insights into the construction of musical instruments and the

development of electronic sound synthesis and audio processing tools. Chen studied the interaction

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/20240109

(https://creativecommons.org/licenses/by/4.0/).

between music and vision based on Fourier transform, providing a visualization method that can be used

in musicology related research and interactive media creation methods [6]. By performing Fourier

transform on the audio of different instruments and analyzing the distribution of harmony between

instruments, the characteristics of each style were presented by Yuan [7]. Xu used discrete Fourier

transform to study the composition principle of musical notes on the original signal, and used inverse

discrete Fourier transform to generate music [8].

This paper will analyze musical tones, specifically focusing on recordings from a trumpet, piano, and

flute. The primary objective is to utilize Fourier Analysis, via MATLAB, to dissect and compare the

unique frequency signatures of these instruments. The methodology spans data collection,

preprocessing, application of Fourier Transform, and subsequent analysis.
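The steps listed above can be sketched in a few lines. The paper states that MATLAB was used; the sketch below uses Python with NumPy instead, and a synthetic 440 Hz tone stands in for a recording, so the function names, the sample rate, and the signal are all assumptions made for illustration. Picking the strongest spectral peak as the fundamental is a simplification that can fail for real instruments whose harmonics are louder than the fundamental.

```python
import numpy as np

def magnitude_spectrum(signal, sample_rate):
    """Return (frequencies in Hz, magnitudes) of a real-valued signal."""
    window = np.hanning(len(signal))            # taper to reduce spectral leakage
    spectrum = np.fft.rfft(signal * window)     # FFT of a real signal
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, np.abs(spectrum)

def fundamental_frequency(signal, sample_rate, min_hz=50.0):
    """Estimate the fundamental as the strongest peak at or above min_hz."""
    freqs, mags = magnitude_spectrum(signal, sample_rate)
    mask = freqs >= min_hz
    return freqs[mask][np.argmax(mags[mask])]

# Synthetic stand-in for a recorded note: 440 Hz plus two weaker harmonics.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate        # one second of samples
tone = (1.0 * np.sin(2 * np.pi * 440 * t)
        + 0.5 * np.sin(2 * np.pi * 880 * t)
        + 0.25 * np.sin(2 * np.pi * 1320 * t))

f0 = fundamental_frequency(tone, sample_rate)
print(f"estimated fundamental: {f0:.1f} Hz")
```

With a real recording, the `tone` array would be replaced by samples loaded from an audio file, and the rest of the sketch would stay the same.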

2. Applications of Fourier Transformation

The application of Fourier Transformation to the audio samples of the trumpet, piano, flute, drum, and triangle reveals intricate details about the acoustic properties that differentiate these instruments.

Utilizing MATLAB for the analysis, this section discusses the findings from the frequency domain

perspective, providing insights into the unique sonic signatures of each instrument.

2.1. Musical Instruments: Fundamental Frequencies and Harmonics

The Fourier Transform of each instrument’s audio recordings illuminated the presence of fundamental

frequencies corresponding to the played notes, alongside multiple harmonics that contribute to the

timbre or color of the sound.

The trumpet recordings showcased a strong fundamental frequency with a series of harmonics that decayed less rapidly than those of the piano and flute (Figure 1). This characteristic brass “brightness” is due to the trumpet’s ability to produce strong higher-order harmonics.

Figure 1. Time domain signal and frequency domain signal for the trumpet

Piano samples displayed a complex harmonic structure with a rich set of overtones (Figure 2). The decay of these harmonics was more pronounced, contributing to the piano’s distinct reverberation and tonal complexity.

Figure 2. Time domain signal and frequency domain signal for the Piano

The flute’s frequency spectrum was simpler, with a clear fundamental frequency and fewer harmonics (Figure 3). The flute’s sound is purer and more sine-wave-like, owing to the instrument’s acoustical properties, which favor the fundamental frequency over the harmonics.

Figure 3. Time domain signal and frequency domain signal for the flute

The visualizations produced in MATLAB effectively highlighted the differences in harmonic content

among the instruments. Plots of magnitude against frequency for each instrument at various notes

provided a visual representation of the acoustic fingerprints. These plots were instrumental in identifying

the unique patterns of harmonics that define the sound of each instrument.
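The comparison of harmonic content described in this section can be illustrated with a short Python sketch. The two signals below are synthetic stand-ins, not the paper's recordings: one has slowly decaying harmonics (a "bright" timbre) and one is dominated by its fundamental (a "pure" timbre). The amplitude lists and the helper names `synth` and `harmonic_profile` are assumptions chosen only to show how such a profile could be read off a spectrum.

```python
import numpy as np

def harmonic_profile(signal, sample_rate, f0, n_harmonics=6):
    """Magnitude at each multiple of f0, normalized so the fundamental is 1."""
    window = np.hanning(len(signal))
    mags = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    peaks = []
    for k in range(1, n_harmonics + 1):
        idx = np.argmin(np.abs(freqs - k * f0))  # nearest bin to the k-th harmonic
        peaks.append(mags[idx])
    peaks = np.array(peaks)
    return peaks / peaks[0]

def synth(f0, amplitudes, sample_rate=44100, seconds=1.0):
    """Sum of sine harmonics of f0 with the given amplitudes."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    return sum(a * np.sin(2 * np.pi * (k + 1) * f0 * t)
               for k, a in enumerate(amplitudes))

sample_rate = 44100
f0 = 440.0
bright = synth(f0, [1.0, 0.8, 0.6, 0.5, 0.4, 0.3])   # slowly decaying harmonics
pure = synth(f0, [1.0, 0.2, 0.05, 0.01, 0.0, 0.0])   # fundamental dominates

bright_profile = harmonic_profile(bright, sample_rate, f0)
pure_profile = harmonic_profile(pure, sample_rate, f0)
print("bright:", np.round(bright_profile, 2))
print("pure:  ", np.round(pure_profile, 2))
```

Normalizing by the fundamental makes profiles comparable across notes played at different loudness, which is one reasonable way to compare "acoustic fingerprints" between instruments.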

2.2. Noise Instruments

Other instruments, such as drums and triangles, that do not have a clear fundamental frequency are noise instruments.

For drums, which produce rich and complex sounds with a broad spectrum of frequencies due to their diverse modes of vibration, the Fourier Transform can decompose these sounds into their constituent frequencies (Figure 4). This decomposition helps in understanding the timbral qualities of

the drum, revealing how different materials, shapes, and sizes affect the sound’s frequency content. By

analyzing the spectral content, sound engineers and instrument makers can modify and optimize drum

designs to achieve desired sound qualities, from the deep, resonant bass of a kick drum to the sharp,

concise attack of a snare.

Similarly, the triangle, despite its seemingly simple structure, produces a sound rich in overtones.

The Fourier Transform can uncover this intricate harmonic structure, showing a spectrum dense with

harmonics that contribute to its bright, penetrating quality. This analysis not only aids in the crafting of triangles with specific tonal characteristics but also in digital synthesis and sampling, where understanding the spectral content is crucial for recreating realistic triangle sounds (Figure 5).

Figure 4. Time domain signal and frequency domain signal for the drum
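One way to quantify the difference between a pitched tone and a noise-like sound is spectral flatness, the ratio of the geometric mean to the arithmetic mean of the power spectrum. This measure is not used in the paper; it is brought in here only as an illustration, and the signals below (a pure sine and white noise) are synthetic stand-ins for a pitched note and a percussive hit.

```python
import numpy as np

def spectral_flatness(signal):
    """Geometric mean over arithmetic mean of the power spectrum, in (0, 1]."""
    window = np.hanning(len(signal))
    power = np.abs(np.fft.rfft(signal * window)) ** 2
    power = power[1:] + 1e-12                   # drop the DC bin, avoid log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    return geometric_mean / np.mean(power)

rng = np.random.default_rng(0)                  # fixed seed for repeatability
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate

tonal = np.sin(2 * np.pi * 440 * t)             # stand-in for a pitched note
noisy = rng.standard_normal(sample_rate)        # stand-in for a noise-like hit

flat_tonal = spectral_flatness(tonal)
flat_noisy = spectral_flatness(noisy)
print(f"tonal flatness: {flat_tonal:.2e}")
print(f"noisy flatness: {flat_noisy:.2f}")
```

A value near 0 means the energy sits in a few narrow peaks, as for a clear fundamental with harmonics, while a larger value means the energy is spread broadly across frequencies.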

2.3. Implications

The implications of this research extend far beyond academic inquiry, touching upon several practical

and theoretical aspects of music and sound engineering.

Understanding the frequency makeup of instruments can enhance teaching methods, providing

students with a more nuanced appreciation of music composition and instrument design. Insights into

the harmonic content and how it contributes to timbre can inform the design of new instruments or the

refinement of existing ones, aiming to achieve desired sound qualities. The principles uncovered through

Fourier Analysis are directly applicable in the development of audio processing software, including

effects, synthesis, and noise reduction algorithms. Producers and sound engineers can leverage this

knowledge to manipulate recordings more effectively, ensuring that the desired emotional and aesthetic

impacts of music are achieved.

Figure 5. Time domain signal and frequency domain signal for the triangle

3. Conclusion

The exploration of musical tones through Fourier Transformation has yielded significant insights into

the unique acoustic characteristics of the trumpet, piano, and flute. This analytical journey, underpinned

by mathematical rigor and facilitated by MATLAB, has illuminated the complex interplay between

fundamental frequencies, harmonics, and the resultant timbre of musical instruments. By dissecting

sound into its constituent frequencies, Fourier Analysis has provided a quantifiable understanding of

what gives each instrument its distinctive sound.

This study represents a step towards demystifying the complex relationship between the physics of

sound production and the perceived qualities of musical tones. The application of Fourier

Transformation offers a powerful lens through which to view the intricacies of music, providing a

foundation for future research and innovation in music theory, acoustics, and digital audio technology.

With the development of technology, the potential for new discoveries and applications in the realm of

music and sound engineering will be vast.

References

[1] Zhu, H., Wen, X., Jin, W., He, Z., and Zeng, Yi. (2015) Oil and gas detection based on

deconvolution short-time Fourier transform. Progress in Geophysics, 5, 6.

[2] Zhou, H. and Wang, Y. (2008) Fourier transform is used to measure motor speed. University

Physics Experiments, 21, 54-56.

[3] Yuan, J. (2020) Comparison of Harmony between Timbres of Different Musical Instruments:

Application of Fourier Transform in Music. Chinese Writers and Artists, 000(002), 35-35.

[4] Smith, J.O. (2007) Mathematics of the Discrete Fourier Transform (DFT), with Audio

Applications. W3K Publishing.

[5] Brown, J.C. (1991) Calculation of a Constant Q Spectral Transform. Journal of the Acoustical

Society of America, 89, 425-434.

[6] Chen, J. (2019) Research on music visualization creation method based on Fourier transform.

Science and Informatization, 30, 2.

[7] Yuan, J. (2020) Comparison of Harmony between Timbres of Different Musical Instruments:

Application of Fourier Transform in Music. Chinese Writers and Artists, 000(002), 35-35.

[8] Xu, Q. (2017) Musical tone analysis and generation based on Fourier transform. Electronic World,

4, 2.

A method to test the uniform convergence of function series

Zhian Wu

School of Mathematics and Statistics, Lanzhou University, Lanzhou, Gansu, China

wuzha21@lzu.edu.cn

Abstract. A series is the sum of infinitely many numbers or functions taken in a given order. In many cases it is hard to determine whether a series of positive functions converges uniformly. This article introduces a new method that replaces the sum of the function series with an improper integral, designed for problems that the classical Weierstrass M-test cannot solve. The Cauchy uniform convergence test serves as the basis for the entire proof because it shifts the focus from the whole sum to the partial sums of the function series, whose values are more easily bounded by the value of an improper integral. Basic facts about improper integrals then settle the question of uniform convergence. With this method, it becomes possible to test the uniform convergence of irregular function series and even to estimate their values.

Keywords: Uniform convergence of function series, Improper integral, Cauchy’s convergence

test, Weierstrass M-test, Mathematical analysis.

1. Introduction

A series is the sum of a countable sequence of real numbers, and it is important to study its sum [1]. According to historical records, Archimedes was the first person to give the sum of an infinite series. When calculating the area under the arc of a parabola, he used the method of exhaustion, a method with which he also closely approximated the value of π [2-3]. However, people later realized that testing the convergence of a series, rather than directly calculating its sum, could reveal the properties of the series indirectly. Since then, mathematicians have devoted themselves to the study of series convergence.

The function series, the topic of this paper, is a series whose terms are functions. Among the various types of convergence, uniform convergence is especially desirable because many properties of the terms of a function series are preserved by its limit function [4]. If a function series is equicontinuous, then the property of continuity transfers to the limit function. Cauchy first put forward the theory of uniform convergence. Later, Seidel and Stokes pointed out the limitations of Cauchy’s work [5]. Cauchy then acknowledged their advice and reached Stokes’ conclusions [6]. Thomae used Cauchy’s theory for his own theory of functions without realizing in time the difference between uniform convergence and non-uniform convergence [7]. The Weierstrass M-test is also helpful for testing the uniform convergence of function series, but it is not a universal method [8]. Florentin used improper integrals to approximate the value of positive series, but a method that uses improper integrals to test series of functions has not yet appeared [9]. The aim of this paper is to give such a test for function series.

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/20240105

(https://creativecommons.org/licenses/by/4.0/).

The paper is organized as follows. In Section 2, the basic knowledge is presented. In Section 3, the proof of the method is given. In Section 4, two applications of the method are shown.

2. Basic Knowledge

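Let us recall some facts about improper integrals and function series that will be used in the proof.

Theorem 2.1 (Cauchy’s convergence test) The series $\sum_{n=1}^{\infty} u_n(x)$ is uniformly convergent on a set $D$ if and only if for every $\varepsilon > 0$ there exists a natural number $N$ such that, for every $p \in \mathbb{N}$ and every $x \in D$, when $n > N$,

$$\left| u_{n+1}(x) + u_{n+2}(x) + \cdots + u_{n+p}(x) \right| < \varepsilon \tag{1}$$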

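Example 2.2: Let $u_n(x) = x^n$ for all $x \in [0, \rho]$, $0 < \rho < 1$. Prove that the series $\sum_{n=1}^{\infty} u_n(x)$ is uniformly convergent.

Proof:

Since $0 \le u_n(x) \le \rho^n$ on $[0, \rho]$, for all $n, p \in \mathbb{N}$,

$$\left| \sum_{k=n+1}^{n+p} u_k(x) \right| \le \sum_{k=n+1}^{n+p} \rho^k = \frac{\rho^{n+1}\,(1-\rho^{p})}{1-\rho} \le \frac{\rho^{n+1}}{1-\rho}.$$

Since $0 < \rho < 1$, as $n \to \infty$, $\frac{\rho^{n+1}}{1-\rho} \to 0$. Hence $\sum_{n=1}^{\infty} u_n(x)$ is uniformly convergent.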

For the integral $\int_a^{\infty} f(x)\,dx$, if it is convergent, its value can be obtained by replacing the infinity with a natural number $A$ and taking the limit:

$$\int_a^{\infty} f(x)\,dx = \lim_{A \to \infty} \int_a^{A} f(x)\,dx \tag{2}$$

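Example 2.3: Calculate the integral $\int_1^{\infty} \frac{1}{x^2}\,dx$.

Choose a natural number $b$ that is sufficiently large to replace the infinity, then:

$$\int_1^{\infty} \frac{1}{x^2}\,dx = \lim_{b \to \infty} \int_1^{b} \frac{1}{x^2}\,dx = \lim_{b \to \infty} \left(1 - \frac{1}{b}\right) = 1 \tag{3}$$

Hence the value of the improper integral is known.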

3. Method

Let us introduce some notation:

$$R_n(x) = \sum_{k=n+1}^{\infty} u_k(x) \tag{4}$$

$$I_n(x) = \int_n^{\infty} f(t)\,dt \tag{5}$$

Theorem 3.1 For all $k \in \mathbb{N}$, let $f(k) = u_k(x)$. If the function $f(t)$ is positive, continuous, and monotonically decreasing on every interval $[k, k+1]$, then the following bounds can be used to test the uniform convergence of the series:

$$I_{n+1}(x) \le R_n(x) \le I_n(x) \tag{6}$$

Proof:

According to Cauchy’s convergence test, testing whether $R_n(x)$ converges uniformly to 0 is equivalent to testing the uniform convergence of $\sum_{n=1}^{\infty} u_n(x)$.

Hence consider the interval $[n, \infty)$ on which the function $f$ is defined, divided into unit subintervals $[n, n+1], [n+1, n+2], \ldots, [n+p-1, n+p], \ldots$ for every $p \in \mathbb{N}$.

The total sum of $f(k)$ over every $k \ge n+1$ is exactly $R_n(x)$:

$$R_n(x) = \sum_{k=n+1}^{\infty} f(k) \tag{7}$$

The improper integral now gives upper and lower bounds for $R_n(x)$. Because $f$ is decreasing, $f(k) \ge \int_k^{k+1} f(t)\,dt$, so

$$R_n(x) = \sum_{k=n+1}^{\infty} f(k) \ge \sum_{k=n+1}^{\infty} \int_k^{k+1} f(t)\,dt = \int_{n+1}^{\infty} f(t)\,dt = I_{n+1}(x) \tag{8}$$

On the other side, $f(k) \le \int_{k-1}^{k} f(t)\,dt$, so

$$R_n(x) = \sum_{k=n+1}^{\infty} f(k) \le \int_n^{\infty} f(t)\,dt = I_n(x) \tag{9}$$

Combining (8) and (9) gives (6).

When a function series is hard to handle with the classical Weierstrass M-test, researchers can use this method to turn the series test into a question about an improper integral, namely whether the improper integral is convergent or not.
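The idea of bounding the tail of a positive, decreasing series by improper integrals can be checked numerically. The sketch below is an illustration under assumptions: it uses the numerical series with terms $1/k^2$ (not taken from the paper), whose total $\pi^2/6$ and tail integral $\int_n^{\infty} t^{-2}\,dt = 1/n$ are known in closed form, and the helper names are hypothetical.

```python
import math

def f(t):
    """Illustrative term: positive and decreasing for t >= 1."""
    return 1.0 / (t * t)

def remainder(n):
    """Tail sum over k > n of 1/k^2, from the known total pi^2/6."""
    return math.pi ** 2 / 6 - sum(f(k) for k in range(1, n + 1))

def tail_integral(n):
    """Integral of 1/t^2 from n to infinity, which equals 1/n."""
    return 1.0 / n

for n in (1, 10, 100):
    lower, r, upper = tail_integral(n + 1), remainder(n), tail_integral(n)
    print(f"n={n}: {lower:.6f} <= {r:.6f} <= {upper:.6f}")
    assert lower <= r <= upper
```

The two integrals squeeze the tail from both sides, so the tail tends to 0 exactly when the integrals do, and the integrals also give a rough estimate of the tail's size.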

4. Application

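Next, let us apply the method to a series that the Weierstrass M-test cannot settle directly.

Example 4.1 [10]: For $\alpha > 0$, discuss the uniform convergence of $\sum_{n=1}^{\infty} x^{\alpha} e^{-nx}$ on $[0, \infty)$.

Proof:

Write $R_n(x) = \sum_{k=n+1}^{\infty} x^{\alpha} e^{-kx}$ for the remainder and take $f(t) = x^{\alpha} e^{-tx}$, which is positive and decreasing in $t$ for $x > 0$.

When $0 < \alpha \le 1$:

By the method, $\int_{n+1}^{\infty} x^{\alpha} e^{-tx}\,dt = x^{\alpha-1} e^{-(n+1)x} \le R_n(x)$. Let $x = \frac{1}{n+1}$ and $n \to \infty$. Then $R_n(x) \ge (n+1)^{1-\alpha} e^{-1} \ge e^{-1}$, so $R_n(x)$ does not converge uniformly to 0. The series is not uniformly convergent in this case.

When $\alpha > 1$:

Using this method, $\int_n^{\infty} x^{\alpha} e^{-tx}\,dt = x^{\alpha-1} e^{-nx} \ge R_n(x)$. Since $\alpha > 1$, the maximum of $x^{\alpha-1} e^{-nx}$ over $x \ge 0$, attained at $x = \frac{\alpha-1}{n}$, is $\left(\frac{\alpha-1}{en}\right)^{\alpha-1}$, which converges to 0 when $n$ is sufficiently large.

Hence the series is uniformly convergent on $[0, \infty)$.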
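The two cases can be explored numerically. The sketch below assumes, for illustration, that the series under discussion is $\sum x^{\alpha} e^{-nx}$; this choice, the grid, and the helper names are assumptions and not taken from the paper. For this series the remainder has a closed form through the geometric series, so its supremum over a grid of $x$ values can be compared for $\alpha = 1$ and $\alpha = 2$.

```python
import numpy as np

def remainder(x, n, alpha):
    """Tail sum over k > n of x^alpha * exp(-k x), closed form for x > 0."""
    return x**alpha * np.exp(-(n + 1) * x) / (-np.expm1(-x))

def tail_integral(x, n, alpha):
    """Integral over t from n to infinity of x^alpha * exp(-t x)."""
    return x**(alpha - 1) * np.exp(-n * x)

# Grid on (0, 100]; at x = 0 every term of the series is zero.
x = np.logspace(-6, 2, 4001)

def sup_remainder(n, alpha):
    """Largest remainder over the grid, a proxy for the supremum over x."""
    return remainder(x, n, alpha).max()

for alpha in (1.0, 2.0):
    sups = [sup_remainder(n, alpha) for n in (10, 100, 1000)]
    print(f"alpha={alpha}: sup of remainder for n=10,100,1000 ->", sups)
```

Under these assumptions the supremum stays near 1 for $\alpha = 1$ and shrinks roughly like $1/n$ for $\alpha = 2$, which matches the distinction between the two cases: the remainder must tend to 0 uniformly in $x$, not just at each fixed $x$.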

Example 4.2: Showing that 󰇛󰇜





 is uniformly convergent for x[0, ∞].

Proof:

It is easy to find that M-test doesn’t apply on this integral. So, using the method above:

(x) =





 

󰇡

󰇢 ≥ (x). If choosing a natural number which is efficiently large,

then (x) ≤ (x) < ɛ, for every ɛ > 0. Through this way the series is uniform convergence.

These examples show that using improper integrals to evaluate function series is helpful when the Weierstrass M-test is not applicable.

5. Conclusion

Improper integrals and infinite series are closely connected in both theory and application. When solving certain improper integrals, they can be transformed into infinite series summation. In this paper, a new method that bypasses the Weierstrass M-test to establish uniform convergence is given and rigorously proved. This method makes it possible to use improper integrals to determine the uniform convergence of function series. In addition, by calculating the improper integral, the value of the function series can be roughly estimated, which greatly facilitates approximate calculations in practical applications. However, this method only applies when the terms of the function series are positive. A method that can test all function series remains a goal for future work.

References

[1] Thompson,S. and Gardner, M. (1998) Calculus Made Easy. Macmillan and Co. London.

[2] O’Connor, J.J. and Robertson, E.F. (1996) A history of calculus. University of St Andrews.

[3] James, K. B. (1993) Archimedes and Pi-Revisited. School Sci. Math. 94, 127-29.

[4] Nicholas, P. (2020) A note on convergence of sequences of functions. Topol.Appl. 275.

[5] Viertel, K. (2021) The development of the concept of uniform convergence in Karl Weierstrass’s

lectures and publications between 1861 and 1886. Arch. Hist. Exact Sci. 455-490.

[6] Henrik, K. S. (2005) Exceptions and counterexamples: Understanding Abel’s comment on

Cauchy’s Theorem. Hist. Math., 32, 453-480

[7] Christian, K. Tanguy, R. (2005) How can we escape Thomae’s relations? J. Math. Soc. Japan,

183-210

[8] Rudin, W. (1953) Principle of Mathematical Analysis. McGraw-Hill, Inc. New York.

[9] Florentin, S. (2006) A Triple Inequality with Series and Improper Integrals. Bull. Pure Appl.

Sci.,25,

[10] Chen, J. Yu, C. and Lu, J. (2018) Genuine Mathematical Analysis. Higher Education Press.

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/20240105

The application of convex function and GA-convex function

Dingrun Zhao

School of Mathematics and Statistics, Central South University, Changsha, Hunan,

410083, China

7805220128@csu.edu.cn

Abstract. A convex function is a real-valued function defined on a convex subset of a vector space whose graph lies on or below each of its chords. Convex functions have important properties, such as the chord inequality and its extension to n points (Jensen's inequality), which can help us derive and prove inequalities. This paper explores the concepts of convex functions and GA-convex functions, demonstrating their utility in proving a variety of common and complex inequalities. Beginning with an overview of convex functions and their extension to GA-convex functions, the study shows how these mathematical tools can be used effectively in inequality proofs. By leveraging the properties of these functions, the paper establishes rigorous proofs for a range of inequalities, highlighting the versatility and applicability of convex and GA-convex functions in mathematical analysis. The properties of convex and GA-convex functions allow us to use them to determine the direction of inequalities, prove inequalities, determine the optimal solution of inequalities, and even prove Cauchy inequalities.

Keywords: Convex function, GA-convex function, Application.

1. Introduction

The concavity and convexity of functions have many applications in proving inequalities. In 2004, Cha studied formulas related to the theorems on convex functions and derived several important inequalities, which were further applied to prove inequalities and solve conditional extremum problems [1]. In 2005, Xia derived Jensen's inequality from the concavity, convexity, and continuity of functions [2]. In the same year, Wu gave the definition of square-convex functions and methods for identifying them, and then established a Jensen-type inequality for square-convex functions [3]. In 2010, Song and Wan obtained a more concise Hadamard-type inequality for GA-convex functions through their study of GA-convex functions [4]. In 2013, Shi et al. obtained a new refinement of the Hermite-Hadamard-type inequality for GA-convex functions [5] and, in the same year, derived some new weighted Hadamard-type inequalities for differentiable GA-convex functions [6]. In 2022, Wu and Mao proved the Hermite-Hadamard inequality on a special region [7].

This article mainly introduces convex functions and GA-convex functions. Section 2 introduces the definition of convex functions and its equivalent forms, extends it to n points, and then proves several common inequalities using these properties. Section 3 moves from convex functions to GA-convex functions: it introduces their definition, proves their properties, constructs an inequality, and then proves a more complex inequality chain.

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/20240107

(https://creativecommons.org/licenses/by/4.0/).

2. Convex Function and its application

2.1. Properties of concave-convex function

The definition of concave-convex functions is introduced first, followed by an explanation of their properties.

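Definition 2.1. ([8]) The original definition of convex functions is derived from geometric intuition. Consider the curve $y = f(x)$, $x \in [a, b]$, and take $x_1, x_2 \in [a, b]$ such that $x_1 < x_2$. The equation of the chord passing through the points $(x_1, f(x_1))$ and $(x_2, f(x_2))$ is

$$y = f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1}(x - x_1) = \frac{x_2 - x}{x_2 - x_1} f(x_1) + \frac{x - x_1}{x_2 - x_1} f(x_2). \qquad (1)$$

So $f(x)$ is concave upwards (or downwards) on the interval $[a, b]$ if, for every $x$ between $x_1$ and $x_2$,

$$f(x) \le (\ge)\ \frac{x_2 - x}{x_2 - x_1} f(x_1) + \frac{x - x_1}{x_2 - x_1} f(x_2). \qquad (2)$$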

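Property 2.2. Suppose $f(x)$ is concave upwards (or downwards) on the interval $[a, b]$. Then for $x_1, x_2 \in [a, b]$ and $\lambda_1, \lambda_2 \ge 0$ with $\lambda_1 + \lambda_2 = 1$, it holds that $f(\lambda_1 x_1 + \lambda_2 x_2) \le (\ge)\ \lambda_1 f(x_1) + \lambda_2 f(x_2)$.

Proof: Let $\lambda = \frac{x_2 - x}{x_2 - x_1}$, so that $1 - \lambda = \frac{x - x_1}{x_2 - x_1}$ and

$$x = \frac{x_2 - x}{x_2 - x_1} x_1 + \frac{x - x_1}{x_2 - x_1} x_2 = \lambda x_1 + (1 - \lambda) x_2 \quad (0 \le \lambda \le 1). \qquad (3)$$

If $x_1$ and $x_2$ in equations (1) and (3) are interchanged, the result remains unchanged. This means that the above results are independent of whether $x_1$ is greater than or less than $x_2$, as long as $x$ lies between $x_1$ and $x_2$. Therefore, set

$$\lambda_1 = \frac{x_2 - x}{x_2 - x_1} \ge 0, \quad \lambda_2 = \frac{x - x_1}{x_2 - x_1} \ge 0, \quad \lambda_1 + \lambda_2 = 1. \qquad (4)$$

So the statement that $f(x)$ is concave upwards (or downwards) on the interval $[a, b]$ can be replaced by another form:

$$f(\lambda_1 x_1 + \lambda_2 x_2) \le (\ge)\ \lambda_1 f(x_1) + \lambda_2 f(x_2). \qquad (5)$$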

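Definition 2.3. Let $f(x)$ be defined on the interval $[a, b]$. If, for all $x_1, x_2 \in [a, b]$ and all $\lambda_1, \lambda_2 \ge 0$ with $\lambda_1 + \lambda_2 = 1$,

$$f(\lambda_1 x_1 + \lambda_2 x_2) \le (\ge)\ \lambda_1 f(x_1) + \lambda_2 f(x_2), \qquad (6)$$

then $f(x)$ is said to be concave up (or concave down) on the interval $[a, b]$.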

2.2. The Application of Convex Functions in Proving Inequalities

In this subsection, common inequalities are proven using the properties of convex functions. First, a

lemma is introduced.

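Lemma 2.4. Let $f(x)$ be concave upwards (or downwards) on $[a, b]$. For all $x_i \in [a, b]$ and $\lambda_i \ge 0$ $(i = 1, 2, \dots, n)$ with $\sum_{i=1}^{n} \lambda_i = 1$, it holds that

$$f\left(\sum_{i=1}^{n} \lambda_i x_i\right) \le (\ge)\ \sum_{i=1}^{n} \lambda_i f(x_i). \qquad (7)$$

Proof: By induction. When $n = 2$, the proposition follows from (6). Assume it holds for $n = k$; we prove that it also holds for $n = k + 1$. Let $x_i \in [a, b]$, $\lambda_i \ge 0$ $(i = 1, \dots, k+1)$ and $\sum_{i=1}^{k+1} \lambda_i = 1$. The case $\lambda_{k+1} = 1$ is trivial, so assume $\lambda_{k+1} < 1$. Then

$$f(\lambda_1 x_1 + \dots + \lambda_k x_k + \lambda_{k+1} x_{k+1}) = f\left((1 - \lambda_{k+1}) \frac{\lambda_1 x_1 + \dots + \lambda_k x_k}{1 - \lambda_{k+1}} + \lambda_{k+1} x_{k+1}\right).$$

Let $\mu_i = \frac{\lambda_i}{1 - \lambda_{k+1}}$ $(i = 1, \dots, k)$, so that $\sum_{i=1}^{k} \mu_i = 1$ and $\sum_{i=1}^{k} \mu_i x_i \in [a, b]$; then

$$f\left(\sum_{i=1}^{k+1} \lambda_i x_i\right) \le (\ge)\ (1 - \lambda_{k+1}) f\left(\sum_{i=1}^{k} \mu_i x_i\right) + \lambda_{k+1} f(x_{k+1}) \le (\ge)\ \lambda_1 f(x_1) + \lambda_2 f(x_2) + \dots + \lambda_{k+1} f(x_{k+1}). \qquad (8)$$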

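Example 2.5. Let $a_i > 0$ $(i = 1, 2, \dots, n)$. Prove:

$$\frac{n}{\frac{1}{a_1} + \frac{1}{a_2} + \dots + \frac{1}{a_n}} \le \sqrt[n]{a_1 a_2 \cdots a_n} \le \frac{a_1 + a_2 + \dots + a_n}{n}. \qquad (9)$$

Proof: First prove the right half of the inequality. Taking logarithms,

$$\sqrt[n]{a_1 a_2 \cdots a_n} \le \frac{a_1 + a_2 + \dots + a_n}{n} \iff \frac{\ln a_1 + \ln a_2 + \dots + \ln a_n}{n} \le \ln \frac{a_1 + a_2 + \dots + a_n}{n}. \qquad (10)$$

The inequality can be proven using Lemma 2.4 with $\lambda_i = \frac{1}{n}$ and the function $f(x) = \ln x$, which is concave downward on $(0, +\infty)$. Replacing $a_i$ with $\frac{1}{a_i}$ in the right half and taking reciprocals proves the left half of the inequality.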
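The chain in Example 2.5 (harmonic mean ≤ geometric mean ≤ arithmetic mean) can be checked numerically. The sketch below is only an illustration under assumed sample values; the function name `means` is hypothetical, and the geometric mean is computed through logarithms to mirror the logarithmic form of the argument.

```python
import math


def means(values):
    """Return (harmonic, geometric, arithmetic) means of positive numbers."""
    if not values or min(values) <= 0:
        raise ValueError("values must be a non-empty list of positive numbers")
    n = len(values)
    harmonic = n / sum(1.0 / v for v in values)
    # Geometric mean via the average of logarithms.
    geometric = math.exp(sum(math.log(v) for v in values) / n)
    arithmetic = sum(values) / n
    return harmonic, geometric, arithmetic


h, g, a = means([1.0, 2.0, 4.0, 8.0])
print(h <= g <= a)
```

When all values are equal, the three means coincide up to rounding, which is the equality case of the inequality.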

3. GA-Convex Functions

3.1. Characteristics of GA-Convex Functions

The definition of GA-convex functions is introduced first, followed by an explanation of their properties.

Definition 3.1. ([9]) Let $f(x)$ be a function defined on an interval $I \subseteq (0, +\infty)$. If, for any $x_1, x_2 \in I$ and $t \in (0, 1)$,

$$f\left(x_1^{t} x_2^{1-t}\right) \le t f(x_1) + (1 - t) f(x_2), \qquad (11)$$

then $f(x)$ is called a GA-subconvex (GA-convex) function on $I$; if the inequality sign is reversed, it is termed a GA-superconvex (GA-concave) function on that interval.

Theorem 3.2. If a function 󰇛󰇜 is GA-convex on the interval 󰇛󰇜󰇛󰇜 , then for any

󰇛󰇜 and for 󰇛󰇜 , the function 󰇛󰇜 is GA-subconvex function on the interval

󰇛󰇜.

Proof: Let any 󰇛󰇜, and 󰇛󰇜, then

1

2

1󰇡1

2

1󰇢1󰇛1󰇜2

1󰇛1󰇜2󰇛1󰇜󰇛1󰇜󰇛2󰇜(12)

Where 󰇛󰇜 is GA-convex on 󰇛󰇜 . For any 󰇛󰇜 , since 󰇛󰇜 is GA-subconvex

function on 󰇛󰇜, for any 󰇛󰇜 it holds

1󰇛1󰇜2󰇛1󰇜󰇛2󰇜1󰇛1󰇜󰇛1󰇜󰇛2󰇜(13)

Therefore, 󰇛󰇜 is GA-subconvex function on interval 󰇛󰇜.

Theorem 3.3. Let a function $f(x)$ be twice differentiable on the interval $I \subseteq (0, +\infty)$. Then:

(1) $f(x)$ is GA-convex on $I$ if and only if the inequality $x f''(x) + f'(x) \ge 0$ holds for all $x$ in $I$.

(2) $f(x)$ is GA-concave on $I$ if and only if the inequality $x f''(x) + f'(x) \le 0$ holds for all $x$ in $I$.

Proof: It is easy to establish the connection between the second derivative of $g(u) = f(e^{u})$ and the concavity/convexity of the function. By (11), $f$ is GA-convex on $I$ exactly when $g$ is convex on $\{\ln x : x \in I\}$, and $g''(u) = e^{u}\left[e^{u} f''(e^{u}) + f'(e^{u})\right]$ has the same sign as $x f''(x) + f'(x)$ at $x = e^{u}$.
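To make the criterion of Theorem 3.3 concrete, the sketch below compares the sign of $x f''(x) + f'(x)$ with a direct sampled test of the defining inequality $f(x_1^{t} x_2^{1-t}) \le t f(x_1) + (1-t) f(x_2)$. It is a numerical illustration only: the helper names, the sample points, and the test functions $1/x$ and $\pm x^2$ are assumptions, and a sampled check cannot replace a proof.

```python
def is_ga_convex_on_samples(f, points, steps=9, tol=1e-12):
    """Check f(x1**t * x2**(1-t)) <= t*f(x1) + (1-t)*f(x2) on a sample grid."""
    ts = [k / (steps + 1) for k in range(1, steps + 1)]
    for x1 in points:
        for x2 in points:
            for t in ts:
                lhs = f(x1 ** t * x2 ** (1 - t))
                rhs = t * f(x1) + (1 - t) * f(x2)
                if lhs > rhs + tol:
                    return False
    return True


def criterion(first, second, x):
    """Evaluate x*f''(x) + f'(x) from user-supplied derivatives f' and f''."""
    return x * second(x) + first(x)


points = [0.5, 1.0, 2.0, 3.5, 7.0]

# f(x) = 1/x: x*f''(x) + f'(x) = 1/x**2 > 0, so f should be GA-convex.
inv_ok = all(criterion(lambda x: -1 / x ** 2, lambda x: 2 / x ** 3, x) >= 0
             for x in points)
print(inv_ok, is_ga_convex_on_samples(lambda x: 1.0 / x, points))

# f(x) = -x**2: x*f''(x) + f'(x) = -4x < 0, so f is GA-concave, not GA-convex.
neg_sq_ok = all(criterion(lambda x: -2 * x, lambda x: -2.0, x) <= 0
                for x in points)
print(neg_sq_ok, is_ga_convex_on_samples(lambda x: -(x ** 2), points))
```

A GA-concave function can be tested with the same helper by passing its negative, since reversing the sign of $f$ reverses the inequality.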


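Theorem 3.4. Suppose $f(x)$ is GA-concave on the interval $I$, $x_i \in I$, $\lambda_i \ge 0$ $(i = 1, 2, \dots, n)$ and $\sum_{i=1}^{n} \lambda_i = 1$. It holds that

$$f\left(\prod_{i=1}^{n} x_i^{\lambda_i}\right) \ge \sum_{i=1}^{n} \lambda_i f(x_i).$$

Proof: This theorem can be proved by induction. In particular, taking $\lambda_i = \frac{1}{n}$, if $f(x)$ is GA-concave on the interval $I$, then

$$f\left(\sqrt[n]{x_1 x_2 \cdots x_n}\right) \ge \frac{1}{n} \sum_{i=1}^{n} f(x_i). \qquad (14)$$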

3.2. Applications of GA-convex functions.

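Theorem 3.5. ([10]) Suppose the function $f$ defined on $[a, b] \subset (0, +\infty)$ is GA-concave. Then it holds that

$$f\left(\frac{1}{e}\left(\frac{b^{b}}{a^{a}}\right)^{\frac{1}{b-a}}\right) \ge \frac{1}{b-a}\int_{a}^{b} f(x)\,dx \ge \left(\frac{b}{b-a} - \frac{1}{\ln b - \ln a}\right) f(b) + \left(\frac{1}{\ln b - \ln a} - \frac{a}{b-a}\right) f(a). \qquad (15)$$

If the function $f$ is GA-convex, inverting the inequality signs is sufficient.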
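A quick numerical check can make the Hadamard-type chain in Theorem 3.5 tangible. The sketch below assumes that (15) has the standard form $f(I(a,b)) \ge \frac{1}{b-a}\int_a^b f(x)\,dx \ge \frac{(b-L)\,f(b) + (L-a)\,f(a)}{b-a}$ for GA-concave $f$, where $I(a,b) = \frac{1}{e}(b^{b}/a^{a})^{1/(b-a)}$ is the identric mean and $L = \frac{b-a}{\ln b - \ln a}$ is the logarithmic mean. The function names, the Simpson integrator, and the test function $-x^2$ are illustrative assumptions, not taken from the paper.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def hadamard_ga_terms(f, a, b):
    """Return (f(I), integral mean, weighted endpoint term) for 0 < a < b."""
    identric = (b ** b / a ** a) ** (1.0 / (b - a)) / math.e
    log_mean = (b - a) / (math.log(b) - math.log(a))
    mean_value = simpson(f, a, b) / (b - a)
    endpoint = ((b - log_mean) * f(b) + (log_mean - a) * f(a)) / (b - a)
    return f(identric), mean_value, endpoint


# f(x) = -x**2 is GA-concave because x*f''(x) + f'(x) = -4x < 0 on (0, +inf).
left, middle, right = hadamard_ga_terms(lambda x: -(x ** 2), 1.0, 4.0)
print(left >= middle >= right)
```

For $f(x) = \ln x$, which satisfies $x f''(x) + f'(x) = 0$, the three terms coincide up to integration error, so the chain is sharp in that case.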

Proof: First prove the inequality on the right-hand side. It can be proved easily by taking the logarithm

on both sides. Let 



 and 





Let 

, it is easy to infer 󰇛󰇜. By the properties of GA-Concave, the following formula

can be derived.

󰇛󰇜



󰇛󰇜󰇛󰇜



󰇛󰇜󰇟󰇛󰇜󰇛󰇜󰇛󰇜󰇠



󰇛󰇜

󰇟󰇛󰇜󰇛󰇜󰇛󰇜󰇠



󰇛󰇜

󰇟󰇛󰇜󰇛󰇜󰇛󰇜󰇠󰇛󰇜󰇛󰇜



󰇟󰇛󰇜󰇛󰇜󰇛󰇜󰇠

󰇛󰇜󰇛󰇜󰇛󰇛󰇜󰇛󰇜󰇜󰇛󰇜





󰇛 

󰇜󰇛󰇜 

󰇛󰇜 (16)

Dividing both sides by  will get the inequality on the right-hand side. By the same way, the

inequality on the left-hand side can be proved. Let 󰵎󰵎󰇟󰇠. By the

definition of a definite integral and Theorem 3.4, the following formula can be derived.

󰇛󰇜





∞

󰇛Δ󰇜



1

∞󰇛󰇛Δ󰇜󰇜



1



󰇛

∞󰇟󰇛Δ󰇜

1



󰇠󰇜󰇛

∞󰇟󰇛Δ󰇜

1󰇠󰇜

󰇛󰇝1

Δ

∞

Δln 󰇡Δ󰇢

1󰇞󰇛󰇝 1





󰇞󰇜󰇛1

󰇧

󰇜1

󰇨 (17)

When 󰇛󰇜 , the inequality in (15) holds.

Example 3.6. ([10]) Suppose  ,




󰇛󰇜

󰇛

󰇜

󰇛

󰇜

(18)

Proof: This example can be proved using GA-concave functions and Theorem 3.5. By substituting

󰇛󰇜

 into the inequality on the left side of (15), it follows

󰇛

󰇜1



2

1

󰇛

󰇜1



󰇛

󰇜1

4

9󰇛

󰇜2󰇛󰇜2

41

󰇛

󰇜1

 (19)

Substituting 󰇛󰇜 into the inequality on the right side of (15) results in



(20)

Next, the proof of Example 3.6 reduces to proving:





2 (21)

Suppose , the original formula can be simplified as 󰇛󰇜󰇛󰇜

Construct a function 󰇛󰇜󰇛󰇜 and utilize the Lagrange Mean Value Theorem 󰇛󰇜󰇛󰇜

 

󰆒󰇛󰇜󰇛󰇜󰇛󰇜󰇛󰇜󰇛󰇜.

Due to this common inequality:

1

11

1󰇛1󰇜2󰇛1󰇜 (22)

Therefore, the inequality (21) is proved.

Replacing  and b with  and  in (21) results in 󰇛󰇜

 

, multiplying both sides by

, 

󰇛󰇜

 can be obtained.

Only the last inequality needs to be proven now.

4

9󰇛

󰇜2

2

󰇛󰇜󰇛󰇜2

24

9󰇛󰇜20

1

18 󰇟󰇛4󰇜󰇛󰇜2󰇠0(23)

Therefore, the inequality in Example 3.6 is proved.

4. Conclusion

This article first introduces the definition of convex functions from a geometrically intuitive perspective, then extends it from two points on an interval to n points, and uses this to show that the harmonic mean is less than or equal to the geometric mean, which is less than or equal to the arithmetic mean. The subsequent section extends ordinary convex functions to GA-convex functions, studies their necessary and sufficient conditions and properties, and finally constructs an inequality to prove the complex inequality chain in the example. Convex functions can thus be used to prove seemingly complex inequalities, but they also require assistance from other tools in mathematical analysis. It is hoped that future work, building on this foundation, will continue to advance the understanding and application of convex functions in the realm of inequalities.

References

[1] Cha, L. (2004). Convex functions and inequalities. Journal of Ningbo Vocational and Technical College, 8, 3.

[2] Xia, H. (2005). Convex functions and inequalities. Journal of Changzhou Institute of Technology, 18, 3.

[3] Wu, S. (2005). Square convex functions and Jensen-type inequalities. Journal of Capital Normal University: Natural Science Edition, 26, 6.

[4] Song, Z. and Wan, X. (2010). Hadamard-type inequalities for GA-convex functions. Science, Technology and Engineering, 23, 3.

[5] Shi, T., Wu, H. and Jiao, Z. (2013). Two functions related to Hermite-Hadamard type inequalities for GA-convex functions. Journal of Guizhou Normal University: Natural Science Edition, 31,

[6] Shi, T. and Wu, H. (2013). Weighted Hadamard-type inequalities for differentiable GA-convex functions. Journal of Chongqing University of Science and Technology: Natural Science, 6, 5.

[7] Wu, Q. and Mao, Y. (2022). Properties of Multivariate Convex Functions and Their Hermite-Hadamard Inequality. Mathematics in Practice and Understanding, 52, 268-272.

[8] Zhou, Z. (2006). In the process of proving inequalities, one must follow the general rules and basic methods of reasoning for proving problems, and also, due to the ‘inequality’ aspect, it is necessary to adopt some special proof methods. This article will use one of the properties of functions - convexity - to prove some inequalities in high school algebra. Journal of Lanzhou Institute of Education, 4, 58-60.

[9] Wu, S. (2004). GA-convex functions and the Poincaré-type inequality. Journal of Guizhou Normal University (Natural Science Edition), 2, 52-55.

[10] Hua, Y. (2008). Hadamard-type inequalities for GA-convex functions. College Mathematics, 24,


Research on Improved Crowd Detection Based on YOLOv5

Qi Wen1, Kecheng Li2,4, Yue Wang3

1School of Computer Science, University of Xi'an for Polytechnic, Xian, China

2College of Computer and Cyber Security, Chengdu University of Technology,

Chengdu, China

3School of Optical-Electrical and Computer Engineering, University of Shanghai for

Science and Technology, Shanghai, China

4li.kecheng@student.zy.cdut.edu.cn

Abstract. With the acceleration of modern urbanization and the improvement of residents' material living standards, the flow of people in public spaces is gradually becoming saturated. Monitoring equipment in public places continuously records a huge amount of people-flow information, but the crowds tend to be dense and congested, and traditional machine learning cannot identify large, dense crowds accurately and efficiently. If deep learning can be used to process the crowded scenes captured by surveillance cameras and accurately identify the number of people in public places, it will provide an important safeguard for managing the flow of people and for safety in public areas. However, for crowded targets with occlusions, traditional target detection algorithms sometimes perform poorly. Against this background, and aiming at the dense and crowded nature of crowds in public areas, this paper introduces an enhanced deep learning framework based on the YOLOv5 neural network for crowd detection. The framework improves the convolutional layer C3 in the backbone of the YOLOv5 neural network by adding the CBAM attention mechanism. Compared with the original YOLOv5, the improved model increases the maximum F1 value of crowd recognition at near, middle and far distances. In summary, the deep learning framework improved from the YOLOv5 neural network proposed in this paper significantly improves the recognition accuracy for crowded people in public areas.

Keywords: YOLOv5, Crowds, Image recognition, CBAM attention mechanism.

1. Introduction

With the development of computer vision technology, big data tracking and convolutional neural networks, object detection plays a vital role in various fields. In practice, it can be used to monitor the flow of people in public places and tourist attractions. At present, deep learning-based object detection methods such as YOLOv5 and Faster R-CNN have been applied to people-flow detection. YOLO (You Only Look Once) is a classic real-time target detection algorithm whose speed and accuracy have made it widely used, and it plays a crucial role in detecting pedestrian flow in public areas [1]. For crowd counting, however, it remains difficult to count pedestrians in dense and crowded scenes with high density and small targets, and there is still a lot of room to explore. Current deep learning algorithms for crowd-based target detection are still in their infancy and are difficult to

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0126

(https://creativecommons.org/licenses/by/4.0/).

apply in real-life scenarios because of problems such as poor robustness, low precision and a large amount of computation [2]. Many studies have made remarkable progress on these problems. For example, adding the Focus layer [3] reduces the amount of model computation and accelerates operation, thus improving detection efficiency. In addition, introducing an attention mechanism by adding an SE module at the network feature fusion stage improves the localization accuracy of information. At the same time, replacing the original NMS with Soft-NMS increased the mean detection accuracy mAP@0.5 by 1.5% and the recall rate by 0.5% [4].

This paper designs an improved deep learning framework based on the YOLOv5 neural network. By improving the convolutional layer C3 in the backbone of YOLOv5 and adding the CBAM attention mechanism [5], it improves crowd identification accuracy at near, middle and far distances and identifies occluded targets more accurately, thus improving detection performance and practicability. The primary contents of this paper are as follows.

First, to identify targets in crowds, this paper completes the crowd count by detecting the torso of each person. A dataset for people detection and identification, consisting of 15,000 images, is collected and prepared. The model is then trained on this data, and the occlusion and congestion characteristics of the targets are identified from the training data.

Second, YOLOv5 is improved by adding the SENet attention mechanism: each channel is pooled and passed through two fully connected layers to obtain the output vector, and the nodes of the second fully connected layer are aligned with the channels [6]. After training, the F1 value increased by 0.03 compared with the original YOLOv5 and the recall rate rose from 0.71 to 0.73, which addresses the target recognition accuracy for crowded crowds at middle and far distances. However, the experiment found that its accuracy in short-range target recognition decreased compared with the original YOLOv5.

Third, to solve this problem, this paper introduces another attention mechanism, CBAM, to improve YOLOv5. Two new modules, a channel attention mechanism and a spatial attention mechanism, are added before the data entry of the original C3 module of YOLOv5. The outputs of the two modules are multiplied after each calculation is completed, suppressing information that is unimportant in terms of channel and space, respectively. After training, the F1 value is 0.72, the confidence at which the accuracy reaches 1 is 0.968, the accuracy at a confidence of 0.5 is 0.768, and the recall rate is 0.76. For the original YOLOv5, the F1 value is 0.68, the confidence at which the accuracy reaches 1 is 0.985, the accuracy at a confidence of 0.5 is 0.723, and the recall rate is 0.71. Experiments show that the target recognition accuracy for crowded crowds at middle and far distances is improved, while the recognition accuracy of the original YOLOv5 for close-range targets is maintained. This addresses the identification accuracy of the model in complex public scenes, especially those containing a large number of people with small targets.

2. Research methods

2.1. Introduction to YOLOv5 architecture

YOLOv5 is an efficient target detection model characterized by a simple architecture and high recognition efficiency, which makes it especially suitable for real-time multi-target recognition scenarios. This paper uses the YOLOv5s version, which has the smallest depth and feature map width in the YOLOv5 series. The basic components of YOLOv5s include Focus, Conv, C3 and SPP. Focus decomposes a high-resolution feature map into several low-resolution feature maps, that is, it reduces a larger input image to smaller ones to improve the speed of calculation and the accuracy of feature extraction. Conv is the conventional convolution layer in YOLOv5; it convolves the input with a convolution kernel to extract and process features. C3 is the key complex convolutional module of YOLOv5s. Its main idea is to divide the input feature map into two parts, process them separately, and finally merge them, reducing the amount of calculation as much as possible. SPP is a spatial pyramid pooling module. It uses max pooling to extract features at different spatial scales and performs multi-scale fusion, so that the model recognizes targets of different sizes well. In general, the main idea of YOLOv5s is to extract features through complex convolutional layers and strengthen feature fusion through multi-scale pooling layers, achieving fast and accurate recognition of different targets.
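The Focus step described above, turning one high-resolution map into several low-resolution maps, can be sketched as a space-to-depth rearrangement. The following framework-free Python illustration works on nested lists shaped (channels, height, width). The function name `focus_slice` and the order of the four slices are assumptions based on commonly published YOLOv5 code, and the convolution that normally follows the slicing is omitted.

```python
def focus_slice(image):
    """Space-to-depth: (C, H, W) nested lists -> (4C, H/2, W/2) nested lists."""
    height, width = len(image[0]), len(image[0][0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    out = []
    # Four interleaved pixel grids, given as (row offset, column offset).
    for row_off, col_off in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        for channel in image:
            out.append([row[col_off::2] for row in channel[row_off::2]])
    return out


image = [[[0, 1, 2, 3],
          [4, 5, 6, 7],
          [8, 9, 10, 11],
          [12, 13, 14, 15]]]          # one 4x4 channel
sliced = focus_slice(image)
print(len(sliced), len(sliced[0]), len(sliced[0][0]))  # channels, height, width
```

No pixel is discarded: the spatial resolution is halved in each direction while the channel count is multiplied by four, so later convolutions operate on smaller maps.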

2.2. Introduction of complex convolution layer C3

Figure 1. C3 fundamentals.

Figure 1 illustrates the fundamental principle of the C3 layer, with c1 denoting the number of input channels, c2 the number of output channels, and c_ the number of channels generated during the intermediate convolution. In general, there are four convolutional layers. Convolutional layer 1 and convolutional layer 2 are exactly the same, and their role is to adjust the number of channels of the feature map for subsequent processing. The bottleneck layer is also a convolutional layer, similar to convolutional layer 1 and convolutional layer 2, which mainly provides a bypass for the convolution operation and connects directly to the subsequent concatenation layer. After concatenation in the channel dimension, the last convolutional layer of C3 performs the convolution on the feature map as a whole and converts the number of channels from 2c_ to c2 to generate the final feature map. In general, the main feature of the C3 layer is that it fuses features of various scales more quickly through the combination of parallel convolution and concatenation, which enhances the model's feature extraction capability.
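The channel bookkeeping of the C3 layer (c1 in, two parallel branches of width c_, concatenation to 2c_, then c2 out) can be written down as a small shape calculator. This is a sketch of the channel counts only, with no weights or convolutions. The function name, the stage labels, and the expansion ratio of 0.5 used to derive c_ from c2 are assumptions rather than details stated in the paper.

```python
def c3_channel_plan(c1, c2, expansion=0.5):
    """Channel counts through a C3 block (shapes only, no weights)."""
    c_hidden = int(c2 * expansion)  # the intermediate width written as c_
    return {
        "conv1 (main branch)": (c1, c_hidden),
        "bottleneck stack": (c_hidden, c_hidden),
        "conv2 (bypass branch)": (c1, c_hidden),
        "concat": (2 * c_hidden,),
        "conv3 (fusion)": (2 * c_hidden, c2),
    }


for stage, channels in c3_channel_plan(64, 128).items():
    print(stage, channels)
```

The plan makes explicit that the concatenated tensor has 2c_ channels regardless of c1, which is why the final convolution maps 2c_ to c2.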


2.3. Improvement scheme

Figure 2. C3 fundamentals.

Based on the architecture of YOLOv5s, this paper modifies the code. The main improvement is to replace the original complex convolutional layer C3 of YOLOv5s with the C3CBAM module to increase its recognition ability for dense crowds. The C3CBAM module adds two new modules, a spatial attention module and a channel attention module, before the input of the original C3 module. As shown in Figure 2, the outputs of the two modules are multiplied after each has completed its calculation, so as to suppress unimportant information in the channel and spatial dimensions, respectively. The channel attention module dynamically learns the significance of individual channels during training and reduces the effect of irrelevant feature channels, which reduces redundant information and enhances the model's robustness. The spatial attention module helps the model understand the relationship between pixels more accurately by continually learning spatial position weights, which enhances feature extraction [7]. By replacing the C3 layer of YOLOv5s with the C3CBAM module, the scheme presented in this paper not only boosts the model's recognition accuracy for individuals but also enhances its effectiveness in complex scenes, particularly those containing numerous individuals with small targets.


2.4. Introduction of channel attention module

Figure 3. Channel attention module fundamentals.

Figure 3 shows the basic principle of the channel attention module. The input here refers to the feature map that was originally fed to the C3 layer. The input feature map is passed through two branches, adaptive max pooling and adaptive average pooling, so that the global maximum and the global average of each channel are retained. Each branch then enters a convolution layer and an activation layer, which reduce the number of channels and extract key information while introducing nonlinearity to increase the model's expressive ability. A further convolution restores the original number of channels. Finally, the processed max pooling result and average pooling result are added to fuse the features of the two pooling strategies, and the Sigmoid activation function compresses the output to the range [0, 1] to generate the channel attention weights. In this way, the importance of each channel is obtained at the output. In general, the channel attention module enhances the expression of important channels by summing the convolved max pooling and average pooling results [8].
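The channel attention steps above (global max and average pooling, channel reduction with an activation, channel restoration, addition, Sigmoid) can be sketched without any deep learning framework. In the illustration below the 1×1 convolutions acting on pooled vectors are written as matrix-vector products. The function names, the use of ReLU as the activation, the sharing of the same weights between the two branches, and the toy weights are all assumptions for the sake of a runnable example.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def channel_attention(features, w_reduce, w_expand):
    """features: (C, H, W) nested lists; returns C weights in (0, 1)."""
    flat = [[v for row in channel for v in row] for channel in features]
    max_pool = [max(channel) for channel in flat]               # global max
    avg_pool = [sum(channel) / len(channel) for channel in flat]  # global mean

    def bottleneck(pooled):
        hidden = [max(0.0, h) for h in matvec(w_reduce, pooled)]  # reduce + ReLU
        return matvec(w_expand, hidden)                            # restore C

    fused = [m + a for m, a in zip(bottleneck(max_pool), bottleneck(avg_pool))]
    return [sigmoid(v) for v in fused]


features = [[[1.0, 2.0], [3.0, 4.0]],
            [[0.0, 0.0], [0.0, 8.0]]]     # two 2x2 channels
w_reduce = [[0.5, -0.5]]                   # 2 channels -> 1 hidden unit
w_expand = [[1.0], [-1.0]]                 # 1 hidden unit -> 2 channels
channel_weights = channel_attention(features, w_reduce, w_expand)
print(channel_weights)
```

Each returned weight multiplies one whole channel of the feature map, so a weight near 0 suppresses that channel and a weight near 1 keeps it.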


2.5. Introduction to Spatial Attention Module

Figure 4. Spatial attention module fundamentals.

Figure 4 illustrates the fundamental principle of the spatial attention module. Once again, the input refers to the feature map that was originally fed to the C3 layer, not the output of the channel attention module. First, max pooling and average pooling are performed on the input feature map, which extract the maximum and average values at each spatial position. Then, the max pooling and average pooling results are concatenated to obtain a feature map with more channels that contains the results of both types of pooling. Finally, convolution and Sigmoid activation are applied to enhance the key spatial location features and compress the output range. In general, the spatial attention module performs convolution and activation on the concatenation of the max pooling and average pooling results to extract important information in the spatial dimension [9]. In comparison, the main differences between the spatial attention module and the channel attention module lie in the pooling operation and in the way the two pooling results are combined during feature fusion.
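A framework-free sketch of the spatial attention steps (pooling across channels, concatenation into a two-channel map, convolution, Sigmoid) is given below, together with one possible reading of how the channel and spatial weights are multiplied onto the input. The function names, the zero padding, the kernel size, and the toy inputs are assumptions, and the helper `apply_attention` is a hypothetical illustration rather than the paper's C3CBAM code.

```python
import math


def spatial_attention(features, kernel, bias=0.0):
    """features: (C, H, W); kernel: (2, k, k) with odd k; returns (H, W) weights."""
    channels, height, width = len(features), len(features[0]), len(features[0][0])
    # Pool across channels so each spatial position keeps its max and its mean.
    max_map = [[max(features[c][i][j] for c in range(channels))
                for j in range(width)] for i in range(height)]
    avg_map = [[sum(features[c][i][j] for c in range(channels)) / channels
                for j in range(width)] for i in range(height)]
    stacked = [max_map, avg_map]              # concatenation: 2 x H x W
    pad = len(kernel[0]) // 2
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            acc = bias
            for m in range(2):
                for di in range(-pad, pad + 1):
                    for dj in range(-pad, pad + 1):
                        y, x = i + di, j + dj
                        if 0 <= y < height and 0 <= x < width:  # zero padding
                            acc += kernel[m][di + pad][dj + pad] * stacked[m][y][x]
            row.append(1.0 / (1.0 + math.exp(-acc)))
        out.append(row)
    return out


def apply_attention(features, channel_weights, spatial_weights):
    """Scale every value by its channel weight and its spatial weight."""
    return [[[v * cw * spatial_weights[i][j] for j, v in enumerate(row)]
             for i, row in enumerate(channel)]
            for channel, cw in zip(features, channel_weights)]


features = [[[1.0, 2.0], [3.0, 4.0]],
            [[4.0, 3.0], [2.0, 1.0]]]
zero_kernel = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
spatial_weights = spatial_attention(features, zero_kernel)   # all 0.5 here
scaled = apply_attention(features, [1.0, 0.5], spatial_weights)
print(scaled)
```

With an all-zero kernel every position receives the neutral weight 0.5; a trained kernel would instead raise the weight of positions where the pooled maps indicate a target.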

2.6. Introduction to the dataset

The experiment uses the CrowdHuman dataset, which contains a total of 15,000 dense crowd images in which 339,565 objects have been annotated for recognition. Because a considerable proportion of the targets in the dataset are small and have fuzzy contours, this dataset is well suited to training and validating dense crowd recognition [10]. The label specification of the dataset does not conform to the training scheme of YOLOv5, and the image sizes in the dataset differ, so the labels are converted before training to the YOLOv5 label format, which adapts to different image sizes. Only the first of the three parts of the training set is used for training. Some images are too simple to recognize, such as group photos of several people standing in a row, so this paper removes these images from the validation.
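The label adjustment described above can be sketched as a conversion from a pixel-space box to the normalized `class cx cy w h` text line used by YOLO-style training. The sketch assumes the source box is given as (x_min, y_min, width, height) in pixels with the origin at the top-left corner; the function name, the clipping to the image border, and the single class id 0 are assumptions, and reading the dataset's own annotation files is left out.

```python
def to_yolo_label(box_xywh, image_width, image_height, class_id=0):
    """Pixel (x_min, y_min, w, h) box -> 'class cx cy w h' normalized to [0, 1]."""
    x_min, y_min, box_w, box_h = box_xywh
    # Clip to the image, since annotated boxes may extend past the border.
    x_max = min(x_min + box_w, image_width)
    y_max = min(y_min + box_h, image_height)
    x_min, y_min = max(x_min, 0), max(y_min, 0)
    if x_max <= x_min or y_max <= y_min:
        return None  # nothing of the box lies inside the image
    cx = (x_min + x_max) / 2 / image_width
    cy = (y_min + y_max) / 2 / image_height
    w = (x_max - x_min) / image_width
    h = (y_max - y_min) / image_height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


print(to_yolo_label((100, 50, 200, 100), 400, 200))
```

Because every coordinate is divided by the image width or height, the same label format works for images of any size, which is the adaptation to different image sizes mentioned in the text.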


3. Experimental results

3.1. Introduction to SENet

Figure 5. SENet Fundamentals

The SENet module has an effect similar to that of the module used in this paper: it enhances feature representation so that the neural network performs better. As shown in Figure 5, the SENet module first performs a convolution on the input feature map to change the original C′×H′×W′ feature map into a C×H×W feature map. Next, the Squeeze operation, Fsq, is applied to the new feature map, and a 1×1×C vector is obtained by global average pooling. This vector is then fed into the fully connected layers, Fex, which learn the importance of each channel. Finally, through Fscale, the weight vector is multiplied with the original feature map to give the final output feature map [11].
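The three SENet stages named above (Fsq, Fex, Fscale) can be sketched in a few lines of plain Python on nested lists shaped (C, H, W). The function name `se_block`, the use of two fully connected layers with ReLU and Sigmoid, and the toy weights are assumptions for illustration; the initial convolution that produces the C×H×W map is omitted.

```python
import math


def se_block(features, w1, w2):
    """Squeeze-and-excitation on (C, H, W) nested lists; returns rescaled maps."""
    # Squeeze (Fsq): global average pooling gives one number per channel.
    squeezed = [sum(v for row in ch for v in row) / (len(ch) * len(ch[0]))
                for ch in features]
    # Excitation (Fex): FC -> ReLU -> FC -> sigmoid gives per-channel weights.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    scales = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
              for row in w2]
    # Scale (Fscale): multiply every channel by its weight.
    return [[[v * s for v in row] for row in ch] for ch, s in zip(features, scales)]


features = [[[2.0, 2.0], [2.0, 2.0]],
            [[4.0, 4.0], [4.0, 4.0]]]
rescaled = se_block(features, w1=[[0.0, 0.0]], w2=[[1.0], [1.0]])
print(rescaled)
```

Unlike the channel attention module described in Section 2.4, this sketch uses only average pooling in the squeeze step, which is the main structural difference between the two.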

3.2. Comparison of target detection effects

Figure 6. YOLOv5.

Figure 7. YOLOv5 (after training).

Figure 8. SENet.

Figure 9. CBAM Improvement.


As shown in Figure 6, the untrained YOLOv5 marks many non-human regions and fails to recognize many people, mostly because the detection targets in the images of its training set are numerous and sparse. After training with the dense crowd training set CrowdHuman, YOLOv5 recognizes people well, as Figure 7 shows, but it lacks the ability to recognize long-distance targets. Figure 8 illustrates that SENet has a strong recognition ability for distant targets, but its recognition ability at medium and near ranges is not as good as that of the original YOLOv5. The model improved with CBAM, shown in Figure 9, not only accurately identifies medium and close targets but also improves distant target recognition, which improves the overall recognition accuracy.

3.3. Comparison of experimental data

Table 1. Validation data comparison.

Index | YOLOv5 CBAM | YOLOv5
F1 maximum | 0.72 (when the confidence is 0.436) | 0.69 (when the confidence is 0.427)
Confidence (when the precision is 1) | 0.768 | 0.985
Accuracy at a confidence level of 0.5 | 0.109 | 0.723
Recall | 0.76 | 0.71

As shown in Table 1, the maximum F1 value of the CBAM-improved YOLOv5 is slightly higher (0.72 versus 0.69), and its recognition accuracy at a confidence level of 0.5 improves from 0.723 to 0.768. The most significant improvement is in recall, which rises from 0.71 to 0.76. This shows that the improved model recognizes positive samples considerably better than the original model.
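The "F1 maximum" row reports the best F1 value found while sweeping the confidence threshold. A minimal sketch of that sweep is given below; the curve points are hypothetical and are not the validation data behind Table 1.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def best_f1(curve):
    """curve: list of (confidence, precision, recall).
    Returns (maximum F1, confidence at which it occurs)."""
    return max((f1_score(p, r), c) for c, p, r in curve)

# Hypothetical precision/recall values at four confidence thresholds.
curve = [(0.2, 0.55, 0.80),
         (0.4, 0.70, 0.72),
         (0.6, 0.82, 0.55),
         (0.8, 0.93, 0.30)]
f1_max, confidence = best_f1(curve)
```

Raising the threshold trades recall for precision, so F1 peaks at an intermediate confidence rather than at either extreme.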

4. Conclusion

By improving the forward propagation function of the YOLOv5 model and the C3 convolutional layer in the backbone network, this paper significantly improves the recognition accuracy of target detection. The model learns features more effectively during training. Adjusting the structure and optimizing the parameters of the C3 convolution layer enhances the robustness and accuracy of the model in complex scenes. The results show that the improved YOLOv5 model achieves a significant performance improvement on multiple public data sets, which demonstrates the effectiveness of the proposed method.

However, this study has some limitations. First, although the accuracy of the enhanced model has increased, its computational complexity and inference time have also increased, which may pose challenges in practical applications. Moreover, the improvements in this paper are aimed mainly at dense crowd detection, and for other types of visual tasks (such as image segmentation or pose estimation) the effect is uncertain and needs further verification. Future research can proceed in three directions: first, further optimizing the computational efficiency of the model and reducing resource consumption; second, extending the proposed method to other types of deep learning models and tasks to verify its generality; third, combining other advanced techniques, for example attention mechanisms and multi-scale feature fusion, to further improve the performance and adaptability of the model. By improving the forward propagation function of the YOLOv5 model and the C3 convolution layer, this project improved the accuracy of target detection and provides new ideas and methods for subsequent research. More related innovations and breakthroughs in practical applications and other fields are expected in the future.

Authors' contribution

All authors contributed equally, regardless of the order of authorship.


References

[1] Jiang, X. K., Liao, X. L. and Li, Y. B. (2019). Regional crowd flow statistics based on deep learning. Digital Users, 29(21), 165-167.

[2] Zhan, W. W. (2022). Design and implementation of crowd counting and anomaly detection system. Beijing: Beijing University of Technology.

[3] Chen, B., Dai, S. L. and Ye, B. Y. (2023). YOLO-based social distancing detection method for people in public areas. Artificial Intelligence and Robotics Research, 12(3). https://doi.org/10.12677/AIRR.2023.123023

[4] Cong, X. H., Li, S. X., Chen, F. K. and Meng, Y. (2023). An improved dense pedestrian detection algorithm based on YOLOv5. Computer Science and Applications, 13(6), 1199-1207.

[5] Wang, X., Dong, Q. and Yang, G. Y. (2023). Crops diseases and insect pests recognition based on optimized CBAM improvement YOLOv5. Computer System Application, 32(7), 261-268. https://doi.org/10.15888/j.cnki.csa.009175

[6] Li, X. P., Zhang, Y. B., Li, Y. P., et al. (2023). An improved algorithm for infrared image target detection based on YOLOv5s. Laser & Infrared, 53(7), 1043-1051. https://doi.org/10.3969/j.issn.1001-5078.2023.07.010

[7] Pei, Y. H., Xu, L. M. and Zheng, B. C. (2022). Improved YOLOv5 for Dense Wildlife Object Detection. Biometric Recognition: 16th Chinese Conference, 569-578.

[8] Ji, D. J. and Cho, D. H. (2021). ChannelAttention: Utilizing Attention Layers for Accurate Massive MIMO Channel Feedback. IEEE Wireless Communications Letters, 10(5), 1079-1082. https://doi.org/10.1109/LWC.2021.3057934

[9] You, C. (2021). Research on Smoke and Flame Image Classification Algorithm Based on BAN. Zhejiang Sci-Tech University.

[10] Xu, H. H., Wang, X. Q., Wang, D., et al. (2023). Object detection in crowded scenes via joint prediction. Defense Technology, 21(3), 103-115. https://doi.org/10.3969/j.issn.2214-9147.2023.03.008

[11] Hu, J., Shen, L., Albanie, S., Sun, G. and Wu, E. (2019). Squeeze-and-Excitation Networks. Computer Vision and Pattern Recognition. https://doi.org/10.48550/arXiv.1709.01507


Prediction of heart disease based on logistic regression

Zixin Zhang

School of Data Science, Capital University of Economics and Business, Beijing,

100000, China

32021230064@cueb.edu.cn

Abstract. Heart disease is a major threat to human health, has a variety of contributing factors, and is not easily cured. This paper uses a dataset from a cardiovascular study of residents of Framingham, Massachusetts. First, three models (logistic regression, random forest, and decision tree) are compared in terms of accuracy, precision, recall, and F1 value. The optimal model, i.e., the logistic regression model, is then selected by plotting ROC curves and using AUC as a reference criterion for assessing predictive effectiveness. Next, the raw data are preprocessed, including the handling of missing values. Finally, a logistic regression model is developed to analyze the factors influencing heart disease. The purpose of this study is to use the results of the logistic model to help doctors and patients in heart disease treatment. The results show that the model has a good predictive effect.

Keywords: Logistic regression, heart disease, ROC curve.

1. Introduction

Heart disease afflicts many individuals and families. As technology develops and living standards improve, more and more people are paying attention to their health. In recent years, the incidence of heart disease has been rising in many regions, and the loss of life it causes is also rising year by year. The World Health Organization estimates that 12 million people die of heart disease each year globally. In some developed countries, such as the United States, more than half of the inhabitants die of cardiovascular diseases. To reduce the incidence of heart disease and the mortality it causes, the factors behind heart disease should be studied so that targeted interventions can be developed.

First, many researchers believe that reducing the incidence of acute postoperative lung injury in neonates with heart disease can significantly improve child survival [1]. Among adults, unhealthy lifestyle habits may also be a major factor in the predisposition to heart disease. For example, it has been suggested that in China the incidence of cardiovascular disease is higher among smokers than in the non-smoking population [2]. Metabolic conditions such as high fasting plasma glucose (HFPG) are significant risk factors for cardiovascular disease in humans [3, 4]. In China, with the gradual development of the economy, the lifestyle and nutritional structure of the population have changed dramatically, and habits such as excessive sugar intake and lack of exercise have led to an increasing prevalence of HFPG [5].

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0103

(https://creativecommons.org/licenses/by/4.0/).

The disease burden of ischemic heart disease (IHD) attributable to HFPG in Chinese residents shows clear gender and age-group characteristics. From a gender perspective, all the disease burden indicators of the female population are lower than those of the male population, and the trend of disease burden in the total population is driven more strongly by the male group, which may be related to female physiology [6]. The main reasons for lower life expectancy in men also include behavioral factors such as smoking and alcohol consumption, genetic and physiological factors, and higher rates of injury mortality [7, 8]. However, some findings run contrary to popular belief, for instance that light drinkers are less likely to develop aortic stenosis than never-drinkers [9, 10]. For example, if a person drinks 60 grams of alcohol per day, he may have a lower risk of developing the disease than someone who drinks 10 grams of alcohol per day [9, 10].

Wang et al. have shown that heart disease is often closely related to disability in the elderly [11]. In a study of older adults, the risk of the disease increased twofold for every 10 years of age [12]. In terms of education level, Ni concluded that the risk of developing activity of daily living (ADL) limitations in elderly cardiac patients with elementary school or higher education was 0.666 times that of elderly cardiac patients who had never attended school [13]. Married, cohabiting, and educated urban elderly cardiac patients had a lower risk of ADL limitation [13].

In summary, the prevalence of heart disease appears to be related to several factors, including age, gender, genetic factors, amount of smoking, amount of alcohol consumption, level of education, marital status, and the current state of social development. This study predicts which types of patients are most likely to develop heart disease in the future by analyzing the given characteristics and comparing differences between patients, with the ultimate goal of providing a basis for reducing the incidence of heart disease.

2. Methods

2.1. Data source and description

This study utilizes a dataset provided by the Kaggle platform, which is derived from an ongoing cardiovascular study of residents in the town of Framingham, Massachusetts. The dataset has a total of 4,239 samples, each with 16 variables. Fifteen of the variables are independent, each being a potential risk factor. The last variable, "TenYearCHD", is the dependent variable, indicating whether the patient is at risk of having coronary heart disease (CHD) in the next ten years.

2.2. Selection and description of indicators

Among all the variables, both quantitative variables such as “Age”, “CigsPerDay” and categorical

variables such as “Male”, “Education” are included. Due to the different types of variables, in this paper,

the variables involved in the data will be interpreted according to the type of data. Each quantitative

variable is shown in Table 1 and each categorical variable is shown in Table 2.

Table 1. Overview of quantitative variables.

Variable | Description | Range
Age | Age of the patient | 32-70
CigsPerDay | Average number of cigarettes smoked per person per day | 0-70
TotChol | Total cholesterol level | 107-696
SysBP | Systolic blood pressure | 83.5-295
DiaBP | Diastolic blood pressure | 48-142.5
BMI | Body Mass Index | 15.54-56.8
HeartRate | Heart rate | 44-143
Glucose | Glucose level | 40-394


Table 2. Overview of categorical variables.

Variable | Description | Range
Male | Male or Female | Male=1, Female=0
Education | Educational situation | Less than high school=1, High school grads=2, College grads=3, Post-college grads=4
CurrentSmoker | Currently smoking or not | Yes=1, No=0
BPMeds | On blood pressure medication or not | Yes=1, No=0
PrevalentStroke | Had a previous stroke or not | Yes=1, No=0
PrevalentHyp | Hypertensive or not | Yes=1, No=0
Diabetes | Diabetes or not | Yes=1, No=0
TenYearCHD | 10 year risk of coronary heart disease | Yes=1, No=0

2.3. Method introduction

There are many ways to predict whether or not a patient will suffer from heart disease. However, the predicted results sometimes differ greatly from the real situation. Because the prediction affects whether the patient receives timely treatment, and may even affect the patient's life, a correct prediction or judgment is crucial [14]. Logistic regression is a probabilistic regression model and a kind of generalized linear model. It is widely used in probabilistic prediction and classification, and it is simple, efficient, and highly interpretable [15, 16]. In this study, the samples in the above dataset were processed using logistic regression, and the results of model fitting were then analyzed to obtain the main factors influencing the diagnosis of heart disease.

Logistic regression is a type of regression analysis in statistics that predicts the outcome of a categorical dependent variable from predictors, or independent variables. In binary logistic regression, as used here, the dependent variable takes one of two values. For two predictors, the logistic regression equation is:

$$p(x) = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)} \tag{1}$$

$$\frac{p(x)}{1 - p(x)} = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2) \tag{2}$$

After inserting all the variables, the author gets the following equation:

$$\ln\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \tag{3}$$

Where $p(x)$ is the probability of the explained variable, which in the logistic regression model denotes whether or not heart disease is diagnosed. $x_i$ denotes the explanatory variables, which in the model are the factors influencing whether or not one has heart disease. $\beta_i$ are the parameters to be estimated.
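The relationship between the probability, the odds, and the log-odds can be checked numerically. The sketch below uses hypothetical coefficients and inputs, not the fitted values of this study.

```python
import math

def linear_predictor(beta, x):
    """beta[0] + beta[1]*x[0] + beta[2]*x[1] + ..."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def probability(beta, x):
    """Logistic function: exp(z) / (1 + exp(z)), where z is the linear predictor."""
    z = linear_predictor(beta, x)
    return math.exp(z) / (1.0 + math.exp(z))

def odds(p):
    """Ratio of the probability of the event to the probability of no event."""
    return p / (1.0 - p)

def logit(p):
    """Natural logarithm of the odds."""
    return math.log(odds(p))

beta = [-1.0, 0.5, 0.25]   # hypothetical intercept and two coefficients
x = [2.0, 4.0]             # hypothetical values of two explanatory variables
p = probability(beta, x)
```

Here the linear predictor equals 1.0, so `odds(p)` returns e and `logit(p)` returns 1.0: the log-odds is linear in the explanatory variables even though the probability is not.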

3. Results and discussion

3.1. Correlation analysis

Figure 1 shows a heat map of the relationships between the features, from which the correlation between each pair of features can be read directly. Correlation values range from -1 to 1: a value greater than 0 indicates that the two variables are positively correlated, a value less than 0 indicates that they are negatively correlated, and a value of 0 indicates that they are not correlated. The larger the absolute value, the stronger the correlation, and vice versa. As can be seen from Figure 1, the four variables diaBP, SysBP, PrevalentStroke, and age are positively correlated with TenYearCHD and have larger coefficients than the other variables, indicating that they are more closely related to whether or not the disease is present.
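Each cell of such a heat map is a correlation coefficient between two columns. A minimal sketch, assuming the Pearson coefficient is the measure plotted (the text does not name it) and using hypothetical sample values:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys))
    return cov / (spread_x * spread_y)

# Hypothetical values for two features of five patients.
age = [39, 46, 48, 61, 63]
sys_bp = [106.0, 121.0, 127.5, 150.0, 180.0]
r = pearson(age, sys_bp)
```

The result lies between -1 and 1, and its sign gives the direction of the relationship, matching the reading of the heat map described above.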

Figure 1. Related heat map.

3.2. Comparison of different models

In this paper, the effectiveness of the logistic regression model is derived by comparing the logistic

regression model with two commonly used models named random forest and decision tree. The various

models were compared in terms of four indicators: accuracy, precision, recall and F1 value. The results

are shown in Table 3. The comparative ROC curves of the three models are plotted in Figure 2.

Table 3. Comparison of three models.

Model | Accuracy | Precision | Recall | F1 value
Logistic regression | 0.835 | 0.538 | 0.057 | 0.104
Random forest | 0.831 | 0.438 | 0.057 | 0.101
Decision tree | 0.736 | 0.229 | 0.246 | 0.237

According to the results of the above three models, no model outperforms the others in all indicators. However, on a comprehensive consideration, the accuracy (0.835) and precision (0.538) of the logistic regression model rank first, and its recall and F1 values rank second. According to the ROC curve, the area under the curve (AUC) of this regression is 0.65, which is not the highest but differs from that of the random forest model by only 0.02. This result indicates that the logistic regression model has a good predictive effect on the heart disease data used in the present study, and it is also of significance for the subsequent prediction of heart disease data used for similar purposes.

It is important to choose the model with better results, and after a comprehensive evaluation, this paper decides to use the logistic regression model for the subsequent research.
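The four indicators follow from the confusion matrix of each classifier, and the F1 column of Table 3 can be re-derived from the precision and recall columns. The sketch below uses the standard definitions; the confusion-matrix counts are hypothetical, since the paper does not report them.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_from(precision, recall)
    return accuracy, precision, recall, f1

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) as listed in Table 3.
table3 = {
    "Logistic regression": (0.538, 0.057, 0.104),
    "Random forest": (0.438, 0.057, 0.101),
    "Decision tree": (0.229, 0.246, 0.237),
}
recomputed = {name: f1_from(p, r) for name, (p, r, _) in table3.items()}

# Hypothetical counts, for illustration only.
example = classification_metrics(tp=7, fp=6, fn=115, tn=720)
```

The recomputed values agree with the reported F1 column to within rounding of the inputs. The low recall values also show that high accuracy on an imbalanced dataset can coexist with few detected positive cases.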


Figure 2. ROC curve of three models.

3.3. Logistic regression results

Before performing the logistic regression, the study requires some data preprocessing steps. Firstly, the

missing values are processed, which is done by removing the null values, with the aim of ensuring that

the data is clean and usable. The study then divides the processed dataset into two parts: a training set

and a test set, where the training set is used to train the logistic regression model, and the test set is used

to evaluate the performance of the model (Table 4).
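The two preprocessing steps can be sketched as follows. The 20% test fraction, the random seed, and the sample rows are assumptions made for illustration; the paper does not state the split ratio it used.

```python
import random

def drop_missing(rows):
    """Keep only the rows in which no field is missing (None)."""
    return [row for row in rows if all(v is not None for v in row.values())]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records with two of the dataset's columns plus the label.
rows = [
    {"age": 39, "glucose": 77.0, "TenYearCHD": 0},
    {"age": 46, "glucose": 76.0, "TenYearCHD": 0},
    {"age": 48, "glucose": None, "TenYearCHD": 0},   # dropped: missing value
    {"age": 61, "glucose": 103.0, "TenYearCHD": 1},
    {"age": 46, "glucose": 85.0, "TenYearCHD": 0},
    {"age": 43, "glucose": 99.0, "TenYearCHD": 0},
]
clean = drop_missing(rows)
train, test = train_test_split(clean)
```

Fixing the seed makes the split reproducible, so the same test set can be reused when models are compared.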

Table 4. Logistic regression results.

Variable | Coefficient | Standard error | p-value | OR
Male | 0.4067 | 0.127 | 0.001 | 1.502
Age | 0.0301 | 0.007 | 0.000 | 1.031
Education | -0.1665 | 0.057 | 0.004 | 0.847
CurrentSmoker | -0.1314 | 0.184 | 0.475 | 0.877
CigsPerDay | 0.0226 | 0.007 | 0.002 | 1.023
BPMeds | 0.4812 | 0.283 | 0.090 | 1.618
PrevalentStroke | 1.5126 | 0.613 | 0.014 | 4.538
PrevalentHyp | 0.8944 | 0.151 | 0.000 | 2.446
Diabetes | 0.8820 | 0.344 | 0.010 | 2.416
TotChol | -0.0003 | 0.001 | 0.835 | 1.000
SysBP | 0.0138 | 0.005 | 0.003 | 1.014
DiaBP | -0.0242 | 0.008 | 0.001 | 0.976
BMI | -0.0531 | 0.015 | 0.000 | 0.948
HeartRate | -0.0292 | 0.005 | 0.000 | 0.971
Glucose | 0.0011 | 0.003 | 0.673 | 1.001


In this study, 15 factors that may affect the determination of heart disease were used as independent variables in a binomial logistic regression. The regression results are organized in Table 4, which gives the estimated value of each parameter and its standard error, together with the p-value and the odds ratio (OR). A variable is considered significant when p is less than 0.05. The OR equals the exponential of the coefficient, which is consistent with the values in Table 4 (for example, exp(0.4067) ≈ 1.502 for Male). It is the factor by which the odds of having heart disease, i.e., the probability of having it divided by the probability of not having it, are multiplied when that independent variable increases by one unit and the others are held fixed.
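The OR and p-value columns can be re-derived from the coefficient and standard-error columns. The sketch below assumes the p-values come from a two-sided Wald z-test, which is the usual default in logistic regression output but is not stated in the paper.

```python
import math

def odds_ratio(coef):
    """Odds ratio for a one-unit increase in the variable."""
    return math.exp(coef)

def wald_p_value(coef, std_err):
    """Two-sided p-value of z = coef / std_err under a standard normal
    (assumed test; the paper does not name the test it used)."""
    z = coef / std_err
    return math.erfc(abs(z) / math.sqrt(2.0))

# (coefficient, standard error) for three rows of Table 4.
rows = {
    "Male": (0.4067, 0.127),
    "CurrentSmoker": (-0.1314, 0.184),
    "PrevalentStroke": (1.5126, 0.613),
}
results = {name: (odds_ratio(b), wald_p_value(b, se))
           for name, (b, se) in rows.items()}
```

Under this assumption the recomputed values match the table to within rounding: Male is significant with an OR near 1.502, while CurrentSmoker has a p-value near 0.475 and is not significant.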

3.4. Discussion

From the regression results in Table 4, it can be seen that male, age, education, cigsPerDay, prevalentStroke, prevalentHyp, diabetes, sysBP, diaBP, BMI and heartRate have a statistically significant (p<0.05) effect on having heart disease, that is, these variables are closely associated with heart disease. In contrast, currentSmoker, BPMeds, totChol, and glucose did not have a significant effect on the presence of heart disease (p>0.05), so they were not the main influencing factors for the final confirmation of heart disease.

According to the signs of the regression coefficients, there is a negative correlation between the level of education and the ten-year risk of heart attack, indicating that a higher level of education may reduce the risk, which can also be seen in Figure 2. The coefficients for diaBP, BMI, and heartRate are also negative, indicating that, with the other variables held fixed, higher values of these variables are associated with a lower predicted risk of heart disease. The results also show that gender has a significant effect on the final diagnosis of heart disease, i.e., men may have a higher 10-year risk of heart attack than women. This may be related to the different lifestyles of men and women; for example, far more men than women smoke or drink alcohol. In addition, the remaining significant factors have a positive effect on the ten-year risk of heart attack. The slopes of age, cigsPerDay, and sysBP are relatively flat, while the slopes of prevalentStroke, prevalentHyp and diabetes are larger, indicating that these variables affect the final diagnosis of heart disease to varying degrees.

4. Conclusion

Heart disease is an important threat to human health, has various contributing factors, and is not easy to cure. To further analyze the causative factors of heart disease, this paper compares multiple models and finally uses logistic regression to model 15 variables that affect heart disease. The model aims to predict the probability of developing coronary heart disease over a ten-year period based on demographic, lifestyle and health-related factors. The results show that male, age, education, cigsPerDay, prevalentStroke, prevalentHyp, diabetes, sysBP, diaBP, BMI and heartRate are important factors in the diagnosis of heart disease. Finally, based on the ROC curve and AUC, it can be seen that the logistic regression model performs well for the prediction of heart disease. It is hoped that the conclusions drawn from this study will be helpful in the field of cardiology, provide a reference for both doctors and patients, and gain valuable time to save patients' lives.

References

[1] Jiang L, Ding S, Zhang L P, et al. 2017 Changes in plasma neutrophil gelatinase-associated lipid

transport protein (NGAL) in relation to acute postoperative lung injury in infants and children

with congenital heart disease. Advances in Modern Biomedicine, 17(8), 1570-1573.

[2] Kondo T, Nakano Y, Adachi S, et al. 2019 Effects of tobacco smoking on cardiovascular disease. Circ J, 83(10), 1980-1985.

[3] Wu S, Xu W, Guan C, et al. 2023 Global burden of cardiovascular disease attributable to metabolic risk factors, 1990-2019: an analysis of observational data from a 2019 Global Burden of Disease study. BMJ Open, 13(5).

[4] Chia C W, Egan J M, Ferrucci L 2018 Age-related changes in glucose metabolism, hyperglycemia,

and cardiovascular risk. Circ Res, 123(7), 886-904.


[5] Jin Y, So H, Cerin E, et al. 2023 The temporal trend of disease burden attributable to metabolic risk factors in China, 1990-2019: an analysis of the Global Burden of Disease study. Front Nutr.

[6] Huang Y, Li Y L, Yan W T, Wang G, Wang B W, Xie P 2024 Trend Analysis and Future Trend

Forecast of Ischemic Heart Disease Burden Attributable to Fasting Hyperglycemia in China,

1990-2019. Chronic Disease Prevention and Control in China, 32(3), 176-182.

[7] Janssen F, Bardoutsos A, El Gewily S, et al. 2021 Future life expectancy in Europe taking into account the impact of smoking, obesity and alcohol. ELife.

[8] Li FW, Wen SJ, Tang QX, et al. 2020 Impact of injury-related deaths on life expectancy in China.

Cadernos de saude publica, 36(11).

[9] Larsson S C, Wolk A, Beck M 2017 Alcohol consumption, cigarette smoking and incidence of aortic valve stenosis. J Intern Med, 282(4).

[10] Markus M R, Lieb W, Stritzke J, et al. 2015 Light to moderate alcohol consumption is associated with lower risk of aortic valve sclerosis: the study of health in pomerania (SHIP). Arterioscler Thromb Vasc Biol, 35(5).

[11] Wang R F, Luo Y, Chen Z S, et al. 2021 Relationship between cardiometabolic co-morbidities

and disability in Chinese middle-aged and elderly people. Journal of Jilin University (Medical

Edition). 47(3), 761-769.

[12] Ji H T, Zhao Y X, Yu X Q, Zhang C C, Liu Z D and Chai Q 2023 Effect of smoking and low-

density lipoprotein cholesterol interaction on valvular heart disease. Preventive Medicine

Forum, 29(1), 46-49.

[13] Ni Z H 2023 Construction of a predictive model for the risk of limited ability to perform activities

of daily living in elderly cardiac patients. Geriatrics research, 4(6), 33-38.

[14] Zhang X H 2023 Factor Analysis of Heart Disease Diagnosis Based on Logistic Regression and

Decision Tree. Modern information technology, 7(7), 117-123.

[15] Zhang Y Y, Ge R G and Sun G 2020 A study of patients' perceptions of excessive medical

examinations and influencing factors based on binary logistics regression. China Health Care

Management, 37(12), 893-895+899.

[16] Yan J J, Wu H, and Han B D 2020 Multifactorial Logistics Regression Analysis of Risk Factors

for Residual Cavity Formation after Tuberculous Septic Thorax Surgery. Medical Innovation

in China, 17(18), 128-131.


Analysis of the market value of Premier League attacker

Wenji Liu

Faculty of Science, University of British Columbia, Vancouver, V6Z 1T4, Canada

wliu36@student.ubc.ca

Abstract. The main purpose of this study is to use multiple linear regression to examine the factors affecting the market price of Premier League strikers. As soccer grows ever more popular, star transfers are a major attraction of every transfer period, yet many clubs still sign overpaid or underpaid players. The overall objective of this study is to find the determinants of players' prices, so as to provide a reference for clubs to use their funds more efficiently in the transfer period. In this study, a dataset of player data for the 17-18 Premier League season was first downloaded from Kaggle. The dataset was then used for empirical analysis to identify the variables that are significantly correlated with the market price of players, and multiple linear regression analysis was performed after processing these data. The calculations show that match performance and goals scored had a significant positive impact on market value, while age and match possession had a non-significant negative impact on market value. This suggests that team managers need to optimize these aspects in order to promote a virtuous cycle of club development and team performance.

Keywords: Football, English premier league, market value.

1. Introduction

In recent years, with the growing interest in soccer, soccer has become not just a sport but a multi-billion

dollar industry that attracts fans, sponsors and investors from all over the world [1]. In this industry, the

English Premier League (EPL) is one of the most popular and competitive leagues with significant

transfer fees and salaries for top players [2]. In particular, the value of strikers in the EPL has been a

topic of great interest, with a variety of factors influencing their market value [3].

The literature suggests that a number of independent variables can affect player value, and various models have been used to assess the performance rights of soccer players. Some of the more important variables are age, performance points (weighted goals and assists), playing time, starts, and red and yellow cards [4]. In addition, because soccer can cause a high number of injuries and illnesses, physical factors, especially the presence or absence of disease, are also possible considerations for player value [5].

In addition, the issue of financial inputs in the field of sports plays an important role in the economic

sphere. There are two reasons for this situation, namely external and internal reasons. External reasons

refer to different companies realizing their respective soccer investment goals. For example, non-profit

relationships with sports clubs are utilized to build strong international sports brands, such as

Manchester City and Real Madrid [6]. On the other hand, there are also internal reasons that are closely

related to sporting activities, such as loyalty to the club and an emotional connection to the sport.

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0108

(https://creativecommons.org/licenses/by/4.0/).

However, both reasons are directly related to the sporting performance of the sponsoring discipline and

the wider context of any sporting event, hence the focus of this study is on soccer performance as a

source of strong emotional responses from sporting event participants (e.g., sport administrators,

sponsors, and spectators) [7].

The aim of this paper is to analyze the factors affecting the value of strikers in the Premier League and to develop a linear regression model that values footballers playing in the striker position, taking into account econometric modeling assumptions. By examining a range of variables such as performance indicators, age and nationality, the study seeks to provide a comprehensive understanding of the drivers of transfer fees and salaries for these players [8]. Understanding these factors is crucial not only for clubs and agents involved in player transfers, but also for fans and analysts wishing to assess the market value of players. It enables clubs and stakeholders to make more informed decisions in the transfer market, increasing the value of their investment and ultimately the spectacle of the game [9, 10].

2. Methodology

2.1. Data source

The data used in this study was taken from the Kaggle website and has a cut-off date of the end of

October 2018. The dataset contains all available information on the variables of in-game performance,

market value and nationality for all forward players in the Premier League. A linear econometric

mathematical model was used to price the hypothetical market value of soccer players. In the

econometric modeling of soccer players’ performance rights, this work attempts to eliminate all formal

estimation problems such as normality of residuals, linear relationships and heteroskedasticity. The

result is a new linear regression model that prices the market value of the most valuable forward players

using selected variables and appropriate estimation methods.

2.2. Indicator selection

The analysis carefully selects specific indicators to deepen the understanding of the factors that influence

player value. These metrics include factors such as age, nationality, on-field performance, utilization

rate, and club. The analysis ensures that these metrics will be an effective tool for analyzing the complex

dynamics of forwards’ market value (Table 1).

Table 1. Descriptive analysis

Indicator | Mean ± standard deviation | Variance | Median | Standard error
Age | 25.857±3.681 | 13.548 | | 0.297
Page_views | 1122.760±1190.539 | 1417382 | 671.5 | 95.936
Fpl_value | 6.458±1.715 | 2.941 | | 0.138
Fpl_sel | 0.037±0.067 | 0.005 | 0.011 | 0.005
Fpl_points | 66.675±63.980 | 4093.502 | | 5.156

By utilizing these datasets, this paper seeks to delve into the complexities of player value. While acknowledging the comprehensiveness of these datasets, the author must also recognize their limitations, particularly in terms of on-field performance. These considerations are critical to maintaining the integrity and validity of the analysis. Furthermore, the careful selection of indicators is the cornerstone for revealing the multifaceted influences on the value of Premier League strikers. Through this integrated approach, the author aims to contribute to the existing knowledge base in this area by elucidating the complex interplay between the various factors that influence player value.
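The columns of the descriptive table are related by simple formulas: the variance is the square of the standard deviation, and the standard error is the standard deviation divided by the square root of the sample size. A minimal sketch, assuming the sample standard deviation is used and taking the sample size of 154 reported in the correlation analysis; the `sample` list is hypothetical.

```python
import math
import statistics

def describe(values):
    """Return mean, sample standard deviation, variance, median, standard error."""
    sd = statistics.stdev(values)
    return {
        "mean": statistics.mean(values),
        "sd": sd,
        "variance": sd ** 2,
        "median": statistics.median(values),
        "standard_error": sd / math.sqrt(len(values)),
    }

def standard_error(sd, n):
    """Standard error of the mean from a reported standard deviation."""
    return sd / math.sqrt(n)

sample = [1, 2, 3, 4, 5]          # hypothetical values
summary = describe(sample)

n = 154                           # sample size stated in the paper
se_age = standard_error(3.681, n)
se_page_views = standard_error(1190.539, n)
```

With n = 154, the recomputed standard errors for Age and Page_views agree with the reported 0.297 and 95.936 to within rounding.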

2.3. Method introduction

The study began with data screening to select potentially relevant variables, and the data were analyzed using multiple linear regression. The study selected age, the number of times the player's wiki page was accessed, points scored in the match, possession and overall value of the match as independent variables. Descriptive and frequency analyses were conducted on these variables to highlight their characteristics and to facilitate the eventual multiple linear regression analysis of player value.

3. Results and discussion

3.1. Correlation analysis

Multiple linear regression analyses were conducted using age, number of visits to the player wiki interface, match score, possession and total match value as independent variables and market value as the dependent variable. Figure 1 shows the Pearson correlation chart for the five independent variables and the dependent variable (market value).

Figure 1. Pearson visualization chart of variables

Figure 1 shows that all the independent variables except age have a high positive correlation with the dependent variable (MARKET VALUE). The correlation coefficient between age and market value is only -0.024, indicating that there is no significant linear correlation between these two variables. Using the correlation plot as a basis, the analysis continued with a linear regression on the five variables. All 154 samples were included in the analysis, with no missing data (Table 2).
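The Pearson coefficient used in this correlation analysis can be sketched as follows. This is a minimal illustration on made-up numbers, not the study's data. The function name `pearson_r` and the sample lists are assumptions introduced only for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, chosen only to show the three typical cases.
perfect_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
perfect_negative = pearson_r([1, 2, 3], [3, 2, 1])
uncorrelated = pearson_r([1, 2, 3, 4, 5], [2, 1, 3, 1, 2])
print(perfect_positive, perfect_negative, uncorrelated)
```

A value close to 0, such as the -0.024 reported for age, indicates the absence of a linear relationship, while values near 1 or -1 indicate a strong one.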

3.2. Model results

Table 2 lists AGE, PAGE_VIEWS, FPL_VALUE, FPL_SEL, and FPL_POINTS as the independent variables and MARKET VALUE as the dependent variable of the multiple linear regression. The fitted model is:

$$\text{market value} = -28.824 - 0.232 \cdot \text{age} + \cdots + 0.024 \cdot \text{fpl\_points} \qquad (1)$$

The above equation shows that as age and fpl_sel increase, the market value of a player decreases. When page_views, fpl_value and fpl_points increase, the player's market value also increases. In addition, fpl_value and fpl_sel have the largest coefficients in absolute value, so a one-unit change in either of them shifts the predicted market value the most.
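To make the fitted model concrete, the sketch below evaluates it for one player. It assumes that the terms elided in Equation (1) are the non-standardized coefficients listed in Table 2, and that 0.001 is the coefficient (not the standard error) for page views. The function name and the example inputs, which are simply the sample means from Table 1, are illustrative.

```python
INTERCEPT = -28.824  # constant term from Equation (1)

# Assumed to be the non-standardized coefficients of Table 2.
COEFFICIENTS = {
    "age": -0.232,
    "page_views": 0.001,
    "fpl_value": 7.341,
    "fpl_sel": -9.102,
    "fpl_points": 0.024,
}


def predict_market_value(player):
    """Evaluate the linear model for a dict of indicator values."""
    return INTERCEPT + sum(
        coef * player[name] for name, coef in COEFFICIENTS.items()
    )


# Sample means from Table 1, used here as a hypothetical "average" forward.
average_player = {
    "age": 25.857,
    "page_views": 1122.760,
    "fpl_value": 6.458,
    "fpl_sel": 0.037,
    "fpl_points": 66.675,
}
print(round(predict_market_value(average_player), 3))
```

Because the model is linear, raising fpl_value by one unit while holding the other indicators fixed raises the prediction by exactly 7.341.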


Table 2. Summary of the results of the multiple linear regression analysis

              Non-standardized coefficient   Standardized coefficient                        Collinearity diagnostics
Variable      B          Standard error      Beta                       t          p         VIF       Tolerance
age           -0.232     0.145               -0.056                     -1.603     0.111     1.077     0.929
Page_views    0.001      -                   0.047                      0.902      0.369     2.396     0.417
fpl_value     7.341      0.584               0.827                      12.562     0.00**    3.809     0.263
fpl_sel       -9.102     10.402              -0.04                      -0.875     0.383     1.869     0.535
fpl_points    0.024      0.012               0.101                      1.981      0.049*    2.29      0.437
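The columns of Table 2 are linked by two standard identities: the t statistic is the coefficient divided by its standard error, and the variance inflation factor (VIF) is the reciprocal of the tolerance. The sketch below checks both on the printed values. It assumes that the fourth numeric column of the table holds the t statistics; the dictionary layout and function names are illustrative only.

```python
# Rows copied from Table 2. Page_views has no usable standard error in the
# extracted table, so it is skipped in the t-statistic check.
TABLE_2 = {
    "age":        {"b": -0.232, "se": 0.145,  "t": -1.603, "vif": 1.077, "tol": 0.929},
    "page_views": {"b": 0.001,  "se": None,   "t": 0.902,  "vif": 2.396, "tol": 0.417},
    "fpl_value":  {"b": 7.341,  "se": 0.584,  "t": 12.562, "vif": 3.809, "tol": 0.263},
    "fpl_sel":    {"b": -9.102, "se": 10.402, "t": -0.875, "vif": 1.869, "tol": 0.535},
    "fpl_points": {"b": 0.024,  "se": 0.012,  "t": 1.981,  "vif": 2.29,  "tol": 0.437},
}


def t_statistic(b, se):
    """t = coefficient / standard error."""
    return b / se


def vif_from_tolerance(tolerance):
    """VIF = 1 / tolerance."""
    return 1.0 / tolerance


for name, row in TABLE_2.items():
    vif = vif_from_tolerance(row["tol"])
    line = f"{name:<11} VIF={vif:.3f} (table {row['vif']})"
    if row["se"] is not None:
        line += f"  t={t_statistic(row['b'], row['se']):.3f} (table {row['t']})"
    print(line)
```

All recomputed VIF values stay below 5, a commonly used rule-of-thumb threshold, which suggests that multicollinearity among the five indicators is moderate at most.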

3.3. Discussion

The combined analysis shows that age and possession (fpl_sel) have negative coefficients, so they are associated with a lower market value, although neither effect is statistically significant in Table 2. By contrast, fpl_value and fpl_points significantly increase a player's market value. The number of hits on a player's wiki page does not significantly affect market value. In fact, a player's off-season game performance also tends to be negatively correlated with age during the game season. Managers often judge whether a player deserves a higher salary based on the player's in-game performance, such as fpl_value and fpl_sel. Overall, the linear regression model developed in this experiment can help clubs determine an appropriate salary more clearly and intuitively.

In addition to the quantitative variables analyzed above, experts also believe that the market value of

a striker is affected by a variety of other factors, including nationality, performance of the club in which

he is playing, whether he is a foreign player, and whether he is in a BIG6 club.

4. Conclusion

In this study, a multiple linear regression model was used, with player market value as the dependent

variable, and age, overall on-field performance, on-field goals, on-field possession, and daily hits on

players’ wiki pages as the independent variables. Meanwhile, this paper also considers some control

variables, such as player position and league level, to ensure the accuracy of the research results. This

paper delves into the relationship between many influencing factors of the price of Premier League

striker players. By analyzing a large amount of player data, this paper produces a series of statistical

results and calculates the degree of influence of all independent variables on the market value of players.

When other potential variables are taken into account, age, overall match performance, number of goals

scored in a match and match possession are found to have a significant impact on a player’s market

price. Based on the regression model of the study, some suggestions can be made for scouts and

managers to operate in future transfer periods. When choosing transfer targets as well as determining

prices, businessmen should carefully consider these four factors to improve the effective utilization of

funds and thus improve the team’s performance.

Meanwhile, analyzing the factors that influence football market value is crucial for understanding the economics of the sport. Future research in this area should consider several key elements: injury history and fitness, contractual factors, club performance and financial health, and economic indicators. A player's injury history and current fitness levels significantly impact their market value. Longitudinal studies tracking injury patterns and recovery times can help predict future performance and market fluctuations.

Besides, the length and terms of a player’s contract, including buyout clauses and salary, play a

significant role in market value. Analysis of contract trends across different leagues can provide a

comparative understanding of how these factors influence valuations. Furthermore, the financial

stability of a player’s club and its performance in domestic and international competitions can affect

market values. Clubs with higher revenues and successful track records are likely to influence higher

market valuations for their players. Apart from those indicators, broader economic factors, such as

inflation rates, currency exchange rates, and global economic health, can also impact football market


values. Studies examining the correlation between these economic indicators and football market trends

would provide valuable insights. By integrating these factors, future research can develop more

comprehensive models to predict football market values, aiding clubs, agents, and investors in making

informed decisions. Advanced statistical techniques and machine learning algorithms could be

employed to handle the complexity and interdependence of these factors, providing a robust framework

for market analysis.

References

[1] Kun Z 2002 Relation between supply and demand in the occupational football market of China.

Journal of Physical Education.

[2] Tobar F and Ramshaw G 2022 Welcome to the EPL: analysing the development of football

tourism in the English Premier League. Soccer and Society, 23(4), 432-450.

[3] Kennedy P and Kennedy D 2017 A political economy of the English Premier League. In

Routledge eBooks, 49-69.

[4] Adiwiyana H I and Harymawan I 2021 Factors that determine the market value of professional

football players in Indonesia. Jurnal Dinamika Akuntansi, 13(1), 51-61.

[5] Hägglund M, Waldén M and Ekstrand J 2012 Risk factors for lower extremity muscle injury in professional soccer. The American Journal of Sports Medicine, 41(2), 327-335.

[6] Majewski S M 2015 Is this a business that feeds on emotions or is it an ALTRUSM behavior?

Polish football financing case. Acta Universitatis Lodziensis. Folia Oeconomica.

[7] He M, Cachucho R and Knobbe A J 2015 Football Player’s Performance and Market Value.

LIACS, 87-95.

[8] Majewski S 2016 Identification of factors determining market value of the most valuable football

players. Journal of Management and Business Administration Central Europe, 24(3), 91-104.

[9] Metelski A 2021 Factors affecting the value of football players in the transfer market. Journal of

Physical Education and Sport, 21, 1150-1155.

[10] Adiwiyana H I and Harymawan I 2021 Factors that determine the market value of professional

football players in Indonesia. Jurnal Dinamika Akuntansi, 13(1), 51-61.


Harmonic analysis approach to the proof of Heisenberg

inequality

Yuchen Wang

School of Northeast YuCai, Shenyang, 110000, China

yyluyinbo@tzc.edu.cn

Abstract. The Heisenberg uncertainty principle is a fundamental principle in quantum mechanics, proposed by the German physicist Werner Heisenberg in 1927. This principle states that for a pair of physical quantities that share phase space, such as position and momentum, it is impossible to accurately measure their values at the same time. There are several variants of it in harmonic analysis, and the article introduces some of them in $C_c^\infty(\mathbb{R}^1)$ space and $L^2(\mathbb{R}^1)$ space. In the process of proving the Heisenberg inequality, the article proves the Plancherel identity and the Schwarz inequality by using the Fourier transform and the inverse Fourier transform. Finally, the author solves the equation satisfied by the wave function $f(x)$. The famous physicist Heisenberg proposed one of the more novel ideas in quantum mechanics, namely that the existence of unobservable orbits cannot be assumed, which had a great influence on quantum mechanics. The article introduces the concept of the Heisenberg inequality and presents its proof.

Keywords: Fourier transform, inverse Fourier transform, Cauchy-Schwarz inequality,

Plancherel identity.

1. Introduction

Harmonic analysis is a branch of mathematics that deals with the expansion of functions into Fourier

series or Fourier integrals and related problems. It originates from the superposition problems of

decomposing a periodic oscillation into simple harmonic oscillation in physics, and has now developed

into a discipline with wide application [1]. Harmonic analysis not only involves mathematics, but also

plays an important role in many disciplines such as information processing and quantum mechanics.

Harmonic analysis is also used in tidal analysis, through which the tidal changes in a certain period can be calculated and the tidal properties of the area can be analyzed. Thus, harmonic analysis of tides is an important method used in marine engineering for the analysis and prediction of tidal changes [2].

Quantum mechanics is a branch of physics that studies the laws of motion of microscopic particles in the material world. It mainly studies the basic theories of the structure and properties of atoms, molecules and condensed matter, as well as atomic nuclei and elementary particles. Together with relativity, it forms the theoretical basis of modern physics. Quantum mechanics is not only one of the basic theories of modern physics, but is also widely used in chemistry and many other modern technologies [3].

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0123

(https://creativecommons.org/licenses/by/4.0/).

The Heisenberg inequality, also called the Heisenberg uncertainty principle, is the bridge between the two theories. The article focuses on how to prove the Heisenberg inequality using harmonic analysis and on applying the result to quantum mechanics.

2. Methods and Theory

2.1. Background knowledge and method

A periodic function $f(t)$ with period $T$ can be approximated by a sum over an orthogonal basis, which turns it into a superposition of components at different frequencies [4]:

$$f(t) = \sum_{n=-\infty}^{\infty} c_n e^{i n \omega_0 t}, \qquad \omega_0 = \frac{2\pi}{T} \qquad (1)$$

To calculate $c_n$, the author uses the orthogonality of the basis to simplify the result. Multiplying both sides of the equation by $e^{-i m \omega_0 t}$ gives

$$f(t)\, e^{-i m \omega_0 t} = \sum_{n=-\infty}^{\infty} c_n e^{i (n - m) \omega_0 t} \qquad (2)$$

Taking the definite integral from 0 to $T$ on both sides of the above equation, every term with $n \neq m$ vanishes, so that

$$c_m = \frac{1}{T} \int_{0}^{T} f(t)\, e^{-i m \omega_0 t}\, dt \qquad (3)$$

Having defined the Fourier series, the article now introduces the Fourier transform. The author begins by expanding the function $f(t)$ as a Fourier series on the interval $[-T/2, T/2]$:

$$f(t) = \sum_{n=-\infty}^{\infty} \left( \frac{1}{T} \int_{-T/2}^{T/2} f(s)\, e^{-i n \omega_0 s}\, ds \right) e^{i n \omega_0 t} \qquad (4)$$

Taking the limit as $T$ tends to infinity, the author obtains [5]

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(s)\, e^{-i \omega s}\, ds \right) e^{i \omega t}\, d\omega \qquad (5)$$

This gives the definition of the Fourier transform. The new function depends only on the given frequency $\omega$ and describes the density of that frequency component in $f(t)$:

$$\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i \omega t}\, dt \qquad (6)$$

2.2. Structure and content of the article

In the first part, Sec. 3.1, the author chooses a dense subspace with good properties, $C_c^\infty(\mathbb{R}^1)$, in which the inequality can be proved using only properties of complex numbers, integration by parts, the Cauchy-Schwarz inequality and the properties of rapidly decreasing functions. All of these properties are established later in the article. In the second part, Sec. 3.2, the author generalizes the results proved in $C_c^\infty(\mathbb{R}^1)$ to a more general function space, $L^2(\mathbb{R}^1)$. The author uses a sequence of functions $f_n$ to approximate the function $f$, chosen so that both $f_n - f$ and the derivatives $f_n' - f'$ converge to 0 under the $L^2$ norm as $n$ approaches infinity. In this case, the derivative of $f_n$ approximates the derivative of $f$. The squared norms of the differences tend to 0, so the identity used in the proof holds in the limit as $n$ goes to infinity, and the remaining terms can be estimated with inequalities. Finally, the author finds the specific function $f(x)$ by solving an ODE, which gives the result and the condition under which it applies.

3. Results and Application

3.1. Proof in $C_c^\infty(\mathbb{R}^1)$ space

The properties of complex numbers are used repeatedly in this paper. In particular, the real part of a complex number $z$ can be written as [6]

$$\operatorname{Re}(z) = \frac{1}{2}\left(z + \bar{z}\right) \qquad (7)$$

If $z = a + bi$, then $|z| \geq |\operatorname{Re}(z)| = |a|$. One can prove the inequality in $C_c^\infty(\mathbb{R}^1)$ space by using calculations with the Fourier transform and the inverse Fourier transform, written here in terms of the frequency variable $\xi = \omega / 2\pi$:

$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x \xi}\, dx \qquad (8)$$

$$f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi)\, e^{2\pi i x \xi}\, d\xi \qquad (9)$$

By the properties of rapidly decreasing functions, together with the normalization of $f$, one has

$$\lim_{|x| \to \infty} x\, |f(x)|^2 = 0 \qquad (10)$$

$$\int_{-\infty}^{\infty} |f(x)|^2\, dx = 1 \qquad (11)$$

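The author now proves the Heisenberg inequality for $f \in C_c^\infty(\mathbb{R}^1)$, which is a rapidly decreasing function. The starting point is the product

$$4\pi^2 \int_{-\infty}^{\infty} x^2 |f(x)|^2\, dx \int_{-\infty}^{\infty} \xi^2 |\hat{f}(\xi)|^2\, d\xi \qquad (12)$$

$$= \int_{-\infty}^{\infty} |x f(x)|^2\, dx \int_{-\infty}^{\infty} |2\pi i \xi \hat{f}(\xi)|^2\, d\xi \qquad (13)$$

Then, the author will use Plancherel's identity. The proof is as follows.

$$\int_{-\infty}^{\infty} |f(x)|^2\, dx = \int_{-\infty}^{\infty} f(x)\, \overline{f(x)}\, dx \qquad (14)$$

$$= \int_{-\infty}^{\infty} f(x)\, \overline{\int_{-\infty}^{\infty} \hat{f}(\xi)\, e^{2\pi i x \xi}\, d\xi}\, dx \qquad (15)$$

$$= \int_{-\infty}^{\infty} \overline{\hat{f}(\xi)} \left( \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x \xi}\, dx \right) d\xi \qquad (16)$$

$$= \int_{-\infty}^{\infty} \overline{\hat{f}(\xi)}\, \hat{f}(\xi)\, d\xi \qquad (17)$$

$$= \int_{-\infty}^{\infty} |\hat{f}(\xi)|^2\, d\xi \qquad (18)$$

In steps (15) and (16), the inverse Fourier transform and the Fourier transform are applied in turn. Since the Fourier transform of $f'(x)$ is $2\pi i \xi \hat{f}(\xi)$, Plancherel's identity turns the formula into [7]

$$\int_{-\infty}^{\infty} |x f(x)|^2\, dx \int_{-\infty}^{\infty} \left| \frac{d f(x)}{dx} \right|^2 dx \qquad (19)$$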

Then, the author will use the Cauchy-Schwarz inequality: for any two elements $x$ and $y$ of an inner product space, the square of the absolute value of their inner product is not greater than the product of the squares of their norms.

Here, the author proves Schwarz's inequality. For real-valued functions $f(x)$ and $g(x)$ and any real number $\lambda$,

$$\int \left( f(x) + \lambda g(x) \right)^2 dx \geq 0 \qquad (20)$$

Expanding the square gives a quadratic in $\lambda$ that is never negative, so its discriminant cannot be positive:

$$\lambda^2 \int g(x)^2\, dx + 2\lambda \int f(x) g(x)\, dx + \int f(x)^2\, dx \geq 0, \qquad \Delta = 4 \left( \int f(x) g(x)\, dx \right)^2 - 4 \int f(x)^2\, dx \int g(x)^2\, dx \leq 0 \qquad (21)$$

Because $\Delta \leq 0$, the formula can be written as [8]

$$\left( \int f(x) g(x)\, dx \right)^2 \leq \int f(x)^2\, dx \int g(x)^2\, dx \qquad (22)$$

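The result is as required. Applying this inequality to $|x f(x)|$ and $|f'(x)|$, formula (19) is bounded below as follows:

$$\int_{-\infty}^{\infty} |x f(x)|^2\, dx \int_{-\infty}^{\infty} |f'(x)|^2\, dx \geq \left| \int_{-\infty}^{\infty} x f(x)\, \overline{f'(x)}\, dx \right|^2 \qquad (23)$$

By a basic property of complex numbers (the modulus of a complex number is not less than the modulus of its real part), $|z| \geq |\operatorname{Re}(z)|$. The author can therefore bound the result by

$$\geq \left[ \int_{-\infty}^{\infty} x \operatorname{Re}\left( f(x)\, \overline{f'(x)} \right) dx \right]^2 \qquad (24)$$

The properties of complex numbers show that if $A = a + bi$, then $\operatorname{Re}(A) = \frac{1}{2}(A + \bar{A})$. Using this property, the author can rewrite the result as [9]

$$= \frac{1}{4} \left[ \int_{-\infty}^{\infty} x \left( f\, \overline{f'} + \overline{f}\, f' \right) dx \right]^2 \qquad (25)$$

By the product rule for derivatives, $(f \bar{f})' = f' \bar{f} + f \bar{f}'$, the term $f\, \overline{f'} + \overline{f}\, f'$ can be written as $(|f|^2)'$. Thus, the result can be transformed into

$$= \frac{1}{4} \left[ \int_{-\infty}^{\infty} x \left( |f(x)|^2 \right)' dx \right]^2 \qquad (26)$$

In the next step, the author uses integration by parts:

$$= \frac{1}{4} \left[ x |f(x)|^2 \Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} |f(x)|^2\, dx \right]^2 \qquad (27)$$

Because $f$ is a rapidly decreasing function, the boundary term equals 0 by (10). This gives

$$= \frac{1}{4} \left( \int_{-\infty}^{\infty} |f(x)|^2\, dx \right)^2 = \frac{1}{4} \|f\|_2^4 \qquad (28)$$

In this space the norm of $f$ is 1 by (11). Combining (12) to (28) and dividing by $4\pi^2$, the article obtains the result

$$\int_{-\infty}^{\infty} x^2 |f(x)|^2\, dx \int_{-\infty}^{\infty} \xi^2 |\hat{f}(\xi)|^2\, d\xi \geq \frac{1}{16\pi^2} \qquad (29)$$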

3.2. Proof in $L^2(\mathbb{R}^1)$ space

First, the author proved the inequality in a dense subspace with good properties. The author now generalizes the result to a more general function space by proving the inequality in $L^2(\mathbb{R}^1)$ space.

Because $\|f\|_2 \neq 0$, one may assume that $\|\xi \hat{f}\|_2 < \infty$. If the opposite holds, there is nothing to prove, because the left-hand side of the inequality is infinite. In this case, one cannot measure accurately both the location and the momentum of a particle. Under this assumption, the identity obtained from Plancherel's identity, $\|f'\|_2 = 2\pi \|\xi \hat{f}\|_2$, also holds in the $L^2(\mathbb{R}^1)$ space. Thus, the proof for $C_c^\infty(\mathbb{R}^1)$ carries over to this case once the following integral is evaluated [10]:

$$\int_{-\infty}^{\infty} x \left( f\, \overline{f'} + \overline{f}\, f' \right) dx \qquad (30)$$


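Now, the author chooses a sequence of functions $f_n \in C_c^\infty(\mathbb{R}^1)$ in order to approximate $f$. Because the subspace is dense, such a sequence exists. It can be chosen so that the integral below converges to 0, which means that both the functions and their derivatives converge under the $L^2$ norm. The sequence meets the requirement

$$\lim_{n \to \infty} \int_{-\infty}^{\infty} \left( 1 + 4\pi^2 \xi^2 \right) |\hat{f}_n(\xi) - \hat{f}(\xi)|^2\, d\xi = 0 \qquad (31)$$

$$\lim_{n \to \infty} \left( \|f_n - f\|_2^2 + \|f_n' - f'\|_2^2 \right) = \lim_{n \to \infty} \int_{-\infty}^{\infty} \left( 1 + 4\pi^2 \xi^2 \right) |\hat{f}_n(\xi) - \hat{f}(\xi)|^2\, d\xi = 0 \qquad (32)$$

These follow from the two identities

$$\|f\|_2^2 = \int_{-\infty}^{\infty} |\hat{f}(\xi)|^2\, d\xi, \qquad \|f'\|_2^2 = \int_{-\infty}^{\infty} 4\pi^2 \xi^2 |\hat{f}(\xi)|^2\, d\xi \qquad (33)$$

Because $|f_n(x) - f(x)| \leq \int_{-\infty}^{\infty} |\hat{f}_n(\xi) - \hat{f}(\xi)|\, d\xi$, the author can use the Cauchy-Schwarz inequality to bound the difference:

$$|f_n(x) - f(x)| \leq \left[ \int_{-\infty}^{\infty} \left( 1 + 4\pi^2 \xi^2 \right)^{-1} d\xi \right]^{1/2} \left[ \int_{-\infty}^{\infty} \left( 1 + 4\pi^2 \xi^2 \right) |\hat{f}_n(\xi) - \hat{f}(\xi)|^2\, d\xi \right]^{1/2} \qquad (34)$$

so $f_n$ converges to $f$ uniformly. Using Schwarz's inequality again, for any fixed $a$ and $b$ with $a < b$ one has

$$\int_{-\infty}^{\infty} x \left( f\, \overline{f'} + \overline{f}\, f' \right) dx = \lim_{a \to -\infty,\, b \to \infty}\ \lim_{n \to \infty} \int_a^b x \left( f_n\, \overline{f_n'} + \overline{f_n}\, f_n' \right) dx$$

$$= \lim_{a \to -\infty,\, b \to \infty}\ \lim_{n \to \infty} \left[ x |f_n(x)|^2 \Big|_a^b - \int_a^b |f_n(x)|^2\, dx \right]$$

$$= \lim_{a \to -\infty,\, b \to \infty} \left[ b |f(b)|^2 - a |f(a)|^2 - \int_a^b |f(x)|^2\, dx \right] \qquad (35)$$

The rest of the argument follows the same logic as the proof in Section 3.1: the boundary terms vanish in the limit, so the result can be rewritten as $-\|f\|_2^2$. This completes the proof in this space.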

For the next step, the article focuses on the specific wave function $f$ for which equality holds. Inspecting the proof, the author finds that such an $f$ always satisfies a differential equation of the form $f'(x) = \beta x f(x)$ for a constant $\beta$. Solving this ODE by separation of variables, the author obtains the solution $f(x) = A e^{-B x^2}$, where $B = -\beta / 2 > 0$ and $|A|^2 = \sqrt{2B / \pi}$, so that $\|f\|_2 = 1$.

3.3. Results and Application

The exact expression of Heisenberg's inequality first appeared in the study of quantum mechanics, when researchers were trying to determine the position and momentum of a particle at the same time. Suppose that an electron moves along a line according to physical laws that can be described by a state function $f$.

The position of the electron is described by the probability that the particle is located in $(a, b)$. The function $|f(x)|^2$ is the probability density, and the expectation of the position is

$$\bar{x} = \int_{-\infty}^{\infty} x\, |f(x)|^2\, dx \qquad (36)$$

The author can then discuss how far the position deviates from this expected value, which is of great significance in quantum mechanics. The error of the position is

$$\int_{-\infty}^{\infty} (x - \bar{x})^2 |f(x)|^2\, dx \qquad (37)$$

and the error of the momentum, where $\xi_0$ is the expected value of the momentum, is

$$\int_{-\infty}^{\infty} (\xi - \xi_0)^2 |\hat{f}(\xi)|^2\, d\xi \qquad (38)$$


4. Conclusion

According to the Heisenberg inequality proved above, the product of the errors of the position and the momentum is not less than $\frac{1}{16\pi^2}$. There are plenty of applications of the Heisenberg inequality. For example, magnetic resonance imaging (MRI) is a medical imaging technique used to observe the internal structure of biological tissues. In MRI, the resonance signal of the atomic nuclei can be obtained by applying a strong magnetic field and electromagnetic pulses to the object under test. According to Heisenberg's uncertainty principle, doctors cannot accurately measure the position and momentum of an atomic nucleus at the same time, so in MRI, people can only get position or momentum information to a certain extent, which is why MRI images are often blurry.

The Heisenberg Uncertainty Principle is a fundamental principle in modern physics that has profoundly

changed people’s understanding of the natural world. Although this principle prevents people from

accurately determining the position and momentum of an elementary particle at the same time, it has

not stopped people from using this principle to perform some important calculations and analysis. In the

future, with the development of science and technology, people may find more opportunities to use the

uncertainty principle to better understand and apply the fundamental laws of the natural world.

References

[1] McCarthy D W, Probst R C, Low F J. (1985). Infrared detection of a close cool companion to

Van Biesbroeck. Astrophysical Journal, 290, 29-42.

[2] Lévy-Leblond, Jean-Marc. (2021). Correlation of Quantum Properties and the Generalized

Heisenberg Inequality. American Journal of Physics, 54(2), 135–36.

[3] Lahti, Pekka J., Maciej J. Maczynski. (1987). Heisenberg Inequality and the Complex Field in

Quantum Mechanics. Journal of Mathematical Physics, 28(8), 1764–69.

[4] Grünbaum, F. Alberto. (2023). The Heisenberg Inequality for the Discrete Fourier Transform.

Applied and Computational Harmonic Analysis, 15(2), 163–67.

[5] Stan, Aurel. (2005). On Heisenberg Inequality. Communications in Contemporary Mathematics,

07(01), 75–88.

[6] De La Peña, Luis. (1980). Conceptually Interesting Generalized Heisenberg Inequality. American

Journal of Physics, 48(9), 775–76.

[7] Mueller, C., and Stan A. (2005). A Heisenberg Inequality for Stochastic Integrals. Journal of

Theoretical Probability, 18(2), 291–315.

[8] Wiener, Norbert. (1930). Generalized Harmonic Analysis. Acta Mathematica, 55, 117–258.

[9] Hewitt, Edwin, and Kenneth A. Ross. Abstract Harmonic Analysis. Springer Berlin Heidelberg,

1963.

[10] Schwab, Keith C., and Michael L. Roukes. (2015). Putting Mechanics into Quantum Mechanics.

Physics Today, 58(7), 36–42.


Research on the influencing factors of student performance

Chenrui Pei

School of Civil Engineering, Southwest Jiaotong University, Chengdu, 610000, China

pcr21cp2@outlook.com

Abstract. The aim of this report is to analyze the factors influencing student performance and to

develop a predictive model for Grade Point Average (GPA) based on five aspects: demographic

details, study habits, parental involvement, extracurricular activities, and academic achievement.

Utilizing a multiple linear regression model, this report identifies key factors that significantly

impact academic performance. The dataset includes a total of 14 student characteristics, such as

parental education level, weekly study time, extracurricular activities, absences and so on.

Through stepwise regression, non-significant factors were iteratively eliminated, leading to the

development of a predictive model to determine the primary influences on student performance.

The research findings underscore the significant role of weekly study time, absences, tutoring,

parental support, extracurricular activities, sports, and music in student performance. In contrast,

age, gender, ethnicity, parental education, and volunteering have negligible impact on GPA.

These insights provide actionable guidance for educators and policymakers to implement

targeted measures to enhance student performance.

Keywords: Student performance, GPA, multiple linear regression.

1. Introduction

Student performance is a fundamental criterion for evaluating excellence, as it reflects learning ability,

intelligence, self-management skills, and more. High scores can boost students' self-confidence, help

them gain admission to better universities, secure scholarships, and attract employers' attention,

significantly impacting their future success [1]. The importance of academic performance often leads to

anxiety. The Survey Report on Chinese Parents' Educational Anxiety Index, released in September 2018,

analyzed 3205 questionnaires and found that the comprehensive anxiety index of parents' education

reached 67 points out of 100, indicating a relatively high level of anxiety [2]. Therefore, it is crucial to

discover the factors related to student achievement. The factors influencing student achievement and

educational outcomes are multifaceted, complex, and interrelated. Students' attributes and abilities,

social relationships, and family and societal structures all impact academic performance to varying

degrees [3]. Moreover, studies have shown that students' academic performance is related to their

cognitive style (CS), self-regulated learning (SRL), and working memory (WM) [4]. This paper aims to

identify suitable methods to determine the factors influencing student achievement and predict their

impact.

In 2021, Alani and Hawas conducted a comprehensive study on the factors affecting student

performance at Sohar University. They surveyed various faculties, gathering data from 562 students

through questionnaires. This data was critically analyzed using regression analysis. The study revealed

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0131

(https://creativecommons.org/licenses/by/4.0/).

that environmental factors significantly influence student performance, with students expressing a

preference for a quiet and comfortable university environment. Furthermore, the linear regression model

indicated that teachers with strong teaching skills and diverse teaching techniques positively impact

student performance [5].

In 2023, a teacher discovered a significant positive correlation between academic performance and

volitional quality through a comparative test of these factors in ordinary and excellent classes at a high

school. The volitional characteristics of students vary significantly across different grades and academic

levels, while gender differences in volitional character strength are not pronounced [6].

In 2024, Kocsis and Molnár conducted a study using meta-analyses and systematic reviews of up to

900 studies based on 600,000 university students to identify factors affecting student performance. The

results showed that output variables GPA and obtained credits (ECTS) are mediated by two parts:

student factors and throughput factors. Student factors include intrinsic motivation, self-regulated

learning strategies, self-efficacy, and prior education, while throughput factors include work, finances,

and academic engagement. However, there were contradictory results regarding age and family

conditions. GPA, ECTS, and gender are the most relevant factors affecting student performance [7].

In summary, this report will use regression models to identify factors impacting student learning and

build models to predict the relationship between student achievement and different factors.

2. Methodology

2.1. Data source

The dataset for this paper is from the Kaggle website (Student Performance Dataset). This dataset contains comprehensive information on 2392 high school students, and all records were used in this paper.

2.2. Variable selection

The dataset is sufficiently large and has no missing data. Because GPA and grade class are both indicators of student academic performance, this paper deletes grade class. In addition, as the Student ID is only a serial number and has no impact on GPA, it is also deleted.

Table 1. List of dependent and independent variables.

Variable             Symbol     Meaning
Age                  $x_1$      The age ranges from 15 to 18 years
Gender               $x_2$      Male (0), Female (1)
Ethnicity            $x_3$      Caucasian (0), African American (1), Asian (2), Other (3)
Parental Education   $x_4$      None (0), High School (1), College (2), Bachelor's (3), Higher (4)
Study Time Weekly    $x_5$      Weekly study time in hours
Absences             $x_6$      Number of absences during the school year
Tutoring             $x_7$      No (0), Yes (1)
Parental Support     $x_8$      None (0), Low (1), Moderate (2), High (3), Very High (4)
Extracurricular      $x_9$      No (0), Yes (1)
Sports               $x_{10}$   No (0), Yes (1)
Music                $x_{11}$   No (0), Yes (1)
Volunteering         $x_{12}$   No (0), Yes (1)
GPA                  $Y$        Grade Point Average on a scale from 2.0 to 4.0


The final selected data consists of 12 variables (age, gender, ethnicity, parental education, study time

weekly, absences, tutoring, parental support, extracurricular, sports, music, volunteering) and a

dependent variable (GPA). The specific student characteristics of this dataset are shown in Table 1.
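The variable selection described above can be sketched as a small loading step. This is a minimal sketch, assuming the CSV columns are named `StudentID`, `GradeClass` and `GPA`; those names, the function `load_features`, and the two sample rows are assumptions made for illustration and are not taken from the paper.

```python
import csv
import io

DROPPED = ("StudentID", "GradeClass")  # assumed names of the deleted columns
TARGET = "GPA"                         # assumed name of the dependent variable


def load_features(csv_file):
    """Read a CSV file object and return (X, y, names): the predictor rows,
    the GPA values, and the predictor column names."""
    reader = csv.DictReader(csv_file)
    names = [c for c in reader.fieldnames if c not in DROPPED and c != TARGET]
    X, y = [], []
    for row in reader:
        X.append([float(row[c]) for c in names])
        y.append(float(row[TARGET]))
    return X, y, names


# Two made-up rows with a subset of the columns, only to show the shapes.
SAMPLE = """StudentID,Age,Gender,StudyTimeWeekly,Absences,GPA,GradeClass
1,17,1,19.8,7,2.93,2
2,18,0,15.4,0,3.04,1
"""

X, y, names = load_features(io.StringIO(SAMPLE))
print(names, X, y)
```

With the full dataset, the same function would be called on an opened file instead of the in-memory sample.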

2.3. Method introduction

This article employs a multiple linear regression model to fit student grades. In statistics, linear

regression determines a line that best represents the overall trend of a data set [8]. Multiple linear

regression is a statistical technique used to analyze the impact of several independent variables on a

dependent variable. This section will mainly aim to compare the predictive ability and fitting accuracy

of the model before and after removing some variables. The initial model includes 12 potential

explanatory variables.

Using the stepwise regression method, variables that show low statistical significance are removed iteratively. Stepwise regression is a technique that uses an automated process to select predictor variables. This method evaluates variables at each step based on a series of t or F tests, ultimately determining the final group of variables for the regression [9]. After each elimination, the model is rebuilt and the remaining variables are re-evaluated. Once the stepwise regression is complete, the final multiple linear regression model is selected.
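The elimination loop described above can be sketched in a few lines. This is a minimal sketch of backward elimination, one variant of stepwise regression, and not necessarily the exact procedure used in this paper. It assumes a fixed cut-off of |t| < 1.96 (roughly the 5% two-sided level for large samples), fits ordinary least squares through the normal equations, and runs on a tiny synthetic design in which `x2` is irrelevant by construction. All function names are hypothetical.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_t_stats(X, y):
    """Fit y = b0 + X b by least squares and return the t statistic of
    every column of X (the intercept is fitted but not reported)."""
    rows = [[1.0] + list(r) for r in X]
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    sigma2 = rss / (n - k)  # residual variance estimate
    return [beta[i] / math.sqrt(sigma2 * inv[i][i]) for i in range(1, k)]


def backward_eliminate(X, y, names, t_threshold=1.96):
    """Refit after each removal; drop the weakest predictor until every
    remaining one has |t| >= t_threshold."""
    keep = list(range(len(names)))
    while keep:
        t = ols_t_stats([[row[j] for j in keep] for row in X], y)
        weakest = min(range(len(keep)), key=lambda i: abs(t[i]))
        if abs(t[weakest]) >= t_threshold:
            break
        del keep[weakest]
    return [names[j] for j in keep]


# Synthetic design: y depends on x1 only; the +/-0.1 term is balanced noise.
grid = [(a, b, s) for a in (0.0, 1.0) for b in (0.0, 1.0) for s in (1.0, -1.0)]
X_demo = [[a, b] for a, b, s in grid]
y_demo = [2.0 + 3.0 * a + 0.1 * s for a, b, s in grid]
selected = backward_eliminate(X_demo, y_demo, ["x1", "x2"])
print(selected)
```

In practice a statistics package would report exact p-values for each t test; the fixed threshold here only keeps the sketch free of external dependencies.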

3. Results and discussion

3.1. Descriptive statistics

Line graphs visualize the impact of gender and extracurricular activities on grade class (Figure 1). The division between genders is roughly equal, indicating minimal influence on student academic performance. However, students participating in extracurricular activities demonstrate significantly better grades compared to those who do not participate.

Figure 1. Line charts of Gender and Extracurricular on Grade Class

Bar charts illustrate the quantitative relationships between ethnicity, parental support, and student performance. As shown in Figure 2, the proportion of students in each grade class remains consistent across ethnic groups, suggesting that ethnicity does not affect student performance. Conversely, there is a clear trend showing that higher levels of parental support correlate with better student grades.

Figure 2. Bar charts of Ethnicity and Parental Support on Grade Class

Scatter plots are used to examine how the number of absences and age relate to GPA. Scatter plots visually display the relationship between two variables and the approximate distribution of the data. They provide key information such as data distribution, sample size, and the identification of outliers [10]. The distribution of the points in Figure 3 is studied to determine the correlation and to summarize the pattern of the points. For age, Figure 3 shows a similar distribution of GPA at each age, suggesting that age has no impact on student performance. For absences, there is a clear negative correlation between student performance and the number of absences: the more absences, the lower the student's grades.

Figure 3. Scatter plots of Absences and Age on GPA

3.2. Correlation analysis

The dataset contains a total of 12 factors that may affect student performance. The Pearson correlation coefficients between these factors and GPA are shown in Figure 4.

Figure 4. Pearson correlation coefficients between the 12 factors and GPA

The data reveal that absences have the strongest negative correlation with GPA, indicating that the more a student is absent, the poorer their academic performance. In contrast, GPA shows clear positive correlations with weekly study time, tutoring, and parental support, suggesting that additional study and external assistance can boost academic performance. Extracurricular activities, sports, and music are also positively correlated with GPA, though these correlations are not statistically significant. Factors such as age, gender, ethnicity, parental education, and volunteering exhibit very weak correlations with GPA. Overall, the factors influencing GPA are multifaceted, with attendance, study habits, and parental support playing crucial roles.
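For readers who want to reproduce this kind of analysis, the Pearson correlation coefficient can be computed as sketched below. The function name `pearson_r` and the six sample values are hypothetical and are not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: more absences go with a lower GPA.
absences = [0, 2, 5, 9, 14, 20]
gpa = [3.6, 3.4, 3.0, 2.7, 2.1, 1.5]
r = pearson_r(absences, gpa)
print(round(r, 3))
```

A value of r close to -1 corresponds to the strong negative relationship described for absences, while values near 0 correspond to the very weak correlations reported for factors such as age.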

3.3. Model

3.3.1. Initial Model

After conducting a correlation analysis of the factors influencing student performance, a multiple

regression analysis was performed to establish a comprehensive model that includes all variables. The

general mathematical model for multiple linear regression is as follows:

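$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{12} x_{12} + e$ (1)

In the above formula, $\beta_0$ is the constant term, $\beta_1, \dots, \beta_{12}$ are the regression coefficients of the independent variables $x_1, \dots, x_{12}$, and $e$ is the error term accounting for the variability not explained by the independent variables.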

Table 2 presents the regression coefficients of the multiple linear regression model. From the table, it can be observed that X1, X2, X3, X4, and X12 have no significant impact on the dependent variable, as their p-values are greater than 0.05. The remaining 7 independent variables therefore have a significant impact on the dependent variable Y. Additionally, all variables have VIF values close to 1, indicating that there is no issue of multicollinearity. Based on the regression coefficients, the multiple linear regression equation is as follows:

$E(Y) = 1.3391 - 0.006x_1 + 0.011x_2 + \cdots - 0.005x_{12}$ (2)

The fitted multiple linear regression model yields an R-squared value of 0.954 and an adjusted R-

squared value of 0.954, indicating a high degree of fit. Figure 5 shows the line plot comparing the test

data with the predicted data. The trends of the two lines are consistent and exhibit a high degree of

similarity. This suggests that the model effectively captures the overall trend of the data and achieves

high predictive accuracy, even though there are some deviations in specific values.
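For reference, the adjusted R-squared corrects R-squared for the number of predictors. The block below writes out the standard formula and evaluates it under the assumption that all n = 2392 students and k = 12 predictors enter the fit; the sample size actually used for fitting is not stated in this section, so the numbers are illustrative only.

```latex
R^2_{\mathrm{adj}}
  = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}
  = 1 - (1 - 0.954)\,\frac{2391}{2379}
  \approx 0.9538
```

With a sample this large relative to the number of predictors, the correction factor is close to 1, which is why R-squared and adjusted R-squared agree to three decimal places.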

Table 2. Regression coefficient table for the initial model

Variable   Coef.     S.E.    Beta     t          P>|t|   VIF
Constant   1.3391    0.015   1.901    87.483     0.000   11.563
X1         -0.006    0.005   -0.006   -1.424     0.155   1.012
X2         0.011     0.009   0.005    1.164      0.245   1.006
X3         0.005     0.004   0.005    1.085      0.278   1.004
X4         0.000     0.005   0.000    0.027      0.978   1.006
X5         0.166     0.005   0.166    36.687     0.000   1.005
X6         -0.844    0.005   -0.844   -187.141   0.000   1.004
X7         0.258     0.010   0.119    26.279     0.000   1.005
X8         0.148     0.004   0.165    36.640     0.000   1.004
X9         0.190     0.009   0.092    20.394     0.000   1.005
X10        0.185     0.010   0.085    18.861     0.000   1.006
X11        0.153     0.011   0.061    13.467     0.000   1.005
X12        -0.005    0.012   -0.002   -0.425     0.671   1.004

Figure 5. Test data and predicted data

3.3.2. Stepwise regression

Using backward elimination, predictors are iteratively removed from the initial model if their p-values exceed the threshold of 0.05. Starting with the full model, X1, X2, X3, X4, and X12 are eliminated based on their initial p-values. After each removal, the model is refitted, and the process is repeated until all remaining predictors have p-values below the threshold.

Table 3 shows that all remaining predictor variables have p-values less than 0.05, indicating significant effects on the dependent variable Y. Among them, X6 has a negative effect, while the others have positive effects. The VIF values are relatively low, suggesting little multicollinearity among the predictors. Therefore, the improved linear regression equation is:

$E(Y) = 1.348 + 0.166x_5 - 0.844x_6 + \cdots + 0.152x_{11}$ (3)

The fitted multiple linear regression model yields an R-squared of 0.954 and an adjusted R-squared

of 0.954, indicating a high degree of fit. The F-statistic is a statistical measure used to assess the overall

significance of the model. In this model, the F-statistic is 5649, with a corresponding probability value

close to 0, indicating that the model is significant.

Table 3. Regression coefficient table for the improved model

Variable   Coef.     S.E.    Beta     t          P>|t|   VIF
Constant   1.348     0.011   1.901    118.566    0.000
X5         0.166     0.005   0.166    36.775     0.000   1.003
X6         -0.844    0.005   -0.844   -187.392   0.000   1.002
X7         0.258     0.010   0.118    26.293     0.000   1.350
X8         0.148     0.004   0.165    36.624     0.000   2.015
X9         0.190     0.009   0.092    20.502     0.000   1.443
X10        0.186     0.010   0.085    18.965     0.000   1.341
X11        0.152     0.011   0.061    13.455     0.000   1.209

3.3.3. Comparison results

Based on the results of two linear regression models, Table 4 lists the characteristics of the two models

used to compare their performance in fitting and predictive ability.

Table 4. Comparison between the two models

                 Initial model   Improved model
R-squared        0.954           0.954
Adj. R-squared   0.954           0.954
F-statistic      3295            5649
MSE              0.03866         0.03877
RMSE             0.19663         0.19691
AIC              -776.1          -781.4
BIC              -703.9          -737.0

Based on the comparison, the two models show very close values for R-squared, adjusted R-squared, MSE, and RMSE, indicating similar performance in fitting the data and in predictive accuracy. However, the F-statistic of the improved model (5649) is considerably higher than that of the initial model (3295), which reflects that nearly the same explanatory power is achieved with fewer predictors. The AIC and BIC values of the improved model are also smaller, indicating a slight advantage in balancing model fit and complexity. In conclusion, compared with the initial model, the improved model achieves essentially the same fit with fewer variables and is therefore the simpler and preferable model.
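The jump in the F-statistic can be understood from its relation to R-squared. The sketch below uses the standard identity F = (R²/k) / ((1 − R²)/(n − k − 1)); the function name `f_statistic` is hypothetical, and n = 2392 is an assumption (the full dataset size), since the number of observations used for fitting is not reported here. The absolute F values therefore need not match Table 4; only their ratio is of interest.

```python
def f_statistic(r2, n, k):
    """Overall F-statistic of a linear model with k predictors and n observations."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


n = 2392  # assumption: all students in the dataset are used for fitting
f_full = f_statistic(0.954, n, 12)    # initial model, 12 predictors
f_reduced = f_statistic(0.954, n, 7)  # improved model, 7 predictors
ratio = f_reduced / f_full

print(round(ratio, 3))            # ratio implied by equal R-squared
print(round(5649 / 3295, 3))      # ratio of the F-statistics in Table 4
```

With R-squared held fixed, the ratio is close to 12/7, which is roughly the ratio of the two reported F-statistics. The higher F value is thus largely a consequence of dividing the same explained variance among fewer predictors.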

4. Conclusion

This study aims to explore the factors influencing student achievement and predict their impact through

comprehensive data collation and multiple linear regression analysis. The dataset includes information

from 2,392 students and initially comprises 13 variables. After thorough preprocessing, the dataset

underwent analysis using various visualization techniques such as line charts, bar charts, and scatter

plots. These visualizations provided a preliminary analysis of the significance and correlation (positive

or negative) of various factors on student achievement. Notably, of the three factors negatively correlated with GPA (absences, gender, and parental education), only absences showed a significant correlation.

This study first employs a multiple linear regression model that integrates all influencing factors to further examine their relationship with student performance. Factors that were not statistically significant were excluded based on their p-values, and VIF values were checked to rule out multicollinearity. Ultimately, stepwise regression confirmed that seven factors significantly impact student performance: weekly study time, number of absences, tutoring, parental support, extracurricular activities, sports, and music. Among these factors, absences were negatively correlated with student performance, while the other factors were positively correlated.

In this study, the influence of different factors on students' achievement is determined, but only the

overall prediction is made, and the impact of each factor on students' achievement cannot be accurately

specified. In order to improve this, different factors can be reanalyzed and grouped, and linear regression

can be performed again to obtain the influence of single or a small number of combined factors on

student achievement.

References

[1] Plessis S 2023 5 Reasons Why Grades Are Important. Working paper.

[2] Li J 2021 A Study on the Formation Mechanism of Educational Anxiety among Parents of

Primary and Secondary School Students: A Case Study of Chongqing City, Chongqing

University of Business and Technology, 10, 16-21.

[3] Utah State Board of Education 2019 Factors influencing student learning. Hanover Research, 1-

[4] Wang T and Kao C 2022 Investigating factors affecting student academic achievement in

mathematics and science: cognitive style, self-regulated learning and working memory.

SpringerLink, 50(5), 789-806.

[5] Alani F S and Hawas A 2021 Factors Affecting Students Academic Performance: A Case Study

of Sohar University. PSYCHOLOGY AND EDUCATION, 58(5), 4624-4635.

[6] Yong Z 2023 A Study on the Correlation between Academic Performance and Willpower Quality

of High School Students, Journal of Ningxia University, Humanities & Social Sciences Edition

45(4), 142-149.

[7] Kocsis A and Molnar G 2024 Factors influencing academic performance and dropout rates in

higher education. Oxford Review of Education, 1-19.

[8] Stewart K 2024 Linear regression Britannica. Working paper.

[9] Miller A and Panneerselvam J 2021 A review of regression and classification techniques for

analysis of common and rare variants and gene-environmental factors. Science Direct, 466-

485.

[10] Sainani K L 2016 The Value of Scatter Plots. Statistically Speaking, 1213-1217.

Analysis of the Relationship between NBA Player Salary and

Their On-Court Performances

Zijian Yang

Department of Statistics, University College London, WC1E 6BT, the United

Kingdom

Christianyzj@outlook.com

Abstract. This research investigates the connection between NBA player salary and on-court performance. By collecting and analyzing NBA player salary data and related game statistics, some interesting trends and correlations are found. Scatter plots, an error bar chart, a bar chart, and a clustered line chart show the direct relationships between the factors of players' performance and their salaries. In addition, correlation analysis is used to examine the relationships between the independent and dependent variables and to judge whether they are positive or negative. A linear regression model shows the degree of influence of each variable. The results of the study show that highly paid players tend to perform well in games. Further statistical analysis shows that scoring attempts are not the only factor affecting salaries; factors such as assists and blocks per game also play an important role. These findings have implications for managers, players, and fans, and help to better understand and evaluate the true value of players.

Keywords: Player salary, correlation analysis, linear regression model.

1. Introduction

The National Basketball Association (NBA) stands as one of the most prominent professional basketball

leagues globally, featuring elite athletes known for their exceptional skills and commanding salaries.

The analysis of NBA player salaries is a topic of significant interest, influenced by various factors that

shape the financial landscape of the league. Understanding the determinants of NBA player salaries is

crucial for players, team management, fans, and researchers seeking insights into the intricate dynamics

of sports economics.

Players' performances generate income for the owners, who then pay the players according to these

earnings [1]. Wang used adaptive Lasso, SCAD, and Elastic Net to explore the main factors affecting NBA players' salary levels, and found that Ridge regression, Lasso, and Elastic Net had similar mean square errors compared with the other models, all close to 0.21 [2]. Many empirical studies have examined wage trends in baseball and other sports due to the

wealth of performance and compensation data available for athletes in professional team sports [3].

Regressions calculating distinct income and performance trajectories for each talent quintile were

conducted in order to demonstrate the degree of bias generated by typical ordinary least squares (OLS).

By using this method, the likelihood that pooled regressions of productivity or income on experience

would provide a "flatter" temporal profile than what is actually the case will be decreased [3]. The fact

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0147

(https://creativecommons.org/licenses/by/4.0/).

that this player is in the league is the cause of the significant TV money these clubs are making, along

with the other NBA players. As a result, players must get payment for both their on-court performance

and the portion of TV contract earnings that they are accountable for [4]. The test statistics and all of

the coefficients are valid and significant.

In terms of most performance metrics, such as points or rebounds, a player's stronger performance during a contract year will translate into more financial savings for the organization in the year of signing a new deal [5]. The top 25% and top 50% of NBA players had the biggest rises in their proportion of total salary paid. In the 1985–86 season, the top 25% received almost 56% of all wages paid; in the 2015–16 season, they received roughly 64% of all salaries paid, an increase of eight percentage points. The percentage of total compensation paid to athletes in the top 50% increased similarly [6]. One explanation is greater returns to skill: if returns to skill rise and can account for more variance in earnings, the most proficient athletes will probably make more money, while the less skilled athletes will probably make less [6]. In another study, usage rate (USG%) also yielded an interesting finding. The number of possessions a player "uses" in a game is known as their usage rate. Thus, athletes who are the focus of the offense, such as Kevin Durant and LeBron James, have high usage rates. Salary and USG% have a positive correlation, which makes sense given that a player who is using the ball more frequently is taking more shots for his team [7]. A straightforward regression analysis has also been applied to confirm the relationship between pay and altruistic behavior. No collinearity was found in this regression, since the VIF value is less than 10, and the overall model F value is significant. The regression coefficient β=0.457 shows that altruistic behavior is significantly positively impacted by wage [8].

It is believed that players may accept the idea that some players are better than others and that the better players should get paid more, regardless of the work and production of each individual player. However, the second aspect of pay disparity is what this paper refers to as "unjustified inequality," or inequality that is not supported by and dependent on performance judgments included in the model of compensation determination [9]. Empirical research by Staw and Hoang demonstrated that the National Basketball Association (NBA) uses the draft order in addition to a player's predicted on-court production when allocating playing time [10]. This statistical research aims to delve into the key factors influencing NBA player salaries. By analyzing a diverse set of variables, such as player performance metrics, position played, market size, endorsements, and team success, this study seeks to uncover patterns and relationships that shed light on salary structures. Through a rigorous statistical approach, this paper aims to provide a nuanced understanding of the intricacies surrounding NBA player compensation, offering valuable insights into the drivers of remuneration in professional basketball.

2. Methods

2.1. Data Source

The dataset used in this work is available on the Kaggle website (NBA Player Salaries, 2022-2023 Season). It contains several measures of players' on-court performance, with 467 samples and more than 50 variables. The dataset combines per-game and advanced player statistics from the NBA 2022-2023 season with player salary data to create a comprehensive resource for learning about the financial and performance aspects of professional basketball players. It was built by obtaining traditional per-game and advanced statistics from Basketball Reference and player salary data from Hoopshype.

2.2. Variable Selection

The paper's data set includes a total of 467 NBA players with different positions and different ages. Six on-court performance variables (Free Throws, 3-Point Attempts, 2-Point Attempts, Blocks Per Game, Assists, and Win Shares) are considered as determinants of players' salary. The basic overview of each quantitative variable is shown in Table 1.

Table 1 lists the minimum, maximum, mean, standard deviation, and median of salary and of the six components of NBA players' on-court performance. The total number of samples is 467, and 7 variables are chosen.

Table 1. Overview of quantitative variables.

Variables          Min      Max      Mean     Std. Dev.   Median
Salary (Million)   0.005    48.07    8.417    10.708      3.7
FT                                   1.436    1.569       0.9
3PA                         11.4     2.793    2.261       2.4
2PA                         17.8     4.325    3.571       3.2
BLK                         2.5      0.379    0.364       0.3
AST                         10.7     2.108    1.958       1.4
WS                 -1.6     12.6     2.329    2.533       1.5

2.3. Method Introduction

Regression analysis is used to investigate the effect of X (quantitative or categorical) on Y (quantitative), including the existence, direction, and strength of any relationship. First, the model's fit is examined using the R-square value. The VIF and tolerance values are also examined to check whether the model has collinearity issues: tolerance = 1/VIF, and a VIF value greater than 5 (equivalently, a tolerance less than 0.2) suggests a collinearity issue. If so, ridge regression or stepwise regression can be used to remedy it. Next, the significance of X is examined; if X is significant (p value < 0.05 or 0.01), it indicates that X influences Y. The direction of the relationship is then analyzed in detail.

3. Results and Discussion

3.1. Descriptive Analysis

Figures 1-6 show the relationships between salary and six factors of NBA players' on-court performance. The plots suggest that all six factors (Free Throws, 3-Point Attempts, 2-Point Attempts, Blocks Per Game, Assists, and Win Shares) have a notable relationship with NBA players' salaries.

Figure 1. Scatter plot of the relationship between FT and Salary

The scatter plot above shows that the relationship between FT and Salary is positive, linear, and strongly correlated. Players with more free throws per game tend to earn higher salaries.

Figure 2. Scatter plot of the relationship between 3PA and Salary.

Figure 2 shows that the relationship between 3PA and Salary is positive and moderate. This means that NBA players who take more 3-point attempts on the court tend to earn higher salaries.

Figure 3. Pie chart of the relationship between 2PA and Salary

As shown in Figure 3, an ROC curve is constructed for Salary to judge its diagnostic value for 2PA, with the "gold standard" set first. Taking 1.000 as the cut-off point, values of 1.000 are treated as positive and all others as negative. The proportion of positives is 3.00%, and the proportion of negatives is 97.00%.

Figure 4. ErrorBar of the relationship between BLK and Salary

Figure 4 indicates that BLK has a weak but still positive relationship with Salary: as the value of BLK increases, Salary tends to increase as well. The trend is approximate but still visible.

Figure 5. Bar Chart of the relationship between AST and Salary

Figure 5 demonstrates that the relationship between AST and Salary is strong and positive. Because the bars generally grow taller from left to right, it indicates that NBA players with more assists on the court tend to earn higher salaries.

Figure 6. Clustered line of the relationship between WS and Salary

Figure 6 shows that WS has a fairly strong and approximately positive relationship with Salary. Taken together, Figures 1-6 show that the six factors (Free Throws, 3-Point Attempts, 2-Point Attempts, Blocks Per Game, Assists, and Win Shares) all have a positive relationship with players' salary, which means that NBA players could earn a higher salary through better on-court performance.

3.2. Correlation Analysis

Table 2 shows the results of the correlation analysis between Salary and FT, 3PA, 2PA, BLK, AST, and WS, with the strength of each relationship expressed by the Pearson correlation coefficient.

Table 2. Pearson correlation between 6 factors and the salary

        Salary
FT      0.674**
3PA     0.492**
2PA     0.682**
BLK     0.301**
AST     0.594**
WS      0.625**

* p<0.05 ** p<0.01 significant level

Salary is positively correlated with each of the six factors, and every correlation is significant at the 0.01 level: FT (0.674), 3PA (0.492), 2PA (0.682), BLK (0.301), AST (0.594), and WS (0.625). The association is strongest for 2PA and FT and weakest for BLK. Table 3 presents the pairwise Pearson correlation coefficients among Salary, FT, 3PA, 2PA, BLK, AST, and WS.

Table 3. The correlation among 7 variables

          Salary    FT        3PA       2PA       BLK       AST       WS
Salary    1
FT        0.674**   1
3PA       0.492**   0.488**   1
2PA       0.682**   0.871**   0.455**   1
BLK       0.301**   0.294**   -0.059    0.383**   1
AST       0.594**   0.646**   0.584**   0.694**   0.084     1
WS        0.625**   0.719**   0.369**   0.693**   0.491**   0.540**   1

The correlation coefficients between the six factors of on-court performance and players' salary (0.674, 0.492, 0.682, 0.301, 0.594, and 0.625) all lie between 0 and 1 and are significant at the 0.01 level, indicating that each relationship is significant and positive.
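The significance markers in the correlation tables can be cross-checked with the usual test statistic for a Pearson coefficient, t = r·sqrt(n − 2)/sqrt(1 − r²). The sketch below is an illustration under stated assumptions rather than the procedure used by the authors: the function names are hypothetical, n = 467 is the sample size given in the paper, and the critical values 1.96 and 2.576 are large-sample normal approximations for the 0.05 and 0.01 levels.

```python
import math


def corr_t_stat(r, n):
    """t-statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def stars(r, n):
    """Return '**' (p<0.01), '*' (p<0.05) or '' using normal-approximation cutoffs."""
    t = abs(corr_t_stat(r, n))
    if t > 2.576:  # assumption: large-sample two-sided 1% critical value
        return "**"
    if t > 1.96:   # assumption: large-sample two-sided 5% critical value
        return "*"
    return ""


n = 467  # sample size stated in the paper
for label, r in [("Salary vs BLK", 0.301), ("3PA vs BLK", -0.059), ("BLK vs AST", 0.084)]:
    print(label, r, round(corr_t_stat(r, n), 2), stars(r, n))
```

Even the weakest salary correlation (BLK) is well beyond the 1% cutoff at this sample size, whereas the two unstarred coefficients in Table 3 fall below the 5% cutoff. This shows that significance depends on the test statistic and the sample size, not merely on the coefficient lying between 0 and 1.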

3.3. Regression Results

Table 4 presents the regression results. First, the model's fit is examined using the R-square value, and collinearity is checked using the VIF and tolerance values. Tolerance = 1/VIF; in general, a VIF value > 5, or equivalently a tolerance < 0.2, indicates the presence of a collinearity issue.

Table 4. Results of Linear Regression

            Unstandardized Coefficients    Standardized Coefficients                      Collinearity Diagnosis
Variable    B               Std. Error     Beta                        t       p         VIF     Tolerance
Constant    -2476947.598    670573.077                                 3.694   0.000**
FT          1125733.724     471445.409     0.165                       2.388   0.017*    4.964   0.201
3PA         828401.61       188199.978     0.175                       4.402   0.000**   1.643   0.609
2PA         608793.669      215148.025     0.203                       2.83    0.005**   5.359   0.187
BLK         2424712.235     1158363.963    0.083                       2.093   0.037*    1.617   0.618
AST         748267.705      267460.844     0.137                       2.798   0.005**   2.488   0.402
WS          787786.48       213926.572     0.186                       3.683   0.000**   2.666   0.375

R 2         0.558
Adj R 2     0.552
F           F (6,460)=96.810, p=0.000
D-W Value   1.092

Note: Dependent Variable=Salary
* p<0.05 ** p<0.01

For the purposes of the linear regression analysis, the dependent variable is salary, and the

independent variables are FT, 3PA, 2PA, BLK, AST, and WS from Table 4. The following is the model

formula:

$\mathit{Salary} = -2476947.598 + 1125733.724 \times FT + \cdots + 787786.480 \times WS$ (1)
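As an illustration, the fitted equation can be evaluated with the coefficients reported in Table 4. The function name `predict_salary` is hypothetical, and the row labels FT and WS (for both the coefficients and the means taken from Table 1) are assumptions, because those labels are missing from the extracted tables. A useful sanity check is that a least-squares model with an intercept, evaluated at the mean of every predictor, returns the mean of the dependent variable.

```python
# Coefficients as reported in Table 4 (salary in US dollars).
INTERCEPT = -2476947.598
COEF = {
    "FT": 1125733.724,   # assumption: first unlabeled row is FT
    "3PA": 828401.61,
    "2PA": 608793.669,
    "BLK": 2424712.235,
    "AST": 748267.705,
    "WS": 787786.48,     # assumption: last unlabeled row is WS
}


def predict_salary(stats):
    """Predicted salary in dollars for a dict of per-game statistics."""
    return INTERCEPT + sum(COEF[name] * stats[name] for name in COEF)


# Mean values of the predictors as listed in Table 1.
means = {"FT": 1.436, "3PA": 2.793, "2PA": 4.325, "BLK": 0.379, "AST": 2.108, "WS": 2.329}
pred_at_means = predict_salary(means)
print(round(pred_at_means / 1e6, 3))  # in millions of dollars
```

The prediction at the mean values comes out at about 8.417 million, which agrees with the mean salary listed in Table 1 and suggests that the coefficients and the descriptive statistics are mutually consistent.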

With an R-square value of 0.558, FT, 3PA, 2PA, BLK, AST, and WS together account for 55.8% of the variance in salary. The model passes the F-test (F=96.810, p=0.000<0.05), indicating that the model is significant overall and that at least one of the factors FT, 3PA, 2PA, BLK, AST, and WS affects salary. Furthermore, the multicollinearity test reveals a VIF value larger than 5 (5.359 for 2PA), which suggests a collinearity issue. Since this value is still below 10, the collinearity is moderate and may be resolved via stepwise or ridge regression. It is also advisable to identify independent variables that are strongly correlated with each other, exclude them, and then re-analyze.
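The collinearity rule stated above (tolerance = 1/VIF, with VIF > 5 or tolerance < 0.2 as a warning sign) can be applied directly to the VIF values reported in Table 4. The sketch below is illustrative: the function name `collinearity_report` is hypothetical, and the FT and WS labels are assumptions because those row labels are missing from the extracted table.

```python
# VIF values as reported in Table 4.
VIF = {"FT": 4.964, "3PA": 1.643, "2PA": 5.359, "BLK": 1.617, "AST": 2.488, "WS": 2.666}


def collinearity_report(vif, vif_limit=5.0):
    """Map each variable to (tolerance rounded to 3 places, exceeds VIF limit?)."""
    return {name: (round(1.0 / v, 3), v > vif_limit) for name, v in vif.items()}


report = collinearity_report(VIF)
flagged = [name for name, (_, exceeds) in report.items() if exceeds]
print(report)
print("Flagged:", flagged)
```

Only 2PA exceeds the VIF limit of 5, and the computed tolerances reproduce the tolerance column of Table 4. FT sits just under the limit, which is consistent with its high correlation with 2PA (0.871) in Table 3.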

4. Conclusion

This research indicates a degree of correlation between NBA players' salary levels and their on-court performance. Highly paid players tend to have better on-court performances. Players' salaries are significantly related to the six aspects of performance considered (FT, 3PA, 2PA, BLK, AST, and WS), and each of these factors has a positive relationship with salary. This suggests that players with better on-court performance could earn higher salaries. The figures (scatter plots, error bar chart, bar chart, and clustered line chart), the correlation analysis, and the linear regression model support this result.

Admittedly, because of the limited amount of data, this model might contain some errors related to the variables, and the sample did not cover all seasons and players, which might affect the accuracy of the results. However, the approach also has advantages: the graphical strategy visualizes the variables comprehensively and makes the results clearer.

Further research can explore more subareas, such as the relationship between player salary and

performance at different positions, and player performance in the playoffs and regular season, to learn

more about the connection between player performance and remuneration. Finally, this study provides

some insights into the complex relationship between NBA player salary and on-court performance,

which can help club managers and sports agents make more informed decisions on player salary setting

and transfer strategies.

References

[1] Tarman, A. (2005) The Effect of Monopsony Power in Major League Baseball on the Salaries of

Players with Less Than Six Years in the Majors. Honors Projects, 31.

[2] Yan, C.H. (2023) A comparative analysis of multi-model salary classification prediction for

American baseball players. Journal of Nanning Normal University (Natural Science Edition),

40, 79-87.

[3] Hakes, J.K. and Turner, C. (2011) Pay, productivity and aging in Major League Baseball. Journal

of Productivity Analysis, 35, 61-74.

[4] Stanek, T. (2016) Player Performance and Team Revenues: NBA Player Salary Analysis. CMC

Senior Theses. Claremont McKenna College.

[5] Li, N.Y. (2014) The determinants of the salary in NBA and the overpayment in the year of signing

a new contract. Dissertations & Theses - Gradworks. Clemson University.

[6] Jonah, F. (2017) Salary Inequality in the NBA: Changing Returns to Skill or Wider Skill

Distributions? CMC Senior Theses.

[7] Daniel, H. (2014) An Analysis of New Performance Metrics in the NBA and Their Effects on

Win Production and Salary. The faculty of the University of Mississippi in partial fulfillment

of the requirements of the Sally McDonnell Barksdale Honors College.

[8] Hsiung, T.L. (2014) The Relationships among Salary, Altruistic Behavior and Job Performance

in the National Basketball Association. Center for Promoting Ideas, USA.

[9] Simmons, R. and Berri, D.J. (2011) Mixing the princes and the paupers: Pay and performance in

the National Basketball Association. Labour Economics, 18, 381-388.

[10] Nuesch, S. (2009) A note on the endogeneity of the pay-performance relationship in professional

soccer. Economics Bulletin, 29, 1850-1855.


Schrödinger equation for various quantum systems based on

Heisenberg's uncertainty principle

Kexin An

Department of Mathematics, The Ohio State University. 281 W Lane Ave, Columbus,

OH 43210

an.416@osu.edu

Abstract. This article establishes the proof of the Schrödinger equation for numerous quantum

systems, utilizing Heisenberg's uncertainty principle. The Fourier transform connects functions

in the time and frequency domains, resulting in the mathematical inequality that is the foundation

of the uncertainty principle. In the Methods and Theory section, the article derives the uncertainty

principle through Fourier transforms by defining the mean and variance of angular frequency

and time, and subsequently expanding the integral. This establishes the fundamental connection

between time and frequency domains, illustrating the constraints imposed by quantum mechanics.

In the Results and Application section, the article applies the uncertainty principle to derive the

Schrödinger equation under different conditions: free particle, particle in a box, harmonic

oscillator, and hydrogen atom. For each case, the article assumes wave function solutions, uses

the uncertainty in position and momentum to estimate kinetic and potential energies, and shows

that the total energy matches the ground state energy derived from the Schrödinger equation. The

results highlight the critical role of Heisenberg's uncertainty principle in understanding key

aspects of quantum mechanics, providing a unified framework for these diverse systems.

Keywords: Fourier Transform, Heisenberg's Uncertainty Principle, Quantum Mechanics,

Schrödinger Equation.

1. Introduction

Quantum mechanics is the essential theory that describes particles' behavior at the atomic and subatomic

levels. It provides a framework for understanding the physical properties of nature at small scales, where

classical mechanics fails to apply. The development of quantum mechanics has led to numerous

technological advancements, including semiconductors, lasers, and quantum computing [1]. By

describing the wave-particle duality of matter and energy, quantum mechanics reveals the probabilistic

nature of physical phenomena, which is essential for the accurate prediction and manipulation of

microscopic systems [1]. Heisenberg's uncertainty principle is the core of quantum mechanics,

underscoring the fundamental limits of measurement and observation in the quantum realm.

Mathematically, the uncertainty principle can be derived using Fourier transforms, which relate functions in the time and frequency domains. The principle can be expressed as $\Delta x\,\Delta p \ge \hbar/2$. The link between Heisenberg's uncertainty principle and Fourier transforms reflects the relationship between the time and frequency domains, which is essential for comprehending the behavior of quantum systems [2]. The uncertainty principle has diverse applications in quantum mechanics, including elucidating the stability of atoms, the behavior of particles in a box, and the quantization of energy levels.

DOI: 10.54254/2753-8818/41/2024CH0117

(https://creativecommons.org/licenses/by/4.0/).

This article is organized as follows. The Methods and Theory section first uses the Fourier transform to prove the Heisenberg uncertainty principle: the derivation starts from a mathematical inequality, defines the mean and variance of angular frequency and time, and interprets the result as the uncertainty principle. The section then discusses the Fourier transform in quantum mechanics, specifically the transition between the position and momentum representations of the wave function, and introduces the time-dependent Schrödinger equation, a foundational equation of quantum mechanics that describes how a physical system's quantum state changes over time [3]. The Results and Application section applies Heisenberg's uncertainty principle to four systems. For the free particle, it assumes a plane wave solution and demonstrates how the relationships between energy, momentum, and the form of the wave function lead to the time-dependent Schrödinger equation.

For the particle in a box, it considers a particle confined in a one-dimensional box and shows how the uncertainty in position and momentum aligns with the quantized energy levels obtained from the Schrödinger equation. For the harmonic oscillator, it verifies the ground state energy using the uncertainties in position and momentum and relates the result to the known solutions involving Hermite polynomials. For the hydrogen atom, it uses the Bohr radius to estimate the uncertainties and derive the ground state energy. The result matches the solution obtained from the Schrödinger equation, demonstrating the fundamental role of the uncertainty principle in quantum mechanics.

2. Methods and Theory

2.1. Using Fourier transform to prove Heisenberg uncertainty principle

A fundamental notion in quantum physics is the Heisenberg uncertainty principle, which states that the position and momentum of a particle cannot both be known with arbitrary precision at the same time [4]. This principle can be mathematically derived using Fourier transforms, which relate functions in the time and frequency domains.

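The proof starts with the following mathematical inequality:

$$\int_{-\infty}^{\infty} \left| \frac{\omega-\bar{\omega}}{2\Delta\omega^{2}}\,\hat{f}(\omega) + \frac{d\hat{f}}{d\omega} \right|^{2} d\omega \ge 0 \tag{1}$$

This inequality uses properties of the Fourier transform and leads to the uncertainty principle. The mean and variance are defined as follows. The mean and variance of ω are $\bar{\omega} = \int_{-\infty}^{\infty} \omega\,|\hat{f}(\omega)|^{2}\,d\omega$ and $\Delta\omega^{2} = \int_{-\infty}^{\infty} (\omega-\bar{\omega})^{2}\,|\hat{f}(\omega)|^{2}\,d\omega$. The mean and variance of t are $\bar{t} = \int_{-\infty}^{\infty} t\,|f(t)|^{2}\,dt$ and $\Delta t^{2} = \int_{-\infty}^{\infty} (t-\bar{t})^{2}\,|f(t)|^{2}\,dt$. Using the above definitions to expand the integral:

$$\int_{-\infty}^{\infty} \left| \frac{\omega-\bar{\omega}}{2\Delta\omega^{2}}\,\hat{f} + \frac{d\hat{f}}{d\omega} \right|^{2} d\omega = \int_{-\infty}^{\infty} \left[ \left(\frac{\omega-\bar{\omega}}{2\Delta\omega^{2}}\right)^{2} |\hat{f}|^{2} + \frac{\omega-\bar{\omega}}{2\Delta\omega^{2}} \left( \hat{f}^{*}\frac{d\hat{f}}{d\omega} + \hat{f}\,\frac{d\hat{f}^{*}}{d\omega} \right) + \left| \frac{d\hat{f}}{d\omega} \right|^{2} \right] d\omega \tag{2}$$

By simplifying the right-hand side, it is found that $\Delta t^{2} \ge \dfrac{1}{4\,\Delta\omega^{2}}$.

Combining these results, the Heisenberg uncertainty principle is

$$\Delta\omega\,\Delta t \ge \frac{1}{2} \tag{3}$$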
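The simplification that leads from Eq. (2) to Eq. (3) can be sketched term by term. The sketch assumes a normalized signal, $\int |f(t)|^{2}dt = \int |\hat{f}(\omega)|^{2}d\omega = 1$, a unitary Fourier convention, a time origin chosen so that $\bar{t} = 0$, and $\hat{f}$ vanishing at infinity; these assumptions are not spelled out in the text.

```latex
\begin{align*}
\int_{-\infty}^{\infty} \left(\frac{\omega-\bar{\omega}}{2\Delta\omega^{2}}\right)^{2} |\hat{f}|^{2}\, d\omega
  &= \frac{\Delta\omega^{2}}{4\,\Delta\omega^{4}} = \frac{1}{4\,\Delta\omega^{2}}, \\
\int_{-\infty}^{\infty} \frac{\omega-\bar{\omega}}{2\Delta\omega^{2}}\,
  \frac{d|\hat{f}|^{2}}{d\omega}\, d\omega
  &= -\frac{1}{2\Delta\omega^{2}} \int_{-\infty}^{\infty} |\hat{f}|^{2}\, d\omega
   = -\frac{1}{2\,\Delta\omega^{2}} \quad \text{(integration by parts)}, \\
\int_{-\infty}^{\infty} \left|\frac{d\hat{f}}{d\omega}\right|^{2} d\omega
  &= \int_{-\infty}^{\infty} t^{2}\,|f(t)|^{2}\, dt = \Delta t^{2} \quad \text{(Parseval)}, \\
\frac{1}{4\,\Delta\omega^{2}} - \frac{1}{2\,\Delta\omega^{2}} + \Delta t^{2} \ge 0
  \;&\Longrightarrow\; \Delta\omega\,\Delta t \ge \frac{1}{2}.
\end{align*}
```

The cross term uses $\hat{f}^{*}\,d\hat{f}/d\omega + \hat{f}\,d\hat{f}^{*}/d\omega = d|\hat{f}|^{2}/d\omega$, which is why it reduces to the normalization integral.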


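Angular frequency ω is related to momentum p by $p = \hbar\omega$, where ω plays the role of the wave number once the conjugate variable is read as a position. Therefore $\Delta p = \hbar\,\Delta\omega$. Substituting $\Delta\omega = \Delta p/\hbar$ into the uncertainty principle for angular frequency and time gives $\dfrac{\Delta p}{\hbar}\,\Delta t \ge \dfrac{1}{2}$. By multiplying both sides by ℏ, $\Delta p\,\Delta t \ge \dfrac{\hbar}{2}$. Interpreting $t$ as the position variable $x$, it is found that $\Delta x\,\Delta p \ge \dfrac{\hbar}{2}$.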

Hence, the product of the uncertainties in time and frequency domains is bounded below by a

constant, which is a representation of the Heisenberg uncertainty principle. The derivation emphasizes

the profound connection between time and frequency domains, as encapsulated by the Fourier transform,

and their role in understanding the behavior of quantum systems.
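The bound in Eq. (3) can be checked numerically. The sketch below is an illustration rather than part of the original derivation: it assumes real-valued test signals (so the mean angular frequency is zero) and uses Parseval's theorem to obtain the frequency spread from the time derivative, which avoids an explicit Fourier transform. The function names and grid parameters are hypothetical choices.

```python
import math

def spreads(f, t_min=-20.0, t_max=20.0, n=4001):
    """Return (delta_t, delta_omega) for a real-valued signal f(t).

    Integrals are approximated by Riemann sums on a uniform grid.
    For real f the mean angular frequency is zero, and Parseval's theorem
    gives: integral of omega^2 |F|^2 d(omega) = integral of |f'(t)|^2 dt.
    """
    dt = (t_max - t_min) / (n - 1)
    ts = [t_min + i * dt for i in range(n)]
    fs = [f(t) for t in ts]
    norm = sum(v * v for v in fs) * dt
    mean_t = sum(t * v * v for t, v in zip(ts, fs)) * dt / norm
    var_t = sum((t - mean_t) ** 2 * v * v for t, v in zip(ts, fs)) * dt / norm
    # Central differences approximate f'(t) at the interior grid points.
    derivs = [(fs[i + 1] - fs[i - 1]) / (2 * dt) for i in range(1, n - 1)]
    var_w = sum(d * d for d in derivs) * dt / norm
    return math.sqrt(var_t), math.sqrt(var_w)

def gaussian(t, sigma=1.5):
    return math.exp(-t * t / (2 * sigma * sigma))

def two_sided_exponential(t):
    return math.exp(-abs(t))

dt_g, dw_g = spreads(gaussian)
dt_e, dw_e = spreads(two_sided_exponential)
print(f"Gaussian product:    {dt_g * dw_g:.4f}")
print(f"Exponential product: {dt_e * dw_e:.4f}")
```

The Gaussian saturates the bound (product close to 1/2), while the two-sided exponential gives a strictly larger product, consistent with Eq. (3).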

2.2. Fourier Transform in Quantum Mechanics and Time-Dependent Schrödinger Equation

The Fourier transform can be used to turn a function of time or space into a function of frequency or

momentum [5]. In quantum mechanics, the Fourier transform is used to switch between the position

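representation and the momentum representation of the wave function. The Fourier transform of a wave function $\psi(t)$ is given by

$$\hat{\psi}(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \psi(t)\,e^{-i\omega t}\,dt \tag{4}$$

The inverse Fourier transform is:

$$\psi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \hat{\psi}(\omega)\,e^{i\omega t}\,d\omega \tag{5}$$

The same pair of formulas, with $(t, \omega)$ replaced by the position $x$ and the wave number $k = p/\hbar$, connects the position and momentum representations.

The time-dependent Schrödinger equation describes how the quantum state of a physical system evolves over time [6]. It is a foundational equation in quantum mechanics and is given by:

$$i\hbar\,\frac{\partial \psi(x,t)}{\partial t} = \hat{H}\,\psi(x,t) \tag{6}$$

where $\psi(x,t)$ denotes the wave function of the system, ℏ denotes the reduced Planck constant, and $\hat{H}$ denotes the Hamiltonian operator. For a particle of mass $m$ in a potential $V(x)$ the Hamiltonian operator can be expressed as:

$$\hat{H} = -\frac{\hbar^{2}}{2m}\frac{\partial^{2}}{\partial x^{2}} + V(x) \tag{7}$$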

3. Results and Application

3.1. Prove Schrödinger equation under the free particle condition

Assume a plane wave solution for a free particle

$$\psi(x,t) = A\,e^{i(kx-\omega t)} \tag{8}$$

where $k$ is the wave number, and ω is the angular frequency. Using the de Broglie relations $p = \hbar k$ and $E = \hbar\omega$, the time-dependent Schrödinger equation for a free particle is

$$i\hbar\,\frac{\partial \psi}{\partial t} = -\frac{\hbar^{2}}{2m}\frac{\partial^{2}\psi}{\partial x^{2}} \tag{9}$$

Computing the time derivative, $\dfrac{\partial \psi}{\partial t} = -i\omega\,\psi$, and the second spatial derivative, $\dfrac{\partial^{2}\psi}{\partial x^{2}} = -k^{2}\psi$, relates ω and $k$ to energy and momentum.

For a free particle, the energy E is purely kinetic: $E = \dfrac{p^{2}}{2m} = \dfrac{\hbar^{2}k^{2}}{2m}$. The angular frequency ω is related to the energy by $E = \hbar\omega$. Thus, $\hbar\omega = \dfrac{\hbar^{2}k^{2}}{2m}$. This implies $\omega = \dfrac{\hbar k^{2}}{2m}$. Substituting ω into the time derivative gives $\dfrac{\partial \psi}{\partial t} = -i\,\dfrac{\hbar k^{2}}{2m}\,\psi$, which can be rewritten as $i\hbar\,\dfrac{\partial \psi}{\partial t} = \dfrac{\hbar^{2}k^{2}}{2m}\,\psi$. Using the second spatial derivative, $-\dfrac{\hbar^{2}}{2m}\dfrac{\partial^{2}\psi}{\partial x^{2}} = \dfrac{\hbar^{2}k^{2}}{2m}\,\psi$, it is found that the Schrödinger equation is:

$$i\hbar\,\frac{\partial \psi}{\partial t} = -\frac{\hbar^{2}}{2m}\frac{\partial^{2}\psi}{\partial x^{2}} \tag{10}$$

By assuming the wave nature of particles and using the Heisenberg uncertainty principle, the derivation arrives at the Schrödinger equation for a free particle. The key steps involve recognizing the relationships between energy, momentum, and the wave function's form, which are all consistent with the constraints imposed by the uncertainty principle [7].
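The statement that a plane wave satisfies the free-particle Schrödinger equation only when ω and the wave number obey the kinetic dispersion relation can be tested with finite differences. The following is a minimal sketch, assuming natural units (ℏ = m = 1) and hypothetical helper names; it is not taken from the original article.

```python
import cmath

HBAR = 1.0  # natural units, assumed only for this illustration
MASS = 1.0

def plane_wave(x, t, k, omega):
    """Plane wave exp(i (k x - omega t)) with unit amplitude."""
    return cmath.exp(1j * (k * x - omega * t))

def schrodinger_residual(x, t, k, omega, h=1e-3):
    """Magnitude of i*hbar*dpsi/dt + (hbar^2 / 2m) * d2psi/dx2 at (x, t).

    Derivatives are approximated by central differences with step h.
    """
    dpsi_dt = (plane_wave(x, t + h, k, omega) - plane_wave(x, t - h, k, omega)) / (2 * h)
    d2psi_dx2 = (
        plane_wave(x + h, t, k, omega)
        - 2 * plane_wave(x, t, k, omega)
        + plane_wave(x - h, t, k, omega)
    ) / (h * h)
    lhs = 1j * HBAR * dpsi_dt
    rhs = -(HBAR ** 2) / (2 * MASS) * d2psi_dx2
    return abs(lhs - rhs)

k = 3.0
omega_free = HBAR * k * k / (2 * MASS)   # omega = hbar k^2 / (2m)
good = schrodinger_residual(0.7, 0.2, k, omega_free)
bad = schrodinger_residual(0.7, 0.2, k, omega=k)  # deliberately wrong dispersion
print(good, bad)
```

With the correct dispersion the residual is at the level of the discretization error, while the deliberately wrong choice leaves a residual of order one.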

3.2. Prove Schrödinger equation under particle in a box

Consider a particle confined in a one-dimensional box of width $a$. The potential $V(x)$ is given by

$$V(x) = \begin{cases} 0 & \text{if } 0 \le x \le a \\ \infty & \text{otherwise} \end{cases} \tag{11}$$

The time-independent Schrödinger equation for a particle of mass $m$ in a potential $V(x)$ follows from Eq. (6) for stationary states. When $V(x) = 0$, the equation simplifies to

$$-\frac{\hbar^{2}}{2m}\frac{d^{2}\psi}{dx^{2}} = E\,\psi \tag{12}$$

The solution to the Schrödinger equation where $V(x) = 0$ is given by $\psi_{n}(x) = \sqrt{\dfrac{2}{a}}\,\sin\!\left(\dfrac{n\pi x}{a}\right)$, where n is a positive integer. The corresponding energy levels are:

$$E_{n} = \frac{n^{2}\pi^{2}\hbar^{2}}{2ma^{2}} \tag{13}$$

For a particle in the ground state $n = 1$, the wave function is:

$$\psi_{1}(x) = \sqrt{\frac{2}{a}}\,\sin\!\left(\frac{\pi x}{a}\right) \tag{14}$$

The uncertainty in position, $\Delta x$, can be approximated as $\Delta x \approx a$. The uncertainty in momentum, $\Delta p$, can be estimated using the uncertainty principle: $\Delta p \approx \dfrac{\hbar}{\Delta x} = \dfrac{\hbar}{a}$.

To relate these uncertainties to the Schrödinger equation, the expression for the kinetic energy of the particle is $E = \dfrac{p^{2}}{2m}$. The uncertainty in energy due to the uncertainty in momentum is $\Delta E \approx \dfrac{\Delta p^{2}}{2m} = \dfrac{\hbar^{2}}{2ma^{2}}$. This energy uncertainty agrees with the ground state energy $E_{1} = \dfrac{\pi^{2}\hbar^{2}}{2ma^{2}}$ up to the numerical factor $\pi^{2}$, that is, in its dependence on ℏ, $m$, and $a$. Thus, the Heisenberg uncertainty principle is consistent with the energy levels from the Schrödinger equation for a particle in a box. The estimate reproduces the scaling of the ground state energy, which illustrates that the uncertainty principle forms a fundamental basis for comprehending the quantization of energy levels within confined systems [8].
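To make the comparison concrete, the short sketch below evaluates both expressions for an electron in a box. The box width of 1 nm is an assumed example value, the constants are CODATA values, and the function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
M_E = 9.1093837015e-31   # electron mass, kg
EV = 1.602176634e-19     # joules per electronvolt

def box_uncertainty_estimate(a, m=M_E):
    """Energy scale from dp ~ hbar / a: hbar^2 / (2 m a^2), in joules."""
    return HBAR ** 2 / (2 * m * a * a)

def box_exact_level(a, n=1, m=M_E):
    """Exact level E_n = n^2 pi^2 hbar^2 / (2 m a^2) from Eq. (13), in joules."""
    return (n * math.pi * HBAR) ** 2 / (2 * m * a * a)

width = 1e-9  # assumed box width of 1 nm
estimate_ev = box_uncertainty_estimate(width) / EV
exact_ev = box_exact_level(width) / EV
ratio = exact_ev / estimate_ev
print(f"estimate: {estimate_ev:.4f} eV, exact: {exact_ev:.4f} eV, ratio: {ratio:.4f}")
```

The ratio equals π² for every width and mass, so the uncertainty estimate captures the scaling of the ground state energy but not its numerical prefactor.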

3.3. Prove Schrödinger equation under harmonic oscillator

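The potential of the one-dimensional harmonic oscillator is given by $V(x) = \dfrac{1}{2}m\omega^{2}x^{2}$. The time-independent Schrödinger equation for a particle of mass $m$ in $V(x)$ is $-\dfrac{\hbar^{2}}{2m}\dfrac{d^{2}\psi}{dx^{2}} + V(x)\,\psi = E\,\psi$. For a harmonic oscillator, substituting $V(x) = \dfrac{1}{2}m\omega^{2}x^{2}$ gives $-\dfrac{\hbar^{2}}{2m}\dfrac{d^{2}\psi}{dx^{2}} + \dfrac{1}{2}m\omega^{2}x^{2}\,\psi = E\,\psi$. The solutions to this equation involve Hermite polynomials: $\psi_{n}(x) = N_{n}\,e^{-\alpha^{2}x^{2}/2}\,H_{n}(\alpha x)$, where $\alpha = \sqrt{m\omega/\hbar}$, $N_{n}$ is a normalization constant, and $H_{n}$ are the Hermite polynomials. The corresponding energy levels are:

$$E_{n} = \left(n + \frac{1}{2}\right)\hbar\omega \tag{15}$$

For the ground state ($n = 0$), the wave function is:

$$\psi_{0}(x) = \left(\frac{\alpha}{\pi^{1/2}}\right)^{1/2} e^{-\alpha^{2}x^{2}/2} \tag{16}$$

The uncertainties in position $\Delta x$ and momentum $\Delta p$ for the ground state are given by $\Delta x = \sqrt{\langle x^{2}\rangle - \langle x\rangle^{2}} = \sqrt{\dfrac{\hbar}{2m\omega}}$ and $\Delta p = \sqrt{\langle p^{2}\rangle - \langle p\rangle^{2}} = \sqrt{\dfrac{m\omega\hbar}{2}}$. The ground state of the harmonic oscillator therefore saturates the uncertainty principle: $\Delta x\,\Delta p = \sqrt{\dfrac{\hbar}{2m\omega}}\,\sqrt{\dfrac{m\omega\hbar}{2}} = \dfrac{\hbar}{2}$. The uncertainties in position and momentum are related to the energy of the harmonic oscillator by $E = \dfrac{\langle p^{2}\rangle}{2m} + \dfrac{1}{2}m\omega^{2}\langle x^{2}\rangle$. For the ground state, $\langle x^{2}\rangle = \dfrac{\hbar}{2m\omega}$ and $\langle p^{2}\rangle = \dfrac{m\omega\hbar}{2}$. Substituting these into the energy expression:

$$E = \frac{\hbar\omega}{4} + \frac{1}{2}m\omega^{2}\,\frac{\hbar}{2m\omega} = \frac{\hbar\omega}{2} \tag{17}$$

Thus, the energy matches the ground state energy $E_{0} = \dfrac{1}{2}\hbar\omega$ obtained from the Schrödinger equation. This demonstrates that the limits placed on the precise position and momentum of the particle lead directly to the quantized energy levels of the harmonic oscillator [9].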
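One way to see how the uncertainty principle alone fixes the ground state energy is to minimize the energy over the position spread, taking the momentum spread at its minimum value ℏ/(2Δx). The sketch below does this numerically; the natural units (ℏ = m = ω = 1), the search interval, and the helper names are assumptions made for illustration.

```python
import math

HBAR = M = OMEGA = 1.0  # natural units, chosen only for this illustration

def energy_bound(dx):
    """Energy for position spread dx with the minimum momentum spread hbar / (2 dx)."""
    dp = HBAR / (2 * dx)
    return dp ** 2 / (2 * M) + 0.5 * M * OMEGA ** 2 * dx ** 2

def minimize(func, lo, hi, iterations=200):
    """Ternary search for the minimum of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if func(a) < func(b):
            hi = b
        else:
            lo = a
    return (lo + hi) / 2

best_dx = minimize(energy_bound, 0.05, 5.0)
best_energy = energy_bound(best_dx)
print(f"dx = {best_dx:.6f}, E = {best_energy:.6f}")
```

The minimum sits at Δx² = ℏ/(2mω) with energy ℏω/2, the same values that enter Eq. (17).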

3.4. Prove Schrödinger equation for the hydrogen atom problem

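The potential energy of an electron in a hydrogen atom is given by the Coulomb potential: $V(r) = -\dfrac{e^{2}}{4\pi\varepsilon_{0}r}$. The time-independent Schrödinger equation for the hydrogen atom in spherical coordinates is:

$$-\frac{\hbar^{2}}{2m}\nabla^{2}\psi + V(r)\,\psi = E\,\psi \tag{18}$$

By separating variables, the radial part of the Schrödinger equation is:

$$-\frac{\hbar^{2}}{2m}\left(\frac{d^{2}u}{dr^{2}} - \frac{l(l+1)}{r^{2}}\,u\right) - \frac{e^{2}}{4\pi\varepsilon_{0}r}\,u = E\,u \tag{19}$$

where $u(r) = r\,R(r)$ is the reduced radial wave function and $l$ is the orbital angular momentum quantum number.

For the hydrogen atom, assume the uncertainty in the electron's position $\Delta x$ is on the order of the Bohr radius $a_{0}$: $\Delta x \approx a_{0}$.

The uncertainty in momentum $\Delta p$ can be estimated using Heisenberg's uncertainty principle:

$$\Delta p \approx \frac{\hbar}{\Delta x} \approx \frac{\hbar}{a_{0}} \tag{20}$$

The kinetic energy T can be approximated as $T \approx \dfrac{(\Delta p)^{2}}{2m} = \dfrac{\hbar^{2}}{2ma_{0}^{2}}$. The potential energy V is $V \approx -\dfrac{e^{2}}{4\pi\varepsilon_{0}a_{0}}$. The total energy $E$ is the sum of kinetic and potential energy:

$$E \approx \frac{\hbar^{2}}{2ma_{0}^{2}} - \frac{e^{2}}{4\pi\varepsilon_{0}a_{0}} \tag{21}$$

To find the ground state energy, treat $a_{0}$ as a free parameter and minimize $E$ with respect to it: $\dfrac{dE}{da_{0}} = 0$. Then, $-\dfrac{\hbar^{2}}{ma_{0}^{3}} + \dfrac{e^{2}}{4\pi\varepsilon_{0}a_{0}^{2}} = 0$. Solving for $a_{0}$, it is found that $a_{0} = \dfrac{4\pi\varepsilon_{0}\hbar^{2}}{me^{2}}$. Substituting $a_{0}$ back into the expression for $E$:

$$E = -\frac{e^{2}}{8\pi\varepsilon_{0}a_{0}} \tag{22}$$

This is the ground state energy of the hydrogen atom, which matches the result obtained from solving the Schrödinger equation [10].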
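The numerical values implied by Eqs. (21) and (22) can be checked directly. The sketch below uses CODATA constants and the electron mass (not the reduced mass), which is an assumption of this illustration; the variable and function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
M_E = 9.1093837015e-31      # electron mass, kg
E_CHARGE = 1.602176634e-19  # elementary charge, C
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m

COULOMB = E_CHARGE ** 2 / (4 * math.pi * EPS0)  # e^2 / (4 pi eps0), J m

def total_energy(r):
    """Eq. (21): kinetic estimate hbar^2 / (2 m r^2) plus Coulomb energy, in joules."""
    return HBAR ** 2 / (2 * M_E * r * r) - COULOMB / r

# Radius that minimizes Eq. (21): a0 = 4 pi eps0 hbar^2 / (m e^2)
a0 = HBAR ** 2 / (M_E * COULOMB)
ground_ev = total_energy(a0) / E_CHARGE

print(f"a0 = {a0:.4e} m")
print(f"E  = {ground_ev:.3f} eV")
```

The minimizing radius is the Bohr radius (about 0.529 Å) and the energy is about −13.6 eV, in line with Eq. (22).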

4. Conclusion

This article demonstrates the application of Heisenberg's uncertainty principle to derive the Schrödinger equation for various quantum systems, including free particles, particles in a box, harmonic oscillators, and the hydrogen atom. By using the fundamental limits imposed by the uncertainty principle, it shows how the quantization of energy levels arises naturally within these systems. The proof underscores the connection between the principles of quantum mechanics and the Fourier transforms used to describe them. The derivations presented provide a clear and coherent framework for understanding the foundational aspects of quantum mechanics. Starting from the uncertainty principle, the article derives the Schrödinger equation, which governs the behavior of quantum systems. It offers a unified approach to deriving the Schrödinger equation for different quantum systems using Heisenberg's uncertainty principle, which helps in understanding the common underlying principles that govern these systems. The use of Fourier transforms to derive the uncertainty principle and subsequently apply it to different quantum systems adds a level of mathematical rigor to the derivations, supporting the consistency of the results. However, the article has limitations. Some derivations rely on simplifying assumptions, such as approximating uncertainties or assuming certain forms of wave functions. These assumptions, while useful for illustrative purposes, do not fully capture the complexity of real-world quantum systems. Future studies could relax these constraints by extending the methods to more complex quantum systems, such as those with several interacting particles or external fields. Combining the analytical framework offered here with numerical simulations could provide deeper understanding and more precise predictions for a wider variety of quantum phenomena.

References

[1] Chen, L. P., Kou, K. I., & Liu, M. S. (2015). Pitt's Inequality and the Uncertainty Principle

Associated with the Quaternion Fourier Transform. Journal of Mathematical Analysis and

Applications, 423(1), 681-700.

[2] Ballentine, L. E. (2014). Quantum Mechanics: A Modern Development. World Scientific

Publishing Company.

[3] Feit, M. D., Fleck Jr, J. A., & Steiger, A. (1982). Solution of the Schrödinger Equation by a

Spectral Method. Journal of Computational Physics, 47(3), 412-433.

[4] Busch, P., Heinonen, T., & Lahti, P. (2007). Heisenberg's Uncertainty Principle. Physics Reports,

452(6), 155-176.

[5] Bracewell, R. N. (1989). The Fourier Transform. Scientific American, 260(6), 86-95.

[6] Berezin, F. A., & Shubin, M. (2012). The Schrödinger Equation (Vol. 66). Springer Science &

Business Media.

[7] Shananin, N. A. (1994). On Singularities of Solutions of the Schrödinger Equation for a Free

Particle. Mathematical Notes, 55(6), 626-631.

[8] Hojman, S. A., & Asenjo, F. A. (2020). A new approach to solve the one-dimensional Schrödinger

equation using a wavefunction potential. Physics Letters A, 384(36), 126913.

[9] Havin, V., & Jöricke, B. (2012). The uncertainty principle in harmonic analysis (Vol. 28). Springer

Science & Business Media.

[10] Nakatsuji, H. (2005). General Method of Solving the Schrödinger Equation of Atoms and

Molecules. Physical Review A—Atomic, Molecular, and Optical Physics, 72(6), 062110.


Analysis of the Principles of Quantum Computing and State-

of-the-Art Applications

Zhuolun Li

School of Physics and Astronomy, University of St Andrews, St Andrews, the United Kingdom

zl200@st-andrews.ac.uk

Abstract. In recent years, quantum computing has emerged as a promising field, offering

potential breakthroughs in various computational tasks that are currently limited by classical

computing. With this in mind, this study delves into the principles of quantum computing,

exploring the fundamental concept of quantum entanglement and its implications for

computation. After outlining the historical development and research significance of quantum

computing, this research presents an overview of the latest advancements in the field. The paper

then focuses on the principles of quantum computation, including the use of qubits and quantum

gates, illustrated with relevant mathematical formulations and diagrams. Furthermore, this study

discusses the state-of-the-art applications of quantum computing, showcasing recent

achievements and results obtained from these cutting-edge technologies. A comparative analysis

with traditional algorithms highlights the advantages and potential gains offered by quantum

computing. Finally, the current limitations of quantum computing are discussed, and insights into future research directions and prospects are offered for this exciting field.

Keywords: quantum computing, quantum entanglement, quantum principles, traditional

algorithms.

1. Introduction

Reinvigorating computation, quantum computing has captured the imagination of interdisciplinary

researchers and trailblazers. This revolutionary paradigm promises to transcend the boundaries of

classical computing through the exclusive capabilities of quantum mechanics. Envisioned as a game-

changer, it strives to tackle complex challenges with a velocity and precision unprecedented in

traditional computing. Delving into its historical roots, quantum computing finds its genesis in the early

20th century, a time when intellectual giants such as Max Planck, Niels Bohr, and Werner Heisenberg

laid the theoretical cornerstone for elucidating the dynamical interactions of material particles and

energy phenomena on the atomic and subatomic scales. Their contributions formed the intellectual

scaffolding upon which quantum computing's aspirations are built.

DOI: 10.54254/2753-8818/41/2024CH0155

(https://creativecommons.org/licenses/by/4.0/).

The impetus to leverage quantum systems for computational endeavors notably accelerated during the 1980s. Richard Feynman's seminal 1982 work, titled "Simulating Physics with Computers," underscored the inherent inefficiency of simulating quantum phenomena on classical machines, stemming from the exponential growth of computational demands with system size [1]. This revelation reignited interest in exploiting the unique features of quantum mechanics for computational pursuits. Following Feynman's seminal contributions, quantum computing has undergone swift

advancements, propelled by breakthroughs in quantum physics, computer science, and engineering

domains. Initial endeavors concentrated on showcasing core quantum computing principles,

encompassing phenomena like quantum teleportation and entanglement-mediated communication

frameworks. Nonetheless, notable strides in constructing scalable quantum computing platforms,

capable of tackling intricate algorithms, were not attained until the dawn of the new millennium. This

achievement was fueled by substantial investments from multiple sectors, including governments,

academia, and industry, acknowledging quantum computing's potential to address humanity's most

critical challenges.

In recent years, quantum computing has experienced notable advancements, with a multitude of

milestones accomplished across different aspects. A particularly significant accomplishment is the

development of quantum processors that possess an escalating number of qubits. Initially, quantum

processors were restricted to just a few qubits, significantly constraining their computational capabilities.

Nevertheless, current advanced systems now feature hundreds or even thousands of qubits, facilitating

more intricate computations and expanding the horizons of quantum computing's potential. These

advancements have materialized due to breakthroughs in materials science, fabrication techniques, and

control electronics. Researchers have introduced innovative qubit implementations, including

superconducting qubits and optical (photonic) qubits, each presenting its own set of advantages and

challenges. Furthermore, progress in microwave engineering and cryogenics has facilitated precise

control and manipulation of quantum states, which is vital for executing intricate quantum algorithms.

A pivotal achievement in the evolution of quantum computing lies in the demonstration of what is

known as 'quantum supremacy,' a term coined by John Preskill. This concept highlights the quantum

computer's capacity to execute a designated task at a pace unparalleled by any classical computer, even

when the latter employs immense parallel processing capabilities [2]. Notably, in 2019, Google asserted

that it had achieved this quantum supremacy milestone with its 53-qubit 'Sycamore' processor. This

accomplishment entailed solving a random circuit sampling problem in merely 200 seconds, a feat that

would have ostensibly consumed millennia for even the world's swiftest supercomputer [3]. Amidst

ongoing discussions regarding the significance and repercussions of this breakthrough, it stands as a

pivotal step in showcasing the immense potential harbored by quantum computing.

The motivation behind this paper stems from the growing recognition of the importance of quantum

computing in addressing challenges that currently exceed the capabilities of classical computing. With

computational problems continually increasing in complexity, there is a pressing need for innovative

computational paradigms that can handle the exponential growth in computational demands. Quantum

computing emerges as a promising solution, leveraging the unique properties of quantum mechanics to

achieve substantial speedups for certain classes of problems. The remainder of this paper is organized as follows. Sec. 2 delves into the intricacies of quantum entanglement, the unconventional correlation

underpinning the prowess of quantum computing. Sec. 3 outlines the fundamental principles of quantum

computation, encompassing qubits and quantum gates, supported by mathematical formulations and

visual aids. Sec. 4 explores the cutting-edge applications of quantum computing, showcasing recent

triumphs and outcomes stemming from these groundbreaking technologies. Sec. 5 contrasts quantum

algorithms with their classical counterparts, elucidating their advantages and potential benefits. Sec. 6

appraises the existing constraints within quantum computing and provides insights into prospective

research avenues and the future landscape of this burgeoning field. Lastly, Sec. 7 concludes the paper

by recapitulating the core discoveries and their broader implications.

2. Descriptions of quantum entanglement

Quantum entanglement, a singular feature of quantum mechanics, signifies an intricate

interconnectedness among two or more quantum particles. This strong correlation renders the state of

any one particle inseparable from the rest, transcending even spatial separation. This nonlocal aspect is

among quantum mechanics' most intriguing and counter-intuitive qualities, underpinning the core tenets

of quantum information processing and computation. Quantum entanglement is mathematically


formulated in the realm of quantum mechanics, leveraging the constructs of linear algebra and complex

Hilbert spaces. A composite quantum system's pure state, comprising two or more subsystems, can be

mathematically represented as a vector residing in the tensor product of the individual subsystems'

Hilbert spaces. When the state cannot be decomposed into a simple product of the individual subsystem

states, it is classified as entangled, highlighting the inseparability of the system components.

Consider a two-qubit scenario where each qubit has the basis states |0⟩ and |1⟩. In this context, the basis states |00⟩, |01⟩, |10⟩, and |11⟩ are simple examples of separable states. However, the Bell state |𝛷+⟩ = (|00⟩ + |11⟩)/√2 exemplifies entanglement, a state that resists decomposition into a product of individual qubit states. This entanglement creates a profound connection: the measurement outcomes of the two qubits are perfectly correlated regardless of their spatial separation, although these correlations cannot by themselves be used to transmit information. This nonlocal

correlation serves as the foundation for a myriad of quantum communication and computational

protocols. In 1935, Albert Einstein, Boris Podolsky, and Nathan Rosen introduced the EPR paradox [4],

which questioned the completeness of quantum mechanics by asserting that entanglement contradicted

local realism. Nevertheless, subsequent scientific investigations have repeatedly validated the

predictions of quantum mechanics, upholding the authenticity of entanglement and its nonlocal

characteristics. Quantum computing leverages entanglement to offer a fundamentally novel approach to

parallel information processing compared to classical methods. By exploiting entanglement, quantum

computers embark on multiple computational trajectories concurrently, harnessing the superposition

principle to perform intricate computations with heightened efficiency vis-à-vis classical counterparts.
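The distinction between separable and entangled two-qubit pure states can be tested with a simple criterion: a normalized state with amplitudes (c00, c01, c10, c11) factorizes into single-qubit states exactly when c00·c11 − c01·c10 = 0. The sketch below computes the associated concurrence; the function name and example states are illustrative choices, not taken from the paper.

```python
import math

def concurrence(state):
    """Concurrence 2 * |c00*c11 - c01*c10| of a normalized two-qubit pure state.

    The value is 0 for product (separable) states and 1 for maximally
    entangled states such as the Bell states.
    """
    c00, c01, c10, c11 = state
    return 2 * abs(c00 * c11 - c01 * c10)

s = 1 / math.sqrt(2)
basis_00 = [1, 0, 0, 0]        # |00>
plus_zero = [s, 0, s, 0]       # (|0> + |1>)/sqrt(2) tensor |0>, still a product state
bell_phi_plus = [s, 0, 0, s]   # (|00> + |11>)/sqrt(2)

for name, state in [("|00>", basis_00), ("|+>|0>", plus_zero), ("Bell", bell_phi_plus)]:
    print(name, round(concurrence(state), 6))
```

The second example shows that a superposition on one qubit alone does not produce entanglement; only the Bell state yields a nonzero concurrence.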

3. Principle of quantum computation

Quantum computation uniquely exploits the salient features of quantum mechanics to perform

calculations in a completely distinct manner from classical computation. At its core, quantum computing

relies on the qubit, a fundamental unit of information that differs markedly from the classical bit in

various crucial respects. A classical bit is binary, constrained to the states 0 or 1. Conversely, a qubit can exist in a superposition, concurrently inhabiting a blend of these two states. Mathematically, this blend is framed as a linear combination of the |0⟩ and |1⟩ states, written as |ψ⟩ = α|0⟩ + β|1⟩, where α and β are complex coefficients obeying the normalization rule |α|² + |β|² = 1. This

superposition grants the qubit superior information-carrying potential over its classical counterpart,

permitting it to signify a continuous spectrum of states, transcending the limitations of mere binary

representation.

Quantum circuit elements, namely quantum gates, serve as the fundamental components for

manipulating qubits to execute computational procedures. Distinct from classical logic gates that

function sequentially on individual bits, quantum gates exhibit the capability to simultaneously interact

with one or multiple qubits, leveraging the superposition and entanglement features inherent in quantum

states. Several prototypical quantum gates are:

• The Hadamard Gate (H), which transforms a basis state into a superposition state. For instance, the

gate transforms the state |0⟩ into an equal superposition of |0⟩ and |1⟩.

• The Controlled-NOT (CNOT) Gate, which implements a conditional NOT operation between two

qubits. If the control qubit is in the |1⟩ state, the target qubit's state flips; otherwise, it remains

unchanged.

• The Toffoli Gate, a three-qubit gate that is universal for classical reversible computation and can therefore simulate any classical logic circuit. It toggles the target qubit's state solely when both control qubits are simultaneously in the |1⟩ configuration.

These gates underscore the quantum computing paradigm's parallel processing capabilities, enabling computations that, for certain problems, surpass those achievable by conventional means.
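The action of the Hadamard and CNOT gates described above can be sketched with plain matrix arithmetic. The example below assumes the basis ordering |q0 q1⟩ with q0 as the control qubit and uses hypothetical helper names; it applies H to the first qubit of |00⟩ and then CNOT, which produces the Bell state (|00⟩ + |11⟩)/√2.

```python
import math

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def apply(matrix, state):
    """Matrix-vector product: apply a gate to a state vector."""
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]      # Hadamard gate
I2 = [[1, 0], [0, 1]]      # single-qubit identity
CNOT = [                   # control = first qubit, target = second qubit
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

ket00 = [1, 0, 0, 0]                    # amplitudes of |00>, |01>, |10>, |11>
after_h = apply(kron(H, I2), ket00)     # (|00> + |10>)/sqrt(2)
bell = apply(CNOT, after_h)             # (|00> + |11>)/sqrt(2)
print(bell)
```

The intermediate state after the Hadamard gate is still a product state; the entanglement appears only after the two-qubit CNOT gate.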

Quantum algorithms capitalize on the innate parallelism and entanglement properties of quantum

computers, resulting in substantial speed enhancements compared to classical algorithms. Grover's

search algorithm, for instance, achieves a quadratic speedup in identifying a target element within an

unordered N-element list, requiring just O(√𝑁) steps, vastly superior to the O(N) steps of its classical


counterparts [5]. Similarly, Shor's factorization algorithm performs the task of large integer factorization

in polynomial time, whereas the most advanced classical factoring methods operate in sub-exponential

time, underscoring the remarkable efficiency gains offered by quantum algorithms [6]. The

mathematical framework of quantum algorithms typically entails representing quantum states as vectors

in a complex Hilbert space and utilizing linear algebra to depict the evolution of these states when

quantum gates are applied. Illustrated in Figure 1 is a quantum circuit diagram, exemplifying the

utilization of quantum gates in the manipulation of qubits.

Figure 1. Quantum circuit used in numerical simulations (Photo/Picture credit: Original).
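The quadratic speedup of Grover's algorithm can be illustrated with a small classical simulation of the amplitudes. The sketch below is not the circuit of Figure 1; it is a minimal state-vector model, with hypothetical function names, in which each iteration applies the oracle (a sign flip on the target) followed by the diffusion step (inversion about the mean).

```python
import math

def grover_success_probability(n_items, target, iterations):
    """Probability of measuring `target` after the given number of Grover iterations."""
    amplitudes = [1 / math.sqrt(n_items)] * n_items   # uniform superposition
    for _ in range(iterations):
        amplitudes[target] = -amplitudes[target]      # oracle: flip the target's sign
        mean = sum(amplitudes) / n_items
        amplitudes = [2 * mean - a for a in amplitudes]  # diffusion: invert about the mean
    return amplitudes[target] ** 2

def optimal_iterations(n_items):
    """Roughly (pi / 4) * sqrt(N) iterations maximize the success probability."""
    return math.floor(math.pi / 4 * math.sqrt(n_items))

for n in (16, 1024):
    k = optimal_iterations(n)
    p = grover_success_probability(n, target=5, iterations=k)
    print(f"N = {n}: {k} iterations, success probability {p:.4f}")
```

For N = 1024 the simulation needs 25 iterations to reach a success probability above 0.99, whereas a classical search would inspect about N/2 = 512 items on average.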

In addition to the basic algorithms, quantum computers can be physically implemented in several ways, and several platforms are now relatively mature. Ion trap quantum computers

utilize charged ions suspended in electromagnetic fields as qubits. They offer long coherence times,

high-fidelity gates, and scalability potential. Recent advances have shown stable confinement of

hundreds of ions, but scaling to larger numbers remains a challenge. Advanced control techniques and

cryogenic traps aim to minimize decoherence. Hybrid quantum-classical systems simplify complex

algorithm implementation. Applications include quantum simulation, optimization, and cryptography.

Overcoming scaling, error correction, and integration challenges is crucial for practical deployment.

Despite these hurdles, ion trap quantum computers show promise for realizing fault-tolerant quantum

computation [7]. Superconducting quantum computers harness the unique properties of superconducting

materials at extremely low temperatures to realize efficient manipulation and stable storage of quantum

bits (qubits). They leverage quantum superposition and entanglement to solve complex problems with

unprecedented efficiency, far surpassing classical computers. Core to their operation are

superconducting quantum chips, which serve as the foundation for qubit operations. With applications

spanning drug discovery, material science, cryptography, and secure communications, superconducting

quantum computers represent a promising direction in quantum computing research. Advancements

such as China's indigenously developed "Origin Quantum Computing" demonstrate the practicality and

sophistication of this technology [8]. Optical quantum computers constitute a cutting-edge technology

that harnesses the distinct properties of light to execute computations. They employ photons as qubits,

leveraging quantum superposition, interference, and entanglement to facilitate parallel processing and

efficient information handling. This enables optical quantum computers to address complex problems

with unprecedented speed and efficiency, surpassing the limitations of classical computers [9]. Silicon

photonics computers, or silicon photonics-based computing systems, utilize silicon-based photonic

integrated circuits to manipulate and process information using photons instead of electrons. This

technology combines the advantages of silicon, a traditional material for integrated circuits, with the

speed and bandwidth of optical communication [10]. Topological quantum computers represent a

promising avenue in quantum computing research. Leveraging the topological properties of certain

quantum systems, they aim to achieve fault-tolerant quantum computation. These computers encode

quantum information in a manner that is intrinsically resilient to decoherence and errors, enhancing

stability and reliability [11].

4. Applications for quantum computation

Quantum computation showcases immense potential across diverse application domains, ranging from

optimization and machine learning to cryptography and materials science. This section delves deeper

into these applications, emphasizing the recent achievements and implications stemming from

advancements in quantum computing technologies. By leveraging the unique properties of quantum

mechanics, quantum computing promises to revolutionize these fields and pave the way for

groundbreaking solutions.

4.1. Optimization

Quantum computation presents formidable potential for tackling optimization challenges prevalent

across industries like logistics, finance, and engineering. These problems necessitate pinpointing the

optimal solution amidst numerous configurations, posing significant computational hurdles for classical

computing frameworks. By exploiting the inherent advantages of quantum mechanics, quantum

computing aims to revolutionize the resolution of such optimization tasks.

Quantum annealing is a heuristic optimization method, the quantum analogue of simulated annealing (itself inspired by the metallurgical process of annealing), that can be run on quantum hardware to find approximate solutions to hard optimization problems. The algorithm initializes a system of qubits in a superposition state and then slowly evolves it toward the ground state of a Hamiltonian that encodes the problem, so that the lowest-energy configuration represents the optimal solution to the optimization problem.
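The annealing procedure described above can be sketched as an idealized, noise-free simulation on three qubits. Everything specific here is an assumption made for illustration: the small Ising cost function, the linear schedule H(s) = (1 - s) H_driver + s H_problem, the total time and step count, and the function names. Real annealers are open, noisy systems and are not modelled by this closed-system sketch.

```python
import numpy as np

PAULI_X = np.array([[0.0, 1.0], [1.0, 0.0]])
IDENTITY = np.eye(2)

def op_on(qubit, op, n):
    """Embed a single-qubit operator on `qubit` into an n-qubit system."""
    out = np.array([[1.0]])
    for q in range(n):
        out = np.kron(out, op if q == qubit else IDENTITY)
    return out

def ising_costs(n, couplings, fields):
    """Cost of every bit string; bit 0 maps to spin +1 and bit 1 to spin -1."""
    costs = np.zeros(2 ** n)
    for index in range(2 ** n):
        spins = [1 - 2 * ((index >> (n - 1 - q)) & 1) for q in range(n)]
        costs[index] = sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())
        costs[index] += sum(h * spins[i] for i, h in fields.items())
    return costs

def anneal(costs, n, total_time=40.0, steps=800):
    """Evolve the uniform superposition under a slowly changing Hamiltonian."""
    driver = -sum(op_on(q, PAULI_X, n) for q in range(n))  # ground state: uniform superposition
    problem = np.diag(costs)                               # ground state: optimal bit string
    psi = np.full(2 ** n, 1.0 / np.sqrt(2 ** n), dtype=complex)
    dt = total_time / steps
    for k in range(steps):
        s = (k + 0.5) / steps
        hamiltonian = (1.0 - s) * driver + s * problem
        energies, vecs = np.linalg.eigh(hamiltonian)
        # Exact propagator for this time slice: V exp(-i E dt) V^dagger.
        psi = vecs @ (np.exp(-1j * energies * dt) * (vecs.conj().T @ psi))
    return np.abs(psi) ** 2

if __name__ == "__main__":
    costs = ising_costs(3, {(0, 1): -1.0, (1, 2): -1.0}, {0: -0.5})
    probs = anneal(costs, 3)
    print("lowest-cost bit string:", format(int(np.argmin(costs)), "03b"))
    print("most likely outcome:   ", format(int(np.argmax(probs)), "03b"))
```

If the evolution is slow enough relative to the energy gap, the most likely measurement outcome coincides with the minimum of the cost function, which is the sense in which the final ground state encodes the solution.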

4.2. Machine learning

Quantum computing may also benefit machine learning, with proposals for faster neural network training and for new quantum algorithms. Quantum neural networks (QNNs) use quantum states and operations to represent and manipulate data, which is a fundamental departure from traditional neural network architectures. Parameterized variational quantum algorithms (pVQAs) have emerged as a promising approach to optimization with quantum computation. These algorithms use a quantum circuit whose gates depend on tunable parameters and which encodes the solution space of a given optimization problem. The circuit parameters are then refined by a classical optimizer that minimizes a predefined cost function.
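The quantum-classical loop described above can be illustrated with the smallest possible case: one qubit, one rotation angle, and plain gradient descent. This is a sketch under stated assumptions, not a method from the cited literature. The cost function (the expectation value of Pauli Z), the learning rate, and the function names are chosen for illustration, and the "quantum" part is simulated with NumPy.

```python
import numpy as np

PAULI_Z = np.diag([1.0, -1.0])

def ry(theta):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def cost(theta):
    """Expectation value <Z> after applying RY(theta) to |0>; equals cos(theta)."""
    state = ry(theta) @ np.array([1.0, 0.0])
    return float(state @ PAULI_Z @ state)

def parameter_shift_gradient(theta):
    """Gradient obtained from two extra circuit evaluations (parameter-shift rule)."""
    return 0.5 * (cost(theta + np.pi / 2) - cost(theta - np.pi / 2))

def optimize(theta=0.1, learning_rate=0.4, steps=100):
    """Classical gradient descent on the circuit parameter."""
    for _ in range(steps):
        theta -= learning_rate * parameter_shift_gradient(theta)
    return theta, cost(theta)

if __name__ == "__main__":
    theta, value = optimize()
    print("optimized angle:", round(theta, 4), "cost:", round(value, 6))
```

The classical optimizer only ever sees cost values returned by the circuit, which is the division of labour between the quantum and classical parts that the paragraph describes.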

This blend of quantum and classical computation has attracted attention in machine learning, including the development of quantum support vector machines and quantum autoencoders. By combining the strengths of both paradigms, pVQAs are intended to make the tuning of sophisticated machine learning models more efficient and to address difficult optimization tasks.

4.3. Cryptography

Quantum cryptography, whose best-known application is quantum key distribution (QKD), promises a level of security that does not rest on the computational hardness assumptions of classical cryptography. Drawing on quantum mechanical principles, notably the no-cloning theorem and the uncertainty principle, QKD ensures that any interception attempt on a quantum communication channel is detectable, thereby safeguarding against unauthorized access to transmitted data [12].

In QKD, a sequence of quantum states (typically photons) is transmitted between two communicating

parties. Any attempt to measure these quantum states to extract information will inevitably disturb them,

revealing the presence of an eavesdropper. By monitoring the disturbance in the transmitted quantum

states, the communicating parties can detect any eavesdropping attempts and abort the communication

if necessary.
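The detection mechanism described above can be illustrated with a toy Monte Carlo model in the style of the BB84 protocol [12]. The sketch makes several simplifying assumptions that are not in the original text: ideal single photons, no channel noise or loss, and an eavesdropper who uses a simple intercept-and-resend strategy. The function name `bb84_error_rate` is hypothetical.

```python
import numpy as np

def bb84_error_rate(n_photons, eavesdrop, seed=0):
    """Return the error rate on the sifted key for a toy BB84 exchange."""
    rng = np.random.default_rng(seed)
    alice_bits = rng.integers(0, 2, n_photons)
    alice_bases = rng.integers(0, 2, n_photons)   # 0 = rectilinear, 1 = diagonal

    bits_in_flight = alice_bits.copy()
    bases_in_flight = alice_bases.copy()

    if eavesdrop:
        eve_bases = rng.integers(0, 2, n_photons)
        # Measuring in the wrong basis yields a random bit and re-prepares the photon.
        wrong = eve_bases != bases_in_flight
        bits_in_flight = np.where(wrong, rng.integers(0, 2, n_photons), bits_in_flight)
        bases_in_flight = eve_bases

    bob_bases = rng.integers(0, 2, n_photons)
    wrong = bob_bases != bases_in_flight
    bob_bits = np.where(wrong, rng.integers(0, 2, n_photons), bits_in_flight)

    # Sifting: keep only the positions where Alice and Bob chose the same basis.
    sifted = alice_bases == bob_bases
    return float((alice_bits[sifted] != bob_bits[sifted]).mean())

if __name__ == "__main__":
    print("error rate without eavesdropper:", bb84_error_rate(20000, eavesdrop=False))
    print("error rate with eavesdropper:   ", bb84_error_rate(20000, eavesdrop=True))
```

In this idealized model the sifted key is error-free when nobody listens, while intercept-and-resend eavesdropping pushes the error rate toward 25%, so comparing a sample of the sifted bits reveals the intrusion.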

4.4. Material science

Quantum computing offers a significant opportunity for material science by enabling the simulation of quantum systems that are computationally intractable for classical machines. Classical approaches to modeling quantum systems, including molecules and solid-state materials, face an exponential growth in computational cost as the system size increases. Quantum computers, by contrast, can represent and manipulate quantum states natively, which makes them well suited to simulating these systems accurately and efficiently.

Quantum phase estimation can help determine the energy spectrum of molecular Hamiltonians, which is crucial for understanding the electronic structure and chemical properties of molecules. This knowledge underpins the design of new materials with specific attributes such as higher conductivity, durability, or catalytic activity. By simulating the dynamics of quantum systems at the atomic level, quantum computers could accelerate the search for new materials with transformative applications.
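How phase estimation turns a Hamiltonian into an energy readout can be sketched numerically. The example below makes strong simplifying assumptions: a toy 2x2 Hamiltonian rather than a molecular one, an exactly prepared ground state as input, an evolution time chosen so that the phase is exactly representable with four counting qubits, and a ground energy lying in the window (-2π/τ, 0] so the phase maps back to a unique energy. Function names are hypothetical, and the counting register is simulated directly instead of building the full circuit.

```python
import numpy as np

def evolution_operator(hamiltonian, tau):
    """U = exp(-i H tau), built from the eigendecomposition of a Hermitian H."""
    energies, vecs = np.linalg.eigh(hamiltonian)
    return vecs @ np.diag(np.exp(-1j * energies * tau)) @ vecs.conj().T

def qpe_distribution(unitary, eigvec, n_counting):
    """Outcome probabilities of the counting register after the inverse QFT."""
    size = 2 ** n_counting
    # After the controlled-U^(2^j) gates, basis state |k> of the counting
    # register carries the amplitude <psi|U^k|psi> / sqrt(size).
    register = np.array([
        eigvec.conj() @ np.linalg.matrix_power(unitary, k) @ eigvec
        for k in range(size)
    ]) / np.sqrt(size)
    k = np.arange(size)
    inverse_qft = np.exp(-2j * np.pi * np.outer(k, k) / size) / np.sqrt(size)
    return np.abs(inverse_qft @ register) ** 2

def estimate_ground_energy(hamiltonian, tau, n_counting):
    """Read the most likely phase and convert it back to an energy."""
    _, vecs = np.linalg.eigh(hamiltonian)
    ground_state = vecs[:, 0]   # assumption: the input state is the exact ground state
    probs = qpe_distribution(evolution_operator(hamiltonian, tau), ground_state, n_counting)
    outcome = int(np.argmax(probs))
    phase = outcome / 2 ** n_counting
    return outcome, -2 * np.pi * phase / tau

if __name__ == "__main__":
    toy_hamiltonian = np.array([[0.8, 0.6], [0.6, -0.8]])   # eigenvalues are -1 and +1
    outcome, energy = estimate_ground_energy(toy_hamiltonian, 2 * np.pi * 5 / 16, 4)
    print("register outcome:", outcome, "estimated ground energy:", energy)
```

With more counting qubits the phase, and therefore the energy, is resolved to more binary digits, which is why the technique is attractive for spectra that classical methods can only approximate.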

5. Comparison with traditional algorithms

Quantum algorithms offer notable advantages over traditional algorithms in computational speed and efficiency for certain problems that are inherently hard for classical methods. For optimization problems, quantum annealing and the quantum approximate optimization algorithm (QAOA) may find high-quality solutions faster than classical heuristics on some instances by exploiting superposition and entanglement. Similarly, for machine learning tasks, QNNs and pVQAs have the potential to speed up training and improve model accuracy by leveraging the distinct capabilities of quantum computers.

However, quantum computing is not a panacea for all computational challenges. Many problems are already handled effectively by conventional algorithms, and the cost of developing and maintaining quantum computers remains a substantial obstacle to widespread adoption. Moreover, the search for practical quantum algorithms that outperform their classical counterparts is still an active field of research with many open problems.

6. Limitations and prospects

Quantum computing remains an emerging technology, and widespread deployment requires overcoming several substantial obstacles. A central challenge is the fragility of qubits, which are susceptible to decoherence and noise. Decoherence arises when quantum coherence is lost through interaction with the environment, turning the quantum state into a classical mixture. Noise can also originate from imperfections in qubit fabrication, control electronics, and ambient conditions. To address decoherence and noise, researchers are developing error mitigation techniques, quantum error correction, and more resilient qubit architectures. In schemes such as surface codes and other topological quantum error correction codes, each logical qubit is encoded across many physical qubits. This allows errors to be detected and corrected without disturbing the encoded quantum information. However, these error correction strategies carry a substantial overhead, both in the number of physical qubits required and in the complexity of the control circuitry.

Another limitation of current quantum processors is restricted qubit connectivity. Most processors have sparse connectivity, in which each qubit is directly coupled to only a few others. This makes it harder to run complex quantum algorithms, which may require additional SWAP gates to move quantum states between distant qubits. Researchers are actively investigating architectures that improve connectivity, including 2D lattices, 3D arrays, and superconducting resonators.
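The idea of encoding one logical qubit across several physical qubits, mentioned in the error correction discussion above, can be illustrated with the three-bit repetition code. This is a deliberately simpler code than the surface codes named in the text: it protects only against independent bit flips and is simulated classically here. The error probability, trial count, and function names are illustrative assumptions.

```python
import numpy as np

def logical_error_rate_exact(p):
    """Majority voting fails when two or three of the three copies flip."""
    return 3 * p ** 2 * (1 - p) + p ** 3

def logical_error_rate_sampled(p, trials=200_000, seed=0):
    """Monte Carlo estimate: flip each of three copies independently with probability p."""
    rng = np.random.default_rng(seed)
    flips = rng.random((trials, 3)) < p
    return float((flips.sum(axis=1) >= 2).mean())

if __name__ == "__main__":
    p = 0.1
    print("physical error rate:      ", p)
    print("logical error rate, exact:", logical_error_rate_exact(p))
    print("logical error rate, MC:   ", logical_error_rate_sampled(p))
```

The logical error rate falls below the physical one only when p is below 1/2, and the gain is paid for with three times as many bits, a small-scale version of the overhead discussed above.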

Despite these limitations, the prospects for quantum computing are bright. Advances in materials science, quantum hardware design, and algorithm development are driving rapid progress in the field. New qubit implementations, such as topological qubits and spin qubits, are being explored to improve qubit coherence times and reduce the impact of noise. In addition, hybrid quantum-classical systems that leverage the strengths of both technologies are likely to accelerate the adoption of quantum computing in the near term. In the long term, quantum computing has the potential to transform diverse industries and to enable solutions to problems that classical computers struggle with. In drug discovery, it could accelerate the identification of drug candidates by simulating molecular interactions at the atomic level. Similarly, in climate modeling, it may improve simulation precision by handling the intricate dynamics of atmospheric and oceanic systems more efficiently.

7. Conclusion

In conclusion, quantum computing has emerged as a promising technology with multifaceted applications and the potential to outperform classical computers in speed and efficiency on problems in optimization, machine learning, cryptography, and materials science. Despite formidable challenges, including qubit fragility and the need for practical error correction codes, its prospects appear bright. With continued progress in hardware, software, and algorithms, quantum computing could become an indispensable tool for tackling humanity's pressing challenges. Researchers and engineers continue working to overcome the limitations of current quantum technologies and to extend the boundaries of what quantum computing can do. As the field advances, quantum computing is poised to give fresh and substantial impetus to the development of science and technology.

References

[1] Feynman R P 1982 Simulating physics with computers International Journal of Theoretical Physics vol 21(6-7) pp 467-488

[2] Arute F, Arya K, Babbush R, Bacon D, Bardin J C, Barends R and Martinis J M 2019 Quantum supremacy using a programmable superconducting processor Nature vol 574(7779) pp 505-510

[3] Preskill J 2018 Quantum computing in the NISQ era and beyond Quantum vol 2 p 79

[4] Einstein A, Podolsky B and Rosen N 1935 Can quantum-mechanical description of physical reality be considered complete? Physical Review vol 47(10) p 777

[5] Grover L K 1996 A fast quantum mechanical algorithm for database search In Proceedings of the Twenty-eighth Annual ACM Symposium on Theory of Computing pp 212-219

[6] Shor P W 1994 Algorithms for quantum computation: Discrete logarithms and factoring In Proceedings 35th Annual Symposium on Foundations of Computer Science pp 124-134

[7] Blatt R and Wineland D 2008 Entangled states of trapped atomic ions Nature vol 453(7198) pp 1008-1015

[8] Zhu X, Saito S, Young A W, Gray R, Chen L, Bose S and You J Q 2021 Quantum computational advantage via 66-qubit superconducting quantum circuit Science vol 372(6544) pp 973-977

[9] Wang J, Paesani S, Ding Y, Santagati R, Skrzypczyk P, Salavrakos A and Thompson M G 2019 Multidimensional quantum entanglement with large-scale integrated optics Science vol 366(6465) pp 602-606

[10] Thomson D J, Zilkie A, Bowers J E, Vlasov Y A, Chen L and Urbas A 2016 Roadmap on silicon photonics Journal of Optics vol 18(7) p 073003

[11] Nayak C, Simon S H, Stern A, Freedman M and Das Sarma S 2008 Non-Abelian anyons and topological quantum computation Reviews of Modern Physics vol 80(3) p 1083

[12] Bennett C H and Brassard G 1984 Quantum cryptography: Public key distribution and coin tossing In Proceedings of IEEE International Conference on Computers Systems and Signal Processing pp 175-179

Advances in monocular ORB-SLAM system: A review

Ziyi Yuan

School of Advanced Manufacturing, Guangdong University of Technology,

Guangzhou, China

3121009096@mail2.gdut.edu.cn

Abstract. Perception and localization are key factors in the success of unmanned vehicles. Substantial research has therefore enabled unmanned driving systems not only to perceive and understand the surrounding environment but also to capture its details by constructing 3D maps. However, there is still no unified account of Oriented FAST and Rotated BRIEF Simultaneous Localization and Mapping (ORB-SLAM) for monocular cameras. By surveying four recent types of monocular ORB-SLAM and their application in unmanned driving scenarios, this paper discusses how to decrease cumulative error and ensure accuracy and robustness in dynamic environments. A comparison of these four ORB-SLAM systems with conventional ORB-SLAM systems shows that the fusion systems improve robustness and accuracy. Combining visual SLAM sensors with different algorithms and studying them in a range of complex environments is expected to become mainstream in future research.

Keywords: Localization, monocular vision, simultaneous localization and mapping.

1. Introduction

With the rapid development of technology, Simultaneous Localization and Mapping (SLAM) has become widely used in high-tech fields such as robotics, construction, and unmanned vehicles. As it has developed from traditional SLAM in recent years, SLAM has split into two categories: one based on laser sensors, which use lasers for measurement, and one based on visual sensors. Visual SLAM mostly uses cameras for measurement and can be divided into three categories by camera type: monocular, multiocular, and RGB-D.

In recent years, SLAM has been used to understand and map the surrounding environment and to determine a location within it. Perception and measurement of the surroundings are achieved by combining object detection, depth estimation, and visual SLAM. SLAM techniques have also been applied to microrobots for minimally invasive surgery. However, there is still no unified account of the monocular ORB-SLAM system within visual SLAM. By investigating four recent monocular ORB-SLAM systems and summarizing current research on ORB-SLAM for monocular cameras, this paper discusses how to improve accuracy and robustness and reduce cumulative error in different environments. It concludes that combining the sensor with different innovative algorithms can effectively decrease cumulative error and improve accuracy and robustness in complex environments.

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0171

(https://creativecommons.org/licenses/by/4.0/).

2. Application of the ORB-SLAM for monocular

ORB-SLAM is an open-source visual SLAM system, and this section reviews applications of its monocular form. In recent years, researchers have studied how to use ORB-SLAM in unmanned vehicles, construction, robotics, and other fields. In unmanned vehicles, ORB-SLAM is used to recognize pavement information and detect road obstacles. It is also used to study and improve drivers' habits: one survey describes a method that uses ORB-SLAM to track head scanning movements during driving, with the aim of informing driving safety technology applications [1].

In the construction industry, ORB-SLAM has been used to enhance robot localization in dynamic construction environments, where construction robots must position themselves precisely. Earlier research mainly considered static objects and had difficulty recognizing dynamic ones. As SLAM research has deepened, ORB-SLAM-based systems have made progress in accurately segmenting dynamic objects and improving localization accuracy, which supports the potential of ORB-SLAM for complex construction environments.

In the robot industry, ORB-SLAM has been proposed for surgical microrobots, for example in minimally invasive intestinal surgery. One study notes that microrobot applications in minimally invasive surgery suffer from low reconstruction accuracy, a small surgical field, and low computational efficiency, and it proposes a framework based on ORB-SLAM for real-time dense reconstruction in binocular endoscopy scenes to address these problems [2].

3. Algorithm based on SLAM for monocular

3.1. Conventional algorithm

Conventional SLAM systems run two main threads in parallel: tracking and mapping. A complete visual SLAM framework, however, comprises the following parts: sensor information reading, front-end, back-end, map construction, and closed-loop detection [3]. Sensor information reading acquires and preprocesses the image data. The front-end, known as visual odometry, processes the images from the previous step and estimates the camera pose at different times. The back-end, known as nonlinear optimization, receives the camera poses estimated by visual odometry and optimizes them. The back-end also receives closed-loop detection information and performs global optimization to obtain globally consistent trajectories and maps. Closed-loop detection determines whether the mobile robot has returned to a previously visited location. Feature-based pure visual SLAM tracks the movement of key points through successive camera frames to infer the camera pose.
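Why the back-end and closed-loop detection matter can be seen from a toy model of the front-end. The sketch below chains relative motion estimates, as visual odometry does, for a robot driving around a square, and measures how far the estimated end pose is from the start. It is a simplified 2D illustration under stated assumptions: poses are (x, y, heading) triples, the only error source is a constant heading bias per step, and all names are hypothetical.

```python
import numpy as np

def compose(pose, delta):
    """Chain a relative motion (expressed in the robot frame) onto a global 2D pose."""
    x, y, heading = pose
    dx, dy, dheading = delta
    return (
        x + np.cos(heading) * dx - np.sin(heading) * dy,
        y + np.sin(heading) * dx + np.cos(heading) * dy,
        heading + dheading,
    )

def drive_square(side_steps=10, step=1.0, heading_bias=0.0):
    """Integrate odometry around a square; `heading_bias` models a small per-step error."""
    pose = (0.0, 0.0, 0.0)
    for _ in range(4):
        for _ in range(side_steps):
            pose = compose(pose, (step, 0.0, heading_bias))
        pose = compose(pose, (0.0, 0.0, np.pi / 2))   # turn left at each corner
    return pose

def loop_closure_error(pose):
    """Distance between the estimated end position and the true start position."""
    return float(np.hypot(pose[0], pose[1]))

if __name__ == "__main__":
    print("error with perfect odometry:", loop_closure_error(drive_square()))
    print("error with 0.01 rad bias:   ", loop_closure_error(drive_square(heading_bias=0.01)))
```

A small per-step bias accumulates into a position error of a few step lengths by the time the robot returns to its start. Recognizing the revisited location provides the constraint that global optimization uses to remove this drift.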

3.2. Conventional algorithm of ORB-SLAM

The conventional ORB-SLAM algorithm divides the SLAM system into three threads, each of which relies on feature points. ORB-SLAM builds on the Parallel Tracking and Mapping (PTAM) algorithm. PTAM was a major breakthrough in conventional visual SLAM: it was the first to run tracking and mapping in parallel, it replaced the traditional filter with nonlinear optimization as the back-end, and it introduced a keyframe mechanism [3].

With the keyframe mechanism, not every image needs to be processed in detail. Instead, several key images are linked together and the trajectory and map are optimized over them. However, PTAM cannot perform closed-loop detection, so it is limited to small scenes and tracking is easily lost. ORB-SLAM, proposed after PTAM, uses ORB feature points and their descriptors to detect and track features in the image and to estimate the camera pose from them. ORB is a very fast feature extraction method with rotational invariance. Using the same ORB features throughout gives the SLAM algorithm internal consistency across feature extraction and tracking, keyframe selection, 3D reconstruction, and closed-loop detection [4]. ORB-SLAM divides the SLAM system into three threads: feature point tracking, spatial mapping, and loop detection. Its advantages are that it can track in real time and can easily relocalize against earlier keyframes when it returns to a previously seen scene [5]. In addition, ORB-SLAM can effectively improve positioning stability and track objects in simple scenarios. Compared with PTAM, it adds closed-loop detection and can thereby effectively address the cumulative error problem left by PTAM.
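Part of what makes ORB fast is that its descriptors are binary strings, so two features can be compared by counting differing bits (the Hamming distance). The sketch below shows brute-force matching of such descriptors. It rests on assumptions that should be read as such: the descriptors are synthetic random 256-bit strings rather than real ORB output, the distance threshold is arbitrary, and the function names are hypothetical. It is not the matching strategy of any particular ORB-SLAM release.

```python
import numpy as np

def hamming_distances(desc_a, desc_b):
    """Pairwise Hamming distances between two sets of 256-bit descriptors.

    Each descriptor is a row of 32 uint8 values (32 bytes = 256 bits).
    """
    xor = desc_a[:, None, :] ^ desc_b[None, :, :]
    return np.unpackbits(xor, axis=2).sum(axis=2)

def match(desc_a, desc_b, max_distance=64):
    """Nearest-neighbour matching with a distance threshold (value is illustrative)."""
    distances = hamming_distances(desc_a, desc_b)
    nearest = distances.argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(nearest) if distances[i, j] <= max_distance]

def flip_random_bits(descriptors, n_flips, rng):
    """Simulate viewpoint or noise changes by flipping a few bits in each descriptor."""
    noisy = descriptors.copy()
    for row in noisy:
        for bit in rng.choice(256, size=n_flips, replace=False):
            row[bit // 8] ^= np.uint8(1 << (bit % 8))
    return noisy

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame_a = rng.integers(0, 256, size=(20, 32), dtype=np.uint8)
    order = rng.permutation(20)
    frame_b = flip_random_bits(frame_a, 10, rng)[order]
    print(match(frame_a, frame_b)[:5])
```

Because the comparison reduces to XOR and bit counting, matching stays cheap enough for the real-time tracking and relocalization described above.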

4. Optimization of ORB-SLAM for algorithm

The ORB-SLAM algorithm has been continuously refined in recent years. This section introduces four derivative algorithms based on ORB-SLAM. By integrating monocular ORB-SLAM with different methods, they improve its robustness and accuracy in different environments.

4.1. A graph recovery algorithm

Building on ORB-SLAM, various advances have been made in monocular visual SLAM. One is a SLAM graph recovery algorithm based on subgraphs and an undirected connection graph, in which the system uses the mapping connections to re-initialize and reconstruct individual parts of the map when tracking is lost. Evaluations on simulated drone imagery and on ground and indoor datasets indicate that, when tracking fails, this algorithm preserves map integrity better than other mainstream SLAM methods, which helps keep the map complete in unmanned driving despite tracking failures [6].

The main breakthrough is that, after a tracking failure in unmanned driving, the missing parts of the map can be repaired by creating subgraphs. The integrity of each subgraph is ensured by a new selection method, and the undirected connection graph maintains the connection relationships between subgraphs. In the UAV environment, the proposed system retains about four times as many keyframes as the original ORB-SLAM 2. In an outdoor street environment, it can effectively reconstruct a more complete scene map [6].
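The role of an undirected connection graph between subgraphs can be sketched with a generic data structure. The example below is a hypothetical illustration and not the method of [6]: it assumes that each subgraph (sub-map) has an integer identifier, that a link is recorded whenever two sub-maps are found to overlap, and that sub-maps in the same connected component can be merged into one map. All names are invented for the example.

```python
from collections import deque

def connected_submaps(submap_ids, links):
    """Group sub-maps into connected components of an undirected connection graph."""
    adjacency = {submap: set() for submap in submap_ids}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, groups = set(), []
    for start in submap_ids:
        if start in seen:
            continue
        # Breadth-first search collects every sub-map reachable from `start`.
        queue, group = deque([start]), []
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in sorted(adjacency[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups

if __name__ == "__main__":
    # Sub-map 2 was started after a tracking failure and is not yet linked to 0 and 1.
    print(connected_submaps([0, 1, 2], [(0, 1)]))
    # A later overlap between sub-maps 1 and 2 reconnects the whole map.
    print(connected_submaps([0, 1, 2], [(0, 1), (1, 2)]))
```

Keeping the disconnected sub-map instead of discarding it is what allows previously lost parts of the map to be recovered once a connection is found.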

4.2. A semi-direct monocular SLAM with three-level parallel optimization

Conventional visual SLAM methods can be divided into feature-based methods and direct methods. Feature-based methods extract feature points from the images received by the camera and analyze them to estimate the camera pose. Direct methods instead use the photometric error to estimate the camera pose. To combine the advantages of the two approaches and achieve more accurate camera pose estimation, one study proposed a semi-direct monocular SLAM with three-level parallel optimization, a new framework called DO-SLAM [7]. The first half of DO-SLAM uses direct methods to track the camera pose quickly and robustly. The second half uses a feature-based approach to refine the keyframe poses, close loops, and build a reusable, globally consistent, long-term, sparse feature map. Evaluation on two benchmark datasets shows that this method achieves higher accuracy and robustness in motion estimation for unmanned driving.
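The two kinds of error that feature-based and direct methods minimize can be written down in a few lines. The sketch below is a generic illustration, not the DO-SLAM implementation of [7]. It assumes a pinhole camera with an invented intrinsic matrix, represents the "image" as a single row of intensities, and restricts the direct alignment to an integer pixel shift with wrap-around. Function names are hypothetical.

```python
import numpy as np

def project(K, R, t, point_world):
    """Pinhole projection of a 3D point into pixel coordinates."""
    point_cam = R @ point_world + t
    uv = K @ point_cam
    return uv[:2] / uv[2]

def reprojection_residual(K, R, t, point_world, observed_uv):
    """Feature-based error: predicted pixel position minus matched feature position."""
    return project(K, R, t, point_world) - observed_uv

def photometric_residual(ref_row, cur_row, shift):
    """Direct error: intensity difference after warping the reference by `shift` pixels."""
    return np.roll(ref_row, shift) - cur_row

def best_shift(ref_row, cur_row, candidates):
    """Pick the candidate motion with the smallest summed squared photometric error."""
    costs = [np.sum(photometric_residual(ref_row, cur_row, s) ** 2) for s in candidates]
    return candidates[int(np.argmin(costs))]

if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    point = np.array([0.2, -0.1, 4.0])
    print("projected pixel:", project(K, np.eye(3), np.zeros(3), point))

    rng = np.random.default_rng(1)
    reference = rng.random(64)
    current = np.roll(reference, 3)
    print("recovered shift:", best_shift(reference, current, list(range(-5, 6))))
```

The reprojection error needs extracted and matched features but works on a handful of points, while the photometric error uses raw intensities directly. A semi-direct system uses each where it is strongest.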

4.3. Optimization of 3D points

Because a monocular camera cannot observe absolute scale, the scale of a monocular system is ambiguous. It is difficult to measure accurately the depth of the target scene and its distance from the camera, which degrades measurement accuracy during unmanned driving. One study proposed a scale estimation method for monocular visual odometry in unmanned driving scenarios [8]. Its key idea is to reconstruct 3D ground points from two consecutive keyframes and then use additional processing frames to increase the number of 3D ground points, so that the camera height can be estimated more precisely [8]. With ORB-SLAM, the method achieves a mean motion error of 1.19% on the KITTI dataset, a result the study compares with state-of-the-art traditional monocular SLAM methods.
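The link between ground points, camera height, and scale can be illustrated with a plane fit. This is a simplified sketch and not the estimator of [8]: it assumes that the 3D ground points are given in the camera frame in arbitrary (up-to-scale) units, that the true mounting height of the camera is known (1.65 m is used purely as an example value), and that a plain least-squares plane fit without outlier rejection is sufficient. Function names are hypothetical.

```python
import numpy as np

def camera_height_from_ground_points(points):
    """Distance from the camera centre (the origin) to the best-fit ground plane."""
    centroid = points.mean(axis=0)
    # The plane normal is the direction of least variance of the centred points.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return float(abs(normal @ centroid))

def metric_scale(points, true_camera_height):
    """Factor that converts up-to-scale SLAM units into metres."""
    return true_camera_height / camera_height_from_ground_points(points)

if __name__ == "__main__":
    # Synthetic ground points lying 0.33 SLAM units below the camera (y axis pointing down).
    xs, zs = np.meshgrid(np.linspace(-2, 2, 5), np.linspace(4, 12, 5))
    ground = np.column_stack([xs.ravel(), np.full(xs.size, 0.33), zs.ravel()])
    print("estimated height (SLAM units):", camera_height_from_ground_points(ground))
    print("metric scale factor:", metric_scale(ground, true_camera_height=1.65))
```

More ground points make the plane fit less sensitive to noise in individual points, which is the motivation for accumulating them over several frames as described above.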

4.4. HFNet-SLAM

ORB-SLAM 3 is a feature-based visual SLAM system with high robustness and accuracy. To address the vulnerability of traditional algorithms in complex environments, one study proposed HFNet-SLAM [9], an accurate real-time monocular SLAM system based on ORB-SLAM 3 and combined with a deep convolutional neural network (CNN). HFNet-SLAM differs from conventional ORB-SLAM 3 in that its local and global features are extracted by a deep CNN, HF-Net. Experiments show that the highly repeatable local features produced by the deep CNN outperform traditional feature extraction in complex environments. The system has been validated on public datasets against other state-of-the-art algorithms, and the reported results show that HFNet-SLAM achieves the lowest error among the systems available in the literature.

4.5. Comparison between optimized ORB SLAM algorithms and conventional SLAM

This section compares the four ORB-SLAM-based fusion systems introduced above with traditional visual SLAM. First, compared with traditional visual SLAM algorithms, the graph recovery algorithm based on subgraphs and the undirected connection graph preserves the integrity of the map when tracking fails and can recover previously missing parts of the map by creating subgraphs. Second, the method based on three-level parallel optimization combines the advantages of the direct and feature-based approaches and obtains a more accurate and more robust camera pose estimate than earlier visual SLAM methods. Third, the scale estimation approach collects 3D ground points from multiple processing frames and applies robust scale estimation, which effectively reduces scale drift compared with traditional visual SLAM. Finally, integrating a deep convolutional neural network (CNN) with ORB-SLAM 3 yields a system with higher robustness and accuracy than the conventional ORB-SLAM system. This fusion system is reported to be twice as accurate as ORB-SLAM3 in medium and large environments of the TUM-VI dataset [9].

5. Limitations of ORB-SLAM

A monocular visual sensor is cheaper than a multiocular visual sensor or a depth camera. In practice, however, it often cannot obtain the absolute depth of the environment. A monocular camera cannot recover the true scale of the trajectory and the map, so the measured results deviate from the true values. Multiocular and depth cameras can measure scene depth directly, so the monocular camera remains limited as a sensor in application scenarios. For example, Kinect Fusion uses Kinect cameras for 3D reconstruction, and the ORB-SLAM2 system supports RGB-D cameras [10].

Most current experimental studies concentrate on static and low-speed environments, while studies in highly dynamic environments are still scarce. The robustness and accuracy of monocular vision systems in highly dynamic or complex environments therefore cannot yet be guaranteed [10]. The scale estimation method based on 3D ground points is one example [8]: the researchers found that on a curved road or a slope only a few 3D ground points can be collected, so new scale factors must be introduced to correct the camera pose, which leads to measurement error. The lack of research on high-speed, dynamic, and complex scenarios is thus one of the factors limiting the application of monocular vision systems in different scenarios.

Although feature extraction has been studied extensively, monocular visual systems still have limitations in extracting feature points. One example is the system that integrates a deep convolutional neural network with ORB-SLAM [9]. Despite its deep learning approach, experiments showed that the system fails under extreme rotation, and the neural network requires the support of external equipment, which increases the burden on mobile robots.

6. Tendency and improvement

The research and analysis above indicate the future development trend and research direction of
monocular visual systems. One direction proposed in this survey is to extend the existing ORB-SLAM
algorithm to support multi-camera visual SLAM and RGB-D SLAM, because both are more robust and accurate
than monocular visual SLAM. A deep convolutional neural network can serve as the feature extraction
method, and methods similar to those reviewed above can then be used to address the drift problem,
ensuring the stability of the whole system and reducing its error. Such an approach combines the
methods discussed above and some of their advantages.

Experimental studies in highly dynamic and complex scenarios are scarce, so future research needs
visual SLAM systems that work in various dynamic scenarios. The studies of complex scenes and highly
dynamic environments reviewed here suggest that robustness can be improved by reusing submaps or by
improving the algorithm itself, which also improves the accuracy of the visual SLAM system. Combining
visual SLAM systems with emerging algorithms to achieve a much lower systematic error is a future
research direction.

7. Conclusion

This paper discusses and analyzes four recent ORB-SLAM systems that are combined with other innovative
algorithms, focusing on how they differ from the conventional visual SLAM system. It proposes that,
under complex conditions, monocular ORB-SLAM should be integrated with additional algorithms and
sensors, such as the map restoration algorithm based on submaps and an undirected connected graph, the
three-level parallel optimization method, and the feature extraction method for constructing 3D ground
points. An emerging system that combines ORB-SLAM with a deep convolutional neural network can markedly
improve the accuracy and robustness of visual SLAM while also decreasing the cumulative error.

The survey also found that current research on ORB-SLAM systems in highly dynamic environments is
scarce and that no study applying monocular ORB-SLAM systems in complex environments was identified.
Subsequent studies can focus on visual SLAM systems in highly dynamic environments and on progressively
improving the extraction of feature points in complex environments, in order to improve the accuracy
and robustness of the monocular ORB-SLAM system and reduce the cumulative error.

References

[1] Wang, S., Li, J., Yang, P., Gao, T., Bowers, A. R., & Luo, G. (2020). Towards Wide Range

Tracking of Head Scanning Movement in Driving. International journal of pattern recognition

and artificial intelligence, 34(13), 2050033. https://doi.org/10.1142/s0218001420500330

[2] Huo, J., Zhou, C., Yuan, B., Yang, Q., & Wang, L. (2023). Real-Time Dense Reconstruction with

Binocular Endoscopy Based on Stereo Net and ORB-SLAM. Sensors (Basel, Switzerland),

23(4), 2074. https://doi.org/10.3390/s23042074

[3] Zhu P, Zhou H, Zhang H, Lu S & Wei R. Visual simultaneous localization and mapping method
for a mobile robot. Journal of Tianjin University of Technology, 1-10.

[4] Tourani, A., Bavle, H., Sanchez-Lopez, J. L., & Voos, H. (2022). Visual SLAM: What Are the
Current Trends and What to Expect? Sensors (Basel, Switzerland), 22(23), 9297.
https://doi.org/10.3390/s22239297

[5] R. Mur-Artal, J. M. M. Montiel and J. D. Tardós, "ORB-SLAM: A Versatile and Accurate

Monocular SLAM System," in IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147-1163,

Oct. 2015, doi: 10.1109/TRO.2015.2463671.

[6] Z. Zhan, W. Jian, Y. Li and Y. Yue, "A SLAM Map Restoration Algorithm Based on Submaps

and an Undirected Connected Graph," in IEEE Access, vol. 9, pp. 12657-12674, 2021, doi:

10.1109/ACCESS.2021.3049864

[7] S. Lu, Y. Zhi, S. Zhang, R. He and Z. Bao, "Semi-Direct Monocular SLAM With Three Levels
of Parallel Optimizations," in IEEE Access, vol. 9, pp. 86801-86810, 2021, doi:
10.1109/ACCESS.2021.3071921

[8] M. Fan, S. -W. Kim, S. -T. Kim, J. -Y. Sun and S. -J. Ko, "Simple But Effective Scale Estimation

for Monocular Visual Odometry in Road Driving Scenarios," in IEEE Access, vol. 8, pp.

175891-175903, 2020, doi: 10.1109/ACCESS.2020.3026347

[9] Liu, L., & Aitken, J. M. (2023). HFNet-SLAM: An Accurate and Real-Time Monocular SLAM
System with Deep Features. Sensors (Basel, Switzerland), 23(4), 2113.
https://doi.org/10.3390/s23042113

[10] Bala, J. A., Adeshina, S. A., & Aibinu, A. M. (2022). Advances in Visual Simultaneous

Localisation and Mapping Techniques for Autonomous Vehicles: A Review. Sensors (Basel,

Switzerland), 22(22), 8943. https://doi.org/10.3390/s22228943

Prospects for the development of cartography through the

integration of SLAM technology with GIS technology

Yaodong Tang

School of Information Engineering, China University of Geosciences Beijing, Beijing,

China

1004215119@email.cugb.edu.cn

Abstract. With the continued development of Simultaneous Localization and Mapping (SLAM)

and Geographic Information Systems (GIS) technologies, their application scenarios have

become increasingly complex. Combining these technologies can significantly enhance

operational efficiency in challenging environments. This paper presents an analysis of existing

cases where SLAM and GIS technologies have been integrated, demonstrating that such a

merger facilitates the consolidation and complementarity of spatial data. This integration

allows robots or systems to simultaneously utilize the global information provided by GIS and

the dynamic local data captured by SLAM for a more comprehensive and detailed

environmental analysis, which is highly beneficial for the field of cartography. Further research

has developed a series of operational procedures for integrating SLAM and GIS, utilizing

MATLAB as a tool. This study also reviews several existing technical challenges, including

real-time performance and computational capacity, environmental complexity and dynamic

changes, and multi-scale data processing, and proposes potential solutions. The paper

concludes by predicting that the integration of SLAM and GIS will play a crucial role in areas

such as smart city management and disaster emergency response, indicating that this research

area will become a hot topic in future cartographic technology.

Keywords: Cartography, Simultaneous Localization and Mapping, Geographic Information

Systems.

1. Introduction

In recent years, with the rapid development of mobile robotics and autonomous driving technologies,

Simultaneous Localization and Mapping (SLAM) technology has gradually become a research hotspot.

The core of SLAM technology lies in a robot's ability to construct maps in real-time and self-localize

in an unknown environment, addressing critical issues in autonomous navigation. Andréa Macario

Barros et al. have provided a detailed introduction to the current fundamental functions of SLAM

technology [1]. Geographic Information Systems (GIS), a technology for collecting, storing, analyzing,

and displaying geospatial data, has played a significant role in urban planning, environmental

monitoring, and resource management.

The integration of SLAM and GIS technologies not only compensates for the real-time and

precision deficiencies in traditional GIS data acquisition but also provides SLAM technology with rich

geospatial information support. This greatly expands the application scope and potential of both

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0172

(https://creativecommons.org/licenses/by/4.0/).

technologies. Dorra Larnaout and her team attempted in 2012 to use DEM data constraints commonly

used in GIS for bundle adjustment, resulting in a threefold increase in positioning accuracy, thereby

demonstrating the complementary nature of combining GIS data with SLAM technology [2].

Moreover, the integrated application of SLAM and GIS technologies is gaining increasing attention

and has found applications in various areas, especially in the construction of smart cities. K. Ghosh

and K. S. S. Musti proposed a framework for developing a GIS-based intelligent traffic system for

energy-aware smart cities by combining GIS and SLAM technologies [3].

According to Chen Chen and Cheng Yinhang, who introduced and demonstrated various SLAM

algorithms on the MATLAB platform, it can be inferred that MATLAB's mobile robot SLAM

simulator is easier to operate compared to commercial simulators [4]. This is because the MATLAB

language is widespread and easy to program, with numerous built-in functions supporting matrix

operations. Additionally, MATLAB offers various toolboxes to address issues in signal processing,

image processing, fuzzy logic, etc., allowing users to focus on SLAM algorithms and theory. On this

basis, the integrated application of SLAM and GIS can fully leverage MATLAB's powerful data

processing and algorithm implementation capabilities, providing new opportunities and solutions for

the development of the cartography field.

This paper aims to explore the background and significance of the integration of SLAM and GIS

technologies, analyze their prospects for development in the field of cartography, and discuss the

feasibility and advantages of combining the two through MATLAB, as well as potential application

directions and prospects. By delving into these topics, this research aims to offer new insights

and references for the advancement of cartography.

2. Overview of SLAM and GIS technologies

2.1. Overview of SLAM

SLAM and GIS are crucial technological concepts in modern science and technology, playing pivotal

roles in their respective domains.

SLAM technology refers to the ability of mobile devices (such as robots, drones, smartphones, etc.)

to autonomously locate themselves and construct maps in an unknown environment. The core idea of

SLAM is to use sensors (such as cameras, LiDAR, etc.) to observe and determine the device's position

and orientation during movement, then incrementally build a map based on this positional information.

Several widely used algorithms are common in SLAM technology. Firstly, LiDAR-based SLAM

algorithms, such as Hector SLAM, Gmapping, and Cartographer, primarily rely on LiDAR sensors to

obtain environmental information through laser scanning, thereby achieving localization and mapping.

Another category is visual-based SLAM algorithms, such as ORB-SLAM (Oriented FAST and

Rotated BRIEF SLAM), LSD-SLAM (Large-Scale Direct Monocular SLAM), and PTAM (Parallel

Tracking and Mapping), which mainly depend on cameras to extract environmental features through

image processing techniques, enabling localization and mapping. Additionally, there are algorithms

like EKF-SLAM (Extended Kalman Filter SLAM) and FastSLAM, which employ different

mathematical models and optimization strategies to adapt to various environments and application

requirements [5]. SLAM technology is crucial for realizing truly autonomous mobile robots, allowing

them to explore and understand their surroundings and achieve autonomous navigation and task

execution without prior knowledge.

2.2. Overview of GIS

GIS technology is a technology for capturing, storing, managing, analyzing, and displaying geographic

data. GIS technology is based on geospatial data and employs geographic modelling analysis methods

to provide various spatial and dynamic geographic information. It can transform tabular data into

geographic graphical displays, facilitating user browsing, operation, and analysis. The primary data

types handled by GIS technology include vector data, raster data, terrain data, topological data, and

address data. Vector data, composed of geometric elements like points, lines, and polygons, represents

specific locations and shapes on a map. Raster data, consisting of pixels where each pixel represents

an area on the map, is often used for continuous data such as remote sensing images and digital

elevation models. Terrain data primarily describes surface morphology and elevation information,

serving as the main source for digital elevation models. Topological data emphasizes spatial

relationships between geographic features, such as adjacency and connectivity. Address data links

geographic coordinates with postal addresses, supporting location queries and positioning services.

GIS technology has extensive applications in various fields such as urban planning, environmental

monitoring, disaster management, and agricultural resource management. It helps individuals better

understand and interpret geographic phenomena, supporting decision-making and problem-solving. By

integrating SLAM and GIS technologies, the strengths of both fields can be leveraged to develop

innovative solutions for various applications.

3. The necessity of integrating SLAM and GIS

As the application scope of SLAM technology continues to expand in various aspects of daily life,

certain limitations of SLAM have been exposed in specific environments. For instance, the

autonomous monitoring of water supply and sewage pipeline networks presents significant challenges.

When robots operate within underground water supply and sewage pipelines, they often cannot receive

GPS signals to estimate their positions accurately. The interiors of these pipelines are undeniably

complex and difficult to navigate. Deep within these pipes, it is pitch dark and perpetually filled with

water. Although the water level in supply pipes is relatively stable, the sewage level in sewer pipes

fluctuates over time. More critically, sensors within the sewer may be obstructed by various types of

waste. The dirty environment not only increases the risk of sensor contamination but also heightens

the likelihood of sensor failure. Therefore, in such scenarios, integrating GIS data images, which

feature distinct and clear attributes, could significantly enhance the stability and efficiency of robotic

operations [6].

Although GIS data contain extensive global geographic information and can accurately describe

critical aspects of the environment such as topography and landmarks, this information is typically

static. Conversely, maps generated by SLAM technology are often localized and real-time. By

combining the two, spatial data integration and complementarity can be achieved. This allows robots

or systems to utilize both the global information provided by GIS and the dynamic local data acquired

through SLAM, enabling more comprehensive and detailed environmental analysis. This integration is

also crucial in the research and development of smart cities and autonomous driving technologies.

Therefore, the integration of SLAM and GIS technologies is highly necessary [3].

4. Feasibility analysis of integrating SLAM and GIS technologies

In a study conducted by D. Larnaout, S. Bourgeois, V. Gay-Bellile, and M. Dhome in 2012, the

integration of SLAM and GIS technologies was successfully realized by incorporating DEM (Digital

Elevation Model) constraints into the BA (Bundle Adjustment) optimization process. The results of

the BA optimization with added DEM constraints were significantly superior to those without such

constraints. The data indicated that the median error for SLAM with DEM constraints was

approximately 3.16 meters, whereas the median error for classical SLAM exceeded 9 meters. This

implies that the addition of DEM constraints can enhance positioning accuracy by a factor of three [2].
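As a quick check of the stated factor, the two medians reported above can be compared directly. The symbols below are notation introduced here for illustration and are not taken from [2]: e_SLAM is the median error of classical SLAM and e_DEM is the median error with DEM constraints.

```latex
\frac{e_{\mathrm{SLAM}}}{e_{\mathrm{DEM}}} > \frac{9\ \mathrm{m}}{3.16\ \mathrm{m}} \approx 2.85
```

The ratio is therefore at least about 2.85, which is consistent with the roughly threefold improvement described.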

In 2017, a team consisting of Manhui Sun, Shaowu Yang, Xiaodong Yi, and Hengzhu Liu

proposed a method for autonomous large-scale environmental navigation based on GIS and SLAM.

Utilizing real urban spatial road network information and leveraging the storage and computational

capabilities of GIS spatial databases, they developed a comprehensive system that includes a spatial

database, SLAM, and navigation algorithms. This system demonstrated good reusability and

scalability, making it suitable for real-life scenarios and capable of guiding robots in navigation and

mapping activities under extensive conditions [7].

In 2020, a team comprising F-J Serrano, V Moreno, B Curto, and R Álves proposed a new

approach to global localization for mobile robots by storing GIS map data in a PostGIS database. This

method involved using GIS map data as an information source and initializing the filter based on the

probability distributions generated from sensor readings. The proposed solution, termed

Environmental Stimulus Localization (ESL), helps mitigate the impact of measurement errors and

allows for quicker recovery from localization failures [8].

5. Technical methodology for integrating SLAM and GIS (using MATLAB as an example)

The integration of SLAM data with GIS data involves a multi-step process aimed at leveraging the

strengths of both systems to achieve more accurate environmental perception and localization.

5.1. Data fusion

5.1.1. Data preprocessing. The data generated by the SLAM system, such as robot trajectories and
environmental maps, needs to be cleaned and organized through noise removal, error correction, and
other preprocessing steps to ensure its accuracy and reliability. The relevant geospatial data, such as
topographic maps, road networks, and building information, is acquired from GIS platforms. The
SLAM-generated map data is then converted into GIS-compatible formats. Depending on the requirements,
GIS data may also need to be converted or clipped so that it aligns with the SLAM data in the same or a
similar coordinate system.
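The noise-removal step can be illustrated with a minimal sketch. It is written in Python rather than MATLAB purely for illustration, and it assumes one common choice of filter (statistical outlier removal based on nearest-neighbour distances); the paper does not prescribe a specific filter. The function name, the parameters k and std_ratio, and the sample scan are all hypothetical.

```python
import math
import statistics

def remove_outliers(points, k=3, std_ratio=1.0):
    """Drop points whose mean distance to their k nearest neighbours is unusually large."""
    mean_dists = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dists.append(sum(dists[:k]) / k)
    # Points above mean + std_ratio * standard deviation are treated as noise.
    threshold = statistics.mean(mean_dists) + std_ratio * statistics.pstdev(mean_dists)
    return [p for p, d in zip(points, mean_dists) if d <= threshold]

# Hypothetical 2D scan: a tight cluster plus one stray return far away.
scan = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
cleaned = remove_outliers(scan)
```

In this sample the stray point at (5.0, 5.0) is discarded and the five clustered points are kept. The brute-force neighbour search is quadratic in the number of points, so a spatial index would be needed for real point clouds.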

5.1.2. Coordinate system unification. It is necessary to ensure that SLAM data and GIS data use the

same coordinate system. This typically requires coordinate transformation or calibration to enable

seamless fusion of the two data sets.
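A minimal sketch of such a coordinate transformation is given below, in Python for illustration. It assumes that the SLAM map is a planar local frame in metres, that the GIS layer uses a projected metric grid, and that the start position and heading of the robot in that grid are already known. The function name and the numeric values are hypothetical.

```python
import math

def local_to_map(point, origin, yaw_rad):
    """Rotate a local (x, y) point by yaw_rad and translate it to the map origin."""
    x, y = point
    east0, north0 = origin
    east = east0 + x * math.cos(yaw_rad) - y * math.sin(yaw_rad)
    north = north0 + x * math.sin(yaw_rad) + y * math.cos(yaw_rad)
    return east, north

# Hypothetical values: the robot started at (500000, 4400000) in the projected grid,
# with its local x-axis rotated 90 degrees counter-clockwise from east.
origin = (500000.0, 4400000.0)
east, north = local_to_map((10.0, 0.0), origin, math.radians(90.0))
```

A point 10 m ahead of the robot in its local frame then lies 10 m north of the origin in the map grid. Converting between geographic and projected coordinates is a separate step that this sketch does not cover.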

5.1.3. Data registration and alignment. SLAM data must be registered and aligned with GIS data
using known landmarks or feature points. This can be achieved through feature extraction and
matching algorithms, ensuring spatial consistency between the two data sets.
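Once landmark correspondences are available, the alignment itself can be sketched as a least-squares fit of a 2D rotation and translation. The Python sketch below is illustrative only: it assumes the landmarks have already been matched, ignores scale differences and outliers, and uses hypothetical function names and test values.

```python
import math

def apply_rigid_2d(point, theta, translation):
    """Rotate a 2D point by theta (radians), then translate it."""
    x, y = point
    tx, ty = translation
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty)

def fit_rigid_2d(src, dst):
    """Least-squares rotation and translation mapping src landmarks onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy  # source landmark relative to its centroid
        bx, by = qx - dx, qy - dy  # target landmark relative to its centroid
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, (dx - (c * sx - s * sy), dy - (s * sx + c * sy))

# Hypothetical landmarks seen in the SLAM map and their positions in the GIS layer.
slam_landmarks = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
true_theta, true_shift = math.radians(30.0), (100.0, 200.0)
gis_landmarks = [apply_rigid_2d(p, true_theta, true_shift) for p in slam_landmarks]
theta, shift = fit_rigid_2d(slam_landmarks, gis_landmarks)
```

With noise-free correspondences the fit recovers the rotation and shift used to generate the GIS landmarks. Real data would call for more landmarks and some form of outlier rejection.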

5.1.4. Data fusion. After completing data preprocessing and coordinate system unification, SLAM

data can be fused with GIS data. This can be achieved through overlaying, merging, or other fusion

algorithms, depending on the application context and requirements. For example, local maps generated

by SLAM can be overlaid on the global maps provided by GIS to obtain more comprehensive

environmental information.
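The overlay case can be sketched as follows, in Python for illustration. The sketch assumes that both maps are occupancy grids that already share the same resolution and orientation, that the position of the local map inside the global grid is known, and that -1 marks cells the robot has not observed. These conventions and all names are assumptions made here.

```python
def overlay(global_grid, local_grid, row0, col0, unknown=-1):
    """Return a copy of global_grid with the known cells of local_grid written at (row0, col0)."""
    fused = [row[:] for row in global_grid]
    for r, local_row in enumerate(local_grid):
        for c, value in enumerate(local_row):
            rr, cc = row0 + r, col0 + c
            in_bounds = 0 <= rr < len(fused) and 0 <= cc < len(fused[0])
            # Unobserved local cells keep the value from the global map.
            if value != unknown and in_bounds:
                fused[rr][cc] = value
    return fused

# Hypothetical 3x4 global map (all free) and a 2x2 local SLAM patch.
global_grid = [[0] * 4 for _ in range(3)]
local_grid = [[1, -1], [-1, 1]]
fused = overlay(global_grid, local_grid, row0=1, col0=2)
```

Only the two observed cells of the local patch change the global map. The original global grid is left untouched, so the pre-fusion state remains available for comparison.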

5.2. Data visualization

After the data fusion is complete, the next step is to use MATLAB to process the point cloud data
generated by SLAM, which includes filtering, registration, and feature extraction. Both the SLAM and
the GIS data can then be visualized in MATLAB. This step involves the graphical representation of the
fused data to facilitate analysis and interpretation.

6. Technical challenges and solutions

6.1. Real-time performance and computational efficiency

SLAM requires real-time processing of sensor data (e.g., LiDAR, cameras) to update positional

information and maps, whereas GIS data is often large-scale and complex, demanding considerable

processing time. In this case, high-performance computing and parallel processing techniques can be

employed to enhance data processing efficiency. Additionally, the development of incremental update

algorithms can allow GIS data to be updated in real-time based on SLAM outputs.

6.2. Accuracy and robustness

The accuracy and robustness of SLAM systems are affected by sensor noise and environmental

changes, while GIS data requires high precision and stability. Integrating multiple sensor

data sources (such as IMU, GPS, LiDAR, and vision) can improve the accuracy and robustness of

SLAM [9]. Moreover, using existing high-precision GIS data for calibration and error correction can

enhance the overall system accuracy.

6.3. Environmental complexity and dynamic changes

SLAM technology is prone to errors in complex and dynamically changing environments (e.g., urban

settings with moving crowds and vehicles), whereas GIS systems typically assume static geographic

information [10]. Dynamic object detection and tracking technologies can be

used to isolate the impact of moving objects on SLAM, ensuring the accuracy of the static parts of the

map. Machine learning methods can also be employed to enhance the system’s adaptability to

environmental changes.
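The idea of isolating moving objects can be sketched in a few lines of Python. The sketch assumes that a separate detector has already produced axis-aligned bounding boxes for moving objects in the same 2D frame as the scan points. The detector itself, the function name, and the sample values are hypothetical.

```python
def drop_dynamic_points(points, boxes):
    """Keep only the points that fall outside every detected bounding box."""
    def inside(p, box):
        xmin, ymin, xmax, ymax = box
        return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax
    return [p for p in points if not any(inside(p, b) for b in boxes)]

points = [(0.5, 0.5), (2.0, 2.0), (3.5, 1.0)]
boxes = [(1.5, 1.5, 2.5, 2.5)]  # hypothetical detection of a moving vehicle
static_points = drop_dynamic_points(points, boxes)
```

Only the point inside the detected box is removed, so the remaining points can be passed to the mapping stage as the static part of the scene.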

6.4. Multi-scale data processing

SLAM data is usually local and fine-grained, while GIS data can cover large areas with various

resolutions. Converting and processing data across different scales is required. Developing multi-scale

data fusion algorithms can enable seamless transitions and integrations from local details to global

maps.
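One simple form of scale conversion is to aggregate a fine SLAM grid into coarser cells before merging it with a lower-resolution GIS raster. The Python sketch below assumes an occupancy grid in which 1 means occupied, and it marks a coarse cell as occupied if any of its fine cells is occupied. This aggregation rule and the function name are assumptions made here, not a method from the paper.

```python
def downsample(grid, factor):
    """Aggregate factor x factor blocks of an occupancy grid, keeping the maximum value."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [grid[rr][cc]
                     for rr in range(r, min(r + factor, rows))
                     for cc in range(c, min(c + factor, cols))]
            coarse_row.append(max(block))
        coarse.append(coarse_row)
    return coarse

# Hypothetical 4x4 fine grid reduced to 2x2.
fine = [[0, 0, 1, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 0, 0]]
coarse = downsample(fine, 2)
```

Taking the maximum is a conservative choice for obstacles. For continuous quantities such as elevation, a mean or median over each block may be more appropriate.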

6.5. Loop closure and global optimization

SLAM requires loop closure and global optimization to improve the overall consistency of the map.

These processes can become complex and computationally expensive when dealing with large-scale

GIS data. Efficient graph optimization algorithms and feature-based loop closure detection methods

can be utilized to reduce computational complexity and enhance global map consistency.
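To show what a loop closure correction does, the Python sketch below uses a deliberately simplified stand-in for graph optimization: it spreads the mismatch between the first and last pose of a closed loop linearly over the trajectory. This is not a pose-graph optimizer, and the function name and sample poses are hypothetical.

```python
def distribute_loop_error(poses):
    """Spread the start/end mismatch of a closed loop linearly over the trajectory."""
    n = len(poses) - 1
    ex = poses[-1][0] - poses[0][0]  # drift accumulated in x over the loop
    ey = poses[-1][1] - poses[0][1]  # drift accumulated in y over the loop
    return [(x - ex * i / n, y - ey * i / n) for i, (x, y) in enumerate(poses)]

# Hypothetical loop: the last pose should coincide with the first but has drifted.
drifted = [(0.0, 0.0), (1.0, 0.0), (1.1, 1.0), (0.2, 1.1), (0.4, 0.2)]
corrected = distribute_loop_error(drifted)
```

After the correction the last pose coincides with the first, and intermediate poses are shifted in proportion to their index. A real system would instead weight each constraint by its uncertainty and solve a nonlinear least-squares problem over the pose graph.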

7. Future prospects

The integration of SLAM and GIS technologies is a growing trend, and this section outlines several
application directions to guide future practice and study.

7.1. Intelligent city management

The integration of SLAM and GIS technologies can be applied to intelligent city management systems

to achieve refined management of urban infrastructure.

Urban management departments can use drones equipped with SLAM technology for aerial

inspections, generating real-time 3D map data of the city and integrating this data into GIS systems.

By comparing new and old map data, issues such as road damage and building violations can be

quickly identified, thereby improving city management efficiency.

7.2. Disaster emergency response and rescue

Combining SLAM's rapid mapping capabilities with GIS's global data management can enhance the

response speed and accuracy of disaster emergency response and rescue operations.

After disasters like earthquakes or floods, rescue teams can use robots or drones equipped with

SLAM technology to quickly generate real-time 3D maps of the affected areas. By integrating these

maps with existing geographic information data in GIS systems, rescue teams can swiftly formulate

rescue plans and identify optimal rescue routes.

7.3. Augmented reality and virtual reality applications

Integrating SLAM technology with GIS data can be applied in the fields of Augmented Reality (AR)

and Virtual Reality (VR) to achieve more realistic scene reconstruction and interactive experiences.

At tourist sites, AR glasses or mobile devices can use SLAM technology for real-time positioning

and environmental perception, overlaying virtual information onto the real world. For instance, visitors

can see virtual reconstructions of historical buildings and real-time guide information, all managed and

updated through GIS systems.

7.4. Precision agriculture

Utilizing SLAM's precise positioning and real-time environmental sensing in combination with GIS

technology can enhance agricultural production efficiency and precision management.

Agricultural robots can use SLAM technology for autonomous navigation in fields, generating

real-time 3D maps and uploading this data to GIS systems for analysis and management. Farmers can

then use the real-time map data for precise irrigation, fertilization, and pest control, thereby improving

crop yield and quality.

7.5. Indoor navigation and management

Applying SLAM technology to indoor environments in conjunction with GIS systems can achieve

high-precision indoor navigation and management.

In complex indoor environments such as large shopping malls, hospitals, and airports, users can use

mobile devices equipped with SLAM technology for indoor navigation [11]. These indoor map data,

integrated with other information such as shop locations and emergency exits via GIS systems, can

provide more accurate and comprehensive navigation services.

8. Conclusion

This study primarily focuses on SLAM and GIS technologies, delving deeply into the current state of

both fields. It proposes the idea of integrating SLAM technology with GIS for cartographic

applications. Through further exploration, this study infers the necessity and feasibility of combining

these two technologies in the present era. On this basis, the research also discusses and analyzes the

technical challenges that may arise during the integration of SLAM and GIS technologies in the

cartographic domain, proposing potential solutions to these issues. The findings suggest that the

integration of SLAM and GIS technologies in mapping is both meaningful and achievable. Although

there are certain technical difficulties at present, viable solutions can enhance the combined mapping

effects of these technologies. Finally, this study presents some prospects for the integration of SLAM

and GIS technologies in the field of mapping, including intelligent city management, disaster

emergency response and rescue, augmented reality and virtual reality applications, and precision

agriculture.

References

[1] Macario Barros, A.; Michel, M.; Moline, Y.; Corre, G.; Carrel, F. A Comprehensive Survey of

Visual SLAM Algorithms. Robotics 2022, 11, 24. https://doi.org/10.3390/robotics11010024

[2] D. Larnaout, S. Bourgeois, V. Gay-Bellile and M. Dhome, "Towards Bundle Adjustment with

GIS Constraints for Online Geo-Localization of a Vehicle in Urban Center," 2012 Second

International Conference on 3D Imaging, Modeling, Processing, Visualization &

Transmission, Zurich, Switzerland, 2012, pp. 348-355, doi: 10.1109/3DIMPVT.2012.38.

[3] K. Ghosh and K. S. S. Musti, "Integration of SLAM with GIS to model sustainable urban

transportation system: A smart city perspective," 2020 12th International Conference on

Computational Intelligence and Communication Networks (CICN), Bhimtal, India, 2020, pp.

261-267, doi: 10.1109/CICN49253.2020.9242571.

[4] Chen Chen and Yinhang Cheng, "MATLAB-based simulators for mobile robot Simultaneous
Localization and Mapping," 2010 3rd International Conference on Advanced Computer
Theory and Engineering (ICACTE), Chengdu, 2010, pp. V2-576-V2-581, doi:
10.1109/ICACTE.2010.5579471.

[5] T.J. Chong, X.J. Tang, C.H. Leng, M. Yogeswaran, O.E. Ng, Y.Z. Chong, Sensor Technologies
and Simultaneous Localization and Mapping (SLAM), Procedia Computer Science, Volume
76, 2015, Pages 174-179, ISSN 1877-0509, https://doi.org/10.1016/j.procs.2015.12.336.

[6] Aitken, J.M., Evans, M.H., Worley, R., Edwards, S., Zhang, R., Dodd, T.J., Mihaylova, L.S., &

Anderson, S.R. (2021). Simultaneous Localization and Mapping for Inspection Robots in

Water and Sewer Pipe Networks: A Review. IEEE Access, 9, 140173-140198.

[7] S. Zheng, J. Wang, C. Rizos, W. Ding and A. El-Mowafy, Simultaneous Localization and Mapping
(SLAM) for Autonomous Driving: Concept and Analysis, Remote Sensing 2023, Vol. 15, Issue 4,
1156, doi:10.3390/rs15041156, https://www.mdpi.com/2072-4292/15/4/1156

[8] Serrano F-J, Moreno V, Curto B, Álves R. Semantic Localization System for Robots at Large

Indoor Environments Based on Environmental Stimuli. Sensors. 2020; 20(7):2116.

[9] Xu X, Zhang L, Yang J, Cao C, Wang W, Ran Y, Tan Z, Luo M. A Review of Multi-Sensor

Fusion SLAM Systems Based on 3D LIDAR. Remote Sensing. 2022; 14(12):2835.

https://doi.org/10.3390/rs14122835

[10] Zheng S, Wang J, Rizos C, Ding W, El-Mowafy A. Simultaneous Localization and Mapping

(SLAM) for Autonomous Driving: Concept and Analysis. Remote Sensing. 2023;

15(4):1156. https://doi.org/10.3390/rs15041156

[11] Serrano F-J, Moreno V, Curto B, Álves R. Semantic Localization System for Robots at Large

Indoor Environments Based on Environmental Stimuli. Sensors. 2020; 20(7):2116.

https://doi.org/10.3390/s20072116

Comparative analysis of matrix factorization and graph

convolutional networks in student

Tongye Wu

Schools of Engineering, Arizona State University, Tempe, Arizona, 85281, USA

Tongyewu@asu.edu

Abstract. For educators, predicting students’ grades and understanding students’ levels in depth
is important for improving teaching methods. Several studies have predicted students’ grades with
different models, such as Matrix Factorization (MF) and Graph Convolutional Networks (GCNs). This
essay compares these two models and shows the differences between them. Comparing their predictive
accuracy, interpretability, and computational efficiency makes it possible to identify their
strengths and areas for improvement. The essay first introduces the two models, then presents their
performance in different experiments from past research, lists their advantages and disadvantages,
and compares their results. The comparison shows that MF performs better in handling large-scale
sparse datasets and providing meaningful interpretations, whereas GCNs are good at capturing complex
dependencies and integrating multiple data sources.

Keywords: Matrix Factorization, Graph Convolutional Networks, Student Grade Prediction,

Predictive Models, Educational Data.

1. Introduction

Student grade prediction, one of the most actively studied topics in education, is important not only
for teachers but also for students, because it provides information for their course choices in the
next semester [1]. It thus helps students make well-informed decisions that are in line with their
academic skills and interests. Additionally, it enables the creation of more personalized degree
pathways, which can guide students through a tailored educational experience that maximizes their
potential [1]. Consequently, grade prediction is a valuable tool for students to assess their academic
performance, pinpoint areas that need work, and plan for a more successful future.

Many researchers have invested time and energy in developing models for predicting grades, for
example Additive Latent Effect (ALE) models, which are based on MF, Restricted Boltzmann Machines
(RBM), and Graph Convolutional Networks (GCNs). GCNs are a specialized type of neural network designed
to handle data represented as graphs [2, 3]. Graph-structured data is prevalent in diverse
disciplines, including social networks, biological networks, and recommendation systems. There are
several key steps in the GCN process.

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0177

(https://creativecommons.org/licenses/by/4.0/).

For example, during the initialization phase, every node in the graph is associated with a feature vector.
These feature vectors might represent various features or characteristics of the nodes, depending on the
application [4]. The convolution operation, which is the fundamental component of a GCN, is derived
from convolutional neural networks (CNNs). Its objective is to collect and combine information from
neighbouring nodes [5]. There are also other steps, such as stacking layers and producing a
task-specific output [6]. These models use various computational techniques and data analysis methods
to predict student performance with a high degree of accuracy. Developing such models involves a
significant investment of time and resources for both teachers and students. Studies have shown that
ALE models can accurately predict student grades by capturing latent factors that influence
performance [7]. Additionally, research by Brown and Davis highlights that integrating these models
into educational systems enhances personalized learning, allowing educators to tailor their teaching
strategies to individual student needs [1]. These developments not only improve the accuracy of grade
predictions but also provide a more comprehensive understanding of the underlying factors that
influence student achievement.

Many universities face the problem that students do not graduate on time, and new educational
applications are being sought to help students complete their studies on schedule [8]. Delays in
graduation result from a variety of factors, including poor course selection, lack of academic support,
and inadequate performance tracking. Predicting student grades can address these issues by
identifying students who are at risk of falling behind and enabling timely interventions. Precise grade
prediction models can therefore play a crucial role by offering timely alerts and support systems that
help students stay on course.

This paper evaluates the advantages and disadvantages of the two models in practical applications,
together with their performance, so that their effectiveness in different educational settings can be
assessed. First, data were collected and analysed from multiple sources, including previous research
papers and case studies. These sources provide a basis for understanding the practical application of
different performance prediction models. Then, the models' performance is assessed so that their
merits and drawbacks can be identified.

2. Introduction to Matrix Factorization (MF)

2.1. Definition of MF

Matrix Factorization (MF) is a widely used technique in recommendation systems and data mining. It
decomposes a large matrix into several smaller matrices, thereby uncovering hidden features and
relationships.

2.2. Basic concepts

Matrix Decomposition: Matrix Factorization decomposes a large matrix into the product of two or
more smaller matrices. In recommendation systems, the large matrix is usually the user-item rating
matrix, which is broken down into two separate matrices: a user-feature matrix and an item-feature
matrix [6].

Hidden Features: The elements of the decomposed matrices correspond to hidden (latent) features
that can explain the underlying connections between users and items. For example, in a movie
recommendation system, hidden features may correspond to movie genres or users' preferences [9].
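As a minimal sketch of the decomposition described above, the following Python code factorizes a small, made-up rating matrix into a user-feature matrix P and an item-feature matrix Q using stochastic gradient descent on the observed entries only. The data, the number of latent features, the learning rate, and the regularization weight are illustrative assumptions and are not taken from the cited studies.

```python
import numpy as np

def factorize(R, mask, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Approximate R by P @ Q.T using only the entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))  # user-feature matrix
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item-feature matrix
    observed = np.argwhere(mask)                  # (user, item) index pairs
    for _ in range(epochs):
        rng.shuffle(observed)                     # visit ratings in random order
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]           # prediction error on one rating
            p_old = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q

# Hypothetical 5 x 4 rating matrix; 0 marks a missing rating.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0

P, Q = factorize(R, mask)
R_hat = P @ Q.T  # dense matrix: also contains predictions for the missing entries
rmse_observed = float(np.sqrt(np.mean((R - R_hat)[mask] ** 2)))
```

The product `R_hat` fills every cell, so the entries that were missing in `R` can be read off as predicted ratings. In a grade prediction setting the rows and columns would be students and courses instead of users and items.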

2.3. Functions

Recommendation systems: Matrix factorization (MF) is widely used in recommendation systems, such
as the recommendation engines of Netflix and Amazon. It predicts user ratings for items that have not
been rated yet, which makes personalized recommendations possible [6].

Data Compression: By decomposing matrices, MF can compress large-scale data into smaller matrix
forms, reducing storage space and computational complexity [10].

Dimensionality Reduction: Matrix factorization (MF) is often used to reduce the dimensionality of
data, because it can transform high-dimensional data into a lower-dimensional space. This process
allows hidden structures and patterns within the data to be identified [10].

2.4. Advantages of matrix factorization

• Dimensionality Reduction: Matrix Factorization reduces the number of dimensions in the data by
expressing the original matrix as the product of two lower-dimensional matrices. This simplification
is beneficial when handling large datasets, because it makes computations more efficient and less
memory-intensive. A low-rank approximation can make filtering and statistical analysis more
feasible and efficient by reducing computational complexity [11].

• Data Imputation: One of the notable benefits of MF is its capacity to manage missing data
effectively. By approximating the original matrix, MF can predict and fill in the missing elements.
This capability is of great importance in several practical applications, including collaborative
filtering in recommendation systems. For instance, missing entries in data tables are frequently
estimated using low-rank approximations [11].

• Noise Reduction: Matrix Factorization is effective in denoising data. By focusing on the most
significant latent factors, MF can filter out noise and retain the essential information. This
attribute is particularly useful for improving the quality of the data before more complex algorithms
are applied. These strategies are essential for numerous algorithms in recommender systems and can
enhance causal inference from survey data [11].

• Scalability: MF techniques, especially those based on stochastic gradient descent, are highly scalable
and can handle large-scale datasets efficiently. This scalability makes MF suitable for modern
applications dealing with big data, such as Netflix's recommendation engine. Chen et al. report that
their ENMF approaches consistently and considerably outperform the leading methods on the Top-K
customized recommendation task, while also retaining the advantageous characteristic of not
requiring compositional parameters [3].

• Interpretability: The latent factors obtained from MF often have meaningful interpretations. For
example, in a user-item rating matrix, the latent factors might represent user preferences and item
characteristics [11]. Interpretability can offer useful insights into the fundamental structure of the
data.
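The dimensionality reduction and noise reduction points above can be illustrated with a low-rank approximation. The sketch below uses a truncated singular value decomposition (SVD), which is one standard way to obtain such an approximation; the cited works may rely on other factorization algorithms. The matrix sizes, the rank, and the noise level are assumptions chosen only for illustration.

```python
import numpy as np

def low_rank_approximation(X, rank):
    """Return the best rank-`rank` approximation of X in the least-squares sense."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

rng = np.random.default_rng(0)
# Hypothetical "clean" data: a 50 x 20 matrix of exact rank 3.
clean = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 20))
noisy = clean + 0.1 * rng.normal(size=clean.shape)

denoised = low_rank_approximation(noisy, rank=3)

err_noisy = np.linalg.norm(noisy - clean)        # error before denoising
err_denoised = np.linalg.norm(denoised - clean)  # error after keeping 3 factors

# Storing two rank-3 factors needs 50*3 + 3*20 numbers instead of 50*20.
stored_values = 50 * 3 + 3 * 20
```

Discarding the small singular values removes most of the added noise, and the two factors are much smaller than the full matrix, which is the storage and computation saving the list refers to.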

3. Grade prediction experiments

3.1. Introduction to the experiment

Agoritsa Polyzou and George Karypis of the University of Minnesota focus on predicting students'
future grades from their past term performance [1]. Their approach uses sparse linear models and
low-rank matrix factorizations that are customized for each course or student-course combination in
order to improve the accuracy of predictions. Several models were employed, including
Course-Specific Regression (CSR), Matrix Factorization (MF), and Student-Specific Regression (SSR).

3.2. Experimental results for MF

Course-specific matrix factorization (CSMF) showed improved accuracy over standard MF models when
using denser, course-specific data. However, sparse linear regression models such as CSR-RC still
outperformed MF-based methods in this context. The authors state that the CSR-RC scheme
outperformed other methods with an RMSE of 0.632, compared to the best competing method's RMSE
of 0.661, across various courses [1]. This demonstrates the efficacy of sparse linear regression in
dealing with student-course historical data whose entries are not missing at random. By focusing on
course-specific regression, particularly with GPA-centered data, CSR-RC leverages the specific
contribution of prior courses to the target course, providing more accurate predictions. This finding
underscores the robustness and reliability of CSR-RC for grade prediction tasks.

3.3. Key processes in GCNs

GCNs are a specialized type of neural network designed to process data represented as graphs.
Graph-structured data is prevalent in diverse disciplines, including social networks, biological
networks, and recommendation systems. There are several key steps in the GCN process.

For example, during the initialization phase, every node in the graph is associated with a feature
vector. These feature vectors might represent various features or characteristics of the nodes,
depending on the application [4]. The convolution operation, which is the fundamental component of
a GCN, is derived from CNNs. Its objective is to collect and combine information from neighbouring
nodes [4]. There are also other steps, such as stacking layers and producing a task-specific output.
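The steps listed above (initialization, convolution, stacking, task-specific output) can be sketched in a few lines of NumPy. The text does not state which propagation rule is used, so the sketch assumes the widely used symmetric-normalized rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W). The graph, the feature sizes, and the random weights are hypothetical; in practice the weights are learned by training.

```python
import numpy as np

def normalize_adjacency(A):
    """Add self-loops and apply the symmetric normalization D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W, activation=True):
    """One graph convolution: aggregate neighbour features, then transform them."""
    Z = A_norm @ H @ W
    return np.maximum(Z, 0.0) if activation else Z  # ReLU on hidden layers

rng = np.random.default_rng(0)
# Hypothetical graph with 4 nodes and edges 0-1, 1-2, 2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

H0 = rng.normal(size=(4, 3))  # initialization: one 3-dimensional feature vector per node
W1 = rng.normal(size=(3, 8))  # first-layer weights (random here, learned in practice)
W2 = rng.normal(size=(8, 2))  # output-layer weights

A_norm = normalize_adjacency(A)
H1 = gcn_layer(A_norm, H0, W1)                         # first stacked layer
logits = gcn_layer(A_norm, H1, W2, activation=False)   # task-specific output per node
```

After one layer each node has mixed in information from its direct neighbours; after two stacked layers it also reflects nodes two hops away, which is why stacking layers widens the context each prediction can use.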

4. Performance in grade prediction

One study predicts students' grades by combining a Heterogeneous Knowledge Graph (HKG) with a
GCN. The data is sourced from Georgia Tech's "GTX1301: Introduction to Python" course, which is
available in both traditional classroom and online formats. The dataset comprises clickstreams
collected from the EdX platform, covering five instances of the offline course and two instances of the
online MOOC course in 2021 and 2022. The study builds a heterogeneous knowledge graph that
includes students, course videos, formative assessments, and their interactions. It then employs a GCN
model to forecast the success rates of students on a specific set of questions, using the content
consumed by students, the course instance, and the method of delivery [11].

The study's findings demonstrate that the Graph-based Exercise- and Knowledge-Aware Learning

Network (Graph-EKLN) surpasses existing models in accurately forecasting student performance. The

Graph-EKLN model, in particular, outperforms other models such as MF, Item Response Theory (IRT),

and NeuralCDM in terms of accuracy and root mean square error (RMSE). The study shows that by

integrating advanced collaborative signals and knowledge concepts into the predictive model, its

performance is improved. On the ASSIST dataset, Graph-EKLN achieved an accuracy of 0.7782 and an

RMSE of 0.3938, while on the KDDcup dataset, it reached an accuracy of 0.8271 and an RMSE of

0.3591 [12]. These results indicate that the proposed model can capture the intricate
relationships among students, exercises, and knowledge concepts, resulting in more precise
predictions of student performance.

Another experiment evaluates the Graph-based Exercise- and Knowledge-Aware Learning Network
(Graph-EKLN), which aims to predict student achievement. The model enhances prediction accuracy
by separately assessing students' proficiency in exercises and in knowledge points, and by using GCN
techniques to capture complex relationships among students, exercises, and knowledge points. The
study was validated on two real datasets: the ASSISTments 2009-2010 dataset and the KDDcup
2005-2006 dataset. The empirical results show that the Graph-EKLN model performs strongly on both
datasets and surpasses the other benchmark models by a significant margin.

On the ASSISTments 2009-2010 dataset, the Graph-EKLN model attains an accuracy of 0.7782, a root
mean square error (RMSE) of 0.3938, and an area under the curve (AUC) of 0.8298. This is better than
the MF model, which had an accuracy of 0.7399, an RMSE of 0.4205, and an AUC of 0.8105, and
better than the Neural Cognitive Diagnosis Model (NeuralCDM), which had an accuracy of 0.7249, an
RMSE of 0.4329, and an AUC of 0.7561 [13].
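For readers unfamiliar with the three metrics reported above, the following sketch shows how accuracy, RMSE, and AUC can be computed from predicted probabilities of a correct answer. The labels and predictions are made-up toy values, not data from the cited study, and the 0.5 decision threshold is an assumption.

```python
import numpy as np

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of responses whose thresholded prediction matches the label."""
    return float(np.mean((y_prob >= threshold) == (y_true == 1)))

def rmse(y_true, y_prob):
    """Root mean square error between predicted probabilities and 0/1 labels."""
    return float(np.sqrt(np.mean((y_prob - y_true) ** 2)))

def auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative."""
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    diff = pos[:, None] - neg[None, :]          # all positive/negative pairs
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

# Toy example: 1 = student answered correctly, 0 = incorrectly.
y_true = np.array([1, 0, 1, 1, 0, 0])
y_prob = np.array([0.9, 0.4, 0.6, 0.3, 0.2, 0.7])

acc_value = accuracy(y_true, y_prob)
rmse_value = rmse(y_true, y_prob)
auc_value = auc(y_true, y_prob)
```

Higher accuracy and AUC are better, whereas a lower RMSE is better, which is why Graph-EKLN's smaller RMSE counts in its favour in the comparison above.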

These metrics show that the Graph-EKLN model clearly outperforms the other benchmark models in
accuracy, RMSE, and AUC, which supports its effectiveness in predicting student performance by
using both exercise and knowledge point information and applying GCN techniques [11].

In summary, all these studies show that by utilizing GCNs and higher-order collaborative information,

it is possible to effectively predict students' academic performance and identify at-risk students. This

provides strong support for personalized instruction and promotes the development of intelligent

tutoring systems.

5. Conclusion

This study examines the performance of two student performance prediction models: MF and GCNs.
It compares their prediction accuracy, interpretability, and computational efficiency, explores the
benefits and drawbacks of the two models, and analyzes their suitability in various application
contexts. In addition, it offers insights into future research prospects.

The reviewed studies compare MF and GCNs in predicting students' grades, examining their
performance in prediction accuracy, interpretability, and computational efficiency. By examining
these critical factors, the studies aim to highlight the strengths and weaknesses

of each model. MF, known for its simplicity and effectiveness in handling large datasets, is evaluated

for its efficiency in producing accurate grade predictions, while GCNs, which capture complex

relationships and dependencies in data, are scrutinized for their ability to provide deeper insights and

more nuanced predictions. The analysis identifies scenarios where each model excels or falls short, such

as MF being more suitable for large-scale applications where computational efficiency is paramount,

and GCNs being more beneficial in settings requiring high interpretability and the modeling of intricate

student interactions. The paper concludes by summarizing the usefulness of Matrix Factorization (MF)

and Graph Convolutional Networks (GCNs) in various educational settings. It offers practical

suggestions for their implementation and presents a forward-thinking outlook on future research areas.

These include the exploration of hybrid models, incorporating a wider range of data sources, and

developing more advanced algorithms to improve interpretability and efficiency. These efforts aim to

advance the fields of educational data mining and personalized learning.

MF performs well in handling large-scale sparse datasets and providing meaningful interpretations.

MF simplifies the computation of large datasets and improves computational efficiency through

dimensionality reduction methods. In addition, MF can handle missing data and has a significant

advantage in data denoising. MF techniques are particularly suitable for modern big data applications,

such as Netflix's recommendation engine, and their scalability allows them to excel in handling large-

scale data.

Although both models have their advantages and disadvantages, their performance in different

scenarios proves their effectiveness in student achievement prediction. MF is suitable for scenarios that

need to handle large-scale data and provide interpretable results, while GCNs are suitable for

applications that deal with complex dependencies and require the integration of data from multiple

sources.

Future research can explore the following areas: model fusion, data diversity, and interdisciplinary
applications.

In conclusion, student performance prediction models hold immense potential for transforming the
educational landscape. The applications of these models are vast, ranging from identifying at-risk
students early to tailoring educational content to individual learning needs. By continuously refining
the model architecture, integrating data from diverse sources, and developing systems capable of
providing real-time feedback, researchers can substantially enhance prediction accuracy. This, in
turn, will facilitate the advancement of personalized education and ensure that each student receives
support tailored to their unique learning trajectory.

Future research should not only delve deeper into the integration of various predictive models, but
also investigate the diversification of data inputs and the enhancement of real-time prediction
capabilities. Doing so can equip educators with more robust, data-driven tools that empower them to
make informed decisions and foster an environment where every student can thrive. Furthermore, as
researchers advance in this domain, it is essential to acknowledge the ethical implications of data
privacy and to ensure the fair use of these prediction technologies, so that they provide equitable
benefits for all students, free of prejudice.

References

[1] Polyzou, A., & Karypis, G. (2016). Grade prediction with course and student specific models. In

J. Bailey, L. Khan, T. Washio, G. Dobbie, J. Huang, & R. Wang (Eds.), Advances in

knowledge discovery and data mining. PAKDD 2016. Lecture Notes in Computer Science

(Vol. 9651). Springer, Cham.

[2] Iqbal, Z., Qureshi, S., & Khan, A. (2017). Machine learning based student grade prediction: A

case study. arXiv.

[3] Udell, M., & Townsend, A. (2019). Why are big data matrices approximately low rank? arXiv.

[4] Trask, T., Johnson, M., & Lee, H. (2024). A comparative analysis of student performance

predictions in online courses using heterogeneous knowledge graphs. arXiv.

[5] Chen, C., Zhang, M., Xiang, Y., Liu, Y., & Ma, S. (2020). Efficient neural matrix factorization

without sampling for recommendation. ACM Transactions on Information Systems, 38(2), 1–

28.

[6] Smith, R., Johnson, T., & Lee, K. (2021). Predicting student performance using additive latent

effect models. Educational Data Mining Review, 19(1), 40–55.

[7] Brown, M., & Davis, E. (2022). Personalized learning through advanced predictive models.

Journal of Educational Technology, 32(2), 7.

[8] Ren, Z., Xu, Y., Chen, L., Zhao, P., & Wang, Z. (2018). ALE: Additive latent effect models for

grade prediction. arXiv.

[9] Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender

systems. Computer, 42(8), 30–37.

[10] Takács, G., Pilászy, I., Németh, B., & Tikk, D. (2008). Investigation of various matrix

factorization methods for large recommender systems. In Proceedings of the 2008 ACM

Conference on Recommender Systems (pp. 155-162).

[11] Khemani, B., Agarwal, S., Chakraborty, T., & Gupta, A. (2024). A review of graph neural

networks: Concepts, architectures, techniques, challenges, datasets, applications, and future

directions. Journal of Big Data, 11(1), 18–43.

[12] Khemani, B., Agarwal, S., Chakraborty, T., & Gupta, A. (2024). A review of graph neural

networks: Concepts, architectures, techniques, challenges, datasets, applications, and future

directions. Journal of Big Data, 11(1), 18–43.

[13] Liu, M., Zhang, X., & Chen, Y. (2021). Graph-based exercise- and knowledge-aware learning

network for student performance prediction. arXiv.

Review on VSLAM based on deep learning

Xin Shao

School of Control Science and Engineering, Shandong University, Shandong, China

201518220103@mail.sdu.edu.cn

Abstract. Visual simultaneous localization and mapping technology (VSLAM) provides a

theoretical basis for the operation of unmanned equipment such as autonomous vehicles and

sweeping robots in unfamiliar environments. Although traditional VSLAM systems have

achieved great success after long-term development, it is still difficult to maintain good

performance in challenging environments. Deep learning, as a newly developed technology in

the field of vision in recent years, has shown outstanding advantages in image processing.

Combining deep learning with VSLAM is a hot topic. Deep learning can help traditional
VSLAM systems address the lack of scale information in dynamic environments by improving
their performance in depth estimation, pose estimation, and loop closure detection. It can
not only reduce the scale of the network model but also improve the accuracy of trajectory
estimation. Specifically, many researchers have proposed methods that fuse deep learning
into the visual odometry, loop detection, and mapping stages of the VSLAM pipeline. This
work surveys trends in combining VSLAM with deep learning algorithms, with the aim of
supporting true autonomy for future mobile robots, and finally discusses prospects for the
development of VSLAM.

Keywords: Visual simultaneous localization and mapping technology, deep learning, end-to-end.

1. Introduction

Visual simultaneous localization and mapping (VSLAM) has been an increasingly popular field of
study in recent years. Some SLAM solutions are based on lidar and sonar, while others are based on
visual sensors, mainly cameras. The former sensors are expensive and bulky, while the latter are
lightweight, portable, and low-cost, and are widely used in industry. VSLAM uses visual sensors to
perceive the surrounding environment, build maps of complex three-dimensional spaces, and achieve
autonomous navigation. In domains such as intelligent robotics, autonomous vehicles, unmanned aerial
vehicles (drones), augmented reality (AR), and virtual reality (VR), VSLAM technology is crucial.
Unmanned

vehicles in smart car factories can automatically pick and match auto parts and cooperate with the

information system of the production line to achieve fully automated production. Rescue robots and

underwater vehicles in complex working environments (such as electromagnetic interference and failure

of GPS positioning systems) can achieve long-distance autonomous cruising, tunnel detection and deep-

water rescue tasks through VSLAM technology. In addition, emerging technologies AR and VR can

achieve interaction between virtual and reality. The three-dimensional map reconstructed by VSLAM

can accurately render virtual images in the geometric position of the real scene, making the overall

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0187

(https://creativecommons.org/licenses/by/4.0/).

virtual space look more real. With the development of these fields, more novel methods and technologies

will emerge in VSLAM, and VSLAM technology has become a field worthy of active research [1].

Visual odometry (VO) and loop closure serve as the foundation for VSLAM, which follows the
front-end, back-end, loop detection, and mapping architecture of classic SLAM algorithms. By
analyzing the changes between video frames, the front-end estimates the camera pose and the
structure of the surrounding environment, which is generally achieved by feature-based methods or
direct methods. Because the inter-frame estimate only takes two consecutive frames into account, the
estimated motion between each pair of images inevitably contains some error. As these errors are
propagated from frame to frame, they accumulate and the trajectory drifts. Back-end optimization and
loop detection are therefore needed to reduce the accumulated error. Finally, a map is generated
according to the front-end processing method and the task requirements [2].
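The error accumulation described above can be reproduced with a small simulation. The sketch below chains noisy frame-to-frame motions of a planar camera and compares the result with the noise-free trajectory. The step length, turn rate, and noise levels are assumptions chosen for illustration, and real visual odometry estimates full 3D poses rather than the 2D poses used here.

```python
import numpy as np

def se2(x, y, theta):
    """Homogeneous 3x3 matrix of a planar motion (translation x, y and rotation theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0.0, 0.0, 1.0]])

def integrate(relative_motions):
    """Chain frame-to-frame motions into a list of global poses."""
    pose = np.eye(3)
    trajectory = [pose]
    for T in relative_motions:
        pose = pose @ T          # each new pose inherits all earlier errors
        trajectory.append(pose)
    return trajectory

rng = np.random.default_rng(0)
n_steps = 200
# Hypothetical ground truth: 0.1 m forward per frame with a slight left turn.
true_steps = [se2(0.1, 0.0, 0.01) for _ in range(n_steps)]
# Every estimated step carries a small independent error (assumed noise levels).
noisy_steps = [se2(0.1 + rng.normal(0, 0.002),
                   rng.normal(0, 0.002),
                   0.01 + rng.normal(0, 0.002)) for _ in range(n_steps)]

true_traj = integrate(true_steps)
est_traj = integrate(noisy_steps)
position_error = [float(np.linalg.norm(e[:2, 2] - t[:2, 2]))
                  for e, t in zip(est_traj, true_traj)]
```

Although each individual step is only slightly wrong, `position_error` tends to grow along the trajectory because no step ever corrects the ones before it. This is the drift that back-end optimization and loop detection are meant to reduce.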

Convolutional neural networks are extensively utilised in image recognition to extract image

information, making deep learning algorithms more prevalent in this sector. The feature extraction

technique is highly efficient and robust. An input layer, a hidden layer, and an output layer are the typical

components of a fully functional neural network. The training methods of neural networks are generally

divided into supervised, semi-supervised, and unsupervised. The supervised method uses data with

labeled information to train the network, the unsupervised method provides unlabeled raw data to the

network for training, and the semi-supervised method is between the two, using both labeled and

unlabeled data to train the network [3].

With the rapid development of deep learning and several pressing open problems in VSLAM, how to
fuse deep learning with VSLAM has become a challenge for researchers. Much of the literature
describes the methods only from the perspective of combining deep learning with individual VSLAM
modules. For example, Liu Ruijun et al. introduced the combination of deep learning and VSLAM
from the perspectives of visual odometry and loop closure detection and compared it with traditional
methods, but did not outline the combination from a holistic perspective [4]. This paper summarizes
recent deep learning based VSLAM methods by outlining three ways of integrating deep learning
models into traditional VSLAM systems: auxiliary modules based on deep learning, replacement
modules based on deep learning, and end-to-end neural networks that replace the overall VSLAM
architecture. It can help readers better understand the current research progress and future
development direction of VSLAM based on deep learning.

2. Overview of VSLAM technology

2.1. VSLAM principles

VSLAM technology comprises four essential components: visual odometry, optimization, loop closure

detection, and mapping. Front-end visual odometry entails extracting distinctive characteristics from

sequences of images and comparing them over frames to determine the incremental movement of the

camera's position, resulting in real-time positioning. However, it is susceptible to the gradual

accumulation of errors over time, leading to drift. Back-end optimization minimizes the discrepancy

between the anticipated and observed feature positions within a specific time frame to reduce

accumulated drift. This process is known as pose optimization. Loop closure detection recognizes
previously visited areas when they are revisited. It then adds constraints between the current and
earlier poses to correct the accumulated drift. The mapping module integrates visual input and
optimized poses to progressively construct a map of the unknown area [5]. Figure 1 depicts the
standard sequence of tasks and the interrelationship between these components.

Figure 1. Basic principles of VSLAM.

2.2. Front-end visual odometry

There are two primary techniques for visual odometry: feature-based and direct. The feature-based
approach identifies distinctive pixel patterns in consecutive frames in order to establish
correspondences between image features and calculate the relative motion between the camera and the
surroundings. The feature-based approach is commonly used in visual odometry. The recent
ORB-SLAM3 algorithm can work with data captured by monocular cameras, stereo (binocular)
cameras, RGB-D cameras, and inertial measurement unit (IMU) sensors. Compared with other
algorithms, it has higher robustness, accuracy, and versatility [6]. However, the feature-based method
performs poorly when the scene lacks obvious texture and when pixel differences are small. The direct
method calculates the relative motion by comparing the photometric difference between consecutive
frames. It can work in areas with weak texture, but it does not use the global features of the image,
resulting in poor loop closure detection. In general, the challenges faced by visual odometry include
lighting changes, motion blur, occlusion, and dynamic objects in the surrounding environment.

2.3. Back-end optimization

The back-end mainly uses filtering methods and nonlinear optimization methods to process and optimize

the noisy data obtained from the front-end to obtain more accurate motion trajectories and spatial point

positions. Filtering techniques, such as the extended Kalman filter, continuously update the estimate
of the current state by combining the motion model, the state estimate from the previous time step,
and the latest observation. Because the memory occupied by the algorithm grows with the square of
the state dimension, it performs well in small spaces, but its application in large scenes is limited.
The optimization method is based on the idea of graph optimization and uses all states to estimate
the current situation. Filtering methods are computationally efficient, whereas optimization
(smoothing) methods improve accuracy at the expense of higher computational cost.
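The filtering idea can be illustrated with a linear Kalman filter, which is a simplification of the extended Kalman filter mentioned above (the extended version linearizes nonlinear motion and observation models at each step). The sketch tracks a one-dimensional position and velocity; the motion model, noise levels, and measurements are assumptions for illustration only.

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Propagate the state and its covariance through the motion model."""
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P

def kf_update(x, P, z, H, R):
    """Correct the prediction with a new observation z."""
    y = z - H @ x                       # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity motion model
H = np.array([[1.0, 0.0]])              # only the position is observed
Q = 1e-4 * np.eye(2)                    # assumed process noise
R = np.array([[0.05 ** 2]])             # assumed measurement noise

rng = np.random.default_rng(0)
x = np.array([0.0, 0.0])                # initial guess: at rest at the origin
P = np.eye(2)                           # initial uncertainty
true_pos, true_vel = 0.0, 1.0
for _ in range(100):
    true_pos += true_vel * dt
    z = np.array([true_pos + rng.normal(0, 0.05)])  # noisy position measurement
    x, P = kf_predict(x, P, F, Q)
    x, P = kf_update(x, P, z, H, R)
```

The covariance `P` has one entry for every pair of state variables. In a filter-based SLAM system whose state also contains the map landmarks, this matrix therefore grows with the square of the state dimension, which is the memory limitation noted in the text.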

2.4. Loop detection

During the movement of the VSLAM system, there will be cumulative errors between the estimated

pose and the environmental position. The loop detection module can identify scenes that appear

repeatedly during the movement and use this recognition result to correct the map to ensure the global

consistency of the map. The loop detection algorithm can effectively eliminate the cumulative error.

The primary approach to loop detection is to use the bag-of-words model, which extracts local
features from the image and constructs a word list (vocabulary) consisting of k words. Each image of
the scene can then be represented as a k-dimensional vector based on the word list, and the
similarity between these vectors is used to determine whether different images depict the same
scene.
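A simplified sketch of this bag-of-words comparison is given below. Each local descriptor is assigned to its nearest visual word, the word counts form a k-dimensional histogram, and two images are compared by the cosine similarity of their histograms. The vocabulary and descriptors are synthetic random vectors; real systems extract descriptors from images and learn the vocabulary by clustering, and they often weight the words, none of which is modelled here.

```python
import numpy as np

def bow_vector(descriptors, vocabulary):
    """L2-normalized histogram of nearest visual words (one bin per word)."""
    # Squared distance from every descriptor to every word in the vocabulary.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / np.linalg.norm(hist)

def similarity(v1, v2):
    """Cosine similarity of two normalized bag-of-words vectors."""
    return float(v1 @ v2)

rng = np.random.default_rng(0)
k, dim = 8, 16
vocabulary = rng.normal(size=(k, dim))   # hypothetical word list of k words

def sample_scene(word_ids, n=60):
    """Synthetic descriptors scattered around the given visual words."""
    ids = rng.choice(word_ids, size=n)
    return vocabulary[ids] + 0.05 * rng.normal(size=(n, dim))

# Scene A is dominated by words 0-3, scene B by words 4-7.
a_first = bow_vector(sample_scene([0, 1, 2, 3]), vocabulary)
a_revisit = bow_vector(sample_scene([0, 1, 2, 3]), vocabulary)
b_other = bow_vector(sample_scene([4, 5, 6, 7]), vocabulary)

score_same = similarity(a_first, a_revisit)    # same place seen twice
score_different = similarity(a_first, b_other) # two different places
```

A loop closure candidate would be declared when the score exceeds some threshold; the threshold itself is application-specific and is not given in the text.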

[Figure 1 block labels: Camera, Sensors, Data association, Initial estimation of body motion, Initial estimation of environment information, Loop closing, Nonlinear optimization, Body motion, Mapping, Front end, Back end]

2.5. Mapping

According to different front-end processing methods and different task requirements, it is necessary to

construct maps with corresponding forms and complexity, which can not only accurately describe

environmental features, but also reduce the complexity of the map while ensuring accuracy [1]. Based

on varying dimensions, map representation can be categorized as two-dimensional or three-dimensional.

Two-dimensional maps can be categorised into three types: geometric maps, grid maps, and topological

maps. Geometric maps utilise a limited number of landmarks, such as points, line segments, and curves,

to represent the characteristics of the scene environment. The grid map divides the environment into

many equal-sized grids and provides a probability value to indicate the presence of an object in each

grid. Each grid unit can be classified into one of three states: occupied, free, or unknown. These states

are used to differentiate between areas that can be traversed and areas that are obstructed. The

topological map uses the connection lines between nodes to form a topological structure diagram to

represent the scene, where the nodes are locations in the actual environment, and the connection lines

between nodes represent the relationship between different locations.
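The grid map idea (a probability per cell, mapped to occupied, free or unknown) can be sketched as follows. This is a minimal illustration that assumes the common log-odds update; the sensor probabilities (0.7 for a hit, 0.4 for a miss) and the classification thresholds are hypothetical values, not parameters from the paper.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def update_cell(log_odds, hit, p_hit=0.7, p_miss=0.4):
    """Add the evidence of one sensor reading to a cell's log-odds value."""
    return log_odds + logit(p_hit if hit else p_miss)


def probability(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))


def classify(log_odds, occupied_above=0.65, free_below=0.35):
    """Map a cell to one of the three states used by a grid map."""
    p = probability(log_odds)
    if p > occupied_above:
        return "occupied"
    if p < free_below:
        return "free"
    return "unknown"
```

A cell starts at log-odds 0 (probability 0.5, unknown); repeated hits push it towards occupied and repeated misses towards free, which is how traversable and obstructed areas are separated.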

Among three-dimensional maps, point cloud maps are the most widely used. Although they

retain detailed information about the original environment, they are generally

large, and details that many tasks do not need take up a lot of storage. An octree

map, commonly referred to as a three-dimensional grid map, can be created using the octree structure.

Compared with a two-dimensional grid map, an octree map is more effective in describing the

environment, has less ambiguity, and saves a lot of space compared to a point cloud map. However, the

corresponding computational complexity is large, so it is difficult to search and plan a real-time path. In

addition, depending on the specific task requirements and the front-end processing method, other

map types include feature maps, Euclidean signed distance field (ESDF) maps, truncated signed

distance field (TSDF) maps, and semantic maps.
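The space saving of a three-dimensional grid over a raw point cloud can be illustrated with a small sketch. It quantises points into fixed-size voxels, which corresponds only to the finest level of an octree; a real octree additionally merges homogeneous voxels into larger nodes. The function names and the voxel size are assumptions made for illustration.

```python
import math


def voxel_key(point, voxel_size):
    """Integer index of the voxel that contains a 3D point."""
    return tuple(math.floor(c / voxel_size) for c in point)


def voxelise(points, voxel_size):
    """Collapse a point cloud into the set of occupied voxel indices."""
    return {voxel_key(p, voxel_size) for p in points}
```

Many nearby points fall into the same voxel, so the number of stored cells is bounded by the occupied volume rather than by the number of measurements.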

3. VSLAM algorithm based on deep learning

Since 2010, deep learning and reinforcement learning have been actively combined with VSLAM. There

are three prevalent approaches to combination: the creation of auxiliary modules using deep learning,

the creation of deep learning modules, and the substitution of the entire architecture with end-to-end

deep neural networks.

3.1. Deep learning algorithms

At present, the methods of monocular depth estimation using machine learning can be divided into two

types, namely, the method of combining traditional machine learning with image geometric features and

the method of monocular depth estimation using a convolutional neural network [7]. The former uses

depth clues in the image, such as linear perspective, focus, defocus, atmospheric scattering, shadow, etc.,

to construct parametric models such as Markov random fields and conditional random fields for training

[7]. This method often does not meet the needs of real scenes and has low prediction accuracy. A related

approach based on similarity search retrieves similar images that have appeared in a known data set.

The limitations of the data set lead to the low generalization ability of this method, which is only

applicable to specific scenes. At the same time, the retrieval time is long and cannot meet real-time

requirements. The latter relies on deep learning: it uses convolutional neural

networks trained on large amounts of data to predict dense image depth

information. Two main types of deep learning methods exist: supervised learning and unsupervised

learning. Supervised learning requires ground-truth depth as the supervisory signal during training. The

training accuracy is high, but the difficulty lies in acquiring real depth values for the data set.

Unsupervised learning does not require real depth values to train the network. It uses binocular image

pairs or video sequences as input and realizes supervision during network training by designing a

reasonable loss function.
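How a loss function can replace ground-truth depth is easiest to see on a single rectified stereo scanline. The sketch below assumes the usual convention that a pixel at column x in the left image appears at column x - d in the right image, where d is the predicted disparity; the left row is reconstructed from the right row and compared with the real left row using an L1 photometric error. The function names are hypothetical, and real systems apply this per pixel on full images with additional smoothness terms.

```python
import math


def sample(row, x):
    """Linearly interpolate a scanline at fractional column x (clamped)."""
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = int(math.floor(x))
    x1 = min(x0 + 1, len(row) - 1)
    t = x - x0
    return (1.0 - t) * row[x0] + t * row[x1]


def reconstruct_left(right_row, disparity_row):
    """Synthesise the left scanline by sampling the right one at x - d."""
    return [sample(right_row, x - d) for x, d in enumerate(disparity_row)]


def photometric_loss(left_row, right_row, disparity_row):
    """Mean absolute difference between the real and reconstructed left row."""
    recon = reconstruct_left(right_row, disparity_row)
    return sum(abs(a - b) for a, b in zip(left_row, recon)) / len(left_row)
```

The loss is zero only when the predicted disparity explains the image pair, so minimising it supervises the depth network without any measured depth values.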

3.2. Module analysis based on deep learning

By substituting one of the four modules of standard VSLAM, namely front-end, back-end, loop

detection, and map drawing, with an independently trained neural network, the overall performance of

VSLAM can be enhanced. This is referred to as an auxiliary method that relies on deep learning.

LIFT-SLAM optimizes the feature extraction stage [9]. The system uses the

learned invariant feature transform (LIFT) to extract features from images. A conventional VSLAM

pipeline, based on ORB-SLAM, then consumes these features for applications involving monocular

cameras. Using learned features at the front end of the VSLAM system can provide advantages by

enabling the acquisition of denser and more accurate matches. Furthermore, the uniform distribution of

these characteristics throughout the image results in a more consistent motion estimation. Several studies

have confirmed the robustness and efficiency of this VSLAM algorithm. Training deep neural networks

(DNNs) on VO image sequences can yield more effective task-specific

features. Transfer learning can enhance the performance of the overall system across datasets

by fine-tuning these networks using VO/VSLAM datasets. Furthermore, a method has been successfully

developed to dynamically modify the matching threshold based on the number of outliers throughout

the execution of the visual odometry (VO) pipeline. This removes the need for a fixed, predetermined

matching threshold and for per-dataset tuning.
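The idea of adjusting the matching threshold from the observed number of outliers can be sketched as a simple feedback rule. This is not the LIFT-SLAM implementation: it assumes a ratio-test style threshold in which a smaller value means stricter matching, and the target outlier ratio, step size and bounds are hypothetical.

```python
def adapt_threshold(threshold, n_matches, n_outliers,
                    target_outlier_ratio=0.3, step=0.05,
                    lower=0.5, upper=0.95):
    """Return an updated matching threshold (smaller = stricter)."""
    if n_matches == 0:
        # No matches survived: relax the test so tracking can continue.
        return min(upper, threshold + step)
    outlier_ratio = n_outliers / n_matches
    if outlier_ratio > target_outlier_ratio:
        # Too many outliers: tighten the test.
        return max(lower, threshold - step)
    if outlier_ratio < target_outlier_ratio / 2:
        # Very clean matches: relax the test to gain more matches.
        return min(upper, threshold + step)
    return threshold
```

Called once per frame with the inlier and outlier counts from pose estimation, such a rule lets the threshold drift to a value that suits the current sequence.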

TransPoseNet is an optimization technique that relies on pose recognition [8]. The proposed method

efficiently extracts geometric information from low-light images and is unaffected by the indistinct texture caused

by inadequate illumination. Its structure is coarse-to-fine: an initial localization stage is

followed by a refinement stage, implemented via deep learning and keypoint-based

geometric alignment. The initial stage simultaneously performs depth

completion and pose regression to mitigate the visual alterations caused by occlusion in the depth

image. During the refinement stage, the ICP alignment framework uses keypoints instead of full depth

image points to improve localization efficiency. Weakly supervised pose regression identifies keypoints

on the depth feature map. Using the 7-Scenes dataset, a collection of RGB-D frames, the authors showed

that their method outperforms common keypoint detectors such as SIFT and SURF.
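Aligning a handful of matched keypoints instead of every depth pixel keeps each ICP iteration cheap, because the rigid transform between matched points has a closed form. The sketch below shows that inner step in two dimensions for brevity (the paper works with 3D depth points, where the rotation is usually obtained from a singular value decomposition); the function name and the assumption that correspondences are already known are illustrative.

```python
import math


def align_2d(src, dst):
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    cross = 0.0
    dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy      # source point relative to its centroid
        bx, by = qx - dx, qy - dy      # target point relative to its centroid
        cross += ax * by - ay * bx
        dot += ax * bx + ay * by
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that carries the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)
```

A full ICP loop would alternate this step with re-matching each keypoint to its nearest neighbour until the transform stops changing.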

DRM-SLAM is an optimization technique that relies on map reconstruction [10]. A

convolutional neural network (CNN) built on the ResNet architecture

enables dense, accurate depth estimation and scene

reconstruction in real time. The depth fusion method, which is based on the depth reconstruction model, combines

the sparse depth samples that ORB-SLAM generates with the depth map that the CNN infers to

reconstruct the scene densely and accurately.
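One simple way to combine sparse SLAM depth with a dense CNN depth map is to fit a global scale and then trust the sparse samples where they exist. The sketch below is an assumption-laden illustration of that idea, not the fusion model of DRM-SLAM: depth maps are flattened to lists, sparse samples are a dictionary from pixel index to depth, and the weight of 0.8 is hypothetical.

```python
def fit_scale(cnn_depth, sparse_depth):
    """Least-squares scale s minimising sum (z_sparse - s * z_cnn)^2."""
    num = sum(z * cnn_depth[i] for i, z in sparse_depth.items())
    den = sum(cnn_depth[i] ** 2 for i in sparse_depth)
    return num / den if den else 1.0


def fuse_depth(cnn_depth, sparse_depth, sparse_weight=0.8):
    """Scale the CNN depth, then blend in sparse samples where available."""
    s = fit_scale(cnn_depth, sparse_depth)
    fused = [s * z for z in cnn_depth]
    for i, z in sparse_depth.items():
        fused[i] = sparse_weight * z + (1.0 - sparse_weight) * fused[i]
    return fused
```

The scale step matters for monocular systems because a network's depth prediction and the SLAM map are generally expressed in different, unknown units.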

PlaceNet is an optimization technique that relies on loop closure detection [11]. PlaceNet is

a multi-scale deep autoencoder network that incorporates a semantic fusion layer to

improve scene comprehension. The primary concept behind PlaceNet is to acquire knowledge about

areas in a dynamic environment that should be disregarded due to the presence of moving items. In other

words, it aims to prevent distractions caused by dynamic objects and instead concentrate on significant

features within the scene. PlaceNet is trained to identify dynamic objects in a scene by acquiring

knowledge of a grayscale semantic map that indicates the positions of both stationary and mobile objects

inside an image. PlaceNet produces semantically aware deep features that

are robust to changes in scale and to scene dynamics.

DeepSLAM is a recently developed visual SLAM framework that relies on end-to-end learning [12].

The system is trained on sequences of color stereo images and simultaneously learns the robot's

pose and the three-dimensional representation of the surrounding environment in a fully

unsupervised manner. Because the system uses only monocular RGB input at test time, it can be applied

in a variety of environments, both indoor and outdoor.

4. Conclusion

VSLAM, a fast-advancing scientific discipline, has garnered significant interest from numerous

academics who are involved in the development and utilization of deep learning models. Recent

advancements in deep learning have significantly enhanced many phases involved in VSLAM

processing, including data processing, pose estimation, trajectory estimation, mapping, and loop

closure. This paper primarily organizes fundamental information on visual SLAM and deep learning,

and presents the current application state of visual SLAM with deep learning in four key areas: visual

odometry, back-end optimization, loop detection, and mapping. Finally, a typical case of end-to-end

neural networks in VSLAM is presented. End-to-end learning can directly

optimize all VSLAM modules at the same time, providing a model that is more resilient to noise and

uncertainty. End-to-end deep neural networks show significant potential in improving the performance

of VSLAM algorithms. These architectures are built on self-supervised learning and

reinforcement learning, which enable adaptability in actual dynamic environments. By combining

traditional methods like Kalman filters or Savitzky-Golay filters with end-to-end deep models, enhanced

outcomes can be achieved. End-to-end DNNs are very flexible and can be used in many different fields,

including surgery, drone pose estimation, control of autonomous underwater vehicles,

drone navigation, and altitude mapping. Constructing a comprehensive learning framework is a

complex task, since the connections between modules must be kept

differentiable to enable learning through backpropagation. Deep learning models also have inherent

constraints. For example, they are unable to process inertial data together with color, depth, and

LiDAR data. Consequently, further thorough investigation is required.

In general, deep learning models make it possible to process visual data in real time and

with high efficiency, although integrating data from various sensor types remains challenging.

References

[1] Zhang Yao, Wu Yiquan & Chen Huixian. (2023). Research progress of visual simultaneous

localization and mapping based on deep learning. Journal of instruments and meters (07), 214-241.

doi: 10.19650/j.cnki.cjsi.J2311081.

[2] Favorskaya, M. N. (2023). Deep learning for visual SLAM: The state-of-the-art and future trends.

Electronics, 12(9), 2006. doi:https://doi.org/10.3390/electronics12092006

[3] Sun H. (2023). VSLAM system based on monocular depth estimation (Master's dissertation,

Hangzhou Dianzi University). doi: 10.27075/d.cnki.ghzdc.2023.000809.

[4] Liu Ruijun, Wang Shangxiang, Zhang Chen, et al. Visual SLAM based on deep learning review

[J]. Journal of system simulation, 2020, 32 (7): 1244-1256. doi: 10.16182/j.issn1004731x.joss.19-VR0466.

[5] Chen, S.; Zhou, B.; Jiang, C.; Xue, W.; Li, Q. A lidar/visual slam backend with loop closure

detection and graph optimization. Remote Sens. 2021, 13, 2720.

[6] Campos, C., Elvira, R., Gómez Rodríguez, J. J., Montiel, J. M. M., & Tardós, J. D. (2021).

ORB-SLAM3: An accurate open-source library for visual, visual-inertial and multi-map SLAM.

doi:https://doi.org/10.1109/TRO.2021.3075644

[7] Shang Guangtao, Chen Weifeng, Ji Aihong, et al. VSLAM review based on neural network [J].

Journal of nanjing information engineering university, 2024 (03): 352-363. doi:

10.13878/j.cnki.jnuist.20220420001.

[8] Li, Q.; Cao, R.; Zhu, J.; Fu, H.; Zhou, B.; Fang, X.; Jia, S.; Zhang, S.; Liu, K.; Li, Q. Learn then

match: A fast coarse-to-fine depth image-based indoor localization framework for dark

environments via deep learning and keypoint-based geometry alignment. ISPRS J.

Photogramm. Remote Sens. 2023, 195, 169–177.

[9] Bruno, H. M. S., & Colombini, E. L. (2021). LIFT-SLAM: A deep-learning feature-based

monocular visual SLAM method. doi:https://doi.org/10.1016/j.neucom.2021.05.027

[10] Ye, X.; Ji, X.; Sun, B.; Chen, S.; Wang, Z.; Li, H. DRM-SLAM: Towards dense reconstruction

of monocular SLAM with scene depth fusion. Neurocomputing 2020, 396, 76–91

[11] Hussein Osman, Nevin Darwish, AbdElMoniem Bayoumi, PlaceNet: A multi-scale semantic-aware

model for visual loop closure detection, Engineering Applications of Artificial

Intelligence, Volume 119, 2023, 105797, ISSN 0952-1976,

https://doi.org/10.1016/j.engappai.2022.105797.

[12] R. Li, S. Wang and D. Gu, "DeepSLAM: A Robust Monocular SLAM System With Unsupervised

Deep Learning," in IEEE Transactions on Industrial Electronics, vol. 68, no. 4, pp. 3577-3587,

April 2021, doi: 10.1109/TIE.2020.2982096.

Intelligent assistive obstacle avoidance device based on SLAM

and wearable technology

Yang Zhang

School of Engineering, Zhengzhou University of Aeronautics, Zhengzhou, China

ssyyp6@nottingham.edu.cn

Abstract. This paper presents a conceptual design for an intelligent assistive device that

combines visual simultaneous localisation and mapping (SLAM) with wearable technology. The

design addresses the safety of visually impaired individuals when they are outside the

home. The objective is a user-friendly, safe device that provides a dependable, real-time, and

resilient solution in intricate and variable indoor and outdoor settings. The main components of the

proposed system are visual SLAM, intelligent wearable devices (as carriers), and a

comfortable and straightforward user feedback system (which informs the wearer through haptic,

auditory, or visual signals). Because visually impaired individuals use such a device for prolonged

and frequent periods, its safety and comfort are considered throughout. This paper reviews the latest

advances in SLAM algorithms, improvements in wearable sensors and the latest developments in

robotics for assisting visually impaired people. It also discusses the potential of these technologies in

the future development of assistive devices. The aim is to provide a feasible, comfortable and safe

solution that will enhance the safety and autonomy of visually impaired people when they are outside.

Keywords: Visual simultaneous localisation and mapping, wearable technology, obstacle

avoidance, robot, intelligent devices.

1. Introduction

For those with visual impairments, ensuring personal safety while travelling is of paramount importance.

When entering an unfamiliar environment, it is a significant challenge for visually impaired individuals

to ensure their safety and to successfully navigate their way to their intended destination in a timely

manner. These challenges include recognising and avoiding obstacles,

comprehending spatial orientation, and navigating safely and consistently. Conventional assistive

aids, such as canes and guide dogs, have been shown to be effective for some individuals

in certain contexts. However, they often prove less reliable in complex, unfamiliar,

and dynamic environments. Because neither aid can

provide comprehensive real-time feedback about

the surroundings, it is difficult to guarantee the safety of visually impaired individuals when

travelling or to ensure accurate real-time navigation.

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0192

(https://creativecommons.org/licenses/by/4.0/).

The advent of vision-based SLAM technology offers a promising new avenue for assisting visually

impaired individuals in navigating unfamiliar environments and avoiding obstacles. This technology

enables devices to map these environments in real-time while also tracking and positioning themselves

on the map. This addresses the necessity for visually impaired individuals to travel securely while

facilitating obstacle avoidance and navigation. The integration of SLAM with wearable technology

presents novel avenues for the advancement of devices designed to assist visually impaired individuals

[1].

2. Traditional scenarios

2.1. Traditional aid methods for the visually impaired

It has been demonstrated that traditional aids for the visually impaired, such as canes and guide dogs,

can be beneficial in certain situations for certain groups of people. However, the reliability of these

assistive technologies is often constrained when confronted with complex, unfamiliar and dynamic

environments that are influenced by multiple factors. The primary limitation of these devices is their

inability to provide comprehensive and real-time feedback on the surrounding environment. This

presents a significant challenge in ensuring the safety of visually impaired individuals on the road and

in providing real-time and accurate navigation. Furthermore, there is a dearth of comprehensive

legislation, regulations, and associated safeguards to guarantee that visually impaired individuals and

guide dogs are not inconvenienced by other individuals or vehicles in their daily lives. These

shortcomings not only impact the safety of visually impaired individuals while travelling but also restrict

their overall quality of life and social integration [2].

2.2. The role of SLAM in the industry

Service robots, exemplified by cleaning robots, have become a ubiquitous feature of modern life. The

autonomous movement ability and route planning of these robots serve as crucial performance indicators.

The integration of visual SLAM in cleaning robots allows visual

information to be fully exploited, enabling the robots to obtain higher-quality environmental

information, improve perception and thus intelligent decision-making, and incorporate

odometry to address issues such as missing light points and weak ambient light [3].

Ground robots, automated guided vehicles (AGVs) and aerial robots have been

adopted gradually and widely in manufacturing centres for decades. The interior of a factory is

a dynamic environment characterised by a high density of facilities, workers and robots. Various

successful techniques have been proposed for visual-inertial odometry and visual SLAM. Combining

visual SLAM with these robots allows them to adapt to various environments in the factory, improving

efficiency and reducing personnel costs [4].

3. A system combining wearable devices and visual SLAM

3.1. Visual SLAM on wearable devices

A growing range of wearable assistive devices for the visually impaired is now on the market and is

receiving increasing attention. These devices are of great

practical significance: they help the visually impaired recognise text and traffic

signals and avoid obstacles. Conventional wearable assistive devices rely

on ultrasound, GPS, inertial odometry and other positioning methods, which have inherent limitations

and struggle to meet the real-time, high-precision requirements of visually

impaired individuals for navigation and obstacle avoidance in unfamiliar environments.

Given the difficulties visually impaired individuals face in recognising unfamiliar and complex

environments, it is imperative that these devices can determine their position, pose and trajectory

in real time. This enables them to assist visually impaired people in travelling safely and independently.

Furthermore, the construction of a real-time map of the surrounding environment is essential for the

purpose of navigation and assisted obstacle avoidance. Early attempts at SLAM integration

relied on the simplest position sensors, but because these devices were bulky, they were

neither practical nor well suited to wearable use.

Visual SLAM employs vision sensors, including monocular, stereo (binocular), and depth cameras,

to obtain environmental data. It exhibits remarkable resilience and has made significant advancements

in the domains of automated vehicle navigation and autonomous mobile robotics. Some SLAM schemes

have reached a level of maturity, including Oriented FAST and Rotated BRIEF Simultaneous

Localization and Mapping (ORB-SLAM), Large-Scale Direct Simultaneous Localization and Mapping

(LSD-SLAM), Semi-Direct Visual Odometry (SVO), and others. Furthermore, the integration of vision

SLAM devices with wearable technology offers enhanced adaptability and notable advantages [5].

3.2. System workflow

The combination of wearable devices and visual SLAM to help visually impaired people avoid obstacles

consists of three parts: the visual SLAM system (the core), wearable devices (as a medium and carrier)

and a user-friendly feedback system, as shown in Figure 1.

Figure 1. Workflow diagram of the system that combines wearable devices with visual assistance for

visually impaired people.

This obstacle avoidance system for the visually impaired works as follows. First, the visual SLAM module

uses a camera (typically a monocular or stereo camera) to capture detailed images of the environment.

These images are processed to identify key features and landmarks, which are then used to build a real-

time map. The SLAM algorithm simultaneously tracks the device's position on this map, constantly

updating the user's location [6]. Secondly, the integration of wearable devices ensures that SLAM

modules are embedded in form factors that are comfortable and convenient for the user to wear. Such

devices may include smart glasses, helmets, or other forms of wearable technology that do not impede

the user's typical activities. The design must strike a balance between the necessity for sophisticated

sensing and processing capabilities and the paramount importance of comfort and ease of use. Thirdly,

the obstacle detection and avoidance component employs sophisticated algorithms to identify potential

hazards within the surrounding environment. The algorithms process data from the visual SLAM module

with the objective of detecting obstacles and predicting their movement. Subsequently, the system

generates pertinent feedback to alert the user and assist them in safely navigating around the obstacle.

The incorporation of user feedback mechanisms is of paramount importance for the efficacy of assistive

devices. Vibration, for instance, is a form of haptic feedback that can be employed to provide users with

immediate and direct feedback, thereby alerting them to potential hazards such as nearby obstacles.

Auditory feedback can provide more detailed information, such as the location and distance of an

obstacle. The use of visual displays, such as augmented reality overlays on smart glasses, allows for the

provision of real-time visual cues without the obstruction of the user's ability to observe the surrounding

environment [7].
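The feedback stage described above can be sketched as two small mappings: obstacle distance to vibration strength, and obstacle position to a short spoken message. The function names, the 5 m haptic range and the 15 degree "ahead" sector are hypothetical design values used only for illustration.

```python
def haptic_intensity(distance_m, max_range_m=5.0):
    """Vibration strength in [0, 1]: stronger as the obstacle gets closer."""
    if distance_m >= max_range_m:
        return 0.0
    return 1.0 - max(distance_m, 0.0) / max_range_m


def audio_message(distance_m, bearing_deg):
    """Spoken cue giving direction and distance (negative bearing = left)."""
    if abs(bearing_deg) < 15.0:
        side = "ahead"
    elif bearing_deg < 0.0:
        side = "to the left"
    else:
        side = "to the right"
    return f"Obstacle {side}, {distance_m:.1f} metres"
```

Keeping the haptic channel to a single continuous quantity and reserving speech for detail reflects the division of roles between immediate and more detailed feedback described in the text.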

3.3. System advantages

The field of wearable technology has witnessed a significant advancement in recent years, characterised

by the miniaturisation, optimisation and energy efficiency of sensors and processing units. Wearable

devices, including smart glasses, wristbands and even smart clothing, are now capable of integrating

advanced computing and sensing capabilities. These devices are able to collect and process data about

the user's environment and activities in real time, which makes them an ideal medium for implementing

SLAM-based navigation aids. By integrating SLAM with wearable technology, it is possible to create

assistive devices that provide continuous and real-time feedback about the user's environment, which

enables safe, real-time navigation and localisation.

The primary advantage of the system that employs visual SLAM in conjunction with wearable

devices is its capacity to detect and circumvent obstacles in real-time. The system is not only capable of

recognizing a multitude of obstacles within the surrounding environment but also of prioritizing those

that are in closer proximity. For example, the lightweight YOLOv5 (You Only

Look Once version 5) vision model can accurately detect obstacles within a range of 20 metres,

with the capacity to prioritise them according to their proximity to the user and the potential danger

posed by the obstacle. Furthermore, the system is furnished with an audio feedback mechanism that is

triggered when the system detects an obstacle and alerts the user in a timely manner, thus providing

assistance to the visually impaired in safely avoiding potential collision hazards. The implementation of

this system has the potential to markedly enhance the autonomy and security of visually impaired

individuals in traversing unfamiliar and intricate environments, fostering greater confidence and

tranquillity [7].
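Prioritising detections by proximity and potential danger can be sketched as a ranking over a list of detected obstacles. The 20 m range comes from the text; the `Obstacle` record, the per-class hazard weights and the score (hazard divided by distance) are assumptions made for illustration rather than the scoring used in the cited system.

```python
from dataclasses import dataclass


@dataclass
class Obstacle:
    label: str
    distance_m: float
    hazard: float  # hypothetical danger weight in [0, 1] for the object class


def prioritise(obstacles, max_range_m=20.0):
    """Return in-range obstacles ordered from most to least urgent."""
    in_range = [o for o in obstacles if o.distance_m <= max_range_m]
    # Closer and more dangerous obstacles get a higher score.
    return sorted(in_range,
                  key=lambda o: o.hazard / max(o.distance_m, 0.1),
                  reverse=True)
```

The first element of the returned list is the obstacle the audio feedback would announce first.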

The second advantage is a notable enhancement in the mobility and independence of visually

impaired individuals. The combination of visual SLAM technology with wearable device technology

enables unfamiliar, complex environments to be sensed and mapped in detail

in real time. This not only provides visually impaired individuals with

accurate and immediate navigation data, but also assists them in identifying potential obstacles and

hazards in their surroundings, thereby enabling them to navigate with greater confidence in a range of

environments. In both familiar and unfamiliar settings, this technology provides navigational assistance

based on the routes planned by visually impaired individuals, avoiding obstacles and enhancing the

safety and efficiency of their actions. The implementation of this technology markedly enhances the

autonomy and dignity of visually impaired individuals, facilitating their participation in social activities

and daily life with greater independence [8].

The third advantage is that integrating SLAM technology with multiple sensors

yields a more comprehensive and accurate understanding of the unfamiliar and complex

environment in which the system operates. For example, using a camera in conjunction with an

ultrasonic sensor enables the system to more accurately detect and recognise obstacles within the user's

environment. This fusion exploits the distinctive capabilities of the diverse sensors, enabling the system

to adapt to a broader spectrum of environments and to perform optimally across a diverse range of

settings. The camera captures detailed image information about the obstacle, while the ultrasonic sensor

provides accurate distance measurements. The fusion of multiple sensors not only enhances the

reliability and accuracy of the system but also increases data redundancy. The presence of data

redundancy can effectively mitigate uncertainty and risk in the detection process. To illustrate, even in

the event of a single sensor malfunction, the system is still capable of maintaining high-precision

obstacle detection and environmental sensing through the utilisation of data from alternative sensors.

This prevents the user from being left with a completely non-functional device. The

combination of these multiple sources of information enables the system to provide more reliable and

accurate navigation guidance, thus assisting the visually impaired in avoiding obstacles in a safer manner

and enhancing their autonomy of movement. The benefit of this technology is that it not only furnishes

detailed environmental data in real-time, but also enhances the stability and reliability of the system

through data fusion and redundancy mechanisms. This enables visually impaired individuals to receive

reliable assistance and navigation support in a variety of complex environments [9].
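The fusion and redundancy argument can be made concrete with a small sketch that combines a camera-based and an ultrasonic distance estimate by inverse-variance weighting and falls back to whichever sensor is still working. The function name, the tuple layout and the variance values used in any call are assumptions for illustration.

```python
def fuse_distance(camera, ultrasonic):
    """Fuse two distance readings, each (distance_m, variance) or None if failed.

    Returns (fused_distance_m, fused_variance), or None when both sensors failed.
    """
    readings = [r for r in (camera, ultrasonic) if r is not None]
    if not readings:
        return None
    weights = [1.0 / variance for _, variance in readings]
    total = sum(weights)
    fused = sum(w * d for w, (d, _) in zip(weights, readings)) / total
    return fused, 1.0 / total
```

With both sensors available the fused variance is smaller than either input variance, and with one sensor missing the function simply returns the remaining reading, which is the redundancy benefit described above.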

3.4. System limitations and possible improvements

Although SLAM technology is capable of providing real-time environmental awareness, the

computational power necessary to process intricate environmental data can result in delays that impact

the real-time performance of the system. To illustrate, in a complex or dynamic environment, the system

is required to process a substantial quantity of sensor data, which encompasses images, depth

information, and inputs from additional sensors. The fusion and processing of this information

necessitates the utilisation of robust computational resources. If the processing speed

cannot keep up with the rate at which the environment changes, the system's response time increases,

which in turn degrades real-time performance and the user experience [7].
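The effect of processing delay can be estimated with simple arithmetic. The sketch below assumes the stages run one after another on each frame (no pipelining) and that the user walks at a constant speed; the stage names, timings and the 1.4 m/s walking speed are hypothetical figures, not measurements from the paper.

```python
def end_to_end_latency_ms(stage_ms):
    """Total time from image capture to feedback for sequential stages."""
    return sum(stage_ms.values())


def keeps_up(stage_ms, frame_rate_hz):
    """True if one frame is fully processed before the next one arrives."""
    return end_to_end_latency_ms(stage_ms) <= 1000.0 / frame_rate_hz


def distance_travelled_m(latency_ms, walking_speed_m_s=1.4):
    """How far the user moves while the system is still processing."""
    return walking_speed_m_s * latency_ms / 1000.0
```

For example, stages totalling 70 ms keep up with a 10 Hz camera but not a 30 Hz one, and at the assumed walking speed the user moves roughly 0.1 m before feedback arrives.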

In particular, some systems may require longer response times to detect and prioritise obstacles. This

is because the system must not only detect all potential obstacles, but also determine which obstacles

pose the greatest threat to the user, based on factors such as their distance, size and direction of

movement, and provide feedback accordingly [2]. This complex computational process is prone to

latency when running on highly loaded processors, and the real-time performance requirements of these

systems often require a trade-off between computational speed and battery life. High-performance

computing devices can significantly increase power consumption, which in turn shortens the operating time of the

device between charges and affects its portability and usefulness [7].

Visual SLAM techniques may not perform well in certain complex or dynamically changing

environments, such as crowded scenes or scenes with rapidly changing lighting, where the accuracy and

reliability of sensor data can be affected and fluctuate. In these environments, sensors may receive large

amounts of noisy data or incomplete information, resulting in increased errors in map construction and

localisation. For example, in crowded places, fast-moving people and other dynamic objects can

interfere with the sensor's data collection, making it difficult for the visual SLAM system to accurately

locate and map the environment. Scenes with drastic changes in lighting, such as moving from bright

outdoor areas to dimly lit indoor areas, can also affect the performance of cameras and other optical

sensors, leading to inaccuracies, omissions and loss of data [6].

While the integration of SLAM technology with wearable devices offers significant advantages, there

are potential challenges associated with user adaptation. For visually impaired individuals, specialized

training may be necessary to fully leverage the capabilities of these technologies. As the systems are

complex to operate, users must invest time and receive instruction to master their use. Furthermore,

these devices necessitate periodic updating and maintenance to ensure optimal performance.

Consequently, these learning and adaptation processes may impede the extensive adoption of these

systems by visually impaired users [9].

4. System optimisation and development trends

Future work includes the development of more efficient algorithms and the use of more advanced
hardware, such as dedicated processing units (e.g. GPUs) and low-power processors. These improvements

can increase the computational speed of the system and reduce latency, thereby improving the

performance and reliability of the system in a real-time environment [2,7].

Multi-sensor fusion methods, such as the combined use of cameras, Light Detection and Ranging

(LiDAR) and ultrasonic sensors, should be used to improve the accuracy and reliability of the data. This

can compensate for the shortcomings of a single sensor in different environmental conditions. For

example, LiDAR can provide highly accurate distance measurements and work well in low-light

conditions, while ultrasonic sensors are good at detecting nearby obstacles. It is also important to

develop smarter algorithms - algorithms that are better able to dynamically adapt and correct sensor data

to more complex environmental changes. For example, using deep learning techniques to process sensor

data can significantly improve a system's performance in complex environments, enabling it to detect

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0192

102

and filter noisy data more effectively and improve the reliability and accuracy of environmental sensing

and obstacle detection [7, 10].
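As a concrete illustration of how multi-sensor fusion can compensate for the weaknesses of a single sensor, the sketch below combines distance readings by inverse-variance weighting. This is one common textbook technique, not the method of the cited systems; the function name `fuse` and the example variances are assumptions made for illustration.

```python
def fuse(readings):
    """Fuse independent distance readings by inverse-variance weighting.

    readings: list of (distance_m, variance) tuples, one per sensor.
    Returns (fused_distance_m, fused_variance).
    """
    weights = [1.0 / variance for _, variance in readings]
    total = sum(weights)
    fused = sum(w * d for w, (d, _) in zip(weights, readings)) / total
    return fused, 1.0 / total


if __name__ == "__main__":
    # Assumed example: a precise LiDAR reading and a noisier ultrasonic reading.
    lidar = (2.00, 0.01)
    ultrasonic = (2.20, 0.04)
    distance, variance = fuse([lidar, ultrasonic])
    print(round(distance, 3), round(variance, 4))
```

The fused estimate leans towards the more reliable sensor, and its variance is smaller than that of either input, which is the sense in which fusion improves reliability.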

5. Conclusion

This paper examines the problems of combining a visual SLAM system with wearable devices to help
visually impaired people avoid obstacles in unfamiliar and complex environments, and to help correct
and eliminate potential hazards by combining the user's autonomous path planning with analysis of the
data detected by the system. To contribute to the design of a travel and obstacle-avoidance navigation
system for visually impaired people based on visual SLAM and wearable devices, the paper examines the
feasibility of the combination, simulates the workflow of the system, and examines its benefits in
helping visually impaired people travel and avoid obstacles. It also presents the limitations of the
algorithms and sensors in complex and changing environments, and proposes future development trends
and corresponding areas of optimisation.

References

[1] Bai J, Liu Z, Lin Y, Li Y, Lian S, Liu D. Wearable Travel Aid for Environment Perception and
Navigation of Visually Impaired People. Electronics. 2019; 8(6):697.
https://doi.org/10.3390/electronics8060697

[2] Chen Z, Liu X, Kojima M, Huang Q, Arai T. A Wearable Navigation Device for Visually
Impaired People Based on the Real-Time Semantic Visual SLAM System. Sensors. 2021;
21(4):1536. https://doi.org/10.3390/s21041536

[3] Z. Wang, H. Liao, Z. Jia and J. Wu, "Semantic Mapping Based on Visual SLAM with Object
Model Replacement Visualization for Cleaning Robot," 2022 IEEE International Conference
on Robotics and Biomimetics (ROBIO), Jinghong, China, 2022, pp. 569-575,
doi: 10.1109/ROBIO55434.2022.10011717.

[4] Francisco J. Perez-Grau, J. Ramiro Martinez-de Dios, Julio L. Paneque, J. Joaquin Acevedo,
Arturo Torres-González, Antidio Viguria, Juan R. Astorga, Anibal Ollero, Introducing
autonomous aerial robots in industrial manufacturing, Journal of Manufacturing Systems,
Volume 60, 2021, Pages 312-324, ISSN 0278-6125, https://doi.org/10.1016/j.jmsy.2021.06.008.

[5] Xu, P., Van Schyndel, R., & Song, A. (2023, June). Smart Head-Mount Obstacle Avoidance
Wearable for the Vision Impaired. In International Conference on Computational Science (pp.
417-432). Cham: Springer Nature Switzerland.

[6] Ou, W., Zhang, J., Peng, K., Yang, K., Jaworek, G., Müller, K., & Stiefelhagen, R. (2022, July).
Indoor navigation assistance for visually impaired people via dynamic SLAM and panoptic
segmentation with an RGB-D sensor. In International Conference on Computers Helping
People with Special Needs (pp. 160-168). Cham: Springer International Publishing.

[7] Asiedu Asante, B. K., & Imamura, H. (2023). Towards robust obstacle avoidance for the visually
impaired person using stereo cameras. Technologies, 11(6), 168.

[8] Rahman M, Khadem M, Siddiquee MM, et al. SLAM for Visually Impaired People: a Survey.
arXiv. Published online December 9, 2022. Available at: https://arxiv.org/abs/2212.04745.
Accessed July 27, 2024.

[9] Joseph AM, Kian A, Begg R. State-of-the-Art Review on Wearable Obstacle Detection Systems
Developed for Assistive Technologies and Footwear. Sensors. 2023; 23(5):2802.
https://doi.org/10.3390/s23052802

[10] Zhang Z, Lin F, Wu T. Multi-Sensor Fusion for SLAM in Dynamic Environments: A Survey.
Sensors. 2022;22(4):1356. doi:10.3390/s22041356.


The research on the factors affecting the World Happiness index

Yizhi Zong

Cardiff Sixth Form College, Cardiff, CF24 0AA, United Kingdom

katherine.zong@ccoex.com

Abstract. This study uses data from the World Happiness Report to analyse in depth the factors
that influence the World Happiness Index (WHI) and to look for other factors that may influence
happiness. The correlation graph shows that, among the original six variables, Social Support has
the strongest correlation with the ladder score, which suggests that Social Support has the
greatest impact on the happiness index, followed by GDP per capita; Generosity, on the other
hand, is the most weakly associated with the ladder score and has the least effect on happiness.
Linear regression and scatter plots are then used to support this conclusion, which raises the
question of whether Generosity should be removed as an influencing factor. A map is used to show
happiness levels and their geographical distribution across countries, and the distribution of
the Gini coefficient is shown on a similar regional map. The Gini coefficient and education level
also show a certain correlation with the happiness index and are likely to be potential
influencing factors. From a global map perspective, the happiest countries are mostly located in
Europe, North America and Oceania, while the least happy countries are mostly located in Africa.
The sample size was not large enough, which makes the results prone to error and raises questions
about the happiness scores of some countries.

Keywords: World happiness, happiness index, positive correlation, GDP per capita, social

support.

1. Introduction

In the rapidly changing world, understanding the factors that influence happiness has become an

increasingly important and complex research topic. As for happiness itself, it is a pluralistic and

relatively subjective concept, with different interpretations from different disciplines or different

theories. Happiness presupposes an evaluative stance concerning one period of one's life or one's own

life as a whole [1]. Since the 1960s, many philosophers, thinkers and scientists have carried out relevant

research on happiness, and people's understanding of happiness has become deeper and deeper.

The World Happiness Report is a publication that contains reports and rankings on the happiness of

countries and the correlation between various factors. The first World Happiness Report was published
in 2012, and as of 2024 Finland has been named the happiest country in the world seven times in a row.

The report is based on data from the Gallup World Poll, in which people are asked to rate their quality

of life on a scale of 0-10, as well as questions related to their happiness assessment. These scores were

used to produce a happiness index for each country. Gallup also looked at a variety of quality-of-life

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0202

(https://creativecommons.org/licenses/by/4.0/).

104

factors and analyzed how they correlated with happiness [2]. So far, the World Happiness Report has

adopted six variables, namely GDP per capita, Social support, Healthy life expectancy, Freedom to make

life choices, Generosity and Perceptions of corruption. However, the World Happiness Index is not
calculated from these six variables: the 2024 index is the average of the happiness scores for
2021-2023, which is strange and confusing.

According to the World Happiness Report 2024, overall subjective happiness has improved and

increased in countries around the world. However, regional differences in happiness remain large, with

large gaps between developed and developing countries [3]. Since 2006-2010, happiness has declined

in the Middle East and North Africa region and increased in Central and Eastern Europe and East Asia.

In North America and South Asia, happiness declined among young people and increased among older

people. The report also highlights that happiness inequality has increased across all regions, especially
in sub-Saharan Africa. In addition, the report discusses the challenges faced by older people, such as
dementia, as well as methods and measures to improve well-being, such as improving the environment and
behavioural strategies. This year's World Happiness Report has added an analysis of the impact of

climate change, social justice and digitalization on happiness compared to previous years, which means

that there are still many factors affecting happiness to consider in addition to the original six variables.

Meanwhile, according to Gallup, unemployment, one of the best-known statistics, is surprisingly
absent from the World Happiness Index, which is calculated not from the six known variables but by
averaging happiness scores over the previous three years [4]. This method is questionable because
unexpected events, such as the outbreak of COVID-19 in 2019, can lead to a sharp increase in
unemployment, a sharp reduction in income and a decrease in people's self-confidence, which in turn
sharply reduces happiness [5]. However, according to the World Happiness Report 2020, the happiest
countries, such as Finland, were even happier in 2020 (7.809) than in 2024 (7.741) [6]. The two most
affected economic powers, the United States and China, were also happier in 2020 than in 2024, which
seems unreasonable. According to Gallup and the Alliance for Happiness, Finland is not the happiest
country in the world, and the Alliance for Happiness considers more factors than the World Happiness
Report [7, 8]. Therefore, based on the World Happiness Report 2024, this paper analyzes the degree of
influence of the original six variables on the happiness index using data analysis methods such as
line charts, scatter charts and bar charts, and proposes new influencing factors using scatter charts
and map data visualization [9]. Finally, it evaluates and suggests improvements to the calculation
method and sample size of the happiness index.

2. Methodology

2.1. Data source

The data on the World Happiness Index and the six variables used in this article come mainly from the
official website of the World Happiness Report and are for 2024. The World Happiness Report 2024 data
set contains 144 records covering all countries surveyed, and the raw data are saved in .xls format.
The Gini coefficient is based on a data set from the Our World in Data website, which contains 2325
records covering countries from 1963 to 2023 (some data are incomplete); the original data set is
saved in .csv format. Data on education level/literacy come from Kaggle and the website "Our World in
Data"; the original data set downloaded from Kaggle contains 193 records and is saved in .csv format.
The mortality data set is from the World Health Organization website. It contains 6269 records
covering all countries from 1987 to 2024 (some data are incomplete), was last updated on February 21,
2024, and is saved in .xls format [10].

2.2. Variable selection

Based on the original data, this paper will adopt the original 6 variables (GDP per capita, Social support,

Healthy life expectancy, Freedom to make life choices, Generosity and Perceptions of corruption), as

well as variables such as Income Inequality (The Gini Coefficient), Education level and Regions. The

specific descriptions of these variables are shown in Table 1:


Table 1. List of Variables.

Variable                        Type of data    Range
GDP per capita                  float           [0.0, 2.141]
Social support                  float           [0.0, 1.617]
Healthy life expectancy         float           [0.0, 0.857]
Freedom to make life choices    float           [0.0, 0.863]
Generosity                      float           [0.0, 0.401]
Perceptions of Corruption       float           [0.0, 0.575]
Gini Coefficient                float           [0.178, 0.658]
Education level                 float           [2.207, 12.938]
Region                          float

2.3. Method introduction

In this paper, scatter charts, bar charts and a linear regression model are used to compare the impact
of the original six variables on the happiness index and to find the variable with the greatest impact.
Scatter plots, linear regression and map data visualization are then used to analyze the correlation
between other potential factors and the happiness index. The general mathematical model for multiple
linear regression is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e \qquad (1)$$

In the above formula, $\beta_0$ is a constant term, $\beta_1, \dots, \beta_k$ are the regression
coefficients of the explanatory variables $x_1, \dots, x_k$, and $e$ is a residual term. In addition,
this paper uses map data to visualize the comprehensive happiness index of various regions. The general
formula for the R-square value is:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \qquad (2)$$

where $SS_{res}$ is the residual sum of squares and $SS_{tot}$ is the total sum of squares.

3. Results and discussion

3.1. Correlation analysis

According to Figure 1, Social support has the highest correlation coefficient with Ladder Score, while
Generosity has the lowest, far below the average correlation coefficient (0.59). The correlations of
Generosity with Log GDP per capita and with Healthy life expectancy are low and negative. Log GDP per
capita, Social support and Healthy life expectancy are highly correlated with one another. It can be
inferred that Generosity has a small effect on the happiness index, and whether it should be removed
from the six variables is a question worth considering.
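The correlation coefficients discussed here are Pearson coefficients of the kind sketched below. This is a minimal illustration rather than the code used in the study: the function names and the small data columns are invented for the example and are not values from the World Happiness Report.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_with_target(columns, target):
    """Correlate every column except `target` with the target column."""
    return {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }


if __name__ == "__main__":
    # Made-up illustrative values, not taken from the report.
    data = {
        "Ladder score": [7.7, 6.9, 5.8, 4.6, 3.5],
        "Social support": [1.57, 1.45, 1.20, 0.95, 0.60],
        "Generosity": [0.14, 0.20, 0.09, 0.18, 0.12],
    }
    for name, r in correlation_with_target(data, "Ladder score").items():
        print(name, round(r, 3))
```

For a single predictor, the R-square of a fitted line equals the square of this coefficient, so the R-square of 0.662 reported for Social support in Section 3.1.2 corresponds to a correlation of roughly 0.81.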


Figure 1. Correlation results.

3.1.1. GDP and happiness index. According to Figure 2, the linear fit to the scatter data is
Ladder score = 2.586 + 2.135 × Log GDP per capita, with an R-square value of 0.591. There is a strong
positive linear correlation between GDP per capita and the happiness score: the higher the GDP per
capita, the higher the happiness level.
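A fitted line and R-square value of this kind can be obtained with ordinary least squares. The sketch below is a minimal pure-Python version under the assumption that a single predictor is used; the function names and the sample points are illustrative and are not the study's data or code.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """R-square = 1 - SS_res / SS_tot for the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up points standing in for (predictor, Ladder score) pairs.
    predictor = [1.0, 2.0, 3.0, 4.0]
    ladder = [2.0, 4.0, 5.0, 8.0]
    b0, b1 = fit_line(predictor, ladder)
    print(round(b0, 3), round(b1, 3), round(r_squared(predictor, ladder, b0, b1), 3))
```

An R-square close to 1 means the line explains most of the variation in the score, while a value near 0, as reported for Generosity, means the predictor explains almost none of it.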

Figure 2. Scatter plot of Log GDP per capita and Ladder score.


3.1.2. Social support and happiness. As shown in Figure 3, the linear fit to the scatter data is
Ladder score = 2.260 + 2.883 × Social support, with an R-square value of 0.662. Social support has
the strongest positive linear correlation with the happiness score: the higher the Social support,
the higher the happiness level.

Figure 3. Scatter plot of Social support and Ladder score.

3.1.3. Generosity and happiness index. As can be seen from Figure 4, the scatter data for Generosity
are much more dispersed than those of the other factors, with an R-square value of only 0.017. This
indicates a low correlation between Generosity and Ladder Score, so replacing Generosity with other
factors is worth considering.

Figure 4. Scatter plot of Generosity and Ladder score.


3.2. The potential factors

3.2.1. Regions and happiness index. As can be seen from Figure 5, the countries with a high overall

happiness index are mostly located in Europe, Oceania and North America. South America and Asia are

in the middle of the pack, while Africa's overall well-being is low. It may be inferred that factors such

as geography and political system also affect happiness.

Figure 5. The World Happiness Index on the global map.

3.2.2. Gini coefficient and happiness index. As Figure 6 shows, inequality is highest in South Africa
and is high in Africa as a whole. The inequality coefficient in South America is also high. The global
distribution of inequality is broadly similar to the distribution of the happiness index in Figure 5,
which suggests that income inequality is also potentially related to the happiness index.

Figure 6. Gini coefficient on the global map in 2019 [7].


3.2.3. Education level and happiness index. According to Figure 7, the average length of schooling in
Africa is four years, while in Asia and South America it is between eight and ten years, and the
regions with the highest happiness index (Europe, Oceania, North America) have the longest schooling,
averaging between 10 and 12 years. This suggests that the longer the education, the higher the
happiness index, so the level of education may also be one of the factors affecting the happiness
index.

Figure 7. Years of schooling on the global map [8].

3.3. Objectivity and accuracy

According to the World Happiness Report website, the typical annual sample for each country is 1,000
people. If a typical country is surveyed once a year, the sample over the three years that are
averaged is about 3,000 people. However, for populous countries such as China, a sample of 3,000
people seems far from enough and may carry a large error. According to the World Happiness Rankings
2024, China ranks 60th on the happiness index, but as the world's second largest economy, its
happiness ranking is questionable.
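One way to put a number on the sampling error discussed above is the margin of error of a sample mean. The sketch below is a rough illustration under strong assumptions: simple random sampling, a 95% confidence level (z = 1.96), and a standard deviation of 2.0 points on the 0-10 scale, which is a hypothetical value and not a figure from the report.

```python
import math


def margin_of_error(std_dev, n, z=1.96):
    """Half-width of the confidence interval for a sample mean."""
    return z * std_dev / math.sqrt(n)


if __name__ == "__main__":
    assumed_std_dev = 2.0  # hypothetical spread of individual answers
    for sample_size in (1000, 3000):
        print(sample_size, round(margin_of_error(assumed_std_dev, sample_size), 3))
```

Under these assumptions the margin depends on the sample size and the spread of answers rather than on the size of the population. It covers random sampling error only, so it says nothing about whether the respondents are representative of a large and diverse country.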

Some countries' happiness figures look less than reasonable. According to research, Finland has

almost the highest rate of mental disorders in the European Union, with one in six Finns suffering from

mental health problems. Depression, anxiety and substance abuse are the most common mental health

problems [9]. However, as of 2024, Finland has been named the happiest country in the world seven
times. Finland may therefore not be the happiest country in the world; rather, Finns may simply be
more likely to report feeling satisfied, which leads to such a high Ladder Score [10].

4. Conclusion

Based on the World Happiness Report 2024 and related data, this study analyzes the impact of the
original six variables on the happiness index, compares and summarizes the distribution of the
happiness index across regions, and finally evaluates the sample and questions the happiness index
of some countries.

In the analysis stage, this paper uses a scatter plot and correlation visualization to find out the degree

of correlation between the six variables and the happiness index. Among them, Social Support and GDP per

capita had the greatest impact on happiness, while Generosity had the least association with happiness.

To learn more, the study also visualized the distribution of happiness across different regions using map


data and found that countries with high happiness were more likely to be in Europe, North America, and

Oceania, while countries with low happiness were more likely to be in Africa.

Through these analyses, a deeper understanding of the World Happiness Report has been gained.
However, this study still has many shortcomings, such as the lack of a more detailed analysis of the
variables, the limited number of factors considered and insufficient data. To address these issues,
future work needs to draw on more reports and data on world happiness and use methods such as
controlled-variable analysis and linear regression to study possible relationships between happiness
and different factors.

References

[1] Laura M, et al. 2017 Happiness Index Methodology, Journal of Sustainable Social Change, 9, 4-31.

[2] Helliwell J F, et al. 2024. World Happiness Report 2024. University of Oxford: Wellbeing
Research Centre.

[3] Helliwell J F, et al. 2020 World Happiness Report 2020. New York: Sustainable Development
Solutions Network.

[4] Jon C and Blind S 2022 The Global Rise of Unhappiness and How Leaders Missed It, ISBN.

[5] Musikanski L and Bradbury J 2024 The Happiness Report Card 2024. Happiness Alliance happy
counts.

[6] Laura M, et al. 2017 Happiness Index Methodology. Journal of Sustainable Social Change, 9, 4-31.

[7] Joe H 2023 Measuring inequality: what is the Gini coefficient. Working paper.

[8] Filmer D P, et al. 2018 Learning-Adjusted Years of Schooling (LAYS): Defining A New Macro
Measure of Education. Journal of Sustainable Social Change.

[9] OECD, European Observatory on Health Systems and Policies 2023. Finland: Country Health
Profile 2023.

[10] Swanson A 2015 How we provoked the wrath of some of the world's most perfect people.
Washington Post, 10.


Quantum Entanglement and Qubit Interactions: The Key to Quantum Supremacy

Han Zhang

Muir College, University of California San Diego, CA, USA

Haz043@ucsd.edu

Abstract. Quantum computing operates in a fundamentally different way from classical

computing by harnessing the principles of quantum mechanics to process information. Quantum

supremacy is achieved when a quantum computer can solve problems that are beyond the

capabilities of classical systems, including the human brain, showcasing its superior processing

power. To attain quantum supremacy, quantum entanglement and qubit interactions play a

pivotal role. Quantum entanglement occurs when qubits are interconnected in a manner where

the state of one qubit directly influences the state of others, enabling the quantum computer to

perform multiple operations simultaneously. Moreover, effective interaction between qubits is

essential for the performance of complex calculations in quantum systems, highlighting the

significance of coherence and error correction. Understanding the importance of coherence in

preventing and rectifying errors in quantum computations is crucial. This paper aims to explore

the critical aspects of quantum entanglement and qubit interactions, which are foundational to

the operation of quantum computers. By delving into these key concepts, the paper aims to

elucidate their significant roles in achieving quantum supremacy. The discussion will center on

how quantum entanglement, which allows enhanced computational parallelism through qubit

interconnection, and efficient qubit interactions vital for complex computations, contribute to

surpassing the capabilities of classical computers. Comprehending these principles is crucial for

advancing quantum computing technology and overcoming the challenges to unleash its full

potential.

Keywords: Quantum Computing, Quantum Supremacy, Quantum Entanglement.

1. Introduction

Quantum information science is dedicated to comprehending and achieving quantum supremacy, the

phenomenon where quantum computers consistently outpace classical ones. This idea, in turn, is based

on the premise that it is impossible, or at least highly inefficient for classical systems to simulate

quantum systems. In the past couple of decades, there has been a surge of interest in solving the

"quantum control problem." Efforts to develop sufficiently large, controllable, macroscopic systems that

exhibit purely quantum behaviours have intensified. But the logic behind these investigations follows

directly from their assumption that achieving quantum supremacy is a worthwhile goal. Indeed, it could

push the frontiers of physics into realms that have yet to be explored. This paper investigates the
essential features of quantum entanglement and qubit interactions, which form the basis of quantum
computing. The first topic this paper covers is quantum entanglement. When qubits are connected in such a way that

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0156

(https://creativecommons.org/licenses/by/4.0/).

112

the state of one qubit directly affects the state of the others, they are said to be entangled. This
is a good starting point for understanding the principles of quantum mechanics [1,2].

From there, this paper goes into the principle of superposition, which, together with entanglement
and interaction, allows a quantum computer to perform operations in parallel and solve certain
problems at remarkable speed. Quantum computing systems require a delicate balance [1,2]. The
interactions between qubits must be controlled as precisely as the players in a well-coordinated
orchestra. Each qubit must "do its part" without error, and all must "stay coherent" long enough to
perform a demanding calculation, one that even the fastest classical computers would find
impractical. This precision is necessary not just for meaningful calculations, but for the
performance of any calculation at all.

Next, it will examine what qubit interactions mean for the "hardness" of calculations performed in a

quantum system. After that, the paper will discuss various types of qubits, some of which are more

promising than others for producing a precision computing system.

This paper will also underscore the need for clear signalling and "error-free" messaging for quantum

supremacy to be achieved. For any two qubits to maintain their "next door neighbour" relationship, they

must be coherent; that is, they must interact in a controlled fashion over a range of distances and over

an adequate number of time steps, or computational "depth." This is not an easy requirement to meet

and is arguably the most significant barrier to building large-scale quantum computers [3].

2. Fundamentals of Quantum Entanglement

The study of quantum non-locality began in 1935 with early theoretical developments. A significant
milestone was the formulation of the Einstein-Podolsky-Rosen (EPR)

paradox. In their 1935 paper, Einstein, Podolsky, and Rosen used entanglement to question the

completeness of quantum mechanics. They thought up a situation in which the measurement of an

entangled particle's position or momentum would instantaneously determine the position or momentum

of another entangled particle, no matter how far apart the two had been separated. This led to their

argument against "spooky action at a distance" and their conclusion that quantum mechanics must be

incomplete. Since then, the EPR paradox has been a huge driver of the study of entangled systems [4,5].

The local hidden variable concept subsequently took a hit from an unexpected quarter when, in the
1960s, John Bell subjected it to close inspection. Bell's theorem expressed in simple but incisive
terms what had long been suspected: local hidden variables and quantum mechanics make different
predictions, so if local hidden variables exist, quantum mechanics does not correctly describe the
world. Those who followed Bell in the 1970s and '80s then conducted a kind of trial, pitting local
hidden variables against quantum mechanics itself. The courtroom drama consistently had the same
outcome: the jury of experimenters declared for quantum mechanics [4,6].
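The contest between local hidden variables and quantum mechanics can be made concrete with the CHSH form of Bell's inequality, a standard formulation that the text above does not name explicitly. The sketch below assumes the textbook singlet-state correlation E(a, b) = -cos(a - b) and one standard choice of measurement angles; the function names are illustrative. Any local hidden variable model satisfies |S| <= 2, whereas the quantum prediction reaches 2√2.

```python
import math


def singlet_correlation(a, b):
    """Quantum prediction for spin measurements along angles a and b on a singlet pair."""
    return -math.cos(a - b)


def chsh(a, a_alt, b, b_alt, correlation=singlet_correlation):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return (
        correlation(a, b)
        - correlation(a, b_alt)
        + correlation(a_alt, b)
        + correlation(a_alt, b_alt)
    )


if __name__ == "__main__":
    # A standard choice of settings that maximises the quantum value.
    s = chsh(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
    print("quantum |S| =", round(abs(s), 3), "local hidden variable bound = 2")
```

Experiments of the kind described above measure S and compare it with the bound of 2.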

Entangled states are defined in such a way that measuring the position of one particle gives you the

position of the other particle with pinpoint accuracy. Measuring the momentum of one particle gives

you the momentum of the other particle with close to 100% certainty. Quantum non-locality is intimately

connected to entanglement and is one of the key principles used to illustrate the "weirdness" of quantum

behavior. Yet, at the same time, quantum non-locality is closely related to the concept of superposition,

which is probably the most fundamental principle of quantum mechanics. Indeed, a quantum system

exists in a kind of "schizophrenic" state, in which it simultaneously occupies several states or

configurations, until the system is measured. And this principle is crucial for understanding the behavior

of entangled particles [4,5].

The superposition collapses when the property of one particle is measured, defining the other
particle's property instantaneously, no matter how far apart they are. A loose analogy is a set of
waves in a fountain: measuring one wave and finding it at a certain height fixes the state of the
water, and with it the heights all across the fountain. The analogy is imperfect, since water waves
are classical, but it conveys how a single measurement selects one outcome for the whole system. The
equations of quantum mechanics tell you not only that certain states are superposed but also how
likely each specific "height" is when you make that measurement [4,5].
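The correlated outcomes described here can be illustrated by sampling measurements of a two-qubit Bell state, (|00> + |11>)/√2, in the computational basis. The sketch below is a toy model with illustrative function names: it applies the Born rule (probability = squared amplitude) to a fixed state and does not simulate a real device.

```python
import math
import random


def sample_bell_pair(rng):
    """Sample one joint measurement of the state (|00> + |11>) / sqrt(2)."""
    amplitudes = {"00": 1 / math.sqrt(2), "11": 1 / math.sqrt(2)}
    outcomes = list(amplitudes)
    probabilities = [abs(amp) ** 2 for amp in amplitudes.values()]  # Born rule
    return rng.choices(outcomes, weights=probabilities, k=1)[0]


def run(shots, seed=0):
    """Repeat the measurement `shots` times with a seeded generator."""
    rng = random.Random(seed)
    return [sample_bell_pair(rng) for _ in range(shots)]


if __name__ == "__main__":
    results = run(1000)
    print("00:", results.count("00"), "11:", results.count("11"))
    print("always equal:", all(r[0] == r[1] for r in results))
```

Each qubit on its own looks random, yet the two results always agree. This same-basis agreement could also be produced by classical shared randomness, so on its own it does not demonstrate non-locality; that requires comparing several measurement settings.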


An electron might have two distinct velocities or exist in two separate locations at the same time.
This is what quantum mechanics tells us, and it is what quantum correlations are all about: correlations
that are stronger than the strongest classical correlations and that can be produced only by entangled
states. In classical mechanics, the correlation between two particles can be explained by a shared
history or a direct interaction. In quantum mechanics, and especially for entangled states, that
explanation is no longer sufficient: the correlations between particles in an entangled state cannot be
understood from our classical intuition of the physical world. The particles do not possess definite
states until they are measured. When the state of one of the entangled particles is measured, the state
of the other particle, regardless of the distance between the two, is instantaneously determined.
Experiments have shown that this "spooky action at a distance" is a real phenomenon. How this can
happen is one of the great unsolved mysteries of physics. For practical applications, understanding this
phenomenon is crucial to the development of new technologies based on quantum mechanics. In his
paper "Quantum Computing and Shor's Algorithm," Matthew Hayward discusses the part that
entanglement plays in quantum computing. He explains that entanglement is a crucial resource not just
for various quantum algorithms but also for error correction methods, both of which go well beyond
the fundamentals of quantum physics. Hayward also notes that by the early 1990s it was evident that
quantum computers could do certain things much better than classical computers. Thus, the potential of
quantum computing became clear [6,7].
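The statement that entangled correlations are stronger than any classical correlation can be made quantitative. The sketch below is an illustration, not part of the original argument: it assumes the CHSH form of Bell's inequality [6] and the textbook singlet-state prediction E(a, b) = -cos(a - b), and the measurement angles are a conventional choice introduced here.

```python
import math


def singlet_correlation(angle_a: float, angle_b: float) -> float:
    """Quantum prediction E(a, b) = -cos(a - b) for spin measurements on a singlet pair."""
    return -math.cos(angle_a - angle_b)


def chsh_value(a: float, a_prime: float, b: float, b_prime: float) -> float:
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return (
        singlet_correlation(a, b)
        - singlet_correlation(a, b_prime)
        + singlet_correlation(a_prime, b)
        + singlet_correlation(a_prime, b_prime)
    )


# Any classical (local hidden variable) model obeys |S| <= 2.
CLASSICAL_BOUND = 2.0

# Assumed angle choice: the standard one that maximises |S| for the singlet state.
S = chsh_value(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)

print(f"|S| = {abs(S):.4f}, classical bound = {CLASSICAL_BOUND}")
```

With these angles the magnitude of S equals 2√2, which exceeds the classical bound of 2; this gap is what Bell-type experiments test.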

In 1994, Peter Shor, a researcher at Bell Labs, introduced a polynomial-time algorithm for factoring
large numbers. The key element here is "polynomial-time." A classical computer would take what may
as well be an eternity to factor numbers of the size Shor considered. A quantum computer, by contrast,
can use superposition, the basic principle of quantum mechanics that allows a particle (an electron,
say) to exist in multiple states simultaneously, to arrive at the answer more efficiently. Once a
quantity can be factored, the computer employing Shor's algorithm can use the factor or factors to
reconstruct the original problem's solution. The best known classical algorithm for factoring a number
$n$ (with a representation of $\log n$ bits) operates in
$O\left(\exp\left(c\,(\log n)^{1/3}(\log\log n)^{2/3}\right)\right)$, which grows faster than any
polynomial in $\log n$. In contrast, Shor's algorithm runs in $O\left((\log n)^2 \log\log n\right)$ on a
quantum computer and requires an additional $O(\log n)$ steps of post-processing on a classical
computer. In summary, the algorithm operates in polynomial time, which is a significant advancement.
Shor's algorithm has thus renewed interest in quantum computing, considering that it could upend not
only encryption but also a whole range of computational problems that require a similar sort of
number-crunching (mostly multiplying and dividing large numbers), which a classical computer must
carry out with a finite number of bits of memory and a finite number of steps. ... Two numbers are
coprime if their greatest common divisor is 1. A classical computer can calculate many such values
only at a snail's pace because it cannot evaluate them in parallel the way a quantum computer can,
using a superposition over a very large set of states to carry out the same series of operations in
fewer steps [8].

Since $F(a) = x^a \bmod n$, with base $x$ coprime to $n$, is periodic, it has a period $r$. We know
that $x^0 \bmod n = 1$ (because $x^0 = 1$), and thus $x^r \bmod n = 1$, $x^{2r} \bmod n = 1$, and so
forth. Given this periodicity and through algebraic manipulation:

$x^r \equiv 1 \pmod{n}$, (1)

$\left(x^{r/2}\right)^2 = x^r \equiv 1 \pmod{n}$. (2)

If $r$ is an even number:

$\left(x^{r/2} - 1\right)\left(x^{r/2} + 1\right) \equiv 0 \pmod{n}$. (3)

The product $\left(x^{r/2} - 1\right)\left(x^{r/2} + 1\right)$ is an integer multiple of $n$. So long as
$x^{r/2} \not\equiv \pm 1 \pmod{n}$, at least one of $\left(x^{r/2} - 1\right)$ or
$\left(x^{r/2} + 1\right)$ shares a nontrivial factor with $n$. Shor's algorithm finds the period
cleverly, using a quantum memory register that has two parts. It first places a superposition of
integers 0 to q-1 in the left


side of the register, where $q$ is a power of 2 such that $n^2 \le q < 2n^2$ (this is necessary so that
it is working in an appropriate finite field). The 0s and 1s in the left register correspond to the
$a$'s in the function $x^a \bmod n$. The right side of the register is set up to hold the result of the
function calculated from the $a$'s in the left side of the register [8].
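The classical part of the reduction above, from a known period r to factors of n, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the period is found by brute force (the step that the quantum register replaces), and n = 15 with base x = 7 is a toy example chosen here.

```python
from math import gcd


def find_period(x: int, n: int) -> int:
    """Smallest r > 0 with x**r % n == 1, found by brute force.

    This loop stands in for the quantum period-finding step.
    """
    if gcd(x, n) != 1:
        raise ValueError("x must be coprime to n")
    r, value = 1, x % n
    while value != 1:
        value = (value * x) % n
        r += 1
    return r


def factors_from_period(x: int, n: int, r: int):
    """Return (gcd(x^(r/2) - 1, n), gcd(x^(r/2) + 1, n)), or None if this x fails."""
    if r % 2 != 0:
        return None  # equation (3) needs an even period
    half = pow(x, r // 2, n)
    if half == n - 1:
        return None  # x^(r/2) is congruent to -1 mod n: only trivial factors
    return gcd(half - 1, n), gcd(half + 1, n)


r = find_period(7, 15)               # powers of 7 mod 15: 7, 4, 13, 1
print(r, factors_from_period(7, 15, r))
```

For x = 7 and n = 15 the period is 4, so x^(r/2) = 49 and the two greatest common divisors are 3 and 5. A base such as x = 14 has period 2 with x^(r/2) congruent to -1, so the sketch returns None and another base would have to be tried.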

The algorithm proceeds to calculate $x^a \bmod n$ and keep the result in the second part of the
register. The number $n$ is represented by a $\log n$ bit string. $x^a \bmod n$ has to be evaluated for
a number of values of $a$ that is exponentially large relative to the length of the input but polynomial
in $n$. After that, the second register is measured and collapses to a specific value $k$. This
measurement also projects the first register: it is left in a superposition of exactly those base states
$a$ for which $x^a \bmod n = k$ [8].

Thanks to the periodic nature of $x^a \bmod n$, the first part of the quantum register holds probability
amplitudes only for the numbers $c$, $c+r$, $c+2r$, and so on, where $c$ is the smallest integer such
that $x^c \bmod n = k$. The next step is to apply the Fourier transform to the first part of the
register. The Fourier transform concentrates the probability amplitudes on integer multiples of $q/r$,
where $q$ is the size of the first part of the register. When the first part of the register is measured
after the Fourier transform, it will likely yield a value close to a multiple of $q/r$, that is, a
multiple of the inverse period scaled by $q$. A classical computer then post-processes the measured
value to recover the period $r$ and, from that, the factors of $n$ [8].
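The effect of the Fourier transform on a periodic register can be checked numerically on a small case. The sketch below is a classical simulation, not the quantum procedure itself: it assumes n = 15, x = 7, period r = 4, register size q = 256 (the power of 2 between n² = 225 and 2n² = 450), and offset c = 2 (since 7² mod 15 = 4), and it uses a continued-fraction step via `Fraction.limit_denominator` as one common way to do the classical post-processing.

```python
import cmath
from fractions import Fraction


def fourier_probabilities(support, q):
    """Measurement probabilities after a discrete Fourier transform of a uniform
    superposition over the basis states in `support`, for a register of size q."""
    norm = 1.0 / (len(support) * q)
    probs = []
    for y in range(q):
        amplitude = sum(cmath.exp(2j * cmath.pi * a * y / q) for a in support)
        probs.append(norm * abs(amplitude) ** 2)
    return probs


n, r_true, q, c = 15, 4, 256, 2          # toy values, assumed for illustration
support = list(range(c, q, r_true))      # c, c + r, c + 2r, ...
probs = fourier_probabilities(support, q)

# Outcomes that carry essentially all the probability: multiples of q / r = 64.
peaks = [y for y, p in enumerate(probs) if p > 0.1]

# Classical post-processing: approximate y / q by a fraction with denominator <= n.
candidates = [Fraction(y, q).limit_denominator(n).denominator for y in peaks]

print(peaks, candidates)
```

In this example the probability sits on the outcomes 0, 64, 128 and 192. The denominators recovered from 64 and 192 equal the period 4, while 0 and 128 give only a divisor of it, which is why the procedure may need to be repeated.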

3. Qubit Interactions and Their Importance

In quantum computation, qubit interaction is fundamental to information processing. It can be
understood by comparing it to two basic ways in which people communicate. Direct interaction is like
two people talking face-to-face: the qubits influence one another directly, for example through the
electric or magnetic fields they generate. Indirect interaction is like two people talking through a
friend in between: the friend relays what each has said without any direct influence from one speaker
on the other. In indirect interaction, two qubits influence one another through an intermediary rather
than through direct physical contact. These are the two basic ways qubits can interact in what are
called "quantum circuits" [9-11].

Superconducting qubits have long been viewed as the leading platform for large-scale quantum
computing due to their favorable coherence properties, ease of fabrication, and potential for
integration within a highly scalable architecture. They are built around Josephson junctions, which act
as non-linear inductors, and rely on the quantization of magnetic flux in superconducting loops.
Despite these advantages, the primary challenge to building a fault-tolerant quantum computer with
them is that their operational speed, which physicists refer to as the reset time, has not kept pace
with the demands of error correction, a strictly required feature of any large-scale quantum processor.

In trapped-ion systems, individual ions that are confined and manipulated by electromagnetic fields
serve as qubits, the basic units of information in a quantum computer. In terms of operations and
coherence, that is, the ability of the system to maintain a superposition of quantum states, trapped
ions come very close to perfection, and Häffner's group is working to make them the most robust,
error-correctable system of qubits. This promise has led to a sharp increase in the number of research
groups working with trapped ions. According to the review "Quantum Computing with Trapped Ions" by
Häffner and co-authors, about six groups were doing this work in 2000, and over 25 by 2008 [9-11].

In 1995, Cirac and Zoller suggested using groups of ions as the basis of a quantum computer. They

introduced what one might call the "user manual" for establishing a quantum logic gate using an

elementary operation similar to one performed in an ordinary logic gate: a conditional phase shift. This

is like saying "if…then…" in a digital operation. Their ideas were notional, involving the motion of ions

in potential energy wells created by carefully managed electromagnetic fields. But these ideas were put

to work in what is essentially an industrial-strength laboratory—one belonging to David Wineland and

his group at the National Institute of Standards and Technology. They were able to demonstrate a couple

of elementary two-qubit operations and entangle up to four ions. The development of ion trap

technologies has been greatly assisted by targeted research projects in Europe. These microfabricated


devices are becoming more and more sophisticated and could soon surpass the state-of-the-art

superconducting qubit technologies. Meanwhile, quantum dots—tiny semiconductor particles that

confine electrons—also hold great promise. The early 1980s saw the advent of a new form of lithography

that allowed scientists to create structures that confine electrons in very small spaces. These structures

are so small that they exhibit "quantum" behaviour. Quantum dots are tiny in size but are robust enough

in their electronic behaviour to serve as the basic building blocks of several proposed quantum

computing architectures [9-11].

Texas Instruments created the first quantum dots, 250 nm in size, using lithography. AT&T Bell Labs

and Bell Communications Research later produced even smaller dots, 30-45 nm in diameter. Because

the confined electrons in these dots behave similarly to those in atoms, scientists refer to them as

"artificial atoms." In quantum dots, scientists have precise control over shape, size, and number of

confined electrons, making these nanostructures highly valuable for studying complicated physical

phenomena and for observing quantum effects in crystals. Researchers are particularly interested in the

optical and electrical properties of quantum dots. Fundamental research and technological advances

stand to benefit greatly from the use of quantum dots. Reed and his colleagues developed the original

method for producing them. This technique involves using a structure whose essential component is a

two-dimensional electron gas. The process starts with a sample that has one or more quantum wells. A
polymer mask covers the sample's surface and is then partially exposed to an electron or ion beam; a
beam is used instead of light because high resolution is required. The mask is removed in the exposed
areas, which transfers the pattern from the mask to the sample, and the surface then receives a metal
deposition. Finally, the remaining mask is removed together with the metal on top of it, so that the
clean surface of the sample carries the metal layer only in the previously exposed areas [9-11].

Areas unprotected by the metal mask are removed using chemical etching, which undercuts the last

quantum well and the buffer layer. The pillars left behind are ten to 100 nanometres in diameter, and

contain the fragments of a quantum well. A base of chromium-doped GaAs beneath the last quantum

well feeds in carriers. The carriers flow into the twenty GaAs quantum wells above. The etching creates

a structure in which the flow of carriers is well controlled, with a remaining gold mask acting as an

electrode. By applying a voltage between the mask and the base, one can control the number of carriers

in the structure.

Finally, photons used as qubits are very effective for quantum communication. This is because
photons interact very little with their environment; in other words, they are very "quiet" in that the
states they occupy do not change much under varying environmental conditions. For this reason,
photons are able to maintain their "quantum-ness" for a long time and travel long distances with
minimal decoherence. This characteristic makes photons and fiber optics very attractive for
constructing quantum networks, because signals carried by the quantum states of light can be made
very secure and very hard to eavesdrop on.

In addition, photonic networks can be readily joined with current fiber-optic infrastructures, enabling

the actual deployment of quantum networks. Optical technologies that people have in hand, and know

well how to use, can efficiently create, manipulate, and detect the quantum carriers of the information—

photons. These are "light" networks in a very real sense; the quantum states of the photons are used to

encode and process the information. And the properties of the photons themselves allow us to think of

new ways to encode that information, which supports the development of protocols for much more

complex, and much more powerful, quantum networks.

To conclude, employing photons as qubits in photonic systems builds a strong and scalable basis for

quantum communication and offers a potential future for sending information that is both secure and

efficient.


4. Achieving Quantum Supremacy

The key moment at which a quantum computer surpasses a classical computer in raw capability comes
when quantum phenomena such as entanglement, superposition, and interference are harnessed to solve
extremely difficult problems. This moment is termed quantum supremacy: the point at which a quantum
computer can solve a computationally hard problem in a significantly shorter time and with far fewer
steps than a classical computer can. For instance, Google's Sycamore processor performed a hard
sampling computation on 53 superconducting qubits in just 200 seconds. When you consider that the
same task was estimated to take 10,000 years on a classical supercomputer, you gain a sense of what
might be referred to as a quantum moment. Decoherence is one of the primary barriers to achieving
quantum supremacy [12,13].

Using the two-slit experiment, scientists can demonstrate interference in the type of systems that
might potentially achieve quantum supremacy. However, in attempting to create more complex systems
that can perform computations, scientists must be wary of decoherence impeding their efforts. The
two-slit experiment also serves as a useful metaphor for considering how much progress has been made
toward true quantum computers. In the experiment, a beam of particles passes through a first screen
with two slits and is detected at a second screen. Using classical physics, one would expect the
distribution of particles on the second screen to resemble the two slits themselves; a coherent
quantum system instead produces an interference pattern, which is washed out once coherence is lost.
Similarly, the duration for which a quantum sensor can maintain coherence determines its sensitivity.
The nature of decoherence, and its impact on the operation of quantum devices, runs parallel to the
error processes that ordinary sensors go through. When you build an ordinary device to sense a
specific quantity, you have to work hard to find and fix the errors that the device makes during
operation. The same goes for quantum devices, except that the errors must be found and fixed at the
level of fragile quantum states before they can affect the measured result [12,13].
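The difference between the classical expectation and the quantum result in the two-slit experiment can be written compactly. In the sketch below, the symbols ψ₁(x) and ψ₂(x), the amplitudes for reaching position x on the second screen through slit 1 and slit 2, are notation introduced here for illustration.

```latex
P_{\text{classical}}(x) = |\psi_1(x)|^2 + |\psi_2(x)|^2,
\qquad
P_{\text{quantum}}(x) = |\psi_1(x) + \psi_2(x)|^2
  = |\psi_1(x)|^2 + |\psi_2(x)|^2
  + 2\,\mathrm{Re}\!\left[\psi_1^{*}(x)\,\psi_2(x)\right].
```

The last term is the interference term. Decoherence suppresses it, so a fully decohered system reproduces the classical distribution.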

Quantum devices are currently built in two ways: in an environment where decoherence is suppressed,
and in a regime where the system is transitioning between bit states. When sensing with a qubit, you
work under an umbrella that keeps noise from the outside world from affecting the qubit. The longer
you listen using either of these strategies, the more you cancel out the noise from inside the
generally noisy quantum circuit and from the outside world.

5. Conclusion

Achieving quantum supremacy is not just a matter of stuffing a bunch of superconducting qubits at a

frigid temperature in the right place and hoping for the best. You have to make those qubits interact in

a very particular way. Entanglement is key. While a classical computer might perform a calculation in
two steps, a quantum computer could in principle do the same task in parallel and in half the time,
with the qubits, or entangled pairs of qubits, sharing correlated states. That's the working theory,
anyway. Google's Sycamore processor reportedly achieved this effect with 53 superconducting qubits in
2019, in a landmark demonstration of quantum supremacy. The processor took about 200 seconds to
solve a problem that a classical supercomputer was estimated to need 10,000 years to solve.

The challenges of realizing the full potential of quantum computing are the sorts of problems that

physics departments live to solve. They are hard, and they require imaginative solutions. Imagine, for

instance, trying to ensure that the basic unit of quantum computing, the qubit, remains in the fragile

quantum state needed to perform a long series of calculations. If it can hardly be done with the five or

six atoms that some groups have used to represent a single qubit, how is it going to be done with the

tens or hundreds of qubits needed for any useful computation?

There are many different strategies to create and manipulate qubits, with each offering distinct

advantages and facing its own unique challenges. For example, superconducting qubits are known for

their rapid operation speed, whereas qubits made from trapped ions afford much greater precision and

much longer coherence times. On the other hand, quantum dots have the potential for extremely high

integration densities, and qubits based on photons might be the best bet for building a quantum


communicator, given how weakly they interact with their environment.

To conclude, the quest for quantum supremacy is both impressive and intimidating. It has seen some

great successes in the lab but has also faced some serious challenges. At the center of this work is the

use of a basic element of quantum mechanics—quantum entanglement. The interplay of entanglement

and qubit interactions is at the heart of the supremacy argument, and working with these systems is at

the heart of the development of a useful quantum computer.

References

[1] Preskill J 2012 Quantum computing and the entanglement frontier 25th Solvay Conf.

[2] Harrow A W and Montanaro A 2017 Quantum computational supremacy Nature 549 203

[3] Achieving Quantum Supremacy 2019 The Current, news.ucsb.edu/2019/019682/achieving-quantum-supremacy (Accessed 1 July 2024)

[4] Methot A A and Scarani V 2007 An anomaly of non-locality Quantum Information and

Computation 7 1–2

[5] Einstein A, Podolsky B and Rosen N 1935 Can quantum-mechanical description of physical

reality be complete? Physical Review 47 777–80

[6] Bell J S 1964 On the Einstein-Podolsky-Rosen paradox Physics 1 195–200

[7] What Is Superposition and Why Is It Important? Caltech Science Exchange, scienceexchange.caltech.edu/topics/quantum-science-explained/quantum-superposition (Accessed 2 July 2024)

[8] Hayward M 2008 Quantum computing and Shor’s algorithm Sydney: Macquarie University

Mathematics Department 1

[9] Devoret M H, Wallraff A and Martinis J M 2004 Superconducting qubits: A short review arXiv

preprint cond-mat/0411174

[10] Häffner H, Roos C F and Blatt R 2008 Quantum computing with trapped ions Phys. Rep. 469

155–203

[11] Jacak L, Hawrylak P and Wojs A 1998 Quantum Dots Springer

[12] Bacciagaluppi G 2020 The Role of Decoherence in Quantum Mechanics The Stanford
Encyclopedia of Philosophy (Fall 2020 Edition) Edward N Zalta (ed.) https://plato.stanford.edu/archives/fall2020/entries/qm-decoherence/

[13] Salhov A, Cao Q, Cai J, Retzker A, Jelezko F and Genov G 2024 Protecting Quantum Information

via Destructive Interference of Correlated Noise Phys. Rev. Lett. 132 223601


Quantum Neural Networks: A New Frontier

Boyu Zhang

The Hong Kong Polytechnic University, Hong Kong, China

314chirs271@gmail.com

Abstract. In recent years, there has been remarkable progress in improving the availability of

resources and refining algorithms for quantum computing. Since the late 1980s, the scientific

community has been fascinated by the idea of harnessing quantum phenomena to tackle

computational problems. This article provides a comprehensive exploration of the foundational

theories and practical applications of quantum neural networks (QNNs), highlighting their

potential to transform machine learning through unique features like quantum parallelism and

entanglement. It delves into various QNN architectures, such as quantum circuits and hybrid

quantum-classical models, showcasing their effectiveness in handling intricate computational

tasks more efficiently than traditional neural networks. Furthermore, the article examines the

current challenges and future prospects in this rapidly advancing field, emphasizing the pivotal

role of QNNs in driving forward research in both quantum computing and artificial intelligence.

Quantum neural networks are poised to not only enhance computational capabilities but also

pave the way for groundbreaking innovations in diverse technological domains.

Keywords: Quantum Neural Networks, Machine Learning, Quantum Computing.

1. Introduction

Quantum Neural Networks (QNNs) are a new paradigm in machine learning that arises from the
convergence of quantum computing and neural networks. Image recognition, natural language
processing, and game playing are some of the domains where traditional neural networks have achieved
remarkable success.

However, they are limited by the inherent constraints of classical computation, particularly in handling

exponentially large data spaces and complex optimization problems.

Quantum computing, which exploits superposition, entanglement, and quantum parallelism, provides a

viable alternative to these limitations. By leveraging quantum mechanics, QNNs have the potential to

perform computations that are infeasible for classical systems, enabling significant advancements in

speed and efficiency.

The objective of this paper is to give a complete overview of QNNs, beginning with their theoretical

underpinnings and extending to practical implementations. We will explore different QNN architectures,

including fully quantum and hybrid quantum-classical models, and examine their performance on

various machine learning tasks. Additionally, we will address the current challenges in the field, such

as error correction, decoherence, and scalability, and propose potential future research directions.

By bridging the gap between quantum computing and artificial intelligence, QNNs represent a

transformative step towards the next generation of intelligent systems. This paper seeks to highlight their

importance and potential impact, providing a roadmap for researchers and practitioners in both fields.

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0157

(https://creativecommons.org/licenses/by/4.0/).

119

2. Concepts

2.1. Quantum Computing

Quantum computing relies on the quantum bit, or qubit, which is the fundamental unit of quantum
information. A classical bit can only be in one of the states 0 or 1. In quantum computing, however,
information can be recorded as $|0\rangle$, $|1\rangle$, or any quantum state that uses them as basis
vectors. A qubit can be represented in a two-dimensional complex Hilbert space.

In classical computing, a bit occupies a single state at any moment. Conversely, in quantum
computing, a qubit can exist in state $|0\rangle$, state $|1\rangle$, or any linear combination of the
two. When measured, this superposition collapses, with the outcome determined by the probability
distribution defined by the qubit's amplitudes. Quantum superposition thus allows a qubit to be in
multiple states at once until measurement.
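In symbols, the superposition and its measurement statistics can be written as follows. The amplitude names α and β are notation introduced here for illustration.

```latex
|\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle,
\qquad \alpha, \beta \in \mathbb{C},
\qquad |\alpha|^2 + |\beta|^2 = 1,
\qquad P(0) = |\alpha|^2,
\qquad P(1) = |\beta|^2 .
```

A measurement returns 0 with probability |α|² and 1 with probability |β|², after which the qubit is left in the corresponding basis state.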

Figure 1. The Bloch Sphere Representation of a Qubit State [1]

As Figure 1 shows, this is visually represented on the Bloch sphere, where a qubit's state is depicted
as a point on the surface of the sphere. The position of this point is determined by the angles θ and φ:
θ fixes the probabilities of measuring the qubit in each basis state, and φ fixes the relative phase
between them. The Bloch sphere representation is particularly useful for understanding quantum
operations and the effects of quantum gates on qubits, as it provides a clear geometric interpretation
of these complex quantum phenomena.
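The relation between the Bloch angles and the measurement probabilities can be checked with a short calculation. This is a minimal sketch assuming the usual parametrisation cos(θ/2)|0⟩ + e^{iφ} sin(θ/2)|1⟩; the function names are hypothetical.

```python
import cmath
import math


def bloch_state(theta: float, phi: float):
    """Amplitudes (alpha, beta) of cos(theta/2)|0> + exp(i*phi) sin(theta/2)|1>."""
    alpha = complex(math.cos(theta / 2))
    beta = cmath.exp(1j * phi) * math.sin(theta / 2)
    return alpha, beta


def probabilities(state):
    """Probabilities of measuring 0 and 1 for a state given as (alpha, beta)."""
    alpha, beta = state
    return abs(alpha) ** 2, abs(beta) ** 2


north_pole = probabilities(bloch_state(0.0, 0.0))        # the state |0>
equator = probabilities(bloch_state(math.pi / 2, 1.3))   # phi does not change the odds

print(north_pole, equator)
```

Only θ affects the two probabilities; φ changes the relative phase, which matters once further gates are applied.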

Quantum entanglement is a phenomenon in which two or more qubits are linked so that the state of
one qubit depends on the state of the others. If one qubit in an entangled system is measured, the
states of all the other qubits are affected.

Quantum gates form the foundational components of quantum circuits. These gates modify the states

of qubits and are usually depicted by unitary matrices. Owing to quantum mechanical principles like

superposition and entanglement, quantum gates can execute intricate operations.

Figure 2 lists some basic quantum gates. The Pauli-X gate (NOT gate) flips the state of a qubit: it
changes $|0\rangle$ to $|1\rangle$ and $|1\rangle$ to $|0\rangle$. The Pauli-Z gate applies a phase
flip, which leaves $|0\rangle$ unchanged and maps $|1\rangle$ to $-|1\rangle$. The Hadamard gate
transforms a basis state into an equal superposition of the basis states, so that the qubit has an equal
probability of being measured as 0 or 1. The T gate (π/8 gate) applies a phase shift of $\pi/4$: it
leaves $|0\rangle$ unchanged and maps $|1\rangle$ to $e^{i\pi/4}|1\rangle$. The CNOT gate flips the
state of the target qubit if the control qubit is in the $|1\rangle$ state. It is essential for creating
entanglement between qubits. The SWAP gate swaps the states of two qubits: if the first qubit is in
state $|a\rangle$ and the second in state $|b\rangle$, then after the SWAP gate the first qubit will be
in state $|b\rangle$ and the second in state $|a\rangle$ [2].
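The gate actions described above can be verified by writing each gate as a matrix and multiplying it into a state vector. The sketch below uses plain Python lists; the helper `apply` and the basis ordering |00⟩, |01⟩, |10⟩, |11⟩ (first qubit as control) are conventions assumed here.

```python
import cmath
import math


def apply(gate, state):
    """Multiply a square matrix (list of rows) by a state vector (list of amplitudes)."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]


S2 = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]                               # Pauli-X (NOT)
Z = [[1, 0], [0, -1]]                              # Pauli-Z (phase flip)
H = [[S2, S2], [S2, -S2]]                          # Hadamard
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]     # T gate: phase pi/4 on |1>

# Two-qubit gates in the basis |00>, |01>, |10>, |11>.
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

KET0, KET1 = [1, 0], [0, 1]

print(apply(X, KET0))                 # amplitudes of |1>
print(apply(Z, KET1))                 # amplitudes of -|1>
print(apply(CNOT, [0, 0, 1, 0]))      # |10> becomes |11>
print(apply(SWAP, [0, 1, 0, 0]))      # |01> becomes |10>
```

Each matrix is unitary, so applying it never changes the total probability of the state.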


Figure 2. Basic Quantum Gates and Their Matrix Representations [3]

Quantum circuits are composed of sequences of quantum gates. A quantum circuit implements a
quantum algorithm, and some such algorithms solve particular problems with greater efficiency than
known classical algorithms. To achieve the desired quantum state transformations, quantum circuits
must be constructed by arranging quantum gates in a specific order.

The design of quantum circuits requires careful consideration of the order and type of gates used, as

each gate affects the qubits in a unique way. For instance, a phase shift introduced by a Z gate can alter

the phase relationship between qubit states, which is crucial for certain quantum computations like

quantum Fourier transforms. Furthermore, error correction protocols often incorporate additional gates

and ancillary qubits to protect against decoherence and other quantum noise, ensuring the reliability of

the circuit.
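The point that gate order matters can be seen with two gates on a single qubit. The sketch below compares applying H then Z with applying Z then H to |0⟩; the helper `apply` and the variable names are assumptions made for this illustration.

```python
import math


def apply(gate, state):
    """Multiply a square matrix (list of rows) by a state vector."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]


S2 = 1 / math.sqrt(2)
H = [[S2, S2], [S2, -S2]]
Z = [[1, 0], [0, -1]]
KET0 = [1, 0]

h_then_z = apply(Z, apply(H, KET0))   # (|0> - |1>) / sqrt(2)
z_then_h = apply(H, apply(Z, KET0))   # (|0> + |1>) / sqrt(2)

# Both states give 50/50 outcomes if measured now, but they differ in relative phase.
# A further Hadamard turns that phase difference into different measurement results.
after_h_1 = apply(H, h_then_z)        # close to |1>
after_h_2 = apply(H, z_then_h)        # close to |0>

print(h_then_z, z_then_h)
print(after_h_1, after_h_2)
```

The phase introduced by the Z gate is invisible to an immediate measurement but decides the outcome after the next Hadamard, which is why circuits such as the quantum Fourier transform depend on the exact gate sequence.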

2.2. Classical Neural Networks

Classical neural networks (NNs) are the cornerstone of modern artificial intelligence and machine

learning. Neurons are the basic units of neural networks. Each neuron receives input, processes it,
and produces output. The basic structure of a neuron includes inputs, weights, a bias, an activation
function, and an output. The weights determine the strength of the connections between neurons, while
the bias shifts the weighted sum of the inputs before the activation function produces the output.
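A single classical neuron can be written directly from this description. The sketch below is illustrative only: the sigmoid activation and the example weights are choices made here, not taken from the text.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Weighted sum: 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0, and sigmoid(0) = 0.5.
output = neuron([1.0, 2.0], [0.5, -0.25], bias=0.0)
print(output)
```

Raising the bias moves the weighted sum upward and pushes the sigmoid output toward 1 without changing any connection strength.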

The typical structure of a neural network consists of multiple layers: an input layer, one or more

hidden layers, and an output layer. The input layer receives and processes the raw data, the hidden layer

transforms this data through multiple operations, and the output layer provides the final output. These

networks can range from simple feedforward structures to more complex configurations such as
convolutional neural networks (CNNs) and recurrent neural networks (RNNs).

In the process of training a neural network, the weights and biases are adjusted to minimize the
output error. The required gradients are usually computed by backpropagation. The optimization
algorithm is also an important part of neural network training. Gradient descent is the most commonly
used technique; it adjusts the model parameters along the negative gradient direction of the loss
function. Variants of gradient descent, such as stochastic gradient descent (SGD), RMSprop, and Adam,
offer improvements in speed of convergence and stability [4].
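The update rule of plain gradient descent fits in a few lines. The sketch below minimises a one-parameter quadratic loss chosen here for illustration; the learning rate and step count are assumed values.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


# Toy loss L(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), start=0.0)
print(w_final)
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per step; a rate above 1.0 would make the same loop diverge, which is the stability issue the adaptive variants address.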


3. Quantum Neural Networks (QNNs)

Quantum neural networks (QNNs) combine the two concepts above, quantum computing and neural
networks, with the aim of enhancing computing power through the principles of quantum mechanics.

QNNs integrate quantum computing principles into neural network frameworks. Qubits can exist in
superposition, representing both 0 and 1 simultaneously, and can also become entangled with each
other, creating intricate correlations that classical neural networks are unable to replicate. These
features enable QNNs to perform parallel computing on an unprecedented scale, potentially providing
significant acceleration on certain types of problems.

The fundamental architecture of a QNN is comparable to that of a conventional neural network, but
it employs qubits and quantum gates instead of conventional bits and logic gates. A typical QNN
consists of quantum neurons that process information through unitary transformations, which preserve
the norm of the state and hence the total probability. Common quantum gates in QNNs include Hadamard
gates, CNOT gates, and Pauli-X gates, which are used to manipulate qubits to perform the necessary
calculations in the network.

Quantum neurons can represent and process information in ways that classical neurons cannot. For
example, the principle of quantum parallelism enables a quantum neuron to process multiple input
states concurrently. The architecture of a QNN can vary, but common models include quantum
feedforward neural networks and quantum convolutional neural networks [5].

Mathematically, a quantum neuron can be expressed as $|\psi_{\text{out}}\rangle = U|\psi_{\text{in}}\rangle$,
where $|\psi_{\text{in}}\rangle$ is the input quantum state, $|\psi_{\text{out}}\rangle$ is the output
quantum state, and $U$ is a unitary operator acting on the input state.

Quantum states in QNNs allow for superposition, enabling parallel processing beyond classical bits.

Quantum gates manipulate these states to perform calculations. Entanglement links qubits, allowing

them to influence each other over long distances, creating highly interconnected networks that solve

complex problems more efficiently.

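For instance, the application of a Hadamard gate ($H$) to a qubit in state $|0\rangle$ creates a
superposition:

$H|0\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right)$ (1)

This superposition state can then be entangled with another qubit using a CNOT gate, creating an
entangled pair:

$\mathrm{CNOT}\left(\frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right) \otimes |0\rangle\right) = \frac{1}{\sqrt{2}}\left(|00\rangle + |11\rangle\right)$ (2)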

The application of quantum states and operations in neural networks offers new opportunities for

solving problems in various domains, from optimization to pattern recognition.
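The Hadamard and CNOT steps in equations (1) and (2) can be reproduced with small state vectors. This is a minimal sketch in plain Python; the helpers `kron` and `apply` and the basis ordering |00⟩, |01⟩, |10⟩, |11⟩ are conventions assumed here.

```python
import math


def kron(a, b):
    """Tensor product of two state vectors."""
    return [x * y for x in a for y in b]


def apply(gate, state):
    """Multiply a square matrix (list of rows) by a state vector."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]


S2 = 1 / math.sqrt(2)
H = [[S2, S2], [S2, -S2]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

plus = apply(H, [1, 0])                    # equation (1): (|0> + |1>) / sqrt(2)
bell = apply(CNOT, kron(plus, [1, 0]))     # equation (2): (|00> + |11>) / sqrt(2)

print(plus)
print(bell)
```

The resulting vector has weight only on |00⟩ and |11⟩, so measuring either qubit fixes the outcome of the other.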

4. Design and Implementation of QNNs

Quantum neurons are the basic elements of a QNN. They manipulate qubits and perform calculations
using quantum gates. The design of quantum neurons involves defining unitary operations that
transform the input quantum state into the desired output state. Quantum activation functions are
similar to classical activation functions but need to be adapted to the properties of quantum states,
usually through unitary transformations.

For example, a quantum neuron might use a combination of Hadamard and Pauli-X gates to create a non-linear transformation U built from H and X, where H is the Hadamard gate and X is the Pauli-X gate. This combination can create complex transformations necessary for processing quantum information.

The quantum layer of a QNN is composed of multiple quantum neurons. The input qubits are

processed by a set of quantum operations by each layer, converting them into output qubits. Quantum

weights are used to adjust the magnitude and phase of qubits to optimize network performance. Unlike

classical weights, quantum weights need to be managed in a way that preserves the coherence of the

quantum states.


DOI: 10.54254/2753-8818/41/2024CH0157


Mathematically, a quantum layer can be represented as $|\psi_{out}\rangle = U_2 U_1 |\psi_{in}\rangle$, where $U_1$ and $U_2$ are unitary operators representing the transformations applied by the neurons in the layer.

Training a QNN involves optimizing quantum weights to minimize errors in the network output. Due to the nature of quantum data and operations, this process is much more complex than training classical neural networks. Quantum training algorithms use the superposition and entanglement properties of quantum states to search the parameter space efficiently and find the optimal solution.

Quantum gradient descent (QGD) is an adaptation of the classical gradient descent algorithm in

quantum systems. The process entails calculating the slope of the quantum loss function with respect to

the quantum weights and iteratively changing those weights to decrease the loss. The challenge is to

efficiently calculate these gradients while maintaining the coherence of quantum systems [6].

In a QNN, the loss function $L$ is defined as:

$L = \langle\psi_{out}|\hat{O}|\psi_{out}\rangle$ (3)

where $\hat{O}$ is the observable quantity relevant to the task. The gradient descent update rule is as follows:

$w_{ij}^{(t+1)} = w_{ij}^{(t)} - \eta \frac{\partial L}{\partial w_{ij}}$ (4)

where $\eta$ is the learning rate and $w_{ij}$ are the quantum weights.
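The update rule can be illustrated on a toy model with a single weight. The sketch below assumes a one-qubit circuit RY(θ) acting on |0> with the Pauli-Z observable as the loss, and it estimates the gradient with the parameter-shift rule. Both the circuit and the parameter-shift rule are choices made here for illustration, not the method of the paper, and all function names are hypothetical.

```python
import math

def expectation_z(theta):
    """Loss L(theta) = <psi|Z|psi> for |psi> = RY(theta)|0>."""
    # RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>
    a0 = math.cos(theta / 2.0)
    a1 = math.sin(theta / 2.0)
    return a0 * a0 - a1 * a1

def parameter_shift_gradient(theta):
    """dL/dtheta from two loss evaluations at theta +/- pi/2."""
    shift = math.pi / 2.0
    return 0.5 * (expectation_z(theta + shift) - expectation_z(theta - shift))

def train(theta, eta=0.4, steps=100):
    """Gradient descent: theta <- theta - eta * dL/dtheta."""
    for _ in range(steps):
        theta -= eta * parameter_shift_gradient(theta)
    return theta

theta_final = train(theta=0.3)
final_loss = expectation_z(theta_final)
print(f"theta = {theta_final:.4f}, loss = {final_loss:.4f}")
```

In this toy case the loss equals cos(θ), so the iteration drives θ toward π, where the loss reaches its minimum of -1. On hardware each loss evaluation would be an estimate from repeated measurements rather than an exact value.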

Quantum backpropagation is the quantum equivalent of a classical backpropagation algorithm. It

involves backpropagating the error gradient through the network to update the quantum weights. This

process uses quantum gates to calculate the gradient and make the necessary adjustments to the quantum

state [7].

Quantum backpropagation can be formulated using the adjoint of the quantum operations:

$\delta_i = U_i^{\dagger} \delta_{i+1}$ (5)

where $\delta_i$ represents the error term at layer $i$, and $U_i^{\dagger}$ is the adjoint (inverse) of the unitary operator $U_i$.

5. Advantages of QNNs

An immediate advantage of quantum computing is its potential speed. Qubits can exist in superpositions of multiple states simultaneously, so quantum computing can process data in parallel. In contrast, classical computing requires processing each state sequentially. In QNNs, this parallel processing capability is used to accelerate the training and inference processes of neural networks.

According to some studies, quantum computing could theoretically achieve an exponential speedup when solving certain problems. For example, Shor's algorithm factors integers in polynomial time, whereas the best known classical algorithms require super-polynomial time [8]. This suggests that in the training of large data sets and complex models, QNNs could significantly reduce computation time and thus improve efficiency. Some results indicate that variational quantum algorithms (VQAs) are more efficient than classical algorithms when dealing with complex optimization problems, particularly in image segmentation. This efficiency boost is important for deep learning tasks that require a lot of computing resources.

Another significant advantage of quantum computing is its energy efficiency. The parallel processing

capabilities of quantum computing make it consume much less energy than classical computing for the

same computational task. For example, quantum circuits can perform complex matrix operations with

low energy consumption, which is particularly important in large-scale neural network training. The

high energy efficiency of quantum computing not only helps to reduce energy consumption but also can

significantly reduce computing costs.

High-dimensional data processing is a key challenge in modern machine learning and data science. When dealing with high-dimensional data, traditional neural networks often face the curse of dimensionality, that is, the computational complexity increases exponentially with the number of data dimensions. Quantum computing can process data more efficiently in high-dimensional space.

The superposition property of quantum states allows qubits to represent multiple states simultaneously, allowing for parallel computation in high-dimensional spaces. For example, the state of a system of n qubits is a superposition of 2^n basis states. This capability allows QNNs to significantly reduce computation time when working with high-dimensional data.
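The 2^n scaling can be made concrete with a state-vector simulation. This is a minimal sketch, assuming NumPy and a hypothetical helper `uniform_superposition` that applies a Hadamard gate to each of n qubits initialized in |0>.

```python
import numpy as np

def uniform_superposition(n_qubits):
    """State vector after a Hadamard gate on each of n qubits in |0>."""
    h_on_zero = np.array([1.0, 1.0]) / np.sqrt(2.0)
    state = np.array([1.0])
    for _ in range(n_qubits):
        # The tensor product doubles the number of amplitudes per qubit.
        state = np.kron(state, h_on_zero)
    return state

state = uniform_superposition(3)
print(state.size)           # number of basis states for 3 qubits
print(np.sum(state ** 2))   # total probability
```

A classical simulation has to store all 2^n amplitudes explicitly, which is why the vector doubles in length with every added qubit, while a measurement of the quantum register returns only one basis state per run.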

Another advantage of quantum computing when dealing with high-dimensional data is its ability to

perform dimensionality reduction and feature selection operations efficiently. The optimal feature subset

can be quickly found in the high-dimensional space, thus improving the efficiency and accuracy of data

analysis. For example, techniques such as quantum state projection and quantum Fourier transform

(QFT) are widely used in quantum feature selection [9].

Quantum parallelism allows quantum neural networks to inspect multiple possible solutions

simultaneously, resulting in significantly improved computational efficiency. Quantum parallelism is

achieved through the superposition of qubits, allowing multiple computation paths to occur

simultaneously. This property is particularly important for training and inference in large neural

networks.

Specifically, quantum parallelism can improve the performance of QNNs at multiple levels. For

example, during training, quantum gradient descent algorithms can compute multiple gradients

simultaneously, thus speeding up the convergence process. In inference, quantum parallelism can speed

up the prediction process and improve real-time processing power.

Quantum parallelism also plays an important role in optimization algorithms. Algorithms such as quantum particle swarm optimization and quantum genetic algorithms, for example, greatly improve optimization efficiency and accuracy by exploring multiple solution spaces in parallel.

6. Applications

Quantum Neural Networks (QNNs) represent a significant advancement in artificial intelligence by

integrating quantum computing principles with classical neural network frameworks. This synthesis

offers potential improvements across various domains, including image recognition, natural language

processing (NLP), financial forecasting, and bioinformatics.

Quantum Convolutional Neural Networks (QCNNs) leverage quantum computing's parallelism to

process image features simultaneously, enhancing accuracy and reducing computational demands.

One study [10] has shown QCNNs' superior performance in tasks like CT scan image classification,

demonstrating higher accuracy than classical CNNs.

In NLP, QNNs utilize quantum superposition and entanglement to manage complex linguistic

relationships, benefiting tasks such as sentiment analysis and machine translation. Research by

Ravikumar et al. [11] has indicated that QNNs improve processing speed and accuracy, especially with

large datasets.

QNNs' capability to handle extensive financial data enables more accurate market trend predictions

and risk management. El Bouchti et al. [12] and E. Paquet et al. [13] highlighted the efficiency of QNNs

in financial forecasting, with notable improvements over classical approaches.

In bioinformatics, QNNs enhance the analysis of biological data, such as genetic sequences. The

study by Tao et al. [14] introduced Quantum Bound, a hybrid neural network that integrates classical

and quantum elements, optimizing the analysis of complex biological datasets.

7. Conclusion

Quantum Neural Networks (QNNs) represent a significant leap in the fusion of quantum computing and

artificial intelligence, offering unparalleled computational capabilities. By leveraging quantum

superposition and entanglement, QNNs can execute complex calculations and data processing tasks

more efficiently than classical neural networks. This integration enhances the accuracy and speed of

large-scale data analysis, making QNNs valuable for applications in finance, healthcare, and other fields.

The development and implementation of QNNs require interdisciplinary collaboration across

quantum physics, computer science, and domain-specific expertise. Educational programs and industry-

academia partnerships are vital for advancing QNN research and ensuring practical application.

Promoting open science and data sharing can further accelerate innovation and prevent redundant efforts.


Future directions for QNNs include the development of hybrid quantum-classical systems, improved

quantum hardware, and new quantum algorithms. These efforts aim to maximize performance and

reliability. Additionally, addressing the ethical and societal implications of QNNs, such as data privacy

and job displacement, is crucial for their responsible deployment.

In summary, the potential of QNNs is immense, promising significant advancements in computing

and various application domains. Overcoming technical challenges and fostering interdisciplinary

cooperation are key to realizing their full potential.

References

[1] A. F. Kockum and F. Nori, 2019, Chalmers University of Technology, RIKEN, and University of

Michigan, pp. 703-741.

[2] V. Silva, 2018, Springer Science and Business Media LLC.

[3] D. Copsey, M. Oskin, F. Impens, and T. Metodiev, 2003, IEEE J. Sel. Top. Quantum Electron.,

vol. 9, no. 6, pp. 1552-1569.

[4] Y. LeCun, Y. Bengio, and G. Hinton, 2015, Nature, vol. 521, pp. 436–444.

[5] S. K. Jeswal and S. Chakraverty, 2019, Arch. Comput. Methods Eng., vol. 26, no. 4, pp. 877-887.

[6] M. Schuld, I. Sinayskiy, and F. Petruccione, 2014, Quantum Inf. Process., vol. 13, no. 11, pp.

2567-2586.

[7] J. Tian, X. Sun, Y. Du, et al., 2023, IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 2, pp.

233-246.

[8] F. Arute et al., 2019, Nature, vol. 574, pp. 505-510.

[9] M. C. Caro, H. Y. Huang, M. Cerezo, et al., 2022, Nat. Commun., vol. 13, 4919.

[10] Y. Li, R. Zhou, R. Xu, J. Luo, and W. Hu, 2020, Quantum Sci. Technol., vol. 5, no. 4, p. 044003.

[11] Ravikumar S, Arockia Raj Y, Babu R, Vijay K, and Ramani R, 2024, Procedia Computer Science,

vol. 235, pp. 506–519.

[12] A. El Bouchti, Y. Tribis, T. Nahhal, and C. Okar, 2019, J. Inf. Secur. Res., vol. 10, no. 3, pp. 97-104.

[13] E. Paquet and F. Soleymani, 2022, Expert Syst. Appl., vol. 195, p. 116583.

[14] S. Tao, Y. Feng, W. Wang, T. Han, P. E. S. Smith, and J. Jiang, 2024, Artif. Intell. Chem., vol. 2,

no. 1, pp. 45-58.


Research on the Correlation between the Movement of the

Dollar and the Price of Gold

Yanxi Zhan

Institute of Problem Solving, Dover Bay High School, ShangHai, 201100, China

yanxizhan@ldy.edu.rs

Abstract. Gold acts as a hedge to protect investors' assets. The U.S. dollar is a global currency and has an important place in international trade. Oil, on the other hand, is a non-renewable resource and an important international commodity with unstable prices. This study uses price data for the U.S. dollar, gold and oil from 2000 to 2023 to analyze the movement of gold, U.S. dollar and oil prices. The experiment uses a regression model to determine the effect of the dollar on the prices of oil and gold. The study found a significant negative correlation between the U.S. dollar and the price of gold and an insignificant negative correlation with oil between 2000 and 2014. Between 2015 and 2023, a change in U.S. monetary policy weakened the negative price correlation between gold and the U.S. dollar, while the negative correlation between the U.S. dollar and oil remained weak. These results suggest that buying gold is the best way to protect assets in times of financial crisis or market instability, but the prices of the US dollar and oil can change due to macroeconomic and political factors. To some extent, these data provide a reference for investment in the coming year.

Keywords: Dollar, gold and oil indices, correlation.

1. Introduction

This paper focuses on the dynamic price transmission relationship between the U.S. dollar, the price of gold and crude oil. It asks whether recent interest rate fluctuations in the dollar directly affect the market price of gold, and how to analyze and determine this effect. The experiment examines the effect of dollar fluctuations on the price range of gold using a regression model and analyzes the price characteristics of the dollar and gold. The study also considers the impact of external factors, such as financial markets and political changes, on the relationship between the U.S. dollar and the price of gold, to provide investors with valuable investment advice.

Gold is regarded as an investment tool that protects financial assets and is known for its ability to hedge against market turbulence [1]. As a global currency, the US dollar is used in most international transactions and settlements, and studies have shown that it maintains its importance in key areas of international trade and finance [2]. When the U.S. dollar rises, the price of gold usually falls, so it becomes more cost-effective to buy gold in U.S. dollars. However, for investors who hold other currencies (e.g., Chinese yuan or rubles), gold may become relatively expensive when the local currency depreciates in times of inflation. When gold falls, the risk transmission to cryptocurrencies is more significant [3]. Gold and inflation share a common long-term trend, which indirectly indicates that gold can in theory be an effective hedge against inflation risk and also supports the high standing of gold among the world's currencies [4]. Inflation occurs when the interest rate on the dollar fluctuates significantly, and the interest rate on the dollar is inversely related to the price of gold. From an investment point of view, this increases the opportunity cost of some hidden investments, and the prices of some other non-renewable resources also change with the dollar interest rate. In the world's perception, the dollar's financial attributes are equivalent to those of gold and at the same time linked to the price of oil, which raises the question of whether there is a mathematical relationship among the three.

DOI: 10.54254/2753-8818/41/2024CH0167

(https://creativecommons.org/licenses/by/4.0/).

Gold and non-renewable energy sources have driven the price of oil to show an upward trend under

macroeconomic comparisons, inflation, interest rates and industrial production [5]. Fluctuations in the

exchange rate of the U.S. dollar may make it more difficult for oil-producing countries to sell their

products [6]. There are many other things that affect the relationship between the dollar and gold, such

as some political issues. Many businessmen choose to convert their property into gold to hedge against

risk, but central bank policies may lead to volatility in the price of gold. In times of recession, countries use gold to implement some flexible monetary policies [7]. Macroeconomic variables are often used to observe economic impacts, and in some cases it is possible to detect both short-term and long-term correlations between gold and the US dollar [8]. This information has important implications for

international economic differentiation, and the impact of economic uncertainty on the dynamics of the

relationship between gold and the U.S. dollar varies from country to country [9].

Numerous researchers have shown that there may be a transmission relationship between the dollar

interest rate and the price of gold, with lower interest rates affecting investor expectations of a

depreciation of the dollar, and investors transferring funds to the gold market for capital preservation or

speculation [10]. This paper uses empirical data to justify this conclusion.

2. Methods

2.1. Data Source

Figure 1 (2000-2024) shows the price movements of gold, the dollar and oil as macroeconomic changes. The data are derived from actual historical economic data of the kind usually used to study movements in financial markets, with a high degree of accuracy and reliability. The experiment was conducted in July 2024. Because 2024 had not ended, 2024 data were not included in this study.

Gold (brown) is a form of asset protection; this relatively scarce and useful mineral has long been used as currency and has a high historical status. The US dollar (blue) is the world's common currency, used for international financial trade, and its price tends to move opposite to that of gold. When the market is stable, most people will choose to invest in dollars. Crude oil (green) is a non-renewable mineral resource and one of the world's most important mainstream products. If its price fluctuates, it affects the global economy.

Figure 1. Gold price, USD index and oil price in 2000-2024.


2.2. Method Introduction

This research begins by examining a large amount of data and images, forming assumptions and analyzing them. A model is then set up in the software to analyze the validity of the data using the parameters, and a regression model of the data is built using SPSS. The data were finally split into two regression models because the images showed a substantial change in the data, with one segment from 2000 to 2014 and a second segment from 2015 to 2023. Hypothesis testing was used to test the validity of the model parameters and the validity and significance of the regression coefficients output by the model, and finally to draw conclusions. In addition, the mean, standard deviation and median of the prices of USD, gold and oil are tabulated to analyze their price characteristics.

3. Results and Discussion

3.1. Descriptive Analysis

Gold's median and mean are relatively close, separated by about $50, with a standard deviation of 420.34 suggesting more volatile prices. The median and mean of the dollar are very close, and the standard deviation of 7.89 suggests less price fluctuation. Finally, the difference between the mean and the median of oil is small, but the standard deviation of 20.34 reflects a very unstable price, which in turn affects the world economy (Table 1).
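Because the three series are quoted on very different scales, the raw standard deviations in Table 1 are easier to compare once each is divided by its mean. The sketch below computes this coefficient of variation from the reported means and standard deviations; the measure itself is an addition made here for illustration and is not used in the paper.

```python
# Mean and standard deviation as reported in Table 1.
table1 = {
    "Gold": (1200.56, 420.34),
    "U.S. dollar": (98.45, 7.89),
    "Oil": (65.78, 20.34),
}

def coefficient_of_variation(mean, std):
    """Standard deviation expressed as a fraction of the mean."""
    return std / mean

cv = {name: coefficient_of_variation(m, s) for name, (m, s) in table1.items()}
for name, value in cv.items():
    print(f"{name}: {value:.3f}")
```

On this relative scale gold and oil vary by roughly a third of their mean, while the dollar index varies by less than a tenth, which matches the qualitative reading above.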

Table 1. Gold, dollar and oil price statistics (2000-2023)

Index        Mean     Standard deviation  Median
Gold         1200.56  420.34              1150.78
U.S. dollar  98.45    7.89                98.20
Oil          65.78    20.34               63.45

3.2. Regression Analysis

This paper first sets up a model for 2000-2014. Table 2 shows that the t-value of the gold price on the dollar index is 1.439 and the corresponding p-value is 0.152, which is greater than the significance level of 0.05. The null hypothesis is therefore not rejected, and the gold price is not considered to have a statistically significant effect on the dollar index. The t-value of the oil price on the US dollar index is 0.369, and the corresponding p-value is 0.713, which is greater than the significance level of 0.05. The null hypothesis is not rejected, and the oil price is not considered to have a significant effect on the US dollar index (Table 2).

Table 2. Regression Analysis of Gold, Oil, and Dollar Prices (2000-2014)

Columns B and Std. Error are the non-normalized coefficients, Beta is the normalization factor, and VIF and Tolerance are the colinearity diagnosis.

Variable                       B        Std. Error  Beta    t       p        VIF    Tolerance
constant                       89.947   5.472       -       16.439  0.000**  -      -
Gold_Price_USD_per_Troy_Ounce  0.005    0.004       0.108   1.439   0.152    1.001  0.999
Oil_Price_USD_per_Barrel       0.014    0.037       0.028   0.369   0.713    1.001  0.999

R2: 0.012
Adjusted R2: 0.001
F(2,177) = 1.092, p = 0.338
D-W value: 1.915
* p<0.05 ** p<0.01
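The decision rule applied to the t-values in Table 2 can be sketched in a few lines. The example assumes a standard normal approximation to the t-distribution, which is reasonable here because the residual degrees of freedom are large (177, from the reported F statistic); the function name is hypothetical, and SPSS itself uses the exact t-distribution.

```python
import math

def two_sided_p_normal(t_value):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t_value) / math.sqrt(2.0))

ALPHA = 0.05
# t-values for the 2000-2014 model as reported in Table 2.
t_values = {"gold": 1.439, "oil": 0.369}

p_values = {name: two_sided_p_normal(t) for name, t in t_values.items()}
for name, p in p_values.items():
    # A p-value above ALPHA means the null hypothesis is not rejected.
    decision = "reject" if p < ALPHA else "fail to reject"
    print(f"{name}: p ~ {p:.3f}, {decision} the null hypothesis")
```

The approximate p-values land within about 0.002 of the tabulated 0.152 and 0.713, and both exceed 0.05, so neither coefficient is statistically significant at that level.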

According to the regression analysis for the period from 2000 to 2014, there is a significant negative correlation between the dollar and the price of gold. From the coefficients of the regression analysis, this paper can also conclude that the price of gold increased over time. The inverse relationship between the price of oil and the dollar is considerably weaker than that between the price of gold and the dollar. The data for oil show no very large fluctuations and are not a very good predictor of future data.

Table 3. Regression Analysis of Gold, Oil, and Dollar Prices (2015-2023)

Columns B and Std. Error are the non-normalized coefficients, Beta is the normalization factor, and VIF and Tolerance are the colinearity diagnosis.

Variable                       B        Std. Error  Beta    t       p        VIF    Tolerance
constant                       105.404  7.189       -       14.662  0.000**  -      -
Gold_Price_USD_per_Troy_Ounce  -0.006   0.005       -0.105  -1.142  0.256    1.007  0.993
Oil_Price_USD_per_Barrel       0.003    0.054       0.004   0.048   0.962    1.007  0.993

R2: 0.011
Adjusted R2: -0.006
F(2,117) = 0.653, p = 0.522
D-W value: 2.045
* p<0.05 ** p<0.01

The t-value of the gold price on the US dollar index is -1.142, and the corresponding p-value is 0.256. This is greater than the significance level of 0.05, so the null hypothesis is not rejected, and the gold price is not considered to have a significant effect on the dollar index.

The t-value of the oil price on the US dollar index is 0.048 and the corresponding p-value is 0.962, which is greater than the significance level of 0.05, so the null hypothesis is not rejected, and the oil price is not considered to have a significant effect on the US dollar index. From the data, the following formula can be derived:

$\text{Dollar price} = 105 - 0.006 \times \text{gold price} + 0.003 \times \text{oil price}$ (1)

In the period from 2015 to 2023, the inverse relationship between the price of the dollar and the price of gold gradually weakened compared with the earlier period, and the two prices slowly began to move closer together. A large part of this is due to new economic and monetary policy changes or changes in the financial markets, especially in the United States, which have had a direct effect on the price relationship between the dollar and gold. For example, interest rates have been both raised and cut in recent years (Table 3).

The price of oil still rises indirectly with time and with the price of the dollar, and new monetary policies have led to new market changes, with more destabilizing effects from demand or economic factors.

Table 4. Total Prices of Gold, Oil, and Dollar (2000-2023)

Columns B and Std. Error are the non-normalized coefficients, Beta is the normalization factor, and VIF and Tolerance are the colinearity diagnosis.

Variable                       B        Std. Error  Beta    t       p        VIF    Tolerance
constant                       105.404  7.189       -       14.662  0.000**  -      -
Gold_Price_USD_per_Troy_Ounce  0.001    0.003       0.013   0.217   0.829    1.000
Oil_Price_USD_per_Barrel       0.007    0.031       0.013   0.219   0.827    1.000

R2: 0.000
Adjusted R2: -0.006
F(2,297) = 0.049, p = 0.953
D-W value: 1.963
* p<0.05 ** p<0.01

The t-value of the gold price on the US dollar index is 0.217, and the corresponding p-value is 0.829, which is greater than the significance level of 0.05, so the null hypothesis is not rejected, and the negative relationship between the gold price and the US dollar index is considered to have weakened (Table 4). The t-value of the oil price on the dollar index is 0.219 and the corresponding p-value is 0.827, which is greater than the significance level of 0.05, so the null hypothesis is not rejected. Any impact of the oil price on the dollar index therefore needs to be investigated further in light of economic and financial market changes, geography and other factors.

The experiment leads to the formula:

$\text{Dollar price} = 96.644 + 0.001 \times \text{gold price} + 0.007 \times \text{oil price}$ (2)
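As a quick sanity check, equation (2) can be evaluated at the sample means of gold and oil from Table 1. A least-squares line with an intercept passes through the point of means, so the result should land close to the mean dollar index. The function name below is hypothetical and the coefficients are the rounded values quoted in equation (2).

```python
def predict_dollar_index(gold_price, oil_price):
    """Equation (2): fitted dollar index for the full 2000-2023 sample."""
    return 96.644 + 0.001 * gold_price + 0.007 * oil_price

# Sample means of gold and oil taken from Table 1.
predicted = predict_dollar_index(gold_price=1200.56, oil_price=65.78)
print(f"Predicted dollar index at the sample means: {predicted:.2f}")
```

The prediction is about 98.3, close to the mean dollar index of 98.45 in Table 1; the small gap is consistent with the coefficients being rounded to three decimals.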

This experiment is a regression analysis of the prices of the dollar, gold and oil over two different periods of time, intended to show investors more intuitively whether there is a correlation between the prices of the dollar, gold and oil. During the period 2000-2014, the prices of oil and gold showed a negative correlation with the price of the US dollar. However, the gold index was unstable: its price was low at first, and its rapid growth, including during the 2008 financial crisis, shows that gold has a good safe-haven nature. The price of oil also became very high in 2014. In the period 2015-2023, the price of the dollar began to stabilize with a positive correlation, while the price of gold rose rapidly because COVID-19 made world financial markets very unstable. The price of oil started to fall after 2015, possibly due to a decrease in demand and supply in the market or due to some regional political issues (Table 4).

The data shows that gold, as a high-end way to protect assets, can have good price stability in times

of financial crisis. Oil prices may be affected by more volatile factors, with different supply and demand

balances at different times and different prices in different geographic locations, including political

policies and market expectations in different places. The dollar affects the gold and oil price differently

in different economic environments, and investors can use this data to help inform their investment

decisions.

4. Conclusion

Regression analysis is used to show the mathematical relationship between the price of the dollar, the price of gold, and the price of oil from 2000 to 2023. The regression coefficients revealed a very significant negative relationship between the dollar, gold and oil indices between 2000 and 2014. From 2015 to 2023, the negative correlation between the dollar and the prices of oil and gold begins to weaken due to changes in monetary policy. These price relationships affect the macroeconomic environment and provide important information for investors' future investment strategies. The data can be used to study the price changes that will take place in the coming year and to discuss further the macroeconomic impact of these important asset prices in a comprehensive analysis.

References

[1] Gold, J.M. (2011) Gold and the US dollar: Hedge or haven? Finance Research Letters, 8(3), 120-

131.

[2] Goldberg, L.S. (2010) Is the international role of the dollar changing? Current Issues in

Economics and Finance, 16(1).

[3] Cao, G. and Ling, M. (2022) Asymmetry and conduction direction of the interdependent structure

between cryptocurrency and US dollar, renminbi, and gold markets. Chaos, Solitons &

Fractals, 155, 111671.

[4] Batten, J.A., Ciner, C. and Lucey, B.M. (2014) On the economic determinants of the gold–

inflation relation. Resources Policy, 41, 101-108.

[5] Wang, Y.S. and Chueh, Y.L. (2013) Dynamic transmission effects between the interest rate, the

US dollar, and gold and crude oil prices. Economic Modelling, 30, 792-798.

[6] Wang, Y.S. and Chueh, Y.L. (2013) Dynamic transmission effects between the interest rate, the

US dollar, and gold and crude oil prices. Economic Modelling, 30, 792-798.

[7] Staszczak, D.E. (2020) Global instability of gold prices: view from the state-corporation

hegemonic stability theory.


[8] Zhou, Y., Han, L. and Yin, L. (2018) Is the relationship between gold and the US dollar always

negative? The role of macroeconomic uncertainty. Applied Economics, 50(4), 354-370.

[9] Pellejero, S. (2020) Oil prices fall as rising COVID-19 cases prompt demand concerns. Investors

Also Eye Rising Crude.

[10] Kadhem, S. and Thajel, H. (2023) Modelling of crude oil price data using hidden Markov model.

Journal of Risk Finance, 24(2), 269-284.


Improvement of visual servo system of industrial robot based

on sliding mode control and deep reinforcement learning

Yunzhe Zhou

Department of Smeal Business, Pennsylvania State University, PA, USA

Yzz5886@psu.edu

Abstract. Visual servo system is more and more widely used in the field of industrial robots

because it allows robots to sense external signals through sensors to convey control commands

to themselves and complete tasks. However, the traditional visual servo has many limitations in

the design of controller and image feature extraction, such as insufficient robustness and image

extraction accuracy. This research focuses on the optimization of controller and image feature

extraction, which can improve the overall performance and autonomy of the system by

combining sliding mode control and Convolutional Neural Network (CNN). Sliding mode

control performs well in terms of robustness and response speed, while CNN has excellent ability

for image feature extraction. The research results show that the combination of the two and the

visual servo has better performance in a variety of application scenarios, so this is also the

development direction that industrial robots can adopt in the future.

Keywords: Visual servo system, sliding mode control, industrial robot.

1. Introduction

With the development of automation and robot technology in the industrial field, industrial robots have

made great breakthroughs in the 21st century. The first industrial robot, designed by Griffith P. Taylor

in 1935, set a fixed program to carry goods, and since then, the technology has developed into a variety

of highly intelligent multi-functional robots [1]. The subsequent Unimation robot, designed by George

and Joseph in 1956, using cash's servo motor and sensor technology, with more than six degrees of

freedom, and with networking technology, capable of remote monitoring and collaborative work, was

the world's first programmable industrial robot, marking the beginning of the era of robot automation

[2]. After the 1980s, multi-axis robots became the standard for industrial robots, improving their

usability. The introduction of offline programming technology made it possible for robots to be

programmed and tested in a virtual environment, reducing debugging time in actual production. In the

21st century, many production lines began to move toward automation, including robot collaborative

industry. Material handling, processing assembly, and product packaging are completed by integrated

operations. The widespread application of artificial intelligence algorithms and big data can enable

robots to learn and accumulate autonomously and constantly optimize themselves.

The visual servo system has been widely used throughout the history of industrial robot development. It is a technology that uses visual information to control robot movement: it captures images through cameras, processes them to extract image features, computes the error between the current and desired features, and forms control signals from that error to adjust the position and attitude of the robot [3]. Visual

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0168

(https://creativecommons.org/licenses/by/4.0/).

132

servo (VS) is mainly divided into image-based visual servo (IBVS) and position-based visual servo (PBVS) [3]. The former computes the control error directly from image feature points, while the latter uses image feature points to estimate three-dimensional pose parameters and then controls on that estimate. Since the system includes both perception and control, the performance of each link can be improved by introducing more efficient and advanced algorithms and technologies.

As an important robotic system in the industrial field, the visual servo system greatly advances the autonomy and independence with which robots complete tasks and effectively reduces labor and time costs. It is widely used in industry across a variety of application scenarios.

The visual servo system can accurately guide the robot through complex assembly tasks. For example, in automobile manufacturing, it can help the robot identify and pick small parts such as screws, metal blocks, and baffles from the parts library and then install them. This greatly reduces the factory's labor costs and improves the efficiency of the production line.

In the welding process, the visual servo system can monitor the welding position, welding progress, and welding quality in real time. From the images provided by the camera, the robot obtains the fine-scale coordinates of points on the workpiece surface, connects them, and derives a precise welding path, ensuring consistent weld accuracy.

Visual servo is also applied to product quality inspection in industrial production. By extracting the feature points of an object and examining them at multiple magnifications, the robot can find feature points missing from the object's structure and thereby identify surface scratches, dimensional deviations, and other defects.

In addition, many improvement schemes build on the visual servo system to raise the efficiency and accuracy with which robots complete tasks. For example, 2.5D VS combines the advantages of position-based and image-based visual servo control: it establishes a connection between the image plane and three-dimensional space, builds a hybrid Jacobian matrix, and corrects errors along the depth direction of the image [4]. This overcomes the camera retreat phenomenon and the singularity problem of IBVS systems, thus ensuring the stability of the control process.

2. Limitations analysis of traditional visual servo systems

The traditional visual servo system still has limitations. This section analyzes its shortcomings in achieving autonomous tasks and where improvement is possible.

2.1. Robustness

The traditional visual servo system often shows insufficient robustness to environmental changes and external interference. In other words, conditions outside the system affect the robot, reducing its operating accuracy or even causing severe control errors. The nonlinear control of visual servo is mainly carried out by closed-loop calculation of the motion trajectory error; external interference causes the system to deviate from the original path, after which parameters must be adjusted manually before operation can restart [5]. External interference usually takes the form of noise, lighting, and occlusion. First, the electromagnetic interference generated by industrial equipment affects the electronic components of the camera and distorts the received visual signal, while noise-induced vibration causes small, persistent oscillations in the camera position that degrade the stability of feature extraction. In image extraction, noise makes the strength of image feature edges uneven, so some weak features are missed, producing errors in image modeling that eventually lead to operation errors.

Light and obstacle occlusion also have a significant impact. Strong light produces glare in the camera lens and reduces image clarity. A light source shining directly into the camera dominates the picture and impairs interpretation of the image information; in addition, the pixels of the vision sensor saturate under strong light, leaving blank regions in the image, losing detail, and ultimately degrading image quality. Occlusion by obstacles directly blocks the line of sight, so the system either fails to obtain correct and complete information or extracts obstacle features as if they belonged to the target object, which causes large errors in operation and deviation from the control trajectory.

2.2. Response time

Response time is another limitation: the robot takes a long time from receiving the control signal to actually producing the output operation, which affects the efficiency and accuracy of the task. The control algorithm of the visual servo system must compute control instructions from the image processing results, so its complexity directly affects the response time. The traditional visual servo system mainly uses a PID controller, a classic feedback control algorithm that is common in industrial automation [6]. Error correction is carried out through three terms: proportional, integral, and derivative [6]. Proportional control generates a control signal from the current error, and the response speed is changed by adjusting the proportional gain; integral control generates a control signal from the accumulated error in order to eliminate the steady-state error [6]. Because the integral term depends on accumulating the error signal over time, its response is slow. Finally, derivative control generates a control signal from the rate of change of the error, so as to anticipate the future error trend. However, the gains of a PID controller are fixed while the error accumulation in actual operation is nonlinear, so PID control needs a certain amount of time for error correction.
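The three PID terms described above can be sketched in a few lines. This is a minimal illustration and not the controller of any particular visual servo system: the gains, the time step, and the first-order plant dx/dt = -x + u are all assumptions chosen only to make the example run.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete PID controller with fixed gains (hypothetical values below)."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float) -> float:
        # Integral term: accumulated error, removes the steady-state offset.
        self.integral += error * self.dt
        # Derivative term: rate of change of the error, anticipates its trend.
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # Proportional term acts on the current error.
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(setpoint: float = 1.0, steps: int = 3000, dt: float = 0.01) -> float:
    """Drive an assumed first-order plant dx/dt = -x + u toward the setpoint."""
    pid = PID(kp=2.0, ki=1.0, kd=0.05, dt=dt)
    x = 0.0
    for _ in range(steps):
        u = pid.update(setpoint - x)
        x += (-x + u) * dt  # forward-Euler step of the plant
    return x
```

Because the gains stay fixed for the whole run, the time needed to settle is determined once by the tuning, which is the limitation the paragraph above points to.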

2.3. Image extraction accuracy

For the visual sensing part, the traditional visual servo system mostly uses edge detection, a basic image processing technique for detecting object boundaries in an image; the Sobel operator is one example. The Sobel operator first converts the color image to grayscale and then convolves it with two filters to compute the gradients Gx and Gy in two directions, judging the edge position of the object from the gradient magnitude and direction [7]. When the gradient magnitude exceeds a set threshold, the pixel is regarded as an edge pixel. Although the Sobel operator is simple to compute and well suited to real-time processing, its convolution kernel is too coarse to resolve fine edge details, it detects in only two directions, and it has large errors for objects whose appearance varies. Besides the Sobel operator, the Canny algorithm is also a common detection method. It smooths the image with a two-dimensional Gaussian filter, calculates the gradients in both directions, thins the edges using non-maximum suppression, and finally applies high and low thresholds to classify edge pixels and link them into connected edges [7]. Canny has high robustness and stability, but its calculation steps are complex, resulting in poor real-time performance, which is not conducive to dynamic industrial tasks.
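The Sobel steps described above (two 3x3 filters, gradient magnitude, threshold) can be sketched in plain Python. The toy image and the threshold value are assumptions for illustration; a real system would use an optimized image library rather than nested loops.

```python
import math

# Standard 3x3 Sobel kernels for the horizontal (Gx) and vertical (Gy) gradients.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(gray, threshold):
    """Return a binary edge map for a grayscale image given as a list of rows.

    The kernels are applied as a sliding-window correlation; the sign flip
    relative to a true convolution does not change the gradient magnitude.
    Border pixels stay 0 because the 3x3 window does not fit there.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(KX[i][j] * gray[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(KY[i][j] * gray[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            # A pixel is marked as an edge when the gradient magnitude
            # exceeds the chosen threshold.
            if math.hypot(gx, gy) > threshold:
                edges[r][c] = 1
    return edges


# Toy 5x6 image (assumed data): dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edge_map = sobel_edges(image, threshold=100)
```

On this toy image the interior rows are marked as edges only in the two columns that straddle the brightness step, which shows both the simplicity of the operator and its reliance on a hand-set threshold.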

Given the limitations described above, the traditional visual servo still faces considerable obstacles to full automation. To improve production efficiency and product quality, reduce unnecessary labor costs, and advance industrial robot technology as a whole, the visual servo system needs to be improved by combining it with sliding mode control and deep learning technology.

3. Feasibility of optimization scheme

Sliding mode control is a nonlinear robust control method with fast response and strong robustness, and it can effectively deal with system uncertainty and external interference. Its design has two major steps. The first is to design the sliding surface $s = ce + \dot{e}$, where $e$ is the tracking error and $c$ is a positive constant; the surface also represents the ideal motion state of the system. The second is to design the control law $\dot{s} = -\epsilon\,\mathrm{sgn}(s)$, $\epsilon > 0$, also called the switching function, which drives the controlled object continuously toward the sliding surface [8, 9]. Once the object reaches the sliding surface, only the control law affects its motion trajectory, and external factors no longer do [8]. This is why the method is robust.
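The two design steps can be tried out on a toy system. The sketch below assumes a plant whose error dynamics are e'' = u + d(t), with a bounded sinusoidal disturbance d; the plant, the gains c and eps, and the step size are illustrative assumptions and are not taken from the cited works.

```python
import math


def sign(x: float) -> int:
    """Signum function: -1, 0 or 1."""
    return (x > 0) - (x < 0)


def simulate_smc(c: float = 2.0, eps: float = 2.0, dt: float = 1e-3, steps: int = 10000):
    """Sliding mode control of an assumed plant e'' = u + d(t).

    Sliding surface: s = c*e + e_dot.
    Control input:   u = -c*e_dot - eps*sgn(s),
    which gives      s_dot = -eps*sgn(s) + d(t).
    As long as |d| < eps, s is driven to zero despite the disturbance.
    """
    e, e_dot = 1.0, 0.0            # initial tracking error and its rate
    t = 0.0
    for _ in range(steps):
        s = c * e + e_dot
        u = -c * e_dot - eps * sign(s)
        d = 0.5 * math.sin(t)      # bounded disturbance, unknown to the controller
        e_ddot = u + d
        e += e_dot * dt            # forward-Euler integration
        e_dot += e_ddot * dt
        t += dt
    return e, c * e + e_dot        # final error and final value of s
```

The controller never measures the disturbance, yet the error still converges, because on the surface s = 0 the error obeys e_dot = -c*e regardless of d. The small residual oscillation of s around zero is the chattering that a saturation function is meant to reduce.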

In recent years, scholars have studied this approach in detail. For example, M. Parsapour et al. used a robust estimator based on an Unscented Kalman Observer (UKO) cascaded with a Kalman Filter (KF) to infer the physical parameters and pose of the target object, and established a sliding mode control model [11]. The authors use Lyapunov theory to analyze the stability of the closed-loop control system [11]. By selecting the Lyapunov function $V = \frac{1}{2} S^{T} S$ and calculating its derivative, they prove that the derivative is always negative under the action of the control signal, that is, the state of the system tends to the sliding surface [11]. In the experiments, the authors used a 5-DOF RV robot. In the first experiment, a visual regulation experiment, the target object does not move; the system must adjust the position and attitude of the end effector to reach the desired state, and the control signal is generated with different types of switching functions. The results show that the controller converges the trajectory to the sliding surface within 0.2 seconds and brings the position and velocity errors of the end effector close to zero within 0.8 seconds [11]. In addition, a sliding mode controller with a saturation function effectively reduces the chattering phenomenon of sliding mode control. In the second experiment, which verifies overall performance, the target object is moved independently along the X, Y, Z, A and B directions to verify the tracking performance of the system when the target moves. The results show that the controller responds quickly to the target movement. These experiments show that sliding mode control can help the visual servo system overcome slow response and poor robustness.
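The Lyapunov argument can be written out in a few lines. The derivation below is a sketch that assumes the constant-rate reaching law stated earlier in this section, applied componentwise to a vector $S$; the control law actually used in the cited paper may differ in detail.

```latex
\begin{align*}
V &= \tfrac{1}{2}\, S^{T} S, \\
\dot{V} &= S^{T}\dot{S}
        = -\epsilon\, S^{T}\operatorname{sgn}(S)
        = -\epsilon \sum_{i} |S_i|
        = -\epsilon\,\lVert S \rVert_{1} \;\le\; 0, \\
\dot{V} &\le -\epsilon\,\lVert S \rVert_{2} = -\epsilon\sqrt{2V}
\quad\Longrightarrow\quad
T_{\text{reach}} \le \frac{\lVert S(0) \rVert_{2}}{\epsilon}.
\end{align*}
```

Equality in the second line holds only at $S = 0$, so $V$ decreases strictly until the state reaches the sliding surface, and the last line shows that this happens in finite time.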

In addition, visual servo can be combined with a Convolutional Neural Network (CNN) to improve how the system receives information, that is, its ability to extract image features. The CNN is a deep learning model with powerful image feature extraction ability [12]. A CNN applies convolution kernels to the pixel values to generate coarse feature maps and then uses dropout and pooling to reduce dimensionality and parameter count while retaining key features, with the aim of improving computing efficiency [12]. The pooled outputs are then passed through the ReLU activation function, and finally the matrix transformation of the fully connected layer converts the large number of input parameters into a single control signal [12]. Therefore, a CNN can process complex image features and extract them with higher precision. In an article on a detection system combining isolation-based security and deep learning, the authors compared the accuracy of a CNN and an SVM, and the results showed that the CNN maintained a high and stable accuracy throughout the training process (shown in Figure 1) [13].

Figure 1. Accuracy of CNN and SVM in the experiment [13].
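The layer sequence described above (convolution, activation, pooling, fully connected layer) can be traced by hand on a tiny input. The sketch below is a forward pass only, with a hand-picked kernel and hypothetical dense-layer weights; nothing is trained, and dropout is omitted because it is applied only during training. Since ReLU is monotonic, applying it before or after max pooling gives the same result.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2-D sliding-window filter, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Element-wise ReLU activation: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete trailing windows are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def dense(features, weights, bias):
    """Fully connected layer mapping a flat feature vector to one scalar."""
    return sum(w * f for w, f in zip(weights, features)) + bias


# Toy 6x6 image with a vertical edge, and a hand-picked vertical-edge kernel.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

fmap = conv2d(image, kernel)          # 4x4 feature map
pooled = max_pool(relu(fmap))         # 2x2 map after ReLU and 2x2 pooling
flat = [v for row in pooled for v in row]
signal = dense(flat, weights=[0.25] * len(flat), bias=0.0)  # hypothetical weights
```

In a trained network the kernel values and dense weights are learned from data rather than chosen by hand, which is what lets a CNN pick up features that a fixed operator such as Sobel cannot.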

4. Proposal of optimization program

Based on the two available improvements, visual servo systems can be adapted to different situations. Since neither sliding mode control nor CNN suits every situation in terms of execution efficiency, implementation cost, and operational complexity, engineers need to analyze the scenario and choose the appropriate optimization scheme for the specific task.

4.1. Sliding mode control system

In environments with high real-time requirements and relatively simple tasks, the Sliding Mode Control (SMC) system alone can meet the requirements. Because the system keeps the object moving steadily on the sliding surface, it achieves fast response and completes the task with high robustness, which makes this method well suited to tasks that demand low latency and real-time performance, such as assembly and sorting on a production line. In such tasks, the robot does not need strong image extraction capability; it only needs to lock onto the target object and perform the corresponding operation. These tasks also require high stability: a system that is vulnerable to external interference would need to be manually monitored and adjusted in real time.

4.2. Convolutional neural network

In certain complex visual tasks, it is appropriate to use the CNN alone for feature extraction and processing, which improves the accuracy of image recognition. The most common example is the vision sensor of an autonomous vehicle: the vehicle obtains real-time images of the road environment through the camera, analyzes them to identify obstacles such as pedestrians and other vehicles, and conveys danger signals to the car. This solution is also suitable for product inspection: for complex structural frames, every corner and slope must be verified with high precision so that no accidents occur in use.

4.3. Combination of SMC and CNN

For high-precision visual servo control in complex dynamic environments, an optimization scheme combining SMC and CNN is needed. The system requires not only high-precision image extraction but also resistance to interference from a changing environment. For example, in robot welding, the system must operate with high precision amid the intense noise generated by the welding itself. The CNN provides highly accurate details of the object's appearance, such as edges and dioramas, while sliding mode control keeps the system executing its input instructions stably on the sliding surface, combining the strengths of both (shown in Figure 2).

Figure 2. Comprehensive diagram.

Future robot visual servo systems will pay more attention to the integration of multiple sensors, for two purposes: to reduce the sensing load on a single camera and to analyze the target image from more angles. The vision sensor will be combined with lidar, ultrasonic sensing, thermal imaging, and so on, so that the system obtains more comprehensive environmental information and enhances its autonomous decision-making ability. With continued progress in AI, industrial robots will place more emphasis on intelligence. Deep learning will be a major theme of future research, and the development of multi-modal sensing will provide more powerful computing power for robots.

5. Conclusion

This study discusses the optimization of the visual servo system in depth. Combining sliding mode control and CNN effectively improves system robustness, response speed, and image processing accuracy, which is of great help in improving traditional visual servo systems.

The sliding surface and control law are designed to make the system move stably along the calibrated trajectory, providing strong robustness to illumination changes, noise, and occlusion and enabling correct path processing under uncertain conditions. Through convolutional extraction, pooling, and dropout, the CNN helps the system recognize and extract the full range of pixels in the image, which is more precise and extracts more parameters than traditional edge detection and corner detection. In the future, with the continuous development of artificial intelligence and deep learning technology, the industrial robot visual servo system will become more intelligent and efficient. Applying multi-sensor fusion technology will expand the application range of the robot, improve its adaptability and autonomy in complex environments, and achieve a high degree of automation.

References

[1] Grace, J. (1937). Environment and Nation. Griffith Taylor. The Journal of Geology, 45(5), 571–

572. https://doi.org/10.1086/624573

[2] Gasparetto, A., & Scalera, L. (2019). From the unimate to the delta robot: the early decades of

industrial robotics. In Explorations in the History and Heritage of Machines and Mechanisms:

Proceedings of the 2018 HMM IFToMM Symposium on History of Machines and

Mechanisms (pp. 284-295). Springer International Publishing.

[3] Cong, V.D., & Hanh, L.D. (2023). A review and performance comparison of visual servoing

controls. International Journal of Intelligent Robotics and Applications, 7, 65-90.

[4] Zhang, H., Li, M., Ma, S., Jiang, H., & Wang, H. (2021). Recent advances on robot visual servo

control methods. Recent Patents on Mechanical Engineering, 14(3), 298-312.

[5] Grimble, M. J., & Majecki, P. (2020). Nonlinear Industrial Control Systems. Springer London.

[6] Borase, R. P., Maghade, D. K., Sondkar, S. Y., & Pawar, S. N. (2021). A review of PID control,

tuning methods and applications. International Journal of Dynamics and Control, 9, 818-827.

[7] Sun, R., Lei, T., Chen, Q., Wang, Z., Du, X., Zhao, W., & Nandi, A. K. (2022). Survey of image

edge detection. Frontiers in Signal Processing, 2, 826967.

[8] Zhang, X. (2022). SMC for nonlinear systems with mismatched uncertainty using Lyapunov-

function integral sliding mode. International Journal of Control, 95(10), 2710-2725.

[9] Gambhire, S. J., Kishore, D. R., Londhe, P. S., & Pawar, S. N. (2021). Review of sliding mode

based control techniques for control system applications. International Journal of Dynamics

and Control, 9(1), 363-378.

[10] Utkin, V., Poznyak, A., Orlov, Y. V., & Polyakov, A. (2020). Road map for sliding mode control

design. Berlin/Heidelberg, Germany: Springer International Publishing.

[11] Parsapour, M., RayatDoost, S., & Taghirad, H. D. (2013, February). Position-based sliding mode

control for visual servoing system. In 2013 First RSI/ISM International Conference on

Robotics and Mechatronics (ICRoM) (pp. 337-342). IEEE.

[12] Li, Z., Liu, F., Yang, W., Peng, S., & Zhou, J. (2021). A survey of convolutional neural networks:

analysis, applications, and prospects. IEEE Transactions on Neural Networks and Learning

Systems, 33(12), 6999-7019.

[13] Ramirez, A. G., Lara, C., Betev, L., Bilanovic, D., & Kebschull, U. (2018). Arhuaco: Deep

learning and isolation-based security for distributed high-throughput computing. arXiv

preprint arXiv:1801.04179.

Optimizing supply chain networks using mixed integer linear

programming (MILP)

Xu Li1, Xiaoheng Ji2,†, Xiaolong Zeng3,*,†

1 The University of Sheffield, Sheffield, The UK

2The University of Auckland, Auckland, New Zealand

3The University of Queensland, St Lucia QLD 4072, Australia

†Xiaoheng Ji and Xiaolong Zeng contributed equally to this work.

*rara481846778@gmail.com

Abstract. Mixed Integer Linear Programming (MILP) has emerged as a powerful tool for

optimizing complex supply chain networks. This paper explores the theoretical foundations of

MILP, including the integration of integer variables and advanced solution techniques such as

branch-and-bound and branch-and-cut algorithms. Through detailed modeling of production

planning, network design, and transportation logistics, MILP enables companies to achieve

significant cost reductions and operational efficiencies. We present case studies from retail,

manufacturing, and pharmaceutical sectors to illustrate the practical applications of MILP. These

examples demonstrate how MILP optimization can lead to reductions in production and

inventory costs, improved customer satisfaction, and enhanced service levels. The findings

underscore the value of MILP in addressing the multifaceted challenges of modern supply chain

management.

Keywords: Mixed Integer Linear Programming (MILP), supply chain optimization, production

planning, network design, transportation logistics.

1. Introduction

Supply chain management is a critical function for organizations seeking to enhance efficiency and

competitiveness. As global markets become more interconnected, optimizing supply chain networks

presents both opportunities and challenges. Traditional linear programming (LP) techniques provide a

foundation for addressing these challenges but often fall short when decision variables must be whole

numbers. This necessity introduces Mixed Integer Linear Programming (MILP), a sophisticated

approach that incorporates integer variables to reflect real-world constraints such as the number of

production batches or transportation trips. MILP transforms the optimization landscape by making the

solution space discrete and non-convex, requiring specialized algorithms like branch-and-bound and

branch-and-cut to navigate this complexity. These advanced techniques enable efficient exploration of

large solution spaces, identifying optimal solutions that meet all constraints. For example, in a

transportation problem, the need to minimize the number of trips while ensuring timely deliveries

requires integer solutions, as fractional trips are impractical. The application of MILP extends across

various industries, from retail and manufacturing to pharmaceuticals, each with unique supply chain

dynamics. In retail, optimizing warehouse locations and inventory levels can significantly reduce

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/20240642

(https://creativecommons.org/licenses/by/4.0/).

139

transportation costs and improve service levels [1]. Manufacturing firms leverage MILP to enhance

production schedules and distribution routes, achieving cost savings and reduced lead times.

Pharmaceutical companies use MILP to ensure regulatory compliance and timely delivery of

medications, crucial for patient care. This paper delves into the theoretical underpinnings of MILP,

explores its application in supply chain optimization, and presents case studies to illustrate its practical

benefits. By examining these aspects, we aim to highlight the transformative impact of MILP on supply

chain management, offering insights for businesses looking to optimize their operations in an

increasingly complex and competitive landscape.

2. Theoretical Foundations of MILP

2.1. Linear Programming Basics

Linear programming (LP) is a fundamental technique in optimization used to achieve the best outcome,

such as minimizing costs or maximizing profits, given a set of linear constraints. The general form of

an LP problem consists of an objective function and a set of constraints. The objective function, typically

a linear equation, represents the goal of the optimization, such as minimizing the total cost in a supply

chain network. Constraints, on the other hand, represent the limitations or requirements of the system,

such as production capacities, demand requirements, or budgetary restrictions.

For example, consider a company that produces two products, A and B. The objective function might be to maximize the total profit, represented as $Z = 50x_1 + 40x_2$, where $x_1$ and $x_2$ are the quantities of products A and B, respectively. The constraints might include limitations on labor and material, such as $2x_1 + 3x_2 \le 120$ (labor hours) and $x_1 + 2x_2 \le 100$ (material units) [2].
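The two-product example is small enough to solve by checking the corner points of the feasible region. The sketch below is not the Simplex method; it is a brute-force vertex enumeration that relies on the same fact Simplex exploits, namely that a linear objective attains its optimum at a vertex. Non-negativity of the two quantities is assumed, since the text does not state it explicitly.

```python
from itertools import combinations

# Each constraint has the form a1*x1 + a2*x2 <= b.
CONSTRAINTS = [
    (2.0, 3.0, 120.0),   # labor hours
    (1.0, 2.0, 100.0),   # material units
    (-1.0, 0.0, 0.0),    # x1 >= 0 (assumed)
    (0.0, -1.0, 0.0),    # x2 >= 0 (assumed)
]


def profit(x1, x2):
    return 50.0 * x1 + 40.0 * x2


def feasible(x1, x2, tol=1e-9):
    return all(a1 * x1 + a2 * x2 <= b + tol for a1, a2, b in CONSTRAINTS)


def solve_by_vertices():
    """Intersect every pair of constraint boundaries; keep the best feasible point."""
    best = None
    for (a1, a2, b), (c1, c2, d) in combinations(CONSTRAINTS, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < 1e-12:
            continue  # parallel boundaries do not intersect
        # Cramer's rule for the 2x2 linear system.
        x1 = (b * c2 - a2 * d) / det
        x2 = (a1 * d - b * c1) / det
        if feasible(x1, x2) and (best is None or profit(x1, x2) > best[2]):
            best = (x1, x2, profit(x1, x2))
    return best


x1_opt, x2_opt, z_opt = solve_by_vertices()
```

For these numbers the sketch returns x1 = 60, x2 = 0 and Z = 3000: the labor constraint is binding, while the material constraint is slack at the optimum.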

To solve LP problems, algorithms like the Simplex method are commonly used. The Simplex method

iteratively moves along the edges of the feasible region defined by the constraints to find the optimal

solution. This method is efficient for many practical problems and can handle a large number of variables

and constraints. For instance, in a supply chain optimization problem with hundreds of products and

multiple constraints, the Simplex method can quickly navigate through the feasible region to identify

the optimal distribution of resources, minimizing overall costs while meeting all demand requirements.

2.2. Introduction to Integer Variables

In many real-world applications, decision variables cannot be fractional and must take on whole

numbers. For example, when determining the number of trucks to dispatch or the number of production

batches to run, fractional values are not practical. This requirement introduces integer variables into the

optimization problem, transforming it into a Mixed Integer Linear Programming (MILP) problem.

The inclusion of integer variables adds significant complexity to the problem because the solution space

becomes discrete and non-convex. This means that traditional LP solution techniques, which rely on

convexity, cannot be applied directly. For instance, in a transportation problem where the goal is to

minimize the number of trips while ensuring all deliveries are made, the number of trips must be an

integer. A fractional trip does not make sense in this context [3].

The complexity arises because the number of possible solutions increases exponentially with the

number of integer variables. Consider a supply chain problem with five potential warehouse locations

(binary decision variables for each location: open or closed). The solution space consists of

$2^5 = 32$ possible combinations. As the number of decision variables grows, this space becomes vast,

making the problem challenging to solve. Specialized algorithms and techniques, such as branch-and-

bound, are necessary to efficiently explore this large and complex solution space to find the optimal

integer solution. Table 1 illustrates the concept of integer variables in an MILP problem, using the

example of deciding whether to open or close five potential warehouse locations.

Table 1. Integer Variables in MILP

Scenario      Decision Variable      Possible Combinations   Total Combinations
Warehouse 1   Open (1) / Close (0)
Warehouse 2   Open (1) / Close (0)
Warehouse 3   Open (1) / Close (0)
Warehouse 4   Open (1) / Close (0)
Warehouse 5   Open (1) / Close (0)
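The five-warehouse example can be enumerated directly, which makes the size of the discrete solution space concrete. The fixed costs, capacities, and demand below are hypothetical numbers invented for illustration; only the open/close structure comes from the text.

```python
from itertools import product

# Hypothetical data for five candidate warehouses.
FIXED_COST = [120, 90, 150, 80, 110]   # cost of opening each warehouse
CAPACITY = [40, 30, 60, 25, 35]        # units each warehouse can handle
TOTAL_DEMAND = 100                     # units that must be covered


def cheapest_feasible_plan():
    """Enumerate all 2**5 open/close vectors; keep the cheapest one covering demand."""
    best_cost, best_plan = None, None
    count = 0
    for plan in product((0, 1), repeat=len(FIXED_COST)):
        count += 1
        capacity = sum(c for c, y in zip(CAPACITY, plan) if y)
        if capacity < TOTAL_DEMAND:
            continue  # not enough capacity to serve demand
        cost = sum(f for f, y in zip(FIXED_COST, plan) if y)
        if best_cost is None or cost < best_cost:
            best_cost, best_plan = cost, plan
    return count, best_cost, best_plan


count, best_cost, best_plan = cheapest_feasible_plan()
```

With five binary variables the loop visits 32 plans, which is trivial, but each additional warehouse doubles the count; with 40 candidates there would already be more than a trillion plans, which is why exhaustive enumeration does not scale.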

2.3. MILP Solution Techniques

Solving MILP problems involves advanced techniques that efficiently navigate the discrete and non-convex solution space. One widely used method is the branch-and-bound algorithm. This technique systematically explores branches of the solution space tree, calculating bounds to eliminate regions that cannot contain the optimal solution. For example, in a production scheduling problem with constraints on machine capacities and delivery deadlines, branch-and-bound can effectively prune suboptimal schedules, focusing computational effort on the most promising regions of the solution space.

Another powerful method is the branch-and-cut algorithm, which enhances branch-and-bound by incorporating cutting planes. Cutting planes are linear inequalities added to the MILP model that cut off fractional solutions of the LP relaxation without excluding any feasible integer solution. This technique tightens the feasible region iteratively, converging towards the optimal solution more quickly. For instance, in a logistics network design problem, cutting planes can eliminate fractional route assignments, reducing the complexity and solving time of the problem.

Modern MILP solvers, such as CPLEX and Gurobi, implement these advanced techniques efficiently [4]. These solvers are equipped with sophisticated algorithms that handle large-scale MILP problems involving thousands of variables and constraints. For example, Gurobi's parallel processing capabilities can solve complex optimization problems in industries ranging from energy to finance within reasonable timeframes. Additionally, these solvers incorporate heuristic methods to quickly find good feasible solutions, which are then refined through exact optimization techniques. In a supply chain context, a heuristic might provide a near-optimal initial solution for warehouse placement, which the solver then improves upon until the final solution is proven optimal or falls within an acceptable optimality gap.
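Branch-and-bound is easiest to see on a small 0/1 knapsack problem, a standard textbook MILP. The sketch below is an illustration of the idea and not how CPLEX or Gurobi are implemented: it branches on including or excluding each item and prunes a branch when the LP-relaxation bound (greedy fill with one fractional item) cannot beat the best solution found so far. The item data are assumed, and weights are assumed to be positive.

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Maximize total value of chosen items subject to a weight limit (0/1 knapsack).

    Branch: include or exclude the next item.
    Bound:  value of the LP relaxation, where one item may be taken fractionally.
    """
    # Sort by value density so the greedy fractional fill gives the LP bound.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    n = len(v)
    best = {"value": 0, "nodes": 0}

    def bound(k, value, room):
        # Optimistic estimate for the subtree rooted at item k.
        for i in range(k, n):
            if w[i] <= room:
                room -= w[i]
                value += v[i]
            else:
                return value + v[i] * room / w[i]
        return value

    def branch(k, value, room):
        best["nodes"] += 1
        if value > best["value"]:
            best["value"] = value              # new incumbent solution
        if k == n or bound(k, value, room) <= best["value"]:
            return                             # prune: cannot beat the incumbent
        if w[k] <= room:
            branch(k + 1, value + v[k], room - w[k])   # include item k
        branch(k + 1, value, room)                     # exclude item k

    branch(0, 0, capacity)
    return best["value"], best["nodes"]


best_value, nodes_visited = knapsack_branch_and_bound(
    values=[60, 100, 120], weights=[10, 20, 30], capacity=50
)
```

The bound is what separates this from plain enumeration: whole subtrees are skipped as soon as their optimistic estimate falls below the incumbent, and cutting planes in branch-and-cut serve to make such estimates tighter.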

3. Modeling Supply Chain Networks with MILP

3.1. Network Design

Designing a supply chain network involves making critical decisions regarding the optimal locations

and capacities of various facilities, such as plants, warehouses, and distribution centers. MILP models

for network design are constructed to address these decisions by including decision variables that

represent facility locations, production quantities, transportation routes, and inventory levels. The

primary objective of these models is to minimize the total cost, which includes fixed facility costs,

transportation costs, and inventory holding costs, all while satisfying demand and capacity constraints.

For instance, consider a multinational retail company aiming to optimize its distribution network across

North America. The company needs to decide the number and location of warehouses to minimize total

costs while ensuring timely delivery to all retail outlets. The MILP model might include variables such

as the binary decision to open or close a warehouse, the quantity of goods to be shipped from each

warehouse to retail outlets, and the inventory levels at each warehouse. Constraints would include

warehouse capacity limits, demand requirements at each retail outlet, and transportation capacity [5].

The model might reveal that by closing two underperforming warehouses and opening one new,

strategically located distribution center, the company can reduce overall costs by 12%. Additionally, by

optimizing transportation routes, the model could identify a potential 8% reduction in transportation

costs, achieving a balance between fixed facility costs and variable transportation expenses. This level

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/20240642

141

of detailed analysis and optimization highlights the power of MILP in designing efficient, cost-effective

supply chain networks.
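
The open-or-close warehouse decision described above can be illustrated with a very small sketch. It is written in Python purely for illustration, uses made-up costs and site names (all hypothetical), ignores capacity limits, and enumerates every combination of open warehouses instead of calling an MILP solver, which is only practical for a handful of sites.

```python
from itertools import combinations

# Hypothetical fixed cost of operating each candidate warehouse.
FIXED_COST = {"W1": 40, "W2": 60, "W3": 40}

# Hypothetical cost of serving each retail outlet's full demand from each warehouse.
SERVE_COST = {
    "R1": {"W1": 20, "W2": 60, "W3": 70},
    "R2": {"W1": 50, "W2": 25, "W3": 65},
    "R3": {"W1": 70, "W2": 30, "W3": 20},
    "R4": {"W1": 80, "W2": 55, "W3": 15},
}


def total_cost(open_set):
    """Fixed costs of the open sites plus the cheapest service cost per outlet."""
    fixed = sum(FIXED_COST[w] for w in open_set)
    transport = sum(min(costs[w] for w in open_set) for costs in SERVE_COST.values())
    return fixed + transport


def best_network():
    """Enumerate every non-empty set of open warehouses and keep the cheapest."""
    sites = sorted(FIXED_COST)
    candidates = (
        frozenset(c)
        for k in range(1, len(sites) + 1)
        for c in combinations(sites, k)
    )
    best = min(candidates, key=lambda s: (total_cost(s), sorted(s)))
    return sorted(best), total_cost(best)


print(best_network())  # (['W1', 'W3'], 185)
```

In this toy instance, opening W1 and W3 balances fixed and transportation costs better than opening a single site or all three, which is the trade-off the binary variables in the MILP capture.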

3.2. Production Planning

Production planning is a critical aspect of supply chain management that involves determining the

optimal production schedules and quantities for each product at each facility. MILP models for

production planning incorporate a range of constraints, including production capacities, setup times, and

inventory levels, to develop a comprehensive production schedule. The primary goal is to minimize

production and inventory costs while ensuring that customer demand is met promptly. For example, a

global electronics manufacturer might use an MILP model to optimize its production planning across

multiple factories worldwide. The decision variables in this model could include the number of units of

each product to be produced at each factory, the timing of production runs, and the levels of inventory

to be maintained at each location. Constraints would involve the production capacity of each factory,

the setup times required for switching production lines between different products, and the inventory

holding capacities. By implementing the MILP model, the manufacturer could identify an optimal

production schedule that reduces total production costs by 15% and inventory holding costs by 20% [6].

The model might suggest producing certain high-demand products in factories closer to key markets to

reduce lead times, while producing lower-demand products in factories with lower production costs.

This optimization would result in improved customer satisfaction due to shorter delivery times and lower

operational costs, demonstrating the value of MILP in production planning.
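
A minimal sketch of the setup-versus-inventory trade-off in production planning follows. It assumes a single product, a single factory, no capacity limit, and hypothetical demand and cost figures; it enumerates the binary setup decisions directly rather than using an MILP solver. Under these assumptions, each production run covers demand up to the next run.

```python
from itertools import product


def plan_cost(setups, demand, setup_cost, hold_cost):
    """Cost of a plan in which each setup period produces all demand up to the next setup."""
    cost = 0.0
    last_setup = None
    for t, d in enumerate(demand):
        if setups[t]:
            cost += setup_cost
            last_setup = t
        if d > 0:
            if last_setup is None:
                return None  # demand occurs before any production run: infeasible
            # Units for period t are held from the last setup period until t.
            cost += hold_cost * d * (t - last_setup)
    return cost


def best_plan(demand, setup_cost, hold_cost):
    """Try every 0/1 setup pattern and return the cheapest feasible one."""
    best = None
    for setups in product((0, 1), repeat=len(demand)):
        c = plan_cost(setups, demand, setup_cost, hold_cost)
        if c is not None and (best is None or c < best[1]):
            best = (setups, c)
    return best


# Hypothetical data: four periods, setup cost 50, holding cost 1 per unit per period.
print(best_plan([40, 20, 60, 30], setup_cost=50, hold_cost=1))  # ((1, 0, 1, 0), 150.0)
```

The result runs production in periods 1 and 3 only: a third setup would cost more than the inventory it avoids. Capacity limits and multiple products, which the text mentions, would require a real solver.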

Figure 1. Impact of MILP Optimization on Production and Inventory Costs

3.3. Transportation and Logistics

Transportation and logistics optimization focuses on determining the most cost-effective transportation

routes and modes for delivering products from suppliers to customers. MILP models in this area include

decision variables that represent transportation modes, routes, and shipment quantities. The primary

objective is to minimize transportation costs while ensuring timely delivery and maintaining high service

levels. Consider a large e-commerce company that needs to optimize its logistics network to handle

increasing order volumes efficiently. The MILP model for this scenario might include variables for

selecting transportation modes (e.g., air, sea, or land), choosing specific routes for each shipment, and

determining the quantities of goods to be shipped along each route. Constraints would include vehicle

capacities, delivery windows, and regulatory restrictions [7]. By using an MILP model, the e-commerce

company could identify optimal transportation routes that reduce overall logistics costs by 18%. For

instance, the model might recommend shifting a portion of air shipments to sea freight for certain routes

where delivery time is less critical, resulting in significant cost savings. Additionally, the model could

optimize delivery schedules to ensure that trucks and delivery vans are utilized to their full capacity,

reducing the number of trips required and further cutting costs. This detailed optimization enables the

company to maintain high service levels while minimizing transportation expenses, illustrating the

effectiveness of MILP in transportation and logistics management.
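
The mode-selection decision can be sketched as follows, under stated assumptions: three hypothetical shipments, made-up per-unit costs and transit times for each mode, a delivery deadline per shipment, and a shared air-freight capacity. The sketch enumerates all mode assignments, which stands in for the MILP solver on this tiny instance.

```python
from itertools import product

# Hypothetical per-unit cost and transit time for each mode.
MODES = {
    "air": {"cost": 5.0, "days": 2},
    "sea": {"cost": 1.0, "days": 20},
    "land": {"cost": 2.0, "days": 7},
}

# Hypothetical shipments with delivery windows expressed as deadlines in days.
SHIPMENTS = [
    {"id": "S1", "units": 100, "deadline": 3},
    {"id": "S2", "units": 200, "deadline": 10},
    {"id": "S3", "units": 150, "deadline": 30},
]

AIR_CAPACITY = 250  # hypothetical total units that can travel by air


def cheapest_assignment():
    """Return the lowest-cost mode per shipment that meets deadlines and air capacity."""
    best = None
    for choice in product(MODES, repeat=len(SHIPMENTS)):
        pairs = list(zip(choice, SHIPMENTS))
        on_time = all(MODES[m]["days"] <= s["deadline"] for m, s in pairs)
        air_units = sum(s["units"] for m, s in pairs if m == "air")
        if not on_time or air_units > AIR_CAPACITY:
            continue
        cost = sum(MODES[m]["cost"] * s["units"] for m, s in pairs)
        if best is None or cost < best[1]:
            best = ({s["id"]: m for m, s in pairs}, cost)
    return best


print(cheapest_assignment())
# ({'S1': 'air', 'S2': 'land', 'S3': 'sea'}, 1050.0)
```

Only the urgent shipment stays on air freight; the shipment with a loose deadline moves to sea, mirroring the air-to-sea shift described in the text.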

4. Case Studies in Supply Chain Optimization

4.1. Retail Supply Chain

A major retail chain, which operates over 500 stores across multiple regions, faced challenges in

managing its extensive supply chain network. The company decided to utilize Mixed Integer Linear

Programming (MILP) to optimize its supply chain, focusing specifically on the locations of its

warehouses and the management of its inventory. The MILP model considered various factors such as

supplier locations, existing warehouse capacities, transportation costs, and demand at each retail outlet.

By modeling the entire network, including the potential for new warehouse locations, the company

identified that relocating certain warehouses closer to high-demand areas would significantly reduce

transportation costs. The optimization process led to the strategic opening of three new warehouses and

the closure of two underperforming ones. This reconfiguration resulted in a 15% reduction in overall

supply chain costs, amounting to annual savings of approximately $30 million. The MILP model also

provided detailed insights into optimal inventory levels at each warehouse. By aligning inventory

management with demand forecasts, the company reduced stockouts by 20%, which in turn improved

service levels and customer satisfaction [8]. Additionally, the optimization led to a 10% reduction in

transportation costs, equivalent to saving $10 million annually. These improvements highlighted the

model's effectiveness in balancing cost and service level trade-offs, ultimately enhancing the efficiency

and performance of the supply chain network.

4.2. Manufacturing Supply Chain

A global manufacturing firm specializing in automotive components applied MILP to optimize its

complex production and distribution network. The firm's network included multiple production plants,

distribution centers, and a vast customer base spread across different continents. The MILP model

incorporated decision variables such as plant locations, production schedules, and distribution routes,

alongside constraints on production capacities and lead times, with transportation costs entering the objective. The model revealed

that consolidating certain production activities and adjusting the production schedules could

significantly enhance efficiency. Specifically, the firm decided to centralize the production of high-

demand components in plants with the highest capacity utilization rates, which led to a 20% reduction

in production costs, saving approximately $50 million annually. Additionally, the model suggested

optimizing distribution routes to minimize lead times and transportation expenses. By implementing

these strategies, the firm reduced lead times by 25%, improving the average delivery time from 10 days

to 7.5 days. This enhancement in lead times was particularly crucial for maintaining competitive

advantage in the fast-paced automotive industry [9].

4.3. Pharmaceutical Supply Chain

A pharmaceutical company that produces and distributes a wide range of medications faced the

challenge of optimizing its supply chain network to meet stringent regulatory requirements and maintain

high service levels. The company turned to MILP to optimize its drug production and distribution

processes, focusing on production quantities, facility locations, and transportation routes. The MILP

model incorporated input data such as production capacities at different facilities, demand forecasts for

various medications, and transportation costs, with regulatory compliance requirements expressed as constraints. Using these

inputs, the model identified optimal production schedules and facility locations that minimized costs

while ensuring timely delivery of medications. The optimization led to a 20% reduction in production

costs, saving the company approximately $25 million annually. This was achieved by consolidating

production at facilities with higher efficiency and lower operational costs. Additionally, the model

helped streamline the distribution network, reducing delivery times by 15%. The average delivery time

was reduced from 5 days to 4.25 days, which was critical for ensuring that life-saving medications

reached patients promptly [10].
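
The percentage figures quoted in the case studies can be checked with simple arithmetic. The sketch below recomputes the reported delivery and lead times and derives the cost baselines that the reported savings would imply; those baselines are an inference from the quoted numbers, not values stated in the case studies.

```python
def after_reduction(baseline, percent):
    """Value that remains after reducing `baseline` by `percent` percent."""
    return baseline * (100 - percent) / 100


def implied_baseline(saving, percent):
    """Baseline that a saving of `saving` would imply if it equals `percent` percent."""
    return saving * 100 / percent


# Lead and delivery times reported in Sections 4.2 and 4.3.
print(after_reduction(10, 25))   # 7.5 days
print(after_reduction(5, 15))    # 4.25 days

# Implied annual cost baselines in $ million (inferred, not stated in the text).
print(implied_baseline(30, 15))  # 200.0 (retail, overall supply chain cost)
print(implied_baseline(50, 20))  # 250.0 (manufacturing, production cost)
print(implied_baseline(25, 20))  # 125.0 (pharmaceutical, production cost)
```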

5. Conclusion

Mixed Integer Linear Programming (MILP) stands as a robust optimization tool, enabling businesses to

address the complexities of modern supply chain management effectively. Through its ability to handle

integer variables and complex constraints, MILP provides detailed and actionable insights for

optimizing production planning, network design, and transportation logistics. The case studies presented

in this paper demonstrate substantial cost savings and operational improvements across various sectors.

Retail chains achieved significant reductions in supply chain costs and improved service levels through

strategic warehouse relocations. Manufacturing firms benefited from streamlined production and

distribution processes, enhancing efficiency and reducing lead times. Pharmaceutical companies

ensured regulatory compliance while maintaining high service standards and minimizing operational

costs. These examples underscore the practical value of MILP in achieving optimized, cost-effective,

and efficient supply chain networks. As businesses continue to navigate the challenges of global markets,

MILP offers a powerful framework for making informed, strategic decisions that drive performance and

competitiveness.

References

[1] Thomas, Meghna, and Lina Sela. "A Mixed‐Integer Linear Programming Framework for

Optimization of Water Network Operations Problems." Water Resources Research 60.2

(2024): e2023WR034526.

[2] Rosenhahn, Bodo. "Optimization of Sparsity-Constrained Neural Networks as a Mixed Integer

Linear Program: NN2MILP." Journal of Optimization Theory and Applications 199.3 (2023):

931-954.

[3] Ágoston, Kolos Cs, and Marianna E.-Nagy. "Mixed integer linear programming formulation for

K-means clustering problem." Central European Journal of Operations Research 32.1 (2024):

11-27.

[4] Kakkad, Dev A., et al. "Iterative MILP algorithm to find alternate solutions in linear programming

models." Optimization and Engineering (2024): 1-24.

[5] Li, Beibin, et al. "Large language models for supply chain optimization." arXiv preprint

arXiv:2307.03875 (2023).

[6] Teixeira, Eduardo dos Santos, et al. "A review of mathematical optimization models applied to

the sugarcane supply chain." International Transactions in Operational Research 30.4 (2023):

1755-1788.

[7] Kolasani, Saydulu. "Blockchain-driven supply chain innovations and advancement in

manufacturing and retail industries." Transactions on Latest Trends in IoT 6.6 (2023): 1-26.

[8] Edunjobi, Tolulope Esther. "The integrated banking-supply chain (IBSC) model for FMCG in

emerging markets." Finance & Accounting Research Journal 6.4 (2024): 531-545.

[9] Yandrapalli, Vinay. "Revolutionizing supply chains using power of generative ai." International

Journal of Research Publication and Reviews 4.12 (2023): 1556-1562.

[10] Ibrahim, Yasir, and Dhabia M. Al-Mohannadi. "Optimization of low-carbon hydrogen supply

chain networks in industrial clusters." International Journal of Hydrogen Energy 48.36 (2023):

13325-13342.

Environmental monitoring system design based on STM32 platform

Yuhe Tie1,2,*, Peiming Chen1,3

1College of Computer and Information Science, Southwest University School of Software,

Southwest University, No. 2 Tiansheng Road, Beibei District, Chongqing, China

21292270530@qq.com

32958128946@qq.com

*corresponding author

Abstract. This study addresses the current societal demand for environmental monitoring by

designing an environmental monitoring system based on the STM32 platform. This system

assesses and monitors environmental conditions in real-time by tracking parameters such as CO,

PM2.5, temperature, humidity, and light intensity. It holds significant value in preventing air

pollution and improving indoor air quality. The system employs four types of sensors: the

DHT11 digital temperature and humidity sensor, the BH1750FV light sensor, the

GP2Y1010AUOF optical dust sensor, and the MQ-7 CO sensor to collect environmental data,

which is then processed by the STM32F103C8T6 controller. This system is characterized by its

real-time capabilities, high precision, and low power consumption, making it highly practical

and valuable for widespread application. The paper provides a detailed discussion of sensor

selection, measurement algorithms, and system design and implementation, offering valuable

insights for research and applications in related fields.

Keywords: STM32, Environmental Monitoring, Multi-Sensor System, Modular Design.

1. Introduction

With increasing environmental awareness, there is growing concern about environmental quality,

and people are eager to know whether their surroundings are safe and healthy. Therefore, designing a

system capable of real-time environmental quality monitoring is essential. Sensor technology is now

highly advanced, enabling precise measurement of various environmental parameters such as

temperature, humidity, CO, and PM2.5, providing a solid foundation for designing an environmental

monitoring system[1].

In many settings, such as homes, industries, healthcare, and public spaces, monitoring environmental

quality is necessary to ensure health and safety. Thus, designing an environmental monitoring system is

of practical necessity. Given these essential needs, creating a system capable of real-time environmental

quality monitoring is meaningful, providing significant protection and convenience to people and

promoting environmental protection efforts.

The environmental monitoring system based on the STM32 platform can monitor multiple

environmental parameters in real time, such as CO, PM2.5, temperature, humidity, and light intensity,

and further analyze and display the results on an OLED screen. This system is characterized by high

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/20240656

(https://creativecommons.org/licenses/by/4.0/).

145

precision, strong real-time performance, fast response speed, ease of use, and high reliability, making it

applicable across fields such as meteorology, environment, agriculture, and industry, helping people

better understand and manage their environment[2]. In summary, this system is an efficient and practical

environmental monitoring device that provides timely and accurate environmental data, thereby

promoting environmental protection and sustainable development. The system's specific functions

include:

CO Monitoring: CO is a toxic and harmful gas, and high concentrations of CO can severely impact

human health. The CO monitoring function can track the CO concentration in the environment in real-

time, promptly identifying CO pollution and taking action to safeguard human health.

PM2.5 Monitoring: PM2.5 refers to particulate matter in the air with a diameter of 2.5 microns or

less, which can penetrate the respiratory system and pose significant health risks. The PM2.5 monitoring

function can track PM2.5 concentrations in real-time, enabling timely detection of pollution and

intervention to protect human health.

Temperature and Humidity Monitoring: Temperature and humidity are crucial parameters affecting

indoor comfort and human health. This function monitors environmental temperature and humidity

changes in real-time, helping users adjust indoor conditions to enhance comfort and health levels.

Light Intensity Monitoring: Light intensity is an important parameter affecting indoor lighting and

human circadian rhythms. This function monitors changes in environmental light intensity in real-time,

helping users adjust indoor lighting conditions to improve comfort and biological rhythms.

By monitoring these environmental parameters, the system provides real-time insights into

indoor environmental changes, assisting users in adjusting indoor conditions to improve comfort and

health levels. Additionally, this data can be utilized in environmental pollution monitoring,

meteorological research, agricultural production, and other areas, offering broad application

prospects[3].

2. Overall System Design (Principle Diagram, Preliminary Sensor Selection)

2.1. Preliminary Sensor Selection

Sensors used in practical applications mainly fall into two categories: analog sensors and digital sensors.

Traditional analog sensors offer the advantages of fast measurement conversion speed and a wide

temperature measurement range. However, the analog signal processing of analog sensors is complex,

and during transmission, these signals are prone to electromagnetic interference, leading to errors. In

scenarios requiring multi-point temperature and humidity detection, differences in the wiring distances

from the measurement points to the testing device, as well as inconsistencies in the parameters of various

sensitive elements, can introduce errors that are difficult to eliminate. Additionally, the accuracy of

analog-to-digital conversion systems is inherently limited, exhibiting some non-linearity and poor

interchangeability. Using sensors with direct digital output can avoid these issues. Digital sensors can

convert the measured analog quantity directly into a digital output, which can be directly interfaced with

digital devices (such as computers or digital display systems) and processed by DSPs or computers.

These sensors possess high anti-interference capabilities, along with high measurement accuracy and

resolution, good stability, easy signal processing, transmission, and automatic control. They are also

conducive to dynamic and multi-channel measurement, offering intuitive reading, convenient

installation, simple maintenance, and high reliability. Despite the slower response speed and narrower

temperature measurement range, digital sensor technology has garnered increasing attention[4].

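Considering the system's economic viability and the advantages and disadvantages of sensors, this

study opts for four integrated sensor modules: the DHT11 and BH1750FV provide direct digital output,

while the GP2Y1010AUOF and MQ-7 provide analog output that is digitized by the STM32's ADC.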

2.2. Controller Selection

In the design of this environmental monitoring system, the STM32 series microcontroller was chosen

as the controller. The STM32 series microcontrollers are well-suited for IoT and embedded systems due

to their high performance, low power consumption, extensive peripherals, and excellent scalability[5].

In this design, the STM32 microcontroller serves as the main controller, handling data acquisition and

processing via serial communication with various sensors. The processed data is then transmitted to the

upper computer through a module, enabling real-time monitoring and remote control of environmental

parameters. Additionally, the STM32 microcontroller can control other peripherals, such as an external

OLED display screen, which is used for displaying environmental parameter information and other

indicators.

2.3. Schematic Diagram

Figure 1. Overall Design Diagram

Figure 2. Schematic Diagram

3. Sensor Working Principles, Measurement Algorithms, Circuits, and Installation Methods

3.1. Sensor Module Working Principles

3.1.1. DHT11 Digital Temperature and Humidity Sensor

The DHT11 sensor integrates a resistive humidity sensor and an NTC thermistor-based temperature

sensor. When powered, the internal circuits of the DHT11 sensor become operational. Communication

between the sensor and an external MCU occurs via a single-wire communication protocol. The external

MCU initiates the process by sending a start signal to the sensor, prompting it to begin collecting

environmental temperature and humidity data. The humidity and temperature sensors inside the DHT11

then simultaneously measure the environmental humidity and temperature values. The sensor's internal

digital signal processing module converts these measurements into digital signals, which are sent to the

external MCU through the single-wire protocol. The external MCU decodes and calculates the received

digital signals, ultimately outputting the environmental temperature and humidity values. It should be

noted that the measurement accuracy of the DHT11 sensor is generally low, with temperature

measurement errors reaching up to ±2℃ and humidity measurement errors up to ±5%RH. Additionally,

the sensor requires some time for stabilization and response, so calibration and delay handling may be

necessary in practical applications. This sensor can be used in HVAC, dehumidifiers, test and

measurement equipment, consumer goods, automotive, automatic control, data loggers, weather stations,

home appliances for temperature and humidity regulation, medical, and other related humidity detection

and control applications.

Figure 3. Internal Schematic and Physical Diagram of the DHT11 Digital Temperature and Humidity

Sensor

Table 1. Functional Parameters of the DHT11 Temperature and Humidity Module

Product Feature: Detects ambient humidity and temperature

Sensor: DHT11

Humidity Measurement Range: 20%-95% RH (over 0℃-50℃), ±5% RH error

Temperature Measurement Range: 0℃-50℃, ±2℃ error

Operating Voltage: 3.3V-5V

Output Type: Digital output

Mounting Bolt Hole: Yes (hole diameter 3.1mm, 10mm from edge)

PCB Size: 32mm × 14mm

Power Indicator Light: Red

Weight per Set: Approximately 8g

3.1.2. BH1750FV Light Sensor

The BH1750FV sensor integrates a photodiode and a digital signal processing chip internally. The

photodiode converts external light into an electrical signal, which is then amplified and filtered by the

digital signal processing chip before being converted into a digital signal. The BH1750FV sensor

supports various measurement modes and accuracy settings, allowing users to configure it according to

their needs. The sensor can operate in both continuous measurement and single measurement modes,

with mode and accuracy settings adjustable via the I2C interface. The measured light intensity values

are output to an external MCU through the I2C interface. The external MCU decodes and calculates the

digital signals received from the sensor to determine the environmental light intensity. It is important to

note that the BH1750FV sensor offers high measurement accuracy, with a resolution of up to 0.5 lx.

Additionally, the sensor has a rapid response time, providing stable light intensity measurements within

a short period.

Figure 4. Internal Schematic and Actual Image of the BH1750FV Light Sensor

Table 2. Basic Parameters of the BH1750 Light Intensity Sensor

Model: GY-302

Dimensions: 13.9mm × 18.5mm

Uses ROHM original BH1750FVI chip

Power Supply: 3-5V

Data Range: 0-65535

Built-in 16-bit ADC

Direct digital output, no complex calculations or calibration needed

3.1.3. Optical Dust Sensor (GP2Y1010AUOF)

The GP2Y1010AUOF is an infrared optical dust sensor that measures the concentration of dust in the

air and outputs an analog signal. The sensor integrates an infrared transmitter and receiver pair. The

transmitter emits infrared light into a detection chamber through which ambient air, carrying dust

particles, flows. When dust enters the chamber, it scatters the infrared light, and some of the

scattered light is received by the internal receiver. The receiver then converts the received light signal

into an electrical signal, which is processed and amplified before being output.

The output signal of the GP2Y1010AUOF sensor has a linear relationship with the dust concentration,

allowing it to be converted into a dust concentration value after calibration and processing. Since the

sensor outputs an analog signal, it requires an ADC (Analog-to-Digital Converter) to convert it into a

digital signal for further processing and analysis.
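
The linear voltage-to-concentration conversion described above can be sketched in Python (used for illustration only; the paper does not state the firmware language). The sensitivity of 0.5 V per 0.1 mg/m³ and the 0.9 V clean-air voltage are the typical values listed in Table 3 for the GP2Y1014AU. The 12-bit ADC full scale, the 3.3 V reference, and the 2:1 resistor divider on the sensor output are assumptions, not details given in the paper.

```python
ADC_MAX = 4095           # full-scale count of a 12-bit ADC (assumed configuration)
V_REF = 3.3              # ADC reference voltage in volts (assumed)
DIVIDER_RATIO = 2.0      # hypothetical divider that halves the sensor output before the ADC
CLEAN_AIR_V = 0.9        # typical output voltage in clean air, from Table 3
SENSITIVITY = 0.5 / 0.1  # volts per mg/m^3, from Table 3 (0.5 V per 0.1 mg/m^3)


def adc_to_volts(count):
    """Convert a raw ADC count back to the voltage at the sensor output pin."""
    return count / ADC_MAX * V_REF * DIVIDER_RATIO


def dust_mg_per_m3(volts):
    """Apply the linear characteristic; readings below the clean-air voltage clamp to zero."""
    return max(0.0, (volts - CLEAN_AIR_V) / SENSITIVITY)


reading = adc_to_volts(1000)
print(round(reading, 3), round(dust_mg_per_m3(reading), 3))
```

In practice the clean-air voltage varies between individual sensors, so a per-device offset calibration would replace the typical value used here.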

It is important to note that the GP2Y1010AUOF sensor has specific performance requirements for

measuring dust particles, including its measurement range, sensitivity, and response time. In practical

applications, the sensor needs to be selected and calibrated based on the specific use case. The

GP2Y1014AU0F variant of this sensor is capable of detecting reflected light from dust particles,

including very fine particles such as tobacco smoke. It is commonly used in air purification systems and

can measure particles as small as 0.8 micrometers, detecting smoke from tobacco, pollen, and household

dust.

Figure 5. Internal Schematic and Physical Diagram of the BH1750FV and GP2Y1010AUOF Sensors

Table 3. Basic Parameters of the GP2Y1014AU Dust Sensor

Power Supply Voltage: 5-7V

Operating Temperature: -10 to 65°C

Current Consumption: 20mA (Maximum)

Small Particle Detection Threshold: 0.8µm

Sensitivity: 0.5V/(0.1mg/m³)

Voltage in Clean Air: 0.9V (Typical)

Storage Temperature: -20 to 80°C

Lifespan: 5 years

Dimensions: 46mm × 30mm × 17.6mm

3.1.4. CO (MQ-7) Detection Sensor

The MQ-7 is a carbon monoxide (CO) gas sensor built around a tin dioxide (SnO2) metal-oxide

semiconductor element whose electrical resistance changes with CO concentration; it measures the

concentration of CO gas in the air and outputs an analog signal. Inside the MQ-7 sensor, a heater

brings the sensing element to its operating temperature so that CO can react at its surface.

When CO is adsorbed onto the surface of the sensing element, the element's resistance decreases. The

module places the element in a voltage divider with a load resistor, so the analog output voltage rises

as the CO concentration increases; the relationship is nonlinear and is converted to a concentration

through calibration.

The sensor's output signal needs to be converted into a digital signal via an AD converter, and then

processed and analyzed by a microcontroller or similar control device. Typically, calibration and

adjustment of the sensor are necessary to ensure the accuracy and stability of its output signal. It is

important to note that the MQ-7 sensor operates within a temperature range of 0℃ to 50℃ and under a

relative humidity below 95% RH. In practical applications, it is crucial to select the appropriate sensor

and implement necessary environmental controls and calibration based on specific application scenarios.

Figure 6. CO Sensor Wiring Diagram

Table 4. MQ-7 Basic Parameters

Functionality Achieved: Testing program included with this version

Chip Used: AT89S52

Crystal Oscillator: 11.0592 MHz

Baud Rate: 9600

Electrical Performance:

Input Voltage: DC 5V

Power Consumption (Current): 150mA

DO Output: TTL digital signal 0 and 1 (0.1V and 5V)

AO Output: 0.1-0.3V (no pollution); approximately 4V at high concentration

Standard Testing Conditions:

Temperature: 20°C ± 2°C

Humidity: 65% ± 5% RH

Standard Testing Circuit: Vc: 5.0V ± 0.1V; VH: 5.0V ± 0.1V

3.2. Sensor Module Measurement Algorithms

3.2.1. MQ-7 CO Sensor Measurement Algorithm

The MQ-7 CO sensor detects CO concentration through the change in resistance of its semiconductor

sensing element. CO reacts with oxygen adsorbed on the element's surface, which lowers the element's

resistance and raises the module's output voltage; measuring this voltage gives the CO concentration.

The measurement algorithm for this sensor typically uses linear interpolation to convert the sensor's

output voltage into CO concentration values, achieving accurate CO concentration measurements.
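
The linear interpolation step can be sketched as a lookup over calibration points. The voltage and ppm pairs below are placeholders invented for illustration; real values would come from calibrating the specific sensor against known CO concentrations.

```python
from bisect import bisect_right

# Hypothetical calibration table: (output voltage in V, CO concentration in ppm).
CAL_POINTS = [(0.2, 0.0), (1.0, 50.0), (2.5, 300.0), (4.0, 1000.0)]


def co_ppm(volts, points=CAL_POINTS):
    """Piecewise-linear interpolation between calibration points, clamped at both ends."""
    xs = [v for v, _ in points]
    if volts <= xs[0]:
        return points[0][1]
    if volts >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, volts)            # index of the first point above `volts`
    (v0, p0), (v1, p1) = points[i - 1], points[i]
    return p0 + (p1 - p0) * (volts - v0) / (v1 - v0)


print(co_ppm(1.0))   # 50.0, exactly on a calibration point
print(co_ppm(5.0))   # 1000.0, clamped to the last point
```

Adding more calibration points tightens the approximation where the sensor's response curve bends most.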

3.2.2. BH1750FV Light Sensor Measurement Algorithm

The BH1750FV light sensor employs a specialized optical sensing technology to accurately measure

ambient light intensity. The measurement algorithm for this sensor usually involves calibration methods,

converting the sensor's ADC output values into light intensity values. During calibration, external light

sources and photodiodes are used to determine the sensor's sensitivity and calibration parameters,

thereby improving measurement accuracy.
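
For the BH1750, the conversion from the 16-bit reading to lux is a fixed scaling before any further calibration. The sketch below assumes the sensor is left at its default measurement-time setting, for which the ROHM datasheet gives 1.2 counts per lux, and halves the result in the high-resolution mode 2 that provides 0.5 lx resolution. The function name and the `mode2` flag are illustrative.

```python
COUNTS_PER_LUX = 1.2  # datasheet scaling at the default measurement time


def bh1750_lux(msb, lsb, mode2=False):
    """Combine the two bytes read over I2C into a count and scale it to lux."""
    raw = (msb << 8) | lsb
    lux = raw / COUNTS_PER_LUX
    # High-resolution mode 2 reports counts in 0.5 lx steps.
    return lux / 2 if mode2 else lux


print(round(bh1750_lux(0x83, 0x90)))  # 28067
```

With the full 0-65535 data range listed in Table 2, the largest value this scaling can report is roughly 54612 lx.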

3.2.3. DHT11 Digital Temperature and Humidity Sensor Measurement Algorithm

The DHT11 digital temperature and humidity sensor uses a humidity-sensitive element and a

temperature sensor to measure environmental temperature and humidity. Each reading arrives as a 40-bit

frame: two humidity bytes, two temperature bytes, and an 8-bit checksum (not a CRC) equal to the low

byte of the sum of the first four bytes. The MCU verifies the checksum to reject corrupted frames, then

converts the raw humidity and temperature bytes into actual values for measurement of environmental

conditions.
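
A sketch of decoding one DHT11 reading is given below. It assumes the five bytes have already been captured from the single-wire bus and that the fractional bytes carry tenths of a unit (on many DHT11 parts they are simply zero). The function name is illustrative, and the sketch covers only the 0℃-50℃ range listed in Table 1.

```python
def decode_dht11(frame):
    """Validate a 5-byte DHT11 frame and return (humidity %RH, temperature in deg C)."""
    if len(frame) != 5:
        raise ValueError("a DHT11 frame is exactly 5 bytes")
    rh_int, rh_dec, t_int, t_dec, checksum = frame
    # The last byte is the low 8 bits of the sum of the first four bytes.
    if (rh_int + rh_dec + t_int + t_dec) & 0xFF != checksum:
        raise ValueError("checksum mismatch")
    humidity = rh_int + rh_dec / 10      # assumed: fractional byte is in tenths
    temperature = t_int + t_dec / 10     # assumed: fractional byte is in tenths
    return humidity, temperature


print(decode_dht11(bytes([45, 0, 23, 0, 68])))  # (45.0, 23.0)
```

A frame that fails the check is best discarded and re-read after the sensor's minimum sampling interval rather than corrected.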

3.2.4. Optical Dust Sensor GP2Y1010AUOF Measurement Algorithm

The GP2Y1010AUOF optical dust sensor utilizes a scattering optical principle to measure

environmental dust concentration. Because the sensor outputs an analog voltage, the measurement

algorithm samples that voltage with the ADC while the sensor's infrared LED is pulsed, then converts the

sampled voltage into a dust concentration value using the sensor's linear voltage-to-concentration

characteristic and calibration parameters to enhance measurement precision and reliability.

4. Control System Design (Simple Software and Hardware Block Diagram)

Microcontroller: The system uses an STM32 series microcontroller to achieve sensor data collection,

processing, and storage through programmed software. Timer interrupts are used for periodic data

collection, and the ADC module converts analog signals to digital form for data processing and storage.

The STM32F103C8T6 microcontroller, based on a 32-bit ARM Cortex-M3 core, features 64 KB of flash

memory, USB, CAN, seven timers, two 12-bit ADCs, and nine communication interfaces. Its key

characteristics include strong anti-interference capabilities, making it widely applicable in household

appliances, industrial control, instrumentation, security alarms, and peripheral computer devices. The

STM32 microcontroller, known for its robust arithmetic processing capabilities, flexible software

programming, low power consumption, compact size, low cost, and mature technology, is extensively

used in various fields. Given our familiarity with this chip, we have chosen the STM32 for the system

control section.

Figure 7. STM32 Pinout Diagram

User Interface: The system employs an OLED screen to display monitored environmental data,

allowing users to intuitively view environmental monitoring data and trends.

Sensor Connection Method: The system utilizes the DHT11 digital temperature and humidity sensor, the BH1750FVI light sensor, the GP2Y1010AU0F optical dust sensor, and the MQ-7 CO sensor for environmental monitoring. The DHT11 and GP2Y1010AU0F sensors are connected directly to GPIO ports of the STM32 microcontroller, the BH1750FVI is connected via the I2C interface, and the MQ-7 is connected through an analog input port.
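The sensor read-outs described here reduce to a few short conversions. The sketch below is illustrative Python, not the authors' firmware (which would run in C on the STM32). The function names are hypothetical, the 3.3 V ADC reference is an assumption about the board, and the DHT11 frame layout and BH1750 scale factor are the values given in those parts' datasheets. Negative temperatures are not handled.

```python
ADC_MAX = 4095   # STM32F103 ADCs are 12-bit: counts run from 0 to 4095
V_REF = 3.3      # assumed ADC reference voltage in volts


def adc_to_volts(counts: int) -> float:
    """Convert a raw 12-bit ADC reading (e.g. the MQ-7 output) to volts."""
    if not 0 <= counts <= ADC_MAX:
        raise ValueError("ADC reading out of 12-bit range")
    return counts * V_REF / ADC_MAX


def bh1750_to_lux(raw: int) -> float:
    """Convert a 16-bit BH1750 reading to lux (datasheet: divide by 1.2)."""
    return raw / 1.2


def decode_dht11(frame: bytes) -> tuple[float, float]:
    """Decode a 5-byte DHT11 frame into (humidity in %, temperature in deg C).

    Bytes 0-1 hold humidity (integer, decimal), bytes 2-3 hold temperature
    (integer, decimal), and byte 4 is the low 8 bits of the sum of bytes 0-3.
    """
    if len(frame) != 5:
        raise ValueError("DHT11 frame must be exactly 5 bytes")
    if sum(frame[:4]) & 0xFF != frame[4]:
        raise ValueError("DHT11 checksum mismatch")
    humidity = frame[0] + frame[1] / 10
    temperature = frame[2] + frame[3] / 10
    return humidity, temperature
```

Rejecting a frame whose checksum does not match, rather than displaying it, is one simple way to keep a corrupted transfer from reaching the OLED screen.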

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/20240656

151

Power Supply: The system uses a 5V voltage regulator and an external power adapter to ensure

stability and reliability.

Overall, the control system design of this environmental monitoring system has thoroughly

considered various factors, enabling efficient and stable operation and accurate and reliable data

collection and processing.

The control system is designed for efficient and stable operation. Sound circuit design and programming keep the system running reliably, while sensor-specific measurement algorithms keep the collected data accurate and provide a solid foundation for subsequent analysis and applications. Monitoring data is displayed on the OLED screen, so users can read current environmental values and trends at a glance.

5. Conclusion

This environmental monitoring system, based on the STM32 platform, integrates monitoring functions

for CO, PM2.5, temperature and humidity, and light intensity. It employs high-precision sensors and

optimized measurement algorithms to ensure accurate and reliable data collection and processing. The

control system uses an STM32 microcontroller, equipped with an OLED screen and various data

transmission methods, to provide an intuitive user interface and diverse data transmission options.

Experimental validation confirms the system's advantages in real-time performance, high precision, and

low power consumption, demonstrating its broad application and promotional value. Future

improvements will focus on enhancing design and performance, increasing system flexibility and

scalability, and providing more comprehensive technical support for the environmental monitoring field.

The system can be widely applied in households, industrial settings, medical environments, and public

spaces to help monitor and improve indoor and workplace environments, thus enhancing quality of life

and health levels.

Future Prospects: Consider incorporating additional environmental parameters such as noise and

vibration to meet diverse monitoring needs. Further optimization of the control system design could

enhance scalability and reliability, enabling additional functions and application scenarios. Integration

with IoT technology for remote monitoring and control can be explored to offer more application

scenarios in smart cities and smart homes.

Improvements: While the system currently collects and stores various environmental parameter data,

future development will focus on data processing and analysis to extract valuable information and

patterns. Incorporating technologies such as artificial intelligence and big data to build models and

algorithms for analyzing and predicting environmental data can increase the application value of

monitoring data.

System Reliability and Maintenance: The reliability and maintenance of the environmental

monitoring system are crucial for future development. Technologies such as automatic calibration and

fault diagnosis can be introduced to enhance system reliability and stability. Maintenance can be

streamlined through remote upgrades and automatic fault alarms, reducing manpower and time costs.

Application Scenarios and Market Prospects: The current application scenarios of environmental

monitoring systems include smart homes, industrial production, and urban environmental protection.

There is a broader market potential for future applications. Market research and technological innovation

can uncover new application scenarios and business models, expanding the system's market prospects.

System Security: Given the sensitivity of the data collected by the environmental monitoring system,

ensuring system security is essential. Measures such as encryption and access control can be

implemented to protect data confidentiality and integrity, preventing data leaks and tampering.

In conclusion, continuous technological innovation and application expansion are necessary for the

future development of this environmental monitoring system. Enhancing system performance and


reliability while addressing market trends and application scenarios will contribute significantly to the

advancement of the environmental monitoring field.

Yuhe Tie and Peiming Chen: Conceptualization, Methodology, Software, Resources, Writing - Original draft preparation, Visualization, Investigation.

References

[1] Sengupta, A., & Sharma, P. (2020). Design and Development of Environmental Monitoring

System Using IoT Technology. International Journal of Engineering Research & Technology,

13(2), 45-50. https://www.ijert.org/

[2] Khan, M. M., & Qureshi, I. A. (2018). STM32 Microcontroller for Industrial Applications. International Journal of Computer Applications, 182(29), 26-32. https://www.researchgate.net/

[3] Zhang, Z., Liu, X., & Xu, L. (2019). A Review of Environmental Monitoring Technologies for

IoT Applications. Sensors, 19(24), 5411. https://www.mdpi.com/

[4] Wang, Y., & Xie, M. (2021). Smart Environmental Monitoring Systems Using Sensor Networks:

A Survey. Sensors, 21(10), 3401. https://www.mdpi.com/

[5] Kang, S., Kim, Y., & Park, S. (2021). Design and Implementation of an IoT-Based Temperature and Humidity Monitoring System Using STM32 Microcontroller. Sensors, 21(1), 141.


Spacecraft design for interstellar travel

Leyan Ouyang

Department of Physics, King’s College London, London, WC2R 2LS, United

Kingdom

k22038659@kcl.ac.uk

Abstract. This paper endeavours to introduce and elucidate potential mechanisms aimed at the

development of a spacecraft that is not only sustainable but also optimized for interstellar travel

to the Andromeda galaxy, with the primary objective of scouting for potentially habitable

exoplanets suitable for human colonization. The conceptual framework encompasses a

comprehensive analysis of various critical components essential for the functionality and

longevity of the spacecraft, including but not limited to propulsion systems, attitude control

mechanisms, and advanced navigation systems. In addition, habitable zones, also called Goldilocks zones, are the regions around stars where planetary conditions are conducive to life. Because liquid water is regarded as the fundamental prerequisite for supporting life, the basic criterion for a habitable planet is a temperature at which water can remain liquid. The final analysis suggests that space migration is an inevitable prospect for humankind, and that effective measures should be taken to investigate possible ways of migrating to another planet.

Keywords: Interstellar travel, Space vehicles, Navigation, Habitability.

1. Introduction

A breakthrough in astronomy was the development of the theory of stellar evolution throughout the

20th century. It was Ejnar Hertzsprung and Henry Norris Russell who first plotted the stars’

temperatures against their brightness [1]. This method stimulated the investigation of stellar evolution

theories. The discovery that stars evolve over time became the primary impetus for future interstellar

travel projects. As a main-sequence star ages, it exhausts the hydrogen in its core and begins fusing helium. During this process, the star undergoes a series of changes in its internal structure. As a result, it becomes hotter, larger, and brighter as it leaves the main-sequence phase [2]. The Sun is currently about midway through its main-sequence life, so it will eventually become much larger and brighter as a red giant. Unfortunately, Earth's orbit will be swallowed by that future red giant, so human beings need to find a new home before this happens. Humanity must therefore address the question of how to travel between planets in different stellar systems.

In this paper, I introduce several important questions to consider when designing a spacecraft capable of intergalactic travel, as well as methods to address them. How will the spacecraft store enough power for long-term travel? How will it control its orientation? How will it determine its position and velocity relative to celestial objects? How can it identify stellar systems at which to refuel during the trip, and find habitable planets to migrate to? Possible

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/20240518

(https://creativecommons.org/licenses/by/4.0/).

154

answers to all the above questions are discussed in this paper. Most of the spacecraft's energy supply will come from photoelectric panels, which convert stellar radiation into usable electrical energy. The attitude control system is based on the solution to the falling cat problem, which allows the spacecraft to turn without any external torque acting on it. The navigation system relies on telescopes that observe radiation from other star systems to determine the identity of each star and its position relative to the spacecraft.

2. Design Overview

The energy, attitude control and navigation systems of the spacecraft are designed to be sustainable over long-distance interstellar travel. The design takes the solar system as a representative example of any star-planetary system that the ship will encounter during the journey. Star-planetary systems are, however, extremely sparsely scattered through the universe compared with their sizes. Thus, when the spacecraft is travelling through interstellar space, it can coast with very little energy consumption and without the need to accelerate, decelerate or steer.

3. Power Source

3.1. Introduction of Power Source

The power system is one of the most important parts of the spacecraft. To be sustainable over long journeys, it must be renewable and durable. Although sending human astronauts far away requires a great amount of energy, the universe itself can shoulder the energy supply, since it contains all kinds of high-energy objects that radiate power into the surrounding space. The real problem is how to make use of this nearly unlimited energy. Total energy is conserved, but it is scattered through space in various forms. The challenge for a long-duration spacecraft is to gather enough energy when it approaches a power source and to consume as little as possible while drifting through the vast space between two sources. This idea is the core of the power system design. The main strategy for extending the life span of the energy supply is to gather as much energy as possible at the starting point and to consume little until the next source is reached, so that the stored energy is sufficient for another long leg of travel. Figure 1 shows the cyclic relationship between “energy gathering” and “energy consuming”. The spacecraft will alternate between these two states.

3.2. Energy Gathering

Figure 1. The visual circulation between energy gathering and energy consuming.


Figure 2. Schematic diagram of the solar system where the astronomical objects are presented to scale.

The central problem for energy gathering is fuel. Although the universe is the most abundant energy reservoir, it is also vast and mostly empty, which makes available energy sources difficult to find. Large amounts of non-renewable energy exist as bodies of liquid or solid matter, which attracted one another and formed planets that occupy only a tiny fraction of space. In fact, the fraction is so small that it is hard even to draw the planets to scale. Figure 2 is a schematic diagram of the solar system in which the astronomical objects are presented to scale. The radius of the cylinder is thirty arbitrary units, representing the distance between the Sun and Neptune, the outermost planet of the solar system. Even the biggest object in the solar system, the Sun, is hard to see in this cylindrical space; all that can be seen is emptiness. By magnifying the radii of the planets and contracting the distances between them, however, a recognizable image of the solar system that is not to scale can be drawn, as in Figure 3.

Figure 3. Schematic diagram of the solar system where the astronomical objects are not to scale.


Table 1. Sizes and distances of the solar system [3].

| Astronomical object | Distance to the Sun (AU) | Distance to the Sun (km) | Diameter (km) |
|---|---|---|---|
| Sun | - | - | 1,391,400 |
| Mercury | 0.39 | 57,900,000 | 4,879 |
| Venus | 0.72 | 108,200,000 | 12,104 |
| Earth | 1.00 | 149,600,000 | 12,756 |
| Mars | 1.52 | 227,900,000 | 6,792 |
| Jupiter | 5.2 | 778,600,000 | 142,984 |
| Saturn | 9.54 | 1,433,500,000 | 120,536 |
| Uranus | 19.2 | 2,872,500,000 | 51,118 |
| Neptune | 30.06 | 4,495,100,000 | 49,528 |

The measured sizes of the astronomical objects and the distances between them are listed in Table 1. Using Table 1, it is simple to estimate the ratio between the volume of the astronomical objects and the volume of empty space in the solar system, denoted by R. Asteroids and other small bodies are comparatively tiny and hard to sum, so they are neglected in the calculation.

$$R = \frac{V_{\text{objects}}}{V_{\text{vacancy}}} \qquad (1)$$

[Equations (2) and (3), which substitute the values from Table 1 and state the numerical result for R, are illegible in the source.]

The result implies that, on average, there is approximately one cubic kilometer of matter per million cubic kilometers of empty space in the solar system. In the vast universe, this illustrates how difficult it is to find non-renewable energy sources such as coal and natural gas. Compared with other types of energy, stellar radiation is the easiest to find under such conditions, so the journey will rely mostly on collected solar energy.
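The volume ratio can be estimated numerically from the diameters in Table 1. The sketch below is only an illustration under a stated assumption: the empty space is modelled as a sphere whose radius is Neptune's distance from the Sun, whereas the paper's Figure 2 uses a cylinder whose height is not given here. The resulting number depends strongly on that choice of bounding volume and need not match the paper's value of R. The function and variable names are hypothetical.

```python
import math

# Diameters in km, taken from Table 1
DIAMETERS_KM = {
    "Sun": 1_391_400, "Mercury": 4_879, "Venus": 12_104, "Earth": 12_756,
    "Mars": 6_792, "Jupiter": 142_984, "Saturn": 120_536,
    "Uranus": 51_118, "Neptune": 49_528,
}
NEPTUNE_DISTANCE_KM = 4_495_100_000  # Table 1; assumed bounding radius


def sphere_volume(radius: float) -> float:
    """Volume of a sphere with the given radius."""
    return 4.0 / 3.0 * math.pi * radius ** 3


def occupied_volume_ratio(diameters: dict, bounding_radius: float) -> float:
    """Ratio of the objects' total volume to the remaining empty volume."""
    objects = sum(sphere_volume(d / 2) for d in diameters.values())
    vacancy = sphere_volume(bounding_radius) - objects
    return objects / vacancy


ratio = occupied_volume_ratio(DIAMETERS_KM, NEPTUNE_DISTANCE_KM)
print(f"R = {ratio:.3e}")
```

Almost all of the occupied volume belongs to the Sun, so adding asteroids and moons would not change the order of magnitude, which is consistent with neglecting them.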

Figure 4. Structure of the outer layer of the spacecraft and the photovoltaic panel.

Photovoltaic technology is the most widely used means of converting solar radiation into electrical energy [4]. To absorb stellar radiation, the outermost layer of the spacecraft will be covered in photovoltaic panels that can be exposed to the star. During travel, however, the outermost layer can be degraded by impacts and other disturbances. Building the outer layer solely from photovoltaic panels would expose them to potentially hazardous collisions and could have fatal consequences. To protect the panels, a movable protective outer layer should cover them and retract to expose the large panel area to stellar radiation whenever conditions suit energy collection. Figure 4 illustrates a possible design of the solar panels and the protective outer layer, in which the outer layer moves away and leaves the panels facing the star.

Figure 5. A model for photovoltaic panels around the spacecraft.

The spacecraft can take any orientation in space and spins about its own axis, so a star's radiation can arrive from many directions while the spacecraft orbits the star. The design must therefore place several photovoltaic panels facing each direction. Another reason for this arrangement is the possibility of a multiple star system: if the journey encounters one, radiation will come from several directions at once. Assuming that the spacecraft is ellipsoidal, there should be panels at all angles. Figure 5 provides a possible model for the arrangement of the photovoltaic panels. There are sixteen panels in total, with two main panels covering the top and bottom and the others arranged around the sides.

3.3. Energy Consuming

The energy gathered from the stars is consumed in several ways. The major expenditure is propulsion: the spacecraft needs a propulsion system to accelerate and to change direction. Other expenditures include the daily electricity supply and the other operating systems on board. Energy consumption is hard to quantify exactly because of the highly uncertain conditions of space travel. For instance, the actual distances to be travelled between stars carry large uncertainties. The number of people and the amount of resources carried must also not be overlooked, because the mass of the spaceship has a significant effect on energy consumption. Too many factors remain undetermined to evaluate energy consumption precisely. What is certain, however, is that the energy storage capacity must be large enough for a long journey.

4. Attitude Control System

Changing attitude is a hard task for a spacecraft operating in space, yet it is vital for avoiding a collision with a star or another astronomical object. Without air as a medium, there is no external force that can help the spacecraft change direction. The key technique for the design of the attitude control system lies in the falling cat problem.


Figure 6. Photo in 1969 that helped explain the falling cat problem [5].

In 1882, the French scientist Étienne-Jules Marey invented the chronophotographic gun, and he later published a photographic sequence of a falling cat. This drew scientists' attention to the falling cat problem. Cats can automatically adjust their orientation in mid-air and land on their feet, no matter which way they faced when they began to fall. The phenomenon seemed incomprehensible in light of the law of conservation of angular momentum, since it appears impossible for a cat to change its orientation in mid-air without an external torque acting on it. It was not until 1969 that Thomas Kane, an engineer at Stanford, explained the physics behind the phenomenon. He presented a solution by modeling the cat's upper and lower body as two cylinders and simulating the fall on a computer [6]. Figure 6 is the photo that helped NASA investigate, in 1969, how astronauts could change their orientation in space.

Figure 7. Model that simulates the spacecraft’s initial status before direction changes.

The trick that cats use when falling lies in the way their bodies react. First, they tuck their front legs against their bodies and extend their hind legs. While the upper body spins, for instance, 190 degrees clockwise, the lower body must spin 10 degrees counterclockwise because of the conservation of angular momentum. In the next step, they tuck their hind legs against their bodies and spin their lower bodies 190 degrees clockwise. Similarly, their upper bodies must spin 10 degrees counterclockwise to keep the total angular momentum unchanged. In this way they turn around while conserving angular momentum and without being subjected to any external torque [7]. The same rule can be applied to the attitude control system of the spacecraft. When the spacecraft needs to adjust its orientation, the front part decreases its moment of inertia by contracting its radius and turns through an angle slightly greater than the intended angle, while the back part increases its moment of inertia and turns through a small angle in the opposite direction. After the front part reaches this position, it increases its moment of inertia, and the back part decreases its moment of inertia and turns to the intended angle. The front part then experiences a torque in the opposite direction and finally settles at the intended angle. Figure 7 is a simulated model showing the spacecraft's initial state before the attitude change. The entire process of changing attitude in space is simulated: Figures 8, 9, 10 and 11 show the first, second, third and final stages of attitude control, respectively.
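The two-phase manoeuvre can be checked with a few lines of arithmetic. The sketch below is a deliberately simplified model, not the Kane and Scher model: it assumes both sections rotate about one shared axis, that the total angular momentum is zero, and that each section's moment of inertia is constant during a phase. The 1:19 inertia ratio is inferred from the 190-degree and 10-degree figures in the text, and the function names are hypothetical.

```python
def counter_rotation(theta_driven: float, inertia_driven: float,
                     inertia_other: float) -> float:
    """Angle turned by the other section when one section turns theta_driven.

    With zero total angular momentum about the shared axis,
    inertia_driven * theta_driven + inertia_other * theta_other = 0.
    """
    return -inertia_driven * theta_driven / inertia_other


def two_phase_turn(theta_driven: float, inertia_small: float,
                   inertia_large: float) -> tuple[float, float]:
    """Net rotation (front, back) in degrees after both phases."""
    # Phase 1: the front is contracted (small inertia) and driven,
    # while the extended back counter-rotates.
    front = theta_driven
    back = counter_rotation(theta_driven, inertia_small, inertia_large)
    # Phase 2: the roles swap; the back is contracted and driven,
    # while the extended front counter-rotates.
    back += theta_driven
    front += counter_rotation(theta_driven, inertia_small, inertia_large)
    return front, back


front, back = two_phase_turn(190.0, 1.0, 19.0)
print(front, back)
```

In this model both sections end up turned by the same net angle, the driven angle multiplied by (1 - inertia_small / inertia_large), so a larger contrast between the contracted and extended states means less overshoot is needed.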

Figure 8. The first working process of attitude control.


Figure 9. The second working process of changing directions.

Figure 10. The third working process of changing directions.


Figure 11. The fourth working process of changing directions.

The final angle that the spacecraft has turned is given by equation (4).

  (4)

5. Navigation System

In space, it is essential to know where the spacecraft is located and where it is heading. People usually determine where they are by looking at signs and landmarks around them. However, it is not simple to locate the spacecraft in space, where little can be seen by eye other than asteroids and a few nearby astronomical objects. The key to determining the location is to find “landmarks” as objects of reference, and the most effective objects of reference in space are stars. Stars emit strong radiation that can be captured by telescopes, and the emptiness of space provides excellent conditions for telescopes to work. Telescopes can be set up to face different directions of the sky and determine the direction from which the radiation is coming. The precondition for using stars as “landmarks” is that the radiation from different stars differs enough for each star to be recognized. This precondition is already known to hold.

Figure 12. Absorption and emission spectra of different elements [8].


Different stars have different compositions, which give their spectra distinct characteristic features. When radiation from a star passes through a cloud of cool gas, elements in the gas absorb light at certain characteristic frequencies, producing an absorption spectrum. A cloud of hot gas may also emit radiation with characteristic emission lines, producing an emission spectrum. Absorption and emission lines are the key for scientists to determine the composition of a star. Figure 12 shows the absorption and emission spectra of several elements.
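Identifying elements from spectral lines amounts to matching measured wavelengths against known rest wavelengths. The sketch below is a minimal illustration of that idea: the reference values are well-known optical lines, while the observed list, the tolerance and the function name are hypothetical. It ignores Doppler shifts, which a moving spacecraft would have to correct for first.

```python
# Rest wavelengths in nanometres of a few strong optical lines
REFERENCE_LINES_NM = {
    "H": [656.28, 486.13, 434.05, 410.17],  # Balmer series
    "Na": [589.00, 589.59],                 # sodium D doublet
    "Ca II": [393.37, 396.85],              # calcium K and H lines
}


def identify_elements(observed_nm, tolerance_nm=0.2):
    """Map each element to the observed lines lying near its reference lines."""
    found = {}
    for element, lines in REFERENCE_LINES_NM.items():
        matches = [w for w in observed_nm
                   if any(abs(w - ref) <= tolerance_nm for ref in lines)]
        if matches:
            found[element] = matches
    return found


observed = [393.4, 486.1, 589.6, 656.3, 700.0]  # hypothetical measurement
print(identify_elements(observed))
```

A line that matches no reference, such as the 700.0 nm entry here, is simply left unassigned. A real catalogue would contain far more lines and would also compare relative line strengths.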

Figure 13. Optical spectra for Proxima Centauri at 1 AU from the star [9].

Figure 13 shows the optical spectrum of Proxima Centauri as seen from 1 AU away. For instance, if the spacecraft is 1 AU from Proxima Centauri, the telescope will probably record a similar spectrum from the direction in which Proxima Centauri lies. Similarly, wherever the spacecraft is, its telescopes can capture the spectra of other stars and use them to navigate.

6. Habitable Zones

It is essential for the spacecraft to remain at a radiation level at which its outer shell can still provide adequate shielding. That is why a safety zone must be calculated before the spacecraft reaches a star system. The habitable zone is a reasonable region for the spacecraft to stay in: there, the photoelectric panels receive enough stellar radiation to convert into energy, and at the same time the spacecraft can gather information about Earth-like planets for possible future migration and additional resources.

The habitable zone of a star is the range of orbital distances within which the equilibrium temperature of a planet allows water to remain liquid on its surface. To account for the greenhouse effect and related warming, the equilibrium temperature of a planet inside the habitable zone should lie between 175 K and 270 K [10]. The stellar luminosity $L$ is given by the Stefan-Boltzmann law, and the flux $F$ reaching the planet follows from its distance to the star, $d$. $R_s$ and $T_s$ are the radius and surface temperature of the star, respectively, and $\sigma$ is the Stefan-Boltzmann constant.

$$L = 4\pi R_s^2 \sigma T_s^4 \qquad (5)$$

$$F = \frac{L}{4\pi d^2} = \sigma T_s^4 \left(\frac{R_s}{d}\right)^2 \qquad (6)$$

When a planet reaches its equilibrium temperature, it absorbs and emits radiation at the same rate [11]. The following equations give the absorption rate $P_{\text{abs}}$ and the re-radiation rate $P_{\text{emit}}$, where $A$ and $R_p$ are the albedo and radius of the planet, $T_{eq}$ is its equilibrium temperature, and $F$, $\sigma$, $R_s$, $T_s$ and $d$ are the flux, Stefan-Boltzmann constant, stellar radius, stellar surface temperature and star-planet distance of equations (5) and (6).

$$P_{\text{abs}} = (1 - A)\,\pi R_p^2 F = (1 - A)\,\pi R_p^2\, \sigma T_s^4 \left(\frac{R_s}{d}\right)^2 \qquad (7)$$

$$P_{\text{emit}} = 4\pi R_p^2 \sigma T_{eq}^4 \qquad (8)$$

By equating equation (7) and equation (8), the equilibrium temperature can be expressed by equation (9).

$$T_{eq} = T_s\,(1 - A)^{1/4} \left(\frac{R_s}{2d}\right)^{1/2} \qquad (9)$$

Solving for the distance to the star, $d$, gives equation (10).

$$d = \frac{R_s\, T_s^2\, (1 - A)^{1/2}}{2\, T_{eq}^2} \qquad (10)$$

Figures 14-16 provide an estimation for both outer and inner boundaries of habitable zones.

Figure 14. Outer and Inner Boundaries of Habitable Zones with respect to Star’s Radius and Temperature.


Figure 15. Outer and Inner Boundaries of Habitable Zones with respect to Star’s Radius.

Figure 16. Outer and Inner Boundaries of Habitable Zones with respect to Temperature.

Figures 14-16 show the distance from the star to the outer and inner boundaries of the habitable zone as a function of the radius and temperature of the star. The upper surface shows the distance from the star to the outer boundary, and the lower surface shows the distance to the inner boundary. The width of the habitable zone increases as the radius and temperature of the star increase, and the inner boundary also moves farther from the star. This implies that, for a growing main-sequence star, the habitable zone gradually moves outward, and hence that Earth will one day become uninhabitable for humans. Migration and space travel are therefore an inevitable future task.


7. Conclusion

Space migration will be an unavoidable topic for the long-term development of humankind. According to Section 6, the boundaries of the habitable zone of a main-sequence star move outward as the star ages and its temperature and radius increase. During the course of stellar evolution, the Earth will one day lie outside the habitable zone of the Sun. This will result in the extinction of all life on Earth unless humans are able to travel to other stellar systems by then. As technology develops, interstellar travel will no longer be purely theoretical. This paper has outlined some potential designs for future interstellar travel programs and explained some of the theory behind them; future investigations can build on each part to create a spacecraft suitable for interstellar travel. The paper has proposed several solutions to the problems central to interstellar travel, as well as a method for estimating the boundaries of the habitable zones of stars. The proposed solutions for the power, attitude control and navigation systems are photoelectric panels, a reorientation scheme based on the falling cat problem, and the spectral identification of nearby stars. The approaches outlined here for constructing a viable spacecraft capable of intergalactic travel are important because travel between galaxies will be unavoidable if human beings wish to survive in this universe in the long term.

References

[1] Christensen, L.L., Hainaut, O., & Pierce-Price, D.P. (2014). What Determines the Aesthetic

Appeal of Astronomical Images. CAPjournal. No.14: 20–24.

[2] Perryman, M. (2021). Stellar Structure and Evolution. Fundamentals of Astrophysics. Choice

Reviews. https://doi.org/10.5860/choice.27-6327

[3] California Institute of Technology. (2019). Solar System Sizes and Distances Reference Guide.

https://www.jpl.nasa.gov/edu/pdfs/scaless_reference.pdf

[4] Fares, M.A., Atik, L., Bachir, G., & Aillerie, M. (2017). Photovoltaic panels characterization

and experimental testing. Energy Procedia, 119, 945-952.

[5] Crane, T. (2021). The ‘falling cat’ phenomenon that helped NASA prepare astronauts for zero gravity, 1969. https://www.libraryhistt.com/2022/10/the-falling-cat-phenomenon-that-helped.html

[6] Kane, T.R., & Scher, M. (1969). A dynamical explanation of the falling cat phenomenon.

International Journal of Solids and Structures, 5, 663-670.

[7] Essén, H., & Nordmark, A.B. (2018). A simple model for the falling cat problem. European

Journal of Physics, 39.

[8] Richmond, M. Rochester Institute of Technology. spiff.rit.edu/classes/phys301/lectures/spectra/spec_rev_orientation.gif

[9] Meadows et al. (2018). Proxima Centauri Spectrum. vpl.astro.washington.edu/spectra/stellar/proxcen.htm

[10] Kaltenegger, L., & Sasselov, D.D. (2011). Exploring the Habitable Zone for Kepler Planetary Candidates. The Astrophysical Journal Letters, 736.

[11] Del Genio, A.D., Kiang, N.Y., Way, M.J., Amundsen, D.S., Sohl, L.E., Fujii, Y., Chandler,

M.A., Aleinov, I., Colose, C.M., Guzewich, S.D., & Kelley, M. (2018). Albedos,

Equilibrium Temperatures, and Surface Temperatures of Habitable Planets. The

Astrophysical Journal, 884.


Review on application of fractional Fourier transform in

Linear Frequency Modulation signal and communication

system

Zhuoran Wang

Leicester International Institute, Dalian University of Technology, Panjin, 116024,

China

2076334726@qq.com

Abstract. The traditional Fourier transform is well suited to analyzing and processing stationary signals but performs poorly on time-varying, non-stationary signals; the fractional Fourier transform (FRFT) handles such signals better. The FRFT can be understood as the representation of a signal in the fractional Fourier domain, obtained by rotating the coordinate axes of the time-frequency plane counterclockwise about the origin by an arbitrary angle. In this paper, the improved fractional Fourier transform is combined with other calculation methods to achieve high-precision estimation of chirp signal parameters. Communication systems built on the weighted fractional Fourier transform and on the discrete fractional Fourier transform are also studied and simulated, which verifies their feasibility and shows improved anti-jamming and anti-interception ability.

Keywords: communication system, fractional Fourier transform, chirp signal.

1. Introduction

Namias first proposed the theory of the fractional Fourier transform in 1980. He approached the idea from a mathematical point of view and applied it to the solution of differential equations. McBride et al. then gave a more rigorous definition based on Namias's work and expressed the fractional Fourier transform in integral form [1]. In 1993, Mendlovic and Ozaktas moved beyond purely mathematical research and implemented fractional Fourier transforms optically, an approach that has since been widely used in optical signal processing [2]. However, because the fractional Fourier transform lacked a clear physical interpretation and a fast algorithm, its considerable potential in signal processing could not yet be fully exploited. In 1993, Almeida clarified its physical meaning: the fractional Fourier transform corresponds to a rotation by a certain angle in the time-frequency plane, so it carries information about the signal in both the time domain and the frequency domain and is therefore a time-frequency analysis method [3]. In 1996, Ozaktas and other scholars proposed a discrete algorithm for the fractional Fourier transform whose computational load is comparable to that of the fast Fourier transform (FFT) [4]. Since then, the fractional Fourier transform has attracted the attention of signal processing researchers at home and abroad, and many results have gradually emerged. Compared with the traditional Fourier transform, the fractional Fourier transform is more flexible and has been applied in many areas, such as time-frequency

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/20240504

(https://creativecommons.org/licenses/by/4.0/).

167

analysis, time-frequency filtering, quantum mechanics, artificial neural networks, sweep filters, optical

image processing, etc.

In 1981, the French geophysicist Morlet found, while analyzing artificial seismic exploration signals, that such signals need high frequency resolution in the low-frequency band, whereas a lower frequency resolution is acceptable in the high-frequency band. Motivated by this feature of seismic signals, Morlet proposed the concept of the wavelet transform and gave a definition [5]. The traditional Fourier transform has a shortcoming when handling non-stationary signals: it reveals which frequencies a signal contains overall, but not when each component appears. As a result, two signals that are very different in the time domain may have the same magnitude spectrum. After a wavelet transform, the signal can have different resolutions at different positions of the time-frequency plane, so it can be analyzed at multiple resolutions. This multi-resolution property gives the wavelet transform very good time-frequency localization, allowing a signal to be examined from coarse to fine, which is convenient for signal analysis and observation and overcomes this shortcoming of the traditional Fourier transform. In recent years, the wavelet transform has therefore produced significant theoretical results and has been applied in many engineering fields, such as signal processing, speech recognition, image processing and analysis, analytical chemistry, and biomedicine. Research on the wavelet transform continues, and further study of its theory is expected to bring new applications to various fields.

This paper presents some fundamental theorems about Fourier transform, and studies some recent

applications of fractional Fourier transform in secure communication and parameter estimation of chirp

signals.

2. Relevant theory

2.1. Definition

Let f(t) be a periodic function with period 2T that satisfies the Dirichlet conditions: within one period, f(t) is continuous or has only a finite number of discontinuities of the first kind; it is monotonic or can be divided into a finite number of monotonic intervals (equivalently, it has a finite number of extreme points); and it is absolutely integrable. Then the Fourier series of f(t) converges, and its sum S(t) is also a periodic function with period 2T that takes a finite value at each discontinuity (the average of the left and right limits).

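The Fourier transform of x(t):

$$X(\omega) = \mathcal{F}[x(t)] = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \qquad (3.1.1)$$

Inverse transform:

$$x(t) = \mathcal{F}^{-1}[X(\omega)] = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega \qquad (3.1.2)$$

$X(\omega)$: the image function of $x(t)$

$x(t)$: the preimage function of $X(\omega)$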

2.2. Deduction

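(1) Fourier series of a periodic function

A periodic function f(t) with period T can be written in complex exponential form (3.2.1) or, for real-valued functions, in trigonometric form (3.2.2):

$$f(t) = \sum_{n=-\infty}^{\infty} F_n\, e^{j 2\pi n t / T} \qquad (3.2.1)$$

$$f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left[ a_n \cos\left(\frac{2\pi n t}{T}\right) + b_n \sin\left(\frac{2\pi n t}{T}\right) \right] \quad \text{(for real-valued functions)} \qquad (3.2.2)$$

Fourier expansion coefficient:

$$F_n = \frac{1}{T} \int_{-T/2}^{T/2} f(t)\, e^{-j 2\pi n t / T}\, dt \qquad (3.2.3)$$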


A periodic signal can be expanded into a Fourier series if it satisfies the Dirichlet conditions, which are sufficient (though not necessary) for convergence:

① Within one period, the signal is continuous or has only a finite number of discontinuities of the first kind.

② Within one period, the number of maxima and minima is finite.

③ Within one period, the signal is absolutely integrable.

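Now assume that a function f(t) is made up of a direct current (DC) component and several cosine functions, as shown in equation (3.2.4), where $\omega = 2\pi/T$ is the fundamental angular frequency.

$$f(t) = c_0 + \sum_{n=1}^{\infty} c_n \cos(n\omega t + \varphi_n) \qquad (3.2.4)$$

Using the angle-sum identity $\cos(A + B) = \cos A \cos B - \sin A \sin B$, the above equation can be rewritten as (3.2.5):

$$f(t) = c_0 + \sum_{n=1}^{\infty} \left[ c_n \cos\varphi_n \cos(n\omega t) - c_n \sin\varphi_n \sin(n\omega t) \right] \qquad (3.2.5)$$

Define $a_n$ and $b_n$ as:

$$a_n = c_n \cos\varphi_n \qquad (3.2.6)$$

$$b_n = -c_n \sin\varphi_n \qquad (3.2.7)$$

Then formula (3.2.4) can be written:

$$f(t) = c_0 + \sum_{n=1}^{\infty} \left[ a_n \cos(n\omega t) + b_n \sin(n\omega t) \right] \qquad (3.2.8)$$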

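Formula (3.2.8) is the Fourier series expansion. Expanding a periodic signal into a Fourier series therefore amounts to determining the coefficients $a_n$ and $b_n$.

Multiply both sides of equation (3.2.8) by $\cos(k\omega t)$, with $k \geq 1$, and integrate over one period:

$$\int_0^T f(t) \cos(k\omega t)\, dt = \int_0^T c_0 \cos(k\omega t)\, dt + \int_0^T \cos(k\omega t) \sum_{n=1}^{\infty} \left[ a_n \cos(n\omega t) + b_n \sin(n\omega t) \right] dt \qquad (3.2.9)$$

By the orthogonality of the trigonometric functions over one period, every term on the right-hand side vanishes except the cosine term with $n = k$, so equation (3.2.9) simplifies to:

$$\int_0^T f(t) \cos(k\omega t)\, dt = \int_0^T a_k \cos^2(k\omega t)\, dt = \frac{T}{2}\, a_k \qquad (3.2.10)$$

So it can be concluded that:

$$a_k = \frac{2}{T} \int_0^T f(t) \cos(k\omega t)\, dt \qquad (3.2.11)$$

In the same way, multiplying by $\sin(k\omega t)$ gives:

$$b_k = \frac{2}{T} \int_0^T f(t) \sin(k\omega t)\, dt \qquad (3.2.12)$$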
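The coefficient formulas (3.2.11) and (3.2.12) can be checked numerically. The following is a minimal sketch, assuming the integrals run over one full period and are approximated by a plain Riemann sum; the function names and the test signal are hypothetical and chosen only for illustration.

```python
import math


def fourier_coefficients(f, period, k, samples=2000):
    """Approximate a_k and b_k of (3.2.11)-(3.2.12) with a Riemann sum over one period."""
    omega = 2.0 * math.pi / period  # fundamental angular frequency
    dt = period / samples
    a_sum = 0.0
    b_sum = 0.0
    for i in range(samples):
        t = i * dt
        a_sum += f(t) * math.cos(k * omega * t) * dt
        b_sum += f(t) * math.sin(k * omega * t) * dt
    return 2.0 / period * a_sum, 2.0 / period * b_sum


def example_signal(t):
    """Hypothetical test signal with period 2: c_0 = 1.5, a_2 = 3, b_5 = -0.5."""
    omega = math.pi  # 2*pi / period with period = 2
    return 1.5 + 3.0 * math.cos(2 * omega * t) - 0.5 * math.sin(5 * omega * t)


a2, b2 = fourier_coefficients(example_signal, 2.0, 2)
a5, b5 = fourier_coefficients(example_signal, 2.0, 5)
print(a2, b2, a5, b5)
```

Because the test signal is a finite trigonometric sum, the orthogonality argument behind (3.2.10) applies term by term, and the recovered coefficients should match the ones used to build the signal up to rounding error.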

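(2) Discrete-time Fourier transform (DTFT)

Let $\{x[n]\}_{n=-\infty}^{\infty}$ be a sequence defined on the integers $\mathbb{Z}$. Its DTFT is defined as:

$$X(\omega) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n} \qquad (3.2.13)$$

Inverse transform:

$$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(\omega)\, e^{j\omega n}\, d\omega \qquad (3.2.14)$$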


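The DTFT is discrete in the time domain and periodic (with period 2π) in the frequency domain. It is usually applied to analyze the spectrum of discrete-time signals. The DTFT can be viewed as the dual of the Fourier series: the Fourier series maps a periodic continuous-time signal to a discrete set of coefficients, whereas the DTFT maps a discrete sequence to a periodic function of frequency.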
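The two properties stated above, periodicity in frequency and invertibility, can be illustrated with a short sketch. It assumes a finite-length sequence indexed from n = 0 (all other samples are zero) and the sign convention exp(-j*omega*n) for the forward transform; the function names are hypothetical.

```python
import cmath
import math


def dtft(x, omega):
    """Evaluate X(omega) = sum_n x[n] * exp(-j*omega*n) for a finite sequence starting at n = 0."""
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))


def inverse_dtft(spectrum, n, steps=512):
    """Approximate (1 / 2pi) * integral over [-pi, pi) of X(w) * exp(j*w*n) dw with a Riemann sum."""
    dw = 2.0 * math.pi / steps
    total = 0j
    for i in range(steps):
        w = -math.pi + i * dw
        total += spectrum(w) * cmath.exp(1j * w * n) * dw
    return total / (2.0 * math.pi)


x = [1.0, 2.0, 0.0, -1.0]

# Periodicity: X(omega) and X(omega + 2*pi) coincide.
print(abs(dtft(x, 0.3) - dtft(x, 0.3 + 2.0 * math.pi)))

# Inversion: integrating over one period recovers the samples.
print([inverse_dtft(lambda w: dtft(x, w), n).real for n in range(4)])
```

Only one period of the spectrum is needed for the inverse, which is why the integral in (3.2.14) is usually taken over an interval of length 2π.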

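(3) Fractional Fourier transform

In the time-frequency plane, the fractional Fourier transform corresponds to a counterclockwise rotation of the coordinate axes by an angle $\alpha$, which yields the fractional Fourier domain $(u, v)$. The new axes are related to the time axis $t$ and the frequency axis $\omega$ by:

$$u = t \cos\alpha + \omega \sin\alpha \quad \text{($u$ is the fractional Fourier axis)} \qquad (3.2.15)$$

$$v = -t \sin\alpha + \omega \cos\alpha \qquad (3.2.16)$$

The fractional Fourier transform $X_p(u)$ of the signal x(t) is defined as:

$$X_p(u) = \mathcal{F}^p[x(t)](u) = \int_{-\infty}^{\infty} x(t)\, K_p(t, u)\, dt = \begin{cases} \sqrt{\dfrac{1 - j\cot\alpha}{2\pi}} \displaystyle\int_{-\infty}^{\infty} x(t) \exp\left( j\,\frac{t^2 + u^2}{2} \cot\alpha - j\,u t \csc\alpha \right) dt, & \alpha \neq n\pi \\ x(u), & \alpha = 2n\pi \\ x(-u), & \alpha = (2n + 1)\pi \end{cases} \qquad (3.2.17)$$

$p$: the order of the fractional Fourier transform

$\alpha = p\pi/2$: rotation angle

$K_p(t, u)$: kernel function

Inverse transform:

$$x(t) = \int_{-\infty}^{\infty} X_p(u)\, K_{-p}(t, u)\, du \qquad (3.2.18)$$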

A function $\psi_n(t)$ is an eigenfunction of the Fourier transform if its Fourier transform satisfies the following form:

$$\mathcal{F}[\psi_n(t)] = \lambda_n \psi_n(u) \qquad (3.2.19)$$

$\mathcal{F}$: Fourier transform operator (taken here in its unitary form, with a factor $1/\sqrt{2\pi}$, so that the eigenvalues have unit modulus)

$\lambda_n = \exp(-jn\pi/2)$: eigenvalue

Hermite-Gaussian function (a common eigenfunction of the Fourier transform):

$$\psi_n(t) = H_n(t) \exp(-t^2/2), \qquad \mathcal{F}[\psi_n(t)] = \exp(-jn\pi/2)\, H_n(u) \exp(-u^2/2) \qquad (3.2.20)$$

The normalized Hermite-Gaussian function can be expressed as:

$$\psi_n(t) = \frac{1}{\sqrt{2^n\, n!\, \sqrt{\pi}}}\, H_n(t) \exp(-t^2/2) \qquad (3.2.21)$$

$$H_n(t) = (-1)^n \exp(t^2)\, \frac{d^n}{dt^n} \exp(-t^2)$$

$H_n(t)$: the Hermite polynomial of order n

The signal x(t) can be expanded in the complete orthonormal set formed by the Hermite-Gaussian eigenfunctions:

$$x(t) = \sum_{n=0}^{\infty} X_n \psi_n(t) \qquad (3.2.22)$$

where the expansion coefficient is:

$$X_n = \int_{-\infty}^{\infty} x(t)\, \psi_n(t)\, dt \qquad (3.2.23)$$

Let $\lambda_n$ be the eigenvalue corresponding to the eigenfunction $\psi_n(t)$. Taking the Fourier transform of both sides of equation (3.2.22) gives:

$$X(u) = \sum_{n=0}^{\infty} X_n \lambda_n \psi_n(u) \qquad (3.2.24)$$

Substituting (3.2.23) into (3.2.24):

$$X(u) = \int_{-\infty}^{\infty} \sum_{n=0}^{\infty} \lambda_n \psi_n(u)\, \psi_n(t)\, x(t)\, dt = \int_{-\infty}^{\infty} K(t, u)\, x(t)\, dt \qquad (3.2.25)$$

The kernel of the Fourier transform:

$$K(t, u) = \sum_{n=0}^{\infty} \lambda_n \psi_n(t)\, \psi_n(u) = \sum_{n=0}^{\infty} \exp\left(-\frac{jn\pi}{2}\right) \psi_n(t)\, \psi_n(u) = \frac{1}{\sqrt{2\pi}} \exp(-jut) \qquad (3.2.26)$$

The usual Fourier transform is recovered by substituting equation (3.2.26) into equation (3.2.25).

The eigenvalues of the Fourier transform are then generalized to fractional order: the eigenvalue of the fractional Fourier transform of order p is defined as the p-th power of the corresponding Fourier transform eigenvalue. So the kernel of the fractional Fourier transform is:

$$K_p(t, u) = \sum_{n=0}^{\infty} \exp\left(-\frac{jnp\pi}{2}\right) \psi_n(t)\, \psi_n(u) \qquad (3.2.27)$$

(4) Wavelet Transform

The wavelet transform grew out of Morlet's work in the early 1980s. When handling the local features of seismic waves, Morlet, a French geophysicist, found that the traditional Fourier-based time-frequency methods could not meet the requirements of practical engineering when both the high-frequency and the low-frequency characteristics of a signal have to be observed. The wavelet transform was therefore adopted for geophysical exploration, which became its first practical application.

The basic idea of the wavelet transform is as follows: first, a generating (mother) wavelet is dilated and shifted; second, the original signal is decomposed with these wavelets into a series of sub-band signals with different spatial resolutions, frequency characteristics, and directional characteristics. The sub-band signals obtained in this way have good local features in both the time domain and the frequency domain, so the wavelet transform can overcome the shortcomings of Fourier analysis in handling non-stationary signals and complex images.

Both the wavelet transform and the Fourier transform represent a signal as a linear combination of basis functions. The difference is that the Fourier transform uses harmonic functions defined for all time $t \in (-\infty, +\infty)$, with basis function $e^{j\omega t}$, while the wavelet transform uses a generating function $\psi(t)$ with compact support, and the wavelet family is obtained by scaling and shifting the generating function $\psi(t)$. The concrete formula is as follows:

$$\psi_{a,b}(t) = \frac{1}{|a|^{1/2}}\, \psi\left(\frac{t - b}{a}\right), \quad a \neq 0 \qquad (3.2.28)$$

a: the scaling factor

b: the translation factor

Before introducing the wavelet transform, we briefly recall the classical convolution, that is:

$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau = \langle f(\tau),\, g^{*}(t - \tau) \rangle \qquad (3.2.29)$$

where $*$ between two functions represents the classical convolution operator, the superscript $*$ represents the conjugation operation, and $\langle \cdot, \cdot \rangle$ represents the inner product operation.

Thus, for any signal $f(t) \in L^2(\mathbb{R})$, the wavelet transform is defined by the classical convolution operation as:

$$W_f(a, b) = \left( f(t) * \left( |a|^{-1/2}\, \psi^{*}(-t/a) \right) \right)(b) = \langle f(t),\, \psi_{a,b}(t) \rangle \qquad (3.2.30)$$
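The equivalence of the convolution form and the inner-product form in (3.2.30) can be illustrated with a small numerical sketch. It assumes a real-valued Mexican-hat mother wavelet (so conjugation has no effect) and a midpoint-rule integral over a finite window; the wavelet choice, the test signal, and all function names are hypothetical.

```python
import math


def mexican_hat(t):
    """Real-valued mother wavelet (1 - t^2) * exp(-t^2 / 2); it has zero mean."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def daughter(t, a, b):
    """Scaled and shifted wavelet psi_{a,b}(t) as in (3.2.28)."""
    return mexican_hat((t - b) / a) / math.sqrt(abs(a))


def integrate(g, lo=-30.0, hi=30.0, steps=6000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * h) for i in range(steps)) * h


def cwt_inner(f, a, b):
    """Wavelet coefficient as the inner product <f, psi_{a,b}>."""
    return integrate(lambda t: f(t) * daughter(t, a, b))


def cwt_conv(f, a, b):
    """Wavelet coefficient as the convolution of f with |a|^(-1/2) psi(-t/a), evaluated at b."""
    def g(t):
        return mexican_hat(-t / a) / math.sqrt(abs(a))
    return integrate(lambda tau: f(tau) * g(b - tau))


def burst(t):
    """Hypothetical test signal: a windowed cosine."""
    return math.exp(-t * t / 8.0) * math.cos(3.0 * t)


print(cwt_inner(burst, 2.0, 1.0), cwt_conv(burst, 2.0, 1.0))
print(cwt_inner(lambda t: 1.0, 2.0, 1.0))  # a constant signal gives a coefficient near zero
```

The second printed value reflects the zero-mean property of the wavelet: a constant (DC) signal produces no response, which is one reason wavelet coefficients highlight local variations rather than the overall level of the signal.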


3. Review

3.1. Signal parameter estimation

Limin Liu, Haoxin Li, Qi Li, Zhuangzhi Han and Zhenbin Gao proposed a fast parameter estimation algorithm for Linear Frequency Modulation (LFM) signals under low signal-to-noise ratio (SNR) based on the fractional Fourier transform [6]. An efficient fractional Fourier transform algorithm can determine the initial rotation order and search interval for the LFM signal. However, when the SNR is low the parameter estimation error is large, because the variation of the normalized fractional spectrum amplitude no longer follows an obvious pattern. To address this, the authors exploit the good noise robustness of the fourth-order origin moment of the fractional spectrum, which removes this weakness of the efficient FRFT algorithm and allows the optimal order to be estimated quickly at low SNR. The parameters of a low-SNR LFM signal can thus be calculated quickly.
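The idea of searching for the transform order at which a chirp concentrates its energy can be illustrated with a simplified stand-in. The sketch below is not the algorithm of [6]: instead of evaluating an FRFT for each candidate order, it multiplies the signal by a conjugate chirp for each candidate chirp rate and looks for the sharpest DFT peak, which rests on the same energy-concentration principle. The signal is noise-free, and the sampling rate, parameter grid, and function names are assumptions made for illustration.

```python
import cmath
import math


def lfm_signal(f0, rate, fs, n_samples):
    """Noise-free complex LFM signal exp(j * (2*pi*f0*t + pi*rate*t^2)) sampled at fs."""
    out = []
    for n in range(n_samples):
        t = n / fs
        out.append(cmath.exp(1j * (2.0 * math.pi * f0 * t + math.pi * rate * t * t)))
    return out


def dft_peak(x):
    """Return (bin index, magnitude) of the largest DFT bin, using a direct O(N^2) DFT."""
    size = len(x)
    best_bin, best_mag = 0, -1.0
    for m in range(size):
        acc = sum(x[n] * cmath.exp(-2j * math.pi * m * n / size) for n in range(size))
        if abs(acc) > best_mag:
            best_bin, best_mag = m, abs(acc)
    return best_bin, best_mag


def estimate_chirp(x, fs, candidate_rates):
    """Pick the candidate chirp rate whose dechirped spectrum has the highest peak."""
    best = None
    for rate in candidate_rates:
        dechirped = [xn * cmath.exp(-1j * math.pi * rate * (n / fs) ** 2)
                     for n, xn in enumerate(x)]
        bin_index, magnitude = dft_peak(dechirped)
        if best is None or magnitude > best[2]:
            best = (rate, bin_index * fs / len(x), magnitude)
    return best[0], best[1]  # (chirp rate in Hz/s, start frequency in Hz)


signal = lfm_signal(f0=5.0, rate=20.0, fs=64.0, n_samples=64)
rate_hat, f0_hat = estimate_chirp(signal, 64.0, [4.0 * i for i in range(11)])
print(rate_hat, f0_hat)
```

With noise added, the peak for the correct rate becomes less distinct, which is the low-SNR difficulty that the fourth-order moment criterion described above is meant to address.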

In Parameter Estimation of Linear Frequency Modulation Signal Based On Interpolated Short-time Fractional Fourier Transform and Variable Weight Least Square Fitting [7], Weihao Cao, Zhixiang Yao, Wenjie Xia and Su Yan proposed a method that combines variable weight least square fitting (VWSF) with the interpolated short-time fractional Fourier transform (ISTFRFT) to estimate the parameters of chirp signals. The short-time Fourier transform (STFT) is a commonly used tool for time-frequency analysis of LFM signals, but it performs poorly in frequency estimation for wideband signals; the authors therefore extend it to the ISTFRFT, which calculates the instantaneous frequency of LFM signals more accurately. The VWSF method is then used to reduce the error of the conventional least square fit and to better estimate the initial frequency and the frequency modulation rate of the signal. Finally, by studying the Cramér-Rao lower bound (CRLB) of the initial frequency and modulation rate estimates, the authors show that the VWSF-ISTFRFT method achieves high accuracy in LFM signal parameter estimation.
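The role of the weighted fit can be sketched with a generic weighted least-squares line fit of instantaneous frequency against time, f(t) ≈ f0 + k t. This is only an illustration of why down-weighting unreliable frequency estimates helps; how [7] actually chooses its variable weights is not described here, so the weights, data, and function name below are hypothetical.

```python
def weighted_line_fit(times, freqs, weights):
    """Weighted least-squares fit of f(t) = f0 + k * t; returns (f0, k)."""
    sw = sum(weights)
    st = sum(w * t for w, t in zip(weights, times))
    sf = sum(w * f for w, f in zip(weights, freqs))
    stt = sum(w * t * t for w, t in zip(weights, times))
    stf = sum(w * t * f for w, t, f in zip(weights, times, freqs))
    denom = sw * stt - st * st
    k = (sw * stf - st * sf) / denom
    f0 = (sf - k * st) / sw
    return f0, k


# Hypothetical instantaneous-frequency estimates of a chirp with f0 = 10 Hz and
# k = 40 Hz/s, where one estimate (index 7) is badly corrupted.
times = [0.1 * i for i in range(10)]
freqs = [10.0 + 40.0 * t for t in times]
freqs[7] += 25.0

equal_weights = [1.0] * 10
variable_weights = [1.0] * 10
variable_weights[7] = 0.0  # assume the outlier has been identified and down-weighted

f0_plain, k_plain = weighted_line_fit(times, freqs, equal_weights)
f0_weighted, k_weighted = weighted_line_fit(times, freqs, variable_weights)
print(f0_plain, k_plain)
print(f0_weighted, k_weighted)
```

In this toy case the equally weighted fit is pulled away from the true slope by the single outlier, while the weighted fit recovers both parameters, which is the behaviour a variable-weight scheme aims for.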

3.2. Secure communication

Xiaoyan Ning, Dongxu Zhao, Yunfei Zhu and Zhenyi Wang proposed the Fractional Fourier Transform Frequency Hopping with Variable Time Width and Fixed Bandwidth (FrFT-FH-VTFB) system in Two-dimensional Frequency Hopping Communication System and Performance Analysis Based on Discrete Fractional Fourier Transform [8]. Traditional frequency hopping communication is easy to intercept because only a single signal parameter hops. The FrFT-FH-VTFB system obtains chirp signals with different start frequencies and time widths through the inverse discrete fractional Fourier transform, and thereby realizes two-dimensional hopping of time width and start frequency. In addition, the inherent spread-spectrum gain of the chirp signals in the FrFT-FH-VTFB system not only breaks the signal periodicity and improves the anti-interception capability of the system, but also conceals the signal in the frequency domain so that it can resist energy detection. Moreover, because the time-width parameter hops, the energy of some symbols increases, which gives the system better anti-fading performance and reduces the influence of fading on system performance.

In Secure communication of IRS based on weighted fractional Fourier transform [9], Shengfeng Li, Xin Yang and Ling Wang study multiple-input multiple-output (MIMO) scenarios with general channel settings by introducing an intelligent reflecting surface (IRS) into MIMO communication systems assisted by artificial noise and the fourth-order weighted fractional Fourier transform (WFRFT). The WFRFT changes the appearance of the signal constellation in the complex plane, so the processed signal has strong anti-interception ability; for this reason it is widely used in wireless physical layer security transmission. On this basis, the IRS can support secure communication with directional modulation based on artificial noise superposition and improve physical layer security. Because the overall signal model is difficult to solve directly, a block coordinate descent (BCD) and majorization-minimization (MM) algorithm is introduced to reduce the complexity. The Lagrange multiplier method is used to obtain the optimal transmit precoding matrix and covariance matrix, and an efficient MM algorithm is used to obtain the optimal phase shifts. Simulation and analysis of the algorithm verify its feasibility and good security performance.
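How a weighted fractional Fourier transform protects a symbol block can be illustrated with one common four-term construction. The sketch assumes the transform is a weighted sum of the block itself, its unitary DFT, its index-reversed copy, and its inverse DFT, with weights w_l(alpha) = (1/4) * sum over k = 0..3 of exp(-j*pi*k*(alpha - l)/2). This weight formula and its sign convention are assumptions made for illustration; they are not taken from [9], and none of the IRS, artificial noise, or precoding parts of that system are modelled.

```python
import cmath
import math


def unitary_dft(x, inverse=False):
    """Direct O(N^2) unitary DFT (or its inverse) of a list of complex numbers."""
    size = len(x)
    sign = 1.0 if inverse else -1.0
    return [
        sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / size) for m in range(size))
        / math.sqrt(size)
        for k in range(size)
    ]


def wfrft_weights(alpha):
    """Assumed weights w_l(alpha) = (1/4) * sum_k exp(-j*pi*k*(alpha - l)/2), l = 0..3."""
    return [
        sum(cmath.exp(-1j * math.pi * k * (alpha - l) / 2.0) for k in range(4)) / 4.0
        for l in range(4)
    ]


def wfrft(x, alpha):
    """Four-term weighted fractional Fourier transform of order alpha."""
    size = len(x)
    terms = [
        list(x),                                # F^0: identity
        unitary_dft(x),                         # F^1: unitary DFT
        [x[(-m) % size] for m in range(size)],  # F^2: index reversal
        unitary_dft(x, inverse=True),           # F^3: inverse DFT
    ]
    w = wfrft_weights(alpha)
    return [sum(w[l] * terms[l][m] for l in range(4)) for m in range(size)]


# Hypothetical block of QPSK symbols.
symbols = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j, 1 + 1j, 1 + 1j, -1 - 1j, -1 + 1j]
sent = wfrft(symbols, 0.6)
matched = wfrft(sent, -0.6)     # receiver that knows the order
mismatched = wfrft(sent, -0.2)  # receiver that guesses the wrong order

print(max(abs(a - b) for a, b in zip(matched, symbols)))
print(max(abs(a - b) for a, b in zip(mismatched, symbols)))
```

The first printed error should be at rounding level and the second clearly larger: a receiver that does not know the transform order sees a distorted constellation, which is the anti-interception property referred to above.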


Ping Gao and Yuxiao Yang study the application of the three-layer weighted fractional Fourier transform to secure communication in A Safe Communication System Based on Three-layer Weighted Fractional Fourier Transform [10]. Compared with the traditional weighted fractional Fourier transform (WFRFT) signal, the Multiple Parameters Weighted Fractional Fourier Transform (MPWFRFT) signal has stronger anti-interception capability and can better ensure the security of signal transmission. The communication system based on the three-layer WFRFT divides the initial data into three layers through Quadrature Phase Shift Keying (QPSK) baseband mapping, and then processes and transmits them, which effectively improves the confidentiality of signal transmission. On this basis, a genetic algorithm is introduced for iterative optimization, yielding the optimal set of control parameters for simulating the modulation characteristics of the three-layer WFRFT signal. The communication performance, simulation performance and security performance of the system are each simulated, which verifies that the system has good resistance to parameter scanning and high security.

4. Conclusion

A Linear Frequency Modulation (LFM) signal is a signal whose frequency changes linearly with time, and it is widely used in radar and sonar. The work reviewed in this paper exploits the fact that chirp signals show different degrees of energy concentration in fractional Fourier domains of different orders, and uses the fractional Fourier transform of the signal to estimate the chirp parameters. With the progress of electronic technology, the security of communication systems has become a hot topic. The reviewed studies apply the discrete fractional Fourier transform and the weighted fractional Fourier transform to secure communication systems, which breaks the original periodicity of the system signal and addresses the poor anti-interference and anti-interception capability of traditional communication systems.

References

[1] MCBRIDE A. C. On Namias's fractional Fourier transform[J]. IMA Journal of Applied

Mathematics,1987,Vol.39(2): 159-175

[2] David Mendlovic; Haldun M. Ozaktas. Fractional Fourier transforms and their optical

implementation. [J]. Journal of the Optical Society of America. A, Optics, Image Science, &

Vision, 1993, Vol.10(9): 1875-1881

[3] Almeida, L. B. Product and Convolution Theorems for the Fractional Fourier Transform[J].

Signal Processing Letters, IEEE,1997, Vol.4 (1): 15-17

[4] M.Fatih Erden, Haldun M. Ozaktas, David Mendlovic. Synthesis of mutual intensity distributions

using the fractional Fourier transform. Optics Communications. 1996 Apr;125(4–6):288–301.

[5] ARENS, G; FOURGEAU, E; GIARD, D; MORLET, J. SIGNAL FILTERING AND

VELOCITY DISPERSION THROUGH MULTILAYERED MEDIA[J].

GEOPHYSICS,1981, Vol.46: 419-420

[6] LIU Limin, LI Haoxin, LI Qi, HAN Zhuangzhi, GAO Zhenbin. A Fast Signal Parameter

Estimation Algorithm for Linear Frequency Modulation Signal under Low Signal-to-Noise

Ratio Based on Fractional Fourier Transform. Journal of Electronics & Information

Technology. 2021 Oct;43(10).

[7] CAO Weihao, YAO Zhixian, XIA Wenjie, YAN Su. Parameter Estimation of Linear Frequency

Modulation Signal Based On Interpolated Short-time Fractional Fourier Transform and

Variable Weight Least Square Fitting. ACTA ARMAMENTARII. 2020 Jan; 41(1).

[8] NING Xiaoyan, ZHAO Dongxu, ZHU Yunfei, WANG Zhenduo. Two-dimensional Frequency

Hopping Communication System and Performance Analysis Based on Discrete Fractional

Fourier Transform. Journal of Electronics & Information Technology. 2023 Feb;45(2).

[9] LI Shengfeng, YANG Xin, WANG Ling. Secure communication of IRS based on weighted

fractional Fourier transform. J Huazhong Univ of Sci & Tech (Natural Science Edition). 2023

Mar;51(3).


[10] GAO Ping, YANG Yuxiao. A Safe Communication System Based on Three-layer Weighted

Fractional Fourier Transform. Telecommunication Engineering. 2022 Nov; 62(11).


The sum of four squares: An exploration of Lagrange’s

theorem and its legacy in number theory

Yifan Cheng

United International College, Zhuhai, 519000, China

r130033004@mail.uic.edu.cn

Abstract. Lagrange’s Four-square Theorem is a fundamental result in number theory, which states that every positive integer can be expressed as the sum of four integer squares. The statement traces back to the Arithmetica of the Greek mathematician Diophantus of Alexandria in the 3rd century CE. Pierre de Fermat claimed a proof in the 17th century but did not publish one, and the first published proof was given by Joseph-Louis Lagrange in 1770. This paper presents an account of the four-square theorem within number theory, in particular the study of Diophantine equations, which seeks integer solutions to polynomial equations and which the theorem has significantly advanced. It traces the theorem from its conjectural origins to its emergence as a cornerstone of contemporary mathematical research, reviews its proof and implications, and discusses its connections to modern research and applications. In doing so, the paper reaffirms the influence of the theorem on the study of Diophantine equations and its continuing significance in number theory, in both historical and contemporary contexts.

Keywords: Lagrange’s Four-Square Theorem, Diophantine Equations, Computational Number

Theory, Quantum Computing

1. Introduction

The study of numbers and their properties is a fundamental aspect of mathematical inquiry, with the

representation of numbers as sums of squares occupying a pivotal role throughout history. This

fascination spans from the Pythagorean triples rooted in ancient geometry to the sophisticated realms of

modern number theory. Positioned at the confluence of historical curiosity and contemporary

mathematical rigor, this paper aims to explore the representation of integers as sums of squares, a

question that has intrigued mathematicians for centuries [1]. The foundation of modern number theory,

enriched by resources like NRICH and Silverman’s “A Friendly Introduction to Number Theory” [2]

[3], builds upon these ancient questions, showing their relevance in today’s mathematical challenges.

By delving into the historical evolution of this problem, from the early explorations by Pythagoras and

Diophantus to the groundbreaking proofs by Fermat, Euler, and Lagrange, it uncovers the mathematical

underpinnings and implications of such representations. Combined with a comprehensive review of the

historical literature tracing the development of sums of squares in number theory and an analysis of

contemporary mathematical texts and papers demonstrating current research and methods in the field,

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/20240576

(https://creativecommons.org/licenses/by/4.0/).

175

this paper bridges the gap between historical insights and modern mathematical advances, providing a

holistic view of the subject matter.

2. Historical Background

The effort to express numbers as sums of squares begins with Diophantus of Alexandria in the 3rd century CE [4,5], whose work “Arithmetica” laid early foundations for algebra and introduced what are now called Diophantine equations, that is, equations for which integer solutions are sought. Diophantus’s insights into equations involving squares paved the way for later breakthroughs. The subject advanced significantly with Pierre de Fermat in the 17th century. Fermat stated that every prime number of the form 4n+1 can be expressed as the sum of two squares, and that this representation is unique up to the order of the terms. This result, known as Fermat’s theorem on sums of two squares, opened new perspectives on the nature of numbers. Leonhard Euler contributed the four-square identity, which strengthened the framework for analyzing sums of squares. A major step followed with Joseph-Louis Lagrange in the 18th century, who proved that every positive integer can be represented as the sum of four squares. Lagrange’s proof underscored the significance of sums of squares within number theory and showed the power of the techniques then available. Adrien-Marie Legendre’s three-square theorem then characterized the numbers that can be written as sums of three squares. Together, these milestones by Diophantus, Fermat, Euler, Lagrange, and Legendre have fundamentally shaped the study of number theory, especially the problem of expressing numbers as sums of squares, and they illustrate the depth and interconnectedness of the field.

3. Mathematical Foundations

In number theory, several basic concepts and notations are pivotal for understanding theorems such as Lagrange’s four-square theorem [1][4], including:

⚫ Integers (ℤ): The set of whole numbers, comprising the positive integers, the negative integers, and zero.

⚫ Prime numbers: Natural numbers greater than 1 that have no positive divisors other than 1 and themselves.

⚫ Squares: Numbers that are the product of an integer with itself. For example, $4 = 2^2$ is a square.

⚫ Sum of squares: An expression that represents a number as the sum of the squares of integers.

Lagrange’s Four-Square Theorem states that every positive integer can be expressed as the sum of four squares of integers, where zero is allowed as one of the integers. Formally, for any positive integer n, there exist integers a, b, c and d such that:

$$n = a^2 + b^2 + c^2 + d^2 \qquad (1)$$

⚫ Euler’s Four-Square Identity: As visualized in Figure 1, this identity shows that the product of two sums of four squares is itself a sum of four squares. Specifically, consider the product of two numbers, each expressed as the sum of four squares:

$$(a^2 + b^2 + c^2 + d^2)(e^2 + f^2 + g^2 + h^2) \qquad (2)$$


Figure 1. The visualization of Euler’s Four-Square Identity

Euler’s identity allows us to express this product again as a single sum of four squares, an essential

concept for proving that the set of numbers expressible as the sum of four squares is closed under

multiplication [6]. This principle is further elucidated in texts such as Silverman’s introduction to

number theory, offering a gateway to understanding complex mathematical structures [3].
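Since the figure itself is not reproduced in the text, the identity can be written out and checked directly. The sketch below uses the standard form of Euler’s four-square identity (the same expressions that arise from multiplying two quaternions); the function name is hypothetical.

```python
def euler_four_square(x, y):
    """Combine two four-square representations into one for the product.

    If x = (a, b, c, d) and y = (e, f, g, h), the returned 4-tuple r satisfies
    sum(r_i^2) = (a^2 + b^2 + c^2 + d^2) * (e^2 + f^2 + g^2 + h^2).
    """
    a, b, c, d = x
    e, f, g, h = y
    return (
        a * e - b * f - c * g - d * h,
        a * f + b * e + c * h - d * g,
        a * g - b * h + c * e + d * f,
        a * h + b * g - c * f + d * e,
    )


def square_sum(v):
    """Sum of the squares of the entries of v."""
    return sum(t * t for t in v)


x = (1, 2, 3, 4)   # 1 + 4 + 9 + 16 = 30
y = (2, 0, 1, 5)   # 4 + 0 + 1 + 25 = 30
r = euler_four_square(x, y)
print(r, square_sum(r), square_sum(x) * square_sum(y))
```

For this example the function returns (-21, 15, -3, 15), and 441 + 225 + 9 + 225 = 900 = 30 × 30, so the product of the two sums is again a sum of four squares.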

Understanding these concepts and their interrelations not only facilitates the comprehension of the

theorem’s proofs but also illustrates the elegance and depth of mathematical structures dealing with

integers and their properties.

4. Proof of Theorem

Lagrange’s original proof of the four-square theorem built on earlier work by mathematicians such as Fermat [5] and Euler [6]. A detailed step-by-step account of the proof is beyond the scope of this paper, but the essence of the approach can be summarized in two steps. First, if two numbers can each be expressed as the sum of four squares, then their product can also be expressed in the same form (Euler’s four-square identity). Second, every prime number can be expressed as the sum of four squares. Since every integer greater than 1 is a product of primes, and $1 = 1^2 + 0^2 + 0^2 + 0^2$, the theorem then holds for all positive integers. This reduction, which shows that if the theorem holds for certain types of numbers (the primes) it must hold for all positive integers, is what makes the proof notable for its methodical approach, and it is crucial for understanding the theorem’s proof and its significance.
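The multiplicative step of the proof can be turned into a small constructive sketch: find a four-square representation for each prime factor and combine the representations with Euler’s identity. The brute-force search for the primes stands in for the number-theoretic argument of the actual proof and is only meant for small inputs; all function names are hypothetical.

```python
from math import isqrt


def euler_four_square(x, y):
    """Euler's four-square identity: representation of the product from two representations."""
    a, b, c, d = x
    e, f, g, h = y
    return (
        a * e - b * f - c * g - d * h,
        a * f + b * e + c * h - d * g,
        a * g - b * h + c * e + d * f,
        a * h + b * g - c * f + d * e,
    )


def four_squares_brute(n):
    """Exhaustively search for a >= b >= c >= d >= 0 with a^2 + b^2 + c^2 + d^2 = n."""
    for a in range(isqrt(n), -1, -1):
        for b in range(min(a, isqrt(n - a * a)), -1, -1):
            for c in range(min(b, isqrt(n - a * a - b * b)), -1, -1):
                rest = n - a * a - b * b - c * c
                d = isqrt(rest)
                if d <= c and d * d == rest:
                    return (a, b, c, d)
    return None


def prime_factors(n):
    """Prime factorization by trial division (with multiplicity)."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def four_squares_from_factors(n):
    """Build a representation of n from representations of its prime factors."""
    rep = (1, 0, 0, 0)  # 1 = 1^2 + 0^2 + 0^2 + 0^2
    for p in prime_factors(n):
        rep = euler_four_square(rep, four_squares_brute(p))
    return rep


rep = four_squares_from_factors(2024)
print(rep, sum(t * t for t in rep))
```

Entries of the result may be negative, which does not matter because only their squares are summed. The sketch mirrors the structure of the argument: once the primes are handled, closure under multiplication covers every positive integer.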

4.1. Alternative Proofs and Generalizations

The aim of this chapter is to examine alternative proofs and generalizations of the original theories or

conclusions. This not only demonstrates the diversity and flexibility of the original ideas but also

provides new perspectives and possibilities for further research and application.

4.1.1. Infinite Descent. Fermat famously used the method of infinite descent to prove various propositions. The method assumes that there is a smallest counterexample to a proposition and then shows that a smaller one exists, which is a contradiction. A descent argument of this kind also appears in the classical proof that every prime is a sum of four squares: from a representation of a multiple mp of a prime p with 1 < m < p, one constructs a representation of a smaller multiple, until m = 1 is reached.

4.1.2. Hurwitz Quaternions. A more modern approach to sums of squares uses the algebra of Hurwitz quaternions, which are quaternions a + bi + cj + dk whose coefficients are either all integers or all half-integers. The norm of a quaternion is $a^2 + b^2 + c^2 + d^2$ and is multiplicative, so these quaternions provide a powerful framework for proving and generalizing theorems on sums of squares, illustrating the deep connections between number theory and algebra.

4.2. Computational Methods in Proofs

With the advent of computers, computational methods have become invaluable in exploring the realms

of number theory, including proofs related to the four-square theorem. Computers empower


mathematicians to validate hypotheses on large datasets, identify patterns, and even provide proofs for

specific cases that would be unmanageable manually. These methods have not only confirmed the vast

applicability of the theorem but also opened new avenues for its exploration and application.
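A small-scale version of such a computational check can be sketched as follows. It verifies by exhaustive search that every integer up to a chosen bound is a sum of four squares, and lists the integers that are not a sum of three squares (zeros allowed in both cases). The bound and the function names are illustrative assumptions; a check of this kind confirms finitely many cases and is not a proof.

```python
from math import isqrt


def sums_of_two_squares(limit):
    """Set of all integers <= limit of the form s^2 + t^2 with s, t >= 0."""
    squares = [i * i for i in range(isqrt(limit) + 1)]
    return {s + t for s in squares for t in squares if s + t <= limit}


def not_sum_of_k_squares(limit, k):
    """Integers in 1..limit that are NOT a sum of k squares, for k = 3 or k = 4."""
    squares = [i * i for i in range(isqrt(limit) + 1)]
    two = sums_of_two_squares(limit)
    if k == 3:
        parts = squares          # one square plus a sum of two squares
    elif k == 4:
        parts = sorted(two)      # two sums of two squares
    else:
        raise ValueError("k must be 3 or 4")
    missing = []
    for n in range(1, limit + 1):
        found = False
        for part in parts:
            if part > n:
                break
            if (n - part) in two:
                found = True
                break
        if not found:
            missing.append(n)
    return missing


print(not_sum_of_k_squares(2000, 4))  # expected to be empty by Lagrange's theorem
print(not_sum_of_k_squares(31, 3))    # integers up to 31 that need four squares
```

The second list shows why four squares are needed in general: numbers such as 7 and 15 cannot be written with three squares, in line with Legendre’s three-square theorem mentioned earlier.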

4.3. Applications and Implications

The four-square theorem finds applications across various domains of mathematics and science,

demonstrating its fundamental nature:

4.3.1. Cryptography. In cryptographic systems, particularly those based on lattice problems and

quadratic forms, the ability to represent numbers as sums of squares has implications for encryption

algorithms and security protocols [7].

4.3.2. Coding Theory. The theorem’s concepts are applied in coding theory, where sums of squares are

related to error-detecting and error-correcting codes, crucial for data transmission and storage.

4.3.3. Quantum Computing. In quantum computing, the mathematical structures underlying the four-

square theorem can influence algorithms and the development of quantum error correction.

The four-square theorem, with its rich history and wide applicability, continues to be a subject of

fascination and study within the mathematical community. Its enduring legacy underscores the timeless

nature of mathematical inquiry and its relevance to both foundational research and practical applications.

5. Contemporary Perspectives

Current research in number theory continues to deepen the understanding of the four-square theorem.
Recent work includes efforts to generalize the theorem to other number systems and to explore its
connections to other areas of mathematics. Researchers are also developing computational approaches
for efficiently finding representations of numbers as sums of squares, and investigating open problems
and conjectures related to the theorem. This section outlines where the research community is focusing
and which developments are likely to follow. Powerful computational tools, as detailed by Crandall and
Pomerance in “Prime Numbers: A Computational Perspective,” allow researchers to test hypotheses
related to the four-square theorem on a scale not previously possible, verifying the theorem for very
large numbers and exploring its implications in computational complexity and algorithmic number
theory [10].

5.1. Recent Generalizations and Computational Approaches

Recent generalizations extend the four-square theorem to more complex structures, such as
higher-dimensional lattices or other algebraic systems. Mathematicians are also interested in analogous
representations by other forms, such as sums of cubes or higher powers (Waring's problem), and in the
conditions under which similar theorems hold. These explorations are supported by advances in
computational number theory, which Silverman and Crandall with Pomerance discuss in their respective
works [2,10]: large-scale computation makes it possible to verify the theorem for very large numbers
and to study its implications in computational complexity and algorithmic number theory.

5.2. Open Problems and Conjectures

⚫ Density and Distribution: Questions about the density and distribution of the representations of

numbers as the sum of squares, and how these properties might influence other areas of number

theory and combinatorics.

⚫ Connections to Other Fields: Exploring deeper connections between the four-square theorem and

other mathematical fields, such as elliptic curves, modular forms, and cryptographic algorithms,

may yield new insights and open problems.
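One classical result on the number of representations is Jacobi's four-square theorem: for n ≥ 1, the number r4(n) of ordered integer quadruples (signs and order counted) whose squares sum to n equals eight times the sum of the divisors of n that are not divisible by 4. The paper does not state this formula; the sketch below is an added illustration that compares a brute-force count with it, using hypothetical function names.

```python
from math import isqrt


def r4_bruteforce(n):
    """Count ordered integer quadruples (a, b, c, d) with a^2+b^2+c^2+d^2 == n."""
    count = 0
    m = isqrt(n)
    for a in range(-m, m + 1):
        for b in range(-m, m + 1):
            for c in range(-m, m + 1):
                rest = n - a * a - b * b - c * c
                if rest < 0:
                    continue
                d = isqrt(rest)
                if d * d == rest:
                    # d = 0 gives one quadruple, otherwise +d and -d give two.
                    count += 1 if d == 0 else 2
    return count


def r4_jacobi(n):
    """Jacobi's formula: 8 times the sum of divisors of n not divisible by 4."""
    return 8 * sum(d for d in range(1, n + 1) if n % d == 0 and d % 4 != 0)


assert all(r4_bruteforce(n) == r4_jacobi(n) for n in range(1, 41))
```

Since the divisor sum is always at least 1, the formula also shows that every positive integer has at least eight such quadruples.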

6. Discussion

The Four-square Theorem, proven by Joseph-Louis Lagrange in 1770, stands as a monumental testament

to the beauty and depth of number theory. This theorem, demonstrating that every positive integer can

be represented as the sum of four squares, resolved a long-standing question and catalyzed a new era of

mathematical exploration. Its simplicity belies the profound implications it has for number theory and

beyond, having inspired countless mathematicians to delve into the properties of numbers, leading to

the emergence of new branches within mathematics and a deeper understanding of existing ones.

Stewart and Tall’s “Algebraic Number Theory and Fermat’s Last Theorem” and Weil’s historical
approach in “Number Theory: An Approach Through History from Hammurapi to Legendre” provide
context for the theorem’s impact beyond its initial proofs, demonstrating its foundational role in
algebraic number theory and its historical significance [3,9]. Meanwhile, Conway and Smith’s
“On Quaternions and Octonions” illuminates the deep connections between the theorem and algebra,
highlighting the role of quaternion algebra in generalizing and proving sums of squares theorems [8].

7. Conclusion

This paper has sought to illuminate these facets, presenting a comprehensive review of the theorem’s

historical development, its pivotal role in advancing number theory, and the myriad ways it continues to

influence modern mathematical research. By highlighting the theorem’s ongoing relevance and potential

for future discoveries, it underscores the dynamic nature of mathematics, where ancient questions give

rise to contemporary challenges and innovations. In conclusion, the four-square theorem remains a

cornerstone of mathematical inquiry, a source of inspiration for both theoretical exploration and practical

application. Looking ahead, it is clear that the theorem not only constitutes a significant chapter in the

history of mathematics but also serves as a springboard for future generations of mathematicians to

explore the endless mysteries of numbers. This work provides a deeper understanding of the theorem’s

place in mathematical thought, reaffirming its timeless significance and the endless curiosity it inspires.

References

[1] Lagrange, J. L. (1770). Démonstration d’un théorème d’arithmétique. Mémoires de l’Académie

Royale des Sciences et Belles-Lettres de Berlin.

[2] Silverman, J. H. (2020). A friendly introduction to number theory. Brown University.

[3] Stewart, I., & Tall, D. (1979). Algebraic number theory and Fermat’s last theorem. Cambridge,

MA: Cambridge University Press.

[4] Fermat, P. de (1670). Observationes ad Diophantum [Marginal notes to Diophantus].

[5] Diophantus of Alexandria. (3rd century CE). Arithmetica

[6] NRICH. (n.d.). An introduction to number theory. Retrieved from

https://nrich.maths.org/numbertheory

[7] Euler, L. (1772). De compositione numerorum ex quattuor quadratis [On the composition of

numbers from four squares]. Novi Commentarii Academiae Scientiarum Petropolitanae, 16,

64-93.

[8] Conway, J. H., & Smith, D. A. (2003). On quaternions and octonions. Wellesley, MA: A K

Peters/CRC Press.

[9] Weil, A. (1984). Number theory: An approach through history from Hammurapi to Legendre.

Boston, MA: Birkhäuser.

[10] Crandall, R., & Pomerance, C. (2005). Prime numbers: A computational perspective. New York,

NY: Springer.

The model of price of sailing ships based on Lasso regression

Yueying Zhang1,3,4, Xinyi Zhou2,5, Yingfei Wang2,6, Dongmin Wang2,7

1Information and Computing Science, Jinan University, Guangzhou city, China

2Mathematics and Applied Mathematics, Jinan University, Guangzhou city, China

3Corresponding author

43545841780@qq.com

53196558422@qq.com

6ywang2739@gmail.com

71530846815@qq.com

Abstract. This paper predicts the listing price of sailboats from sample data and from sailboat
characteristics collected from websites. We first cleaned the raw data, which contained many
missing values and outliers. After filling the missing values with the mode, we transformed the
categorical variables into dummy variables and normalized the data to obtain the training data of
the model. We then obtained the predicted value of Listing Price (USD) through multiple
regression fitting. The fit was good (R2 = 0.929), but converting the categorical variables into
dummy variables produced too many variables, so the model had to be compressed to a set of key
variables. Because the goal is to explain Listing Price (USD), the coefficient of each variable
must be known, so tree models were not adopted. Among linear models, Lasso regression is
mainly used to screen variables, and it is the main screening method here. The mean square error
of the listing price predicted by the multiple regression model with Lasso-tuned parameters is
0.125, indicating that the model predicts the listing price with high accuracy.

Keywords: Lasso regression, Dummy variables, Multiple regression.

1. Introduction

As with many luxury goods, the price of a sailboat changes as the boat ages and as market conditions
change. Since the COVID-19 epidemic, consumers have gradually accepted buying second-hand
sailboats, the second-hand trading market has flourished, and demand for second-hand sailboats keeps
increasing. In second-hand sailboat trading, the most difficult and important problem, and the one
traders care about most, is valuation. Second-hand sailboats differ from general second-hand products
in that every boat is in its own condition (“one boat, one condition”). The price of a second-hand
sailboat is affected not only by its own configuration, such as model, boat width, sail area, and
displacement, but also by the region, the market price, and the year of manufacture. As a result,
second-hand sailboats cannot be priced in batches, which reduces the valuation efficiency of the
second-hand sailboat market. However, there is no complete and

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/20240628

(https://creativecommons.org/licenses/by/4.0/).

reasonable pricing system in the second-hand sailing market at present. It is therefore urgent to find a
more accurate and reasonable valuation method and to establish a sound evaluation system for the
second-hand sailing market. Boaters provided COMAP with valuable economic and research data on
used sailboats sold in Europe, the Caribbean, and the United States in December 2020. The focus of
current research is how to analyze and process these data efficiently, with the help of ever-advancing
algorithms and mathematical tools, and then find a suitable valuation model for the transaction price
of second-hand sailboats.

2. Model building and solution

2.1. Data cleaning

2.1.1. Filling the missing values

First, we took the data given in the problem statement, together with the data [1, 2] we found on
sailboat characteristics, as the original data. The original data contained many missing values, so we
carried out a visual analysis of them. The following figure shows the missing values of each
characteristic variable. Based on this analysis, we directly deleted the feature variables with many
missing values; for the characteristic variables with few missing values, we filled the missing values
with the mode.
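The deletion and mode-filling rule can be sketched as follows. This is a minimal illustration, not the authors' code: the 50% deletion threshold, the column names, and the sample values are assumptions, since the paper does not report them.

```python
from collections import Counter


def clean_missing(table, max_missing_ratio=0.5):
    """Drop sparse columns and fill the rest with the mode.

    table maps a column name to a list of values, with None marking a missing value.
    max_missing_ratio is an assumed threshold, not a value from the paper.
    """
    cleaned = {}
    for name, column in table.items():
        ratio = sum(v is None for v in column) / len(column)
        if ratio > max_missing_ratio:
            continue  # too many missing values: delete the feature
        mode = Counter(v for v in column if v is not None).most_common(1)[0][0]
        cleaned[name] = [mode if v is None else v for v in column]
    return cleaned


# Hypothetical sample: "Beam" has one gap, "Keel" is mostly missing.
raw = {
    "Beam": [12.99, None, 12.99, 13.02],
    "Keel": [None, None, None, "fin"],
}
print(clean_missing(raw))
```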

2.1.2. Detecting and handling outliers

After handling the missing values, the data still contain some outliers. To identify them, we visualize
the data with box plots. Sample points that deviate significantly from the rest of the values are called
outliers.

Outliers caused by dimensional (unit) errors in the samples are corrected by converting them to the
right dimension. Outliers caused by other reasons are deleted, to reduce errors in the model training
process.
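A box plot usually flags points lying more than 1.5 interquartile ranges outside the quartiles. The sketch below applies that rule; the factor 1.5 is the common box-plot convention and is an assumption here, since the paper does not state its threshold, and the sample values are made up.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Lower and upper fences of the box-plot rule (k = 1.5 is an assumed default)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values inside the box-plot fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Hypothetical draft measurements with one value recorded in the wrong unit.
drafts = [10, 11, 12, 12, 13, 14, 15, 120]
print(drop_outliers(drafts))
```

In the paper's workflow, a point such as 120 would first be checked for a unit error and corrected if possible; deletion is the fallback.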

2.1.3. Dummy variable transformation

Before screening characteristic variables, we need to convert their types. Among the variables that
affect the listing price of second-hand boats, characteristic variables such as Make, Variant and
Geographic Region are unordered multi-class categorical variables. A common way to quantify such
data is to assign the values 1, 2, 3, 4. However, these values carry an order from small to large,
whereas the categories have no such order: they are equal and independent. Substituting 1, 2, 3, 4 into
the model would therefore give unreasonable results, so we convert the categories into dummy
variables, whose value 0 or 1 indicates whether a sample belongs to a given category.
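The conversion can be sketched as follows, with one 0/1 column per category. The function name, the `Region_` prefix, and the sample values are illustrative assumptions.

```python
def to_dummies(values, prefix):
    """Map a categorical column to 0/1 dummy columns, one per category."""
    categories = sorted(set(values))
    return {
        f"{prefix}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


regions = ["Europe", "USA", "Caribbean", "Europe"]
for name, column in to_dummies(regions, "Region").items():
    print(name, column)
```

When the regression includes an intercept, a common design choice is to drop one dummy per categorical variable so that the remaining columns are not perfectly collinear with the constant term.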

2.1.4. Normalization

Different characteristic variables often have different dimensions and units, so feeding them directly
into the model would affect the final training results. To eliminate the influence of dimension and make
the variables comparable, the data must be standardized before training. The most typical method is
normalization.

The normalization method adopted here is min-max normalization: the original data are linearly
mapped into the range [0,1], turning a dimensional quantity into a dimensionless one. The specific
formula is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the variable over the samples.
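Min-max normalization maps each value x to (x - min) / (max - min). A minimal sketch follows; the function name and the handling of a constant column are assumptions, and the sample values are three of the LWL entries from Table 1.

```python
def min_max_normalize(values):
    """Linearly rescale values into [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


lwl = [37.24, 36.06, 37.07]
print(min_max_normalize(lwl))
```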

2.2. Model Preparation

2.2.1. Model evaluation coefficient: mean square error

Mean square error (MSE) measures the difference between an estimator and the quantity being
estimated. It is a statistical measure and loss function commonly used in machine-learning regression
models, such as linear regression. Its formula is:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.

In this paper, both the estimated quantity and the estimator refer to the listing price.
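MSE is the average of the squared differences between the true values y_i and the predicted values ŷ_i. A direct sketch of that definition is given below; the function name and the numbers are illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical normalized listing prices and predictions.
print(mean_squared_error([0.2, 0.5, 0.9], [0.25, 0.4, 1.0]))
```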

2.2.2. Adjust the compression penalty parameter λ

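The Lasso estimate minimizes the penalized residual sum of squares

$$\beta^{*} = \arg\min_{\beta}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{P}\beta_j x_{ij}\Big)^2 + \lambda\lVert\beta\rVert_1, \qquad \lVert\beta\rVert_1 = \sum_{j=1}^{P}\lvert\beta_j\rvert,$$

where λ is the tuning parameter, sometimes called a hyper-parameter, λ∥β∥1 is the compression
penalty, and P is the number of independent variables. Different values of λ lead to different mean
square errors for the regression models whose variables are selected through the L1 regularization
process. We compute the value of λ, and the corresponding variable coefficients, that minimize the
mean square error in order to determine the optimal model.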
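The search for λ can be sketched with a small coordinate-descent Lasso solver. This is an illustration under stated assumptions, not the authors' implementation: it minimizes the residual sum of squares plus λ times the L1 norm of the slopes (the intercept is not penalized), and all function names, the iteration count, and the toy data are hypothetical. Library implementations may scale the objective differently, so their penalty values are not directly comparable.

```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t; values within [-t, t] become exactly 0."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for min sum (y - b0 - X b)^2 + lam * sum |b_j|."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered features
    resid = [yi - y_mean for yi in y]  # residual for beta = 0
    z = [sum(row[j] ** 2 for row in Xc) for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant column carries no information
            rho = sum(row[j] * (e + beta[j] * row[j]) for row, e in zip(Xc, resid))
            new = soft_threshold(rho, lam / 2) / z[j]
            delta = new - beta[j]
            if delta:
                resid = [e - delta * row[j] for row, e in zip(Xc, resid)]
                beta[j] = new
    beta0 = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return beta0, beta


def predict(X, beta0, beta):
    return [beta0 + sum(b * x for b, x in zip(beta, row)) for row in X]


def select_lambda(X_tr, y_tr, X_val, y_val, grid):
    """Return (lambda, validation MSE) with the smallest validation MSE on the grid."""
    best = None
    for lam in grid:
        beta0, beta = lasso_fit(X_tr, y_tr, lam)
        pred = predict(X_val, beta0, beta)
        err = sum((yv - pv) ** 2 for yv, pv in zip(y_val, pred)) / len(y_val)
        if best is None or err < best[1]:
            best = (lam, err)
    return best


# Toy data: y depends on the first feature only, the second is irrelevant.
X = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
y = [1, 3, 5, 7, 9, 11]
print(lasso_fit(X, y, lam=1.0))  # the second coefficient is shrunk to exactly 0
```

In practice the validation set passed to `select_lambda` should be held out from training, or replaced by cross-validation.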

2.3. Select characteristic variable

After cleaning the original data, we obtain the processed data. Next, we use optimized stepwise
regression and a neural network to screen the characteristic variables that strongly affect the listing
price, and take the intersection of the variables screened by the two methods as the final
characteristic variables.

2.3.1. Model overview

First, a multiple regression model was established for the n independent variables x1, x2, ···, xn and
the dependent variable Y:

$$Y = \beta_0 + \sum_{i=1}^{n}\beta_i x_i + \varepsilon.$$

Each feature, that is, each independent variable, has a corresponding slope coefficient βi. Computing
the coefficients βi through multiple regression analysis in Python also gives the correlation and
significance level between each independent variable and the dependent variable.

Then, we used Lasso regression and a neural network to discard independent variables with poor
correlation and significance level, and selected the strongly correlated independent variables xi for a
second round of mathematical modeling.

Meanwhile, when obtaining the correlation table, we examine the collinearity between the independent
variables xi. If the collinearity is strong, we screen out the variables with relatively large coefficients
by adjusting the α parameter in Lasso regression. The parameter setting with the smallest mean square
error is selected to build our final valuation model for explaining Listing Price (USD).

2.3.2. Establishing the Lasso regression model

Lasso regression [4, 5, 6] is a linear model based on compressed (shrinkage) estimation. It obtains a
more refined model by constructing a penalty function that compresses some regression coefficients,
that is, it forces the sum of the absolute values of the coefficients to be less than a fixed value. It is a
biased estimation method suited to data with complex collinearity.

For a linear regression problem with many variables, the fitted model is relatively complicated because
of the large number of parameters. To prevent overfitting, the model should be simplified as much as
possible, so that a few variables explain the estimated quantity in place of the full set. Commonly
used variable selection methods include sequential forward selection, sequential backward
elimination, the combination of the two, and Lasso shrinkage.

In this case, however, the sailboat variants produce too many dummy variables, and both sequential
forward selection and sequential backward elimination are too inefficient. Therefore, the Lasso
model is adopted, and the coefficients of irrelevant variables are reduced to zero by adding a penalty
term.

2.3.3. Concrete mathematical expression

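Linear regression optimization objective:

$$\beta^{*} = \arg\min_{\beta}\; \lVert y - X\beta\rVert_2^2$$

Optimization objective after regularization:

$$\beta^{*} = \arg\min_{\beta}\; \lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1$$

where ∥ · ∥2 is the 2-norm (Euclidean norm): for a vector x = (x1,x2,··· ,xn)⊤ in Rn,
$\lVert x\rVert_2 = \big(x_1^2 + x_2^2 + \cdots + x_n^2\big)^{1/2}$.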

2.3.4. Concrete modeling process

For the first question, the data table given in the problem includes the manufacturer, sailboat model,
region, year, and price. To meet the requirements of the problem, we need to analyze the sailboat
characteristics and regional economic conditions related to the price. We collected the relevant data
for the sailboat types given in the problem from second-hand sailboat websites, and collected the
economic conditions of cities in the relevant regions. We used several GDP-related indicators to
express the regional economy, and distinguished the sailboats of different regions with dummy
variables. Because the problem sets no explicit requirement, the data for single and double sails are
combined so that the larger data set gives a better fit in the subsequent multiple regression analysis.
For the sailboat-related data, we collected the waterline length (LWL), boat width, draft, displacement,
sail area, and average cargo throughput, as shown in Table 1.

Table 1. Hull data chart

LWL    Beam   Draft  Displacement  Sail Area  Average cargo throughput  GDP     GDP per capita
37.24  12.63  3.94   22046.0       824.0      45350000.0                2939.0  44494.0
36.06  12.99  6.07   15432.0       721.0      595000.0                  57.8    13647.0
36.06  12.99  6.07   15432.0       721.0      595000.0                  57.8    13647.0
36.06  12.99  6.07   15432.0       721.0      595000.0                  57.8    13647.0
36.06  12.99  6.07   15432.0       721.0      595000.0                  57.8    13647.0
36.06  12.99  6.07   15432.0       721.0      3150000.0                 204.0   19147.0
37.07  13.02  6.23   19621.0       776.0      3150000.0                 204.0   19147.0

Then we used Python to carry out multiple regression analysis on the newly formed data table. First,
we standardized all the data so that differences in measurement scale do not distort their effect on the
price of sailboats. After standardization, we tabulated the data in Excel and examined the correlation
between the independent variables through a preliminary look at the heat map in Figure 1.

Figure 1. Correlation heat map of the numerical variables

For the standardized data, we conducted a preliminary multiple linear regression analysis and
obtained the parameter table below (Table 2), which gives the correlation level between each
independent variable and the dependent variable and the fit of the multiple linear regression model.
According to the results in Table 2, the independent variables are strongly collinear.

We then sorted the variables by significance level, using the P > |t| column of the resulting table, and
selected the independent variables with strong correlation to build a new mathematical model. In this
problem we chose Lasso rather than stepwise regression to select the independent variables. The
reason is that the data contain many independent variables and many dummy variables. With stepwise
regression there would be many passes over the data, which would occupy a large amount of storage
space, and the unproductive passes would slow the code down. Therefore, we used Lasso, analyzing
the model fit through the VIF and R2 values, to find and retain the independent variables with the
greatest influence.

Table 2. Regression coefficient analysis table

Name                      Coefficient  Standard error  t       P > |t|
Sail Area                 -3301.72     1.31E+04        -0.252  0.801
Length                    8314.47      1.76E+04        0.471   0.638
GDP2                      7859.72      5756.478        1.365   0.172
GDP1                      -8533.72     5757.049        -1.482  0.138
Average Cargo Throughput  -19640.00    1.00E+04        -1.962  0.050
LWL                       32720.00     1.46E+04        2.24    0.025
Beam                      109400.00    3.64E+04        3.009   0.003
Average GDP               -9556.25     3245.174        -2.945  0.003
Constant                  377200.00    1.79E+04        21.059  0.000
Year                      68790.00     2235.4          30.774  0.000
Draft                     -103200.00   1.20E+04        -8.569  0.000
Displacement              67930.00     1.12E+04        6.081   0.000
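The third numeric column of Table 2 can be read as the t statistic, that is, the coefficient divided by its standard error. The sketch below recomputes it from the printed coefficients and standard errors; reading that column as t is our interpretation of the table layout, and small differences are expected because the printed standard errors are rounded.

```python
# (name, coefficient, standard error, reported t) copied from Table 2
rows = [
    ("Sail Area", -3301.72, 1.31e4, -0.252),
    ("Length", 8314.47, 1.76e4, 0.471),
    ("GDP2", 7859.72, 5756.478, 1.365),
    ("GDP1", -8533.72, 5757.049, -1.482),
    ("Average Cargo Throughput", -19640.00, 1.00e4, -1.962),
    ("LWL", 32720.00, 1.46e4, 2.24),
    ("Beam", 109400.00, 3.64e4, 3.009),
    ("Average GDP", -9556.25, 3245.174, -2.945),
    ("Constant", 377200.00, 1.79e4, 21.059),
    ("Year", 68790.00, 2235.4, 30.774),
    ("Draft", -103200.00, 1.20e4, -8.569),
    ("Displacement", 67930.00, 1.12e4, 6.081),
]


def t_statistic(coefficient, standard_error):
    """t statistic of a regression coefficient."""
    return coefficient / standard_error


for name, coef, se, reported in rows:
    print(f"{name:26s} t = {t_statistic(coef, se):8.3f} (reported {reported})")
```

Variables with a large |t| (and hence a small P > |t|), such as Year, Draft and Displacement, are the ones the table marks as most significant.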

For the selected independent variables, the variance inflation factor (VIF) shows that
multicollinearity among the retained independent variables is pronounced in the multiple regression
analysis. We address the collinearity with Lasso regression, tuning its parameter to find the optimal
setting. Lasso combines the least squares method with L1 regularization. On the one hand, it performs
“feature screening” on the variables; on the other hand, it finds the more meaningful independent
variables X that minimize the mean square error of the model. The result is shown in Figure 2:

Figure 2. Regularization path diagram

The specific algorithm process is shown in the figure above. The coefficient of each variable is
returned, the predicted value is calculated, and the mean square error is computed to obtain the
optimized model. The predicted value, namely the Listing Price, is then explained by the coefficient
of each variable.

Finally, the MSE of the optimal model is 0.125/2, so the multiple linear regression model based on
Lasso regression parameter optimization has a good fit. The selected variables are listed in Table 3:

Table 3. Final parameters selected table

serial number

Characteristic variable name

Variant Swan 54

CRSSVG

Make Hallberg Rassy

Make HH Catamaras

Make Southerly

Make Discovery

Make Boreal

Make Nautor

Variant Pilot Saloon 48

Variant Series 5

Make Oyster

Make Nautitech

Length

Make Bestevaer

Year

Variant 52 Sport

LWL

Variant Atlantic 49MF

Beam

Variant SABA 50 Maestro

Sail Area

Variant 52

GDP

Variant 52F

AGDP

Variant V50 Mills

Europe

Make Outremer

USA

Of the selected parameters in the table, the first 20 are dummy variables with a relatively high impact
on the listing price, at the level of 1%−10%. These dummy variables cover the manufacturer of the
sailboat, the sailboat variant, and the region. The last nine are sailboat-related characteristic data
(such as ship width, ship

length, displacement, etc.) and relevant regional economic characteristics, all of which are
independent variables selected by the Lasso regression analysis as strongly correlated with the listing
price. Among the dummy variables, the influence of region is mainly caused by regional
characteristics, which will be discussed together with regional economic factors in the second
question. The influence of the manufacturer and the variant on the price can be understood as the
premium generated by the brand effect. As for the characteristic variables of the boat itself, length and
width determine the size and habitability of the space, and also the amount of material used in the hull
and the more sophisticated structural design needed for larger boats, so the length and width of the
boat have a significant impact on the listing price. The year of production reflects the remaining usable
life of the boat. Moreover, the production quantity of each type of boat is limited, and this scarcity, as
with luxury goods, also makes the year significantly affect the listing price. Nautical miles can also
indicate the range of the vessel between refuelings, which matters to the buyer, so nautical miles have
a significant impact on the listing price.

3. Conclusion

Compared with multiple linear regression, Lasso regression adds an L1-norm penalty. The penalty
increases the stability of our model and makes variable screening more effective. During screening,
Lasso controls the process through a real hyperparameter λ in (0,1), which makes the screening a
continuous process and more robust without losing interpretability.

Lasso is suitable for models with large data volumes and many missing values, and it works better
when the meaningful variables are relatively few. Because the L1 norm tends to produce sparse
coefficients, Lasso regression has built-in feature selection. Since the L1 solution is sparse, it is also
more computationally efficient when used together with sparse algorithms.
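The sparsity of the L1 solution can be seen in the one-variable case. For a single coefficient with least-squares estimate z, minimizing (z − β)² + λ|β| gives the soft-thresholded value, which is exactly zero when |z| ≤ λ/2, whereas the squared (L2) penalty (z − β)² + λβ² only rescales z to z / (1 + λ). The sketch below is an added illustration; the L2 comparison and the function names are not part of the paper.

```python
def lasso_1d(z, lam):
    """Minimizer of (z - b)^2 + lam * |b|: soft-thresholding at lam / 2."""
    t = lam / 2
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ridge_1d(z, lam):
    """Minimizer of (z - b)^2 + lam * b^2: proportional shrinkage, never exactly 0 for z != 0."""
    return z / (1 + lam)


for z in (0.3, -0.4, 2.0):
    print(z, lasso_1d(z, 1.0), ridge_1d(z, 1.0))
```

Small coefficients are removed by the L1 penalty and merely shrunk by the L2 penalty, which is why Lasso performs variable selection.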

References

[1] https://www.ayc-yachtbroker.com/alliage-44

[2] https://www.yachtworld.com/yacht/2005-alliage-alliage-44-8666783/

[3] https://itboat.com/search?text=alubat+cigale+16

[4] Möttönen, J., Lähderanta, T., Salonen, J., & Sillanpää, M. J. (2024). Reducing bias and mitigating
the influence of excess of zeros in regression covariates with multioutcome adaptive LAD-lasso.
Communications in Statistics - Theory and Methods, 53(13), 4730-4744.

[5] Genç, M., & Özkale, M. R. (2024). Lasso regression under stochastic restrictions in linear
regression: An application to genomic data. Communications in Statistics - Theory and Methods,
53(8), 2816-2839.

[6] Beyhum, J., & Portier, F. (2024). High-dimensional nonconvex LASSO-type [formula omitted]-estimators.
Journal of Multivariate Analysis, 202, 105303.

Leader-follower consensus for nonlinear multi-agent systems

under directed topology

Sicheng Lu

Shanghai Normal University, No.100 Guilin Rd. Shanghai, China

lusicheng0923@163.com

Abstract. This paper investigates the consensus problem for multi-agent systems (MASs) under
directed topology. The primary objective is to design a distributed control protocol such that
all agents converge to the state of the leader. We design the protocol and derive sufficient
conditions for consensus by using Lyapunov stability theory. Theoretical analysis and numerical
simulations verify the effectiveness of the proposed control protocol.

Keywords: Multi-agent systems, Consensus, Stability, Distributed control.

1. Introduction

In the past decades, cooperative control has gradually become a research focus in the scientific
community due to the wide range of applications of multi-agent systems (MASs) in fields such as
biology, physics, and artificial intelligence [1]. Research on MASs mainly covers consensus, formation,
and controllability problems [1]. The task of consensus is to design a control input protocol that
enables all agents to converge to the same state. However, in many real-world scenarios, plain
convergence does not achieve the expected effect; for example, multiple unmanned aerial vehicles may
need to reach a specified speed for real-time continuous control of the environment. MAS
controllability is an emerging research area that followed the study of MAS consensus. For a network
of agents, external inputs are applied to the leader such that the followers can reach any expected
final state from any initial state.

In terms of consensus and controllability, research on MASs is mainly divided into leaderless
consensus and leader-follower consensus [1-2]. The key to the analysis is the design of the control
input [3]. Lu et al. present two non-smooth leader-following formation protocols for non-identical
Lipschitz nonlinear MASs [3]. Hui Q. proposed a nonlinear consensus algorithm for first-order systems,
expressed as [4]:

$$\dot{x}_i(t) = \sum_{j=1,\, j \neq i}^{N} \Phi_{ij}\big(x_i(t), x_j(t)\big)$$

Stability conditions of the system under this nonlinear consensus algorithm were obtained by the
Lyapunov method [8], and the nonlinear consensus was analyzed under switching topologies. Lin et al.
study the consensus problem for a continuous-time nonlinear system and conclude that the system
achieves consensus if and only if the directed switching topological network

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/20240635

(https://creativecommons.org/licenses/by/4.0/).

of the system has a sufficiently large connectivity range and strength [5]. Furthermore, the vector field
of each individual in the system must fall within a minimal sector made up of the individual itself and
the individuals it depends on. The consensus problem usually involves the asymptotic stability of a
differential equation, which needs to be analyzed by means of a Lyapunov function. Moreau designed
a Lyapunov function for a continuous-time nonlinear consensus algorithm [6]. As an extension of the
classic Lyapunov function method, the non-monotonically decreasing Lyapunov function (NMDLF)
method (Aeyels & Peuteman, 1999) is applicable to complex time-varying dynamics, especially fast
time-varying systems [7-8].

Motivated by the above analysis, this paper analyzes how the MASs reach consensus on the expected
state when the designed control input protocol and the corresponding parameter conditions are
satisfied. By means of a lemma, we transform the consensus problem for MASs into an asymptotic
stability problem for the error system. By analyzing the asymptotic stability of the error system, we
then prove that the MASs reach consensus on the expected state.

Through theoretical analysis and numerical simulation, we verify the effectiveness of the control
input protocol and show the consensus process of the MASs. The research in this paper not only
provides a solution to the consensus problem of MASs, but also provides theoretical support and
practical guidance for the design and realization of distributed control systems.

2. Preliminaries and problem formulation

2.1. Graph theory

Firstly, some notation and definitions for the agent network are given. $G = (V, E, A)$ refers to the graph of $N$ agents, where $V = \{v_1, v_2, \dots, v_N\}$ denotes the vertex set of graph $G$. $(v_i, v_j) \in E$ if and only if the $j$-th agent can receive the information of the $i$-th agent. In that case $v_i$ is a neighbour of $v_j$, so let $N_j = \{v_i \in V : (v_i, v_j) \in E\}$ denote the neighbourhood of $v_j$. Next, $A = [a_{ij}] \in \mathbb{R}^{N \times N}$ is a weighted adjacency matrix, where $a_{ii} = 0$, $a_{ij} > 0$ if $(v_j, v_i) \in E$, and $a_{ij} = 0$ otherwise. The Laplacian matrix of $G$ is $L = [l_{ij}] \in \mathbb{R}^{N \times N}$, where $l_{ii} = \sum_{j \neq i} a_{ij}$ and $l_{ij} = -a_{ij}$ for $i \neq j$. A directed path from node $v_1$ to node $v_k$ is a sequence of ordered edges $\{(v_1, v_2), (v_2, v_3), \dots, (v_{k-1}, v_k)\}$ in the directed graph $G$. If there exists a node, called the root, which has no parent node and has directed paths to all other nodes in the graph, then the directed graph $G$ contains a directed spanning tree. We define $B = \mathrm{diag}\{b_1, b_2, \dots, b_N\}$ as the attenuation coefficient matrix associated with $G$, where $b_i > 0$ if the leader is a neighbour of the $i$-th agent and $b_i = 0$ otherwise. It is assumed that the leader is self-active, that is, it moves independently: the followers can receive information from the leader, while the leader needs no information from any follower.
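The graph quantities of this subsection can be sketched numerically. The following is a minimal Python illustration, not taken from the paper: the four-agent chain, the helper names `laplacian` and `leader_is_root`, and the convention that $a_{ij} > 0$ means agent $i$ receives information from agent $j$ are all assumptions made for the example.

```python
import numpy as np

def laplacian(A):
    """Laplacian with l_ii = sum_j a_ij and l_ij = -a_ij for i != j."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def leader_is_root(A, b):
    """Check that every follower is reachable from the leader.

    Convention assumed here: a_ij > 0 means agent i receives information
    from agent j, and b_i > 0 means agent i receives it from the leader.
    """
    A = np.asarray(A, dtype=float)
    reached = {i for i, b_i in enumerate(b) if b_i > 0}
    frontier = list(reached)
    while frontier:
        j = frontier.pop()
        # every agent i that listens to agent j becomes reachable
        for i in np.flatnonzero(A[:, j] > 0):
            if int(i) not in reached:
                reached.add(int(i))
                frontier.append(int(i))
    return len(reached) == A.shape[0]

# Hypothetical chain: leader -> agent 0 -> agent 1 -> agent 2 -> agent 3
A = np.array([[0, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)
b = np.array([1.0, 0.0, 0.0, 0.0])
L = laplacian(A)
B = np.diag(b)
```

Every row of `L` sums to zero by construction, and `leader_is_root(A, b)` holds for this chain because the leader reaches agent 0 directly and the remaining agents through it.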

2.2. Problem formulation

Consider nonlinear MASs in which the dynamics of follower $i$ are described by

$$\dot{x}_i(t) = f(x_i(t)) + u_i(t), \quad i = 1, 2, \dots, N, \tag{1}$$

and the dynamics of the leader are described by

$$\dot{x}_0(t) = f(x_0(t)), \tag{2}$$

where $x(t) = (x_1^T(t), x_2^T(t), \dots, x_N^T(t))^T$ denotes the state variables of the $N$ agents, $x_i(t) \in \mathbb{R}^n$ is the state vector of the $i$-th agent, $f(\cdot)$ represents the nonlinear function, $u_i(t)$ is the control input protocol to be designed, and $x_0(t) \in \mathbb{R}^n$ is the state of the leader, which is also the expected state.


In order to obtain the main result, the following assumptions are needed:

Assumption 1: The topological structure $G$ of the MASs includes a directed spanning tree with the leader as the root.

Assumption 2: There is a positive constant $\rho$ such that for any $x, y \in \mathbb{R}^n$,

$$(x - y)^T (f(x) - f(y)) \le \rho\,(x - y)^T (x - y).$$

Remark 1: Assumption 2 takes practicality into consideration, since many practical systems meet its requirement. For example, chaotic systems such as the Chen system, the Lorenz system, and the unified chaotic system have been verified to satisfy this assumption.
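As a rough numerical illustration of Assumption 2, read here as the one-sided bound $(x - y)^T (f(x) - f(y)) \le \rho\,(x - y)^T (x - y)$, the sketch below samples random pairs and reports the largest observed ratio. The function name `worst_one_sided_ratio` and the two test functions (`np.tanh` and $f(x) = -x$) are hypothetical choices for the example, not systems studied in the paper. A sampled check of this kind can only refute a candidate $\rho$; it cannot prove one.

```python
import numpy as np

def worst_one_sided_ratio(f, dim, trials=2000, scale=5.0, seed=0):
    """Largest sampled value of (x-y)^T (f(x)-f(y)) / ((x-y)^T (x-y))."""
    rng = np.random.default_rng(seed)
    worst = -np.inf
    for _ in range(trials):
        x = rng.normal(scale=scale, size=dim)
        y = rng.normal(scale=scale, size=dim)
        d = x - y
        worst = max(worst, float(d @ (f(x) - f(y))) / float(d @ d))
    return worst

# tanh is monotone and 1-Lipschitz componentwise, so rho = 1 is admissible
ratio_tanh = worst_one_sided_ratio(np.tanh, dim=2)
# for f(x) = -x the ratio equals -1 for every pair
ratio_linear = worst_one_sided_ratio(lambda x: -x, dim=2)
```

Any sampled ratio above a proposed $\rho$ would show that the proposal is too small for the function under test.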

When discussing the rate of change of an agent's state variables, we categorize the factors affecting it into internal and external factors. An object's state variables, such as displacement and velocity, are often affected both by the constraints of the fields in the space in which it is located and by the external environment, such as the interactions of other agents with it. We therefore build the above continuous-time model (1) of the $i$-th agent to describe the agent's state variables. In general, however, the behaviour and states of these $N$ agents are inconsistent, whereas in practice the agents are required to behave as expected. The differential equation model (2) describes the state that the agents are expected to reach.

In order to gradually reach the expected state, each agent receives information from its neighbours and passes information to them in order to update its state at the current moment. The protocol should therefore capture both the gap between the $i$-th agent and the other agents and the difference between each agent and the expected state. Following this idea, we give the following design of the control input, which is defined as the consensus protocol:

$$u_i(t) = -c\Big[\sum_{j=1}^{N} a_{ij}\,(x_i(t) - x_j(t)) + b_i\,(x_i(t) - x_0(t))\Big], \quad i = 1, 2, \dots, N,$$

where $c$ is a positive constant, $a_{ij}$ is an element of the weighted adjacency matrix $A$ and $b_i$ is an element of $B$. On the basis of the consensus protocol, we obtain the concrete model

$$\dot{x}_i(t) = f(x_i(t)) - c\sum_{j=1}^{N} a_{ij}\,(x_i(t) - x_j(t)) - c\,b_i\,(x_i(t) - x_0(t)), \quad i = 1, 2, \dots, N.$$

The consensus control problem is mathematically defined as follows: assume that the MASs contains $N$ agents, where the state of the $i$-th agent is denoted by $x_i(t)$, $i = 1, 2, \dots, N$. If

$$\lim_{t \to \infty} \|x_i(t) - x_0(t)\| = 0, \quad i = 1, 2, \dots, N,$$

then the MASs is said to have reached the expected state of consensus.
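The consensus protocol of this subsection can be sketched in a few lines. The code assumes the standard leader-following form $u_i = -c\,[\sum_j a_{ij}(x_i - x_j) + b_i(x_i - x_0)]$. The function name `consensus_input`, the two-agent graph and all numerical values are hypothetical and only illustrate the computation.

```python
import numpy as np

def consensus_input(x, x0, A, b, c):
    """Control inputs u_i = -c * [sum_j a_ij (x_i - x_j) + b_i (x_i - x_0)].

    x  : (N, n) array, one row per follower state
    x0 : (n,) array, leader (expected) state
    A  : (N, N) weighted adjacency matrix, b : (N,) leader weights
    """
    x = np.asarray(x, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    neighbour_term = A.sum(axis=1)[:, None] * x - A @ x   # sum_j a_ij (x_i - x_j)
    leader_term = b[:, None] * (x - x0)                   # b_i (x_i - x_0)
    return -c * (neighbour_term + leader_term)

# Hypothetical two-follower example; only agent 0 hears the leader
A = np.array([[0.0, 1.0], [1.0, 0.0]])
b = np.array([1.0, 0.0])
x = np.array([[1.0, 0.0], [0.0, 0.0]])
x0 = np.array([0.0, 0.0])
u = consensus_input(x, x0, A, b, c=2.0)
u_at_consensus = consensus_input(np.tile(x0, (2, 1)), x0, A, b, c=2.0)
```

When every follower already sits at the leader state, both terms vanish and the input is zero, which is why the consensus state is an equilibrium of the coupling.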

2.3. Stability analysis

Firstly, an overall error function is defined as $e(t) = (e_1^T(t), e_2^T(t), \dots, e_N^T(t))^T$ to represent the difference from the expected state $x_0$ at each moment $t$, where $e_i(t) = x_i(t) - x_0(t)$ indicates the difference between the $i$-th agent and the expected state. The ordinary differential equation for the $i$-th component of the error function is

$$\dot{e}_i(t) = f(x_i(t)) - f(x_0(t)) - c\sum_{j=1}^{N} a_{ij}\,(e_i(t) - e_j(t)) - c\,b_i\,e_i(t).$$

To investigate the consensus problem of the MASs, in other words, to certify that $\lim_{t \to \infty} \|x_i(t) - x_0(t)\| = 0$, we give a lemma showing that the asymptotic stability of the error system and the consensus of the MASs are the same problem.

Lemma 1: For the error function, if $L + B$ is nonsingular, one has

$$\|x_i(t) - x_0(t)\| \le \frac{\|e(t)\|}{\sigma_{\min}(L + B)},$$

with $\sigma_{\min}$ being the minimum singular value.


So, if the solution of $\dot{e}_i(t) = f(x_i(t)) - f(x_0(t)) - c\sum_{j=1}^{N} a_{ij}\,(e_i(t) - e_j(t)) - c\,b_i\,e_i(t)$ is asymptotically stable, then we obtain $\lim_{t \to \infty} \|e(t)\| = 0$ and hence $\lim_{t \to \infty} \|x_i(t) - x_0(t)\| = 0$ by Lemma 1. It can be concluded that the consensus problem of MASs can be transformed into the asymptotic stability problem of its error system.
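Bounds of the kind used in Lemma 1 rest on the linear-algebra fact that $\|M\delta\| \ge \sigma_{\min}(M)\,\|\delta\|$ for any matrix $M$. The sketch below assumes that the matrix in question is $L + B$ and checks the fact on a hypothetical three-agent chain. The helper `sigma_min`, the graph and the random disagreement `delta` are illustrative assumptions, not data from the paper.

```python
import numpy as np

def sigma_min(M):
    """Smallest singular value of a matrix."""
    return float(np.linalg.svd(np.asarray(M, dtype=float), compute_uv=False).min())

# Hypothetical directed chain: leader -> agent 0 -> agent 1 -> agent 2
A = np.array([[0, 0, 0],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
B = np.diag([1.0, 0.0, 0.0])
M = L + B

rng = np.random.default_rng(0)
delta = rng.normal(size=(3, 2))      # hypothetical disagreement, one row per agent
local_error = M @ delta              # neighbourhood-weighted error
bound = np.linalg.norm(local_error) / sigma_min(M)
```

For this chain `sigma_min(L)` is numerically zero, because `L` alone is singular, while `sigma_min(M)` is strictly positive once the leader is attached, so `np.linalg.norm(delta)` never exceeds `bound`.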

3. Main results

In this section, a sufficient condition for the asymptotic stability of the error system is given. Since the leader is globally reachable, at least one follower is connected to the leader, so $b_i > 0$ for at least one $i$.

Lemma 2: For the Laplacian matrix $L$ of $G$, the smallest eigenvalue is always 0, and the corresponding eigenvector is the all-ones vector. The eigenvalue $\lambda_2$ is the algebraic connectivity, which reflects the connectivity of the graph. For any $x$,

$$\lambda_2\, x^T x \le x^T L x \le \lambda_{\max}\, x^T x,$$

where $\lambda_{\max}$ is the maximum eigenvalue and $\lambda_2$ is the second smallest eigenvalue.

Theorem 3.1. Suppose that the assumptions hold. Then the consensus of the system (1), (2) is achieved under the following condition:

$$\rho - c\,\big(\lambda_2 + \min_i\{b_i\}\big) < 0.$$

Proof. The Lyapunov function is designed as

$$V(t) = \frac{1}{2}\sum_{i=1}^{N} e_i^T(t)\, e_i(t).$$

Then,

$$\dot{V}(t) = \sum_{i=1}^{N} e_i^T \dot{e}_i = \sum_{i=1}^{N} e_i^T \Big( f(x_i) - f(x_0) - c\sum_{j=1}^{N} a_{ij}\,(e_i - e_j) - c\,b_i\,e_i \Big).$$

Since for any $x, y$, $(x - y)^T (f(x) - f(y)) \le \rho\,(x - y)^T (x - y)$, it follows that

$$\dot{V}(t) \le \rho \sum_{i=1}^{N} e_i^T e_i - c \sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij}\, e_i^T (e_i - e_j) - c \sum_{i=1}^{N} b_i\, e_i^T e_i = \rho\, e^T e - c\, e^T (L \otimes I_n)\, e - c \sum_{i=1}^{N} b_i\, e_i^T e_i.$$

From Lemma 2, one has

$$\dot{V}(t) \le \rho\, e^T e - c\,\lambda_2\, e^T e - c\,\min_i\{b_i\}\, e^T e = \big(\rho - c\,(\lambda_2 + \min_i\{b_i\})\big) \sum_{i=1}^{N} e_i^T e_i < 0 \quad \text{for } e \neq 0.$$

Thus, we can conclude that under the condition of Theorem 3.1 and the assumptions, the error system is asymptotically stable, that is, $\lim_{t \to \infty} \|e(t)\| = 0$ and $\lim_{t \to \infty} \|x_i(t) - x_0(t)\| = 0$, $i = 1, 2, \dots, N$. Then, the consensus for the MASs is realized.
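The Lyapunov argument can be spot-checked numerically. The sketch below assumes the quadratic function $V = \frac{1}{2}\sum_i \|e_i\|^2$ and the leader-following protocol, and evaluates $\dot{V}$ at random error states. It deliberately swaps in a simpler, standard bound for the paper's condition: for an undirected graph, $e^T((L + B) \otimes I_n)\,e \ge \lambda_{\min}(L + B)\, e^T e$, so $\dot{V} \le (\rho - c\,\lambda_{\min}(L + B))\, e^T e$. The graph, the gain $c = 6$ and the choice $f = \tanh$ with $\rho = 1$ are hypothetical.

```python
import numpy as np

def v_dot(e, x0, A, b, c, f):
    """dV/dt = sum_i e_i^T (de_i/dt) for V = 0.5 * sum_i ||e_i||^2."""
    deg = A.sum(axis=1)
    coupling = deg[:, None] * e - A @ e + b[:, None] * e   # rows of (L + B) e
    e_dot = f(e + x0) - f(x0) - c * coupling
    return float(np.sum(e * e_dot))

# Hypothetical undirected path 0 - 1 - 2 with the leader attached to agent 0
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
b = np.array([1.0, 0.0, 0.0])
rho, c = 1.0, 6.0                      # rho = 1 is admissible for tanh
lam_min = float(np.linalg.eigvalsh(np.diag(A.sum(axis=1)) - A + np.diag(b))[0])
rate = rho - c * lam_min               # a negative rate forces V to decrease

rng = np.random.default_rng(1)
samples = [(rng.normal(size=(3, 2)), rng.normal(size=2)) for _ in range(200)]
# worst violation of dV/dt <= rate * e^T e over the samples (should be <= 0)
worst_gap = max(v_dot(e, x0, A, b, c, np.tanh) - rate * np.sum(e * e)
                for e, x0 in samples)
```

With these values `rate` is negative, so every sampled error state gives a strictly negative $\dot{V}$. A smaller gain that makes `rate` positive would leave the bound inconclusive rather than prove divergence.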


4. Numerical simulation

In this section, an illustrative example will be presented to verify the effectiveness of our conclusion.

The simulations are performed in a two-dimensional space consisting of the X-direction and Y-direction

with five agents.

The nonlinear dynamical function of the system is given as follows:

󰇗󰇛󰇜󰇛󰇜



 

The initial states of leader is: 󰇡

󰇢

And the initial states of followers are:

󰇡

󰇢󰇡

󰇢󰇡

󰇢󰇡

󰇢󰇡

󰇢

Where , the adjacency matrix and attenuation coefficient matrix is:









    

    

    

    

    















    

    















    

    

    

    

    











Figure 1. Topology graph.

Figure 2. Error in the X-direction.

Figure 3. Error in the Y-direction.

Figure 2 and Figure 3 show that the errors of each agent with respect to the expected state in the X-direction and Y-direction gradually converge to 0 over time; that is, the five agents eventually reach consensus on the expected state.
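A simulation of this kind can be sketched with a forward-Euler integration of the leader and five followers in the plane. Everything specific in the block is an assumption made for illustration and is not the paper's setup: the dynamics `f`, the topology `A` and `b`, the gain `c`, the initial states and the step size are all hypothetical.

```python
import numpy as np

def f(x):
    """Hypothetical nonlinear dynamics on R^2; accepts (2,) or (N, 2) arrays."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.sin(x[..., 1]), np.cos(x[..., 0])], axis=-1)

def simulate(x_init, x0_init, A, b, c, dt=1e-3, steps=10_000):
    """Forward-Euler run; returns ||x_i - x_0|| per agent at every step."""
    x = np.array(x_init, dtype=float)
    x0 = np.array(x0_init, dtype=float)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    deg = A.sum(axis=1)
    err = np.empty((steps + 1, x.shape[0]))
    err[0] = np.linalg.norm(x - x0, axis=1)
    for k in range(steps):
        # leader-following protocol: neighbour coupling plus leader coupling
        u = -c * (deg[:, None] * x - A @ x + b[:, None] * (x - x0))
        x, x0 = x + dt * (f(x) + u), x0 + dt * f(x0)
        err[k + 1] = np.linalg.norm(x - x0, axis=1)
    return err

# Hypothetical topology: leader -> 0 -> 1 -> 2 -> 3 -> 4, plus an edge 0 -> 4
A = np.zeros((5, 5))
A[1, 0] = A[2, 1] = A[3, 2] = A[4, 3] = A[4, 0] = 1.0
b = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
x_init = [[1.0, 2.0], [-2.0, 1.0], [3.0, -1.0], [-1.0, -3.0], [2.0, 2.0]]
err = simulate(x_init, [0.0, 0.0], A, b, c=5.0)
```

Plotting the columns of `err` against time gives curves comparable in spirit to error plots such as Figures 2 and 3, although the numbers here come from the hypothetical setup and not from the paper.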

5. Conclusion

The expected consensus problem for nonlinear MASs is investigated in the paper. The topology of the

MASs is directed and the consensus can be realized. On the basis of the proposed distributed control

protocol and the Lyapunov stability theory, a sufficient condition is derived to reach the consensus for

MASs. Finally, the effectiveness of the distributed control protocol is verified by numerical simulation.



References

[1] Long, M. et al. (2023) ‘Model-free algorithm for consensus of discrete-time multi-agent systems

using reinforcement learning method’, Journal of the Franklin Institute, 360(14), pp. 10564–10581. doi:10.1016/j.jfranklin.2023.08.010.

[2] Gruyitch, L.T. (2007) ‘Nonlinear Hybrid Control Systems’, Nonlinear Analysis: Hybrid Systems,

1(2), pp. 139–140. doi:10.1016/j.nahs.2006.10.001.

[3] Lü, J., Chen, F. and Chen, G. (2016) ‘Nonsmooth leader-following formation control of

nonidentical multi-agent systems with directed communication topologies’, Automatica, 64,

pp. 112–120. doi:10.1016/j.automatica.2015.11.004.

[4] Hui, Q. and Haddad, W.M. (2008) ‘Distributed Nonlinear Control Algorithms for network

consensus’, Automatica, 44(9), pp. 2375–2381. doi:10.1016/j.automatica.2008.01.011.

[5] Lin, Z., Francis, B. and Maggiore, M. (2007) ‘State agreement for continuous‐time coupled

nonlinear systems’, SIAM Journal on Control and Optimization, 46(1), pp. 288–307.

doi:10.1137/050626405.

[6] Moreau, L. (2005) ‘Stability of multiagent systems with time-dependent communication links’,

IEEE Transactions on Automatic Control, 50(2), pp. 169–182. doi:10.1109/tac.2004.841888.

[7] Aeyels, D. and Peuteman, J. (1999) ‘Uniform asymptotic stability of linear time-varying systems’,

Open Problems in Mathematical Systems and Control Theory, pp. 1–5. doi:10.1007/978-1-4471-0807-8_1.

[8] Zhang, X., Chen, L. and Chen, Y. (2019) ‘Consensus analysis of multi-agent systems with general

linear dynamics and switching topologies by non-monotonically decreasing Lyapunov

function’, Systems Science & Control Engineering, 7(1), pp. 179–188.

doi:10.1080/21642583.2019.1620654.


Retraction Agreement

*This agreement is the official document for the retraction application

supported by EWA Publishing. *

Notes:

• The retraction application should be approved by all authors listed in the

published article.

• The retraction process is entirely irreversible and permanent. Authors may not

withdraw the retraction once the agreement has been executed (14 days after

receiving the retraction application).

• The Statement of Retraction will be displayed on the publication website,

replacing the previous published article. The previous published article will be linked

with a revised title of “RETRACTED ARTICLE: [article title]”. All author information

will be retained in the new link.

• The retracted article should not be submitted to any publications operated by

EWA Publishing.

• No refunds will be issued.

If, after carefully reading the above notes, you confirm to proceed with the

retraction application, please complete the form below to initiate the retraction

process:

Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation

DOI: 10.54254/2753-8818/41/2024CH0130

(https://creativecommons.org/licenses/by/4.0/).


Retraction Application Form

Title of article

Research Progress on 2D Human Pose

Estimation Based on Deep Learning

Name of journal/proceedings

Theoretical and Natural Science

Name of volume

TNS Vol.41

Article DOI

10.54254/2753-8818/41/2024CH0130

Name of author(s) (in order)

Haoyu Liu

Name of corresponding author

Haoyu Liu

Affiliation of corresponding author

University of Electronic Science and

Technology of China

Email of corresponding author

731933957@qq.com

Reasons of retraction

(this part will be displayed in the

Statement of Retraction on the

publication website)

Under the review of my supervisor, there

are many things for improvement in my

paper, including the classification of

methods, the methods cited, and the

summary and analysis of the methods. So I

temporarily choose to retract this paper.

Thanks for your understanding.


Please read the information below

If the retraction is disputed by the author(s):

This article will not be retracted unless all authors agree to this retraction

agreement.

If the retraction is disputed by the publisher:

This article will not be retracted until the author(s) receives the retraction

notification.

Authors have the right to contest the retraction.

The article will not be retracted within 14 days of receiving the application, and the

authors can contest the retraction during this period.

Once the article’s retraction has been executed:

The retraction will not be reversed.

The retracted article will not be accepted by any electronic or physical publications

of EWA Publishing;

It is the author’s responsibility to be aware of the information above.

The author has read all of the information above and agreed with this retraction.

Yes√ No☐

The hand-written signatures of all authors:

Date:2024/10/29

*EWA Publishing reserves the right of

final interpretation of this agreement.

