Volume 14 Issue 03 March 2025 ISSN 2456 5083 Page 172
HISTORICAL DATA-BASED GOLD PRICE PREDICTION USING INTELLIGENT
ALGORITHMS
Banothu Parvathi 1, K. Swetha 2
1, 2 Assistant Professor, Department of Computer Science and Engineering (AI&ML),
St. Martin's Engineering College, Secunderabad, Telangana, India
E-mail : parvathi605@gmail.com
Abstract: The price of gold changes constantly, rising and
falling over time. Predicting it is an important task in finance
because gold is a leading component of the financial market. Many
papers have predicted gold prices with different machine-learning
models. This paper applies several classification and regression
algorithms: random forest, decision tree, linear regression, and
logistic regression. The work builds on research into the factors
related to the value of gold. Gold is always in demand, and the
trend in its price suggests that it is one of the better
investment options, so predicting the direction of the gold rate
is worthwhile. A variety of statistical models can be used to
model and forecast such data. The gold price is consistently
nonlinear, and sound financial and investment strategies depend
on price prediction. An exponential curve can be used to
represent the fluctuation in the gold price. Convolutional neural
networks are among the better methods for handling nonlinearities
in data, and RNNs are particularly effective for forecasting and
estimating time series. World Gold Council data is used, and the
findings indicate that the proposed architecture is among the
better financial forecasting methods.
Keywords - Regression, linear regression,
logistic regression, decision tree, random
forest, Machine Learning, Prediction.
I Introduction
Looking at historical gold prices may provide information that
helps in buying or selling decisions.
Saving and investing are part of everyone's life in some
capacity. In economics, an investment is the purchase of
commodities intended for long-term wealth creation rather than
short-term consumption. A financial investment is the purchase of
a financial asset in the hope that it will one day produce income
or be sold for a profit. India has one of the fastest-growing
economies in the world, which means it has more disposable income
and a variety of business opportunities [1]. The financial
choices available to investors include stocks, deposits,
commodities, and real estate, each with its own risk and yield
characteristics. Gold is another commodity that many investors
find attractive because of its rising value and extensive use.
Investors increasingly favor gold as a protective asset because
of their pessimistic view of the established capital markets and
foreign exchange markets. Gold is also regarded as the "asset of
last resort," the asset on which buyers rely when the
industrialized world's capital markets cannot produce the desired
returns. Gold can therefore also be said to serve as a tool for
speculators.
For this reason, buyers can be said to see gold as a hedge
against shifts in other markets. Because gold is a valuable
commodity, supply and demand should affect its price as they
would for any other good. The current year's production has
little effect on the price, however, because gold can be stored
and the accumulated stock has grown over many years. Gold is used
both as a product and as a financial instrument. Its value has
risen while other markets, such as real estate and the financial
markets, have lost value and experienced volatility, and this has
made gold an increasingly popular choice for buyers. Recently,
however, the price of gold has itself been highly volatile, which
increases the risk of buying gold. It is uncertain how long these
high prices will last and when they will begin to fall. Many
studies have examined how different economic variables affect the
price of gold, and research is reported to show the influence of
various socioeconomic factors on it. The goal of this article is
therefore to examine the relationship between specific economic
and market factors and the price of gold.
The rest of this paper is organized as follows: a literature
survey, followed by sections on model planning, data discovery,
and model building, and finally a conclusion.
Here, we use records of historical gold prices. The dataset
contains seven columns: Date, Open, Close, Low, High, Volume,
and Currency.
II Literature Survey
The price of gold has been the subject of numerous studies and
articles. According to research, the price of gold has dropped
since 2013.
There are different studies on the value of gold in the
literature. Although these studies use many different factors, it
is clear that gold values generally move alongside the dollar and
market returns. The relationship between gold rates and other
socioeconomic factors has also been studied extensively, as has
the relationship between the price of gold and the prices of
other goods, especially crude oil. The results of these studies,
however, appear to be contradictory. The research on the factors
affecting the gold price, and the approaches used to analyze
these relationships, are covered in the following paragraphs.
Based on a review of the literature, Manjula K. A. and
Karthikeyan [2] report that the relationship between gold and the
factors affecting it, which include the cost of petroleum and the
dollar-rupee exchange rate, is strong. Iftikharul Sami and Khurum
Nazir Junejo note that in earlier times gold was used as a mode
of payment and that gold represents the financial strength of a
country.
In [3], R. Hafezi and A. N. Akhavan proposed an artificial neural
network model to predict the future price of gold and concluded
that the BAT-Neural Network is good for predicting gold prices.
Xiaohui Yang suggested that the ARIMA model is the best of the
models considered for predicting the price of gold [4].
Shian-Chang Huang and Cheng-Feng Wu used deep multiple kernel
learning (DMKL) models to forecast oil prices using data
collected from the oil, gold, and currency markets [5].
Lawrence asserts that returns on gold do not significantly
correlate with changes in some socioeconomic measures [11]. The
study also found that returns on gold are less strongly
correlated with stock and bond returns than are the returns on
other commodities. Scacciavillani stated that the inflation rate
and the value of gold move against each other, and that the gold
price and the inflation rate will be affected more in the future
[6]. On the basis of a review of the literature, Hanan Naser
argues that past studies are unclear about the association
between the gold price and inflation [7].
Ismail et al. used a range of economic factors to forecast gold
prices: the Commodity Research Bureau future index, the Standard
and Poor's 500 index, the USD/EUR exchange rate, the inflation
rate, the New York Stock Exchange index, the money supply, the
Treasury bill, and the US dollar index. The study finds that the
Commodity Research Bureau future index, the USD/EUR exchange
rate, the inflation rate, and the money supply all have a sizable
impact on gold prices [8-9].
Khaemasunun provides information on how different currencies,
interest rates, and energy costs affect the gold price [15].
According to the empirical evidence provided by Ai et al., the
price of gold and the exchange rate are related in both the long
term and the short term. Malik and Ewing concentrated on crude
oil and gold contracts and their price differences. Ghosh et al.
claim that gold prices are correlated with US inflation, interest
rates, and the value of the currency. Their cointegration study
also revealed a long-term relationship between gold prices and
the US Consumer Price Index [10]. Overall, a review of the
relevant literature shows disagreement about the relationship
between the price of gold and the various variables believed to
influence it.
Researchers have used several methods to study gold price
fluctuations and their relationship with related factors [12].
According to Toraman, numerous studies have used multivariate
regression models to evaluate the sensitivity of gold prices to
various factors. In this respect, Ismail et al. developed
multiple linear regression (MLR) models to forecast gold values,
and the MLR model appeared to be successful. The literature
review makes clear that multiple linear regression is widely
used.
III Model Planning
Based on a study of the literature, five key factors believed to
affect the price of gold were identified: the stock market, the
price of petroleum, the rupee-dollar exchange rate, inflation,
and interest rates. The Nifty 500 index values are taken as an
accurate representation of stock values; the Nifty 500 index
represents the top 500 firms listed on the National Stock
Exchange [14]. Inflation is gauged using the Consumer Price Index
with a reference year. The interest rate is represented by the
rate on term deposits held for longer than a year. The price of
gold is represented by the current rate in rupees per unit
weight. Monthly data for each of these factors were collected
from the datasets of the Centre for Monitoring Indian Economy.
Each variable had 228 samples. 80% of the data was used to train
the algorithm, and the remaining 20% was used for testing. The
machine learning methods used in this study are linear regression
and gradient boosting regression.
a. Data Discovery
1. Learning about domain
The domain is the gold price over a period of time. The data has
fields such as Date, Open, Close, Low, High, and Currency; that
is, for a particular date it gives the opening amount, the
closing amount, the high and low, and the currency for that
amount of gold [13].
2. Identifying resources
We use Kaggle as the source of the dataset for predicting gold
prices. Each record is collected by date, and the currency is
calculated based on the opening and closing amounts.
3. Framing the problem
The problem is to find the currency for the amount of gold on a
particular day based on its opening and closing amounts.
4. Identifying key stakeholders
The primary stakeholders are shareholders, gold miners,
investors, traditional authorities, and non-governmental and
community-based organizations.
5. Interviewing the Analytics Sponsor
Is the gold price low?
Is the gold price high?
What is the reason for a fall in the gold price?
Can the rise of gold on a certain date be predicted?
6. Developing Initial Hypothesis
Classification and regression models, namely decision tree,
logistic regression, linear regression, and random forest, are
used to forecast the value of gold at a particular time period.
7. Identifying Potential Data Sources
Details about the gold price, used to predict the currency.
Review of the raw historical gold price data.
b. Data Preprocessing
Data preprocessing is a component of data preparation. It
describes the processing performed on raw data to prepare it for
another data processing step, and it is normally an important
first step in data mining.
Steps:
1. Preparing the analytic sandbox
2. Performing ETLT
3. Learning about the data
4. Data conditioning
5. Survey and visualize
1. Preparing the analytic sandbox
To train and analyse the data, the Daily Gold Price Historical
Data dataset is imported and loaded into the R environment.
2. Performing ETLT (Extract, Transform, Load, Transform)
The data is extracted from the Daily Gold Price Historical Data
dataset and then transformed for cleaning and analysis.
3. Learning about the data
The Daily Gold Price Historical Data dataset contains
multivariate data with 5774 rows and 7 columns related to gold
prices, such as Open, Close, High, Low, and Volume [16].
4. Data conditioning
The Daily Gold Price Historical Data dataset is cleaned and
transformed in the R environment, based on the given formula, to
prepare it for classification and regression analysis.
5. Survey and visualize
We analyze and review the data, and negative values are
identified and deleted.
IV Architecture
This architecture describes the process we build to predict gold
prices.
Fig.1 Architecture Process flow of the
Prediction.
Different machine learning methods are used to design and train
the model on the given data. The majority of the data is used for
training, and the remaining portion is used for model testing.
The LSTM model, linear regression, and random forest regression
are the machine learning approaches used in this study.
A. Defining explanatory variables
Explanatory variables are defined because they determine the
value of the Gold ETF price; with these variables we can forecast
the Gold ETF price. The explanatory variables in this method are
the moving averages over the previous three and nine days. NaN
values have to be removed before the feature variables are
stored, so we use the dropna() method.
To predict the price of the Gold ETF, other factors can be added
to X. These can include technical indicators, the prices of other
ETFs such as the Gold Miners ETF (GDX) or the Oil ETF (USO), or
US economic data.
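A minimal R sketch of these explanatory variables is given below. It assumes trailing (backward-looking) averages and uses made-up closing prices; the helper moving_avg and the column names ma3 and ma9 are hypothetical. The dropna() method named in the text belongs to the Python pandas library; na.omit() is used here as its closest base R counterpart.

```r
# Hypothetical closing prices; real values would come from the dataset.
close <- c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)

# Trailing moving average over the last n observations.
# stats::filter() with sides = 1 uses only current and past values,
# so the first n - 1 entries are NA.
moving_avg <- function(x, n) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

features <- data.frame(
  close = close,
  ma3   = moving_avg(close, 3),
  ma9   = moving_avg(close, 9)
)

# Drop the rows where either moving average is not yet defined.
features_clean <- na.omit(features)
print(features_clean)
```

With twelve observations, the nine-day average is first defined on the ninth row, so four complete rows remain.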
B. Divide the data into training and
testing datasets
In this stage the historical data is divided into a training
dataset and a testing dataset. The results are then combined, and
the linear regression model is developed using the training data.
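One way to perform such a split in R is sketched below. The 80/20 ratio follows the proportion stated in the model planning section; splitting chronologically (earliest rows for training) is an assumption made here because the data is a time series, and the synthetic gold data frame is purely illustrative.

```r
set.seed(42)

# Synthetic daily series standing in for the historical gold data.
n <- 100
gold <- data.frame(
  day   = seq_len(n),
  close = 1200 + cumsum(rnorm(n))
)

# Chronological split: the first 80% of rows train the model,
# the last 20% are held back for testing.
cut_point <- floor(0.8 * n)
train <- gold[seq_len(cut_point), ]
test  <- gold[(cut_point + 1):n, ]

cat("training rows:", nrow(train), " testing rows:", nrow(test), "\n")
```

A random split, as produced by sample(), is an alternative; the chronological version avoids testing on days that precede the training data.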
Fig. 2 Storage of historical data.
C. Create a linear regression model
Regression analysis is the statistical method of estimating the
relationship between several variables. It is used to explain how
the dependent variable varies when one independent variable
changes while the others remain constant.
Fig. 3 Regression variable model.
Multiple linear models are forms of linear regression with
multiple independent variables. Multiple linear regression is
represented below, where Y is the dependent variable and
X1, X2, ..., Xp are the independent variables.
$$Y = a + b_1 X_1 + b_2 X_2 + \dots + b_p X_p \qquad (1)$$
It is now regarded as both a statistical method and a machine
learning algorithm.
D. Evaluate and predict the Gold ETF
prices
The model is now checked against the test dataset. The model
fitted on the training dataset is used to predict gold prices:
the prediction step calculates the gold price (y) given the
explanatory variables X.
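The evaluation step can be illustrated with the following R sketch. The data is synthetic, and the choice of root mean squared error (RMSE) and R-squared as metrics is an assumption, since the text does not name the metrics used; the column names price, ma3, and ma9 are hypothetical.

```r
set.seed(1)

# Synthetic data: price depends on two moving-average style features.
n <- 200
ma3 <- runif(n, 1000, 2000)
ma9 <- ma3 + rnorm(n, sd = 20)
price <- 5 + 0.7 * ma3 + 0.3 * ma9 + rnorm(n, sd = 10)
gold <- data.frame(price = price, ma3 = ma3, ma9 = ma9)

train <- gold[1:160, ]
test  <- gold[161:200, ]

# Fit on the training data, then predict y for the unseen test rows.
fit  <- lm(price ~ ma3 + ma9, data = train)
pred <- predict(fit, newdata = test)

# RMSE: typical size of the prediction error, in price units.
rmse <- sqrt(mean((test$price - pred)^2))

# R-squared on the test set: share of variance explained.
r2 <- 1 - sum((test$price - pred)^2) /
          sum((test$price - mean(test$price))^2)

cat("RMSE:", round(rmse, 2), " R-squared:", round(r2, 4), "\n")
```

Computing both metrics on the test rows only, rather than on the training rows, gives a fairer picture of how the model would behave on new prices.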
Fig. 4 Prediction and Actual price
Evaluation.
V Model Building
a. Model Planning
The model that predicts prices with the highest degree of
accuracy would be chosen to power a tool or programme.
Fig. 5 Decision Tree model.
1 Decision Tree
A decision tree is a tool used for classification and prediction.
It is a flowchart-like structure in which each internal node
represents a test on an attribute, each branch represents the
result of the test, and each leaf node holds the predicted value.
It is a supervised learning method, which means it uses training
data for prediction. A decision tree lets us visualize the
decisions, which makes them easier to understand; this is why it
is a popular data mining technique.
To build the decision tree models we used two R packages, rpart
and party. With these two packages we can predict the decisions.
2 Random Forest
Random forest is a supervised learning algorithm used for both
regression and classification problems. It is an ensemble
learning approach, which combines multiple classifiers to solve
complex problems and improve model performance. Compared to a
single decision tree, a random forest provides more accurate
results by introducing diversity when building a model with
several different features.
A random forest is an ensemble strategy that uses many decision
trees and a method called bootstrap aggregation, often known as
bagging, to perform both regression and classification. The
fundamental idea is to combine several decision trees to reach
the final result rather than depending on a single tree. To
reduce variance while retaining the low bias of a decision tree
model, the random forest applies bootstrapping to decision trees.
Compared to most other algorithms, the random forest algorithm
has the following advantage: it can also be used for feature
selection, by identifying the most important features among those
available in the training dataset.
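The bagging idea described above can be sketched with rpart trees, as below. This is a simplified illustration on synthetic data, not the randomForest implementation: a real random forest additionally samples a random subset of features at each split. The names n_trees, preds, and bagged are hypothetical.

```r
library(rpart)
set.seed(7)

# Synthetic price data: Open is close to the midpoint of High and Low.
n <- 200
gold <- data.frame(high = runif(n, 1000, 2000))
gold$low  <- gold$high - runif(n, 5, 30)
gold$open <- (gold$high + gold$low) / 2 + rnorm(n, sd = 3)

train <- gold[1:150, ]
test  <- gold[151:200, ]

n_trees <- 25

# Each column of preds holds one tree's predictions for the test rows.
preds <- sapply(seq_len(n_trees), function(b) {
  # Bootstrap sample: draw training rows with replacement.
  idx  <- sample(nrow(train), replace = TRUE)
  tree <- rpart(open ~ high + low, data = train[idx, ])
  predict(tree, newdata = test)
})

# Aggregate: average the trees' predictions for each test row.
bagged <- rowMeans(preds)
head(bagged)
```

Averaging over trees grown on different bootstrap samples is what reduces the variance of the final prediction compared with any single tree.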
3 Linear Regression
Linear regression is a model used to describe the relationship
between one or more independent variables and a continuous
dependent variable. For any type of linear regression, the model
seeks the line that best fits the set of data points; this line
can be calculated using the least squares method. We use the lm()
function to build the linear regression.
Linear regression models with more than one independent variable
are called multiple linear models. In the multiple linear
regression below, Y is the dependent variable and
X1, X2, ..., Xp are the independent variables.
$$Y = a + b_1 X_1 + b_2 X_2 + \dots + b_p X_p \qquad (2)$$
It now serves as a machine learning technique as well as a
statistical method.
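As a small illustration of the multiple linear model, the R sketch below generates data from known coefficients and checks that lm() recovers them by least squares. The values a = 2, b1 = 1.5, and b2 = -0.5 and the variable names are arbitrary choices for the example, not figures from the study.

```r
set.seed(3)

# Synthetic data generated from Y = a + b1*X1 + b2*X2 + noise,
# with a = 2, b1 = 1.5, b2 = -0.5 chosen arbitrarily.
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n, sd = 0.1)

# lm() estimates the intercept a and the slopes b1, b2
# by minimizing the sum of squared residuals.
fit <- lm(y ~ x1 + x2)

# The estimates should be close to the values used above.
print(coef(fit))
```

The intercept reported by coef() corresponds to a in the model, and the entries named x1 and x2 correspond to b1 and b2.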
4 Logistic Regression
Logistic regression is a statistical model frequently used for
classification and predictive analytics. The algorithm predicts
the probability of an event based on a given set of independent
variables. The value of the resulting dependent variable lies
between 0 and 1. In logistic regression, a logit transformation
is applied to the probability. To fit a logistic regression to
the data we use the glm() function.
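The following R sketch shows one way glm() can be used for this purpose. Framing the task as predicting whether the price goes up (1) or not (0) from the previous day's change is an assumption made for illustration, as are the synthetic data and the names up, change, and prob.

```r
set.seed(11)

# Synthetic example: 'change' is a hypothetical previous-day price
# change, and 'up' records whether the price rose (1) or not (0).
n <- 400
change <- rnorm(n)
p_up <- 1 / (1 + exp(-(0.2 + 1.5 * change)))  # true probabilities
up <- rbinom(n, 1, p_up)
gold <- data.frame(up = up, change = change)

# family = binomial selects logistic regression: the logit of the
# probability is modelled as a linear function of 'change'.
fit <- glm(up ~ change, data = gold, family = binomial)

# type = "response" returns probabilities between 0 and 1.
prob <- predict(fit, type = "response")

# Turn probabilities into class labels with a 0.5 threshold.
label <- ifelse(prob > 0.5, 1, 0)
accuracy <- mean(label == gold$up)

cat("training accuracy:", round(accuracy, 3), "\n")
```

The 0.5 threshold is a common default; a different cut-off may be preferable when the cost of a wrong "up" call differs from that of a wrong "down" call.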
b. Model Building
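Dataset: Daily Gold Price Historical Data
Decision tree execution (using party
package):
Algorithm 1: Decision tree on the training
dataset
# gold2: the cleaned gold price data frame
set.seed(1234)  # any fixed seed makes the split reproducible
# Randomly assign about 70% of rows to training, 30% to testing
ind <- sample(2, nrow(gold2), replace = TRUE, prob = c(0.7, 0.3))
trainData <- gold2[ind == 1, ]
testData <- gold2[ind == 2, ]
library(party)
# Model Open as a function of the other price columns
myFormula <- Open ~ High + Low + Close + Volume
gold_ctree <- ctree(myFormula, data = trainData)
# Compare fitted and actual values on the training data
table(predict(gold_ctree), trainData$Open)
print(gold_ctree)
plot(gold_ctree)
# Predict on the held-out test data and compare with actual values
testPred <- predict(gold_ctree, newdata = testData)
table(testPred, testData$Open)
In the above algorithm, the decision tree is applied to the
training dataset to predict the gold price.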
Algorithm 2: Decision tree execution
(using rpart package)
library(rpart)
# gold.train: the training portion of the gold price data
myFormula <- Open ~ High + Low + Close + Volume
gold_rpart <- rpart(myFormula, data = gold.train,
                    control = rpart.control(minsplit = 10))
attributes(gold_rpart)
print(gold_rpart$cptable)
plot(gold_rpart)
text(gold_rpart, use.n = TRUE)
# Pick the complexity parameter with the lowest cross-validated error
opt <- which.min(gold_rpart$cptable[, "xerror"])
cp <- gold_rpart$cptable[opt, "CP"]
# Prune the tree back to that complexity
gold_prune <- prune(gold_rpart, cp = cp)
plot(gold_prune)
text(gold_prune, use.n = TRUE)
In the above algorithm, the decision tree is built on the
training dataset using the rpart package, and then pruned, to
predict the gold price.
Algorithm 3: Random Forest
execution:
# "in" is a reserved word in R, so the index vector is named ind.
# sample() rescales the weights, so 0.6 and 0.2 give roughly a
# 75/25 split.
ind <- sample(2, nrow(gold2), replace = TRUE, prob = c(0.6, 0.2))
trData <- gold2[ind == 1, ]
teData <- gold2[ind == 2, ]
library(randomForest)
# Predict Open from all other columns using 100 trees
rf <- randomForest(Open ~ ., data = trData, ntree = 100,
                   proximity = TRUE)
table(predict(rf), trData$Open)
plot(rf)
# Plot the importance of each predictor
varImpPlot(rf)
In the above algorithm, random forest is applied to the training
dataset to predict the gold price. Similarly, logistic regression
and the variable importance analysis are applied to the
historical data, which is used as the training data.
Results
Fig.6 Decision Tree with Package Party
Fig. 6 shows the decision tree built with the party package for
predicting the gold price.
Fig.7 Random Forest
Fig. 7 shows the random forest for predicting the gold price on
the training data.
Fig.8 Variable Importance
Fig. 8 shows the variable importance for predicting the gold
price on the training data. It shows the specific days of
importance where the price increased.
Fig.10 Logistic Regression
Fig. 10 shows the logistic regression for predicting the gold
price on the training data.
Conclusion
The major aim of this paper is to forecast the price of gold
using different classification algorithms. The paper uses a
dataset of daily gold prices collected between 2015 and 2021. The
decision tree gives the best result for prediction, as its result
is easier to understand and analyse. This investigation sought to
ascertain the relationship between gold and a few factors that
affect it: the stock market, the price of petroleum, the
rupee-dollar exchange rate, inflation, and interest rates. The
analysis used monthly price data from January 2015 through
December 2021. The data was also divided into time periods, with
period I covering November 2015 to December 2018 and showing a
downward tendency in the price of gold. Three machine learning
algorithms, linear regression, random forest regression, and
gradient boosting regression, were used to analyse these data. A
strong link between the factors is found during this time. When
the different periods are taken into account, gradient boosting
regression offers higher prediction accuracy than random forest
regression for the entire period. The study concludes that while
machine learning algorithms are extremely useful in this kind of
research, their accuracy depends on the properties of the data.
Further study using these data and different methods may help to
better understand how these approaches behave.
References
[1] Xiaohui Yang, "The Prediction of Gold Price Using ARIMA
Model", 2nd International Conference on Social Science, Public
Health and Education, 2019.
[2] Manjula K. A. and Karthikeyan P., "Gold Price Prediction
using Ensemble based Machine Learning Techniques", Third
International Conference on Trends in Electronics and
Informatics, 2019.
[3] B. Kishori and V. Preethi, "Gold Price forecasting using
ARIMA Model", International Journal of Research, 2018.
[4] R. Hafezi and A. N. Akhavan, "Forecasting Gold Price Changes:
Application of an Equipped Artificial Neural Network", AUT
Journal of Modeling and Simulation, 2018.
[5] Xiaohui Yang, "The Prediction of Gold Price Using ARIMA
Model", 2nd International Conference on Social Science, Public
Health and Education (SSPHE 2018).
[6] K. R. Sekar, Manav Srinivasan, K. S. Ravichandran and J.
Sethuraman, "Gold Price Estimation Using A Multi Variable Model",
International Conference on Networks & Advances in Computational
Technologies, 2017.
[7] W. Du and J. Schreger, "Local Currency Sovereign Risk",
Social Science Research Network, Rochester, NY, SSRN Scholarly
Paper ID 2976788, Dec. 2013.
[8] J. Jagerson and S. W. Hansen, "All about investing in gold",
McGraw-Hill Publishing, 2011.
[9] Z. Ismail, A. Yahya, and A. Shabri, "Forecasting gold prices
using multiple linear regression method", Am. J. Appl. Sci., vol.
6, no. 8, p. 1509, 2009.
[10] C. Toraman, Ç. Basarir, and M. F. Bayramoglu, "Determination
of factors affecting the price of gold: A study of MGARCH model",
Bus. Econ. Res. J., vol. 2, no. 4, p. 37, 2011.
[11] C. Lawrence, "Why is gold different from other assets? An
empirical investigation", Lond. UK World Gold Council., 2003.
[12] L. A. Sjaastad and F. Scacciavillani, "The price of gold and
the exchange rate", J. Int. Money Finance, vol. 15, no. 6, pp.
879-897, 1996.
[13] S. A. Baker and R. C. Van Tassel, "Forecasting the price of
gold: A fundamentalist approach", Atl. Econ. J., vol. 13, no. 4,
pp. 43-51, 1985.
[14] H. Naser, "Can Gold Investments Provide a Good Hedge Against
Inflation? An Empirical Analysis", Int. J. Econ. Financ. Issues,
vol. 7, no. 1, pp. 470-475, 2017.
[15] P Khaemasunun, "Forecasting Thai gold prices", Available
Http://www Wbiconpro Com3-Pravit. Pdf Acess, vol. 2, 2014.
[16] S. M. Hammoudeh, Y. Yuan, M. McAleer, and M. A. Thompson,
"Precious metals-exchange rate volatility transmissions and
hedging strategies", Int. Rev. Econ. Finance, vol. 19, no. 4, pp.
633-647, 2010.