
450 H. Arregui et al.
is calculated and this is the depot assigned to the vehicle. Otherwise, the
DBSCAN algorithm is applied again using data from another day.
– For parcel delivery, the algorithm iterates over all the vehicles and stores
the data of each vehicle separately for each day. One centroid is then
calculated from the first point of each day and another from the last point
of each day, because motorcycles have been observed to often leave from and
return to the same place. Across vehicles, even the maximum distance between
the two centroids has generally turned out to be very small, and in most
cases the two centroids are very close, which indicates that they mark a
depot. Each vehicle is assigned the centroid calculated from the first points
of each day.
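The parcel-delivery step above can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout `(day, timestamp, lat, lon)`, the function names and the use of a plain arithmetic mean as the centroid are assumptions made for illustration. The returned distance is the gap between the start-of-day and end-of-day centroids, which the text expects to be small when both mark the same depot.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000.0 * asin(sqrt(a))


def centroid(points):
    """Arithmetic mean of (lat, lon) pairs (assumption: points lie close together)."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def depot_candidate(records):
    """records: (day, timestamp, lat, lon) tuples for ONE vehicle (hypothetical layout).

    Returns the centroid of the first point of each day and its distance in
    metres to the centroid of the last point of each day.
    """
    by_day = defaultdict(list)
    for day, ts, lat, lon in records:
        by_day[day].append((ts, lat, lon))
    firsts, lasts = [], []
    for pts in by_day.values():
        pts.sort()                    # order each day's points by timestamp
        firsts.append(pts[0][1:])     # (lat, lon) of the first point of the day
        lasts.append(pts[-1][1:])     # (lat, lon) of the last point of the day
    start_c, end_c = centroid(firsts), centroid(lasts)
    return start_c, haversine_m(start_c, end_c)
```

A caller would compare the returned distance against a tolerance of its own choosing before accepting the start-of-day centroid as the depot; the text does not state such a threshold.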
3.3 Trip Model and Statistics
Once the depot locations are obtained, consecutive tracks are joined and then
divided every time a visit to a depot is detected, obtaining a set of circular trips.
For HDFE services, the duration and length of these trips have distributions
that are very skewed to the left, with averages of t = 25.69 min for the
duration and d = 4.14 kilometres for the distance travelled. The delivery time
windows for an acceptable quality of service specified by the food companies
are unknown, but the duration statistics can be used to model them. For HDPE
services, the distribution is more balanced, but there are many outliers with
very high values for the duration and length of the trip, while the means are
t = 206.60 min and d = 21.53 kilometres, respectively. To filter this data, a
variable that relates duration and length is used: average speed (speed =
length/duration). Its distribution is also slightly skewed by outliers. The
data is filtered according to speed values so that a more centred distribution
of the three variables is obtained.
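The splitting of joined tracks into circular trips described at the start of this subsection can be sketched as follows. This is only an illustration: the function name and the `at_depot` predicate (for example, a distance check against the depot centroid with some radius) are hypothetical, and the text does not specify how a depot visit is detected.

```python
def split_into_trips(points, at_depot):
    """Split a time-ordered sequence of track points into circular trips.

    points:   joined track points of one vehicle, in time order.
    at_depot: hypothetical predicate returning True when a point counts as
              a depot visit (the detection rule is an assumption).
    A trip is the run of points between two consecutive depot visits.
    """
    trips, current, seen_depot = [], [], False
    for p in points:
        if at_depot(p):
            if seen_depot and current:
                trips.append(current)   # trip closed by returning to the depot
            current, seen_depot = [], True
        elif seen_depot:
            current.append(p)           # point belongs to the ongoing trip
        # points recorded before the first depot visit are ignored
    # points after the last depot visit never return, so they are dropped
    return trips
```

Dropping the leading and trailing open runs is a design choice of this sketch, made so that every returned trip starts and ends at a depot.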
For HDFE, values above the 97th percentile in speed have been discarded.
Values below the 1st percentile have also been discarded, so that no values
with a speed very close to 0 remain. For HDPE, values with d = 0 are
discarded and, in this case, only the values between the 1st and 92nd
percentiles are kept. This cleaning removes almost 7% of the data. The
quartiles have not changed much, while the maximums have decreased
considerably.
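A minimal sketch of this percentile-based cleaning is given below. It assumes each trip is a `(duration_min, distance_km)` pair and computes the average speed as distance over duration in km/h. The function name, the data layout and the choice of the "inclusive" percentile estimator are assumptions, since the text does not state which estimator was used. Trips with a zero distance or duration are dropped first, which the text only prescribes explicitly for HDPE.

```python
from statistics import quantiles


def filter_by_speed(trips, lo_pct, hi_pct):
    """Keep trips whose average speed lies between two percentiles.

    trips:  list of (duration_min, distance_km) pairs (hypothetical layout).
    lo_pct: lower percentile, e.g. 1.
    hi_pct: upper percentile, e.g. 97 for HDFE or 92 for HDPE.
    """
    # Drop degenerate trips so that the speed is well defined and non-zero.
    valid = [(t, d) for t, d in trips if t > 0 and d > 0]
    # Average speed in km/h: distance divided by duration in hours.
    speeds = [d / (t / 60.0) for t, d in valid]
    # 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = quantiles(speeds, n=100, method="inclusive")
    lo, hi = cuts[lo_pct - 1], cuts[hi_pct - 1]
    return [trip for trip, v in zip(valid, speeds) if lo <= v <= hi]
```

With the thresholds from the text, the HDFE data would be cleaned with `filter_by_speed(trips, 1, 97)` and the HDPE data with `filter_by_speed(trips, 1, 92)`.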
3.4 Battery Consumption vs. Trip Relation Modelling
Each battery SoC reading is associated with a geographic location in the trip
(a trip-point) of the same motorbike, according to the timestamp. Then,
trip-points are grouped into short segments whose length is chosen at random
between 500 m and 1000 m. These segments are the basis for training the model.
For each segment, we calculate the following: the elevation difference between
the start and end points, the distance of the segment, the SoC deviation from
the start point to the end point, and the average speed limit of the road
between the points of the segment according to OpenStreetMap.
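The segment construction can be sketched as below, under several assumptions that are not taken from the text: each trip-point is a tuple `(cum_dist_m, elevation_m, soc_pct, speed_limit_kmh)` with the OpenStreetMap speed limit already attached, the target length is drawn uniformly, and the average speed limit is a plain mean over the points of the segment. The function name is hypothetical.

```python
import random


def build_segments(points, rng, min_len=500.0, max_len=1000.0):
    """Group time-ordered trip-points into segments of random target length.

    points: tuples (cum_dist_m, elevation_m, soc_pct, speed_limit_kmh),
            where cum_dist_m is the distance travelled since the trip start
            (hypothetical layout).
    rng:    a random.Random instance, passed in for reproducibility.
    """
    segments = []
    start = 0
    target = rng.uniform(min_len, max_len)   # assumption: uniform draw
    for i in range(1, len(points)):
        length = points[i][0] - points[start][0]
        if length >= target:
            chunk = points[start:i + 1]
            segments.append({
                "distance_m": length,
                "elev_diff_m": points[i][1] - points[start][1],
                "soc_delta": points[i][2] - points[start][2],
                "avg_speed_limit": sum(p[3] for p in chunk) / len(chunk),
            })
            start = i                         # next segment starts here
            target = rng.uniform(min_len, max_len)
    # A trailing run shorter than its target is dropped in this sketch.
    return segments
```

Because a segment closes at the first point that reaches the target, its actual length can exceed the target by up to one point spacing.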