
A bivariate extension of the regularised adjusted plus-minus model for basketball
match prediction
by Luca Grassetti, Valentina Mameli, Michele Lambardi di San Miniato CS SA045
Basketball analytics is a prominent topic in the sports analytics literature, with many published papers. Key
research areas include player and team performance evaluation, injury prevention, and game strategy
assessment. Nevertheless, the literature on predicting basketball match results is limited, and its
applications are not widely studied. Predicting outcomes is challenging because of the low signal in the data. For
example, in the NBA championship, teams frequently face each other multiple times, resulting in varying
outcomes without clear explanations. Moreover, basketball involves alternating possessions and a
catch-up restart rule, which ensures a balanced number of possessions between teams. Unlike other ball
sports such as handball or water polo, basketball has differentiated scoring, so the points awarded vary from
possession to possession, a non-standard measure in the literature. Players on the court can be substituted
without restriction during stoppages in play. Consequently, offensive and defensive strengths depend on the
players on the court, resulting in significant variability between and within teams. As a result, useful stylised
facts cannot be exploited as straightforwardly as in other sports; the typical home advantage cannot be
identified either.
This project aims to develop a prediction model that combines ideas from the soccer match prediction
literature, particularly bivariate models for home and away scores, with the models for assessing player
performance that are typical of basketball.
From the perspective of game outcomes, the predictive capability of model-based player performance
metrics, such as regularised adjusted plus-minus (RAPM), is limited. The bivariate model formulation can
improve this aspect by mimicking the standard solutions used to predict match outcomes in other sports, such
as soccer, which typically rely on a more standardised scoring metric. The development of the proposed
model includes two main steps. First, separate models are developed for home and away teams’ scores,
with each equation affected by the interplay of offensive and defensive players’ contributions, as in the
RAPM model formulation. Second, the play-by-play data are aggregated over evenly spaced intervals, called
rounds, to reduce data heterogeneity. This solution requires that, for each round, the presence of players on
the court is measured by their usage percentage rather than by classical indicator variables.
We show that this last modification only slightly affects the results of the original RAPM model but simplifies
the bivariate generalisation, making it more usable. The formulated model can accommodate various
distributions depending on the rounds’ length. The research compares solutions based on Poisson, over- or
under-dispersed Poisson, and Gaussian distributions, including their zero-inflated versions if needed. All
models are estimated using a Bayesian approach.
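
As an illustration of the two steps described above, the following is a minimal sketch and not the authors' implementation. It assumes that each player's usage in a round is supplied as a weight between 0 and 1, that offensive and defensive effects are shared between the home and away equations, and that the Gaussian variant is fitted by ridge regression, which corresponds to the posterior mode under a zero-mean Gaussian prior. The names `build_design` and `fit_ridge`, the toy data, and the value of `lam` are all hypothetical.

```python
import numpy as np


def build_design(home_usage, away_usage, n_players):
    """Return one design row per round for the home and away score equations.

    home_usage / away_usage: one dict per round, mapping a player index to an
    assumed usage weight in [0, 1] (replacing the 0/1 on-court indicator).
    Columns: [home intercept, away intercept, offence effects, defence effects].
    """
    n_rounds = len(home_usage)
    width = 2 + 2 * n_players
    X_home = np.zeros((n_rounds, width))
    X_away = np.zeros((n_rounds, width))
    X_home[:, 0] = 1.0  # intercept of the home-score equation
    X_away[:, 1] = 1.0  # intercept of the away-score equation
    off, dfn = 2, 2 + n_players
    for r in range(n_rounds):
        for p, w in home_usage[r].items():
            X_home[r, off + p] += w  # home offence enters the home score
            X_away[r, dfn + p] += w  # home defence enters the away score
        for p, w in away_usage[r].items():
            X_away[r, off + p] += w  # away offence enters the away score
            X_home[r, dfn + p] += w  # away defence enters the home score
    return X_home, X_away


def fit_ridge(X_home, X_away, y_home, y_away, lam):
    """Stack both equations and solve the ridge normal equations.

    The two intercepts are left unpenalised; all player effects share the
    penalty lam. A negative defence effect means fewer points conceded.
    """
    X = np.vstack([X_home, X_away])
    y = np.concatenate([y_home, y_away])
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    penalty[1, 1] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


# Toy data: three rounds, players 0-1 at home and players 2-3 away.
home_usage = [{0: 1.0, 1: 0.5}, {0: 0.4, 1: 1.0}, {0: 0.8, 1: 0.8}]
away_usage = [{2: 1.0, 3: 0.7}, {2: 0.6, 3: 1.0}, {2: 0.9, 3: 0.9}]
y_home = np.array([6.0, 4.0, 5.0])  # points scored by the home team per round
y_away = np.array([3.0, 5.0, 4.0])  # points scored by the away team per round

X_home, X_away = build_design(home_usage, away_usage, n_players=4)
beta = fit_ridge(X_home, X_away, y_home, y_away, lam=10.0)
```

A Poisson or dispersed-Poisson variant would keep the same design matrices and replace the Gaussian likelihood; a full Bayesian fit would return posterior distributions rather than the single point estimate computed here.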
NBA data from the 2022-2023 season are analysed to assess the proposal. The findings suggest that bivariate
RAPM models benefit from the advantages of model-based approaches regarding player performance,
and they can be used to characterise the game’s flow better and predict the outcomes of game periods.
Nevertheless, the analyses show that the predictive capability, determined by comparing observed and
estimated results of the rounds, is inadequate. Better results are obtained by aggregating
rounds across games. The models are evaluated on three criteria: accuracy, positive predictive value,
and negative predictive value.
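
For reference, the three evaluation criteria can be computed from binary outcomes as in the sketch below. It assumes, purely for illustration, that the positive class (coded 1) is a home-team win; the function name `classification_summary` and the example vectors are hypothetical and are not results from the study.

```python
def classification_summary(observed, predicted):
    """Return accuracy, positive predictive value and negative predictive value.

    observed / predicted: sequences of 0/1 labels, where 1 is the positive class
    (assumed here to be a home win).
    """
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        if pred == 1 and obs == 1:
            tp += 1
        elif pred == 1 and obs == 0:
            fp += 1
        elif pred == 0 and obs == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else float("nan")
    # PPV: share of predicted positives that were actually positive.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    # NPV: share of predicted negatives that were actually negative.
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return accuracy, ppv, npv


# Made-up labels for six rounds or games.
observed = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
accuracy, ppv, npv = classification_summary(observed, predicted)
```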
***