GENERALIZABILITY OF RISK STRATIFICATION ALGORITHMS FOR ACUTE EXACERBATION OF CHRONIC OBSTRUCTIVE PULMONARY DISEASE PDF Free Download

1 / 76
2 views76 pages

GENERALIZABILITY OF RISK STRATIFICATION ALGORITHMS FOR ACUTE EXACERBATION OF CHRONIC OBSTRUCTIVE PULMONARY DISEASE PDF Free Download

GENERALIZABILITY OF RISK STRATIFICATION ALGORITHMS FOR ACUTE EXACERBATION OF CHRONIC OBSTRUCTIVE PULMONARY DISEASE PDF free Download. Think more deeply and widely.

GENERALIZABILITY OF RISK STRATIFICATION
ALGORITHMS FOR ACUTE EXACERBATION OF CHRONIC
OBSTRUCTIVE PULMONARY DISEASE
by
Joseph (Khoa Nguyen) Ho
PharmD, University of British Columbia, 2021
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Pharmaceutical Sciences)
THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)
April 2023
© Joseph (Khoa Nguyen) Ho, 2023
ii
GENERALIZABILITY OF RISK STRATIFICATION ALGORITHMS FOR ACUTE EXACERBATION
OF CHRONIC OBSTRUCTIVE PULMONARY DISEASE
submitted by
Joseph (Khoa Nguyen) Ho
in partial fulfillment of the requirements for
the degree of
Master of Science
in
Pharmaceutical Sciences
Examining Committee:
Mohsen Sadatsafavi, Associate Professor, Pharmaceutical Sciences, UBC
Supervisor
Donald Sin, Professor, Medicine, UBC
Supervisory Committee Member
Kate Johnson, Assistant Professor, Pharmaceutical Sciences & Medicine Joint Appointment, UBC
Supervisory Committee Member
Jacquelyn Cragg, Assistant Professor, Pharmaceutical Sciences, UBC
Supervisory Committee Member
Larry Lynd, Professor, Pharmaceutical Sciences, UBC
Additional Examiner
iii
Abstract
Background: Contemporary management guidelines for chronic obstructive pulmonary disease
(COPD) rely on exacerbation history to risk-stratify patients and guide therapy for the prevention
of future exacerbations. However, exacerbation history alone may not reliably predict future
exacerbations due to random variability in frequency. To address this problem, multivariable
prediction models have been developed to improve predictive accuracy.
Objective: The objective of this thesis was to assess the generalizability of COPD exacerbation
risk stratification algorithms and assess whether the inclusion of race improves the performance
of such algorithms.
Methods: I evaluated three algorithms: the Acute COPD Exacerbation Prediction Tool
(ACCEPT),1,2 a prediction model by Bertens et al.,3 and exacerbation history alone, using data
from three COPD clinical trials representing different levels of exacerbation risk. I examined
discrimination, calibration, and clinical utility as measures for model performance. I then
recalibrated the models using the setting-specific exacerbation risk for comparison. I explored
race as a variable that could convey information on background risk and assessed whether
adjusting for race with a random-effects approach could improve model performance.
Results: Both prediction models had better discrimination compared to exacerbation history
alone with Δ area under the curves (AUCs) ranging from 0.05 to 0.10 (P-values <0.001).
However, no algorithm was superior in clinical utility, and all had the risk of harm. When the
models were recalibrated, clinical utility was significantly improved, and the risk of harm was
substantially mitigated. The crude exacerbation risk ratios (RRs) of race varied between 0.96 to
1.57. However, in the random-effects model, the shrunken RRs ranged between 0.99 to 1.07.
Using the adjusted RRs to update ACCEPT, I showed that the inclusion of race in ACCEPT did
not significantly improve model performance compared to the base ACCEPT. The ΔAUCs were
<0.01 in all samples with P-values > 0.17. There were also no notable improvements to
calibration, clinical utility, or goodness-of-fit (P-value 0.15) after race-adjustment.
iv
Conclusions: Risk stratification algorithms for COPD exacerbations are not universally
applicable across all settings. However, the flexibility of clinical prediction models allows them
to be updated to accommodate setting differences.
v
Lay Summary
Chronic obstructive pulmonary disease (COPD) is a common lung disease that affects over 2.6
million Canadians. Episodes of flare-ups (known as exacerbations) are burdensome and common
in COPD. Thus, exacerbation prevention is crucial in COPD management. Although current
guidelines only utilize exacerbation history to estimate the future risk of exacerbations and risk-
stratify patients, considering other patient characteristics can improve risk prediction accuracy.
Clinical prediction models have been designed to combine multiple patient-specific
characteristics to better calculate the risk of future exacerbations. The objective of my thesis was
to evaluate clinical risk stratification algorithms across different settings and explore measures to
improve their accuracy. I found that these algorithms should not be universally applied to all
settings because their performance can vary from one population to another. However, unlike
exacerbation history, prediction models can be updated to account for population differences and
provide a better estimation of risk across all populations.
vi
Preface
This thesis is comprised of two individual studies completed by Joseph Ho. I was responsible for
conducting the literature review, developing the study design and analytic plan, conducting the
analyses, interpreting the results, and writing the chapters of this thesis. My supervisor, Dr.
Mohsen Sadatsafavi, conceived the research question for both studies. My MSc supervisory
committee comprised of Drs. Mohsen Sadatsafavi, Donald Sin, Kate Johnson, and Jacquelyn
Cragg, provided feedback on the study design and interpretation of results. Dr. Donald Sin
provided additional clinical expertise for the interpretation of results. Drs. Mohsen Sadatsafavi
and Donald Sin acquired the data for the included studies. All co-authors of the included
manuscripts provided feedback on the study designs, interpretation, and reviewed manuscript
drafts.
Accepted manuscript:
1. Ho JK, Safari A, Adibi A, Sin DD, Johnson K, Sadatsafavi M. Generalizability of Risk
Stratification Algorithms for Exacerbations in COPD. 2022. CHEST. (Related to Chapter
2)
In progress manuscript:
2. Assessing the Impact of Race on the Predictive Performance of a COPD Exacerbation
Risk Prediction Model (Related to Chapter 3)
Ethics approval for the included chapters were obtained from the University of British
Columbia’s Human Ethics Board (H22-01462).
vii
Table of Contents
ABSTRACT............................................................................................................................................................. iii
LAY SUMMARY ......................................................................................................................................................... v
PREFACE…… ............................................................................................................................................................ vi
TABLE OF CONTENTS ........................................................................................................................................... vii
LIST OF TABLES ....................................................................................................................................................... x
LIST OF FIGURES .................................................................................................................................................... xi
LIST OF ABBREVIATIONS .................................................................................................................................... xii
ACKNOWLEDGEMENTS ...................................................................................................................................... xiii
CHAPTER 1: INTRODUCTION ................................................................................................................................ 1
1.1 CHRONIC OBSTRUCTIVE PULMONARY DISEASE ............................................................................................... 1
1.2 MANAGEMENT OF COPD: RISK STRATIFICATION AND PREVENTIVE THERAPIES ............................................. 2
1.3 THE NEED FOR ACCURATE RISK STRATIFICATION IN COPD ............................................................................ 4
1.4 CLINICAL PREDICTION MODELS ....................................................................................................................... 4
1.5 ASSESSING THE PERFORMANCE OF CLINICAL PREDICTION MODELS ................................................................ 5
1.6 UPDATING CLINICAL PREDICTION MODELS ..................................................................................................... 6
1.7 CLINICAL PREDICTION MODELS UNDER INVESTIGATION: ACCEPT AND BERTENS ........................................ 7
1.8 CURRENT KNOWLEDGE GAPS ........................................................................................................................... 8
1.9 OBJECTIVES ...................................................................................................................................................... 9
1.10 THESIS SUMMARY .......................................................................................................................................... 10
CHAPTER 2: GENERALIZABILITY OF RISK STRATIFICATION ALGORITHMS FOR
EXACERBATIONS IN COPD ................................................................................................................................. 12
2.1 INTRODUCTION ............................................................................................................................................... 12
2.2 METHODS ....................................................................................................................................................... 13
viii
2.2.1 Risk Stratification Algorithms .............................................................................................................. 13
2.2.2 Sources of Data .................................................................................................................................... 14
2.2.3 Primary Outcome ................................................................................................................................. 15
2.2.4 Discrimination of Risk Stratification Algorithms ................................................................................. 15
2.2.5 Clinical Utility of Risk Stratification Algorithms ................................................................................. 15
2.2.6 Risk Prediction Model Recalibration ................................................................................................... 16
2.3 RESULTS ......................................................................................................................................................... 16
2.3.1 Participants .......................................................................................................................................... 16
2.3.2 Discrimination ...................................................................................................................................... 18
2.3.3 Calibration ........................................................................................................................................... 19
2.3.4 Net benefit ............................................................................................................................................ 20
2.4 DISCUSSION .................................................................................................................................................... 23
2.5 CONCLUSION .................................................................................................................................................. 26
CHAPTER 3: ASSESSING THE IMPACT OF RACE ON THE PREDICTIVE PERFORMANCE OF A
COPD EXACERBATION RISK PREDICTION MODEL ................................................................................... 27
3.1 INTRODUCTION ............................................................................................................................................... 27
3.2 METHODS ....................................................................................................................................................... 28
3.2.1 Sample Data ......................................................................................................................................... 28
3.2.2 Clinical Prediction Tool ....................................................................................................................... 29
3.2.3 Racial Differences in Exacerbation and Model Adjustment ................................................................ 30
3.2.4 Model Goodness-of-fit .......................................................................................................................... 31
3.2.5 Model Performance .............................................................................................................................. 31
3.3 RESULTS ......................................................................................................................................................... 32
3.3.1 Participants .......................................................................................................................................... 32
3.3.2 Racial Differences in Exacerbation Risk ............................................................................................. 33
3.3.3 The Effect of Individual Predictors ...................................................................................................... 34
3.3.4 The Effect of Adjusting for Race on Model Performance .................................................................... 35
ix
3.4 DISCUSSION .................................................................................................................................................... 38
CHAPTER 4: CONCLUSION ................................................................................................................................... 41
4.1 OVERVIEW AND CONTRIBUTION ..................................................................................................................... 41
4.2 STRENGTHS OF THIS RESEARCH ...................................................................................................................... 42
4.3 LIMITATIONS OF THIS RESEARCH .................................................................................................................... 43
4.4 IMPLICATIONS FOR PRACTICE ......................................................................................................................... 45
4.5 IMPLICATIONS FOR FUTURE RESEARCH ........................................................................................................... 46
REFERENCES ........................................................................................................................................................... 48
APPENDICES ............................................................................................................................................................ 59
x
List of Tables
Table 2-1. Baseline Characteristics of the Study Sample of the Included Trials ......................................................... 17
Table 2-2. Time-dependent AUC at 12 Months ............................................................................................................ 19
Table 2-3. Dominating Risk Stratification Algorithms at the Three Threshold Levels ................................................ 22
Table 3-1. Baseline Characteristics of Included Participants ..................................................................................... 32
Table 3-2. Racial Differences in Exacerbation ............................................................................................................ 34
Table 3-3. Model Series Evaluating Goodness-of-fit ................................................................................................... 34
Table 3-4. Mean Exacerbation Risk by Race ............................................................................................................... 36
xi
List of Figures
Figure 1-1. COPD Pharmacotherapy Management Guidelines .................................................................................... 3
Figure 2-1. Decision Curve Analysis of Risk Stratification Algorithms ...................................................................... 21
Figure 3-1. Calibration plots of ACCEPT and ACCEPT-Race. .................................................................................. 36
Figure 3-2. Decision curves of ACCEPT vs. ACCEPT-Race. ..................................................................................... 37
xii
List of Abbreviations
ACCEPT
Acute COPD Exacerbation Prediction Tool
AECOPD
acute exacerbations of COPD
AUC
area under the curve
BC
British Columbia
BMI
body mass index
CAD
Canadian dollar
CAT
COPD Assessment Test
mMRC
Modified Medical Research Council
COPD
chronic obstructive pulmonary disease
CTS
Canadian Thoracic Society
DCA
decision curve analysis
ED
emergency department
FEV1
forced expiratory volume at one second
FVC
forced vital capacity
GOLD
Global Initiative for Chronic Obstructive Lung Disease
ICS
inhaled corticosteroid
LABA
long-acting β2 agonists
LAMA
long-acting muscarinic receptor antagonists
LOTT
Long-term Oxygen Treatment Trial
RR
risk ratio
xiii
Acknowledgements
I want to thank my supervisor Dr. Mohsen Sadatsafavi for his endless support in helping me
accomplish this work. His mentorship has made my graduate experience truly amazing, and I
will forever be grateful. I also want to thank the other members of my supervisory committee
Drs. Donald Sin, Jacquelyn Cragg, and Kate Johnson for their expertise, constructive feedback,
and guidance, which significantly enhanced this work. I owe additional thanks to Kate Johnson
for her mentorship and words of encouragement to guide me through my professional career.
I extend my thanks to the entire Respiratory Evaluation Sciences Program (RESP) and
Collaborations for Outcomes Research and Evaluation (CORE) team for assisting me throughout
this journey. I owe particular thanks to Dr. Abdollah Safari, University of Tehran, and Amin
Adibi for setting the foundational work of this thesis and providing extensive analytic advice. I
also thank Harry Tae Yoon Lee and Joseph Emil Amegadzie for their unwavering support in the
forms of advice, assistance, and most importantly, friendship.
Finally, I owe thanks to my friends and family that have been supporting me since the beginning,
as producing this thesis would not have been possible without them. My father Khiet Ho, my
mother Chieu Tran, my sister Lynn Ho, and the entire Ho and Tran Family for their support. All
my friends in TSF who kept me grounded and reminded me of the purpose of this arduous but
fun professional journey.
1
Chapter 1: Introduction
1.1 Chronic Obstructive Pulmonary Disease
The clinical context of this thesis is chronic obstructive pulmonary disease (COPD). COPD is a
chronic lung condition affecting millions of Canadians.4 Its natural course is characterized by
symptoms of breathlessness and frequent coughing, gradual lung function decline, and episodes
of intensified disease activity referred to as exacerbations (or lung attacks).5 The two main risk
factors for COPD are smoking and aging;6 others include genetics, occupational exposure, and
poorly controlled asthma.6 Historically, COPD has been considered a disease for elderly men,
reflecting their high prevalence of smoking.7 However, due to recent smoking trends, and the
greater susceptibility of female lungs to inhaled toxins, COPD now greatly affects the wider
Canadian population.7
In addition, the global prevalence of COPD has greatly increased over the last decades.8 COPD
is a significant cause of morbidity and mortality and is associated with substantial healthcare
costs.9,10 Globally, COPD is the third leading cause of death.11 Some studies have projected that
by 2030, there will be a further 150% increase in the number of patients with COPD.12 In
addition, hospitalizations due to COPD were projected to increase by 182% in the same time
span.12 In 2011, COPD expenditure was estimated to cost the Canadian healthcare system $4.25
billion.13 In British Columbia (BC), the excess total costs of COPD from 2001 to 2010 were
$5,424 (2010 CAD) per patient-year with the majority of costs attributable to hospital
admissions.9 Given the growing prevalence of COPD, an aging population, and clear gaps in
care, the burden of COPD is expected to further increase.12 Thus, it is critical to address the
causes of hospitalization for COPD to alleviate its health and economic burden.
COPD exacerbations considerably affect quality of life and are a leading cause of
hospitalizations.12,14,15 Exacerbations can be defined in different ways. The symptom-based
definition relies on patient-reported worsening of symptoms such as increased dyspnea and
sputum production. Alternatively, the event-based definition is based on a change to the required
healthcare service in the presence of worsening respiratory symptoms. With the event-based
2
definition, exacerbations are categorized as mild (self-managed at home with short-acting
bronchodilators), moderate (requiring treatment with systemic corticosteroids or antibiotics), or
severe (requiring inpatient care). Major COPD guidelines generally utilize the event-based
definition.5,1618
1.2 Management of COPD: Risk Stratification and Preventive Therapies
Guidelines recommend the diagnosis of COPD be based on spirometry testing.16,19 A forced
expiratory volume at one second (FEV1) to forced vital capacity (FVC) ratio of less than 0.7 is
typically used to diagnose COPD.16,19 Alternatively, an FEV1/FVC ratio below the lower limit of
normal, defined as the lower fifth percentile of a healthy reference population, is also considered
a diagnosis of COPD. The influential Global Initiative for Chronic Obstructive Lung Disease
(GOLD) guidelines stratifies patients into 4 groups depending on their exacerbation history and
symptom score (Modified Medical Research Council dyspnea scale or COPD Assessment Test
score).16 These 3 groups (A, B, and E) guide the therapeutic management of COPD to reduce
mortality and prevent exacerbations. The Canadian COPD management guidelines, developed by
the Canadian Thoracic Society (CTS), adopt a similar classification system to inform therapy for
Canadians.17 COPD severity is also divided into 4 levels (mild, moderate, severe, and very
severe) based on cut-offs of FEV1/FVC ratios.20
Inhaled pharmacotherapies are the cornerstone of preventive therapy for exacerbations in COPD.
The main drug classes are inhaled bronchodilators (short and long-acting β2 agonists),
corticosteroids (ICS), and long-acting muscarinic antagonists (LAMA). 16,17 COPD guidelines
and management strategies by GOLD and the Canadian Thoracic Society (CTS) recommend a
stepwise approach to therapy based on the ‘frequent exacerbator’ label.17,21 This is defined as
having > 2 moderate or > 1 severe exacerbation in the previous 12 months.17,21 Figure 1-1 shows
the current major COPD treatment algorithms from GOLD and CTS. COPD preventive therapy
is largely guided by categorizing patients into one of the risk groups based on their history of
exacerbation and a symptom score.
A
3
B
Figure 1-1. COPD Pharmacotherapy Management Guidelines
A: GOLD guidelines for COPD pharmacotherapy;21,22 B: CTS guidelines for COPD pharmacotherapy (adapted from
Bourbeau et al.17)
Patients are considered at ‘low risk of AECOPDwith < 1 moderate AECOPD in the last year (moderate
AECOPD is an event with prescribed antibiotic and/or oral corticosteroids), and did not require hospital
admission/ED visit; or at ‘high risk of AECOPDwith > 2 moderate AECOPD or > 1 severe exacerbation in the last
year (severe AECOPD is an event requiring hospitalization or ED visit).17 The CTS definition of ‘high risk of
AECOPDis equivalent to the frequent exacerbatorlabel used by GOLD.17,21
* Blood eosinophil > 300/mL in patients with a history of AECOPD may be useful to predict a favorable response to
an ICS combination inhaler.
4
‡ Oral therapies = roflumilast, N-acetylcysteine, daily dose azithromycin
AECOPD = acute exacerbations of COPD; CAT = COPD Assessment Test; CTS = Canadian Thoracic Society;
GOLD = Global Initiative for Chronic Obstructive Lung Disease; FEV1 = forced expiratory volume at one second;
ICS = inhaled corticosteroid; LABA = long-acting β2 agonists; LAMA = long-acting muscarinic antagonist; mMRC
= Modified Medical Research Council; SABD = short-acting bronchodilator.
1.3 The Need for Accurate Risk Stratification in COPD
A current problem with major COPD guidelines is that they fail to consider the heterogenous
nature of COPD when risk-stratifying patients. The risk of experiencing an exacerbation greatly
varies across patients.23 Even amongst patients with the same history of exacerbation, there is
substantial heterogeneity in future exacerbation risk.24 The guidelines are overly reliant on
exacerbation history and use categorical labels to define patients as being either an ‘infrequent’
or a ‘frequent’ exacerbator.17,21 This binary classification does not quantify future risk and thus
cannot communicate individualized risk to patients. Although exacerbation history is the single
best predictor for future exacerbations,23,25 there is increasing evidence to suggest that its
predictability of future exacerbations, based on history alone, may be less reliable than
previously thought.26,27 A recent analysis showed that the ‘frequent exacerbator’ classification
can change by 45% over two consecutive years due to chance alone.28 The high variability of
exacerbation history within patients year-to-year, coupled with its uncertain predictive power,
raises serious concerns regarding its suitability to guide pharmacotherapy.26,27,29
1.4 Clinical Prediction Models
Clinical prediction models (also referred to as clinical prediction tools or clinical prediction
algorithms) are multivariable models that combine patient-specific characteristics to estimate the
risk of a clinically important outcome.30 They are designed to improve prognostic accuracy and
risk stratification.30 Prediction models are widely used and are an integral component of care
across different clinical domains. For example, the Framingham Risk Score used in
cardiovascular disease can estimate the 10-year risk of coronary heart disease.31 Unlike binary
5
classifiers, prediction models can quantify and communicate future risk with patients to enable
shared decision-making and personalization of care.30 Personalized risk prediction would enable
targeting therapies to those that would benefit most from them.
Prediction models can be developed by various means, such as, fitting a regression equation or
machine learning algorithm with clinical data. Steyerberg et al.32,33 proposed a systematic
checklist to ensure methodological rigor. The important steps in developing a model include
consideration of the prediction problem, predictor coding, model specification, model estimation,
model performance, model validation, and model presentation. Evaluating validity through
external validation is a critical step to assess model generalizability. Guidelines for reporting on
clinical prediction models have been proposed in the Transparent Reporting of a multivariable
prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement.34
1.5 Assessing the Performance of Clinical Prediction Models
The performance of clinical prediction models is generally evaluated with two key measures.33
Calibration relates to the degree to which the model’s predicted risk aligns with the observed
risk. Calibration can be evaluated through calibration plots, calibration intercept and slope, and
mean calibration. Perfect calibration would be represented with a calibration slope of 1 and an
intercept of 0. Discrimination relates to the model’s ability to discriminate those with the
outcome from those without the outcome. The concordance (c) statistic and the receiver
operating characteristic curve’s (ROC) area under the curve (AUC) are the most commonly used
measures of discrimination.33,35 The c-statistic is equivalent to AUC for binary outcomes. ROC
curves plot true positive rates (sensitivity) against false positive rates (1 – specificity). A high-
quality model has both good calibration and discrimination.
Calibration and discrimination are inherently statistical metrics for the performance of clinical
prediction models.36 Even if a model performs well in both metrics, its clinical utility should be
assessed. Ideally, clinical studies (randomized trials, before-after studies) would be used to
evaluate the model in a clinical setting. However, the ‘potential’ clinical utility of a clinical
prediction model can also be evaluated using the same data (e.g., external validation sample) that
6
is used to assess calibration and discrimination. The decision curve analysis (DCA) is a method
that can quantify the ‘net benefit’ of a prediction model and evaluate its potential clinical
utility.36 The DCA assigns a relative weight between true positive and false positive
classifications of the prediction model based on a specific treatment threshold value.36,37 True
positive cases are those who are classified as high-risk by the model and will experience an
outcome, and false positives are those classified as high-risk but do not experience an outcome.
For example, a treatment threshold of 50% implies that the decision-maker considers the benefit
of a true positive classification to be equal to the harm of a false positive classification; this
weight can be used to calculate the net benefit of the prediction model at a given threshold:
1.6 Updating Clinical Prediction Models
If the performance of a clinical prediction model is shown to be suboptimal in a target
population, it is possible to revise the model to improve its performance.38,39 When applied to
new patient populations, prediction models can suffer significant performance decreases which
can result in harmful decision-making.40 Model updates are often required and can drastically
improve performance to mitigate risks of harm.40 Such model revisions can take on any level of
complexity.4143 For example, if a clinical prediction tool systemically underestimates or
overestimates the risk, and if the underlying model is based on a regression equation, the
intercept of the equation can be adjusted to correct for the biased estimation of average risk.4143
If the model is egregiously miscalibrated and the slope of the calibration plot is significantly
different from 1, the equation slope can also be recalibrated. More complex model updating
involves re-estimation of predictor coefficients and even the addition or removal of predictors.41
43 A stepwise approach is recommended to avoid excessive revision and reduce the risk of
overfitting, especially when the new sample size is small.37 Continual model evaluation across
different settings is critical for the future success of prediction models when they are eventually
disseminated into clinical practice.
7
1.7 Clinical Prediction Models Under Investigation: ACCEPT and Bertens
There are many clinical prediction models for COPD exacerbations. However, a 2017 review of
COPD exacerbation risk prediction models found that most previously developed models did not
undergo robust development and were ultimately considered not ready for clinical
implementation.44 A model by Bertens et al.,3 presented in the review, was the only one with a
low risk of bias and could be considered potentially relevant. The Acute COPD Exacerbation
Prediction Tool (ACCEPT) was developed to address the shortcomings of previously proposed
prediction models and is the first one considered clinically ready.1,37,45 Both prediction models
enable individualized predictions of the future risk of moderate/severe exacerbations in patients
with COPD.1–3 These nuanced predictions allow clinicians to accurately risk-stratify two patients
with identical exacerbation histories to tailor preventive treatment.
ACCEPT was developed using pooled data from three randomized control trials which included
2,380 patients with a mean age of 64.7 years (SD 8.8 years).1 In all three trials, patients who had
a history of at least one exacerbation in the past 12 months were recruited.1,4648 ACCEPT uses
up to 13 predictors which is detailed in its developmental studies.1,2 These include the number of
moderate and severe exacerbations in the previous 12 months, baseline age, sex, current smoking
status (y/n), post-bronchodilator FEV1% predicted, current statin use, domiciliary oxygen
therapy, and body mass index (BMI) as core predictors.1,2 The St. George’s Respiratory
Questionnaire (SGRQ)49 score (or COPD Assessment Test50), and current use of inhaled
pharmacotherapy, such as, LAMA, long-acting β2 agonists (LABA), and ICS are optional
predictors.
The model by Bertens et al.3 was developed using data from a COPD cohort which included 240
patients with a mean age of 73.6 years (SD 5.2 years). Patients aged 65 years or over and
diagnosed with COPD were selected from 51 primary care sites in the Netherlands. The model
uses 4 predictors which include the presence of a moderate or severe exacerbation in the past 12
months, smoking pack-years, FEV1% predicted, and history of vascular disease (i.e., stroke,
transient ischemic attack, or peripheral arterial disease).
8
1.8 Current Knowledge Gaps
Clinical prediction models are not widely adopted in the routine management of COPD despite
their potential benefits. By comparison, contemporary management of cardiovascular diseases is
based on multistep algorithms involving objective risk predictions across all levels.51 The current
standard risk stratification methods for COPD care fail to consider many important patient
characteristics that impact disease management. Traditionally, the integration of prediction tools
has largely been hindered by their ease of use, hampering their clinical sensibility.52 With the
increasing availability of electronic health records (EHR), real-time data retrieval and automatic
risk calculation should fully address this issue. There are no completed studies evaluating the use
of clinical prediction tools in a clinical setting for COPD management to date. Currently, there is
an ongoing cluster randomized clinical trial evaluating the impact of integrating clinical
prediction tools for COPD exacerbations into harmonized EHR in British Columbia.53
Before clinical implementation, a critical issue regarding the applicability of risk prediction
models across different clinical settings was recently brought to light in an extensive study of
104 unique cardiovascular disease risk prediction models.40 This study found that there were
significant decreases in model performance, even with externally validated models, when they
were naively applied to new patient cohorts.40 This concern is increasingly relevant in the COPD
landscape as COPD prediction models are rarely externally validated.44 The heterogeneity
between different patient populations beyond the model’s included predictors can greatly affect
its performance. Thus, a crucial aspect of implementation is to evaluate the prediction model in
different patient cohorts and assess for factors that can finetune the model performance. The
practice of continual evaluation and model updating is of utmost importance to ensure models
yield maximal benefit in real-world practice.4143
Model generalizability is especially important for COPD because exacerbation risk can vary
greatly across different subgroups.5457 These subgroups encompass a combination of factors that
result in different exacerbation risks. For example, Calverley et al.54 showed notable differences
in exacerbation frequency internationally that could not be explained by exacerbation history or
9
differences in baseline characteristics of the patients recruited. These differences may also be
beyond what is captured by a prediction model’s included predictors. To address this, some
prediction models have incorporated setting-specific adjustments into their risk prediction to
enhance their applicability across different settings.38,39,58 Examples of variables that are often
not included in clinical prediction models are geographic region, specialty of care, race/ethnicity,
and gender roles.
Race in particular is a complex socio-cultural variable that is associated with a multitude of
exacerbation risk factors such as smoking, BMI, socioeconomic status, exposure to
environmental pollutants, and differences in quality of care received.55,56,59,60 Existing COPD risk
prediction models incorporate some of the predictors, such as smoking and BMI;1,3,44 previous
studies have reported greater symptom burden and lower lung function in Blacks and Hispanics
compared to Caucasians, which are predictors in some tools. 6163 However, such differences
might reflect differences in disease progression, diagnosis, and care, not all of which are
manifested in the value of a few clinical indices. There may be many other factors associated
with race that may not be adequately accounted for in existing models. Ultimately, these
differences may affect the risk of COPD exacerbations and it is crucial to ensure that risk
prediction models can effectively capture these differences.
1.9 Objectives
The first objective of my thesis was to evaluate the predictive performance and clinical utility of
COPD exacerbation risk stratification algorithms, including exacerbation history alone (current
standard of care for risk stratification) and multivariable prediction models, across patient
cohorts with different background exacerbation risks. Following this, I aimed to determine
whether recalibrating the model with the background exacerbation risk within each cohort could
improve model performance and clinical utility.
Exacerbation risk can be different across race groups due to a variety of factors.55,56 It is
currently not known whether existing predictors in an exacerbation risk prediction model can
capture this difference or if race should be explicitly added to the model. The second objective of
10
my thesis was to assess whether adjusting a prediction model for race could improve its
predictive performance and clinical utility.
1.10 Thesis summary
The encompassing goal of this thesis was to evaluate the performance of COPD exacerbation
risk prediction models across different clinical settings and assess whether they required setting-
specific updates to provide acceptable performance.
In Chapter 2, I report on the generalizability of three risk stratification algorithms across three
sample cohorts representing populations at different levels of background exacerbation risk. I
examined ACCEPT and Bertens et al.’s 3 prediction model compared to exacerbation history
alone for predicting moderate/severe exacerbations in the next 12 months. I measured the
algorithms’ clinical utility using the DCA 36 and quantified predictive performance by measuring
discrimination and calibration. Lastly, I examined the effect of model recalibration on clinical
utility. The results of this chapter showed that although prediction models for COPD
exacerbation risk prediction had better predictive performance compared to exacerbation history
alone, model recalibration is required to confer higher clinical utility.
In Chapter 3, I report on the unadjusted and adjusted differences in exacerbation risk between
race groups. I also applied shrinkage methods to account for heterogeneity in race effects due to
sampling variability.39 I developed a race-adjustment factor for ACCEPT based on the shrunken
race effects and assessed whether an ‘ACCEPT + race’ model has improved discrimination,
calibration, and clinical utility compared to the ACCEPT model. My analyses showed ACCEPT
was well calibrated across the race groups. Further, it showed that most observed race/ethnicity
effects were due to differences in other predictors across race groups and sampling variability.
Thus, a race-adjustment factor did not improve predictive performance, clinical utility, or model
goodness-of-fit.
In Chapter 4, I conclude this thesis by summarizing my findings and discussing their
implications for implementing prediction models into routine COPD management. I identify the
11
strengths and limitations of my studies as well as considerations for future research in the
ongoing refinement of existing COPD prediction models to improve their quality and facilitate
adoption into care.
12
Chapter 2: Generalizability of Risk Stratification Algorithms for
Exacerbations in COPD
2.1 Introduction
Chronic obstructive pulmonary disease (COPD) is a common pulmonary disease whose course is
punctuated by acute episodes of worsening symptoms (breathlessness, excessive sputum, and
coughing), referred to as exacerbations.4 Exacerbation prevention is a cornerstone of
contemporary COPD management. According to the Global Initiative for Chronic Obstructive
Lung Disease (GOLD), exacerbation prevention is achieved by directing pharmacotherapy based
on patients’ prior 12-month history of moderate or severe exacerbations.5,16 This strategy is also
adopted by the Canadian Thoracic Society guidelines.17 While exacerbation history is the single
best predictor for future exacerbations,23,25 relying on history alone for risk prediction may be
sub-optimal as a growing body of evidence suggests that the predictability of exacerbations
based on history alone may be less reliable than previously believed.26,27 A recent study
demonstrated high variability in exacerbation history within patients from one year to another
due to chance alone, which raised serious doubts regarding the suitability of this approach in
guiding pharmacotherapy.29
Clinical prediction tools are multivariable models that combine several patient characteristics to
increase the accuracy of risk stratification.30 Unlike exacerbation history alone, they can quantify
(e.g., in risk %) and communicate future risk with patients to enable shared decision-making.
Importantly, prediction models are flexible and can be updated to accommodate background risk
in different settings.38,39 This is likely to be a critical issue in COPD exacerbation risk prediction,
as a recent study has demonstrated wide variability in exacerbation rates across the world, even
among patients who had the same 12-month exacerbation history at baseline.54 Given this,
models that are developed in high-risk settings may perform poorly when applied to low-risk
settings, and vice versa.
13
The sensitivity of model performance to background event rate was highlighted in a recent
review evaluating the performance of 104 unique cardiovascular disease risk prediction
models.40 This study found that when models were naively applied to new patient cohorts, there
was a significant drop in model performance, arising from poor calibration, which in some cases
had the potential to cause patient harm (the clinical utility of the model being lower than not
doing any risk stratification).40 In contrast, when models incorporated the background risk of the
outcome in the target population, clinical utility was significantly enhanced.40 This suggests that
risk stratification algorithms need to be flexible and adjustable for optimal clinical
implementation. The primary objective of this study was to evaluate the clinical utility of risk
stratification algorithms, including multivariable risk prediction models and exacerbation history
alone, across cohorts with different exacerbation risks. The secondary objective was to determine
whether model recalibration with the observed exacerbation risk within each sample can improve
their clinical utility.
2.2 Methods
2.2.1 Risk Stratification Algorithms
I compared the GOLD risk stratification label of ‘frequent exacerbators’ (defined as having ≥2
moderate or ≥1 severe exacerbation) with two published, validated COPD exacerbation risk
prediction models. The first model was from Bertens et al.3 (henceforth referred to as ‘Bertens’).
This was the only model that a 2017 comprehensive systematic review of COPD exacerbation
risk prediction models considered to have undergone robust development and external
validation.44 Bertens uses 4 predictors which include the number of exacerbations in the previous
12 months, forced expiratory volume at one second (FEV1) expressed as % predicted, pack-years
of smoking, and a history of vascular disease.3 The second model was the latest version of the
Acute COPD Exacerbation Prediction Tool (ACCEPT), which was developed to enable
individualized predictions of the rate and severity of exacerbations.1,2 ACCEPT uses up to 13
predictors including the number of non-severe and severe exacerbations in the previous 12
months, baseline age, sex, current smoking status (y/n), post-bronchodilator FEV1% predicted,
current use of statins as a surrogate for cardiovascular disease risk, domiciliary oxygen therapy,
14
and body mass index (BMI).1,2 The St. George’s Respiratory Questionnaire (SGRQ)49 score (or
COPD Assessment Test50), as well as current use of inhaled long-acting muscarinic receptor
antagonists (LAMA), long-acting β2 agonists (LABA), and inhaled corticosteroids (ICS) are
optional predictors. I used the full version of ACCEPT which requires all of the above-
mentioned predictors.
2.2.2 Sources of Data
I used data from three randomized clinical trials representing 3 levels of exacerbation risk: the
placebo arm of the Study to Understand Mortality and Morbidity in COPD (SUMMIT,
N=2,421)64, the Long-term Oxygen Treatment Trial (LOTT, N=595)65, and the placebo arm of
the Towards a Revolution in COPD Health (TORCH, N=1,091)66. Both treatment arms from
LOTT were used because, unlike the other two studies, there were no significant differences in
the exacerbation rate between the treatment arms. Patients in SUMMIT had on average a higher
post-bronchodilator FEV1 compared to patients in LOTT and TORCH. TORCH had the greatest
proportion of patients with a previous 12-month history of exacerbations, followed by LOTT and
then SUMMIT. These resulted in a gradient of exacerbation risk across SUMMIT (low risk),
LOTT (medium risk), and TORCH (high risk).
Missing predictor values were imputed with multiple imputation according to the same
methodology as described previously.2 In brief, I performed 10 repetitions and the final
prediction values were based on the mean predicted values of all repetitions. No participant
received a LAMA in TORCH as it was not widely available during the study period and
concomitant use was also not permitted for this study. However, setting the value of LAMA in
this dataset to zero would be inappropriate, as such form of non-use would not be tantamount to
not using the medication if it were available. Similar to the approach taken in a previous study,2
LAMA values were imputed for TORCH. SGRQ scores (used for ACCEPT) were the only other
missing predictor (SUMMIT had 1,708 participants and TORCH had 264 participants with
missing SGRQ scores). All analyses were done in R 4.1.2 (R Foundation for Statistical
Computing, Vienna, Austria). Ethics Approval was obtained from the University of British
Columbia’s Human Ethics Board (H22-01462).
15
2.2.3 Primary Outcome
The primary outcome was the prospective 12-month risk of a moderate/severe exacerbation.
Moderate exacerbations were those that required treatment with systemic corticosteroids and/or
antibiotics. Severe exacerbations were those that resulted in emergency department visits or
hospitalizations. These event-based definitions were used by all three trials and are in alignment
with the definition promulgated by GOLD.16
2.2.4 Discrimination of Risk Stratification Algorithms
Model discrimination (the ability of a risk stratification algorithm to distinguish high- versus
low-risk patients) was assessed by receiver operating characteristic (ROC) curves and calculating
the area under the curve (AUC). Time-dependent (at 12 months) ROC curves and AUCs were
used to account for the variability in follow-up time across patients and were compared using the
DeLong test.6769
2.2.5 Clinical Utility of Risk Stratification Algorithms
Clinical utility was measured through net benefit calculations using the decision curve analysis
(DCA).36 Whereas discrimination evaluates the statistical performance, the DCA provides a
comprehensive assessment of the clinical utility of risk stratification to inform treatment
decisions.36 The underlying principle for the DCA is that a given treatment threshold specified to
separate high-risk from low-risk individuals implies a relative weight between true positive and
false positive classifications.36,37 Here, true positive cases are those who are classified as high-
risk and will actually experience an exacerbation in the next 12 months, and false positives are
those who are predicted to exacerbate but do not experience an exacerbation in follow-up. For
example, a treatment threshold of 50% implies that the decision-maker considers the benefit of a
true positive classification to be equal to the harm of a false positive classification. Such a weight
can then be used to calculate the net benefit of a risk stratification algorithm at a given threshold.
Because no treatment thresholds are formally identified for exacerbation risk prediction in
COPD, I evaluated the net benefit curve at three different thresholds (low:0.22, medium:0.38,
16
and high:0.52) which corresponds to the annual observed exacerbation risks in each cohort. The
net benefit of risk stratification algorithms should always be compared against two default
strategies that do not require risk stratification: treating no patients and treating all patients. A
risk stratification algorithm was considered harmful if it generated a lower net benefit than either
of the default strategies at a given threshold.40 To account for variable follow-up time, a time-
dependent decision curve analysis was conducted, using the methodology described by Vickers
et al.70 I did not report 95% confidence intervals for the decision curves because statistical
inference for a measure of clinical utility is not a relevant concept for decision making.71
2.2.6 Risk Prediction Model Recalibration
Because multivariable risk prediction models generate numerical estimates of risks, their
calibration (how well the predicted risks agree with the observed risks) can be evaluated and, if
necessary, updated. Model calibration was assessed by comparing the predicted and observed
risks through calibration plots. Individuals were first grouped into deciles based on their
predicted risk. The mean observed risk for each decile was then plotted against the predicted
risk. Each risk prediction model was separately recalibrated within each cohort by adjusting the
model intercept with a fixed odds-ratio transformation using the sample’s observed outcome risk
(recalibration-in-the-large).41,72 Of note, such monotonical transformation does not change the
AUC of the model.
2.3 Results
2.3.1 Participants
The baseline characteristics of the three study samples are summarized in Table 2-1. SUMMIT
included 2,421 patients (mean age 65.9 years, 75.1% male) and contributed 636 exacerbations.
LOTT included 595 patients (mean age 69.7 years, 72.9% male) and contributed 369
exacerbations. TORCH included 1,091 patients (mean age 65.5 years, 77.2% male) and
contributed 1,074 exacerbations. The annual risk of exacerbations in SUMMIT, LOTT, and
TORCH was 0.22, 0.38, and 0.52, respectively.
17
Table 2-1. Baseline Characteristics of the Study Sample of the Included Trials
SUMMIT
LOTT
TORCH
2,421
595
1,091
0.83 (0.24)
0.95 (0.15)
0.96 (0.13)
65.9 (7.9)
69.7 (7.3)
65.5 (8.2)
1,818 (75.1)
434 (72.9)
842 (77.2)
28.1 (5.7)
28.8 (6.3)
25.6 (5.3)
1144 (47.2)
134 (22.5)
462 (42.3)
40.6 (24.5)
60.7 (32.7)
48.4 (26.5)
624 (25.8)
374 (62.9)
601* (55.1)
1308 (54.0)
436 (73.3)
372 (34.1)
1258 (52.0)
442 (74.3)
518 (47.5)
18
SUMMIT
LOTT
TORCH
43.7 (16.2)
48.7 (18.5)
45.7 (16.9)
58.3 (11.4)
46.7 (16.8)
44.9 (13.9)
0.23
0.38
0.48
0.22
0.38
0.52
*LAMA use is based on imputed values.
BMI = body mass index; FEV1 = forced expiratory volume at one second; ICS = inhaled corticosteroid; LABA =
long-acting β2 agonists; LAMA = long-acting muscarinic receptor antagonists; SGRQ = St. George’s Respiratory
Questionnaire
2.3.2 Discrimination
A summary of the AUC values can be found in Table 2-2. The AUC for exacerbation history
alone in predicting future exacerbations in SUMMIT, LOTT, and TORCH was 0.59 (95%CI
0.57–0.61), 0.63 (95%CI 0.59–0.67), and 0.65 (95%CI 0.63–0.68), respectively. Bertens had a
higher AUC compared to exacerbation history alone in SUMMIT (increase of 0.10, P-value
<0.001), and TORCH (increase of 0.05, P-value <0.001), but not in LOTT (increase of 0.01, P-
value 0.84). ACCEPT had higher AUC compared with exacerbation history alone in all study
samples, by 0.08 (P-value <0.001), 0.07 (P-value 0.001), and 0.10 (P-value <0.001),
respectively. Compared to Bertens, ACCEPT had higher AUC by 0.06 (P-value 0.001) in LOTT
and 0.05 (P-value <0.001) in TORCH, whereas the AUCs were not different in SUMMIT
(change of -0.02, P-value 0.16). ROC curves can be found in Appendix A.1.
19
Table 2-2. Time-dependent AUC at 12 Months
Time-dependent AUC (95% Confidence Interval)
SUMMIT
LOTT
TORCH
Exacerbation
History
0.59
(0.57 - 0.61)
0.63
(0.59 - 0.67)
0.65
(0.63 - 0.68)
Bertens
0.69 a
(0.66 - 0.72)
0.64
(0.59 - 0.69)
0.70 a
(0.67 - 0.74)
ACCEPT
0.67 a
(0.63 - 0.70)
0.70 a, b
(0.65 - 0. 74)
0.75 a, b
(0.72 - 0.78)
ACCEPT (Acute COPD Exacerbation Prediction Tool)
a Statistically significant compared to exacerbation history
b Statistically significant compared to Bertens
2.3.3 Calibration
Calibration plots of the average risk of exacerbations per decile are presented in Appendix A.2.
In SUMMIT, Bertens was well calibrated and showed good agreement between observed and
predicted risk (observed risk 0.22 vs predicted risk 0.20). Comparatively, ACCEPT
overestimated the risk with a predicted annual risk of 0.34. In LOTT, Bertens underestimated the
risk (observed risk 0.38 vs predicted risk 0.27) while ACCEPT overestimated the risk (predicted
risk 0.53). In TORCH, Bertens underestimated the risk (observed risk 0.52 vs predicted risk
0.28) while ACCEPT was well calibrated with a predicted risk of 0.51.
After model recalibration, the mean adjusted predicted risk of exacerbation for both Bertens and
ACCEPT matched the observed risks in the study samples. Because Bertens was already well
20
calibrated in SUMMIT, and ACCEPT in TORCH, the improvements were relatively minor for
each model in the respective studies. A summary of the unadjusted and adjusted risk compared to
the observed risk of exacerbation is presented in Appendix A.3.
2.3.4 Net benefit
The decision curves for all risk stratification algorithms are presented in Figure 2-1. The
algorithm with the highest net benefit at the three pre-specified thresholds within each sample are
provided in Table 2-3.
21
Figure 2-1. Decision Curve Analysis of Risk Stratification Algorithms
Decision curve analysis comparing the net benefit of the risk stratification algorithms when unadjusted and adjusted
with the sample-specific exacerbation risk. (A) Unadjusted prediction models in SUMMIT; (B) Adjusted prediction
models in SUMMIT; (C) Unadjusted prediction models in LOTT; (D) Adjusted prediction models in LOTT; (E)
Unadjusted prediction models in TORCH; (F) Adjusted prediction models in TORCH.
22
Table 2-3. Dominating Risk Stratification Algorithms at the Three Threshold Levels
UNADJUSTED
SUMMIT
LOTT
TORCH
LOW
B
A
A
MEDIUM
B/Hx
B
A
HIGH
Hx
Hx
A
ADJUSTED
LOW
B
A
A/B
MEDIUM
A/ Hx
A/B/Hx
A/B
HIGH
A/Hx
A
A/B
Unadjusted: Prediction models without adjustment for background exacerbation risk. Adjusted: Prediction models
adjusted for background exacerbation risk
A: ACCEPT (Acute COPD Exacerbation Prediction Tool), B: Bertens, Hx: Exacerbation History
In SUMMIT, Bertens and exacerbation history outperformed ACCEPT. Bertens dominated at the
low threshold, whereas exacerbation history dominated at the high threshold. In LOTT, no risk
23
stratification algorithm clearly dominated. ACCEPT was the best at the low threshold, Bertens at
the medium threshold, and exacerbation history at the high threshold. In TORCH, ACCEPT
dominated the other algorithms at all three threshold values.
All three risk stratification algorithms were associated with a risk of harm (their net benefit being
lower than that of treating no patients or treating all patients). Exacerbation history had lower net
benefit than treating all patients at the low threshold in LOTT, and at the low and medium
thresholds in TORCH. Bertens had lower net benefit than treating all patients at the low
threshold and treating no patients at the high threshold in LOTT. Bertens was also worse than
treating all patients at the low threshold in TORCH. ACCEPT had lower net benefit than treating
no patients at the medium threshold in SUMMIT and at the high threshold in LOTT.
The clinical utility of both prediction models greatly improved following recalibration. The
recalibrated ACCEPT either dominated or was no worse than exacerbation history across all
three thresholds and samples. Use of ACCEPT was no longer harmful at any of the thresholds in
the study samples. In SUMMIT, there was a significant improvement in ACCEPT’s net benefit
at all threshold levels. In LOTT, both prediction models had improved clinical utility at different
thresholds. Bertens showed improvement at the low threshold, ACCEPT at the medium
threshold, and both at the high threshold. In TORCH, the recalibrated Bertens showed significant
improvements and was now tied with ACCEPT at all thresholds. The use of Bertens was no
longer harmful at any threshold.
2.4 Discussion
Accurate prediction of exacerbation risk is essential for COPD management. Contemporary
guidelines are reliant on a 12-month exacerbation history to risk-stratify patients for the choice of
preventive therapies. However, it has been shown that exacerbation history alone cannot account
for the variability in exacerbation risk imposed by other factors.24,26,27 Multivariable risk
prediction models, which combine other characteristics with exacerbation history, have been
developed to improve predictive powers. This study evaluated the statistical performance and
clinical utility of three risk stratification algorithms in three clinical cohorts representing patients
24
with varying levels of exacerbation risk. I found that risk prediction models generally had better
discriminatory performance compared with exacerbation history alone. However, when
considering clinical utility, no algorithm emerged as universally better than others. Critically, it
was found that all three risk stratification algorithms had the risk of causing harm.
These results have clinical implications. First, in high exacerbation risk settings, use of
exacerbation history alone to guide therapeutic choices may cause harm when the therapy has a
low to medium treatment threshold, such as therapies which have a low risk of adverse events
and are relatively inexpensive (e.g., LAMA+LABA therapy). In such instances, the clinical
utility of exacerbation history alone might be below that of providing such low-risk therapies to
all patients. The only common clinical scenario in which exacerbation history may be useful is in
guiding therapeutic choices for therapies that have high treatment thresholds (i.e. fraught with
significant side effects) such as azithromycin or oral roflumilast.73,74
Another important finding is that prediction models cannot be universally applied; rather, they
should be adapted for different patient populations based on the overall exacerbation risk. A
recent study examining international differences in the frequency of COPD exacerbations
showed that there were large variations in exacerbation risk even among individuals who had the
same exacerbation history.54 Evidence from other specialties indicates that in the face of such
heterogeneity, risk stratification algorithms can be associated with low clinical utility. Gulati et
al.40 found that even widely used prediction models for cardiovascular diseases, such as the
Framingham Risk Score, can be harmful in settings where the risk of the targeted event was
significantly different from that of the derivation cohort.40 I showed that when Bertens, which
was developed in a cohort with a low risk of exacerbation (annual observed exacerbation risk:
0.16),3 was applied to a high exacerbation risk cohort, it performed poorly with a significant risk
of causing harm to patients. ACCEPT, on the other hand, was developed using data from clinical
trials that only recruited patients with a positive exacerbation history,1,2 and as such performed
sub-optimally in patients with low risk of exacerbations. In addition to causing harm, a model
that misestimates risk can lead to over or under-treatment compared with guidelines.
25
The capability of being adaptable to different local settings is a key feature differentiating risk
prediction models from binary classifiers like exacerbation history. Such model revision can take
any level of complexity.4143 A stepwise approach to model updating is recommended to avoid
extensive revision and the dangers of overfitting in new samples.43 The most accessible approach
is through an intercept adjustment based on an estimate of the outcome risk in the target
population.41,43 When I applied this methodology, there were major improvements to model
calibration, and subsequently, improvements to the clinical utility of both risk prediction models.
Similar to the study by Gulati et al.40, I showed that the risk of harm was substantially mitigated
by adjusting the models for background risk of outcome in the cohort. The updated risk
prediction models were mostly superior to exacerbation history alone at all selected treatment
thresholds in all three cohorts. These results highlight the importance of model flexibility to
incorporate background risk as well as other local factors in generating predicted risk
estimates.41–43
The defining strength of this study is that it follows best practices and evaluates the performance
of existing prediction models in different patient populations rather than developing new
ones.42,75 Prediction models, specifically in COPD, are rarely externally validated in new
populations.76 In general, clinical prediction models that have been externally validated are only
evaluated once and it has been shown that model performance greatly varies when evaluated
across multiple cohorts.76,77 Model performance tends to decrease in new patient populations
which can be detrimental to their clinical utility.40 However, the adaptability of risk prediction
models means that it is possible to substantially improve model performance and mitigate the
risk of harm in new patient cohorts by adjusting the model to the target population.
Our study had several limitations. I evaluated only two risk prediction models; therefore, our
results may not be generalizable to other models in this clinical domain. Missing data for
predictor values required imputation. However, the only missing predictor values were for
ACCEPT (LAMA and SGRQ score) and it has been shown that ACCEPT’s results are robust to
the absence of these values.2 Both prediction models incorporate cardiovascular risk into their
risk prediction, and as such, may have reduced discrimination in SUMMIT because all
26
participants had cardiovascular risk factors by inclusion criteria. Our net benefit analysis was
over a range of treatment thresholds because there are no formal treatment thresholds for COPD
exacerbations. Nonetheless, the flexibility of the DCA allows net benefit to be assessed across all
treatment thresholds and thus allows these results to be revisited once additional studies are
performed to identify relevant thresholds. Lastly, although intercept adjustment significantly
improved model performance, further studies are needed to examine more nuanced model
updating strategies capable of further improving model generalizability.
2.5 Conclusion
Current COPD management guidelines recommend preventive exacerbation therapy based on the
patients’ exacerbation history, but this strategy can be associated with harm in certain situations.
Multivariable risk prediction models, if well-calibrated, can provide a more accurate risk
prediction; however, they can suffer from miscalibration if applied to cohorts that are dissimilar
to ones that were used in their development. The clinical utility of these risk prediction models
can be significantly enhanced if they are calibrated to the background exacerbation risk in each
population. This highlights the importance of model evaluation and possible re-calibration before
they are deployed in a new clinical setting
27
Chapter 3: Assessing the Impact of Race on the Predictive Performance of a
COPD Exacerbation Risk Prediction Model
3.1 Introduction
In chronic obstructive pulmonary disease (COPD), exacerbations are major drivers of clinical
deterioration and mortality.4,14 The burden of COPD exacerbations on the healthcare system is
projected to worsen given an aging population, continued exposure to risk factors, and gaps in
care.12 Accurate assessment of risk and prevention of future exacerbations is crucial in the
contemporary management of COPD.5,16 Currently, exacerbation history is the single best
predictor of future exacerbations and is the primary tool for risk stratification in major COPD
guidelines.5,16 However, there are concerns regarding its reliability for risk stratification and
guiding therapy because exacerbation history can greatly vary from year to year.26,27,29
Efforts have been made to combine other predictors with exacerbation history in multivariable
prediction models to better predict prospective exacerbation risk.1,44 These models are important
tools for quantifying risk, providing clinical guidance for healthcare providers, and informing
patients. The Acute COPD Exacerbation Prediction Tool (ACCEPT) was a recently developed
model that predicts individualized risk of moderate/severe COPD exacerbations based on clinical
information.1,2 Although designed to improve accuracy in prediction, the applicability of
prediction models can be hampered by differences in their developmental and target settings that
are not captured by their included predictors.40,78 This heterogeneity in the ‘case mix’ can
decrease model performance and ultimately result in potential harm to patients when prediction
models are naively applied in different settings.40,78
In the previous chapter, I showed that even externally validated COPD prediction models could
be miscalibrated when applied to different settings.78 Such differences highlight the importance
of model flexibility to provide accurate setting-specific predictions. Part of the difference in
exacerbation risk across populations can be attributed to patient, setting, and system factors that
are not included in a prediction model. For example, a recent study evaluating clinical trial data
28
has shown significant variability in exacerbation risk between countries, even amongst the
relatively homogeneous samples of patients included in the clinical trials.54 This implies that
including a ‘country effect’ in COPD exacerbation risk prediction may improve model
performance by potentially accounting for a proportion of the unexplained heterogeneity. Other
factors that can explain such heterogeneity but are often left out of conventional risk-scoring
tools include socioeconomic status, gender roles (over and above sex as a variable), race, and
socioeconomic status.
In this chapter, I focused on race as a setting-specific adjustment factor for ACCEPT. Across
other disease domains, race has been used to adjust prediction outputs.79 There is ongoing
discussion regarding the ethics of including race in prediction algorithms, especially related to
the hotly debated issue of ‘algorithmic fairness’.79,80 Arguments for excluding race stem from the
lack of biologically substantiated evidence and the potential to perpetuate race-based health
inequities.79 Contrastingly, the exclusion of race may hamper prediction accuracy and have a
significant impact on recommended care.80,81 It is currently not known whether adjusting
ACCEPT for race can improve its predictive performance. The objective of my study is to
investigate the impact of race-adjustment, using a random-effects approach, on ACCEPT’s
discrimination, calibration, and clinical utility.
3.2 Methods
3.2.1 Sample Data
My sample included data from three randomized clinical trials, as was used in my previous study
(n=4,097): the placebo arm of the Study to Understand Mortality and Morbidity in COPD
(SUMMIT, N=2,421),64 the Long-term Oxygen Treatment Trial (LOTT, N=594),65 and the
placebo arm of the Towards a Revolution in COPD Health (TORCH, N=1,082).66 I used both
arms from LOTT because the exacerbation rate in the treatment and placebo arm was not
statistically significantly different, unlike the other included studies. Race and ethnicity data
were self-reported in all three clinical trials.
29
The race variable was categorized into 5 groups: Caucasian, Black, Asian, Hispanic, and
Indigenous. Indigenous was comprised of trial participants listed as ‘Native Hawaiian or Pacific
Islander’ and ‘American Indian or Alaska Native’. Hispanic was categorized as a ‘race’ in
TORCH but as a binary variable under ‘ethnicity’ in SUMMIT and LOTT; in these 2 trials,
individuals could be any other race plus Hispanic ethnicity. My study categorized all individuals
with Hispanic ethnicity as Hispanic race, similar to a widely cited study examining racial/ethnic
disparities in diabetes prevalence.82 Thus, all other race groups were considered non-Hispanic. I
categorized individuals with multiple races based on their non-Caucasian race; if Indigenous was
included in the combination, the individual was assumed to be Indigenous. There were no non-
Caucasian race combinations. Participants were excluded from my analysis if they were
categorized as ‘other’ for race and when no information on race was available.
The strategy for imputing missing data with multiple imputation is detailed in previous studies
with ACCEPT.2,78 I performed 10 iterations of imputation and used the mean prediction value of
the iterations. In TORCH, no participant received a LAMA because of lack of availability, and
concurrent use was not permitted during the study. Setting LAMA use to zero for all patients
would not be appropriate because this form of non-use would not be representative of non-use if
LAMAs were available. Thus, LAMA use was imputed for TORCH participants.2,78 St. George’s
Respiratory Questionnaire (SGRQ)49 scores were missing for patients in TORCH (n=264) and
SUMMIT (n=1,708) and were imputed. All analyses were done in R 4.1.2 (R Foundation for
Statistical Computing, Vienna, Austria). Ethics Approval was obtained from the University of
British Columbia’s Human Ethics Board (H22-01462).
3.2.2 Clinical Prediction Tool
I used the latest version of ACCEPT in my analysis.1,2 ACCEPT uses up to 13 predictors to
generate quantifiable predictions for moderate/severe exacerbations.1,2 The core predictors
include the 12-month history of moderate and severe exacerbations, age, sex, current smoking
status (y/n), post-bronchodilator forced expiratory volume in 1 second % predicted (FEV1%),
current statin use, domiciliary oxygen use, and body mass index (BMI).1,2 Optional predictors
include current use of COPD inhaled pharmacotherapy such as long-acting muscarinic receptor
30
antagonists (LAMA), long-acting β2 agonists (LABA), and corticosteroids (ICS) as well as
COPD symptom scores such as the SGRQ 49 score (or the COPD Assessment Test).50 I used the
full version of ACCEPT with all 13 predictors. Predictor values were collected at the beginning
of the trial period and outcome was assessed in the following 12-month period.
Exacerbations were categorized using the event-based definition, which is in accordance with the
Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines and the included
trials.16 Exacerbations that required systemic corticosteroids and/or antibiotics but did not result
in an inpatient visit were classified as moderate. Severe exacerbations required either an
emergency department visit or a hospitalization.
3.2.3 Racial Differences in Exacerbation and Model Adjustment
I estimated the unadjusted and adjusted racial differences in 12-month moderate/severe
exacerbation risk using fixed-effects and random-effects regression analyses that included
dummy variables for each race. I used generalized linear models with a binomial distribution and
a logit link function. The unadjusted analysis included race as the only independent variable and
observed exacerbation frequency as the dependent variable. The adjusted analyses accounted for
ACCEPT’s prediction output as an additional independent variable. I compared each race to the
unadjusted or adjusted average exacerbation risk. This was done by setting the regression
analyses offset variable as the log average exacerbation risk for the unadjusted analysis and as
the intercept of the adjustment model for the adjusted analysis.
To adjust ACCEPT’s prediction output for race, I developed an intercept adjustment factor using
a random-effects approach. Whereas the fixed-effects method estimates group differences in
isolation, a random-effects approach has been proposed to assess the distribution of outcome
differences across different settings.39,8386 The random-effects approach uses information from
other settings to regress extreme estimates to the mean. For example, settings that have
extremely low/high exacerbation risk would actually be more likely to have true risks closer to
the average than to have more extreme outcomes. Random-effects updating has been applied in
cardiology and renal prediction models and is viewed as the method of choice when different
31
levels of effects are considered.39,8386 I adapted Steyerberg et al.’s 39 methodology to quantify a
shrunk estimator for race which was used as the race-specific intercept adjustment factor for
ACCEPT (henceforth referred to as ‘ACCEPT + Race’).
3.2.4 Model Goodness-of-fit
To evaluate whether race-adjustment improved model fit, I used the likelihood-ratio χ2 test
(LRT) to compare the fixed-effect model of ACCEPT + Race with ACCEPT alone. I also
assessed a series of nested models, which were comprised of bivariate analyses with each of
ACCEPT’s predictors as the independent variable and observed exacerbation frequency as the
dependent variable. Using the LRT, the nested models were compared to their equivalent race-
adjusted model. This was done to assess which predictor impacted the effect of race. If adding
race statistically significantly improved the nested models’ goodness of fit, then that variable was
assumed to not effectively capture the observed racial differences.
3.2.5 Model Performance
I compared the performance of ACCEPT to ACCEPT + Race by evaluating discrimination,
calibration, and clinical utility. Discrimination is a prediction model’s ability to correctly
distinguish individuals who experience an outcome from those who do not. I measured
discrimination by using time-dependent receiver operating characteristic (ROC) curves and
calculating the area under the curve (AUC) to account for variable follow-up time across
patients. The DeLong test was used to assess statistical significance.6769 Calibration is the extent
to which the model’s predicted risk aligns with the observed risks. I evaluated this with
calibration plots by first grouping individuals into deciles based on their predicted risk. The
mean observed risk for each decile was then plotted against the predicted risk. Calibration
intercept and slope were also examined to compare differences between the models.
Clinical utility was measured using the decision curve analysis (DCA). Both discrimination and
calibration are statistical measures that do not inform whether the model improves clinical
decisions.36,87 The DCA was developed to overcome this limitation and assess the model’s
performance in supporting decision-making.36 The core principle of the DCA is that a given
32
treatment threshold, at which treatment is recommended, implies a relative weight between true
and false positive classifications.36,37 For example, a treatment threshold of 50% implies an equal
value for the benefit of a true positive classification compared to the harm of a false positive
classification. This relationship can be used to quantify net benefit at any given treatment
threshold. I assess clinical utility using a time-dependent DCA to account for variable follow-up
time.70 Calibration, discrimination, and clinical utility were all assessed at the 12-month point in
follow-up.
3.3 Results
3.3.1 Participants
Table 3-1 summarizes the patient characteristics of the three samples. SUMMIT contributed
2,421 patients (mean age 65.9 years, 75.1% male, 77.9% Caucasian) and had an observed
moderate/severe exacerbation risk of 0.22. LOTT contributed 594 patients (mean age 69.7 years,
72.9% male, 84.5% Caucasian) and had an observed moderate/severe exacerbation risk of 0.38.
There was 1 patient in LOTT that had no race information and was excluded from the analysis.
TORCH had contributed 1,082 patients (mean age 65.6 years, 77.2% male, 81.1% Caucasian)
and had an observed moderate/severe exacerbation risk of 0.52. There were 9 patients in TORCH
that had no race information and were excluded from the analysis. Baseline characteristics
stratified by race can be found in Appendix B.1.
Table 3-1. Baseline Characteristics of Included Participants
SUMMIT
LOTT
TORCH
N
2421
594
1082
Mean (SD)
Follow-up time, yr
0.83 (0.24)
0.95 (0.15)
0.96 (0.13)
Age, yr
65.9 (7.9)
69.7 (7.3)
65.6 (8.1)
BMI, kg/m2
28.2 (5.7)
28.8 (6.3)
25.6 (5.3)
SGRQ
43.9 (10.4)
48.7 (18.5)
45.6 (16.9)
FEV1 % Predicted
58.3 (11.4)
46.7 (16.8)
45.0 (13.9)
33
SUMMIT
LOTT
TORCH
Count (%)
Males
1818 (75.1)
433 (72.9)
835 (77.2)
Current smoking
status
1144 (47.3)
134 (22.6)
458 (42.3)
LAMA
624 (25.8)
373 (62.8)
596 (55.0)
LABA
1308 (54.0)
435 (73.2)
369 (34.1)
ICS
1258 (52.0)
441 (74.2)
513 (47.4)
Statin
1562 (64.5)
256 (43.1)
655 (60.5)
Oxygen
69 (2.9)
90 (15.2)
75 (6.9)
Race
Caucasian
1887 (77.9)
502 (84.5)
877 (81.1)
Asian
349 (14.4)
3 (0.5)
154 (14.2)
Black
31 (1.3)
64 (10.8)
13 (1.2)
Hispanic
131 (5.4)
7 (1.2)
38 (3.5)
Indigenous
23 (1.0)
18 (3.0)
0 (0.0)
History of > 1
moderate/severe
exacerbation
0.23
0.38
0.47
Observed risk of > 1
moderate/severe
exacerbation
0.22
0.38
0.52
BMI = body mass index; FEV1 = forced expiratory volume at one second; ICS = inhaled corticosteroid; LABA =
long-acting β2 agonists; LAMA = long-acting muscarinic receptor antagonists; SGRQ = St. George’s Respiratory
Questionnaire
3.3.2 Racial Differences in Exacerbation Risk
Table 3-2 summarizes the unadjusted and adjusted racial differences in moderate/severe
exacerbations, as well as the shrunk estimator from the random-effects model. The unadjusted
effect of race on moderate/severe exacerbations was statistically significant in the LRT (P =
0.003). These differences ranged from risk ratios (RRS) of 0.96 for Caucasians to 1.57 for
Indigenous. After adjusting for ACCEPT, the estimated fixed-effect differences are reduced
corresponding to RRs between 0.97 and 1.28. The random-effects shrinkage further reduced the
differences to a range of 0.99 to 1.07. The more uncertain estimates experienced greater
shrinkage towards 1, as seen in the Indigenous group compared to Caucasians.
34
Table 3-2. Racial Differences in Exacerbation
Race
Unadjusted
95% CI
Fixed-
Effects
95% CI
Random-
Effects
Caucasian
0.96
0.92 - 1.01
0.97
0.92 - 1.02
0.99
Asian
1.21
1.06 - 1.37
1.16
1.02 - 1.31
1.07
Black
1.31
1.02 - 1.64
1.02
0.80 - 1.28
1.03
Hispanic
1.01
0.80 - 1.25
1.11
0.88 - 1.37
1.04
Indigenous
1.57
1.06 - 2.22
1.28
0.86 - 1.81
1.04
RRs for unadjusted, fixed-effects and random-effects adjusted exacerbation risk.
CI = Confidence interval; RR = Risk ratio
3.3.3 The Effect of Individual Predictors
Table 3-3 shows a series of nested models compared to its race-adjusted counterpart with regard
to goodness-of-fit. Race was a statistically significant predictor of exacerbation risk when
examined with each of ACCEPT’s predictors in isolation, with exacerbation history being the
only exception (P = 0.067). As well, race did not have an independent effect in the model with
ACCEPT’s risk score (P = 0.145).
Table 3-3. Model Series Evaluating Goodness-of-fit
Nested Model
Nested model
deviance
Race-adjusted
model deviance
Δ Deviance
P-value
Exacerbation
history
4567.5
4558.8
8.8
0.067
Sex
5634.0
5615.3
18.7
0.001
Age
5637.5
5621.8
15.7
0.003
Oxygen use
5529.6
5513.7
15.9
0.003
Current
smoker
5613.3
5600.3
13.0
0.011
BMI
5561.6
5551.7
9.9
0.042
FEV1 %
predicted
5257.4
5246.8
10.6
0.031
SGRQ
5503.3
5483.9
19.4
0.001
Statin use
5632.4
5614.1
18.3
0.001
35
Nested Model
Nested model
deviance
Race-adjusted
model deviance
Δ Deviance
P-value
LAMA use
5405.9
5386.4
19.5
0.001
LABA use
5640.0
5623.7
16.2
0.003
ICS use
5596.5
5580.4
16.1
0.003
ACCEPT
4023.8
4016.9
6.8
0.145
A statistically significant improvement (P < 0.05) indicates that goodness-of-fit improved with the addition of race.
BMI = body mass index; FEV1 = forced expiratory volume at one second; ICS = inhaled corticosteroid; LABA =
long-acting β2 agonists; LAMA = long-acting muscarinic receptor antagonists; SGRQ = St. George’s Respiratory
Questionnaire
3.3.4 The Effect of Adjusting for Race on Model Performance
The AUCs for ACCEPT in predicting future exacerbations in SUMMIT, LOTT, and TORCH
were 0.67 (95%CI 0.63 – 0.70), 0.70 (95%CI 0.65 – 0.74), and 0.74 (95%CI 0.71 – 0.77),
respectively. The Δ AUC for ACCEPT-Race were <0.01 in all samples and not statistically
significant with P-values of 0.30, 0.20, and 0.17, respectively. Calibration plots of ACCEPT
compared to ACCEPT + Race are presented in Figure 3-1. There were no discernible changes in
calibration when ACCEPT was adjusted for race. In SUMMIT, LOTT, and TORCH, the Δ
calibration intercepts were 0.03, 0.01, 0.02, and the Δ calibration slopes were 0.007, 0.004,
0.002, respectively. The mean predicted exacerbation risk stratified by race can be found in
Table 3-4. ACCEPT appears well-calibrated in each race group with no notable improvements in
calibration with ACCEPT + Race. The decision curves comparing ACCEPT to ACCEPT + Race
are presented in Figure 3-2. There were no significant changes to the clinical net benefit of
ACCEPT after adjusting for race across all treatment thresholds.
36
Figure 3-1. Calibration plots of ACCEPT and ACCEPT-Race.
The plots compare per decile mean observed risk presented with 95% confidence intervals per decile. (A) SUMMIT;
(B) LOTT; (C) TORCH.
Table 3-4. Mean Exacerbation Risk by Race
Caucasian
Asian
Black
Hispanic
Indigenous
Observed
0.29
0.31
0.32
0.25
0.37
ACCEPT
0.29
0.30
0.37
0.26
0.33
ACCEPT-
Race
0.29
0.31
0.37
0.27
0.38
37
Figure 3-2. Decision curves of ACCEPT vs. ACCEPT-Race.
The net benefit of the two models is compared in (A) SUMMIT, (B) LOTT, and (C) TORCH.
38
3.4 Discussion
The applicability of prediction models across different settings is an essential element to their
future success in clinical implementation. The adaptability of prediction models differentiates
them from binary risk classifiers because they can be revised through model updates. Often in
prediction modeling, the focus is largely placed on clinical variables representing disease status
and severity. However, health outcomes can also be affected by individual- and system-level
factors. One such factor is race. In this study, I identified an association between race and COPD
exacerbations. This association was largely mitigated when accounting for ACCEPT’s overall
risk predictions. By examining each of ACCEPT’s predictors individually, I showed that the
single most important predictor explaining this effect was exacerbation history. The association
between race and COPD exacerbations was further shrunken after applying a random-effects
approach. Concordantly, I showed that incorporating a race-adjustment factor into ACCEPT did
not notably improve traditional metrics of predictive performance or clinical utility. The
implications of my analyses suggest that the current combination of ACCEPT’s predictors
successfully captures the race effect for risk prediction of COPD exacerbations.
Race correction in the field of prediction algorithms has been a long-debated topic. There are
many concerns regarding the inclusion of race in prediction algorithms, with most arguments
stemming from the potential of these algorithms to propagate race-based medicine and
perpetuate health inequities across marginalized racial groups.79,88 There are still many prediction
algorithms that incorporate race as a feature. Although there is mounting evidence to suggest that
race is not a biological construct, it is still possible for the race variable to yield predictive
power. There are concerns that entirely ignoring race may decrease predictive performance
overall which could lead to substandard clinical decision-making and yield negative health
consequences for all races.80,81
My study has several strengths. Primarily, I presented a rigorous framework for assessing
different aspects of model performance following an update. I focused specifically on race as a
factor that could convey information on varying background exacerbation risks between
populations. However, this framework could be applied to any of the aforementioned individual-
39
or system-level factors. My analysis also did not entirely refit the model and instead utilized an
intercept adjustment to avoid overfitting the model in this sample. Nevertheless, I showed that
the racial differences after adjustment were negligible and, as such, race-adjustment may not be
necessary. My study also employed a random-effects method which is related to hierarchical
modeling and recognized as more rigorous for developing a setting-specific adjustment
factor.39,8386 I illustrated that the random-effects method resulted in smaller racial differences
compared to the traditional fixed-effects method. By taking the uncertainty of the estimated
differences into account, this approach reduced any exaggerated differences between settings.39
This ‘shrinkage’ of estimates is preferred for predictive purposes.83,8992
Limitations for my study should be acknowledged. The sample sizes for non-Caucasian groups
were relatively small as the sample was largely Caucasian. Nonetheless, a statistically significant
difference in exacerbation risk was still detected between race groups. Further examination of
setting-specific adjustment factors is warranted in larger and more diverse samples. Another
concern is the limitation of race data obtained from clinical trials. Broad assumptions are made
regarding how certain races are classified which can result in the potential misclassification of
race.93,94 Mismeasurement of race can also occur when races are generalized into broader groups
(e.g., Asian including Southeast Asian, South Asian, Japanese, etc.). Compared to existing race-
adjusted prediction algorithms, which typically only examine < 2 races, my study evaluates more
race classes. Lastly, missing data for predictor values requiring imputation was a limitation.
However, the only missing values were SGRQ scores and LAMA input for TORCH. ACCEPT
has been shown to be largely robust to these missing values.2
An important point of discussion is that the results of my study do not capture the nuances of
algorithm fairness and whether to or not include race in prediction algorithms based on these
considerations.79,95 Rather, my analyses strictly pertain to the statistical quality and predictive
performance of ACCEPT. The study of algorithm discrimination, biases, and fairness has
become more prominent with the emergence of prediction algorithms. The focus of this field is
to dissect algorithm biases and discuss whether the algorithm propagates discrimination
inherently or through the inclusion of variables that are entwined with biases.79,96,97 The
40
measurement of these biases is separate from measurements of predictive performance and
should be addressed for all prediction algorithms. Both aspects of model evaluation should be
considered, and thus future studies are needed to investigate algorithmic biases in ACCEPT.
In Chapter 2, I showed that there is significant unexplained heterogeneity in the background
exacerbation risk across clinical settings that mandate setting-specific model updating. When a
prediction model fails to accurately predict risk, it could be the result of a variable not accounted
for by the model that can explain part of such heterogeneity. In this chapter, I detected an
association between race and exacerbation risk and explored the variable race as a potential
candidate. I found that a race-adjustment factor did not notably improve ACCEPT’s predictive
performance or clinical utility. As such, race could not have played any substantial role in the
variability observed in the previous analysis. With regard to ACCEPT’s ability to produce valid
predictions, there may not be a need to include a racial component. The challenge still stands in
seeking and developing innovative model updating strategies to refine existing models before
implementation in real-world settings, as well as ensuring algorithmic fairness across subgroups
of patients with COPD.
41
Chapter 4: Conclusion
4.1 Overview and contribution
In this thesis, I assessed the generalizability of COPD exacerbation risk prediction models and
used model updating strategies to improve model applicability across different clinical settings.
Both chapters present a framework in which prediction model performance can be evaluated.
In Chapter 2, I examined 3 risk stratification algorithms, which included 2 externally validated
prediction models (ACCEPT and Bertens) compared to exacerbation history alone (status quo),
in 3 clinical trials. I measured the algorithms’ clinical utility with a decision curve analysis and
their predictive performance with discrimination and calibration. Following this, I recalibrated
both prediction models with a monotonical intercept adjustment using the background
exacerbation risk in each sample and evaluated whether there were improvements to their
clinical utility. The results of my analysis highlighted the potential risk of harm when risk
stratification algorithms are naively applied in different settings. Exacerbation history alone is
the current gold standard used by major COPD guidelines for risk-stratifying patients and
recommending therapy. However, exacerbation history is not capable of adapting to different
patient populations due to its inflexible nature. Comparatively, prediction models are capable of
being revised and my analysis showed that they provided more accurate risk predictions.
Although the prediction models also posed a risk of harm when miscalibrated, this was
substantially mitigated once they were recalibrated. These results remained consistent across the
3 sample cohorts which represented varying levels of background exacerbation risk. A major
clinical implication of my findings from Chapter 2 is that risk stratification algorithms should not
be universally applied. Rather, careful consideration should be taken in re-evaluating the
algorithm for different patient populations and adapting based on the overall outcome risk.
Ultimately, these results provide a cautionary tale for ‘off-the-shelf’ use of prediction models but
shed light on potential strategies to improve performance and ensure their future success when
implemented into COPD care.
Information on a population’s exact background exacerbation risk is not always available for use
to correct risk prediction models. Unexplained differences in exacerbation risk across
42
populations can also be attributed to patient-, system-, or setting-level variables not accounted
for by the prediction model’s included predictors. These variables often represent subgroups
(e.g., race, geographic region, socioeconomic status, gender roles) that encompass a combination
of factors resulting in different exacerbation risks. Thus, adjustment factors have been
incorporated into prediction models in other disease domains to account for this heterogeneity.
38,39,58 In Chapter 3, I focused on the effect of race on exacerbation risk and evaluated whether a
race-adjustment factor could improve ACCEPT’s model performance. Using data from the same
3 clinical trials, I first estimated the unadjusted and adjusted racial differences in exacerbation
risk. I showed that there was an association between race and exacerbation risk which
disappeared when I controlled for predicted exacerbation risk (using ACCEPT). This indicates
that the observed racial differences in exacerbation risk can be explained by the differences in
the distribution of predictors that are included in ACCEPT. By assessing each predictor
individually, I narrowed in on exacerbation history as the most important predictor explaining
this effect. Additionally, I utilized a random-effects methodology to demonstrate that the racial
differences could be further reduced when accounting for uncertainty in the estimated
differences. I developed a race-adjustment factor for ACCEPT, by applying this same
methodology, and compared model discrimination, calibration, and clinical utility between the
base and race-adjusted model. I found that there were no significant improvements to predictive
performance or clinical utility following race-adjustment. The implication of my findings is that
ACCEPT does not require a race-adjustment factor with regard to producing valid predictions.
Ultimately, the challenge remains to find a population-specific adjustment factor to enhance the
applicability of COPD exacerbation risk prediction models.
4.2 Strengths of this research
There are several strengths to my thesis. Compared to other clinical domains,40 the evaluation of
exacerbation risk prediction models in COPD is not well studied in new cohorts.44,76 My work
contributed to this area of study by evaluating existing COPD exacerbation risk prediction
models with the goal of refining them for future implementation in different clinical settings.
Consistent with findings from Gulati et al.’s 40 evaluation of cardiovascular disease risk
prediction models, I showed that risk prediction models in the COPD domain also exhibit
43
performance loss when applied to different cohorts. I also utilized state-of-the-art methodology
for model performance assessment and updating. I demonstrated in Chapter 2 that a simple
monotonical fixed-effect intercept adjustment given the background outcome risk is capable of
significantly improving the clinical utility of a risk prediction model. Chapter 3 explored a more
nuanced method by using a random-effects methodology to develop an intercept adjustment
factor for race, albeit the added predictability of exacerbation risk with the inclusion of race was
minimal. Another strength of this research is in the diversity of the data used to evaluate these
models. I used data from 3 landmark multicentered COPD clinical trials (with 2 international
trials) that represented different levels of exacerbation risk. Although there are limitations to
clinical trial data, the diversity of this sample should better reflect the heterogenous COPD
patient population. The model evaluation framework is one of the main strengths of this research.
I measured model performance with regards to both calibration and discrimination. These
measures inform on both the models’ ability to distinguish high-risk individuals as well as how
well the predicted risks align with the observed risks, respectively. In additional to statistical
measures, I also assessed clinical utility with respect to whether the prediction model improved
clinical decision-making. Overall, my thesis provides a thorough assessment of COPD
exacerbation risk prediction model performance and opens doors for potential model refinement
strategies.
4.3 Limitations of this research
While the well-phenotyped and rigorously collected clinical trial data ensures high internal
validity, the application of trial results to ‘real-world’ COPD patients remains less clear. The
strict inclusion and exclusion criteria used by clinical trials to upkeep their internal validity
ultimately restrict their generalizability. Individuals willing to participate in clinical trials can be
inherently different from those in the patient population that do not participate. COPD
exacerbation burden and management in the tightly controlled environment of a clinical trial also
differs from the real world. Missing predictor value data requiring imputation is often an issue
when using clinical trial data as well. However, in clinical practice, it is likely that there will be
many instances in which predictor values are missing. Thus, it is crucial for model predictions to
be robust under these circumstances. Defining and measuring a factor like race can be
44
challenging. Especially in the context of clinical trial data, there are concerns about
misclassifying individuals. Chapter 3 explores the race variable, which is broadly defined in the
included clinical trials, but is a complex construct. Unfortunately, no standard is universally used
for how detailed a variable such as ‘race’ or ‘region’ should be defined, and, as such,
assumptions are needed to classify such variables for interpretation. For example, the clinical
trials define Asian as a single broad classification. However, the Asian group is heterogenous as
it encompasses a variety of subgroups (e.g., South Asian, Southeast Asian, Korean, Japanese,
etc.). Additionally, because sample size was limited it was difficult to further investigate the
model performance within race subgroups, such as Asian males compared to Asian females.
My thesis does not address the nuances of algorithm fairness and whether race-adjustment is
appropriate in this context.79,95 Additional studies with specific outcome measures for algorithm
fairness are needed to assess if race-adjustment in COPD exacerbation risk prediction
perpetuates race-based health inequities. My analyses are limited to the model’s predictive
properties and potential clinical utility in the realm of decision-making. My research only
evaluated two COPD exacerbation risk prediction models, with further investigation into the
refinement of one model (ACCEPT), so the findings may not be generalizable to other models in
this domain. Currently, there are no other reviews evaluating model generalizability in the COPD
domain that is comparable to Gulati et al.’s 40 extensive review. Nonetheless, to still apply a
similar assessment framework to COPD exacerbation risk prediction models, I focused on
prediction models that had potential for use in clinical practice. Guerra et al.’s 44 systematic
review on COPD exacerbation risk prediction found that most models were at high risk of bias
from unsound statistical methods used to develop the model and an overall lack of external
validation. The models evaluated in this thesis were developed using sound statistical methods
and also externally validated.1,3,44,45 Lastly, a notable limitation of interpreting clinical utility
within this thesis lies in the use of a decision curve analysis without formal treatment thresholds
for the management of COPD exacerbations. Because the decision curves are plotted against the
entire possible range of thresholds, these results can be revisited once treatment thresholds are
formally determined.
45
4.4 Implications for practice
My findings have important implications for advancements in the management of COPD.
Whereas contemporary cardiovascular disease guidelines rely on prediction algorithms to
generate objective risk scores,51 major COPD guidelines still fail to capitalize on the potential of
quantitative risk prediction and rely on binary classifications, such as ‘frequent exacerbator’, to
inform treatment.5,17,21 There is mounting evidence to suggest that exacerbation history alone is
suboptimal for risk-stratifying patients.26,27,29 Building on this, my thesis provides supporting
evidence for the shortcomings of relying on exacerbation history and also presents certain
situations where it can be potentially harmful in the context of clinical decision-making. With
EHRs becoming increasingly prominent in routine care, the integration of prediction models into
COPD management becomes more clinically sensible. My results add to the current literature by
showing that COPD exacerbation risk prediction models are superior to the status quo in
predicting future risk. When objective risk prediction is adopted into routine COPD care, this
will open opportunities for conducting studies to determine specific risk thresholds for treatment
and updating major COPD guidelines with these thresholds.
Despite the promises, my results also show that caution must be taken when using prediction
models in different clinical settings as they can be miscalibrated and, as a result, potentially
harmful. Critically, it illustrates the importance of continual model re-evaluation to ensure that it
is performing up to standard. The silver lining is that prediction models are capable of being
updated and recalibrated to different settings whereas binary classifiers like exacerbation history
are static in nature. The integration of models into EHRs could allow them to be periodically
updated based on their setting-specific performance metrics, given the ability of EHR systems to
collect practice data over time.98 While the particular variable that I investigated, did not have a
strong effect on the predictability of future exacerbations, my study still raises the awareness for
consideration of non-clinical variables in future risk prediction models. It also promotes a state-
of-the-art quantitative framework for properly doing so. This can ultimately enhance the
applicability of COPD exacerbation risk prediction model across different settings without
needing to completely refit the model each time. Overall, this thesis provides perspective into the
46
nuances of risk prediction in COPD and is an additional stepping stone on the path to
incorporating quantitative risk prediction into routine care.
4.5 Implications for future research
My research only begins to highlight the abundant possibilities of predictive analytics in the
management of COPD. Compared to other disease domains, evidence-based ‘Precision
Medicine’ in COPD is still in its infancy. It serves as a primer for the COPD clinical and
research community to move beyond exacerbation history and embrace more nuanced
multivariable risk stratification strategies. I showed that it is unlikely for a single COPD
exacerbation risk prediction model, based on clinical variables alone, to be universally
applicable. Future research on COPD prediction models should be prepared to explicitly include
background risk as a predictor. The search for these variables remains and presents opportunities
to explore other factors, such as geographic region, socioeconomic status, or gender roles to fully
capitalize on the adaptability of prediction models.
As previously mentioned, a notable limitation in my thesis is the reliance on solely clinical trial
data to evaluate and update the models. Thus, evaluating the models in practice-generated data is
a crucial next step to assess their performance in a real-world setting. The decision curve analysis
is also only a prerequisite for examining clinical utility. Multicentered clinical trials using local
data would be ideal to assess whether the implementation of a risk prediction model into routine
care would yield greater clinical benefit compared to the current standard of care. Multiple
setting-specific trials would be needed to accurately capture the differences between each setting.
Additionally, future studies should also evaluate the cost-effectiveness of integrating an
exacerbation risk prediction model into routine care. There are costs associated with setting up
the infrastructure to support the prediction model in EHRs. For example, the initial setup of the
model into local systems, maintenance, training personnel, and time costs. Assessing the cost-
effectiveness of prediction models is essential to inform decision-makers on the value of
adopting the model into their healthcare system. Additionally, the integration of prediction
models into routine COPD management would present clinicians and patients with quantitative
risk scores. Future studies are needed to identify relevant treatment thresholds and incorporate
47
these thresholds into guideline management recommendations. This is an important next step in
advancing evidence-informed care and shared decision-making processes for COPD patients.
Lastly, there is a high ceiling for advancing predictive analytics in COPD care. Model prediction
accuracy will continue to improve as more clinical data becomes available to allow for model
updates. As seen in other sectors, prediction model revision can be automated to optimize
predictive accuracy. This opens future research opportunities to incorporate machine learning
into clinical prediction models and maximize their capabilities. There are many possible routes
for future research in prediction modeling. Whether it is determining a setting-specific
adjustment factor, assessing clinical benefit and value, or exploring opportunities for machine
learning, all routes ultimately lead to the advancement of personalized care for COPD patients.
48
References
1. Adibi, A. et al. The Acute COPD Exacerbation Prediction Tool (ACCEPT): a modelling
study. Lancet Respir Med 8, 1013–1021 (2020).
2. Safari, A. et al. ACCEPT 2·0: Recalibrating and externally validating the Acute COPD
exacerbation prediction tool (ACCEPT). EClinicalMedicine 51, 101574 (2022).
3. Bertens, L. C. M. et al. Development and validation of a model to predict the risk of
exacerbations in chronic obstructive pulmonary disease. Int J Chron Obstruct Pulmon Dis
8, 493–499 (2013).
4. Evans, J., Chen, Y., Camp, P. G., Bowie, D. M. & McRae, L. Estimating the prevalence of
COPD in Canada: Reported diagnosis versus measured airflow obstruction. Health Rep 25,
3–11 (2014).
5. 2022 GOLD Reports. Global Initiative for Chronic Obstructive Lung Disease - GOLD
https://goldcopd.org/2022-gold-reports-2/.
6. Barnes, P. J. et al. Chronic obstructive pulmonary disease. Nat Rev Dis Primers 1, 15076
(2015).
7. Barnes, P. J. Sex Differences in Chronic Obstructive Pulmonary Disease Mechanisms. Am J
Respir Crit Care Med 193, 813–814 (2016).
8. Soriano, J. B. et al. Global, regional, and national deaths, prevalence, disability-adjusted
life years, and years lived with disability for chronic obstructive pulmonary disease and
asthma, 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015.
The Lancet Respiratory Medicine 5, 691–706 (2017).
9. Khakban, A. et al. Ten-Year Trends in Direct Costs of COPD. Chest 148, 640–646 (2015).
49
10. López-Campos, J. L., Tan, W. & Soriano, J. B. Global burden of COPD: Global burden of
COPD. Respirology 21, 14–23 (2016).
11. World Health Organization. The top 10 causes of death. 2020.
https://www.who.int/en/news-room/fact-sheets/detail/the-top-10-causes-of-death.
12. Khakban, A. et al. The Projected Epidemic of Chronic Obstructive Pulmonary Disease
Hospitalizations over the Next 15 Years. A Population-based Perspective. Am J Respir Crit
Care Med 195, 287–291 (2017).
13. Najafzadeh, M. et al. Future impact of various interventions on the burden of COPD in
Canada: a dynamic population model. PLoS ONE 7, e46746 (2012).
14. Celli, B. R. & Barnes, P. J. Exacerbations of chronic obstructive pulmonary disease. Eur
Respir J 29, 1224–1238 (2007).
15. Hospital stays in Canada | CIHI. https://www.cihi.ca/en/hospital-stays-in-canada.
16. Vogelmeier, C. F. et al. Global Strategy for the Diagnosis, Management, and Prevention of
Chronic Obstructive Lung Disease 2017 Report. GOLD Executive Summary. Am J Respir
Crit Care Med 195, 557–582 (2017).
17. Bourbeau, J. et al. Canadian Thoracic Society Clinical Practice Guideline on
pharmacotherapy in patients with COPD – 2019 update of evidence. Canadian Journal of
Respiratory, Critical Care, and Sleep Medicine 3, 210–232 (2019).
18. Kim, V. & Aaron, S. D. What is a COPD exacerbation? Current definitions, pitfalls,
challenges and opportunities for improvement. Eur Respir J 52, 1801261 (2018).
19. 2021 report. Global Initiative for Chronic Obstructive Lung Disease - GOLD
https://goldcopd.org/2021-gold-reports/.
50
20. Johnson, J. D. & Theurer, W. M. A stepwise approach to the interpretation of pulmonary
function tests. Am Fam Physician 89, 359–366 (2014).
21. Singh, D. et al. Global Strategy for the Diagnosis, Management, and Prevention of Chronic
Obstructive Lung Disease: the GOLD science committee report 2019. Eur Respir J 53,
1900164 (2019).
22. 2023 GOLD Reports. Global Initiative for Chronic Obstructive Lung Disease - GOLD
https://goldcopd.org/2022-gold-reports-2/.
23. Hurst, J. R. et al. Susceptibility to exacerbation in chronic obstructive pulmonary disease. N
Engl J Med 363, 1128–1138 (2010).
24. Obeidat, M., Sadatsafavi, M. & Sin, D. D. Precision health: treating the individual patient
with chronic obstructive pulmonary disease. Med J Aust 210, 424–428 (2019).
25. Marott, J. L. et al. Exacerbation history, severity of dyspnoea and maintenance treatment
predicts risk of future exacerbations in patients with COPD in the general population.
Respir Med 192, 106725 (2022).
26. Han, M. K. et al. Frequency of exacerbations in patients with chronic obstructive
pulmonary disease: an analysis of the SPIROMICS cohort. Lancet Respir Med 5, 619–626
(2017).
27. Calverley, P. M. et al. Determinants of exacerbation risk in patients with COPD in the
TIOSPIR study. Int J Chron Obstruct Pulmon Dis 12, 3391–3405 (2017).
28. Sadatsafavi, M. Should the number of acute exacerbations in the previous year be used to
guide treatments in COPD? Eur Respir J Aug 27, (2020).
29. Sadatsafavi, M. et al. Should the number of acute exacerbations in the previous year be
used to guide treatments in COPD? Eur Respir J 57, 2002122 (2021).
51
30. Pencina, M. J. & Peterson, E. D. Moving From Clinical Trials to Precision Medicine: The
Role for Predictive Modeling. JAMA 315, 1713–1714 (2016).
31. D’Agostino, R. B., Grundy, S., Sullivan, L. M., Wilson, P., & for the CHD Risk Prediction
Group. Validation of the Framingham Coronary Heart Disease Prediction Scores: Results of
a Multiple Ethnic Groups Investigation. JAMA 286, 180 (2001).
32. Steyerberg, E. W. & Vergouwe, Y. Towards better clinical prediction models: seven steps
for development and an ABCD for validation. European Heart Journal 35, 1925–1931
(2014).
33. Steyerberg, E. W. Clinical prediction models: a practical approach to development,
validation, and updating. (Springer, 2019).
34. Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. Transparent reporting of a
multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the
TRIPOD Statement. BMC Medicine 13, 1 (2015).
35. Pencina, M. J. & D’Agostino, R. B. Evaluating Discrimination of Risk Prediction Models:
The C Statistic. JAMA 314, 1063 (2015).
36. Vickers, A. J. & Elkin, E. B. Decision curve analysis: a novel method for evaluating
prediction models. Med Decis Making 26, 565–574 (2006).
37. Sadatsafavi, M. et al. Moving beyond AUC: decision curve analysis for quantifying net
benefit of risk prediction models. Eur Respir J 58, 2101186 (2021).
38. Smits, M. et al. Predicting intracranial traumatic findings on computed tomography in
patients with minor head injury: the CHIP prediction rule. Ann Intern Med 146, 397–405
(2007).
52
39. Steyerberg, E. W., Eijkemans, M. J. C., Boersma, E. & Habbema, J. D. F. Applicability of
clinical prediction models in acute myocardial infarction: a comparison of traditional and
empirical Bayes adjustment methods. Am Heart J 150, 920e11-e17 (2005).
40. Gulati, G. et al. Generalizability of Cardiovascular Disease Clinical Prediction Models: 158
Independent External Validations of 104 Unique Models. Circ Cardiovasc Qual Outcomes
15, e008487 (2022).
41. Steyerberg, E. W., Borsboom, G. J. J. M., van Houwelingen, H. C., Eijkemans, M. J. C. &
Habbema, J. D. F. Validation and updating of predictive logistic regression models: a study
on sample size and shrinkage. Stat Med 23, 2567–2586 (2004).
42. Moons, K. G. M. et al. Risk prediction models: II. External validation, model updating, and
impact assessment. Heart 98, 691–698 (2012).
43. Vergouwe, Y. et al. A closed testing procedure to select an appropriate method for updating
prediction models. Stat Med 36, 4529–4539 (2017).
44. Guerra, B., Gaveikaite, V., Bianchi, C. & Puhan, M. A. Prediction models for exacerbations
in patients with COPD. Eur Respir Rev 26, 160061 (2017).
45. Bhatt, S. P. COPD exacerbations: finally, a more than ACCEPTable risk score. The Lancet
Respiratory Medicine 8, 939–941 (2020).
46. Albert, R. K. et al. Azithromycin for Prevention of Exacerbations of COPD. N Engl J Med
365, 689–698 (2011).
47. Criner, G. J. et al. Simvastatin for the Prevention of Exacerbations in Moderate-to-Severe
COPD. N Engl J Med 370, 2201–2210 (2014).
53
48. Aaron, S. D. et al. Tiotropium in Combination with Placebo, Salmeterol, or Fluticasone–
Salmeterol for Treatment of Chronic Obstructive Pulmonary Disease: A Randomized Trial.
Annals of Internal Medicine 146, 545 (2007).
49. Jones, P. W., Quirk, F. H. & Baveystock, C. M. The St George’s Respiratory
Questionnaire. Respir Med 85 Suppl B, 25–31; discussion 33-37 (1991).
50. Jones, P. W. et al. Development and first validation of the COPD Assessment Test. Eur
Respir J 34, 648–654 (2009).
51. Rossello, X. et al. Risk prediction tools in cardiovascular disease prevention: A report from
the ESC Prevention of CVD Programme led by the European Association of Preventive
Cardiology (EAPC) in collaboration with the Acute Cardiovascular Care Association
(ACCA) and the Association of Cardiovascular Nursing and Allied Professions (ACNAP).
Eur J Prev Cardiol 26, 1534–1544 (2019).
52. Laupacis, A., Sekar, N. & Stiell, I. G. Clinical prediction rules. A review and suggested
modifications of methodological standards. JAMA 277, 488–494 (1997).
53. Michaux, K. D. et al. IMplementing Predictive Analytics towards efficient COPD
Treatments (IMPACT): protocol for a stepped-wedge cluster randomized impact study.
Diagn Progn Res 7, 3 (2023).
54. Calverley, P. M. A. et al. International Differences in the Frequency of COPD
Exacerbations Reported in Three Clinical Trials. Am J Respir Crit Care Med 206, 25–33
(2022).
55. Eisner, M. D. et al. Socioeconomic status, race and COPD health outcomes. J Epidemiol
Community Health 65, 26–34 (2011).
54
56. Ejike, C. O. et al. Contribution of Individual and Neighborhood Factors to Racial
Disparities in Respiratory Outcomes. Am J Respir Crit Care Med 203, 987–997 (2021).
57. Perez, T. A. et al. Sex differences between women and men with COPD: A new analysis of
the 3CIA study. Respiratory Medicine 171, 106105 (2020).
58. Steyerberg, E. W. et al. Perioperative mortality of elective abdominal aortic aneurysm
surgery. A clinical prediction rule based on literature and individual patient data. Arch
Intern Med 155, 1998–2004 (1995).
59. Chatila, W. M., Wynkoop, W. A., Vance, G. & Criner, G. J. Smoking patterns in African
Americans and whites with advanced COPD. Chest 125, 15–21 (2004).
60. Mamary, A. J. et al. Race and Gender Disparities are Evident in COPD Underdiagnoses
Across all Severities of Measured Airflow Obstruction. Chronic Obstr Pulm Dis 5, 177–
184 (2018).
61. Chatila, W. M. et al. Advanced emphysema in African-American and white patients: do
differences exist? Chest 130, 108–118 (2006).
62. Diaz, A. A. et al. Differences in Health-Related Quality of Life Between New Mexican
Hispanic and Non-Hispanic White Smokers. Chest 150, 869–876 (2016).
63. Aldrich, M. C. et al. Genetic ancestry-smoking interactions and lung function in African
Americans: a cohort study. PLoS One 7, e39541 (2012).
64. Vestbo, J. et al. Fluticasone furoate and vilanterol and survival in chronic obstructive
pulmonary disease with heightened cardiovascular risk (SUMMIT): a double-blind
randomised controlled trial. Lancet 387, 1817–1826 (2016).
55
65. The Long-Term Oxygen Treatment Trial Research Group. A Randomized Trial of Long-
Term Oxygen for COPD with Moderate Desaturation. N Engl J Med 375, 1617–1627
(2016).
66. Vestbo, J. & TORCH Study Group. The TORCH (towards a revolution in COPD health)
survival study protocol. Eur Respir J 24, 206–210 (2004).
67. DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or
more correlated receiver operating characteristic curves: a nonparametric approach.
Biometrics 44, 837–845 (1988).
68. Chiang, C.-T. & Hung, H. Non‐parametric estimation for time-dependent AUC. Journal of
Statistical Planning and Inference 140, 1162–1174 (2010).
69. HUNG, H. & CHIANG, C.-T. Estimation methods for time-dependent AUC models with
survival data. The Canadian Journal of Statistics / La Revue Canadienne de Statistique 38,
8–26 (2010).
70. Vickers, A. J., Cronin, A. M., Elkin, E. B. & Gonen, M. Extensions to decision curve
analysis, a novel method for evaluating diagnostic tests, prediction models and molecular
markers. BMC Med Inform Decis Mak 8, 53 (2008).
71. Vickers, A. J., van Calster, B. & Steyerberg, E. W. A simple, step-by-step guide to
interpreting decision curve analysis. Diagn Progn Res 3, 18 (2019).
72. Sadatsafavi, M., Tavakoli, H. & Safari, A. Marginal Versus Conditional Odds Ratios When
Updating Risk Prediction Models. Epidemiology 33, 555–558 (2022).
73. Taylor, S. P., Sellers, E. & Taylor, B. T. Azithromycin for the Prevention of COPD
Exacerbations: The Good, Bad, and Ugly. Am J Med 128, 1362.e1–6 (2015).
56
74. Yebyo, H. G. et al. Personalising add-on treatment with inhaled corticosteroids in patients
with chronic obstructive pulmonary disease: a benefit-harm modelling study. Lancet Digit
Health 3, e644–e653 (2021).
75. Moons, K. G. M. et al. Risk prediction models: I. Development, internal validation, and
assessing the incremental value of a new (bio)marker. Heart 98, 683–690 (2012).
76. Bellou, V., Belbasis, L., Konstantinidis, A. K., Tzoulaki, I. & Evangelou, E. Prognostic
models for outcome prediction in patients with chronic obstructive pulmonary disease:
systematic review and critical appraisal. BMJ 367, l5358 (2019).
77. Wessler, B. S. et al. External Validations of Cardiovascular Clinical Prediction Models: A
Large-Scale Review of the Literature. Circ Cardiovasc Qual Outcomes 14, e007858 (2021).
78. Ho, J. K. et al. Generalizability of Risk Stratification Algorithms for Exacerbations in
COPD. Chest S0012369222042167 (2022) doi:10.1016/j.chest.2022.11.041.
79. Vyas, D. A., Eisenstein, L. G. & Jones, D. S. Hidden in Plain Sight - Reconsidering the Use
of Race Correction in Clinical Algorithms. N Engl J Med 383, 874–882 (2020).
80. Manski, C. F. Patient-centered appraisal of race-free clinical risk assessment. Health Econ
(2022) doi:10.1002/hec.4569.
81. Diao, J. A. et al. Clinical Implications of Removing Race From Estimates of Kidney
Function. JAMA 325, 184–186 (2021).
82. Schneiderman, N. et al. Prevalence of Diabetes Among Hispanics/Latinos From Diverse
Backgrounds: The Hispanic Community Health Study/Study of Latinos (HCHS/SOL).
Diabetes Care 37, 2233–2239 (2014).
83. Van Houwelingen, H. C. & Thorogood, J. Construction, validation and updating of a
prognostic model for kidney graft survival. Stat Med 14, 1999–2008 (1995).
57
84. DeLong, E. Hierarchical modeling: its time has come. Am Heart J 145, 16–18 (2003).
85. Krumholz, H. M., Chen, J., Rathore, S. S., Wang, Y. & Radford, M. J. Regional variation in
the treatment and outcomes of myocardial infarction: investigating New England’s
advantage. Am Heart J 146, 242–249 (2003).
86. Austin, P. C., Tu, J. V. & Alter, D. A. Comparing hierarchical modeling with traditional
logistic regression analysis among patients hospitalized with acute myocardial infarction:
should we be analyzing cardiovascular outcomes data differently? Am Heart J 145, 27–35
(2003).
87. Vickers, A. J. Incorporating Clinical Considerations into Statistical Analyses of Markers: A
Quiet Revolution in How We Think About Data. Clin Chem 62, 671–672 (2016).
88. Braun, L., Wentz, A., Baker, R., Richardson, E. & Tsai, J. Racialized algorithms for kidney
function: Erasing social experience. Soc Sci Med 268, 113548 (2021).
89. Louis, T. A. & Shen, W. Innovations in bayes and empirical bayes methods: estimating
parameters, populations and ranks. Stat Med 18, 2493–2505 (1999).
90. Steyerberg, E. W., Eijkemans, M. J., Harrell, F. E. & Habbema, J. D. Prognostic modeling
with logistic regression analysis: in search of a sensible strategy in small data sets. Med
Decis Making 21, 45–56 (2001).
91. Van Houwelingen, J. C. & Le Cessie, S. Predictive value of statistical models. Stat Med 9,
1303–1325 (1990).
92. Steyerberg, E. W., Eijkemans, M. J. C. & Habbema, J. D. F. Application of Shrinkage
Techniques in Logistic Regression Analysis: A Case Study. Statistica Neerland 55, 76–88
(2001).
58
93. Campbell, M. E. & Troyer, L. The Implications of Racial Misclassification by Observers.
Am Sociol Rev 72, 750–765 (2007).
94. Feliciano, C. Shades of Race: How Phenotype and Observer Characteristics Shape Racial
Classification. American Behavioral Scientist 60, 390–419 (2016).
95. Mhasawade, V., Zhao, Y. & Chunara, R. Machine learning and algorithmic fairness in
public and population health. Nat Mach Intell 3, 659–666 (2021).
96. Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an
algorithm used to manage the health of populations. Science 366, 447–453 (2019).
97. Cirillo, D. et al. Sex and gender differences and biases in artificial intelligence for
biomedicine and healthcare. NPJ Digit Med 3, 81 (2020).
98. Adibi, A., Sadatsafavi, M. & Ioannidis, J. P. A. Validation and Utility Testing of Clinical
Prediction Models: Time to Change the Approach. JAMA 324, 235–236 (2020).
59
Appendices
Appendix A
A.1. Receiver Operator Characteristic (ROC) Curves
ROC curves of the risk stratification algorithms in (A) SUMMIT, (B) LOTT, and (C) TORCH
60
A.2. Calibration Plots
Calibration plots of the risk prediction models comparing per decile average predicted and observed risk of
exacerbation before and after adjustment. (A) ACCEPT in SUMMIT; (B) Bertens in SUMMIT; (C) ACCEPT in
LOTT; (D) Bertens in LOTT; (E) ACCEPT in TORCH; (F) Bertens in TORCH
61
A.3. Unadjusted and Adjusted Annual Risk of Exacerbation of the Risk Prediction Models
SUMMIT
LOTT
TORCH
Observed risk
0.22
0.38
0.52
Unadjusted predicted risk
ACCEPT
0.34
0.53
0.51
Bertens
0.20
0.27
0.28
Adjusted predicted risk
ACCEPT
0.22
0.38
0.52
Bertens
0.22
0.38
0.52
Appendix B
B.1 Baseline Characteristics of Included Participants Stratified by Race
Caucasian
Asian
Black
Hispanic
Indigenous
N
3266
506
108
176
41
Mean (SD)
Follow-up time,
yr
0.89 (0.21)
0.84 (0.23)
0.93 (0.18)
0.82
(0.23)
0.82 (0.26)
Age, yr
66.2 (8.1)
67.4 (7.6)
65.9 (7.8)
66.0
(7.5)
69.0 (6.8)
BMI, kg/m2
28.2 (5.7)
22.8 (4.0)
28.4 (7.6)
28.2
(5.7)
28.4 (5.7)
62
SGRQ
45.3 (14.3)
42.5
(10.9)
48.2 (16.2)
44.5
(10.6)
49.0 (12.5)
FEV1 %
Predicted
53.1 (14.2)
53.3 (15.2)
49.4 (15.8)
55.1
(14.1)
51.3 (16.7)
Count (%)
Males
2393 (73.3)
470 (92.9)
78 (72.2)
118
(67.0)
27 (65.9)
Current
smoking status
1456 (44.6)
156 (30.8)
39 (36.1)
73 (41.5)
12 (29.3)
LAMA
1312 (40.2)
166 (32.8)
57 (52.6)
41 (23.6)
16 (39.0)
LABA
1718 (52.6)
224 (44.3)
72 (66.7)
70 (39.8)
28 (68.3)
ICS
1781 (54.5)
243 (48.0)
72 (66.7)
89 (50.6)
27 (65.9)
Statin
2019 (61.8)
267 (52.8)
55 (50.9)
112
(63.6)
20 (48.8)
Oxygen
188 (5.8)
21 (4.2)
16 (14.8)
6 (3.4)
3 (7.3)
History of > 1
moderate/severe
exacerbation
0.30
0.36
0.38
0.28
0.41
Observed risk of
> 1
0.29
0.31
0.32
0.24
0.37
63
moderate/severe
exacerbation
BMI = body mass index; FEV1 = forced expiratory volume at one second; ICS = inhaled corticosteroid; LABA =
long-acting β2 agonists; LAMA = long-acting muscarinic receptor antagonists; SGRQ = St. George’s Respiratory
Questionnaire