
GENERALIZABILITY OF RISK STRATIFICATION

ALGORITHMS FOR ACUTE EXACERBATION OF CHRONIC

OBSTRUCTIVE PULMONARY DISEASE

Joseph (Khoa Nguyen) Ho

PharmD, University of British Columbia, 2021

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Pharmaceutical Sciences)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

April 2023

GENERALIZABILITY OF RISK STRATIFICATION ALGORITHMS FOR ACUTE EXACERBATION

OF CHRONIC OBSTRUCTIVE PULMONARY DISEASE

submitted by

Joseph (Khoa Nguyen) Ho

in partial fulfillment of the requirements for

the degree of

Master of Science

Pharmaceutical Sciences

Examining Committee:

Mohsen Sadatsafavi, Associate Professor, Pharmaceutical Sciences, UBC

Supervisor

Donald Sin, Professor, Medicine, UBC

Supervisory Committee Member

Kate Johnson, Assistant Professor, Pharmaceutical Sciences & Medicine Joint Appointment, UBC

Supervisory Committee Member

Jacquelyn Cragg, Assistant Professor, Pharmaceutical Sciences, UBC

Supervisory Committee Member

Larry Lynd, Professor, Pharmaceutical Sciences, UBC

Additional Examiner


Abstract

Background: Contemporary management guidelines for chronic obstructive pulmonary disease

(COPD) rely on exacerbation history to risk-stratify patients and guide therapy for the prevention

of future exacerbations. However, exacerbation history alone may not reliably predict future

exacerbations due to random variability in frequency. To address this problem, multivariable

prediction models have been developed to improve predictive accuracy.

Objective: The objective of this thesis was to assess the generalizability of COPD exacerbation risk stratification algorithms and to determine whether the inclusion of race improves the performance of such algorithms.

Methods: I evaluated three algorithms: the Acute COPD Exacerbation Prediction Tool

(ACCEPT),1,2 a prediction model by Bertens et al.,3 and exacerbation history alone, using data

from three COPD clinical trials representing different levels of exacerbation risk. I examined

discrimination, calibration, and clinical utility as measures of model performance. I then

recalibrated the models using the setting-specific exacerbation risk for comparison. I explored

race as a variable that could convey information on background risk and assessed whether

adjusting for race with a random-effects approach could improve model performance.

Results: Both prediction models had better discrimination than exacerbation history alone, with differences in area under the curve (ΔAUCs) ranging from 0.05 to 0.10 (P-values <0.001). However, no algorithm was superior in clinical utility, and all carried a risk of harm. When the models were recalibrated, clinical utility improved significantly, and the risk of harm was substantially mitigated. The crude exacerbation risk ratios (RRs) for race ranged from 0.96 to 1.57. However, in the random-effects model, the shrunken RRs ranged from 0.99 to 1.07. Using the adjusted RRs to update ACCEPT, I showed that the inclusion of race in ACCEPT did not significantly improve model performance compared to the base ACCEPT. The ΔAUCs were <0.01 in all samples, with P-values > 0.17. There were also no notable improvements in calibration, clinical utility, or goodness-of-fit (P-value 0.15) after race-adjustment.

Conclusions: Risk stratification algorithms for COPD exacerbations are not universally

applicable across all settings. However, the flexibility of clinical prediction models allows them

to be updated to accommodate setting differences.

Lay Summary

Chronic obstructive pulmonary disease (COPD) is a common lung disease that affects over 2.6 million Canadians. Flare-ups (known as exacerbations) are common and burdensome in COPD, so preventing them is crucial in COPD management. Although current guidelines use only exacerbation history to estimate the future risk of exacerbations and risk-stratify patients, considering other patient characteristics can improve the accuracy of risk prediction. Clinical prediction models have been designed to combine multiple patient-specific characteristics to better estimate the risk of future exacerbations. The objective of my thesis was to evaluate clinical risk stratification algorithms across different settings and explore ways to improve their accuracy. I found that these algorithms should not be applied universally to all settings because their performance can vary from one population to another. However, unlike exacerbation history, prediction models can be updated to account for population differences and provide a better estimate of risk in each population.

Preface

This thesis comprises two studies completed by Joseph Ho. I was responsible for conducting the literature review, developing the study design and analytic plan, conducting the analyses, interpreting the results, and writing the chapters of this thesis. My supervisor, Dr. Mohsen Sadatsafavi, conceived the research question for both studies. My MSc supervisory committee, composed of Drs. Mohsen Sadatsafavi, Donald Sin, Kate Johnson, and Jacquelyn Cragg, provided feedback on the study design and interpretation of results. Dr. Donald Sin provided additional clinical expertise for the interpretation of results. Drs. Mohsen Sadatsafavi and Donald Sin acquired the data for the included studies. All co-authors of the included manuscripts provided feedback on the study designs and interpretation, and reviewed manuscript drafts.

Accepted manuscript:

1. Ho JK, Safari A, Adibi A, Sin DD, Johnson K, Sadatsafavi M. Generalizability of Risk Stratification Algorithms for Exacerbations in COPD. 2022. CHEST. (Related to Chapter 2)

In-progress manuscript:

2. Assessing the Impact of Race on the Predictive Performance of a COPD Exacerbation

Risk Prediction Model (Related to Chapter 3)

Ethics approval for the included chapters was obtained from the University of British Columbia’s Human Ethics Board (H22-01462).


Table of Contents

ABSTRACT
LAY SUMMARY
PREFACE
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ACKNOWLEDGEMENTS
CHAPTER 1: INTRODUCTION
1.1 CHRONIC OBSTRUCTIVE PULMONARY DISEASE
1.2 MANAGEMENT OF COPD: RISK STRATIFICATION AND PREVENTIVE THERAPIES
1.3 THE NEED FOR ACCURATE RISK STRATIFICATION IN COPD
1.4 CLINICAL PREDICTION MODELS
1.5 ASSESSING THE PERFORMANCE OF CLINICAL PREDICTION MODELS
1.6 UPDATING CLINICAL PREDICTION MODELS
1.7 CLINICAL PREDICTION MODELS UNDER INVESTIGATION: ACCEPT AND BERTENS
1.8 CURRENT KNOWLEDGE GAPS
1.9 OBJECTIVES
1.10 THESIS SUMMARY
CHAPTER 2: GENERALIZABILITY OF RISK STRATIFICATION ALGORITHMS FOR EXACERBATIONS IN COPD
2.1 INTRODUCTION
2.2 METHODS
2.2.1 Risk Stratification Algorithms
2.2.2 Sources of Data
2.2.3 Primary Outcome
2.2.4 Discrimination of Risk Stratification Algorithms
2.2.5 Clinical Utility of Risk Stratification Algorithms
2.2.6 Risk Prediction Model Recalibration
2.3 RESULTS
2.3.1 Participants
2.3.2 Discrimination
2.3.3 Calibration
2.3.4 Net benefit
2.4 DISCUSSION
2.5 CONCLUSION
CHAPTER 3: ASSESSING THE IMPACT OF RACE ON THE PREDICTIVE PERFORMANCE OF A COPD EXACERBATION RISK PREDICTION MODEL
3.1 INTRODUCTION
3.2 METHODS
3.2.1 Sample Data
3.2.2 Clinical Prediction Tool
3.2.3 Racial Differences in Exacerbation and Model Adjustment
3.2.4 Model Goodness-of-fit
3.2.5 Model Performance
3.3 RESULTS
3.3.1 Participants
3.3.2 Racial Differences in Exacerbation Risk
3.3.3 The Effect of Individual Predictors
3.3.4 The Effect of Adjusting for Race on Model Performance
3.4 DISCUSSION
CHAPTER 4: CONCLUSION
4.1 OVERVIEW AND CONTRIBUTION
4.2 STRENGTHS OF THIS RESEARCH
4.3 LIMITATIONS OF THIS RESEARCH
4.4 IMPLICATIONS FOR PRACTICE
4.5 IMPLICATIONS FOR FUTURE RESEARCH
REFERENCES
APPENDICES

List of Tables

Table 2-1. Baseline Characteristics of the Study Sample of the Included Trials
Table 2-2. Time-dependent AUC at 12 Months
Table 2-3. Dominating Risk Stratification Algorithms at the Three Threshold Levels
Table 3-1. Baseline Characteristics of Included Participants
Table 3-2. Racial Differences in Exacerbation
Table 3-3. Model Series Evaluating Goodness-of-fit
Table 3-4. Mean Exacerbation Risk by Race

List of Figures

Figure 1-1. COPD Pharmacotherapy Management Guidelines
Figure 2-1. Decision Curve Analysis of Risk Stratification Algorithms
Figure 3-1. Calibration Plots of ACCEPT and ACCEPT-Race
Figure 3-2. Decision Curves of ACCEPT vs. ACCEPT-Race


List of Abbreviations

ACCEPT: Acute COPD Exacerbation Prediction Tool
AECOPD: acute exacerbations of COPD
AUC: area under the curve
BC: British Columbia
BMI: body mass index
CAD: Canadian dollar
CAT: COPD Assessment Test
COPD: chronic obstructive pulmonary disease
CTS: Canadian Thoracic Society
DCA: decision curve analysis
ED: emergency department
FEV1: forced expiratory volume in one second
FVC: forced vital capacity
GOLD: Global Initiative for Chronic Obstructive Lung Disease
ICS: inhaled corticosteroid
LABA: long-acting β2 agonists
LAMA: long-acting muscarinic receptor antagonists
LOTT: Long-term Oxygen Treatment Trial
mMRC: Modified Medical Research Council
RR: risk ratio


Acknowledgements

I want to thank my supervisor, Dr. Mohsen Sadatsafavi, for his endless support in helping me accomplish this work. His mentorship has made my graduate experience truly amazing, and I will forever be grateful. I also want to thank the other members of my supervisory committee, Drs. Donald Sin, Jacquelyn Cragg, and Kate Johnson, for their expertise, constructive feedback, and guidance, which significantly enhanced this work. I owe additional thanks to Dr. Kate Johnson for her mentorship and words of encouragement in guiding me through my professional career.

I extend my thanks to the entire Respiratory Evaluation Sciences Program (RESP) and Collaborations for Outcomes Research and Evaluation (CORE) team for assisting me throughout this journey. I owe particular thanks to Dr. Abdollah Safari (University of Tehran) and Amin Adibi for setting the foundational work of this thesis and providing extensive analytic advice. I also thank Harry Tae Yoon Lee and Joseph Emil Amegadzie for their unwavering support in the forms of advice, assistance, and, most importantly, friendship.

Finally, I owe thanks to my friends and family, who have supported me since the beginning; producing this thesis would not have been possible without them. I thank my father, Khiet Ho, my mother, Chieu Tran, my sister, Lynn Ho, and the entire Ho and Tran families for their support, as well as all my friends in TSF, who kept me grounded and reminded me of the purpose of this arduous but fun professional journey.

Chapter 1: Introduction

1.1 Chronic Obstructive Pulmonary Disease

The clinical context of this thesis is chronic obstructive pulmonary disease (COPD). COPD is a chronic lung condition affecting millions of Canadians.4 Its natural course is characterized by symptoms of breathlessness and frequent coughing, gradual lung function decline, and episodes of intensified disease activity referred to as exacerbations (or lung attacks).5 The two main risk factors for COPD are smoking and aging;6 others include genetics, occupational exposure, and poorly controlled asthma.6 Historically, COPD has been considered a disease of elderly men, reflecting their high prevalence of smoking.7 However, due to recent smoking trends and the greater susceptibility of female lungs to inhaled toxins, COPD now affects a much broader segment of the Canadian population.7

In addition, the global prevalence of COPD has increased substantially over recent decades.8 COPD is a significant cause of morbidity and mortality and is associated with substantial healthcare costs.9,10 Globally, COPD is the third leading cause of death.11 Some studies have projected that by 2030, the number of patients with COPD will increase by a further 150%.12 Hospitalizations due to COPD were projected to increase by 182% over the same period.12 In 2011, COPD was estimated to cost the Canadian healthcare system $4.25 billion.13 In British Columbia (BC), the excess total costs of COPD from 2001 to 2010 were $5,424 (2010 CAD) per patient-year, with the majority of costs attributable to hospital admissions.9 Given the growing prevalence of COPD, an aging population, and clear gaps in care, the burden of COPD is expected to increase further.12 Thus, it is critical to address the causes of hospitalization for COPD to alleviate its health and economic burden.

COPD exacerbations considerably affect quality of life and are a leading cause of hospitalizations.12,14,15 Exacerbations can be defined in different ways. The symptom-based definition relies on patient-reported worsening of symptoms such as increased dyspnea and sputum production. Alternatively, the event-based definition is based on an escalation in the healthcare services required in the presence of worsening respiratory symptoms. Under the event-based definition, exacerbations are categorized as mild (self-managed at home with short-acting bronchodilators), moderate (requiring treatment with systemic corticosteroids or antibiotics), or severe (requiring inpatient care). Major COPD guidelines generally use the event-based definition.5,16–18
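The event-based definition is essentially a rule that maps the level of care an episode required to a severity category. The Python sketch below illustrates that rule under two assumptions that are not from the source: the highest level of care received determines the category, and the hypothetical flag `hospitalized` stands for inpatient care (the CTS definition in Figure 1-1 also counts an ED visit as severe, which a real implementation would need to decide on explicitly).

```python
from enum import Enum


class Severity(Enum):
    MILD = "mild"          # self-managed at home, e.g., with short-acting bronchodilators
    MODERATE = "moderate"  # treated with systemic corticosteroids and/or antibiotics
    SEVERE = "severe"      # required inpatient care


def classify_exacerbation(hospitalized: bool, systemic_steroids_or_antibiotics: bool) -> Severity:
    """Event-based severity of an episode of worsening respiratory symptoms.

    Hypothetical helper: the highest level of care received decides the
    category, so an inpatient episode treated with corticosteroids is 'severe'.
    """
    if hospitalized:
        return Severity.SEVERE
    if systemic_steroids_or_antibiotics:
        return Severity.MODERATE
    return Severity.MILD


# Example: an outpatient course of oral corticosteroids is a moderate exacerbation
print(classify_exacerbation(hospitalized=False, systemic_steroids_or_antibiotics=True))
```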

1.2 Management of COPD: Risk Stratification and Preventive Therapies

Guidelines recommend that the diagnosis of COPD be based on spirometry.16,19 A forced expiratory volume in one second (FEV1) to forced vital capacity (FVC) ratio of less than 0.7 is typically used to diagnose COPD.16,19 Alternatively, an FEV1/FVC ratio below the lower limit of normal (LLN), defined as the lower fifth percentile of a healthy reference population, is also used to diagnose COPD. The influential Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines stratify patients into 3 groups (A, B, and E) depending on their exacerbation history and symptom score (Modified Medical Research Council [mMRC] dyspnea scale or COPD Assessment Test [CAT] score).16 These groups guide the therapeutic management of COPD to reduce mortality and prevent exacerbations. The Canadian COPD management guidelines, developed by the Canadian Thoracic Society (CTS), adopt a similar classification system to inform therapy for Canadians.17 COPD severity is also divided into 4 levels (mild, moderate, severe, and very severe) based on cut-offs of post-bronchodilator FEV1 % predicted in patients with an FEV1/FVC ratio below 0.7.20
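The diagnostic and grouping rules above can be written as three small functions. This is a minimal sketch, assuming post-bronchodilator spirometry values, the GOLD 2023 ABE criteria (group E: ≥2 moderate exacerbations or ≥1 leading to hospitalization; otherwise group A or B by mMRC ≥2 or CAT ≥10), and the usual GOLD 1 to 4 cut-offs of 80%, 50%, and 30% predicted. The function names are hypothetical, and the LLN is taken as an input because computing it requires reference equations not described here.

```python
from typing import Optional


def has_airflow_obstruction(fev1: float, fvc: float, lln_ratio: Optional[float] = None) -> bool:
    """Spirometric criterion: FEV1/FVC < 0.7, or < LLN of the ratio if one is supplied."""
    threshold = 0.7 if lln_ratio is None else lln_ratio
    return fev1 / fvc < threshold


def gold_grade(fev1_pct_predicted: float) -> int:
    """GOLD 1-4 (mild, moderate, severe, very severe) from FEV1 % predicted."""
    if fev1_pct_predicted >= 80:
        return 1
    if fev1_pct_predicted >= 50:
        return 2
    if fev1_pct_predicted >= 30:
        return 3
    return 4


def gold_abe_group(n_moderate: int, n_hospitalized: int,
                   mmrc: Optional[int] = None, cat: Optional[int] = None) -> str:
    """GOLD ABE group from the past year's exacerbations and one symptom score."""
    if n_moderate >= 2 or n_hospitalized >= 1:
        return "E"  # exacerbation history overrides symptoms
    if cat is not None:
        high_symptoms = cat >= 10
    elif mmrc is not None:
        high_symptoms = mmrc >= 2
    else:
        raise ValueError("an mMRC or CAT score is required")
    return "B" if high_symptoms else "A"


# Example: FEV1 1.5 L, FVC 3.0 L (ratio 0.5), 45% predicted, one outpatient exacerbation, CAT 14
print(has_airflow_obstruction(1.5, 3.0), gold_grade(45), gold_abe_group(1, 0, cat=14))
```

Note how group assignment depends on exacerbation counts only through a fixed cut-off: two patients with one and zero moderate exacerbations and similar symptoms land in the same group.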

Inhaled pharmacotherapies are the cornerstone of preventive therapy for exacerbations in COPD. The main drug classes are inhaled bronchodilators (short- and long-acting β2 agonists and long-acting muscarinic antagonists [LAMA]) and inhaled corticosteroids (ICS).16,17 COPD guidelines and management strategies by GOLD and the CTS recommend a stepwise approach to therapy based on the ‘frequent exacerbator’ label.17,21 This is defined as having ≥2 moderate or ≥1 severe exacerbation in the previous 12 months.17,21 Figure 1-1 shows the current major COPD treatment algorithms from GOLD and CTS. COPD preventive therapy is therefore largely guided by assigning patients to a risk group based on their exacerbation history and a symptom score.

Figure 1-1. COPD Pharmacotherapy Management Guidelines

A: GOLD guidelines for COPD pharmacotherapy;21,22 B: CTS guidelines for COPD pharmacotherapy (adapted from

Bourbeau et al.17)

† Patients are considered at ‘low risk of AECOPD’ with ≤1 moderate AECOPD in the last year (moderate AECOPD is an event with prescribed antibiotics and/or oral corticosteroids) and no hospital admission/ED visit; or at ‘high risk of AECOPD’ with ≥2 moderate AECOPD or ≥1 severe exacerbation in the last year (severe AECOPD is an event requiring hospitalization or ED visit).17 The CTS definition of ‘high risk of AECOPD’ is equivalent to the ‘frequent exacerbator’ label used by GOLD.17,21

* A blood eosinophil count ≥300 cells/µL in patients with a history of AECOPD may be useful to predict a favorable response to an ICS combination inhaler.

‡ Oral therapies = roflumilast, N-acetylcysteine, daily-dose azithromycin

AECOPD = acute exacerbations of COPD; CAT = COPD Assessment Test; CTS = Canadian Thoracic Society;

GOLD = Global Initiative for Chronic Obstructive Lung Disease; FEV1 = forced expiratory volume in one second;

ICS = inhaled corticosteroid; LABA = long-acting β2 agonists; LAMA = long-acting muscarinic antagonist; mMRC

= Modified Medical Research Council; SABD = short-acting bronchodilator.

1.3 The Need for Accurate Risk Stratification in COPD

A current problem with major COPD guidelines is that they fail to consider the heterogeneous nature of COPD when risk-stratifying patients. The risk of experiencing an exacerbation varies greatly across patients.23 Even among patients with the same exacerbation history, there is substantial heterogeneity in future exacerbation risk.24 The guidelines are overly reliant on exacerbation history and use categorical labels to define patients as either ‘infrequent’ or ‘frequent’ exacerbators.17,21 This binary classification does not quantify future risk and thus cannot communicate individualized risk to patients. Although exacerbation history is the single best predictor of future exacerbations,23,25 there is increasing evidence that, on its own, it may predict future exacerbations less reliably than previously thought.26,27 A recent analysis showed that the ‘frequent exacerbator’ classification can change by 45% over two consecutive years due to chance alone.28 The high year-to-year variability of exacerbation history within patients, coupled with its uncertain predictive power, raises serious concerns regarding its suitability to guide pharmacotherapy.26,27,29
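Why a label can flip "due to chance alone" is easy to see in a small simulation. The sketch below is an illustration under stated assumptions, not a reproduction of the cited analysis: each simulated patient has a fixed underlying annual exacerbation rate (drawn from a gamma distribution to mimic between-patient heterogeneity), yearly counts are Poisson, a fixed share of events is severe, and the `frequent_exacerbator` label uses the ≥2 moderate or ≥1 severe definition. Every parameter value is hypothetical, so the resulting proportion should not be compared with the 45% figure.

```python
import numpy as np


def frequent_exacerbator(n_moderate, n_severe):
    """Vectorized label: >=2 moderate or >=1 severe exacerbations in 12 months."""
    return (np.asarray(n_moderate) >= 2) | (np.asarray(n_severe) >= 1)


def label_instability(n_patients=100_000, mean_rate=1.0, severe_fraction=0.2,
                      shape=2.0, seed=0):
    """Share of year-1 frequent exacerbators who lose the label in year 2.

    Each patient keeps the same underlying rate in both years, so any change
    in label is due to Poisson variation alone. Parameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    # Between-patient heterogeneity: gamma-distributed rates with mean `mean_rate`
    rates = rng.gamma(shape, mean_rate / shape, size=n_patients)
    moderate_rate = rates * (1 - severe_fraction)
    severe_rate = rates * severe_fraction

    # Two independent years of counts generated from the same individual rates
    year1 = frequent_exacerbator(rng.poisson(moderate_rate), rng.poisson(severe_rate))
    year2 = frequent_exacerbator(rng.poisson(moderate_rate), rng.poisson(severe_rate))

    # Among year-1 frequent exacerbators, the fraction not labelled frequent in year 2
    return float(np.mean(~year2[year1]))


print(label_instability())
```

Raising `mean_rate` makes the label more stable, because patients with high underlying rates almost always cross the threshold; instability is concentrated among patients whose true rate sits near the cut-off.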

1.4 Clinical Prediction Models

Clinical prediction models (also referred to as clinical prediction tools or clinical prediction algorithms) are multivariable models that combine patient-specific characteristics to estimate the risk of a clinically important outcome.30 They are designed to improve prognostic accuracy and risk stratification.30 Prediction models are widely used and are an integral component of care across different clinical domains. For example, the Framingham Risk Score used in cardiovascular disease estimates the 10-year risk of coronary heart disease.31 Unlike binary classifiers, prediction models quantify future risk, which can be communicated to patients to enable shared decision-making and personalization of care.30 Personalized risk prediction would allow therapies to be targeted to those who would benefit most from them.
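To make "combining patient-specific characteristics" concrete, the sketch below shows how a regression-based model turns several predictors into one individualized risk through a linear predictor and the logistic function. The predictor names, centering, and coefficients are hypothetical and chosen only for illustration; they are not the published equations of ACCEPT, the Bertens model, or the Framingham Risk Score, and ACCEPT itself is not necessarily a logistic model.

```python
import math

# Hypothetical coefficients for illustration only (not from any published model)
INTERCEPT = -2.0
COEFFICIENTS = {
    "prior_exacerbations": 0.45,  # per exacerbation in the previous 12 months
    "age_per_decade": 0.10,       # (age - 60) / 10
    "current_smoker": 0.20,       # 1 = yes, 0 = no
    "fev1_pct_per_10": -0.15,     # (FEV1 % predicted - 50) / 10
}


def predicted_risk(patient: dict) -> float:
    """Logistic model: risk = 1 / (1 + exp(-(b0 + sum_k b_k * x_k)))."""
    linear_predictor = INTERCEPT + sum(
        coef * patient[name] for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Two patients with the same exacerbation history but different other characteristics
patient_a = {"prior_exacerbations": 1, "age_per_decade": 0.0, "current_smoker": 0, "fev1_pct_per_10": 2.0}
patient_b = {"prior_exacerbations": 1, "age_per_decade": 1.5, "current_smoker": 1, "fev1_pct_per_10": -2.0}
print(round(predicted_risk(patient_a), 3), round(predicted_risk(patient_b), 3))
```

Both patients would receive the same history-based label, yet the model assigns them different risks; this is the kind of distinction a binary classifier cannot express.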

Prediction models can be developed by various means, such as fitting a regression equation or a machine learning algorithm to clinical data. Steyerberg et al.32,33 proposed a systematic checklist to ensure methodological rigor. The important steps in developing a model include consideration of the prediction problem, predictor coding, model specification, model estimation, model performance, model validation, and model presentation. External validation is a critical step in assessing model generalizability. Guidelines for reporting on clinical prediction models have been proposed in the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement.34

1.5 Assessing the Performance of Clinical Prediction Models

The performance of clinical prediction models is generally evaluated with two key measures.33 Calibration refers to the agreement between the model’s predicted risk and the observed risk. Calibration can be evaluated through calibration plots, calibration intercept and slope, and mean calibration. Perfect calibration is represented by a calibration slope of 1 and an intercept of 0. Discrimination refers to the model’s ability to distinguish those with the outcome from those without it. The concordance (c) statistic and the area under the receiver operating characteristic (ROC) curve (AUC) are the most commonly used measures of discrimination.33,35 For binary outcomes, the c-statistic is equivalent to the AUC. ROC curves plot the true positive rate (sensitivity) against the false positive rate (1 – specificity). A high-quality model has both good calibration and good discrimination.
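For a binary outcome, the measures above can be computed directly from observed outcomes `y` and predicted risks `p`. The NumPy sketch below is illustrative, with hypothetical function names. The c-statistic is computed as the proportion of event/non-event pairs in which the event has the higher predicted risk (ties count one half). Mean calibration is the observed event rate minus the mean predicted risk. Calibration intercept and slope come from a logistic regression of `y` on logit(`p`), fitted by Newton-Raphson; note that some authors instead estimate the intercept with the slope fixed at 1. The pairwise c-statistic uses memory proportional to the number of pairs, so it suits modest sample sizes only.

```python
import numpy as np


def c_statistic(y, p):
    """Share of (event, non-event) pairs ranked correctly; ties count 1/2."""
    y = np.asarray(y, dtype=bool)
    p = np.asarray(p, dtype=float)
    diff = p[y][:, None] - p[~y][None, :]  # every event vs. every non-event
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def mean_calibration(y, p):
    """Observed event proportion minus mean predicted risk (0 = calibrated in the large)."""
    return float(np.mean(y) - np.mean(p))


def calibration_intercept_slope(y, p, n_iter=50):
    """Fit logit P(y=1) = a + b * logit(p); perfect calibration gives a = 0, b = 1."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    X = np.column_stack([np.ones_like(p), np.log(p / (1 - p))])
    beta = np.zeros(2)
    for _ in range(n_iter):  # Newton-Raphson on the logistic log-likelihood
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - mu)
        hess = X.T @ (X * (mu * (1 - mu))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return float(beta[0]), float(beta[1])


# Example: outcomes simulated from the predicted risks themselves (a calibrated model)
rng = np.random.default_rng(1)
p_sim = rng.uniform(0.05, 0.95, 2_000)
y_sim = rng.uniform(size=p_sim.size) < p_sim
print(c_statistic(y_sim, p_sim), mean_calibration(y_sim, p_sim), calibration_intercept_slope(y_sim, p_sim))
```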

Calibration and discrimination are inherently statistical metrics of the performance of clinical prediction models.36 Even if a model performs well on both metrics, its clinical utility should be assessed. Ideally, clinical studies (randomized trials, before-after studies) would be used to evaluate the model in a clinical setting. However, the ‘potential’ clinical utility of a clinical prediction model can also be evaluated using the same data (e.g., an external validation sample) that are used to assess calibration and discrimination. Decision curve analysis (DCA) is a method that quantifies the ‘net benefit’ of a prediction model to evaluate its potential clinical utility.36 DCA assigns a relative weight to the true positive and false positive classifications of the prediction model based on a specific treatment threshold value.36,37 True positives are individuals who are classified as high-risk by the model and go on to experience the outcome, and false positives are those classified as high-risk who do not experience the outcome. For example, a treatment threshold of 50% implies that the decision-maker considers the benefit of a true positive classification to be equal to the harm of a false positive classification; more generally, a threshold $p_t$ implies that each false positive is weighted by the odds $p_t/(1 - p_t)$ relative to a true positive. This weight can be used to calculate the net benefit of the prediction model at a given threshold:

$$\text{Net benefit} = \frac{\text{TP}}{n} - \frac{\text{FP}}{n} \times \frac{p_t}{1 - p_t}$$

where TP and FP are the numbers of true positive and false positive classifications at threshold $p_t$, and $n$ is the total number of patients.
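A decision curve is the net benefit evaluated over a range of thresholds, usually plotted alongside the two default strategies of treating everyone and treating no one (whose net benefit is always 0). The sketch below follows the net benefit formula for the model and for treat-all; the function names are hypothetical, and classifying a patient as high-risk when predicted risk is greater than or equal to the threshold is an assumption (strict inequality is also used in practice).

```python
import numpy as np


def net_benefit(y, p, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    y = np.asarray(y, dtype=bool)
    treat = np.asarray(p, dtype=float) >= threshold
    n = y.size
    tp = np.sum(treat & y)   # classified high-risk and had the outcome
    fp = np.sum(treat & ~y)  # classified high-risk but no outcome
    return float(tp / n - fp / n * threshold / (1 - threshold))


def net_benefit_treat_all(y, threshold):
    """Treat everyone: all events are true positives, all non-events false positives."""
    prevalence = float(np.mean(y))
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(y, p, thresholds):
    """Rows of (threshold, model, treat all, treat none) net benefit."""
    return [(t, net_benefit(y, p, t), net_benefit_treat_all(y, t), 0.0) for t in thresholds]


# Toy example with four patients; at a 50% threshold, one false positive cancels one true positive
y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.7, 0.1]
for row in decision_curve(y, p, [0.2, 0.5]):
    print(row)
```

A model is potentially useful at a given threshold only if its net benefit exceeds both default strategies; a curve that falls below treat-all or treat-none signals a risk of harm.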

1.6 Updating Clinical Prediction Models

If the performance of a clinical prediction model is shown to be suboptimal in a target

population, it is possible to revise the model to improve its performance.38,39 When applied to

new patient populations, prediction models can suffer significant performance decreases which

can result in harmful decision-making.40 Model updates are often required and can substantially improve performance, mitigating the risk of harm.40 Such revisions can range widely in complexity.41–43 For example, if a clinical prediction tool systematically underestimates or overestimates risk and the underlying model is a regression equation, the intercept of the equation can be adjusted to correct the biased estimate of average risk.41–43 If the model is severely miscalibrated and the slope of the calibration plot is significantly different from 1, the slope can also be recalibrated. More complex updating involves re-estimating predictor coefficients and even adding or removing predictors.41–43 A stepwise approach is recommended to avoid excessive revision and to reduce the risk of overfitting, especially when the new sample is small.37 Continual evaluation of models across different settings is critical to their success once they are disseminated into clinical practice.
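The intercept and slope updates described above correspond to logistic recalibration, in which the outcome is regressed on the logit of the original predicted risk. The sketch below is illustrative only: it assumes a binary outcome, uses simulated data rather than any cohort in this thesis, and fits the two-parameter model by Newton-Raphson in NumPy rather than with any particular statistical package. In a stepwise approach, one would first estimate the intercept alone (slope fixed at 1) and only estimate the slope if calibration remains poor.

```python
import numpy as np


def logit(p):
    return np.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic_recalibration(pred, y, max_iter=50, tol=1e-10):
    """Estimate (a, b) in logit P(y = 1) = a + b * logit(pred) by Newton-Raphson.

    a is the calibration intercept and b the calibration slope; a = 0 and
    b = 1 indicate a well-calibrated model.
    """
    lp = logit(np.asarray(pred, dtype=float))
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(lp), lp])
    beta = np.array([0.0, 1.0])  # start from "no recalibration"
    for _ in range(max_iter):
        mu = expit(X @ beta)
        grad = X.T @ (y - mu)                          # score vector
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])  # information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated (hypothetical) example: the true log-odds are lp_true, but the
# model reports logit(pred) = 0.5 + 2 * lp_true, so it overestimates risk
# on average and its predictions are too extreme.
rng = np.random.default_rng(2023)
lp_true = rng.normal(-1.0, 1.0, size=20_000)
y = rng.binomial(1, expit(lp_true))
pred = expit(0.5 + 2.0 * lp_true)

a, b = fit_logistic_recalibration(pred, y)
recalibrated = expit(a + b * logit(pred))
# The fitted values should be close to the simulated truth (-0.25 and 0.50).
print(f"calibration intercept {a:.2f}, calibration slope {b:.2f}")
```

Because the simulated model is too extreme, the estimated slope falls well below 1; applying `a + b * logit(pred)` shrinks the predictions back toward the observed risk.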

1.7 Clinical Prediction Models Under Investigation: ACCEPT and Bertens

There are many clinical prediction models for COPD exacerbations. However, a 2017 review of

COPD exacerbation risk prediction models found that most previously developed models did not

undergo robust development and were ultimately considered not ready for clinical

implementation.44 A model by Bertens et al.,3 presented in the review, was the only one with a

low risk of bias and could be considered potentially relevant. The Acute COPD Exacerbation

Prediction Tool (ACCEPT) was developed to address the shortcomings of previously proposed

prediction models and is the first one considered clinically ready.1,37,45 Both prediction models enable individualized predictions of the future risk of moderate/severe exacerbations in patients with COPD.1–3 Such individualized predictions allow clinicians to distinguish between two patients with identical exacerbation histories and tailor preventive treatment accordingly.

ACCEPT was developed using pooled data from three randomized controlled trials, which included 2,380 patients with a mean age of 64.7 years (SD 8.8 years).1 All three trials recruited patients who had a history of at least one exacerbation in the past 12 months.1,46–48 ACCEPT uses up to 13 predictors, which are detailed in its development studies.1,2 The core predictors are the number of moderate and severe exacerbations in the previous 12 months, baseline age, sex, current smoking status (y/n), post-bronchodilator FEV1% predicted, current statin use, domiciliary oxygen therapy, and body mass index (BMI).1,2 The St. George’s Respiratory Questionnaire (SGRQ)49 score (or COPD Assessment Test50) and current use of inhaled pharmacotherapy, such as LAMA, long-acting β2 agonists (LABA), and ICS, are optional predictors.

The model by Bertens et al.3 was developed using data from a COPD cohort which included 240

patients with a mean age of 73.6 years (SD 5.2 years). Patients aged 65 years or over and

diagnosed with COPD were selected from 51 primary care sites in the Netherlands. The model

uses 4 predictors which include the presence of a moderate or severe exacerbation in the past 12

months, smoking pack-years, FEV1% predicted, and history of vascular disease (i.e., stroke,

transient ischemic attack, or peripheral arterial disease).

1.8 Current Knowledge Gaps

Clinical prediction models are not widely adopted in the routine management of COPD despite

their potential benefits. By comparison, contemporary management of cardiovascular diseases is

based on multistep algorithms involving objective risk predictions across all levels.51 The current

standard risk stratification methods for COPD care fail to consider many important patient

characteristics that impact disease management. Traditionally, the integration of prediction tools has largely been hindered by their limited ease of use, which reduces their clinical sensibility.52 With the increasing availability of electronic health records (EHR), real-time data retrieval and automatic risk calculation could largely address this issue. To date, no completed study has evaluated the use of clinical prediction tools for COPD management in a clinical setting. An ongoing cluster randomized clinical trial is currently evaluating the impact of integrating clinical prediction tools for COPD exacerbations into harmonized EHR in British Columbia.53

A critical issue to resolve before clinical implementation, namely the applicability of risk prediction models across different clinical settings, was recently brought to light by an extensive study of 104 unique cardiovascular disease risk prediction models.40 This study found significant decreases in model performance, even for externally validated models, when they were naively applied to new patient cohorts.40 This concern is increasingly relevant in COPD, as COPD prediction models are rarely externally validated.44 Heterogeneity between patient populations beyond the model’s included predictors can greatly affect its performance. Thus, a crucial aspect of implementation is to evaluate the prediction model in different patient cohorts and to identify factors that can fine-tune its performance. Continual evaluation and model updating are of utmost importance to ensure that models yield maximal benefit in real-world practice.41–43

Model generalizability is especially important for COPD because exacerbation risk can vary

greatly across different subgroups.54–57 These subgroups encompass a combination of factors that

result in different exacerbation risks. For example, Calverley et al.54 showed notable differences

in exacerbation frequency internationally that could not be explained by exacerbation history or

differences in baseline characteristics of the patients recruited. These differences may also be

beyond what is captured by a prediction model’s included predictors. To address this, some

prediction models have incorporated setting-specific adjustments into their risk prediction to

enhance their applicability across different settings.38,39,58 Examples of variables that are often

not included in clinical prediction models are geographic region, specialty of care, race/ethnicity,

and gender roles.

Race in particular is a complex socio-cultural variable that is associated with a multitude of

exacerbation risk factors such as smoking, BMI, socioeconomic status, exposure to

environmental pollutants, and differences in quality of care received.55,56,59,60 Existing COPD risk prediction models incorporate some of these factors, such as smoking and BMI.1,3,44 Previous studies have also reported greater symptom burden and lower lung function, both of which are predictors in some tools, in Black and Hispanic patients compared with Caucasian patients.61–63 However, such differences

might reflect differences in disease progression, diagnosis, and care, not all of which are

manifested in the value of a few clinical indices. There may be many other factors associated

with race that may not be adequately accounted for in existing models. Ultimately, these

differences may affect the risk of COPD exacerbations and it is crucial to ensure that risk

prediction models can effectively capture these differences.

1.9 Objectives

The first objective of my thesis was to evaluate the predictive performance and clinical utility of

COPD exacerbation risk stratification algorithms, including exacerbation history alone (current

standard of care for risk stratification) and multivariable prediction models, across patient

cohorts with different background exacerbation risks. Following this, I aimed to determine

whether recalibrating the model with the background exacerbation risk within each cohort could

improve model performance and clinical utility.

Exacerbation risk can be different across race groups due to a variety of factors.55,56 It is

currently not known whether existing predictors in an exacerbation risk prediction model can

capture this difference or if race should be explicitly added to the model. The second objective of

my thesis was to assess whether adjusting a prediction model for race could improve its

predictive performance and clinical utility.

1.10 Thesis Summary

The encompassing goal of this thesis was to evaluate the performance of COPD exacerbation

risk prediction models across different clinical settings and assess whether they required setting-

specific updates to provide acceptable performance.

In Chapter 2, I report on the generalizability of three risk stratification algorithms across three

sample cohorts representing populations at different levels of background exacerbation risk. I

examined ACCEPT and the prediction model by Bertens et al.,3 comparing both with exacerbation history alone for predicting moderate/severe exacerbations in the next 12 months. I measured the algorithms’ clinical utility using the DCA36 and quantified predictive performance by measuring discrimination and calibration. Lastly, I examined the effect of model recalibration on clinical utility. The results of this chapter showed that although multivariable prediction models for COPD exacerbations had better predictive performance than exacerbation history alone, model recalibration was required to confer higher clinical utility.

In Chapter 3, I report on the unadjusted and adjusted differences in exacerbation risk between

race groups. I also applied shrinkage methods to account for heterogeneity in race effects due to

sampling variability.39 I developed a race-adjustment factor for ACCEPT based on the shrunken

race effects and assessed whether an ‘ACCEPT + race’ model had improved discrimination,

calibration, and clinical utility compared to the ACCEPT model. My analyses showed ACCEPT

was well calibrated across the race groups. Further, it showed that most observed race/ethnicity

effects were due to differences in other predictors across race groups and sampling variability.

Thus, a race-adjustment factor did not improve predictive performance, clinical utility, or model

goodness-of-fit.

In Chapter 4, I conclude this thesis by summarizing my findings and discussing their

implications for implementing prediction models into routine COPD management. I identify the

strengths and limitations of my studies as well as considerations for future research in the

ongoing refinement of existing COPD prediction models to improve their quality and facilitate

adoption into care.

Chapter 2: Generalizability of Risk Stratification Algorithms for Exacerbations in COPD

2.1 Introduction

Chronic obstructive pulmonary disease (COPD) is a common pulmonary disease whose course is

punctuated by acute episodes of worsening symptoms (breathlessness, excessive sputum, and

coughing), referred to as exacerbations.4 Exacerbation prevention is a cornerstone of

contemporary COPD management. According to the Global Initiative for Chronic Obstructive

Lung Disease (GOLD), exacerbation prevention is achieved by directing pharmacotherapy based

on patients’ prior 12-month history of moderate or severe exacerbations.5,16 This strategy is also

adopted by the Canadian Thoracic Society guidelines.17 While exacerbation history is the single

best predictor of future exacerbations,23,25 relying on history alone for risk prediction may be sub-optimal, as a growing body of evidence suggests that history alone predicts exacerbations less reliably than previously believed.26,27 A recent study demonstrated high within-patient variability in exacerbation history from one year to the next due to chance alone, which raised serious doubts about the suitability of this approach for guiding pharmacotherapy.29

Clinical prediction tools are multivariable models that combine several patient characteristics to

increase the accuracy of risk stratification.30 Unlike exacerbation history alone, they can quantify future risk (e.g., as a percentage) and communicate it to patients to enable shared decision-making.

Importantly, prediction models are flexible and can be updated to accommodate background risk

in different settings.38,39 This is likely to be a critical issue in COPD exacerbation risk prediction,

as a recent study has demonstrated wide variability in exacerbation rates across the world, even

among patients who had the same 12-month exacerbation history at baseline.54 Given this,

models that are developed in high-risk settings may perform poorly when applied to low-risk

settings, and vice versa.

The sensitivity of model performance to the background event rate was highlighted in a recent

review evaluating the performance of 104 unique cardiovascular disease risk prediction

models.40 This study found that when models were naively applied to new patient cohorts, there

was a significant drop in model performance, arising from poor calibration, which in some cases

had the potential to cause patient harm (the clinical utility of the model being lower than not

doing any risk stratification).40 In contrast, when models incorporated the background risk of the

outcome in the target population, clinical utility was significantly enhanced.40 This suggests that

risk stratification algorithms need to be flexible and adjustable for optimal clinical

implementation. The primary objective of this study was to evaluate the clinical utility of risk

stratification algorithms, including multivariable risk prediction models and exacerbation history

alone, across cohorts with different exacerbation risks. The secondary objective was to determine

whether model recalibration with the observed exacerbation risk within each sample can improve

their clinical utility.

2.2 Methods

2.2.1 Risk Stratification Algorithms

I compared the GOLD risk stratification label of ‘frequent exacerbators’ (defined as having ≥2

moderate or ≥1 severe exacerbation) with two published, validated COPD exacerbation risk

prediction models. The first model was from Bertens et al.3 (henceforth referred to as ‘Bertens’).

This was the only model that a 2017 comprehensive systematic review of COPD exacerbation

risk prediction models considered to have undergone robust development and external

validation.44 Bertens uses 4 predictors: the number of exacerbations in the previous 12 months, forced expiratory volume in one second (FEV1) expressed as % predicted, pack-years

of smoking, and a history of vascular disease.3 The second model was the latest version of the

Acute COPD Exacerbation Prediction Tool (ACCEPT), which was developed to enable

individualized predictions of the rate and severity of exacerbations.1,2 ACCEPT uses up to 13

predictors including the number of non-severe and severe exacerbations in the previous 12

months, baseline age, sex, current smoking status (y/n), post-bronchodilator FEV1% predicted,

current use of statins as a surrogate for cardiovascular disease risk, domiciliary oxygen therapy,

and body mass index (BMI).1,2 The St. George’s Respiratory Questionnaire (SGRQ)49 score (or

COPD Assessment Test50), as well as current use of inhaled long-acting muscarinic receptor

antagonists (LAMA), long-acting β2 agonists (LABA), and inhaled corticosteroids (ICS) are

optional predictors. I used the full version of ACCEPT, which requires all of the above-mentioned predictors.

2.2.2 Sources of Data

I used data from three randomized clinical trials representing 3 levels of exacerbation risk: the

placebo arm of the Study to Understand Mortality and Morbidity in COPD (SUMMIT,

N=2,421)64, the Long-term Oxygen Treatment Trial (LOTT, N=595)65, and the placebo arm of

the Towards a Revolution in COPD Health (TORCH, N=1,091)66. Both treatment arms from

LOTT were used because, unlike the other two studies, there were no significant differences in

the exacerbation rate between the treatment arms. Patients in SUMMIT had on average a higher

post-bronchodilator FEV1 compared to patients in LOTT and TORCH. TORCH had the greatest

proportion of patients with a previous 12-month history of exacerbations, followed by LOTT and

then SUMMIT. These differences resulted in a gradient of exacerbation risk across SUMMIT (low risk),

LOTT (medium risk), and TORCH (high risk).

Missing predictor values were imputed with multiple imputation using the methodology described previously.2 In brief, I generated 10 imputed datasets, and each patient’s final predicted value was the mean of the predicted values across the 10 imputations. No participant in TORCH received a LAMA, as LAMAs were not widely available during the study period and their concomitant use was not permitted in the trial. However, setting LAMA use in this dataset to zero would be inappropriate, because non-use due to unavailability is not equivalent to not using the medication when it is available. Similar to the approach taken in a previous study,2 LAMA values were imputed for TORCH. SGRQ scores (used for ACCEPT) were the only other missing predictor (1,708 participants in SUMMIT and 264 participants in TORCH had missing SGRQ scores). All analyses were done in R 4.1.2 (R Foundation for Statistical Computing, Vienna, Austria). Ethics approval was obtained from the University of British Columbia’s Human Ethics Board (H22-01462).
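The pooling step described above (averaging each patient’s predicted value over the imputed datasets) can be sketched as follows. The imputation itself is not shown, and `predict_risk` with its coefficients is a hypothetical placeholder, not ACCEPT or Bertens.

```python
import numpy as np


def pooled_predictions(imputed_datasets, predict):
    """Average each patient's predicted risk across imputed datasets.

    imputed_datasets: sequence of completed datasets, patients in the same row order
    predict: function mapping one completed dataset to an array of predicted risks
    """
    preds = np.stack([np.asarray(predict(d), dtype=float) for d in imputed_datasets])
    return preds.mean(axis=0)  # one pooled prediction per patient


def predict_risk(data):
    # Hypothetical logistic model, used only for illustration.
    lp = -1.5 + 0.4 * data["prior_exacerbations"] + 0.02 * (data["sgrq"] - 45.0)
    return 1.0 / (1.0 + np.exp(-lp))


# Three patients; the third patient's SGRQ score was missing and takes a
# different (hypothetical) imputed value in each of 10 completed datasets.
rng = np.random.default_rng(0)
imputed = [
    {"prior_exacerbations": np.array([0, 1, 2]),
     "sgrq": np.array([40.0, 55.0, rng.normal(45.0, 15.0)])}
    for _ in range(10)
]
pooled = pooled_predictions(imputed, predict_risk)
print(pooled)
```

Patients with complete data receive the same prediction in every imputed dataset, so only predictions for patients with imputed values are affected by the averaging.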

2.2.3 Primary Outcome

The primary outcome was the prospective 12-month risk of a moderate/severe exacerbation.

Moderate exacerbations were those that required treatment with systemic corticosteroids and/or

antibiotics. Severe exacerbations were those that resulted in emergency department visits or

hospitalizations. These event-based definitions were used by all three trials and are in alignment

with the definition promulgated by GOLD.16
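These event-based definitions translate into a simple classification rule. The sketch below uses a hypothetical event record (the field names are assumptions, not the trials’ data dictionaries) and also shows why variable follow-up matters: a patient who leaves the study before 12 months without an exacerbation has an unknown 12-month outcome, which is what time-dependent methods account for.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical record of one respiratory event during follow-up."""
    day: int  # days since randomization
    systemic_steroids_or_antibiotics: bool
    ed_visit_or_hospitalization: bool


def severity(event: Event) -> Optional[str]:
    """Classify an event using the event-based moderate/severe definitions."""
    if event.ed_visit_or_hospitalization:
        return "severe"
    if event.systemic_steroids_or_antibiotics:
        return "moderate"
    return None  # not a moderate or severe exacerbation


def twelve_month_outcome(events: List[Event], follow_up_days: int,
                         horizon: int = 365) -> Optional[int]:
    """1 if a moderate/severe exacerbation occurs within the horizon,
    0 if followed for the full horizon without one, None if censored earlier."""
    if any(severity(e) is not None and e.day <= horizon for e in events):
        return 1
    if follow_up_days >= horizon:
        return 0
    return None


patients = {
    "A": ([Event(120, True, False)], 365),  # moderate exacerbation at day 120
    "B": ([], 365),                         # full follow-up, no event
    "C": ([], 200),                         # left the study at day 200, no event
}
for pid, (events, follow_up) in patients.items():
    print(pid, twelve_month_outcome(events, follow_up))
# A 1
# B 0
# C None
```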

2.2.4 Discrimination of Risk Stratification Algorithms

Model discrimination (the ability of a risk stratification algorithm to distinguish patients who will experience an exacerbation from those who will not) was assessed using receiver operating characteristic (ROC) curves and the area under the curve (AUC). Time-dependent (at 12 months) ROC curves and AUCs were

used to account for the variability in follow-up time across patients and were compared using the

DeLong test.67–69
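For a binary outcome without censoring, the AUC equals the probability that a randomly chosen patient with an exacerbation has a higher predicted risk than one without, and two correlated AUCs measured on the same patients can be compared with DeLong’s test. The sketch below implements this simpler, non-time-dependent version in NumPy. It does not handle variable follow-up, so it is not the time-dependent analysis used in this chapter, and the simulated scores are hypothetical.

```python
import math

import numpy as np


def _placements(scores, y):
    """DeLong structural components for one set of scores."""
    pos, neg = scores[y == 1], scores[y == 0]
    # psi = 1 if the positive outranks the negative, 0.5 for ties, 0 otherwise
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)  # per positive, per negative


def auc(scores, y):
    v10, _ = _placements(np.asarray(scores, dtype=float), np.asarray(y))
    return float(v10.mean())


def delong_test(scores_a, scores_b, y):
    """Two-sided DeLong test for the difference between two correlated AUCs."""
    y = np.asarray(y)
    v10_a, v01_a = _placements(np.asarray(scores_a, dtype=float), y)
    v10_b, v01_b = _placements(np.asarray(scores_b, dtype=float), y)
    m, n = v10_a.size, v01_a.size
    s10 = np.cov(np.vstack([v10_a, v10_b]))  # 2 x 2 covariance, ddof = 1
    s01 = np.cov(np.vstack([v01_a, v01_b]))
    s = s10 / m + s01 / n
    var = s[0, 0] + s[1, 1] - 2.0 * s[0, 1]
    if var <= 0:
        raise ValueError("zero variance: both scores rank patients identically")
    diff = float(v10_a.mean() - v10_b.mean())
    z = diff / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return diff, z, p


# Hypothetical scores: a weaker and a stronger predictor of the same outcome.
rng = np.random.default_rng(7)
y = rng.binomial(1, 0.35, size=600)
history_score = 0.6 * y + rng.normal(0.0, 1.0, 600)
model_score = 1.2 * y + rng.normal(0.0, 1.0, 600)
print(f"AUC: history {auc(history_score, y):.3f}, model {auc(model_score, y):.3f}")
diff, z, p = delong_test(model_score, history_score, y)
print(f"difference {diff:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Because the AUC depends only on the rank ordering of scores, any strictly increasing transformation of a model’s output leaves it unchanged.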

2.2.5 Clinical Utility of Risk Stratification Algorithms

Clinical utility was measured through net benefit calculations using the decision curve analysis

(DCA).36 Whereas discrimination evaluates the statistical performance, the DCA provides a

comprehensive assessment of the clinical utility of risk stratification to inform treatment

decisions.36 The underlying principle for the DCA is that a given treatment threshold specified to

separate high-risk from low-risk individuals implies a relative weight between true positive and

false positive classifications.36,37 Here, true positive cases are those who are classified as high-risk and experience an exacerbation in the next 12 months, and false positives are those who are classified as high-risk but do not experience an exacerbation during follow-up. For

example, a treatment threshold of 50% implies that the decision-maker considers the benefit of a

true positive classification to be equal to the harm of a false positive classification. Such a weight

can then be used to calculate the net benefit of a risk stratification algorithm at a given threshold.

Because no treatment thresholds have been formally established for exacerbation risk prediction in COPD, I evaluated the decision curves at three thresholds (low: 0.22, medium: 0.38, and high: 0.52), which correspond to the observed annual exacerbation risks in the three cohorts. The

net benefit of risk stratification algorithms should always be compared against two default

strategies that do not require risk stratification: treating no patients and treating all patients. A

risk stratification algorithm was considered harmful if it generated a lower net benefit than either

of the default strategies at a given threshold.40 To account for variable follow-up time, a time-

dependent decision curve analysis was conducted, using the methodology described by Vickers

et al.70 I did not report 95% confidence intervals for the decision curves because statistical

inference for a measure of clinical utility is not a relevant concept for decision making.71
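To make the comparison against the two default strategies concrete, the sketch below computes net benefit for a continuous risk model, a binary rule (the GOLD ‘frequent exacerbator’ label defined in Section 2.2.1), treat-all, and treat-none at the three thresholds used in this chapter. It flags an algorithm as potentially harmful when its net benefit falls below the better of the two defaults. It assumes a binary 12-month outcome with complete follow-up, so it omits the time-dependent (censoring-adjusted) estimator of Vickers et al.; the cohort is simulated and the risk model is hypothetical.

```python
import numpy as np

THRESHOLDS = {"low": 0.22, "medium": 0.38, "high": 0.52}


def net_benefit(y, treat, pt):
    """Net benefit of treating the patients flagged in `treat` at threshold pt."""
    y = np.asarray(y, dtype=bool)
    treat = np.asarray(treat, dtype=bool)
    n = y.size
    tp = np.sum(treat & y)
    fp = np.sum(treat & ~y)
    return float(tp / n - (fp / n) * pt / (1.0 - pt))


def frequent_exacerbator(n_moderate, n_severe):
    """GOLD-style label: >= 2 moderate or >= 1 severe exacerbation in 12 months."""
    return (np.asarray(n_moderate) >= 2) | (np.asarray(n_severe) >= 1)


def compare_strategies(y, risk, rule):
    rows = {}
    for name, pt in THRESHOLDS.items():
        nb_all = net_benefit(y, np.ones(len(y), dtype=bool), pt)
        best_default = max(nb_all, 0.0)  # treat-none always has net benefit 0
        nb_model = net_benefit(y, np.asarray(risk) >= pt, pt)
        nb_rule = net_benefit(y, rule, pt)  # a binary rule ignores pt when classifying
        rows[name] = {
            "treat_all": nb_all,
            "model": nb_model,
            "rule": nb_rule,
            "model_harmful": bool(nb_model < best_default),
            "rule_harmful": bool(nb_rule < best_default),
        }
    return rows


# Simulated (hypothetical) cohort: prior exacerbation counts, risks, outcomes.
rng = np.random.default_rng(42)
n = 2000
n_mod = rng.poisson(0.8, n)
n_sev = rng.poisson(0.2, n)
lp = -1.6 + 0.5 * n_mod + 0.8 * n_sev + rng.normal(0.0, 0.5, n)
true_risk = 1.0 / (1.0 + np.exp(-lp))
y = rng.binomial(1, true_risk)
risk = np.clip(true_risk + rng.normal(0.0, 0.05, n), 0.001, 0.999)  # noisy model

results = compare_strategies(y, risk, frequent_exacerbator(n_mod, n_sev))
for name, row in results.items():
    print(name, {k: (round(v, 3) if isinstance(v, float) else v) for k, v in row.items()})
```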

2.2.6 Risk Prediction Model Recalibration

Because multivariable risk prediction models generate numerical estimates of risks, their

calibration (how well the predicted risks agree with the observed risks) can be evaluated and, if

necessary, updated. Model calibration was assessed by comparing the predicted and observed

risks through calibration plots. Individuals were first grouped into deciles based on their

predicted risk. The mean observed risk in each decile was then plotted against the mean predicted risk. Each risk prediction model was recalibrated separately within each cohort by adjusting the model intercept, that is, by multiplying the predicted odds by a fixed odds ratio chosen using the sample’s observed outcome risk (recalibration-in-the-large).41,72 Because this transformation is monotonic, it preserves the rank ordering of predicted risks and therefore does not change the AUC of the model.
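The calibration-by-deciles and intercept-only recalibration described above can be sketched as follows. Assumptions: the fixed odds ratio is chosen so that the mean recalibrated risk equals a supplied observed risk (which in practice could come from a censoring-aware estimate rather than a simple proportion), and it is found by bisection on the log odds ratio; the simulated cohort is hypothetical.

```python
import numpy as np


def logit(p):
    return np.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def calibration_by_decile(pred, y, groups=10):
    """(mean predicted, mean observed) risk within groups of increasing predicted risk."""
    order = np.argsort(pred)
    return [(pred[idx].mean(), y[idx].mean()) for idx in np.array_split(order, groups)]


def recalibrate_in_the_large(pred, observed_risk, tol=1e-12):
    """Multiply all predicted odds by one odds ratio so mean risk equals observed_risk.

    Returns the recalibrated risks and the odds ratio applied.
    """
    if not 0.0 < observed_risk < 1.0:
        raise ValueError("observed_risk must lie strictly between 0 and 1")
    lp = logit(np.asarray(pred, dtype=float))
    lo, hi = -20.0, 20.0  # bracket for the log odds ratio
    # mean(expit(lp + shift)) increases with shift, so bisection converges
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expit(lp + mid).mean() < observed_risk:
            lo = mid
        else:
            hi = mid
    shift = 0.5 * (lo + hi)
    return expit(lp + shift), float(np.exp(shift))


# Hypothetical cohort in which the model systematically overestimates risk.
rng = np.random.default_rng(1)
true_risk = expit(rng.normal(-1.4, 0.8, 3000))
y = rng.binomial(1, true_risk)
pred = expit(logit(true_risk) + 0.7)

recal, odds_ratio = recalibrate_in_the_large(pred, observed_risk=y.mean())
print(f"observed {y.mean():.3f}, predicted {pred.mean():.3f}, recalibrated {recal.mean():.3f}")
print(f"odds ratio applied: {odds_ratio:.2f}")
for p_hat, obs in calibration_by_decile(recal, y):
    print(f"{p_hat:.3f}  {obs:.3f}")
```

Since every patient’s odds are multiplied by the same factor, the ordering of patients, and hence the AUC, is unchanged; only the average level of predicted risk moves.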

2.3 Results

2.3.1 Participants

The baseline characteristics of the three study samples are summarized in Table 2-1. SUMMIT

included 2,421 patients (mean age 65.9 years, 75.1% male) and contributed 636 exacerbations.

LOTT included 595 patients (mean age 69.7 years, 72.9% male) and contributed 369

exacerbations. TORCH included 1,091 patients (mean age 65.5 years, 77.2% male) and

contributed 1,074 exacerbations. The annual risk of exacerbations in SUMMIT, LOTT, and

TORCH was 0.22, 0.38, and 0.52, respectively.

Table 2-1. Baseline Characteristics of the Study Sample of the Included Trials

| Characteristic | SUMMIT | LOTT | TORCH |
|---|---|---|---|
| N | 2,421 | 595 | 1,091 |
| Follow-up time, yr, mean (SD) | 0.83 (0.24) | 0.95 (0.15) | 0.96 (0.13) |
| Age, yr, mean (SD) | 65.9 (7.9) | 69.7 (7.3) | 65.5 (8.2) |
| Males, N (%) | 1,818 (75.1) | 434 (72.9) | 842 (77.2) |
| BMI, kg/m², mean (SD) | 28.1 (5.7) | 28.8 (6.3) | 25.6 (5.3) |
| Current smoking, N (%) | 1,144 (47.2) | 134 (22.5) | 462 (42.3) |
| Smoking pack-years, mean (SD) | 40.6 (24.5) | 60.7 (32.7) | 48.4 (26.5) |
| LAMA, N (%) | 624 (25.8) | 374 (62.9) | 601* (55.1) |
| LABA, N (%) | 1,308 (54.0) | 436 (73.3) | 372 (34.1) |
| ICS, N (%) | 1,258 (52.0) | 442 (74.3) | 518 (47.5) |
| SGRQ, mean (SD) | 43.7 (16.2) | 48.7 (18.5) | 45.7 (16.9) |
| FEV1 % predicted, mean (SD) | 58.3 (11.4) | 46.7 (16.8) | 44.9 (13.9) |
| History of ≥1 moderate/severe exacerbation, proportion | 0.23 | 0.38 | 0.48 |
| Observed risk of ≥1 moderate/severe exacerbation | 0.22 | 0.38 | 0.52 |

*LAMA use in TORCH is based on imputed values.

BMI = body mass index; FEV1 = forced expiratory volume in one second; ICS = inhaled corticosteroid; LABA = long-acting β2 agonists; LAMA = long-acting muscarinic receptor antagonists; SGRQ = St. George’s Respiratory Questionnaire

2.3.2 Discrimination

A summary of the AUC values can be found in Table 2-2. The AUC for exacerbation history

alone in predicting future exacerbations in SUMMIT, LOTT, and TORCH was 0.59 (95%CI

0.57–0.61), 0.63 (95%CI 0.59–0.67), and 0.65 (95%CI 0.63–0.68), respectively. Bertens had a

higher AUC compared to exacerbation history alone in SUMMIT (increase of 0.10, P-value

<0.001) and TORCH (increase of 0.05, P-value <0.001), but not in LOTT (increase of 0.01, P-value 0.84). ACCEPT had higher AUC compared with exacerbation history alone in all study

samples, by 0.08 (P-value <0.001), 0.07 (P-value 0.001), and 0.10 (P-value <0.001),

respectively. Compared to Bertens, ACCEPT had higher AUC by 0.06 (P-value 0.001) in LOTT

and 0.05 (P-value <0.001) in TORCH, whereas the AUCs were not different in SUMMIT

(change of -0.02, P-value 0.16). ROC curves can be found in Appendix A.1.

Table 2-2. Time-dependent AUC at 12 Months

| Algorithm | SUMMIT | LOTT | TORCH |
|---|---|---|---|
| Exacerbation history | 0.59 (0.57–0.61) | 0.63 (0.59–0.67) | 0.65 (0.63–0.68) |
| Bertens | 0.69 a (0.66–0.72) | 0.64 (0.59–0.69) | 0.70 a (0.67–0.74) |
| ACCEPT | 0.67 a (0.63–0.70) | 0.70 a, b (0.65–0.74) | 0.75 a, b (0.72–0.78) |

Values are time-dependent AUC (95% confidence interval).

ACCEPT = Acute COPD Exacerbation Prediction Tool

a Statistically significant compared to exacerbation history

b Statistically significant compared to Bertens

2.3.3 Calibration

Calibration plots of the average risk of exacerbations per decile are presented in Appendix A.2.

In SUMMIT, Bertens was well calibrated and showed good agreement between observed and

predicted risk (observed risk 0.22 vs predicted risk 0.20). Comparatively, ACCEPT

overestimated the risk with a predicted annual risk of 0.34. In LOTT, Bertens underestimated the

risk (observed risk 0.38 vs predicted risk 0.27) while ACCEPT overestimated the risk (predicted

risk 0.53). In TORCH, Bertens underestimated the risk (observed risk 0.52 vs predicted risk

0.28) while ACCEPT was well calibrated with a predicted risk of 0.51.
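As a rough indication of the size of the intercept shift each model would need, one can compare the odds of the observed risk with the odds of the mean predicted risk reported above. This is only a back-of-the-envelope approximation: the actual recalibration factor is computed from individual predictions, and because averaging risks is not the same as averaging odds, it will generally differ somewhat from these ratios.

```python
def odds(p: float) -> float:
    return p / (1.0 - p)


# (observed risk, mean predicted risk) as reported in Section 2.3.3
calibration = {
    ("SUMMIT", "Bertens"): (0.22, 0.20),
    ("SUMMIT", "ACCEPT"): (0.22, 0.34),
    ("LOTT", "Bertens"): (0.38, 0.27),
    ("LOTT", "ACCEPT"): (0.38, 0.53),
    ("TORCH", "Bertens"): (0.52, 0.28),
    ("TORCH", "ACCEPT"): (0.52, 0.51),
}

# Approximate odds ratio by which predicted odds would need to be multiplied.
approx_or = {key: odds(obs) / odds(pred) for key, (obs, pred) in calibration.items()}
for (cohort, model), ratio in approx_or.items():
    direction = "raise" if ratio > 1 else "lower"
    print(f"{cohort:7s} {model:8s} approx. OR {ratio:.2f} ({direction} predicted odds)")
```

For example, Bertens in TORCH would need its predicted odds multiplied by roughly 2.8, whereas ACCEPT in TORCH needs almost no adjustment (about 1.04), in line with the calibration results above.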

After model recalibration, the mean adjusted predicted risk of exacerbation for both Bertens and

ACCEPT matched the observed risks in the study samples. Because Bertens was already well

calibrated in SUMMIT, and ACCEPT in TORCH, the improvements were relatively minor for

each model in the respective studies. A summary of the unadjusted and adjusted risk compared to

the observed risk of exacerbation is presented in Appendix A.3.

2.3.4 Net benefit

The decision curves for all risk stratification algorithms are presented in Figure 2-1. The

algorithms with the highest net benefit at the three pre-specified thresholds within each sample are

provided in Table 2-3.

Figure 2-1. Decision Curve Analysis of Risk Stratification Algorithms

Decision curve analysis comparing the net benefit of the risk stratification algorithms when unadjusted and adjusted

with the sample-specific exacerbation risk. (A) Unadjusted prediction models in SUMMIT; (B) Adjusted prediction

models in SUMMIT; (C) Unadjusted prediction models in LOTT; (D) Adjusted prediction models in LOTT; (E)

Unadjusted prediction models in TORCH; (F) Adjusted prediction models in TORCH.

Table 2-3. Dominating Risk Stratification Algorithms at the Three Threshold Levels

UNADJUSTED

SUMMIT

LOTT

TORCH

LOW

MEDIUM

B/Hx

HIGH

ADJUSTED

LOW

A/B

MEDIUM

A/ Hx

A/B/Hx

A/B

HIGH

A/Hx

A/B

Unadjusted: Prediction models without adjustment for background exacerbation risk. Adjusted: Prediction models

adjusted for background exacerbation risk

A: ACCEPT (Acute COPD Exacerbation Prediction Tool), B: Bertens, Hx: Exacerbation History

In SUMMIT, Bertens and exacerbation history outperformed ACCEPT. Bertens dominated at the

low threshold, whereas exacerbation history dominated at the high threshold. In LOTT, no risk

stratification algorithm clearly dominated. ACCEPT was the best at the low threshold, Bertens at

the medium threshold, and exacerbation history at the high threshold. In TORCH, ACCEPT

dominated the other algorithms at all three threshold values.

All three risk stratification algorithms were associated with a risk of harm at some thresholds (i.e., their net benefit was lower than that of treating no patients or treating all patients). Exacerbation history had lower net

benefit than treating all patients at the low threshold in LOTT, and at the low and medium

thresholds in TORCH. Bertens had lower net benefit than treating all patients at the low

threshold and treating no patients at the high threshold in LOTT. Bertens was also worse than

treating all patients at the low threshold in TORCH. ACCEPT had lower net benefit than treating

no patients at the medium threshold in SUMMIT and at the high threshold in LOTT.

The clinical utility of both prediction models greatly improved following recalibration. The

recalibrated ACCEPT either dominated or was no worse than exacerbation history across all

three thresholds and samples. Use of ACCEPT was no longer harmful at any of the thresholds in

the study samples. In SUMMIT, ACCEPT’s net benefit improved substantially at all threshold levels. In LOTT, both prediction models had improved clinical utility at different thresholds: Bertens at the low threshold, ACCEPT at the medium threshold, and both at the high threshold. In TORCH, the recalibrated Bertens improved substantially and was tied with ACCEPT at all thresholds. The use of Bertens was no longer harmful at any threshold.

2.4 Discussion

Accurate prediction of exacerbation risk is essential for COPD management. Contemporary

guidelines are reliant on a 12-month exacerbation history to risk-stratify patients for the choice of

preventive therapies. However, it has been shown that exacerbation history alone cannot account

for the variability in exacerbation risk imposed by other factors.24,26,27 Multivariable risk

prediction models, which combine other characteristics with exacerbation history, have been

developed to improve predictive power. This study evaluated the statistical performance and

clinical utility of three risk stratification algorithms in three clinical cohorts representing patients

with varying levels of exacerbation risk. I found that risk prediction models generally had better

discriminatory performance compared with exacerbation history alone. However, when

considering clinical utility, no algorithm emerged as universally better than the others. Critically, all three risk stratification algorithms had the potential to cause harm at some thresholds.

These results have several clinical implications. First, in settings with high exacerbation risk, using

exacerbation history alone to guide therapeutic choices may cause harm when the therapy has a

low to medium treatment threshold, such as therapies which have a low risk of adverse events

and are relatively inexpensive (e.g., LAMA+LABA therapy). In such instances, the clinical

utility of exacerbation history alone might be below that of providing such low-risk therapies to

all patients. The only common clinical scenario in which exacerbation history may be useful is in

guiding the use of therapies that have high treatment thresholds (i.e., those with significant side effects), such as azithromycin or oral roflumilast.73,74
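The link between a treatment threshold and the trade-off it implies can be made explicit: a threshold p_t values the harm of treating one patient who will not exacerbate at p_t/(1 - p_t) of the benefit of treating one who will, or equivalently accepts up to (1 - p_t)/p_t false positives per true positive. The short calculation below applies this to the three thresholds used in this chapter; it is arithmetic only and does not estimate thresholds for any specific therapy.

```python
THRESHOLDS = {"low": 0.22, "medium": 0.38, "high": 0.52}


def harm_to_benefit(pt: float) -> float:
    """Harm of one false positive relative to the benefit of one true positive."""
    return pt / (1.0 - pt)


def false_positives_per_true_positive(pt: float) -> float:
    """Unnecessary treatments accepted per patient correctly treated."""
    return (1.0 - pt) / pt


for name, pt in THRESHOLDS.items():
    print(f"{name:6s} pt={pt:.2f}  harm:benefit={harm_to_benefit(pt):.2f}  "
          f"accepted FP per TP={false_positives_per_true_positive(pt):.2f}")
```

At the low threshold (0.22), roughly 3.5 unnecessary treatments are accepted for each patient correctly treated, which suits an inexpensive, low-harm therapy; at the high threshold (0.52), fewer than one is accepted.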

Another important finding is that prediction models cannot be universally applied; rather, they

should be adapted for different patient populations based on the overall exacerbation risk. A

recent study examining international differences in the frequency of COPD exacerbations

showed that there were large variations in exacerbation risk even among individuals who had the

same exacerbation history.54 Evidence from other specialties indicates that in the face of such

heterogeneity, risk stratification algorithms can be associated with low clinical utility. Gulati et

al.40 found that even widely used prediction models for cardiovascular diseases, such as the

Framingham Risk Score, can be harmful in settings where the risk of the targeted event is

significantly different from that of the derivation cohort.40 I showed that when Bertens, which

was developed in a cohort with a low risk of exacerbation (annual observed exacerbation risk:

0.16),3 was applied to a high exacerbation risk cohort, it performed poorly with a significant risk

of causing harm to patients. ACCEPT, on the other hand, was developed using data from clinical

trials that only recruited patients with a positive exacerbation history,1,2 and as such performed

sub-optimally in patients with low risk of exacerbations. In addition to causing harm, a model

that misestimates risk can lead to over- or under-treatment relative to guideline recommendations.

The capability of being adaptable to different local settings is a key feature differentiating risk

prediction models from binary classifiers like exacerbation history. Such model revisions can vary widely in complexity.41–43 A stepwise approach to model updating is recommended to avoid

extensive revision and the dangers of overfitting in new samples.43 The most accessible approach

is through an intercept adjustment based on an estimate of the outcome risk in the target

population.41,43 When I applied this methodology, there were major improvements to model

calibration, and subsequently, improvements to the clinical utility of both risk prediction models.

Similar to the study by Gulati et al.40, I showed that the risk of harm was substantially mitigated

by adjusting the models for background risk of outcome in the cohort. The updated risk

prediction models were mostly superior to exacerbation history alone at all selected treatment

thresholds in all three cohorts. These results highlight the importance of model flexibility to

incorporate background risk as well as other local factors in generating predicted risk

estimates.41–43
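The intercept adjustment described above can be sketched as follows. This is a minimal sketch, assuming the model outputs each patient's 12-month probability of at least one moderate/severe exacerbation and that an estimate of the target population's average risk is available; the function names (`intercept_shift`, `recalibrate`) are hypothetical, and the bisection search is one simple way to find the shift, not necessarily the procedure used in this thesis. A single constant is added to every patient's log-odds so that the mean predicted risk matches the target, which leaves the ranking of patients unchanged.

```python
import math
from typing import List


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def intercept_shift(pred: List[float], target_risk: float,
                    lo: float = -20.0, hi: float = 20.0,
                    tol: float = 1e-10) -> float:
    """Find delta such that mean(expit(logit(p) + delta)) equals target_risk."""
    log_odds = [logit(p) for p in pred]

    def mean_risk(delta: float) -> float:
        return sum(expit(z + delta) for z in log_odds) / len(log_odds)

    # mean_risk increases monotonically with delta, so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_risk(mid) < target_risk:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def recalibrate(pred: List[float], target_risk: float) -> List[float]:
    """Apply the same log-odds shift to every patient's predicted risk."""
    delta = intercept_shift(pred, target_risk)
    return [expit(logit(p) + delta) for p in pred]


# Hypothetical predictions from a model developed in a low-risk cohort,
# recalibrated to a cohort with an observed annual risk of 0.52.
original = [0.05, 0.10, 0.20, 0.30]
updated = recalibrate(original, target_risk=0.52)
print([round(p, 3) for p in updated])
```

Because only the intercept changes, every prediction moves in the same direction and patient order is preserved; re-estimating individual coefficients would be a more extensive update with a greater risk of overfitting.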

The defining strength of this study is that it follows best practices and evaluates the performance

of existing prediction models in different patient populations rather than developing new

ones.42,75 Prediction models, specifically in COPD, are rarely externally validated in new

populations.76 In general, clinical prediction models that have been externally validated are only

evaluated once and it has been shown that model performance greatly varies when evaluated

across multiple cohorts.76,77 Model performance tends to decrease in new patient populations

which can be detrimental to their clinical utility.40 However, the adaptability of risk prediction

models means that it is possible to substantially improve model performance and mitigate the

risk of harm in new patient cohorts by adjusting the model to the target population.

My study had several limitations. I evaluated only two risk prediction models; therefore, my results may not be generalizable to other models in this clinical domain. Missing data for

predictor values required imputation. However, the only missing predictor values were for

ACCEPT (LAMA and SGRQ score) and it has been shown that ACCEPT’s results are robust to

the absence of these values.2 Both prediction models incorporate cardiovascular risk into their

risk prediction, and as such, may have reduced discrimination in SUMMIT because all

participants had cardiovascular risk factors by inclusion criteria. My net benefit analysis spanned a range of treatment thresholds because there are no formal treatment thresholds for COPD

exacerbations. Nonetheless, the flexibility of the DCA allows net benefit to be assessed across all

treatment thresholds and thus allows these results to be revisited once additional studies are

performed to identify relevant thresholds. Lastly, although intercept adjustment significantly

improved model performance, further studies are needed to examine more nuanced model

updating strategies capable of further improving model generalizability.

2.5 Conclusion

Current COPD management guidelines recommend preventive exacerbation therapy based on the

patients’ exacerbation history, but this strategy can be associated with harm in certain situations.

Multivariable risk prediction models, if well-calibrated, can provide a more accurate risk

prediction; however, they can suffer from miscalibration if applied to cohorts that are dissimilar

to ones that were used in their development. The clinical utility of these risk prediction models

can be significantly enhanced if they are calibrated to the background exacerbation risk in each

population. This highlights the importance of model evaluation and possible recalibration before models are deployed in a new clinical setting.

Chapter 3: Assessing the Impact of Race on the Predictive Performance of a

COPD Exacerbation Risk Prediction Model

3.1 Introduction

In chronic obstructive pulmonary disease (COPD), exacerbations are major drivers of clinical

deterioration and mortality.4,14 The burden of COPD exacerbations on the healthcare system is

projected to worsen given an aging population, continued exposure to risk factors, and gaps in

care.12 Accurate assessment of risk and prevention of future exacerbations is crucial in the

contemporary management of COPD.5,16 Currently, exacerbation history is the single best

predictor of future exacerbations and is the primary tool for risk stratification in major COPD

guidelines.5,16 However, there are concerns regarding its reliability for risk stratification and

guiding therapy because exacerbation history can greatly vary from year to year.26,27,29

Efforts have been made to combine other predictors with exacerbation history in multivariable

prediction models to better predict prospective exacerbation risk.1,44 These models are important

tools for quantifying risk, providing clinical guidance for healthcare providers, and informing

patients. The Acute COPD Exacerbation Prediction Tool (ACCEPT) is a recently developed

model that predicts individualized risk of moderate/severe COPD exacerbations based on clinical

information.1,2 Although such models are designed to improve predictive accuracy, their applicability can be hampered by differences between their development and target settings that are not captured by their included predictors.40,78 This heterogeneity in the ‘case mix’ can

decrease model performance and ultimately result in potential harm to patients when prediction

models are naively applied in different settings.40,78

In the previous chapter, I showed that even externally validated COPD prediction models could

be miscalibrated when applied to different settings.78 Such differences highlight the importance

of model flexibility to provide accurate setting-specific predictions. Part of the difference in

exacerbation risk across populations can be attributed to patient, setting, and system factors that

are not included in a prediction model. For example, a recent study evaluating clinical trial data

has shown significant variability in exacerbation risk between countries, even amongst the

relatively homogeneous samples of patients included in the clinical trials.54 This implies that

including a ‘country effect’ in COPD exacerbation risk prediction may improve model

performance by potentially accounting for a proportion of the unexplained heterogeneity. Other

factors that can explain such heterogeneity but are often left out of conventional risk-scoring tools include socioeconomic status, gender roles (over and above sex as a variable), and race.

In this chapter, I focused on race as a setting-specific adjustment factor for ACCEPT. Across

other disease domains, race has been used to adjust prediction outputs.79 There is ongoing

discussion regarding the ethics of including race in prediction algorithms, especially related to

the hotly debated issue of ‘algorithmic fairness’.79,80 Arguments for excluding race stem from the

lack of biologically substantiated evidence and the potential to perpetuate race-based health

inequities.79 Contrastingly, the exclusion of race may hamper prediction accuracy and have a

significant impact on recommended care.80,81 It is currently not known whether adjusting

ACCEPT for race can improve its predictive performance. The objective of my study is to

investigate the impact of race-adjustment, using a random-effects approach, on ACCEPT’s

discrimination, calibration, and clinical utility.

3.2 Methods

3.2.1 Sample Data

My sample included data from three randomized clinical trials, as was used in my previous study

(n=4,097): the placebo arm of the Study to Understand Mortality and Morbidity in COPD

(SUMMIT, N=2,421),64 the Long-term Oxygen Treatment Trial (LOTT, N=594),65 and the

placebo arm of the Towards a Revolution in COPD Health (TORCH, N=1,082).66 I used both arms of LOTT because, unlike in the other included trials, the exacerbation rate did not differ significantly between its treatment and placebo arms. Race and ethnicity data were self-reported in all three clinical trials.

The race variable was categorized into 5 groups: Caucasian, Black, Asian, Hispanic, and

Indigenous. Indigenous was comprised of trial participants listed as ‘Native Hawaiian or Pacific

Islander’ and ‘American Indian or Alaska Native’. Hispanic was categorized as a ‘race’ in

TORCH but as a binary variable under ‘ethnicity’ in SUMMIT and LOTT; in these 2 trials,

individuals could be any other race plus Hispanic ethnicity. My study categorized all individuals

with Hispanic ethnicity as Hispanic race, similar to a widely cited study examining racial/ethnic

disparities in diabetes prevalence.82 Thus, all other race groups were considered non-Hispanic. I

categorized individuals with multiple races based on their non-Caucasian race; if Indigenous was included in the combination, the individual was categorized as Indigenous. No participant reported a combination of two or more non-Caucasian races. Participants were excluded from my analysis if their race was categorized as ‘other’ or if no information on race was available.
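The categorization rules above can be expressed as a small function. This is a sketch under stated assumptions: the raw labels in `RAW_TO_GROUP` are hypothetical stand-ins for however each trial coded race, and `categorize_race` is an illustrative name, not code from the original analysis.

```python
from typing import Optional, Sequence

# Hypothetical raw labels; each trial's actual coding may differ.
RAW_TO_GROUP = {
    "white": "Caucasian",
    "caucasian": "Caucasian",
    "asian": "Asian",
    "black": "Black",
    "hispanic": "Hispanic",  # TORCH recorded Hispanic as a race
    "native hawaiian or pacific islander": "Indigenous",
    "american indian or alaska native": "Indigenous",
}


def categorize_race(raw_races: Sequence[str],
                    hispanic_ethnicity: bool = False) -> Optional[str]:
    """Return the analysis race group, or None if the participant is excluded."""
    # Hispanic ethnicity (a separate variable in SUMMIT and LOTT) takes precedence.
    if hispanic_ethnicity:
        return "Hispanic"
    groups = set()
    for raw in raw_races:
        group = RAW_TO_GROUP.get(raw.strip().lower())
        if group is None:  # 'other' or an unrecognized label: exclude
            return None
        groups.add(group)
    if not groups:  # no race information: exclude
        return None
    if "Indigenous" in groups:  # Indigenous takes precedence in combinations
        return "Indigenous"
    non_caucasian = groups - {"Caucasian"}
    if not non_caucasian:
        return "Caucasian"
    if len(non_caucasian) > 1:
        # Not observed in these trials; the text gives no rule for this case.
        raise ValueError(f"multiple non-Caucasian races: {sorted(non_caucasian)}")
    return non_caucasian.pop()


print(categorize_race(["White", "Asian"]))
print(categorize_race(["Black"], hispanic_ethnicity=True))
```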

The strategy for imputing missing data with multiple imputation is detailed in previous studies

with ACCEPT.2,78 I performed 10 iterations of imputation and used the mean prediction value of

the iterations. In TORCH, no participant received a LAMA because of lack of availability, and

concurrent use was not permitted during the study. Setting LAMA use to zero for all patients

would not be appropriate because this form of non-use would not be representative of non-use if

LAMAs were available. Thus, LAMA use was imputed for TORCH participants.2,78 St. George’s

Respiratory Questionnaire (SGRQ)49 scores were missing for patients in TORCH (n=264) and

SUMMIT (n=1,708) and were imputed. All analyses were done in R 4.1.2 (R Foundation for

Statistical Computing, Vienna, Austria). Ethics approval was obtained from the University of

British Columbia’s Human Ethics Board (H22-01462).

3.2.2 Clinical Prediction Tool

I used the latest version of ACCEPT in my analysis.1,2 ACCEPT uses up to 13 predictors to

generate quantifiable predictions for moderate/severe exacerbations.1,2 The core predictors

include the 12-month history of moderate and severe exacerbations, age, sex, current smoking

status (y/n), post-bronchodilator forced expiratory volume in 1 second % predicted (FEV1%),

current statin use, domiciliary oxygen use, and body mass index (BMI).1,2 Optional predictors

include current use of COPD inhaled pharmacotherapy, namely long-acting muscarinic receptor antagonists (LAMA), long-acting β2 agonists (LABA), and inhaled corticosteroids (ICS), as well as COPD symptom scores such as the SGRQ49 score (or the COPD Assessment Test).50 I used the full version of ACCEPT with all 13 predictors. Predictor values were collected at the beginning of the trial period, and outcomes were assessed over the following 12 months.

Exacerbations were categorized using the event-based definition, which is in accordance with the

Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines and the included

trials.16 Exacerbations that required systemic corticosteroids and/or antibiotics but did not result

in an inpatient visit were classified as moderate. Severe exacerbations required either an

emergency department visit or a hospitalization.
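The event-based definition can be sketched as a classification rule plus a 12-month outcome flag. The `Event` fields and function names below are hypothetical, and treating day 0 to day 364 as the 12-month window is an assumption; trial adjudication records would need to be mapped onto these fields.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Event:
    day: int                 # days since the start of follow-up
    systemic_steroids: bool
    antibiotics: bool
    ed_visit: bool
    hospitalized: bool


def severity(event: Event) -> Optional[str]:
    """Classify one event; None if it meets neither exacerbation definition."""
    if event.ed_visit or event.hospitalized:
        return "severe"
    if event.systemic_steroids or event.antibiotics:
        return "moderate"
    return None


def had_moderate_or_severe(events: Iterable[Event], horizon_days: int = 365) -> bool:
    """Outcome: at least one moderate/severe exacerbation within the horizon."""
    return any(severity(e) is not None and 0 <= e.day < horizon_days
               for e in events)


print(had_moderate_or_severe([Event(120, True, False, False, False)]))
```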

3.2.3 Racial Differences in Exacerbation and Model Adjustment

I estimated the unadjusted and adjusted racial differences in 12-month moderate/severe

exacerbation risk using fixed-effects and random-effects regression analyses that included

dummy variables for each race. I used generalized linear models with a binomial distribution and

a logit link function. The unadjusted analysis included race as the only independent variable and

observed exacerbation frequency as the dependent variable. The adjusted analyses accounted for

ACCEPT’s prediction output as an additional independent variable. I compared each race with the unadjusted or adjusted average exacerbation risk by setting the regression offset to the log of the average exacerbation risk in the unadjusted analysis and to the intercept of the adjustment model in the adjusted analysis.
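A simplified stand-in for these comparisons is an observed-to-expected (O/E) ratio per race group. For the unadjusted comparison the expected risk is the overall average risk; for the adjusted comparison it is each patient's ACCEPT prediction. This sketch does not reproduce the binomial logit models described above: the O/E ratio is what a Poisson log-link model with a race indicator and an offset of log(expected) would estimate, and the confidence interval uses the usual Poisson approximation (standard error of log O/E of about 1/√O). The function name and toy data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def oe_ratios(groups: Sequence[str], outcomes: Sequence[int],
              expected: Sequence[float], z: float = 1.96
              ) -> Dict[str, Tuple[float, float, float]]:
    """Return {group: (O/E, lower, upper)} using a Poisson approximation."""
    observed = defaultdict(float)
    expected_sum = defaultdict(float)
    for g, y, e in zip(groups, outcomes, expected):
        observed[g] += y
        expected_sum[g] += e
    result = {}
    for g in observed:
        ratio = observed[g] / expected_sum[g]
        if observed[g] == 0:
            result[g] = (0.0, 0.0, math.inf)  # no events: upper bound undefined
            continue
        se_log = 1.0 / math.sqrt(observed[g])
        result[g] = (ratio, ratio * math.exp(-z * se_log),
                     ratio * math.exp(z * se_log))
    return result


# Toy data: race group, 12-month outcome, ACCEPT-predicted risk (hypothetical).
groups = ["Caucasian", "Caucasian", "Asian", "Asian", "Black", "Black"]
outcomes = [1, 0, 1, 1, 0, 1]
predicted = [0.40, 0.30, 0.60, 0.50, 0.30, 0.50]

average = sum(outcomes) / len(outcomes)
unadjusted = oe_ratios(groups, outcomes, [average] * len(outcomes))
adjusted = oe_ratios(groups, outcomes, predicted)
print(unadjusted["Asian"][0], adjusted["Asian"][0])
```

Comparing the two dictionaries shows how much of a group's apparent excess risk disappears once its members' predicted risks are taken into account.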

To adjust ACCEPT’s prediction output for race, I developed an intercept adjustment factor using

a random-effects approach. Whereas the fixed-effects method estimates group differences in

isolation, a random-effects approach has been proposed to assess the distribution of outcome

differences across different settings.39,83–86 The random-effects approach uses information from

other settings to shrink extreme estimates toward the mean. For example, settings with extremely low or high observed exacerbation risk are more likely to have true risks closer to the average than their observed estimates suggest. Random-effects updating has been applied to cardiology and renal prediction models and is viewed as the method of choice when different levels of effects are considered.39,83–86 I adapted the methodology of Steyerberg et al.39 to quantify a shrunk estimator for race, which was used as the race-specific intercept adjustment factor for ACCEPT (henceforth referred to as ‘ACCEPT + Race’).
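The idea of shrinking uncertain group estimates toward the average can be illustrated with a generic empirical-Bayes calculation on the log risk-ratio scale. This is not Steyerberg et al.'s exact procedure: it assumes the true log-RRs are normally distributed around 0 (RR = 1, the average risk), estimates the between-group variance τ² by a simple method of moments, and multiplies each estimate by τ²/(τ² + SE²). The function names and the input risk ratios are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_rr_and_se(rr: float, lower: float, upper: float,
                  z: float = 1.96) -> Tuple[float, float]:
    """Log risk ratio and its standard error recovered from a 95% CI."""
    return math.log(rr), (math.log(upper) - math.log(lower)) / (2 * z)


def shrink_log_rr(estimates: Sequence[float], std_errors: Sequence[float]
                  ) -> Tuple[float, List[float], List[float]]:
    """Shrink log-RRs toward 0 (RR = 1) by tau^2 / (tau^2 + SE^2)."""
    k = len(estimates)
    weights = [1.0 / se ** 2 for se in std_errors]
    # If the true log-RRs are centred on 0, E[sum w*b^2] = k + tau^2 * sum(w).
    stat = sum(w * b ** 2 for w, b in zip(weights, estimates))
    tau2 = max(0.0, (stat - k) / sum(weights))
    factors = [tau2 / (tau2 + se ** 2) for se in std_errors]
    shrunk = [f * b for f, b in zip(factors, estimates)]
    return tau2, factors, shrunk


# Hypothetical fixed-effects RRs with 95% CIs for three groups.
table = {"A": (0.95, 0.90, 1.00), "B": (1.30, 1.10, 1.54), "C": (1.60, 1.00, 2.56)}
names = list(table)
pairs = [log_rr_and_se(*table[name]) for name in names]
estimates = [b for b, _ in pairs]
std_errors = [se for _, se in pairs]
tau2, factors, shrunk = shrink_log_rr(estimates, std_errors)
for name, b, s in zip(names, estimates, shrunk):
    print(name, round(math.exp(b), 2), "->", round(math.exp(s), 2))
```

Group C has the most extreme but least precise estimate, so it is pulled furthest toward RR = 1, mirroring the pattern described for the Indigenous group.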

3.2.4 Model Goodness-of-fit

To evaluate whether race-adjustment improved model fit, I used the likelihood-ratio χ2 test

(LRT) to compare the fixed-effect model of ACCEPT + Race with ACCEPT alone. I also

assessed a series of nested models, each consisting of a bivariate analysis with one of ACCEPT’s predictors as the independent variable and observed exacerbation frequency as the dependent variable. Using the LRT, I compared each nested model with its race-adjusted equivalent to assess which predictors accounted for the effect of race. If adding race statistically significantly improved a nested model’s goodness of fit, that predictor was assumed not to capture the observed racial differences effectively.
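The likelihood-ratio test refers the drop in deviance when race is added to a χ² distribution whose degrees of freedom equal the number of added race parameters. With five race groups this is plausibly 4 degrees of freedom, and the P-values in Table 3-3 appear consistent with that; treat df = 4 as an assumption. The sketch below computes the χ² upper-tail probability from the series for the regularized incomplete gamma function using only the standard library; `chi2_sf` and `lrt_p_value` are hypothetical names (in practice `scipy.stats.chi2.sf` or R's `pchisq(..., lower.tail = FALSE)` would be used).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared(df) variable at x."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the lower regularized incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15 and n < 10_000:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def lrt_p_value(nested_deviance: float, race_adjusted_deviance: float,
                df: int = 4) -> float:
    """P-value for adding race; df = number of added race parameters (assumed 4)."""
    return chi2_sf(nested_deviance - race_adjusted_deviance, df)


# Exacerbation-history row of Table 3-3, using the rounded deviances.
print(round(lrt_p_value(4567.5, 4558.8), 3))
```

Because the tabulated deviances are rounded, this gives a P-value near, but not exactly equal to, the reported value.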

3.2.5 Model Performance

I compared the performance of ACCEPT to ACCEPT + Race by evaluating discrimination,

calibration, and clinical utility. Discrimination is a prediction model’s ability to correctly

distinguish individuals who experience an outcome from those who do not. I measured

discrimination by using time-dependent receiver operating characteristic (ROC) curves and

calculating the area under the curve (AUC) to account for variable follow-up time across

patients. The DeLong test was used to assess statistical significance.67–69 Calibration is the extent

to which the model’s predicted risk aligns with the observed risks. I evaluated this with

calibration plots by first grouping individuals into deciles based on their predicted risk. The

mean observed risk for each decile was then plotted against the predicted risk. Calibration

intercept and slope were also examined to compare differences between the models.
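For a binary 12-month outcome without censoring, the AUC equals the probability that a randomly chosen patient with an exacerbation has a higher predicted risk than one without, and a calibration plot compares mean predicted and observed risk within deciles of predicted risk. The sketch below implements only these simplified, non-time-dependent versions (no time-dependent ROC curve and no DeLong test); the function names and example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def auc(pred: Sequence[float], outcome: Sequence[int]) -> float:
    """Mann-Whitney form of the AUC; tied predictions count as one half."""
    cases = [p for p, y in zip(pred, outcome) if y == 1]
    controls = [p for p, y in zip(pred, outcome) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for c in cases:
        for d in controls:
            score += 1.0 if c > d else 0.5 if c == d else 0.0
    return score / (len(cases) * len(controls))


def decile_calibration(pred: Sequence[float], outcome: Sequence[int],
                       n_groups: int = 10) -> List[Tuple[float, float]]:
    """(mean predicted, mean observed) risk within groups of sorted predictions."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if idx:
            rows.append((sum(pred[i] for i in idx) / len(idx),
                         sum(outcome[i] for i in idx) / len(idx)))
    return rows


risk = [0.10, 0.40, 0.35, 0.80]
events = [0, 0, 1, 1]
print(auc(risk, events))
```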

Clinical utility was measured using decision curve analysis (DCA). Both discrimination and

calibration are statistical measures that do not inform whether the model improves clinical

decisions.36,87 The DCA was developed to overcome this limitation and assess the model’s

performance in supporting decision-making.36 The core principle of the DCA is that a given

treatment threshold, at which treatment is recommended, implies a relative weight between true

and false positive classifications.36,37 For example, a treatment threshold of 50% implies that the benefit of a true positive classification is valued equally to the harm of a false positive classification. This relationship can be used to quantify net benefit at any given treatment threshold. I assessed clinical utility using a time-dependent DCA to account for variable follow-up

time.70 Calibration, discrimination, and clinical utility were all assessed at the 12-month point in

follow-up.
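The net benefit at a threshold p_t weights each false positive by the odds p_t/(1 - p_t): NB = TP/n - (FP/n) × p_t/(1 - p_t). The sketch below computes this for a model's predicted risks, for a binary rule such as exacerbation history (predictions of 0 or 1), and for the "treat all" reference strategy. It ignores censoring, so it is a simplified, non-time-dependent version of the DCA used here; the function names and data are hypothetical.

```python
from typing import Sequence


def net_benefit(pred: Sequence[float], outcome: Sequence[int],
                threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(pred)
    tp = sum(1 for p, y in zip(pred, outcome) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(pred, outcome) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcome: Sequence[int], threshold: float) -> float:
    """Reference strategy: treat everyone (treating no one has net benefit 0)."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical data: model risks, exacerbation history (1 = positive), outcomes.
risk = [0.10, 0.40, 0.35, 0.80]
history = [0, 1, 0, 1]
outcome = [0, 0, 1, 1]
for t in (0.2, 0.3, 0.5):
    print(t, net_benefit(risk, outcome, t), net_benefit(history, outcome, t),
          net_benefit_treat_all(outcome, t))
```

A decision curve is simply these values plotted across a range of thresholds; a strategy is useful at a threshold if its net benefit exceeds both "treat all" and "treat none".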

3.3 Results

3.3.1 Participants

Table 3-1 summarizes the patient characteristics of the three samples. SUMMIT contributed

2,421 patients (mean age 65.9 years, 75.1% male, 77.9% Caucasian) and had an observed

moderate/severe exacerbation risk of 0.22. LOTT contributed 594 patients (mean age 69.7 years,

72.9% male, 84.5% Caucasian) and had an observed moderate/severe exacerbation risk of 0.38.

One patient in LOTT had no race information and was excluded from the analysis. TORCH contributed 1,082 patients (mean age 65.6 years, 77.2% male, 81.1% Caucasian) and had an observed moderate/severe exacerbation risk of 0.52. Nine patients in TORCH had no race information and were excluded from the analysis. Baseline characteristics

stratified by race can be found in Appendix B.1.

Table 3-1. Baseline Characteristics of Included Participants

| Characteristic | SUMMIT | LOTT | TORCH |
|---|---|---|---|
| N | 2421 | 594 | 1082 |
| Mean (SD) | | | |
| Follow-up time, yr | 0.83 (0.24) | 0.95 (0.15) | 0.96 (0.13) |
| Age, yr | 65.9 (7.9) | 69.7 (7.3) | 65.6 (8.1) |
| BMI, kg/m2 | 28.2 (5.7) | 28.8 (6.3) | 25.6 (5.3) |
| SGRQ | 43.9 (10.4) | 48.7 (18.5) | 45.6 (16.9) |
| FEV1 % predicted | 58.3 (11.4) | 46.7 (16.8) | 45.0 (13.9) |
| Count (%) | | | |
| Male | 1818 (75.1) | 433 (72.9) | 835 (77.2) |
| Current smoker | 1144 (47.3) | 134 (22.6) | 458 (42.3) |
| LAMA | 624 (25.8) | 373 (62.8) | 596 (55.0) |
| LABA | 1308 (54.0) | 435 (73.2) | 369 (34.1) |
| ICS | 1258 (52.0) | 441 (74.2) | 513 (47.4) |
| Statin | 1562 (64.5) | 256 (43.1) | 655 (60.5) |
| Oxygen | 69 (2.9) | 90 (15.2) | 75 (6.9) |
| Race: Caucasian | 1887 (77.9) | 502 (84.5) | 877 (81.1) |
| Race: Asian | 349 (14.4) | 3 (0.5) | 154 (14.2) |
| Race: Black | 31 (1.3) | 64 (10.8) | 13 (1.2) |
| Race: Hispanic | 131 (5.4) | 7 (1.2) | 38 (3.5) |
| Race: Indigenous | 23 (1.0) | 18 (3.0) | 0 (0.0) |
| Proportion | | | |
| History of ≥ 1 moderate/severe exacerbation | 0.23 | 0.38 | 0.47 |
| Observed risk of ≥ 1 moderate/severe exacerbation | 0.22 | 0.38 | 0.52 |

BMI = body mass index; FEV1 = forced expiratory volume in one second; ICS = inhaled corticosteroid; LABA = long-acting β2 agonist; LAMA = long-acting muscarinic receptor antagonist; SGRQ = St. George’s Respiratory Questionnaire

3.3.2 Racial Differences in Exacerbation Risk

Table 3-2 summarizes the unadjusted and adjusted racial differences in moderate/severe

exacerbations, as well as the shrunk estimator from the random-effects model. The unadjusted

effect of race on moderate/severe exacerbations was statistically significant in the LRT (P =

0.003). These differences ranged from risk ratios (RRs) of 0.96 for Caucasians to 1.57 for the Indigenous group. After adjusting for ACCEPT, the estimated fixed-effect differences were reduced, corresponding to RRs between 0.97 and 1.28. The random-effects shrinkage further reduced the

differences to a range of 0.99 to 1.07. The more uncertain estimates experienced greater

shrinkage towards 1, as seen in the Indigenous group compared to Caucasians.

Table 3-2. Racial Differences in Exacerbation

| Race | Unadjusted RR | 95% CI | Fixed-effects RR | 95% CI | Random-effects RR |
|---|---|---|---|---|---|
| Caucasian | 0.96 | 0.92 - 1.01 | 0.97 | 0.92 - 1.02 | 0.99 |
| Asian | 1.21 | 1.06 - 1.37 | 1.16 | 1.02 - 1.31 | 1.07 |
| Black | 1.31 | 1.02 - 1.64 | 1.02 | 0.80 - 1.28 | 1.03 |
| Hispanic | 1.01 | 0.80 - 1.25 | 1.11 | 0.88 - 1.37 | 1.04 |
| Indigenous | 1.57 | 1.06 - 2.22 | 1.28 | 0.86 - 1.81 | 1.04 |

RRs for unadjusted, fixed-effects adjusted, and random-effects adjusted exacerbation risk.
CI = confidence interval; RR = risk ratio

3.3.3 The Effect of Individual Predictors

Table 3-3 compares the goodness-of-fit of each nested model with that of its race-adjusted counterpart. Race was a statistically significant predictor of exacerbation risk when

examined with each of ACCEPT’s predictors in isolation, with exacerbation history being the

only exception (P = 0.067). As well, race did not have an independent effect in the model with

ACCEPT’s risk score (P = 0.145).

Table 3-3. Model Series Evaluating Goodness-of-fit

| Nested model | Nested model deviance | Race-adjusted model deviance | Δ Deviance | P-value |
|---|---|---|---|---|
| Exacerbation history | 4567.5 | 4558.8 | 8.8 | 0.067 |
| Sex | 5634.0 | 5615.3 | 18.7 | 0.001 |
| Age | 5637.5 | 5621.8 | 15.7 | 0.003 |
| Oxygen use | 5529.6 | 5513.7 | 15.9 | 0.003 |
| Current smoker | 5613.3 | 5600.3 | 13.0 | 0.011 |
| BMI | 5561.6 | 5551.7 | 9.9 | 0.042 |
| FEV1 % predicted | 5257.4 | 5246.8 | 10.6 | 0.031 |
| SGRQ | 5503.3 | 5483.9 | 19.4 | 0.001 |
| Statin use | 5632.4 | 5614.1 | 18.3 | 0.001 |
| LAMA use | 5405.9 | 5386.4 | 19.5 | 0.001 |
| LABA use | 5640.0 | 5623.7 | 16.2 | 0.003 |
| ICS use | 5596.5 | 5580.4 | 16.1 | 0.003 |
| ACCEPT risk score | 4023.8 | 4016.9 | 6.8 | 0.145 |

A statistically significant improvement (P < 0.05) indicates that goodness-of-fit improved with the addition of race.
BMI = body mass index; FEV1 = forced expiratory volume in one second; ICS = inhaled corticosteroid; LABA = long-acting β2 agonist; LAMA = long-acting muscarinic receptor antagonist; SGRQ = St. George’s Respiratory Questionnaire

3.3.4 The Effect of Adjusting for Race on Model Performance

The AUCs for ACCEPT in predicting future exacerbations in SUMMIT, LOTT, and TORCH were 0.67 (95% CI 0.63 – 0.70), 0.70 (95% CI 0.65 – 0.74), and 0.74 (95% CI 0.71 – 0.77), respectively. For ACCEPT + Race, the change in AUC (Δ AUC) was < 0.01 in all samples and was not statistically significant (P = 0.30, 0.20, and 0.17, respectively). Calibration plots of ACCEPT compared with ACCEPT + Race are presented in Figure 3-1. There were no discernible changes in calibration when ACCEPT was adjusted for race. In SUMMIT, LOTT, and TORCH, the changes in calibration intercept were 0.03, 0.01, and 0.02, and the changes in calibration slope were 0.007, 0.004, and 0.002, respectively. The mean predicted exacerbation risk stratified by race is shown in Table 3-4. ACCEPT appears well calibrated in each race group, with no notable improvement in calibration with ACCEPT + Race. The decision curves comparing ACCEPT with ACCEPT + Race are presented in Figure 3-2. Adjusting for race produced no notable changes in the net benefit of ACCEPT across all treatment thresholds.

Figure 3-1. Calibration plots of ACCEPT and ACCEPT + Race.

The plots compare mean predicted risk with mean observed risk (with 95% confidence intervals) in each decile of predicted risk. (A) SUMMIT; (B) LOTT; (C) TORCH.

Table 3-4. Mean Exacerbation Risk by Race

| Risk | Caucasian | Asian | Black | Hispanic | Indigenous |
|---|---|---|---|---|---|
| Observed | 0.29 | 0.31 | 0.32 | 0.25 | 0.37 |
| ACCEPT | 0.29 | 0.30 | 0.37 | 0.26 | 0.33 |
| ACCEPT + Race | 0.29 | 0.31 | 0.37 | 0.27 | 0.38 |

Figure 3-2. Decision curves of ACCEPT vs. ACCEPT + Race.

The net benefit of the two models is compared in (A) SUMMIT, (B) LOTT, and (C) TORCH.

3.4 Discussion

The applicability of prediction models across different settings is an essential element to their

future success in clinical implementation. The adaptability of prediction models differentiates

them from binary risk classifiers because they can be revised through model updates. Often in

prediction modeling, the focus is largely placed on clinical variables representing disease status

and severity. However, health outcomes can also be affected by individual- and system-level

factors. One such factor is race. In this study, I identified an association between race and COPD

exacerbations. This association was largely mitigated when accounting for ACCEPT’s overall

risk predictions. By examining each of ACCEPT’s predictors individually, I showed that the

single most important predictor explaining this effect was exacerbation history. The association between race and COPD exacerbations was further attenuated after applying a random-effects approach. Consistent with this, I showed that incorporating a race-adjustment factor into ACCEPT did not notably improve traditional metrics of predictive performance or clinical utility. My analyses suggest that the current combination of ACCEPT’s predictors successfully captures the race effect for risk prediction of COPD exacerbations.

Race correction in the field of prediction algorithms has been a long-debated topic. There are

many concerns regarding the inclusion of race in prediction algorithms, with most arguments

stemming from the potential of these algorithms to propagate race-based medicine and

perpetuate health inequities across marginalized racial groups.79,88 There are still many prediction

algorithms that incorporate race as a feature. Although there is mounting evidence to suggest that

race is not a biological construct, it is still possible for the race variable to yield predictive

power. There are concerns that entirely ignoring race may decrease predictive performance

overall which could lead to substandard clinical decision-making and yield negative health

consequences for all races.80,81

My study has several strengths. Primarily, I presented a rigorous framework for assessing

different aspects of model performance following an update. I focused specifically on race as a

factor that could convey information on varying background exacerbation risks between

populations. However, this framework could be applied to any of the aforementioned individual-

or system-level factors. My analysis also did not entirely refit the model; instead, it used an intercept adjustment to avoid overfitting the model in this sample. With this approach, I showed that the racial differences after adjustment were negligible and, as such, race-adjustment may not be

necessary. My study also employed a random-effects method which is related to hierarchical

modeling and recognized as more rigorous for developing a setting-specific adjustment

factor.39,83–86 I illustrated that the random-effects method resulted in smaller racial differences

compared to the traditional fixed-effects method. By taking the uncertainty of the estimated

differences into account, this approach reduced any exaggerated differences between settings.39

This ‘shrinkage’ of estimates is preferred for predictive purposes.83,89–92

Limitations for my study should be acknowledged. The sample sizes for non-Caucasian groups

were relatively small as the sample was largely Caucasian. Nonetheless, a statistically significant

difference in exacerbation risk was still detected between race groups. Further examination of

setting-specific adjustment factors is warranted in larger and more diverse samples. Another

concern is the limitation of race data obtained from clinical trials. Broad assumptions are made

regarding how certain races are classified which can result in the potential misclassification of

race.93,94 Mismeasurement of race can also occur when races are generalized into broader groups

(e.g., Asian including Southeast Asian, South Asian, Japanese, etc.). Nonetheless, compared with existing race-adjusted prediction algorithms, which typically examine no more than two race groups, my study evaluated more race categories. Lastly, the need to impute missing predictor values was a limitation. However, the only missing values were SGRQ scores (in TORCH and SUMMIT) and LAMA use (in TORCH), and ACCEPT has been shown to be largely robust to these missing values.2

An important point of discussion is that the results of my study do not capture the nuances of

algorithm fairness or address whether race should be included in prediction algorithms based on these

considerations.79,95 Rather, my analyses strictly pertain to the statistical quality and predictive

performance of ACCEPT. The study of algorithm discrimination, biases, and fairness has

become more prominent with the emergence of prediction algorithms. The focus of this field is

to dissect algorithm biases and discuss whether the algorithm propagates discrimination

inherently or through the inclusion of variables that are entwined with biases.79,96,97 The

measurement of these biases is separate from measurements of predictive performance and

should be addressed for all prediction algorithms. Both aspects of model evaluation should be

considered, and thus future studies are needed to investigate algorithmic biases in ACCEPT.

In Chapter 2, I showed that there is significant unexplained heterogeneity in the background

exacerbation risk across clinical settings that warrants setting-specific model updating. When a

prediction model fails to accurately predict risk, it could be the result of a variable not accounted

for by the model that can explain part of such heterogeneity. In this chapter, I detected an

association between race and exacerbation risk and explored the variable race as a potential

candidate. I found that a race-adjustment factor did not notably improve ACCEPT’s predictive

performance or clinical utility. As such, race is unlikely to have played a substantial role in the variability observed in the previous analysis. With regard to ACCEPT’s ability to produce valid

predictions, there may not be a need to include a racial component. The challenge still stands in

seeking and developing innovative model updating strategies to refine existing models before

implementation in real-world settings, as well as ensuring algorithmic fairness across subgroups

of patients with COPD.

Chapter 4: Conclusion

4.1 Overview and contribution

In this thesis, I assessed the generalizability of COPD exacerbation risk prediction models and

used model updating strategies to improve model applicability across different clinical settings.

Both chapters present a framework in which prediction model performance can be evaluated.

In Chapter 2, I examined 3 risk stratification algorithms, which included 2 externally validated

prediction models (ACCEPT and Bertens) compared to exacerbation history alone (status quo),

in 3 clinical trials. I measured the algorithms’ clinical utility with a decision curve analysis and

their predictive performance with discrimination and calibration. Following this, I recalibrated

both prediction models with a monotonic intercept adjustment using the background

exacerbation risk in each sample and evaluated whether there were improvements to their

clinical utility. The results of my analysis highlighted the potential risk of harm when risk

stratification algorithms are naively applied in different settings. Exacerbation history alone is

the current gold standard used by major COPD guidelines for risk-stratifying patients and

recommending therapy. However, exacerbation history is not capable of adapting to different

patient populations due to its inflexible nature. Comparatively, prediction models are capable of

being revised and my analysis showed that they provided more accurate risk predictions.

Although the prediction models also posed a risk of harm when miscalibrated, this was

substantially mitigated once they were recalibrated. These results remained consistent across the

3 sample cohorts which represented varying levels of background exacerbation risk. A major

clinical implication of my findings from Chapter 2 is that risk stratification algorithms should not

be universally applied. Rather, careful consideration should be taken in re-evaluating the

algorithm for different patient populations and adapting based on the overall outcome risk.

Ultimately, these results provide a cautionary tale for ‘off-the-shelf’ use of prediction models but

shed light on potential strategies to improve performance and ensure their future success when

implemented into COPD care.

Information on a population’s exact background exacerbation risk is not always available for use

to correct risk prediction models. Unexplained differences in exacerbation risk across

populations can also be attributed to patient-, system-, or setting-level variables not accounted

for by the prediction model’s included predictors. These variables often represent subgroups

(e.g., race, geographic region, socioeconomic status, gender roles) that encompass a combination

of factors resulting in different exacerbation risks. Thus, adjustment factors have been

incorporated into prediction models in other disease domains to account for this heterogeneity.38,39,58 In Chapter 3, I focused on the effect of race on exacerbation risk and evaluated whether a

race-adjustment factor could improve ACCEPT’s model performance. Using data from the same

3 clinical trials, I first estimated the unadjusted and adjusted racial differences in exacerbation

risk. I showed that there was an association between race and exacerbation risk which

disappeared when I controlled for predicted exacerbation risk (using ACCEPT). This indicates

that the observed racial differences in exacerbation risk can be explained by the differences in

the distribution of predictors that are included in ACCEPT. By assessing each predictor

individually, I narrowed in on exacerbation history as the most important predictor explaining

this effect. Additionally, I utilized a random-effects methodology to demonstrate that the racial

differences could be further reduced when accounting for uncertainty in the estimated

differences. I developed a race-adjustment factor for ACCEPT, by applying this same

methodology, and compared model discrimination, calibration, and clinical utility between the

base and race-adjusted model. I found that there were no significant improvements to predictive

performance or clinical utility following race-adjustment. The implication of my findings is that

ACCEPT does not require a race-adjustment factor with regard to producing valid predictions.

Ultimately, the challenge remains to find a population-specific adjustment factor to enhance the

applicability of COPD exacerbation risk prediction models.
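The random-effects step can be illustrated with empirical Bayes shrinkage of group-specific intercept offsets. Here an offset means a log-odds difference in exacerbation risk after controlling for predicted risk. The sketch assumes a normal-normal model with a DerSimonian-Laird (method-of-moments) estimate of the between-group variance. This is one standard choice and not necessarily the specification used in Chapter 3. The function names and the offsets and standard errors in the example are hypothetical. Imprecise offsets, such as those from small subgroups, are pulled furthest toward the common mean. This is how accounting for uncertainty shrinks the estimated group differences.

```python
def dersimonian_laird_tau2(estimates: list[float], std_errors: list[float]) -> float:
    """Method-of-moments (DerSimonian-Laird) estimate of between-group variance."""
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two groups")
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed_mean = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, estimates))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    return max(0.0, (q - (k - 1)) / c)

def empirical_bayes_shrink(estimates: list[float], std_errors: list[float]):
    """Shrink each group offset toward the random-effects mean.

    Under y_i ~ N(theta_i, s_i^2) and theta_i ~ N(mu, tau^2), the posterior mean
    of theta_i is mu + B_i * (y_i - mu), with B_i = tau^2 / (tau^2 + s_i^2).
    Returns (mu, tau2, shrunken_offsets).
    """
    tau2 = dersimonian_laird_tau2(estimates, std_errors)
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    mu = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    shrunk = []
    for y, se in zip(estimates, std_errors):
        b = tau2 / (tau2 + se ** 2)  # 0 = full pooling, 1 = no shrinkage
        shrunk.append(mu + b * (y - mu))
    return mu, tau2, shrunk

# Hypothetical offsets for five groups relative to a reference group.
mu, tau2, shrunk = empirical_bayes_shrink([0.0, 0.15, 0.30, -0.20, 0.40],
                                          [0.05, 0.12, 0.25, 0.20, 0.35])
```

Each shrunken offset lies between its raw value and the pooled mean `mu`. The spread of the adjusted offsets is therefore never wider than that of the raw estimates.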

4.2 Strengths of this research

There are several strengths to my thesis. Compared to other clinical domains,40 the evaluation of

COPD exacerbation risk prediction models in new cohorts has received little attention.44,76 My work

contributes to this area of study by evaluating existing COPD exacerbation risk prediction

models with the goal of refining them for future implementation in different clinical settings.

Consistent with findings from Gulati et al.’s evaluation of cardiovascular disease risk

prediction models,40 I showed that risk prediction models in the COPD domain also exhibit

performance loss when applied to different cohorts. I also used state-of-the-art methodology

for model performance assessment and updating. I demonstrated in Chapter 2 that a simple

monotonic fixed-effect intercept adjustment based on the background outcome risk is capable of

significantly improving the clinical utility of a risk prediction model. Chapter 3 explored a more

nuanced method, using a random-effects methodology to develop an intercept adjustment

factor for race, although including race added minimal predictive value for exacerbation risk.

Another strength of this research is the diversity of the data used to evaluate these

models. I used data from three landmark multicentered COPD clinical trials (two of them international)

that represented different levels of exacerbation risk. Although clinical trial data have limitations,

the diversity of this sample should better reflect the heterogeneous COPD

patient population. The model evaluation framework is one of the main strengths of this research.

I measured model performance with regard to both discrimination and calibration. These

measures capture, respectively, the models’ ability to distinguish high-risk individuals and how

well the predicted risks align with the observed risks. In addition to statistical

measures, I also assessed clinical utility, that is, whether the prediction model improved

clinical decision-making. Overall, my thesis provides a thorough assessment of COPD

exacerbation risk prediction model performance and opens the door to potential model refinement

strategies.
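The two statistical performance dimensions above can be made concrete with a small sketch:

- the c-statistic (discrimination), computed as the proportion of case/non-case pairs ranked correctly;
- calibration-in-the-large, computed as observed versus mean predicted risk;
- grouped calibration, as in the decile calibration plots of Appendix A.2.

The sketch assumes a binary outcome (any exacerbation during follow-up) with complete follow-up. It ignores censoring, which time-dependent AUC methods such as those in references 68 and 69 are designed to handle. The function names are hypothetical.

```python
def c_statistic(risk: list[float], outcome: list[int]) -> float:
    """Probability that a random case has higher predicted risk than a
    random non-case; tied pairs count as 1/2."""
    cases = [r for r, y in zip(risk, outcome) if y == 1]
    non_cases = [r for r, y in zip(risk, outcome) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for rc in cases:
        for rn in non_cases:
            if rc > rn:
                score += 1.0
            elif rc == rn:
                score += 0.5
    return score / (len(cases) * len(non_cases))

def calibration_in_the_large(risk: list[float], outcome: list[int]):
    """Return (observed risk, mean predicted risk, observed/expected ratio)."""
    observed = sum(outcome) / len(outcome)
    expected = sum(risk) / len(risk)
    return observed, expected, observed / expected

def grouped_calibration(risk: list[float], outcome: list[int], groups: int = 10):
    """(mean predicted, observed) risk within quantile groups of predicted
    risk (deciles by default), as plotted in a calibration plot."""
    order = sorted(range(len(risk)), key=lambda i: risk[i])
    n = len(order)
    rows = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if idx:
            rows.append((sum(risk[i] for i in idx) / len(idx),
                         sum(outcome[i] for i in idx) / len(idx)))
    return rows
```

Take hypothetical risks `[0.1, 0.4, 0.35, 0.8]` with outcomes `[0, 0, 1, 1]`. Three of the four case/non-case pairs are ranked correctly, so the c-statistic is 0.75. The observed risk is 0.5 against a mean predicted risk of 0.4125, so the model under-predicts on average.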

4.3 Limitations of this research

While well-phenotyped and rigorously collected clinical trial data ensure high internal

validity, the applicability of trial results to ‘real-world’ COPD patients remains less clear. The

strict inclusion and exclusion criteria that clinical trials use to maintain internal validity

ultimately restrict their generalizability. Individuals willing to participate in clinical trials can be

inherently different from those in the patient population who do not participate. COPD

exacerbation burden and management in the tightly controlled environment of a clinical trial also

differ from the real world. Missing predictor values requiring imputation are often an issue

in clinical trial data, and in clinical practice there will likely be

many instances in which predictor values are missing. Thus, it is crucial for model predictions to

be robust under these circumstances. Defining and measuring a factor like race can also be

challenging, and, especially in the context of clinical trial data, there are concerns about

misclassifying individuals. Chapter 3 explores the race variable, which is broadly defined in the

included clinical trials but is a complex construct. Unfortunately, no standard is universally used

for how granular a variable such as ‘race’ or ‘region’ should be, and, as such,

assumptions are needed to classify such variables for interpretation. For example, the clinical

trials define Asian as a single broad classification. However, the Asian group is heterogeneous, as

it encompasses a variety of subgroups (e.g., South Asian, Southeast Asian, Korean, Japanese).

Additionally, because sample size was limited, it was difficult to further investigate

model performance within race subgroups, such as Asian males compared to Asian females.

My thesis does not address the nuances of algorithmic fairness or whether race adjustment is

appropriate in this context.79,95 Additional studies with specific outcome measures for algorithmic

fairness are needed to assess whether race adjustment in COPD exacerbation risk prediction

perpetuates race-based health inequities. My analyses are limited to the model’s predictive

properties and potential clinical utility in the realm of decision-making. My research

evaluated only two COPD exacerbation risk prediction models, with further investigation into the

refinement of one model (ACCEPT), so the findings may not be generalizable to other models in

this domain. Currently, there are no reviews evaluating model generalizability in the COPD

domain that are comparable to Gulati et al.’s extensive review.40 Nonetheless, to apply a

similar assessment framework to COPD exacerbation risk prediction models, I focused on

prediction models that had potential for use in clinical practice. Guerra et al.’s systematic

review of COPD exacerbation risk prediction models44 found that most were at high risk of bias

owing to unsound statistical methods used in model development and an overall lack of external

validation. The models evaluated in this thesis were developed using sound statistical methods

and have been externally validated.1,3,44,45 Lastly, a notable limitation in interpreting clinical utility

within this thesis is the use of decision curve analysis without formal treatment thresholds

for the management of COPD exacerbations. Because the decision curves are plotted across the

entire range of possible thresholds, these results can be revisited once treatment thresholds are

formally determined.
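Decision curve analysis summarizes clinical utility as net benefit at a risk threshold p_t. Net benefit is NB(p_t) = TP/n − (FP/n) × p_t/(1 − p_t), where TP and FP are the numbers of true and false positives among n patients. A model's net benefit is compared against treating everyone and against treating no one (net benefit 0). Computing it over any grid of thresholds means the curve can be re-read at whatever formal threshold is eventually agreed on.

This sketch assumes a binary outcome and treats patients whose predicted risk is at or above the threshold. A binary rule such as exacerbation history can be evaluated on the same footing by passing its 0/1 indicator as the "risk". The function names are hypothetical.

```python
def net_benefit(risk: list[float], outcome: list[int], threshold: float) -> float:
    """Net benefit of treating everyone whose risk is >= threshold."""
    n = len(outcome)
    tp = sum(1 for r, y in zip(risk, outcome) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risk, outcome) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(outcome: list[int], threshold: float) -> float:
    """Net benefit of the 'treat everyone' strategy."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

def decision_curve(risk: list[float], outcome: list[int], thresholds: list[float]):
    """Rows of (threshold, model NB, treat-all NB, treat-none NB)."""
    return [(t, net_benefit(risk, outcome, t), net_benefit_treat_all(outcome, t), 0.0)
            for t in thresholds]

# Hypothetical data: model risks versus a binary exacerbation-history rule.
risk = [0.1, 0.4, 0.35, 0.8]
history = [0, 1, 0, 1]
outcome = [0, 0, 1, 1]
model_curve = decision_curve(risk, outcome, [0.1, 0.2, 0.3])
history_curve = decision_curve(history, outcome, [0.1, 0.2, 0.3])
```

At a threshold of 0.3 in this toy example, the model treats two cases and one non-case. The history rule treats one of each, so its net benefit is lower. Comparisons of this kind are how a static binary classifier can be shown to be inferior, or even harmful relative to treat-all, at particular thresholds.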

4.4 Implications for practice

My findings have important implications for advancing the management of COPD.

Whereas contemporary cardiovascular disease guidelines rely on prediction algorithms to

generate objective risk scores,51 major COPD guidelines still fail to capitalize on the potential of

quantitative risk prediction and rely on binary classifications, such as ‘frequent exacerbator’, to

inform treatment.5,17,21 There is mounting evidence to suggest that exacerbation history alone is

suboptimal for risk-stratifying patients.26,27,29 Building on this, my thesis provides further

evidence of the shortcomings of relying on exacerbation history and also identifies

situations in which it can be potentially harmful in the context of clinical decision-making. With

EHRs becoming increasingly prominent in routine care, integrating prediction models into

COPD management becomes increasingly practical. My results add to the current literature by

showing that COPD exacerbation risk prediction models are superior to the status quo in

predicting future risk. Once objective risk prediction is adopted in routine COPD care, it

will open opportunities for studies to determine specific risk thresholds for treatment

and for updating major COPD guidelines with these thresholds.

Despite this promise, my results also show that caution must be taken when using prediction

models in different clinical settings, as they can be miscalibrated and, as a result, potentially

harmful. Critically, this illustrates the importance of continual model re-evaluation to ensure that a model

is performing up to standard. The silver lining is that prediction models can be

updated and recalibrated to different settings, whereas binary classifiers like exacerbation history

are static in nature. The integration of models into EHRs could allow them to be periodically

updated based on their setting-specific performance metrics, given the ability of EHR systems to

collect practice data over time.98 While the particular variable that I investigated (race) did not have a

strong effect on the predictability of future exacerbations, my study still raises awareness of the need to

consider non-clinical variables in future risk prediction models. It also promotes a state-of-the-art

quantitative framework for doing so properly. This can ultimately enhance the

applicability of COPD exacerbation risk prediction models across different settings without

the need to completely refit the model each time. Overall, this thesis provides perspective on the

nuances of risk prediction in COPD and is an additional stepping stone on the path to

incorporating quantitative risk prediction into routine care.
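Periodic, setting-specific re-evaluation in an EHR could be as simple as tracking the observed-to-expected (O/E) event ratio in recent data. The model would be flagged for recalibration when a confidence interval for that ratio excludes 1. This sketch uses the common log-scale approximation SE(log O/E) ≈ 1/√O, which treats the observed event count O as Poisson. The monitoring rule, the 95% level, and the function name `oe_check` are hypothetical choices, not a procedure proposed in the thesis.

```python
import math

def oe_check(risk: list[float], outcome: list[int], z: float = 1.96):
    """Observed/expected event ratio with an approximate confidence interval.

    Uses SE(log(O/E)) ~= 1/sqrt(O), treating the observed count O as Poisson.
    Returns (flag_for_recalibration, ratio, lower, upper).
    """
    observed = sum(outcome)
    expected = sum(risk)
    if observed <= 0 or expected <= 0:
        raise ValueError("need at least one observed event and positive expected risk")
    ratio = observed / expected
    half_width = z / math.sqrt(observed)
    lower = ratio * math.exp(-half_width)
    upper = ratio * math.exp(half_width)
    # Flag when the interval excludes perfect calibration-in-the-large (O/E = 1).
    flag = not (lower <= 1.0 <= upper)
    return flag, ratio, lower, upper
```

Consider a hypothetical window of 1,000 patients with 220 observed events, mirroring the unadjusted predicted (0.34) versus observed (0.22) risk for SUMMIT in Appendix A.3. A uniform predicted risk of 0.34 gives O/E ≈ 0.65 and the model is flagged. A predicted risk of 0.22 gives O/E ≈ 1, and the model is not flagged.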

4.5 Implications for future research

My research only begins to highlight the abundant possibilities of predictive analytics in the

management of COPD. Compared to other disease domains, evidence-based ‘Precision

Medicine’ in COPD is still in its infancy. This thesis serves as a primer for the COPD clinical and

research community to move beyond exacerbation history and embrace more nuanced

multivariable risk stratification strategies. I showed that a single COPD

exacerbation risk prediction model based on clinical variables alone is unlikely to be universally

applicable. Future research on COPD prediction models should be prepared to explicitly include

background risk as a predictor. The search for variables that capture this background risk continues and presents opportunities

to explore other factors, such as geographic region, socioeconomic status, or gender roles, to fully

capitalize on the adaptability of prediction models.

As previously mentioned, a notable limitation of my thesis is its sole reliance on clinical trial

data to evaluate and update the models. Thus, evaluating the models in practice-generated data is

a crucial next step to assess their performance in a real-world setting. Decision curve analysis

is also only a prerequisite for examining clinical utility. Multicentered clinical trials using local

data would be ideal to assess whether implementing a risk prediction model in routine

care yields greater clinical benefit than the current standard of care. Multiple

setting-specific trials would be needed to accurately capture the differences between settings.

Future studies should also evaluate the cost-effectiveness of integrating an

exacerbation risk prediction model into routine care. There are costs associated with setting up

the infrastructure to support the prediction model in EHRs, such as the initial setup of the

model in local systems, maintenance, personnel training, and time costs. Assessing the cost-effectiveness

of prediction models is essential to inform decision-makers on the value of

adopting the model into their healthcare system. Moreover, the integration of prediction

models into routine COPD management would present clinicians and patients with quantitative

risk scores. Future studies are needed to identify relevant treatment thresholds and incorporate

these thresholds into guideline management recommendations. This is an important next step in

advancing evidence-informed care and shared decision-making for COPD patients.

Lastly, there is a high ceiling for advancing predictive analytics in COPD care. Model prediction

accuracy can continue to improve as more clinical data become available to allow for model

updates. As seen in other sectors, prediction model revision can be automated to optimize

predictive accuracy. This opens future research opportunities to incorporate machine learning

into clinical prediction models and maximize their capabilities. There are many possible routes

for future research in prediction modeling. Whether it is determining a setting-specific

adjustment factor, assessing clinical benefit and value, or exploring opportunities for machine

learning, all routes ultimately lead to the advancement of personalized care for COPD patients.

References

1. Adibi, A. et al. The Acute COPD Exacerbation Prediction Tool (ACCEPT): a modelling

study. Lancet Respir Med 8, 1013–1021 (2020).

2. Safari, A. et al. ACCEPT 2·0: Recalibrating and externally validating the Acute COPD

exacerbation prediction tool (ACCEPT). EClinicalMedicine 51, 101574 (2022).

3. Bertens, L. C. M. et al. Development and validation of a model to predict the risk of

exacerbations in chronic obstructive pulmonary disease. Int J Chron Obstruct Pulmon Dis

8, 493–499 (2013).

4. Evans, J., Chen, Y., Camp, P. G., Bowie, D. M. & McRae, L. Estimating the prevalence of

COPD in Canada: Reported diagnosis versus measured airflow obstruction. Health Rep 25,

3–11 (2014).

5. 2022 GOLD Reports. Global Initiative for Chronic Obstructive Lung Disease - GOLD

https://goldcopd.org/2022-gold-reports-2/.

6. Barnes, P. J. et al. Chronic obstructive pulmonary disease. Nat Rev Dis Primers 1, 15076

(2015).

7. Barnes, P. J. Sex Differences in Chronic Obstructive Pulmonary Disease Mechanisms. Am J

Respir Crit Care Med 193, 813–814 (2016).

8. Soriano, J. B. et al. Global, regional, and national deaths, prevalence, disability-adjusted

life years, and years lived with disability for chronic obstructive pulmonary disease and

asthma, 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015.

The Lancet Respiratory Medicine 5, 691–706 (2017).

9. Khakban, A. et al. Ten-Year Trends in Direct Costs of COPD. Chest 148, 640–646 (2015).

10. López-Campos, J. L., Tan, W. & Soriano, J. B. Global burden of COPD. Respirology 21,

14–23 (2016).

11. World Health Organization. The top 10 causes of death. 2020.

https://www.who.int/en/news-room/fact-sheets/detail/the-top-10-causes-of-death.

12. Khakban, A. et al. The Projected Epidemic of Chronic Obstructive Pulmonary Disease

Hospitalizations over the Next 15 Years. A Population-based Perspective. Am J Respir Crit

Care Med 195, 287–291 (2017).

13. Najafzadeh, M. et al. Future impact of various interventions on the burden of COPD in

Canada: a dynamic population model. PLoS ONE 7, e46746 (2012).

14. Celli, B. R. & Barnes, P. J. Exacerbations of chronic obstructive pulmonary disease. Eur

Respir J 29, 1224–1238 (2007).

15. Canadian Institute for Health Information (CIHI). Hospital stays in Canada. https://www.cihi.ca/en/hospital-stays-in-canada.

16. Vogelmeier, C. F. et al. Global Strategy for the Diagnosis, Management, and Prevention of

Chronic Obstructive Lung Disease 2017 Report. GOLD Executive Summary. Am J Respir

Crit Care Med 195, 557–582 (2017).

17. Bourbeau, J. et al. Canadian Thoracic Society Clinical Practice Guideline on

pharmacotherapy in patients with COPD – 2019 update of evidence. Canadian Journal of

Respiratory, Critical Care, and Sleep Medicine 3, 210–232 (2019).

18. Kim, V. & Aaron, S. D. What is a COPD exacerbation? Current definitions, pitfalls,

challenges and opportunities for improvement. Eur Respir J 52, 1801261 (2018).

19. 2021 report. Global Initiative for Chronic Obstructive Lung Disease - GOLD

https://goldcopd.org/2021-gold-reports/.

20. Johnson, J. D. & Theurer, W. M. A stepwise approach to the interpretation of pulmonary

function tests. Am Fam Physician 89, 359–366 (2014).

21. Singh, D. et al. Global Strategy for the Diagnosis, Management, and Prevention of Chronic

Obstructive Lung Disease: the GOLD science committee report 2019. Eur Respir J 53,

1900164 (2019).

22. 2023 GOLD Reports. Global Initiative for Chronic Obstructive Lung Disease - GOLD

https://goldcopd.org/2022-gold-reports-2/.

23. Hurst, J. R. et al. Susceptibility to exacerbation in chronic obstructive pulmonary disease. N

Engl J Med 363, 1128–1138 (2010).

24. Obeidat, M., Sadatsafavi, M. & Sin, D. D. Precision health: treating the individual patient

with chronic obstructive pulmonary disease. Med J Aust 210, 424–428 (2019).

25. Marott, J. L. et al. Exacerbation history, severity of dyspnoea and maintenance treatment

predicts risk of future exacerbations in patients with COPD in the general population.

Respir Med 192, 106725 (2022).

26. Han, M. K. et al. Frequency of exacerbations in patients with chronic obstructive

pulmonary disease: an analysis of the SPIROMICS cohort. Lancet Respir Med 5, 619–626

(2017).

27. Calverley, P. M. et al. Determinants of exacerbation risk in patients with COPD in the

TIOSPIR study. Int J Chron Obstruct Pulmon Dis 12, 3391–3405 (2017).

28. Sadatsafavi, M. Should the number of acute exacerbations in the previous year be used to

guide treatments in COPD? Eur Respir J Aug 27, (2020).

29. Sadatsafavi, M. et al. Should the number of acute exacerbations in the previous year be

used to guide treatments in COPD? Eur Respir J 57, 2002122 (2021).

30. Pencina, M. J. & Peterson, E. D. Moving From Clinical Trials to Precision Medicine: The

Role for Predictive Modeling. JAMA 315, 1713–1714 (2016).

31. D’Agostino, R. B., Grundy, S., Sullivan, L. M., Wilson, P. & CHD Risk Prediction

Group. Validation of the Framingham Coronary Heart Disease Prediction Scores: Results of

a Multiple Ethnic Groups Investigation. JAMA 286, 180 (2001).

32. Steyerberg, E. W. & Vergouwe, Y. Towards better clinical prediction models: seven steps

for development and an ABCD for validation. European Heart Journal 35, 1925–1931

(2014).

33. Steyerberg, E. W. Clinical prediction models: a practical approach to development,

validation, and updating. (Springer, 2019).

34. Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. Transparent reporting of a

multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the

TRIPOD Statement. BMC Medicine 13, 1 (2015).

35. Pencina, M. J. & D’Agostino, R. B. Evaluating Discrimination of Risk Prediction Models:

The C Statistic. JAMA 314, 1063 (2015).

36. Vickers, A. J. & Elkin, E. B. Decision curve analysis: a novel method for evaluating

prediction models. Med Decis Making 26, 565–574 (2006).

37. Sadatsafavi, M. et al. Moving beyond AUC: decision curve analysis for quantifying net

benefit of risk prediction models. Eur Respir J 58, 2101186 (2021).

38. Smits, M. et al. Predicting intracranial traumatic findings on computed tomography in

patients with minor head injury: the CHIP prediction rule. Ann Intern Med 146, 397–405

(2007).

39. Steyerberg, E. W., Eijkemans, M. J. C., Boersma, E. & Habbema, J. D. F. Applicability of

clinical prediction models in acute myocardial infarction: a comparison of traditional and

empirical Bayes adjustment methods. Am Heart J 150, 920.e11–920.e17 (2005).

40. Gulati, G. et al. Generalizability of Cardiovascular Disease Clinical Prediction Models: 158

Independent External Validations of 104 Unique Models. Circ Cardiovasc Qual Outcomes

15, e008487 (2022).

41. Steyerberg, E. W., Borsboom, G. J. J. M., van Houwelingen, H. C., Eijkemans, M. J. C. &

Habbema, J. D. F. Validation and updating of predictive logistic regression models: a study

on sample size and shrinkage. Stat Med 23, 2567–2586 (2004).

42. Moons, K. G. M. et al. Risk prediction models: II. External validation, model updating, and

impact assessment. Heart 98, 691–698 (2012).

43. Vergouwe, Y. et al. A closed testing procedure to select an appropriate method for updating

prediction models. Stat Med 36, 4529–4539 (2017).

44. Guerra, B., Gaveikaite, V., Bianchi, C. & Puhan, M. A. Prediction models for exacerbations

in patients with COPD. Eur Respir Rev 26, 160061 (2017).

45. Bhatt, S. P. COPD exacerbations: finally, a more than ACCEPTable risk score. The Lancet

Respiratory Medicine 8, 939–941 (2020).

46. Albert, R. K. et al. Azithromycin for Prevention of Exacerbations of COPD. N Engl J Med

365, 689–698 (2011).

47. Criner, G. J. et al. Simvastatin for the Prevention of Exacerbations in Moderate-to-Severe

COPD. N Engl J Med 370, 2201–2210 (2014).

48. Aaron, S. D. et al. Tiotropium in Combination with Placebo, Salmeterol, or Fluticasone–

Salmeterol for Treatment of Chronic Obstructive Pulmonary Disease: A Randomized Trial.

Annals of Internal Medicine 146, 545 (2007).

49. Jones, P. W., Quirk, F. H. & Baveystock, C. M. The St George’s Respiratory

Questionnaire. Respir Med 85 Suppl B, 25–31; discussion 33–37 (1991).

50. Jones, P. W. et al. Development and first validation of the COPD Assessment Test. Eur

Respir J 34, 648–654 (2009).

51. Rossello, X. et al. Risk prediction tools in cardiovascular disease prevention: A report from

the ESC Prevention of CVD Programme led by the European Association of Preventive

Cardiology (EAPC) in collaboration with the Acute Cardiovascular Care Association

(ACCA) and the Association of Cardiovascular Nursing and Allied Professions (ACNAP).

Eur J Prev Cardiol 26, 1534–1544 (2019).

52. Laupacis, A., Sekar, N. & Stiell, I. G. Clinical prediction rules. A review and suggested

modifications of methodological standards. JAMA 277, 488–494 (1997).

53. Michaux, K. D. et al. IMplementing Predictive Analytics towards efficient COPD

Treatments (IMPACT): protocol for a stepped-wedge cluster randomized impact study.

Diagn Progn Res 7, 3 (2023).

54. Calverley, P. M. A. et al. International Differences in the Frequency of COPD

Exacerbations Reported in Three Clinical Trials. Am J Respir Crit Care Med 206, 25–33

(2022).

55. Eisner, M. D. et al. Socioeconomic status, race and COPD health outcomes. J Epidemiol

Community Health 65, 26–34 (2011).

56. Ejike, C. O. et al. Contribution of Individual and Neighborhood Factors to Racial

Disparities in Respiratory Outcomes. Am J Respir Crit Care Med 203, 987–997 (2021).

57. Perez, T. A. et al. Sex differences between women and men with COPD: A new analysis of

the 3CIA study. Respiratory Medicine 171, 106105 (2020).

58. Steyerberg, E. W. et al. Perioperative mortality of elective abdominal aortic aneurysm

surgery. A clinical prediction rule based on literature and individual patient data. Arch

Intern Med 155, 1998–2004 (1995).

59. Chatila, W. M., Wynkoop, W. A., Vance, G. & Criner, G. J. Smoking patterns in African

Americans and whites with advanced COPD. Chest 125, 15–21 (2004).

60. Mamary, A. J. et al. Race and Gender Disparities are Evident in COPD Underdiagnoses

Across all Severities of Measured Airflow Obstruction. Chronic Obstr Pulm Dis 5, 177–

184 (2018).

61. Chatila, W. M. et al. Advanced emphysema in African-American and white patients: do

differences exist? Chest 130, 108–118 (2006).

62. Diaz, A. A. et al. Differences in Health-Related Quality of Life Between New Mexican

Hispanic and Non-Hispanic White Smokers. Chest 150, 869–876 (2016).

63. Aldrich, M. C. et al. Genetic ancestry-smoking interactions and lung function in African

Americans: a cohort study. PLoS One 7, e39541 (2012).

64. Vestbo, J. et al. Fluticasone furoate and vilanterol and survival in chronic obstructive

pulmonary disease with heightened cardiovascular risk (SUMMIT): a double-blind

randomised controlled trial. Lancet 387, 1817–1826 (2016).

65. The Long-Term Oxygen Treatment Trial Research Group. A Randomized Trial of Long-

Term Oxygen for COPD with Moderate Desaturation. N Engl J Med 375, 1617–1627

(2016).

66. Vestbo, J. & TORCH Study Group. The TORCH (towards a revolution in COPD health)

survival study protocol. Eur Respir J 24, 206–210 (2004).

67. DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or

more correlated receiver operating characteristic curves: a nonparametric approach.

Biometrics 44, 837–845 (1988).

68. Chiang, C.-T. & Hung, H. Non-parametric estimation for time-dependent AUC. Journal of

Statistical Planning and Inference 140, 1162–1174 (2010).

69. Hung, H. & Chiang, C.-T. Estimation methods for time-dependent AUC models with

survival data. The Canadian Journal of Statistics / La Revue Canadienne de Statistique 38,

8–26 (2010).

70. Vickers, A. J., Cronin, A. M., Elkin, E. B. & Gonen, M. Extensions to decision curve

analysis, a novel method for evaluating diagnostic tests, prediction models and molecular

markers. BMC Med Inform Decis Mak 8, 53 (2008).

71. Vickers, A. J., van Calster, B. & Steyerberg, E. W. A simple, step-by-step guide to

interpreting decision curve analysis. Diagn Progn Res 3, 18 (2019).

72. Sadatsafavi, M., Tavakoli, H. & Safari, A. Marginal Versus Conditional Odds Ratios When

Updating Risk Prediction Models. Epidemiology 33, 555–558 (2022).

73. Taylor, S. P., Sellers, E. & Taylor, B. T. Azithromycin for the Prevention of COPD

Exacerbations: The Good, Bad, and Ugly. Am J Med 128, 1362.e1–6 (2015).

74. Yebyo, H. G. et al. Personalising add-on treatment with inhaled corticosteroids in patients

with chronic obstructive pulmonary disease: a benefit-harm modelling study. Lancet Digit

Health 3, e644–e653 (2021).

75. Moons, K. G. M. et al. Risk prediction models: I. Development, internal validation, and

assessing the incremental value of a new (bio)marker. Heart 98, 683–690 (2012).

76. Bellou, V., Belbasis, L., Konstantinidis, A. K., Tzoulaki, I. & Evangelou, E. Prognostic

models for outcome prediction in patients with chronic obstructive pulmonary disease:

systematic review and critical appraisal. BMJ 367, l5358 (2019).

77. Wessler, B. S. et al. External Validations of Cardiovascular Clinical Prediction Models: A

Large-Scale Review of the Literature. Circ Cardiovasc Qual Outcomes 14, e007858 (2021).

78. Ho, J. K. et al. Generalizability of Risk Stratification Algorithms for Exacerbations in

COPD. Chest S0012369222042167 (2022) doi:10.1016/j.chest.2022.11.041.

79. Vyas, D. A., Eisenstein, L. G. & Jones, D. S. Hidden in Plain Sight - Reconsidering the Use

of Race Correction in Clinical Algorithms. N Engl J Med 383, 874–882 (2020).

80. Manski, C. F. Patient-centered appraisal of race-free clinical risk assessment. Health Econ

(2022) doi:10.1002/hec.4569.

81. Diao, J. A. et al. Clinical Implications of Removing Race From Estimates of Kidney

Function. JAMA 325, 184–186 (2021).

82. Schneiderman, N. et al. Prevalence of Diabetes Among Hispanics/Latinos From Diverse

Backgrounds: The Hispanic Community Health Study/Study of Latinos (HCHS/SOL).

Diabetes Care 37, 2233–2239 (2014).

83. Van Houwelingen, H. C. & Thorogood, J. Construction, validation and updating of a

prognostic model for kidney graft survival. Stat Med 14, 1999–2008 (1995).

84. DeLong, E. Hierarchical modeling: its time has come. Am Heart J 145, 16–18 (2003).

85. Krumholz, H. M., Chen, J., Rathore, S. S., Wang, Y. & Radford, M. J. Regional variation in

the treatment and outcomes of myocardial infarction: investigating New England’s

advantage. Am Heart J 146, 242–249 (2003).

86. Austin, P. C., Tu, J. V. & Alter, D. A. Comparing hierarchical modeling with traditional

logistic regression analysis among patients hospitalized with acute myocardial infarction:

should we be analyzing cardiovascular outcomes data differently? Am Heart J 145, 27–35

(2003).

87. Vickers, A. J. Incorporating Clinical Considerations into Statistical Analyses of Markers: A

Quiet Revolution in How We Think About Data. Clin Chem 62, 671–672 (2016).

88. Braun, L., Wentz, A., Baker, R., Richardson, E. & Tsai, J. Racialized algorithms for kidney

function: Erasing social experience. Soc Sci Med 268, 113548 (2021).

89. Louis, T. A. & Shen, W. Innovations in bayes and empirical bayes methods: estimating

parameters, populations and ranks. Stat Med 18, 2493–2505 (1999).

90. Steyerberg, E. W., Eijkemans, M. J., Harrell, F. E. & Habbema, J. D. Prognostic modeling

with logistic regression analysis: in search of a sensible strategy in small data sets. Med

Decis Making 21, 45–56 (2001).

91. Van Houwelingen, J. C. & Le Cessie, S. Predictive value of statistical models. Stat Med 9,

1303–1325 (1990).

92. Steyerberg, E. W., Eijkemans, M. J. C. & Habbema, J. D. F. Application of Shrinkage

Techniques in Logistic Regression Analysis: A Case Study. Statistica Neerlandica 55, 76–88

(2001).

93. Campbell, M. E. & Troyer, L. The Implications of Racial Misclassification by Observers.

Am Sociol Rev 72, 750–765 (2007).

94. Feliciano, C. Shades of Race: How Phenotype and Observer Characteristics Shape Racial

Classification. American Behavioral Scientist 60, 390–419 (2016).

95. Mhasawade, V., Zhao, Y. & Chunara, R. Machine learning and algorithmic fairness in

public and population health. Nat Mach Intell 3, 659–666 (2021).

96. Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an

algorithm used to manage the health of populations. Science 366, 447–453 (2019).

97. Cirillo, D. et al. Sex and gender differences and biases in artificial intelligence for

biomedicine and healthcare. NPJ Digit Med 3, 81 (2020).

98. Adibi, A., Sadatsafavi, M. & Ioannidis, J. P. A. Validation and Utility Testing of Clinical

Prediction Models: Time to Change the Approach. JAMA 324, 235–236 (2020).

Appendices

Appendix A

A.1. Receiver Operating Characteristic (ROC) Curves

ROC curves of the risk stratification algorithms in (A) SUMMIT, (B) LOTT, and (C) TORCH

A.2. Calibration Plots

Calibration plots of the risk prediction models comparing the average predicted and observed risk of exacerbation

per decile, before and after adjustment. (A) ACCEPT in SUMMIT; (B) Bertens in SUMMIT; (C) ACCEPT in

LOTT; (D) Bertens in LOTT; (E) ACCEPT in TORCH; (F) Bertens in TORCH

A.3. Unadjusted and Adjusted Annual Risk of Exacerbation of the Risk Prediction Models

                              SUMMIT    LOTT      TORCH
Observed risk                 0.22      0.38      0.52
Unadjusted predicted risk
  ACCEPT                      0.34      0.53      0.51
  Bertens                     0.20      0.27      0.28
Adjusted predicted risk
  ACCEPT                      0.22      0.38      0.52
  Bertens                     0.22      0.38      0.52

Appendix B

B.1 Baseline Characteristics of Included Participants Stratified by Race

                                                  Caucasian     Asian         Black         Hispanic      Indigenous
N                                                 3266          506           108           176
Mean (SD)
  Follow-up time                                  0.89 (0.21)   0.84 (0.23)   0.93 (0.18)   0.82 (0.23)   0.82 (0.26)
  Age, yr                                         66.2 (8.1)    67.4 (7.6)    65.9 (7.8)    66.0 (7.5)    69.0 (6.8)
  BMI, kg/m2                                      28.2 (5.7)    22.8 (4.0)    28.4 (7.6)    28.2 (5.7)    28.4 (5.7)
  SGRQ                                            45.3 (14.3)   42.5 (10.9)   48.2 (16.2)   44.5 (10.6)   49.0 (12.5)
  FEV1 % predicted                                53.1 (14.2)   53.3 (15.2)   49.4 (15.8)   55.1 (14.1)   51.3 (16.7)
Count (%)
  Males                                           2393 (73.3)   470 (92.9)    78 (72.2)     118 (67.0)    27 (65.9)
  Current smoking status                          1456 (44.6)   156 (30.8)    39 (36.1)     73 (41.5)     12 (29.3)
  LAMA                                            1312 (40.2)   166 (32.8)    57 (52.6)     41 (23.6)     16 (39.0)
  LABA                                            1718 (52.6)   224 (44.3)    72 (66.7)     70 (39.8)     28 (68.3)
  ICS                                             1781 (54.5)   243 (48.0)    72 (66.7)     89 (50.6)     27 (65.9)
  Statin                                          2019 (61.8)   267 (52.8)    55 (50.9)     112 (63.6)    20 (48.8)
  Oxygen                                          188 (5.8)     21 (4.2)      16 (14.8)     6 (3.4)       3 (7.3)
History of > 1 moderate/severe exacerbation       0.30          0.36          0.38          0.28          0.41
Observed risk of > 1 moderate/severe exacerbation 0.29          0.31          0.32          0.24          0.37

BMI = body mass index; FEV1 = forced expiratory volume in one second; ICS = inhaled corticosteroid; LABA =

long-acting β2-agonist; LAMA = long-acting muscarinic antagonist; SGRQ = St. George’s Respiratory

Questionnaire
