---
bibliography: bibliography/references.bib
csl: bibliography/nature.csl
output:
  bookdown::pdf_document2:
    template: templates/brief_template.tex
    pandoc_args: !expr rmdfiltr::add_wordcount_filter()  
  bookdown::word_document2: 
      toc: false
      toc_depth: 3
      reference_docx: templates/word-styles-reference-01.docx
      number_sections: false 
  bookdown::html_document2: default
documentclass: book
---

```{block type='savequote', include=knitr::is_latex_output(), quote_author='(ref:cprdquote)', echo = TRUE}
When dealing with human beings controlled experiments frequently prove to be impracticable, so for a scientific basis for our assumptions we turn to past history to reconstruct the suspected causal chain of events - and then our statistical troubles may begin.
```
(ref:cprdquote) --- Harold F. Dorn, 1953 [@dorn1953]

# Primary analysis of lipid-regulating agents and dementia outcomes in the CPRD {#cprd-analysis-heading}

&nbsp;

\minitoc <!-- this will include a mini table of contents-->

```{r, echo = FALSE, warning=FALSE, message=FALSE}
source("R/helper.R")
knitr::read_chunk("R/05-Code-CPRD-Analysis.R")
doc_type <- knitr::opts_knit$get('rmarkdown.pandoc.to') # Info on knitting format
```

```{r load-files, echo = FALSE}
```

<!--------------------------------------------------------------------------->
::: {.laybox data-latex=""}
## Lay summary {-}

Electronic health record (EHR) databases are large collections of medical data, used to manage patient administration and care. Under these systems, whenever a patient attends their GP, their clinical data is recorded in a central database using a standardised coding system. These databases have several advantages over traditional methods of data collection, including the number of people they contain and the relatively low cost of data collection. This is particularly important when studying diseases such as dementia, which may begin to develop in patients long before symptoms are seen. 

&nbsp;

This analysis makes use of the Clinical Practice Research Datalink (CPRD), which contains the electronic medical records of more than 3 million people from general practices across England. Using this data, the analysis presented in this chapter examined whether lipid-regulating agents (treatments which change lipid levels) affect the risk of all-cause dementia and related outcomes (Alzheimer's disease, vascular dementia and other dementias).

&nbsp;

[No]{.correction} evidence for an effect of lipid-regulating agents on the risk of Alzheimer's disease was found, with the exception of a slightly increased risk in those prescribed a certain type of lipid-regulating agent called fibrates. In contrast, I found an increased risk of vascular and other (i.e., non-Alzheimer's) dementia was associated with lipid-regulating agent use. 

&nbsp;

This increased risk in outcomes with a vascular element (e.g., vascular dementia) is unexpected and is very likely to be due to the presence of bias in my analysis. This bias, called "confounding by indication", is caused when those who are prescribed a statin are more at risk of vascular dementia for a range of reasons, which makes it appear as if statins are harmful. Despite this limitation, the analysis provides an important source of information which will be used in later chapters.

:::

&nbsp;<!----------------------------------------------------------------------->  

## Introduction

In this chapter, I present an analysis of a large population-based electronic health record (EHR) dataset to investigate the relationship between lipid-regulating agent (LRA) use and dementia outcomes. The analysis aims to address two important limitations of the current evidence base as identified by the systematic review presented in Chapters \@ref(sys-rev-methods-heading) & \@ref(sys-rev-results-heading).

Firstly, it explicitly examines vascular dementia as an outcome. The systematic review presented in the previous chapter identified an evidence gap around the effect of lipid-regulating agents on the risk of vascular dementia. As triangulation exercises require as many diverse sources of evidence as possible, this analysis provides an additional source of information on this outcome. 

Secondly, and in a similar vein, the analysis intentionally takes a different analytical approach to that most commonly used to examine the effect of statins on dementia as identified by the systematic review. Specifically, this involved a concerted effort to address immortal time bias through use of a Cox proportional hazards analysis, incorporating a time-varying treatment indicator.[@suissa2008] By employing this approach, the analysis provides an evidence source at risk of a distinct bias, making it useful to the triangulation exercise presented in Chapter \@ref(tri-heading).

This chapter represents an extended version of a preprinted manuscript, a copy of which is available in Appendix \@ref(published-papers).

<!--------------------------------------------------------------------------->

&nbsp;

## Methods

### Study protocol

An *a priori* protocol for this study was published,[@walker2016] and amendments to this are recorded in Appendix \@ref(appendix-cprd-amendments).

<!--------------------------------------------------------------------------->

&nbsp;

### Data source {#cprd-data-source}

Previously known as the General Practice Research Database, the Clinical Practice Research Datalink (CPRD) is a large population-based EHR database.[@herrett2015] The database has been collecting primary care data from participating practices across England since 1987.[@williams2012; @wood2001revitalizing] It contains the records of more than 10 million primary care patients in England and is broadly representative of the UK population in terms of age, sex and ethnicity.[@herrett2015; @mathur2014]

To avoid the ambiguity of interpreting free-text clinical notes and to allow for easy analysis of the resulting data, the CPRD primarily collects data using a predefined coding system known as Read codes.[@booth1994] All clinical events, including  test results and diagnoses, can be identified by a specific Read code. The codes use a nested approach (see Table \@ref(tab:readExample-table)), with the initial characters defining broad diagnostic topics (e.g., Eu... - Mental and behavioural disorders), while subsequent characters provide additional information on the specific condition diagnosed (e.g., Eu001 - Dementia in Alzheimer's disease with late onset).

&nbsp;

<!--------------------------------------------------------------------------->
(ref:readExample-caption) __Example of CPRD Read code hierarchy__ - Broad topics are specified using the initial two alpha-numeric characters of the Read code, while subsequent characters are used to define specific conditions and context. The example shown illustrates how "Dementia in Alzheimer’s disease with late onset" (_Eu001_) is nested under the top-level of "Mental and behavioural disorders" (_Eu..._). 

(ref:readExample-scaption) Example of CPRD Read code hierarchy

```{r readExample-table, message=FALSE, results="asis", echo = FALSE}
```
<!--------------------------------------------------------------------------->

&nbsp;

The index events, exposures and outcomes used in this analysis were identified using predetermined code lists, which are available for inspection from the repository accompanying this analysis (data/code availability is discussed in Section \@ref(cprd-data-avail)).

<!-- TODO Make sure it is cited properly -->


&nbsp;

### Cohort definition

This analysis included all patients registered at a participating practice between 1 January 1995 and 29 February 2016 who had a flag for "research quality" data (as defined by the CPRD). Records pre-dating the 1995 cut-off were provided as part of the original CPRD extract obtained for this analysis. However, these older records were excluded from the analysis because data quality and reliability are thought to be higher after this date.[@wolf2019] Additionally, individuals with less than 12 months of continuous records prior to cohort entry were excluded, making the effective start date of the cohort 1 January 1996. 

Participants were included in the study cohort if their record contained any of the following index events:

* a Read code for a diagnosis of hypercholesterolemia or related condition; 
* a Read code for prescription of a lipid-regulating agent (such as statins); 
* a total cholesterol test result of >4 mmol/L; or
* an LDL-c test result of >2 mmol/L. 

The blood lipid cut-offs were based on NIHR-recommended levels at the time the protocol was written. These index events allowed me to define a population of participants who were either at risk of hypercholesterolemia, as indicated by the elevated total or LDL cholesterol test results, or had already been diagnosed with it, as indicated by a diagnostic code/related prescription.

The index date for a participant was defined as the date where the first relevant code or test result was recorded on their clinical record, and participants were followed up until the earliest of:

* an outcome of interest;
* death; 
* end of follow-up (29 February 2016);
* last registration date with their GP practice; or
* the last CPRD collection date for their practice.
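
The follow-up rule above amounts to taking, for each participant, the earliest non-missing date among the censoring and outcome events. The sketch below is illustrative only and is not the code used for the analysis; the data frame `patients` and all of its column names are hypothetical.

```r
# Hypothetical per-patient dates; NA means the event was never recorded.
patients <- data.frame(
  id               = 1:3,
  index_date       = as.Date(c("2001-03-10", "2005-07-01", "2010-01-15")),
  outcome_date     = as.Date(c(NA, "2012-05-20", NA)),
  death_date       = as.Date(c("2009-11-02", NA, NA)),
  registration_end = as.Date(c(NA, "2014-01-01", "2013-06-30")),
  last_collection  = as.Date(c("2015-12-31", "2015-12-31", "2015-12-31"))
)

# Administrative end of follow-up, as stated in the text.
study_end <- as.Date("2016-02-29")

# Exit date: the earliest of the candidate dates, ignoring missing values.
patients$exit_date <- pmin(
  patients$outcome_date,
  patients$death_date,
  patients$registration_end,
  patients$last_collection,
  study_end,
  na.rm = TRUE
)

# A participant has an event only if the outcome is what ended follow-up.
patients$event <- !is.na(patients$outcome_date) &
  patients$outcome_date == patients$exit_date

# Time at risk in years (365.25 days per year is an assumption).
patients$followup_years <-
  as.numeric(patients$exit_date - patients$index_date) / 365.25
```

In this toy example only the second participant experiences the outcome; the first is censored at death and the third at the end of their registration.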

Participants were ineligible for the cohort if they were less than 40 years of age (because these patients are less likely to be prescribed an LRA), had less than 12 months of "research quality" data, were simultaneously prescribed more than one lipid-regulating agent (due to the difficulty of assigning these to a single exposure group), or were diagnosed with an outcome of interest before or on the date of the index event (i.e., had less than one full day of follow-up). 

<!--------------------------------------------------------------------------->

&nbsp;

### Exposures

I considered seven lipid-regulating drug classes based on groupings in the British National Formulary,[@wishart2017] namely: statins, fibrates, bile acid sequestrants, ezetimibe, nicotinic acid groups, ezetimibe and statin (representing one treatment containing both drugs, rather than the two classes being prescribed concurrently), and omega-3 fatty acid groups.  
 
A participant's drug class was assigned based on their first recorded prescription, and any drug switching was ignored in an effort to mimic an intention-to-treat approach. I did, however, examine how often the initial drug class was altered, according to one of three criteria:

- __stopped__: defined as the last prescription of the primary class being followed by at least six months of observation; 
- __added__: defined as a second drug class being prescribed before the last prescription of the initial class; and 
- __switched__: defined as a second drug class being prescribed after the last prescription of the initial class.
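
One possible way of applying these three criteria to a single participant is sketched below. This is a hypothetical illustration rather than the code used in the analysis: the function name, its arguments, the 183-day approximation of "six months", and the order in which the criteria are checked are all assumptions.

```r
# Classify what happened to a participant's initial drug class.
#   last_initial_rx : date of the last prescription of the initial class
#   first_other_rx  : date of the first prescription of a second class (NA if none)
#   exit_date       : end of the participant's observation
classify_change <- function(last_initial_rx, first_other_rx, exit_date) {
  if (!is.na(first_other_rx)) {
    # A second class appears: before the last initial prescription means
    # "added", otherwise "switched".
    if (first_other_rx < last_initial_rx) "added" else "switched"
  } else if (as.numeric(exit_date - last_initial_rx) >= 183) {
    # No second class, and at least ~6 months of observation after the
    # last initial prescription (183 days is an assumed cut-off).
    "stopped"
  } else {
    "unchanged"
  }
}

classify_change(as.Date("2010-06-01"), as.Date("2010-03-01"), as.Date("2012-01-01"))
classify_change(as.Date("2010-06-01"), as.Date("2010-09-01"), as.Date("2012-01-01"))
classify_change(as.Date("2010-06-01"), as.Date(NA), as.Date("2012-01-01"))
```

The three calls illustrate an added, a switched and a stopped participant, respectively.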

<!--------------------------------------------------------------------------->

&nbsp;

### Outcomes {#cprd-outcomes}

I considered five outcomes as part of this analysis: probable  Alzheimer’s disease, possible Alzheimer’s disease, vascular dementia, other dementias, and a composite all-cause dementia outcome. When two or more outcomes were coded in a participant’s clinical record, a decision tree was used to differentiate between them (Figure \@ref(fig:decisionTreeFig)). The diagnosis date of the outcome was determined by the first record of a relevant code.

<!----------------------------------------------------------------------------->
(ref:decisionTreeFig-cap) __Decision tree for assigning dementia subtypes__ - Based on the presence of specific Read codes in the patient's record, a decision tree was used to classify dementia subtypes ("Probable" or "Possible" Alzheimer's disease, vascular dementia, and other dementia). Note that an outcome of "probable" or "possible" Alzheimer's disease (AD) requires the absence of any vascular outcome codes.

(ref:decisionTreeFig-scap) Decision tree for assigning dementia subtypes

```{r decisionTree, include = FALSE}
```

```{r decisionTreeFig, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:decisionTreeFig-cap)', out.width='100%', fig.scap='(ref:decisionTreeFig-scap)'}
knitr::include_graphics(file.path("figures","cprd-analysis","decision_tree.png"))
```
<!----------------------------------------------------------------------------->

&nbsp;

### Covariates

A range of additional variables were included in the analysis. These covariates were adjusted for in the analysis in an attempt to address the different distributions of potential confounding factors between those who were prescribed a lipid-regulating agent and those who were not. These are discussed in detail below and summarised in Table \@ref(tab:covariateDef-table).

&nbsp;

<!----------------------------------------------------------------------------->
(ref:covariateDef-caption) __Definitions of covariates adjusted for in the analysis__ - The code lists used to define the majority of these covariates were originally created for use in a previously published analysis,[@walker2020] while others were built on or adapted from previously published work.[@khan2010; @taylor2016; @wright2017]

(ref:covariateDef-scaption) Definitions of covariates adjusted for in the analysis

(ref:covariateDef-cell1) Charlson index implemented using Read code lists.[@khan2010] Code lists based on those by Taylor et al.[@taylor2016]

(ref:covariateDef-cell2) Most recent of recorded value (current, former or never) or Read code indicating a recorded value. Code lists based on those by Wright et al.[@wright2017]

```{r covariateDef-table, message=FALSE, results="asis", echo = FALSE}
```
<!----------------------------------------------------------------------------->

&nbsp;

Demographic covariates adjusted for included age and gender. Age in years was calculated at date of entry into the cohort, using the 1st of January of a patient's birth year (the exact date of birth was not provided by CPRD), and was adjusted for via its use as the time axis for the Cox model (see Section \@ref(cprd-time-axis)). Socioeconomic status was proxied using the Index of Multiple Deprivation (IMD) 2010. The IMD draws on seven domains (income; employment; education, skills and training; health and disability; crime; barriers to housing and services; living environment) to create an overall deprivation score for each of 32,844 statistical geography areas in England. To help preserve patient privacy, IMD score is only available from the CPRD in twentiles, with 1 indicating the least deprived and 20 indicating the most deprived. Smoking and alcohol use were determined at index, and usage was categorised as current, former, or never. 

Body mass index (a summary measure calculated as $\frac{weight}{height^2}$), baseline total cholesterol and baseline LDL cholesterol measures were obtained, using the last recorded value prior to the index date. A variable indicating grouped year of entry into the cohort (<=2000, 2001-2005, 2006-2010, >2010) was included to allow for changes in prescribing trends across the lifetime of the cohort. To assess healthcare utilisation, I adjusted for the average annual number of consultations between the beginning of a patient's data and their entry into the cohort.
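
Two of the derived covariates described above, age at cohort entry and grouped year of entry, can be sketched as follows. The example is illustrative only: the data frame and column names are hypothetical, and the use of 365.25 days per year is an assumption.

```r
# Hypothetical cohort: only the birth year is available, not the exact date.
cohort <- data.frame(
  birth_year = c(1940, 1952, 1961),
  index_date = as.Date(c("1999-06-15", "2004-02-01", "2012-09-30"))
)

# Age at index, taking 1 January of the birth year as the date of birth.
birth_date <- as.Date(paste0(cohort$birth_year, "-01-01"))
cohort$age_at_index <- as.numeric(cohort$index_date - birth_date) / 365.25

# Grouped year of cohort entry, using the four periods given in the text.
# Intervals are closed on the right, so 2000 falls in "<=2000".
entry_year <- as.integer(format(cohort$index_date, "%Y"))
cohort$entry_period <- cut(
  entry_year,
  breaks = c(-Inf, 2000, 2005, 2010, Inf),
  labels = c("<=2000", "2001-2005", "2006-2010", ">2010")
)
```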

Finally, the presence of a range of related conditions at baseline was accounted for, including cardiovascular disease, coronary bypass surgery, coronary artery disease, peripheral arterial disease, hypertension, chronic kidney disease, and Type 1 and Type 2 diabetes. In addition to adjusting for these covariates individually, a Charlson co-morbidity index (CCI) score was calculated for each participant. The CCI is a weighted index that uses the presence and severity of a number of conditions to enable adjustment for the general health of a participant in terms of their mortality risk.[@charlson1987new] The conditions considered under the index are: AIDS; cancer; cerebrovascular disease; chronic pulmonary disease; congestive heart disease; dementia; diabetes; diabetes with complications; hemiplegia; metastatic tumour; mild liver disease; moderate liver disease; myocardial infarction; peptic ulcer disease; peripheral vascular disease; renal disease; and rheumatological disease.

Code lists for all covariates can be found in the archived data repository accompanying this analysis (see Section \@ref(cprd-data-avail)).

&nbsp; 

<!--------------------------------------------------------------------------->

### Missing data 

Missing data are a recognised problem in electronic health records databases.[@wells2013strategies] These databases are created from administrative data, collected primarily for the purposes of patient management and care rather than academic research.

In this analysis, missing data were handled using a multiple imputation approach.[@sterne2009] Variables with missing observations were identified for inclusion in the imputation model. Nominal variables with missing values were modelled using multinomial logistic regression, while continuous variables were modelled using linear regression. As per best practice, all variables used in the analytic model, including the outcome, were included in the imputation model.[@moons2006] Using the MICE (Multiple Imputation by Chained Equations) command in STATA 16,[@statacorp2019] 20 imputed datasets were created.

Missing data were only considered problematic for variables where a numerical test result was expected (e.g., BMI), or where a code existed for the absence of the condition (e.g., categorical smoking status). This approach was necessary because absence of a code for other treatments or conditions (e.g., statin use or dementia) was assumed to indicate absence of the treatment/condition rather than being considered missing.[@wells2013strategies]
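
Each imputed dataset yields its own estimate, and these are conventionally combined using Rubin's rules: the pooled estimate is the mean across imputations, and its variance adds the average within-imputation variance to an inflated between-imputation variance. The sketch below illustrates this pooling step in general terms; it is not the code used for the analysis, the function name `pool_rubin` is hypothetical, and the five log hazard ratios are invented for illustration (the analysis itself used 20 imputed datasets).

```r
# Pool estimates from m imputed datasets using Rubin's rules.
pool_rubin <- function(estimates, std_errors) {
  m       <- length(estimates)
  pooled  <- mean(estimates)        # pooled point estimate
  within  <- mean(std_errors^2)     # average within-imputation variance
  between <- var(estimates)         # between-imputation variance
  total   <- within + (1 + 1 / m) * between
  c(estimate = pooled, se = sqrt(total))
}

# Invented log hazard ratios and standard errors from five imputations.
log_hr <- c(0.10, 0.14, 0.12, 0.08, 0.16)
se     <- rep(0.05, 5)

pooled <- pool_rubin(log_hr, se)

# Back-transform to a hazard ratio with a normal-approximation 95% interval.
exp(pooled["estimate"] + c(-1.96, 0, 1.96) * pooled["se"])
```

Pooling is carried out on the log hazard ratio scale because that is the scale on which the estimate is approximately normally distributed.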

&nbsp;
<!--------------------------------------------------------------------------->

### Estimation methods

A Cox proportional hazards (PH) model was used to estimate the effect of statins on dementia outcomes. Cox PH models are defined in general terms as:

\begin{equation}
  h(t) = h_0(t) \times \exp(b_1x_1 + b_2x_2 + \dots + b_px_p)
  (\#eq:cox-model)
\end{equation}

where:

* $t$ is the survival time;
* $h(t)$ is the hazard function;
* $x_1,x_2,\dots,x_p$ are the covariates which determine the hazard function, while $b_1,b_2,\dots,b_p$ are the coefficients for each covariate; and
* $h_0(t)$ is the baseline hazard, i.e., the hazard when all $x_i$ are zero and the exponential term therefore equals 1.

A Cox PH model was chosen for this analysis as it inherently accounts for the length of time participants spend in each exposure group. Using this approach, time-at-risk can be properly attributed to the appropriate exposure group, thus mitigating the impact of immortal time bias. This is discussed in detail in the following section.
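
As a worked illustration of how the coefficients of such a model are interpreted (an addition for clarity, using an invented coefficient value), suppose $x_1$ is a binary treatment indicator. Comparing two participants who differ only in $x_1$, the baseline hazard cancels and the ratio of their hazards is constant over time:

$$
\frac{h(t \mid x_1 = 1)}{h(t \mid x_1 = 0)} = \frac{\exp(b_1 + b_2x_2 + \dots + b_px_p)}{\exp(b_2x_2 + \dots + b_px_p)} = \exp(b_1)
$$

For example, a hypothetical coefficient of $b_1 = 0.2$ would correspond to a hazard ratio of $\exp(0.2) \approx 1.22$, that is, a hazard roughly 22% higher in the treated group at every time point.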

<!----------------------------------------------------------------------------->
&nbsp;

### Immortal time bias and time-varying treatment indicators {#cprd-immortal-time-bias}

Immortal time bias describes two distinct but related types of bias, considered here in relation to statin use (Figure \@ref(fig:immortalTimeBias)). The first presentation, the selection bias aspect (Panel A), occurs when time prior to statin initiation is excluded, leading to the statin and control groups being followed up from different time points.[@levesque2010] The unexposed group is followed from the date of an index event (e.g., diagnosis of hypercholesterolemia), while the statin group is followed from the date of statin initiation. In this scenario, the time between the index event and statin initiation is missing, and any events that occur in the exposed group prior to the prescription will be inappropriately excluded from the analysis.

The second presentation of immortal time bias is as a type of misclassification bias (Panel B, Figure \@ref(fig:immortalTimeBias)). It occurs when the exposure time and events prior to statin initiation are inappropriately assigned to the statin group.

&nbsp;<!----------------------------------------------------------------------->  

(ref:immortalTimeBias-cap) __Immortal time bias__ - The two presentations of immortal time bias are illustrated. Panel A shows the selection bias presentation, where time prior to a statin exposure is excluded from the analysis and outcomes in this time period are lost. Panel B shows the misclassification presentation, where time (and events) prior to statin exposure are incorrectly assigned to the exposed group.

(ref:immortalTimeBias-scap) Immortal time bias

```{r, immortalTime, include = FALSE}
```

```{r immortalTimeBias, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:immortalTimeBias-cap)', out.width='100%', fig.scap='(ref:immortalTimeBias-scap)'}
knitr::include_graphics(file.path("figures","cprd-analysis","immortal_time.png"))
```

&nbsp;<!----------------------------------------------------------------------->  

This second presentation appears to be common in the existing literature on the relationship of statins and dementia. Several of the studies included in the systematic review performed as part of this thesis were identified as being at risk of immortal time bias following formal risk of bias assessment using the ROBINS-I tool (see Section \@ref(risk-of-bias-res)).

Following a recommended approach to address the second form of immortal time bias, I employed a time-varying treatment indicator to correctly allocate time-at-risk to the exposed and unexposed groups.[@levesque2010] Under this approach, all patients are followed from a common index date, defined as the earliest of:

* date of raised cholesterol test results; 
* date of hypercholesterolemia diagnosis; or 
* date of LRA prescription. 

Patients start in the unexposed group and contribute time-at-risk until they are prescribed a lipid-regulating agent, at which point they move into the exposed group. Note, patients for whom prescription of a lipid-regulating agent was the index event only contribute time to the exposed group (i.e., they enter the cohort and move into the exposed group on the same day). 
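
The allocation of time-at-risk described above can be sketched by splitting each participant's follow-up into an unexposed and an exposed interval. The example below is illustrative only and is not the code used in the analysis: the toy cohort, the function `split_follow_up` and all column names are hypothetical, and time is measured in days since the index date purely to keep the arithmetic simple.

```r
# Hypothetical toy cohort; times are days since the common index date.
cohort <- data.frame(
  id        = c(1L, 2L, 3L),
  rx_time   = c(NA, 120, 0),     # first LRA prescription (NA = never prescribed)
  exit_time = c(400, 365, 200),  # end of follow-up
  event     = c(0L, 1L, 1L)      # 1 = outcome observed at exit_time
)

# Split one participant's follow-up into unexposed and exposed rows.
split_follow_up <- function(id, rx_time, exit_time, event) {
  if (is.na(rx_time) || rx_time >= exit_time) {
    # Never prescribed during follow-up: all time is unexposed.
    return(data.frame(id = id, start = 0, stop = exit_time,
                      exposed = 0L, event = event))
  }
  if (rx_time <= 0) {
    # Prescription was the index event: all time is exposed.
    return(data.frame(id = id, start = 0, stop = exit_time,
                      exposed = 1L, event = event))
  }
  # Otherwise: unexposed until the prescription, exposed afterwards.
  # Any event is attributed to the interval in which it occurred.
  data.frame(id = id,
             start   = c(0, rx_time),
             stop    = c(rx_time, exit_time),
             exposed = c(0L, 1L),
             event   = c(0L, event))
}

long <- do.call(rbind, Map(split_follow_up, cohort$id, cohort$rx_time,
                           cohort$exit_time, cohort$event))

# Person-time contributed to each exposure group.
person_time <- tapply(long$stop - long$start, long$exposed, sum)
```

Data in this start/stop layout can be passed to a Cox model that supports interval-based follow-up, for example via `Surv(start, stop, event)` in the R `survival` package, so that the 120 unexposed days of the second participant are not credited to the exposed group.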

&nbsp;<!----------------------------------------------------------------------->  

### Time axis {#cprd-time-axis}

As part of a Cox proportional hazards model, there is the option to use either absolute time in cohort or participants' age as the time scale of interest.[@lamarca1998; @gail2009; @pencina2007] A model using age as the time axis inherently accounts, or adjusts, for participants' age as a potential confounder of the exposure-outcome relationship. As such, the analyses presented all used age as the time axis.
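
The reason that age is adjusted for under this time scale is that each event is compared only with participants who were under observation at the same age. The sketch below illustrates this with a toy example; the data and the function `risk_set_at` are hypothetical and are not taken from the analysis.

```r
# Hypothetical cohort: ages (in years) at cohort entry and exit.
cohort <- data.frame(
  id        = 1:4,
  age_entry = c(62, 55, 70, 48),
  age_exit  = c(75, 68, 82, 60),
  event     = c(1L, 0L, 1L, 0L)
)

# Participants at risk at a given age: those who had already entered the
# cohort and had not yet exited (delayed entry, or left truncation).
risk_set_at <- function(age, data) {
  data$id[data$age_entry < age & data$age_exit >= age]
}

# Participant 1 has an event at age 75; only participants observed at
# that age contribute to the comparison.
risk_set_at(75, cohort)
```

Here the event at age 75 is compared only with participant 3, who was also under observation at that age, rather than with the two participants who left the cohort at younger ages.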

&nbsp;
<!--------------------------------------------------------------------------->

### Sensitivity analyses

The primary analysis examined the effect of a lipid-regulating agent on dementia risk, stratified by outcome and drug class. To assess the robustness of the results, a number of sensitivity analyses were performed. These are described in the following sections.

&nbsp;

#### Complete case vs imputed data

Using multiple imputation to handle missing data is an alternative to a "complete case" approach,[@pigott2001] where participants missing any covariate are dropped from the dataset. As a recommended sensitivity analysis,[@hughes2019] I performed and compared the results of both methods, to investigate the impact of multiple imputation on the results.

&nbsp;

#### Control outcomes

In addition to the primary outcomes of interest (described in Section \@ref(cprd-outcomes)), I extracted data on three additional control outcomes. The inclusion of control outcomes in observational analyses is a useful technique to assess the strength of uncontrolled confounding,[@lipsitch2010] and these outcomes are usually classed as either "negative" or "positive" outcomes. 

Negative control outcomes are defined as those without a likely causal path between the exposure and outcome (see Figure \@ref(fig:negativeOutcome) for a directed acyclic graph, or DAG, describing an ideal negative outcome). Conversely, positive control outcomes are those with a known causal association with the exposure of interest, ideally sourced from large, well-conducted randomised controlled trials. Positive control outcomes are also useful in observational epidemiology because if the analysis can reproduce a known result for the control outcome, confidence in the result for the outcome of interest is increased. Due to the wealth of data available on statins as a lipid-regulating agent, I chose three control outcomes in reference to this drug class: back pain (negative control), ischemic heart disease (positive protective control), and Type 2 diabetes (positive harmful control).

&nbsp;<!----------------------------------------------------------------------->  
(ref:negativeOutcome-cap) __Causal diagram for an ideal negative outcome__ - The directed acyclic graph shows the relationship between the exposure $X$, outcome $Y$, confounders (measured $C$ and unmeasured $U$) and an ideal negative outcome $N$. Note the absence of any arrow between $X$ and $N$. In this scenario, any association observed between $X$ and $N$ is due to the presence of uncontrolled confounders $U$ (assuming $C$ has been adjusted for).

(ref:negativeOutcome-scap) Causal diagram for ideal negative outcome

```{r negativeOutcome, echo = FALSE, results="asis", fig.pos = "H", fig.align='center', fig.cap='(ref:negativeOutcome-cap)', out.width='80%', fig.scap='(ref:negativeOutcome-scap)', message=FALSE, warning = FALSE}
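library(dagitty)
library(ggdag)

# Define the DAG: X = exposure, Y = outcome, N = negative control outcome,
# C = measured confounders, U = unmeasured confounders
negative_outcome_dag <- dagitty::dagitty("dag {
    U -> N
    C -> N
    X <- U -> Y
    X <- C -> Y
    X -> Y
    bb =\"-2.368,-2.972,2.597,2.726\"
    C [pos=\"-0.3,0\"]
    N [pos=\"0.5,1\"]
    U [pos=\"-0.15,1.5\"]
    X [exposure,pos=\"0,-1\"]
    Y [outcome,pos=\"0.5,-1\"]
  }")

# Draw the DAG using the classic (text node) layout
negative_outcome_plot <- ggdag_classic(negative_outcome_dag) +
  theme_dag()

# Save the figure to disk, then include the saved file in the document
ggplot2::ggsave(
  filename = file.path("figures", "cprd-analysis", "negativeOutcome.png"),
  plot = negative_outcome_plot,
  height = 3
)

knitr::include_graphics(file.path("figures","cprd-analysis","negativeOutcome.png"))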

```
&nbsp;<!----------------------------------------------------------------------->  

Despite observational analyses suggesting a link between statins and muscular pain (as opposed to more serious complications such as myopathy),[@selva-ocallaghan2018] an effect was not observed in either systematic reviews of adverse events of statin use[@collins2016] or N-of-1 trials explicitly exploring the association of statin use with muscle pain.[@herrett2021] Based on this evidence, I used muscular back pain as a negative control outcome in this analysis. Under this approach, if statin use is found to be associated with muscular back pain in this analysis, this suggests the presence of residual confounding and reduces my confidence in the results for the dementia outcomes.

Similarly, incident ischemic heart disease and Type 2 diabetes were included as protective and harmful positive control outcomes, respectively. The protective effect of lipid-lowering via statins on the risk of ischemic heart disease is well established,[@collins2016] while there is growing evidence for an increased risk of Type 2 diabetes with statin use.[@collins2016; @macedo2014; @smit2020] As with the negative control outcome, if the analysis strategy can reproduce these known associations, this will provide evidence that potential confounders have been sufficiently adjusted for.

&nbsp;<!----------------------------------------------------------------------->  

#### Impact of additional covariates

To observe the effect of adjusting for additional covariates, I ran two additional models unadjusted except for: (a) age; and (b) age and gender. The results of these models were then compared with those of the fully adjusted model.

&nbsp;<!----------------------------------------------------------------------->  

#### Sensitivity cohorts

Two sensitivity cohorts were also created. The first stratified by year of entry into the cohort in an attempt to assess for time period effects. The second removed participants who may have been pregnant (coded as under 55) to assess the robustness of the estimates, as statins are contraindicated in pregnancy.[@karalis2016]

&nbsp;

#### Statin properties

As detailed in the introduction, the properties of statins may be important due to the ability of lipophilic statins to cross the blood-brain barrier (see Section \@ref(intro-statins)).[@sierra2011] As such, I expected that any effects of statins on dementia outcomes would be stronger in the lipophilic versus the hydrophilic statin subgroup. To investigate this, I further stratified the statin exposure group into lipophilic (Atorvastatin, Lovastatin, Simvastatin, Cerivastatin) and hydrophilic (Pravastatin, Rosuvastatin, Fluvastatin) statins. 
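
The stratification described above amounts to a simple lookup from statin name to subgroup. A minimal sketch is shown below; the grouping is exactly that given in the text, while the object names and the example prescriptions are hypothetical.

```r
# Statin subgroups as defined in the text.
statin_property <- c(
  atorvastatin = "lipophilic",
  lovastatin   = "lipophilic",
  simvastatin  = "lipophilic",
  cerivastatin = "lipophilic",
  pravastatin  = "hydrophilic",
  rosuvastatin = "hydrophilic",
  fluvastatin  = "hydrophilic"
)

# Hypothetical first prescriptions for three participants.
prescribed <- c("Simvastatin", "Rosuvastatin", "Atorvastatin")

# Look up the subgroup, ignoring capitalisation of the drug name.
subgroup <- unname(statin_property[tolower(prescribed)])
```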

&nbsp;<!----------------------------------------------------------------------->  

#### Impact of using different code lists for defining dementia outcomes

To explore the impact of the code lists used to define dementia outcomes, I created alternative Alzheimer's disease and non-Alzheimer's dementia outcomes using code lists from a published study by Smeeth _et al_.[@smeeth2009] The intended purpose of this analysis was to assess the robustness of my results to the choice of code list.

This published analysis used a propensity matching approach to estimate the association of statins with a range of outcomes in The Health Improvement Network database, an alternative source of English electronic health records which has substantial overlap with the CPRD.[@carbonari2015] The code lists used in this analysis were obtained through correspondence with the authors of that study, and are available for inspection (see Section \@ref(cprd-data-avail)).

&nbsp;<!----------------------------------------------------------------------->  

## Results

### Patient characteristics

```{r characteristics, echo = FALSE}
```

```{r cprdCharacteristics-setUp, echo = FALSE}
```

```{r attritionFigure, include = FALSE}
```

Of the `r comma(attrition[1,1])` participants included in the extract, `r comma(attrition[14,1])` met the inclusion criteria (Figure \@ref(fig:cprdFlowchart)), with a total follow-up of `r comma(total_followup)` patient years at risk. 

&nbsp;

<!--------------------------------------------------------------------------->
(ref:cprdFlowchart-cap) __Attrition of CPRD participants__ - Patients were excluded from the study cohort for several reasons, and the number of patients excluded as each eligibility criterion was applied is shown above. Note that the largest cause of attrition was the absence of an index event of interest.

(ref:cprdFlowchart-scap) Attrition of CPRD participants

<!-- This figure needs to have comma seperated numbers in order to agree with text. -->

```{r cprdFlowchart, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:cprdFlowchart-cap)', out.width='100%', fig.scap='(ref:cprdFlowchart-scap)'}
knitr::include_graphics(file.path("figures","cprd-analysis","cohort_attrition.png"))
```
<!--------------------------------------------------------------------------->

&nbsp;

The median participant age at index was `r age_text` and participants were followed up for a median of `r fu_text`. During follow-up, an all-cause dementia diagnosis was recorded for `r comma(p1[1,2])` patients  (`r comma(p1[2,2])` probable AD, `r comma(p1[3,2])` possible AD, `r comma(p1[4,2])` vascular dementia, `r comma(p1[5,2])` other dementias).

The number of events, time-at-risk and crude rates for each drug class, tabulated by dementia outcome, are shown in Table \@ref(tab:followUp-table). A substantial majority (`r percentage.statins`) of participants prescribed a lipid-regulating agent were prescribed a statin. I excluded the "Ezetimibe and statins" (N = `r eze_sta_n`) and "Nicotinic acid groups" (N = `r nag_n`) classes from subsequent class-based subgroup analyses because of the extremely small number of participants in these groups. Note that the "Ezetimibe and statins" treatment group represents those prescribed a single treatment containing both ezetimibe and a statin, rather than those prescribed the two treatments concurrently.
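To make the crude rate calculation concrete, the sketch below computes rates per 100,000 participant-years at risk from event counts and time-at-risk. The counts are invented for illustration and are not taken from the CPRD extract. The exact Poisson confidence limits are an addition of this example and are not necessarily reported in the table.

```r
# Hypothetical counts, not taken from the CPRD extract
crude <- data.frame(
  drug_class = c("Statins", "Fibrates", "No LRA"),
  events     = c(120, 8, 300),
  years      = c(250000, 12000, 900000)   # participant-years at risk
)

# Crude rate per 100,000 participant-years at risk
crude$rate_per_100k <- crude$events / crude$years * 1e5

# Exact Poisson 95% confidence limits, rescaled to the same units
ci <- t(sapply(seq_len(nrow(crude)), function(i) {
  poisson.test(crude$events[i], T = crude$years[i])$conf.int * 1e5
}))
crude$lower <- ci[, 1]
crude$upper <- ci[, 2]

crude
```

The wide interval for the smallest group in this toy example illustrates why classes with very few participants contribute little to class-based comparisons.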

\blandscape

<!----------------------------------------------------------------------------->
(ref:followUp-caption) __Crude rates, stratified by outcome and drug class of interest__ - The number of events, years at risk and crude rates per 100,000 participant-years-at-risk stratified by dementia outcome and drug class of interest are presented below.

(ref:followUp-scaption) Crude rates, stratified by outcome and drug class of interest

```{r followUp-table, message=FALSE, results="asis", echo = FALSE}
```

(ref:cprdCharacteristics-caption) __Patient characteristics by drug class__ - Summary statistics are presented as "% (N)" unless otherwise specified in the variable name.

(ref:cprdCharacteristics-scaption) Patient characteristics by drug class

```{r cprdCharacteristics-table, message=FALSE, results="asis", echo = FALSE}
```
<!--------------------------------------------------------------------------->
\elandscape

The distribution of baseline characteristics across the remaining seven drug classes can be seen in Table \@ref(tab:cprdCharacteristics-table). Note that, due to the study design, the median year of entry is expected to be later for those not prescribed an LRA. This is because the unexposed group is more likely to include those who entered the cohort towards the end of the study window, and so had less follow-up time in which to be prescribed an LRA.


Stopping, adding and switching treatments was common across all drug classes (Table \@ref(tab:cprdSSA-table)).

&nbsp;<!----------------------------------------------------------------------->  
(ref:cprdSSA-caption) __Summary of treatment change__ - The number of participants who stopped, switched or added treatments was calculated, stratified by initial LRA type.

(ref:cprdSSA-scaption) Summary of treatment change

```{r cprdSSA-table, message=FALSE, results="asis", echo = FALSE}
```
&nbsp;<!----------------------------------------------------------------------->  

### Missing data

```{r missing-data, echo = FALSE}
```

```{r azd-text, echo = FALSE}
```

Full covariate information was available for `r missingtext`. Six key variables had some missing data: IMD 2010 score was missing for `r imdtext` because it is only recorded for English practices; alcohol status was missing for `r alcoholtext`; smoking status was missing for `r smokingtext`; BMI, or a calculated BMI from height and weight measurements, was missing for `r bmitext`; baseline total cholesterol was missing for `r choltext`; and baseline LDL cholesterol was missing for `r ldltext`.

&nbsp;

### Primary analysis

The results of the primary analysis using the fully adjusted Cox proportional hazards model with participant age as the time scale are presented for each drug/outcome combination in Figure \@ref(fig:cprdPrimary).

For each outcome, the overall "Any drug class" estimate was driven by the statin subgroup, based on its large size relative to the other drug classes.

&nbsp;

<!--------------------------------------------------------------------------->
(ref:cprdPrimary-cap) __Results from primary analyses of CPRD data__ - Effect estimates obtained from the fully-adjusted time-varying model with participant age as the time scale, stratified by lipid-regulating agent and dementia outcome.

(ref:cprdPrimary-scap) Results from primary analyses of CPRD data    

```{r p1Figure, include = FALSE}
```

```{r cprdPrimary, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:cprdPrimary-cap)', out.width='100%', fig.scap='(ref:cprdPrimary-scap)'}
knitr::include_graphics(file.path("figures","cprd-analysis","forester_p1.png"))
```
<!--------------------------------------------------------------------------->

&nbsp;

#### Alzheimer's disease

I found little evidence for an effect of lipid-regulating agents on probable (`r probad_text`) and possible (`r possad_text`) Alzheimer's disease when compared to no treatment, with the sole exception of fibrates on probable Alzheimer's disease (`r probad_fib_text`).

&nbsp;

#### Non-Alzheimer's disease dementias

In contrast to the findings for Alzheimer's disease outcomes, an association between lipid-regulating agents and an increased risk of a subsequent diagnosis of vascular dementia (`r vasdem_text`) or other dementias (`r othdem_text`) was observed. Again this effect was driven mainly by the statin subgroup, but there was some evidence that ezetimibe was associated with an increased risk of vascular (`r vasdem_eze_text`) and other (`r othdem_eze_text`) dementia. 

&nbsp;

#### All-cause dementia

For the composite all-cause dementia outcome, I found treatment with a lipid-regulating agent was associated with a slightly increased risk (`r anydem_text`), which lies between the associations for the Alzheimer and non-Alzheimer dementia outcomes as would be expected. There was also some evidence that fibrates were associated with increased risk of all-cause dementia (`r anydem_fib_text`). 

&nbsp;

### Sensitivity analyses

The results of the sensitivity analyses are described in the following sections.

#### Complete case versus imputed data

In almost all cases, the use of imputed data resulted in a marginal attenuation of the effects observed when using a complete case analysis. It should be noted that due to the large amount of missing data (e.g., `r ldltext` were missing a baseline LDL cholesterol measure), the number of participants included in the complete case analysis was substantially smaller than that included when using imputed data. In this case, though the overall position of the effect estimates does not change substantially when using the imputed dataset, there is a noticeable gain in power.[@sterne2009] 

&nbsp;

<!----------------------------------------------------------------------------->
(ref:completeCaseFig-cap) __Sensitivity analysis: complete case vs. imputed data__ - Comparison of analyses using the complete case versus imputed cohorts, indicating broad agreement between the two approaches. Note that the analyses using imputed data gave more precise estimates due to the low proportion of patients with complete covariate data.

(ref:completeCaseFig-scap) Sensitivity analysis: complete case vs. imputed data

```{r completeCase, include= FALSE}
```

```{r completeCaseFig, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:completeCaseFig-cap)', out.width='100%', fig.scap='(ref:completeCaseFig-scap)'}
knitr::include_graphics(file.path("figures","cprd-analysis","forester_complete_case.png"))
```
<!----------------------------------------------------------------------------->

&nbsp;

#### Control outcomes

```{r controlOutcomesText, echo = FALSE, message = FALSE}
```

```{r controlOutcomes, echo = FALSE, message = FALSE}
```

The fully adjusted model was also used to estimate the effect of treatment with a statin on three control outcomes: back pain (negative), ischemic heart disease (positive protective) and Type 2 diabetes (positive harmful). The results of this analysis are presented in Figure \@ref(fig:controlOutcomeFig).

&nbsp;



<!----------------------------------------------------------------------------->
(ref:controlOutcomeFig-cap) __Sensitivity analysis: control outcomes__ - Effect estimates obtained from the fully-adjusted time-varying model with participant age as the time scale for three control outcomes: back pain (negative), ischemic heart disease (positive protective), and Type 2 diabetes (positive harmful).

(ref:controlOutcomeFig-scap) Sensitivity analysis: control outcomes

```{r controlOutcomeFig, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:controlOutcomeFig-cap)', out.width='100%', fig.scap='(ref:controlOutcomeFig-scap)'}
knitr::include_graphics(file.path("figures","cprd-analysis","forester_control_outcomes.png"))
```
<!----------------------------------------------------------------------------->

&nbsp;

For the negative control, there was some evidence that treatment with a statin was associated with an increased risk of back pain (`r backpain_text`), suggesting there may be some residual confounding. However, statin prescription was also associated with a substantially increased risk of ischemic heart disease (`r ihd_text`) and Type 2 diabetes (`r dm_type2_text`).

&nbsp;

#### Impact of additional covariates {#cprd-impact-additional-covar}

The results of three models adjusted for age only, age and sex, and full covariates respectively, are presented in Figure \@ref(fig:unadjustedComparisonFig). These models were used to estimate the impact of adjustment for additional covariates. Note that obtaining a completely unadjusted model is not possible because age was used in the Cox model as the time scale.

Adjustment for additional covariates beyond age and sex had a limited impact on the observed effect estimates, with the exception of the probable AD outcome. In this case, the protective effect observed when adjusting only for age and sex was attenuated to the null after adjustment for the full set of covariates.

&nbsp;

<!----------------------------------------------------------------------------->
(ref:unadjustedComparisonFig-cap) __Sensitivity analysis: covariate comparison__ - The results obtained using three different sets of covariates (age only, age + sex, all covariates) are shown for each dementia outcome. Note that the x-axis cutoffs (0.5,2) are different compared to other plots (0.3,3) to enable greater comparison between the different models.

(ref:unadjustedComparisonFig-scap) Sensitivity analysis: covariate comparison

```{r unadjustedComparison, include=FALSE}
```

```{r unadjustedComparisonFig, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:unadjustedComparisonFig-cap)', out.width='100%', fig.scap='(ref:unadjustedComparisonFig-scap)'}
knitr::include_graphics(file.path("figures","cprd-analysis","forester_unadjusted.png"))
```
&nbsp;<!----------------------------------------------------------------------->  

#### Sensitivity cohorts: Entry year

When stratifying by year of entry to the cohort, I observed no variation in risk by time period for any outcome except probable Alzheimer’s disease (Figure \@ref(fig:diagnosisTypeFig)).

&nbsp;

```{r cohortEntry, include=FALSE}
```

<!----------------------------------------------------------------------------->
(ref:diagnosisTypeFig-cap) __Sensitivity analysis: year of entry__ - Effect estimates, obtained from the fully-adjusted time-varying model with participant age as the time scale, for the association of any LRA class with the probable AD outcome, stratified by grouped year of cohort entry.

(ref:diagnosisTypeFig-scap) Sensitivity analysis: year of entry


```{r diagnosisTypeFig, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:diagnosisTypeFig-cap)', out.width='100%', fig.scap='(ref:diagnosisTypeFig-scap)'}
knitr::include_graphics(file.path("figures","cprd-analysis","forester_cohort_entry.png"))
```
<!----------------------------------------------------------------------------->

&nbsp;

On the assumption that this variation could be caused by changes in the frequency of probable AD diagnoses in the cohort over time, I performed a _post-hoc_ investigation of the frequency of each dementia outcome by year of entry (Table \@ref(tab:diagnosisType-table)). While the frequency of outcomes declines in more recent strata, likely due to the limited follow-up inherent to these groups, this decline in frequency is relatively constant across the dementia subtypes.

&nbsp;

<!----------------------------------------------------------------------------->

(ref:diagnosisType-caption) __Dementia diagnoses by year of entry__ - As part of a _post-hoc_ investigation of the time period effect observed in the probable AD outcome, the frequency of dementia diagnoses by grouped year of cohort entry was calculated.

(ref:diagnosisType-scaption) Dementia diagnoses by year of entry

```{r diagnosisType-table, message=FALSE, results="asis", echo = FALSE}
```
<!----------------------------------------------------------------------------->

&nbsp;

#### Sensitivity cohorts: Pregnancy

In the second sensitivity cohort, removing patients who may have been pregnant (coded as aged 55 and under at index) from the analysis had minimal impact on the effect estimates (Figure \@ref(fig:pregnancyFig)).

&nbsp;<!----------------------------------------------------------------------->  

(ref:pregnancyFig-cap) __Sensitivity analysis: pregnancy cohort__ - Comparison of analysis using main cohort and a cohort with potentially pregnant participants (coded as any participant under 55 years of age) removed.

(ref:pregnancyFig-scap) Sensitivity analysis: pregnancy cohort

```{r pregnancy, include = FALSE}
```

```{r pregnancyFig, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:pregnancyFig-cap)', out.width='100%', fig.scap='(ref:pregnancyFig-scap)'}
knitr::include_graphics(file.path("figures","cprd-analysis","forester_pregnancy.png"))
```
&nbsp;<!----------------------------------------------------------------------->  

#### Statin properties

In the cohort, statins with lipophilic properties were much more frequently prescribed than hydrophilic statins (Table \@ref(tab:statinTypeTable-table)). Additionally, there is evidence for an increasing tendency to favour lipophilic statins in recent years, with the proportion of hydrophilic statins prescribed falling from 18.2% in 1996-2000 to <1% in 2011-2016.

&nbsp;<!----------------------------------------------------------------------->  

(ref:statinTypeTable-caption) __Summary of statin properties__ - Number of patients prescribed lipophilic vs hydrophilic statins, by grouped year of prescription.

(ref:statinTypeTable-scaption) Summary of statin properties

```{r statinTypeTable-table, message=FALSE, results="asis", echo = FALSE}
```

&nbsp;<!----------------------------------------------------------------------->  

When stratifying by statin properties, hydrophilic statins were less harmful in the any, vascular and other dementias outcomes compared to lipophilic statins (Figure \@ref(fig:statinTypeFig)). Additionally, in the AD outcomes, hydrophilic statins were associated with a small reduction in risk, compared to the weak evidence for an effect for lipophilic statins.

&nbsp;

<!----------------------------------------------------------------------------->
(ref:statinTypeFig-cap) __Sensitivity analysis: statin properties__ - Effect estimates obtained from the fully-adjusted time-varying model with participant age as the time scale, stratified by statin properties (hydrophilic vs lipophilic). 

(ref:statinTypeFig-scap) Sensitivity analysis: statin properties

```{r statinType, include = FALSE}

```

```{r statinTypeFig, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:statinTypeFig-cap)', out.width='100%', fig.scap='(ref:statinTypeFig-scap)'}
knitr::include_graphics(file.path("figures","cprd-analysis","forester_sta_type.png"))
```
<!----------------------------------------------------------------------------->

&nbsp;

#### Impact of dementia code lists {#comparing-codelists}

```{r smeethText, include = FALSE}
```

When using the Smeeth _et al._ code lists to define dementia outcomes, hazard ratios of `r smeeth_azd_text` and `r smeeth_oth_text` were obtained for the Alzheimer's disease and non-Alzheimer's ("other") dementia outcomes, respectively. While direct mapping to the outcomes used in this analysis was not possible, the most comparable outcomes are "probable Alzheimer's disease" (`r probad_text`) and "other dementia" (`r othdem_text`).

&nbsp;<!----------------------------------------------------------------------->  

## Discussion

### Summary of findings

Lipid-regulating agents showed little evidence of an association with probable and possible Alzheimer's disease when compared to no treatment, but were associated with an increased risk of all-cause dementia, vascular dementia and other dementia diagnoses. The estimate observed in each case was driven by the effects observed in the statin subgroup because a substantial majority of participants prescribed an LRA were prescribed a statin. For the other drug classes, no association was found with any outcome, with two exceptions. Ezetimibe was associated with increased risk of vascular and other dementias, while fibrates were associated with increased risk of all-cause dementia and probable Alzheimer's disease.

The effect estimates were robust to the exclusion of potentially pregnant participants, and for all outcomes except probable AD, no variation across grouped year of entry was observed. When looking at the statin subgroup alone, statin properties appeared to have a modifying effect, with hydrophilic statins being less harmful in the any, vascular and other dementias outcomes compared to lipophilic statins.

&nbsp;<!----------------------------------------------------------------------->

### Interpretation of results

This section will expand on a potential explanation for the observed results detailed above. However, as the comparison of evidence across different sources is the aim of the triangulation exercise presented in later chapters, a detailed comparison with other published literature will not be provided here, except where needed to illustrate a methodological point. For a detailed comparison of the results presented above with the existing evidence base identified by the systematic review, see Chapter \@ref(tri-heading).

&nbsp;<!----------------------------------------------------------------------->  

#### Confounding by indication {#cprd-confounding-by-ind}

A likely explanation for the observed increased risk of vascular and other dementias with lipid-regulating agent use is residual "confounding by indication", which represents an important limitation of this analysis. While the term has been used to describe different sources of bias in epidemiological analyses,[@salas1999] it is used here to describe the role of risk factors that both prompt treatment and increase the risk of the outcome, thus causing a distorted positive association between the treatment and outcome (see Figure \@ref(fig:indicationBias)). In causal inference nomenclature, statins and dementia are said to be _d_-connected, as there is an open "backdoor" path between them via the uncontrolled confounders.[@suttorp2015] In the context of this analysis, this means a confounding variable (or, more likely, variables) both prompts prescription of statins and represents a risk factor for the development of vascular dementia. A similar confounding structure likely exists for ezetimibe, another hypercholesterolemia treatment, providing an explanation for the association of this drug with vascular and other dementias but not with Alzheimer's disease.
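The mechanism can be illustrated with a small simulation. This is a sketch under simplifying assumptions rather than a reproduction of the analysis: the data are simulated, the confounder `vascular_risk` is a hypothetical binary variable, and logistic regression is used in place of the Cox model for brevity. The simulated statin has no effect on the outcome, so any crude association arises solely through the open backdoor path.

```r
set.seed(2021)
n <- 20000

# Unmeasured vascular risk (hypothetical confounder)
vascular_risk <- rbinom(n, 1, 0.3)

# The confounder prompts treatment ...
statin <- rbinom(n, 1, plogis(-1.5 + 2 * vascular_risk))
# ... and raises outcome risk; the statin itself has no effect here
dementia <- rbinom(n, 1, plogis(-3 + 1.2 * vascular_risk))

# Backdoor path open: confounder omitted from the model
crude_or <- exp(coef(glm(dementia ~ statin, family = binomial))["statin"])

# Backdoor path closed: confounder adjusted for
adj_or <- exp(coef(glm(dementia ~ statin + vascular_risk,
                       family = binomial))["statin"])

round(c(crude = unname(crude_or), adjusted = unname(adj_or)), 2)
```

In this toy setting the crude odds ratio is inflated above 1 while the adjusted estimate sits close to the null, which mirrors the distorted positive association described above when the confounder cannot be measured.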

&nbsp;<!----------------------------------------------------------------------->  
(ref:indicationBias-cap) __Confounding by indication causal diagram__ - Directed acyclic graph illustrating how confounding by indication could induce a positive association between statins and vascular dementia.

(ref:indicationBias-scap) Confounding by indication causal diagram

```{r dags, include=F}
```

```{r indicationBias, echo = FALSE, results="asis", fig.pos = "H",fig.align='center',fig.cap='(ref:indicationBias-cap)', out.width='80%', fig.scap='(ref:indicationBias-scap)', message = FALSE}

knitr::include_graphics(file.path("figures","cprd-analysis","indicationBias.png"))
```
&nbsp;<!----------------------------------------------------------------------->  

Conditioning entry into the study on being either “at-risk” or already diagnosed with hypercholesterolemia was employed in a pre-emptive attempt to mitigate confounding by indication, but evidence from the control outcomes suggests this was unsuccessful. The slight harmful effect observed for the back pain outcome is substantially smaller than that observed for the ischemic heart disease outcome, indicating that the majority of the uncontrolled confounding is likely related to vascular factors. The slight effect observed for the negative control of back pain could be due to incomplete control for socioeconomic status, as deprivation data were provided in twentiles to preserve patient privacy.[@boruzs2016; @ikeda2019]

In line with this, an increasingly harmful effect is observed when moving from the probable and possible Alzheimer's disease outcomes to the other dementia outcome, and finally to the vascular dementia outcome. This pattern suggests that the strength of the residual confounding by indication increases as the proportion of cases with a vascular component in an outcome definition increases. Given confounding related to vascular factors, this pattern is also expected based on the decision tree for assigning outcomes in the presence of more than one dementia code. Under this system, the Alzheimer's disease outcomes require a "pure" condition and the presence of any vascular or other dementia codes excludes participants from this group (Figure \@ref(fig:decisionTreeFig)).

A review of other available literature suggests that this observation (a harmful effect of lipid-regulating agents on vascular-related outcomes) is not unusual. Using a conventional epidemiological technique, a previous analysis also found an increased risk of coronary heart disease (analogous to the ischemic heart disease outcome used in this analysis) in those taking statins (`r estimate(1.31,1.04, 1.66, type = "HR")`).[@danaei2013] In that study, controlling for confounding by indication through the use of a trial emulation analysis gave an estimate of 0.89 (95%CI: 0.73-1.09), which is closer to, though less conclusive than, the estimate observed in RCTs of statin use (`r estimate(0.73,0.67,0.80,type="")`).[@taylor2013]

Given the absence of evidence on vascular dementia in the published literature, as highlighted in the previous chapter, the unexpected increase in vascular dementia risk with statin use is particularly interesting. If previous research encountered similar methodological issues to this analysis, it is possible that the results did not make it into the evidence base via a publication bias mechanism, where results that are unexpected or presumed to be incorrect are less likely to be submitted or accepted for publication.

&nbsp;<!----------------------------------------------------------------------->  

#### Statin properties

This analysis found that hydrophilic statins were less harmful in the any, vascular and other dementia outcomes compared to lipophilic statins, and were associated with a small reduction in the risk of the probable and possible AD outcomes. The increased precision of the estimates for lipophilic versus hydrophilic statins is expected, as the two most commonly prescribed statins are lipophilic (simvastatin and atorvastatin).[@newman2019]

A widely discussed concept in the literature surrounding statin use and cognitive outcomes is that lipophilic statins are more likely to be able to cross the blood-brain barrier, and so have a more potent protective effect by directly lowering brain cholesterol.[@shepardson2011] My finding that hydrophilic statins appear to be more protective/less harmful than their lipophilic counterparts runs counter to this assertion.

An initial interpretation of the different associations observed in the two groups was that the lipophilic statins may be more potent, and so are prescribed to patients with a higher underlying vascular load, leading to increased confounding by indication in this group. However, the statin with the strongest lipid lowering effect that is available via the NHS, rosuvastatin, is hydrophilic.

&nbsp;<!----------------------------------------------------------------------->  

#### Impact of code lists

As part of a sensitivity analysis exploring the impact of outcome code-lists, I used definitions for Alzheimer's disease and other dementias obtained from a previously published paper (Smeeth _et al._).[@smeeth2009] Using these lists in my analytical set-up, I found a harmful association of statin use with both outcomes.

This finding disagrees with the results of the original analysis [reported by Smeeth _et al_,]{.correction} [@smeeth2009] which found evidence for a protective effect of statin use on all-cause dementia (`r estimate(0.81, 0.69,0.96,type = "HR")`) and non-AD dementia (`r estimate(0.82,0.69, 0.97,type = "HR")`), but little evidence of an effect on AD (`r estimate(0.81,0.49, 1.35, type="HR")`).

However, comparison of the results obtained using the two sets of code lists was deemed less useful following a detailed comparison of the codes used. While all of the codes used to define Alzheimer’s in the Smeeth _et al._ paper are included in the probable Alzheimer’s code-list (see Figure \@ref(fig:smeethComparison)), I included several additional codes used to define this outcome (including, for example, “Eu00013: [X]AD disease type 2”). Additionally, several of the codes used to define “Possible Alzheimer’s” in this analysis are included in the “Other dementia” code list used by Smeeth. 

&nbsp;

<!----------------------------------------------------------------------------->
(ref:smeethComparison-cap) __Comparison of code lists__ - Sankey diagram comparing the codes used in this analysis with those used in the Smeeth _et al_ paper.[@smeeth2009] The outcomes and number of codes contributing to each are presented (the Smeeth _et al_ outcomes are on the right-hand side of the figure). The joining lines show the overlap between the categories in the two analyses.

(ref:smeethComparison-scap) Comparison of code lists

```{r sankeydiagram, include=FALSE}
```

```{r smeethComparison, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:smeethComparison-cap)', out.width='100%', fig.scap='(ref:smeethComparison-scap)'}
knitr::include_graphics(file.path("figures","cprd-analysis","sankey_diagram.png"))
```
<!----------------------------------------------------------------------------->

This analysis serves to illustrate the importance of the code lists chosen to define the outcomes of interest in EHR data, particularly if they are used to define competing outcomes (e.g., AD vs non-AD dementia). The different codes used by Smeeth _et al._, in addition to an analytical approach that adjusted for covariates defined after the index date, may go some way to explaining why my analysis obtained different results despite the substantial overlap in the data sources used.

<!--------------------------------------------------------------------------->
&nbsp;

### Strengths and limitations {#cprd-limitations}

The primary strength of this analysis is the relative size of the CPRD. Having reviewed the existing literature, as identified by the systematic review element of this thesis, this analysis of `r comma(attrition[14,1])` participants is one of the largest available studies of this research question. Additionally, this analysis followed LRA users and non-users from a common index date, using a time-updating treatment indicator to correctly assign time-at-risk to the exposed and unexposed groups. This approach has been less commonly used in the literature and allows for the mitigation of potential immortal time bias. It is also one of few studies that provide evidence on the association of LRA use and vascular and other dementias. Finally, it used negative and positive controls to assess the potential for residual confounding.
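The handling of time-at-risk under a time-updating treatment indicator can be sketched as follows. This is an illustrative example using three invented participants, with hypothetical column names (`follow_up`, `first_rx`) and a hypothetical helper `split_follow_up()`. It is not the code used for this analysis. Follow-up before the first prescription is assigned to the unexposed group and follow-up after it to the exposed group.

```r
# Hypothetical participants; times are years since the common index date
cohort <- data.frame(
  id        = 1:3,
  follow_up = c(10, 6, 8),   # years from index to event or censoring
  event     = c(1, 0, 1),    # 1 = dementia diagnosis at end of follow-up
  first_rx  = c(4, NA, 0)    # years from index to first LRA prescription (NA = never)
)

split_follow_up <- function(d) {
  rows <- lapply(seq_len(nrow(d)), function(i) {
    p <- d[i, ]
    treated <- !is.na(p$first_rx) && p$first_rx < p$follow_up
    if (!treated) {
      # Never treated during follow-up: a single unexposed interval
      return(data.frame(id = p$id, start = 0, stop = p$follow_up,
                        exposed = 0, event = p$event))
    }
    # Treated: time before the first prescription is unexposed, time after is exposed
    out <- data.frame(
      id      = p$id,
      start   = c(0, p$first_rx),
      stop    = c(p$first_rx, p$follow_up),
      exposed = c(0, 1),
      event   = c(0, p$event)   # the event can only fall in the final interval
    )
    out[out$stop > out$start, ]  # drop the empty interval if treated at index
  })
  do.call(rbind, rows)
}

intervals <- split_follow_up(cohort)
intervals

# Participant-years by exposure status (0 = unexposed, 1 = exposed)
person_time <- tapply(intervals$stop - intervals$start, intervals$exposed, sum)
person_time
```

Treating participant 1 as exposed for all 10 years would wrongly credit the exposed group with four event-free years that occurred before treatment began, which is the immortal time bias that this layout avoids. Intervals in this form could be passed to a counting-process model such as `survival::coxph(Surv(start, stop, event) ~ exposed)`.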

However, the findings of this analysis are subject to several limitations in addition to the confounding by indication discussed above. There is a strong possibility of differential misclassification of dementia-related conditions based on the exposure.[@porta2014] As an illustrative example, those with memory complaints may be more likely to be classified as vascular dementia than Alzheimer's disease if their medical records contain prescriptions for lipid-regulating agents. Further, there is potential for general non-differential misclassification of the outcome due to the varying positive predictive value of electronic health record code lists to identify dementia cases.[@wilkinson2018; @mcguinness2019validity]

Misclassification of outcomes is not the only issue introduced by the use of EHR codes to define outcomes. Comparing different studies is particularly difficult because of the impact that the use of different code lists can have on the analysis. This problem is illustrated by the discrepancy between the results obtained using the code lists defined for this study and those used by Smeeth _et al_. This presents a particular challenge in comparing research across different time-periods and coding systems.

A further limitation stems from the possibility of uncontrolled confounding due to genetic factors. The number of _Apo_$\mathcal{E}4$ alleles represents the strongest genetic risk factor for Alzheimer's disease, but also substantially increases LDL cholesterol levels,[@bennet2007] potentially prompting treatment with a statin or other lipid-regulating agent. I was unable to control for _Apo_$\mathcal{E}$ genotype in this analysis because I did not have access to genetic data on participants. As a result, any protective association between LRA use and the Alzheimer's disease outcomes may be masked by residual negative confounding by _Apo_$\mathcal{E}$.

Finally, as with many studies of dementia, there is a risk of reverse causation in my analysis. Dementia and associated conditions have a long prodromal period, during which preclinical disease could cause indications for the prescription of a lipid-regulating agent. Enforcing a minimum period of follow-up would address this limitation, but the use of a time-varying treatment indicator in this analysis prevents the use of this approach (see Appendix \@ref(cprd-min-fu) for a fuller discussion of this topic).

<!--------------------------------------------------------------------------->
&nbsp;

### Enabling easy synthesis of this analysis {#cprd-data-avail}

The raw data supporting this analysis are not publicly available because access to the CPRD data is controlled by a data monitoring committee. However, when data are not readily available, sharing the analysis code and summary statistics gives readers a way to validate the findings.[@goldacre2019]

In light of this and my own experiences in attempting to extract information from papers assessing preventative treatments, as documented in Section \@ref(sys-rev-open-data), the outputs from this analysis have been made readily available. All code, Read code lists and summary statistics (namely the tables presented in this chapter plus summary tables of effect estimates) can be downloaded in a machine-readable format from the archived repository for this project ([https://github.com/mcguinlu/CPRD-LRA](https://github.com/mcguinlu/CPRD-LRA)). This open approach should enable easy inclusion of this analysis in future evidence synthesis exercises, allowing new work to build on that presented here.

<!--------------------------------------------------------------------------->
&nbsp;

## Summary

* In this chapter, I produced new evidence on the association of lipid-regulating agents with incidence of all-cause dementia, Alzheimer’s disease, vascular dementia, and other dementia. 

* I found [very weak]{.correction} evidence for an effect of lipid-regulating agents on probable or possible Alzheimer’s disease. However, lipid-regulating agent use was associated with an increased risk of all-cause, vascular and other dementias. In all cases, the estimated associations for the "any LRA" analyses were driven by those observed in the large statin subgroup.

* I attempted to account for important sources of bias through use of a time-varying treatment indicator. However, the control outcomes included in the analysis provided evidence for only partially controlled confounding by indication, likely related to vascular factors. Additionally, there was the potential for differential misclassification of dementia subtype on the basis of the exposure. Combined, these biases reduce the confidence in my findings, in particular the unexpected increase in risk of vascular dementia associated with statin use.

* Findings from this analysis are used as an additional source of evidence in the triangulation exercise presented in Chapter \@ref(tri-heading).