---
bibliography: bibliography/references.bib
csl: bibliography/nature.csl
output:
  bookdown::pdf_document2:
    template: templates/brief_template.tex
  bookdown::word_document2:
    toc: false
    toc_depth: 3
    reference_docx: templates/word-styles-reference-01.docx
    number_sections: false
  bookdown::html_document2: default
documentclass: book
---
```{block type='savequote', include=knitr::is_latex_output(), quote_author='(ref:intro-quote)', echo = TRUE}
These particular systematic reviews [individual participant data reviews] remain yardsticks against which the quality of other reviews continues to be judged.
```
(ref:intro-quote) --- Iain Chalmers, 1993[@chalmers1993]
# Individual participant data meta-analysis of blood lipid levels and dementia outcomes {#ipd-heading}
\minitoc <!-- this will include a mini table of contents-->
```{r, echo = FALSE, warning=FALSE, message=FALSE}
source("R/helper.R")
knitr::read_chunk("R/06-Code-IPD-Analysis.R")
doc_type <- knitr::opts_knit$get('rmarkdown.pandoc.to') # Info on knitting format
```
```{r cohortNumbers, message=FALSE,echo = FALSE}
```
::: {.laybox data-latex=""}
## Lay Summary {-}
This chapter examines the raw data pertaining to participants in previously published, relevant studies, a method of analysis called Individual Participant Data (IPD) meta-analysis. It uses this data to investigate the relationship between lipid levels and dementia risk, and differs from the previous chapter in that the previous chapter used published results rather than individual-level data.
I applied for access to `r n_applied$total` unique data sources for this analysis. However, only a small proportion of these data sources (`r fmt_applied(n_accessed$total)`) provided the requested data.
The resulting analysis of these data sources did not suggest a relationship between any blood lipid and any dementia outcome. The sole exception was an increased risk of vascular dementia in those with higher triglyceride levels. In addition, participants' age and sex did not appear to influence the relationship between lipid fractions and dementia outcomes.
The reasons for the low response rate to requests for data are explored in this chapter. Finally, I suggest future studies to investigate how data access rates could be improved.
:::
<!----------------------------------------------------------------------------->
## Introduction
Individual participant data (IPD) meta-analyses are considered to be the gold standard form of evidence synthesis, allowing for the application of a common selection model and analytical approach across all identified cohorts.[@riley2010] They are particularly useful when investigating the impact of participant-level characteristics, something that is not possible with aggregate data unless the results are stratified by the characteristic of interest.[@riley2010;@thompson2005] Knowledge of which groups a treatment will benefit most (or harm least) is a core aim of the move towards personalised medicine.[@riley2020;@hingorani2013]
IPD analyses also offer a mechanism by which previously unanalysed datasets can be incorporated into an analysis, thus expanding the evidence base for a particular research question.
Previous work has suggested a difference in the effect of lipids on dementia risk based on participant age and sex.[@ancelin2013; @mielke2010] The systematic review presented in Chapters \@ref(sys-rev-methods-heading) & \@ref(sys-rev-results-heading) could not investigate this effect because the number of included studies in the lipid fraction meta-analyses was small, due to both the relatively small number of studies reporting on lipids and the poor reporting of summary statistics of participant characteristics in those that did. Additionally, best practice guidance recommends against basing the decision to perform an IPD meta-analysis on between-study heterogeneity in a summary data meta-analysis, because similar distributions of participant covariates across studies may mask a true effect.[@riley2020]
As such, the aims of this analysis are two-fold. Firstly, I plan to perform an IPD meta-analysis across identified cohorts to examine the impact of participant age-at-measurement and sex on the relationship between lipids and dementia risk. Secondly, I aim to expand the evidence base for the effect of lipids on dementia outcomes by obtaining estimates from previously unanalysed cohorts available via the Dementia Platform UK, a large consortium of dementia cohorts.
<!----------------------------------------------------------------------------->
## Methods
### Eligibility criteria
#### Study design
Eligible data sources for this analysis were prospective cohort studies (see Section \@ref(ipd-apply-access) for details on how these studies were identified). Data sources which were cross-sectional, either by design or due to the available data (e.g., a study recorded data on participants at multiple time-points or "waves", but only data from a single wave could be accessed), were excluded. Similarly, due to the time and cost constraints within the scope of my thesis project, studies making use of population-level electronic health records were ineligible. These studies are problematic in that they often require extensive project proposals in order to gain access to the data.
No restrictions were put on the number of participants or the length of follow-up, though it was a requirement that participants were dementia free (or assumed to be dementia free, based on age at entry) at baseline.
<!----------------------------------------------------------------------->
#### Exposures/outcome definition
I considered four blood lipid fractions as part of this analysis, namely: total cholesterol (TC), low-density lipoprotein cholesterol (LDL-c), high-density lipoprotein cholesterol (HDL-c) and triglycerides (TG). Cohorts were eligible for inclusion if they contained data on at least one of these lipid fractions, recorded as a continuous variable (i.e., studies with binary "hypercholesterolemia" exposure would be excluded).
In line with the analyses presented in other chapters, eligible cohorts were those containing data on at least one outcome of interest, namely: all-cause dementia, Alzheimer's disease, or vascular dementia.
<!----------------------------------------------------------------------->
### Applying for data access {#ipd-apply-access}
Potentially eligible data sources were identified via two approaches, each with a distinct focus. The approaches are described in detail in the following sections. For both approaches, the number of cohorts responding to the request for data access and, where applicable, the reasons given for a refusal were recorded.
<!----------------------------------------------------------------------->
#### Cohorts identified by the systematic review
The first approach focused on previously analysed observational prospective cohort studies examining the effect of blood lipid levels on dementia outcomes, as identified by the systematic review presented in Chapters \@ref(sys-rev-methods-heading) & \@ref(sys-rev-results-heading). The data sources used in each of these cohort analyses were screened against the criteria listed in the previous section, and eligible cohorts were approached for data access. In the first instance, the first/corresponding author of the publication was emailed in Autumn 2020. If this approach did not elicit a response within two months, the last author was contacted, on the basis that the first/corresponding author may have been a more junior member of the research group who had moved to a different institution.
<!----------------------------------------------------------------------->
#### Cohorts contained in Dementia Platform UK
The second approach focused on incorporating relevant, previously unanalysed data into the analysis, thus providing additional evidence on the relationship between blood lipids and dementia risk. This was achieved through the Dementia Platform UK (DPUK), a collaborative grouping of existing dementia cohorts established by the Medical Research Council which works with data owners to make their data readily accessible for secondary analysis.[@bauermeister2020] It provides access to 42 cohorts with over 3 million participants, and makes use of a central streamlined application process for all cohorts, with the intent of making it easier to access data from existing data sources.
Cohorts included in the DPUK were assessed against the eligibility criteria, and in Autumn 2020, an application for access to a subset of 17 cohorts was made via the common-access procedure.
<!----------------------------------------------------------------------->
### Primary analysis
#### Data cleaning and harmonisation
Where data on one of the exposures (TC, LDL-c, HDL-c, TG) was missing, it was inferred from the other three fractions (where available) using the Friedewald formula [as detailed in]{.correction} Equation \@ref(eq:total-cholesterol-formula). Lipid levels were retained as continuous variables rather than dichotomising into a binary hypercholesterolemia exposure, given the additional statistical power this adds to the meta-analysis.[@riley2020;@ensor2018] Where lipid measurements were reported in _mg/dL_, these were converted to _mmol/L_.
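
As an illustration of these two harmonisation steps, the following is a minimal sketch rather than the code used in this thesis. It assumes the standard _mmol/L_ form of the Friedewald formula and the commonly quoted conversion factors (38.67 for cholesterol fractions, 88.57 for triglycerides); the function names are hypothetical.

```r
# Conversion factors (mg/dL per mmol/L). These are commonly quoted values
# and are an assumption here, not taken from the thesis code.
CHOL_MGDL_PER_MMOL <- 38.67  # TC, LDL-c and HDL-c
TG_MGDL_PER_MMOL   <- 88.57  # triglycerides

# Hypothetical helper: convert a lipid measurement from mg/dL to mmol/L
to_mmol <- function(x, fraction = c("cholesterol", "triglycerides")) {
  fraction <- match.arg(fraction)
  if (fraction == "cholesterol") x / CHOL_MGDL_PER_MMOL else x / TG_MGDL_PER_MMOL
}

# Friedewald formula in mmol/L: LDL-c = TC - HDL-c - TG / 2.2
friedewald_ldl <- function(tc, hdl, tg) tc - hdl - tg / 2.2

# The same relationship rearranged to recover a missing TC value
friedewald_tc <- function(ldl, hdl, tg) ldl + hdl + tg / 2.2

ldl <- friedewald_ldl(tc = 5.2, hdl = 1.3, tg = 1.1)
tc  <- friedewald_tc(ldl = ldl, hdl = 1.3, tg = 1.1)
```

Rearranging the same identity allows whichever single fraction is missing to be recovered from the other three.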
Across all cohorts, data cleaning was performed in a similar manner, standardising to commonly named variables so that a single model could be applied using functional programming.[@wickham2016func] The advantage of this approach is that it reduces the likelihood of model mis-specification errors, which could otherwise arise if variable names had to be changed from cohort to cohort. Following data cleaning, summary statistics for each data source were calculated and compared with publicly available statistics to ensure no errors were introduced in the data cleaning process, in line with best practice.[@levis2021]
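
The idea of writing the model once and mapping it over harmonised cohorts can be sketched as follows. This is an illustrative example on simulated data, not the thesis code: `make_cohort`, `fit_model`, the variable names and the reduced covariate set are all hypothetical.

```r
set.seed(1)

# Hypothetical helper: simulate one cohort that has already been harmonised
# to a common set of variable names
make_cohort <- function(n) {
  data.frame(
    dementia = rbinom(n, 1, 0.2),
    tc       = rnorm(n, mean = 5.5, sd = 1),
    age_band = sample(1:5, n, replace = TRUE),
    smoking  = factor(sample(c("never", "ever", "current"), n, replace = TRUE))
  )
}

cohorts <- list(cohort_a = make_cohort(300), cohort_b = make_cohort(400))

# The model is specified once and relies only on the common variable names
fit_model <- function(df) {
  glm(dementia ~ tc + age_band + smoking, family = binomial(), data = df)
}

# Apply the same specification to every cohort
fits <- lapply(cohorts, fit_model)

# Collect the log odds ratio and standard error for the lipid term
estimates <- do.call(rbind, lapply(names(fits), function(nm) {
  co <- summary(fits[[nm]])$coefficients
  data.frame(cohort = nm,
             log_or = co["tc", "Estimate"],
             se     = co["tc", "Std. Error"])
}))
```

Because every cohort passes through the same `fit_model` function, a change to the model specification is made in one place only.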
<!----------------------------------------------------------------------->
#### Covariate definition
A range of additional variables were included in the analysis, intended to address the potential for confounding. With an awareness that discrepancies are common in the set of available covariates across cohorts included in an IPD analysis, I defined an idealised set of covariate domains to be age, sex, education, BMI, _Apo_$\mathcal{E}4$ status, smoking/alcohol status, ethnicity, and prevalent diabetes or cardiovascular disease. This set covers key risk factors for dementia/Alzheimer's disease in addition to general cardiovascular risk factors. Details on how these variables were coded, given the available data, are presented in Section \@ref(ipd-covar-definition).
<!-- TODO Need to go through and make sure that I have updated the covariate list appropriately -->
<!----------------------------------------------------------------------->
#### Missing data
Missing data in this analysis was classified as either relating to missing values (data on a variable of interest was available in a cohort, but some values were missing) or missing variables (a cohort did not collect data on a variable of interest). Variables with missing values were identified, and 20 imputed datasets were created.[@sterne2009] Imputation was performed using MICE (Multiple Imputation by Chained Equations), implemented in R using the `mice` package.[@Van_Buuren2011-nc]
Missing variables were originally intended to be addressed using a previously published method.[@fibrinogenstudiescollaboration2009] Here, the correlation between the fully-adjusted and partially-adjusted estimate in cohorts with the full set of covariates is used to estimate the fully-adjusted effect in those cohorts missing covariates. However, this method requires several large cohorts to contain the full set of covariates, a condition this analysis failed to meet given the very low response rate. As such, two other common approaches were employed.[@fibrinogenstudiescollaboration2009] In the primary analysis, all cohorts were adjusted for the set of common covariates across cohorts (Model 1: age, sex, smoking, alcohol, education, diabetes). As a sensitivity analysis, cohorts with the full complement of covariates (Whitehall II and EPIC) were analysed using a maximally-adjusted model (Model 2: Model 1, further adjusted for ethnicity, prevalent ischemic heart disease and BMI). Results between the common-set-adjusted (Model 1) and maximally-adjusted (Model 2) models were then compared.
<!----------------------------------------------------------------------->
#### IPD analysis
In terms of the analytic approach taken, a two-stage IPD analysis was used. Under a two-stage approach, estimates for the effect of each lipid fraction on incident dementia were first calculated for each data source. More specifically, a logistic regression model adjusted for relevant covariates (as detailed in the above section) was used to quantify the effect of a 1 _mmol/L_ increase in each lipid fraction on each dementia outcome.
Results were expressed as odds ratios (OR). Examination of the data available via the DPUK indicated that the common time-to-event approach used in studies of dementia outcomes would be precluded by the absence of detailed time-to-event data. An overall effect estimate was then produced by combining data-source-specific estimates in a random-effects meta-analysis (see Section \@ref(meta-analysis-methods) for a broader discussion of meta-analysis methods).
A two-stage IPD approach was employed for a number of reasons. Firstly, and most importantly, a two-stage approach allows for siloed data, enabling researchers who are unwilling/unable to provide their data to obtain the effect estimates themselves, following a specified analysis plan, and share these with the IPD team.[@riley2010] Secondly, it allows for the production of forest plots of within-cohort estimates, something which is useful for the triangulation exercise in Chapter \@ref(tri-heading). Finally, a two-stage approach is simpler to model because it uses standard well-documented summary effect estimate meta-analysis techniques, and automatically accounts for methodological issues such as clustering within cohorts[@abo-zaid2013] and the potential for ecological bias.[@burke2017]
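
The second stage of this approach can be illustrated with a short base R sketch. The estimator for the between-study variance is not specified above, so the DerSimonian-Laird estimator is assumed here purely for illustration; the cohort-level log odds ratios and standard errors are made-up values, and `random_effects_dl` is a hypothetical helper rather than a function used in this thesis.

```r
# First-stage outputs: made-up log odds ratios and standard errors,
# one per cohort (illustrative values only)
log_or <- c(0.10, -0.05, 0.20)
se     <- c(0.08, 0.25, 0.12)

# Random-effects pooling, assuming the DerSimonian-Laird estimator of tau^2
random_effects_dl <- function(yi, sei) {
  wi       <- 1 / sei^2                      # inverse-variance weights
  mu_fixed <- sum(wi * yi) / sum(wi)         # fixed-effect pooled estimate
  q_stat   <- sum(wi * (yi - mu_fixed)^2)    # Cochran's Q
  k        <- length(yi)
  c_const  <- sum(wi) - sum(wi^2) / sum(wi)
  tau2     <- max(0, (q_stat - (k - 1)) / c_const)

  wi_star <- 1 / (sei^2 + tau2)              # random-effects weights
  mu      <- sum(wi_star * yi) / sum(wi_star)
  se_mu   <- sqrt(1 / sum(wi_star))

  list(tau2 = tau2,
       or   = exp(mu),
       ci   = exp(mu + c(-1, 1) * qnorm(0.975) * se_mu))
}

pooled <- random_effects_dl(log_or, se)
```

Pooling is carried out on the log odds ratio scale, and the result is exponentiated only for presentation.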
<!-- REVIEW You need to justify why RE and FE for main and interaction terms -->
<!----------------------------------------------------------------------->
#### Investigating the effect of participant-level covariates
In order to investigate the interaction of participant-level characteristics (age and sex) with lipid levels, lipid-covariate interaction terms were extracted and synthesised using a fixed effects meta-analysis.[@fisher2017] All interaction analyses were performed using the common-set-adjusted model (Model 1). To avoid ecological bias, where the between-study association does not reflect the within-study associations, cohorts where there was no within-study variation in the covariate of interest were excluded from the interaction analysis for that covariate.[@burke2017] As an example, it is impossible to estimate the impact of sex on the lipid/dementia relationship in a study which contains only female participants.
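
The steps described above (excluding cohorts without within-study variation, extracting the interaction term, and pooling) can be sketched as follows. This is a simplified illustration on simulated data: the cohort names, `make_cohort`, `get_interaction` and the unadjusted model are hypothetical, and inverse-variance weighting is assumed as the fixed-effect method.

```r
set.seed(42)

# Hypothetical helper: simulate a harmonised cohort; p_male sets the sex mix
make_cohort <- function(n, p_male) {
  data.frame(
    dementia = rbinom(n, 1, 0.25),
    ldl      = rnorm(n, mean = 3.5, sd = 1),
    male     = rbinom(n, 1, p_male)
  )
}

cohorts <- list(
  mixed_a  = make_cohort(500, 0.5),
  mixed_b  = make_cohort(600, 0.4),
  men_only = make_cohort(400, 1)    # single-sex cohort
)

# Exclude cohorts with no within-study variation in the covariate
has_variation <- vapply(cohorts,
                        function(df) length(unique(df$male)) > 1,
                        logical(1))
eligible <- cohorts[has_variation]

# Fit the model with a lipid-by-sex interaction and extract that term
get_interaction <- function(df) {
  fit <- glm(dementia ~ ldl * male, family = binomial(), data = df)
  co  <- summary(fit)$coefficients["ldl:male", c("Estimate", "Std. Error")]
  c(est = unname(co[1]), se = unname(co[2]))
}
terms <- t(vapply(eligible, get_interaction, c(est = 0, se = 0)))

# Fixed-effect (inverse-variance) pooling of the interaction terms
w          <- 1 / terms[, "se"]^2
pooled_est <- sum(w * terms[, "est"]) / sum(w)
pooled_se  <- sqrt(1 / sum(w))
```

Only within-cohort information contributes to the pooled interaction, because each interaction coefficient is estimated inside a single cohort before synthesis.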
<!-- TODO Explain why a fixed effect model was used. -->
<!----------------------------------------------------------------------->
## Results
```{r ipdSummStat, include=F}
```
```{r prepIPDFigures,include=F}
```
### Data access
Of the `r n_applied$total` studies to which I applied for data access, only three (`r paste0(comma((n_accessed$total/n_applied$total)*100),"%")`) were included in the final analysis. Figure \@ref(fig:cohortFlowchart) details whether the cohorts eventually included in the review were identified by the systematic review or via the DPUK portal. In addition, the reasons for cohorts being excluded from the analysis are presented, stratified by application approach.
In summary, the requests for data from cohorts identified by the systematic review were characterised by a very low response rate (N = 5, 25%). For the five cohorts that did respond, common reasons given by authors for not sharing the data included that they: no longer worked with the same group and did not know if the data was available or how to obtain it; no longer had access to the data; or were currently performing, or intended to perform, a similar analysis as the one proposed.
With respect to the application to DPUK cohorts, where a dedicated project manager liaises with data owners on the applicants' behalf, the overall response rate was higher. However, even using this streamlined approach, a positive response was obtained for only approximately half (N = 9, 53%) of the approached cohorts.
<!----------------------------------------------------------------------->
```{r cohortFlowchartSetup, include = FALSE}
```
(ref:cohortFlowchart-cap) __Flowchart detailing sources of data for the IPD analysis__ - The number of cohorts approached and the reasons for exclusion are stratified by identification method (systematic review vs DPUK).
(ref:cohortFlowchart-scap) Flowchart detailing sources of data for the IPD analysis
```{r cohortFlowchart, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:cohortFlowchart-cap)', out.width='100%', fig.scap='(ref:cohortFlowchart-scap)'}
knitr::include_graphics(file.path("figures/ipd/cohortFlowchart.png"))
```
<!----------------------------------------------------------------------->
As highlighted in Figure \@ref(fig:cohortFlowchart), there was little overlap between cohorts identified by the systematic review and those contained in the DPUK (`r n_applied$DPUK_sysrev`), indicating that the DPUK is a useful source of unanalysed data with respect to this question. A single cohort (Whitehall II) was identified by the systematic review and was also present in the DPUK.
Nine cohorts (8 DPUK only, 1 Systematic review + DPUK) responded positively to requests for data access. However, on inspection of the data provided, six of these cohorts were excluded; the reason for exclusion in each case is shown in Table \@ref(tab:dataExcluded-table). In summary, two cohorts were memory-complaint cohorts, containing participants unlikely to be free of dementia at baseline (BRACE, MEMENTO). Two cohorts, despite being designed as multi-wave studies, only shared data related to a single wave and so represented cross-sectional data (Generation Scotland, NICOLA). TRACK HD is a genetic cohort in which, as the cohort owners advised me, dementia will inevitably develop in mid-life, driven by the Huntington's (HTT gene) mutation, and this effect would likely far outweigh the effect of lipids. Finally, the ELSA study was excluded based on the dichotomous definition of the exposure variable.
<!----------------------------------------------------------------------->
(ref:dataExcluded-caption) __Exclusion reasons for cohorts providing data__ - Number of cohorts approached and the reasons for a lack of data access, stratified by whether the cohort was identified by the systematic review or via the DPUK.
(ref:dataExcluded-scaption) Exclusion reasons for cohorts providing data
```{r dataExcluded-table, message=FALSE, results="asis", echo = FALSE}
```
<!----------------------------------------------------------------------->
### Included data sources
The three data sources used in this analysis are described in detail in the following sections. Of note, all included data sources were based in the United Kingdom. This is due to the majority of included datasets being identified via the Dementia Platform UK route (Figure \@ref(fig:cohortFlowchart)), which, as implied by the name, has a narrow geographical focus.[@bauermeister2020]
<!----------------------------------------------------------------------->
#### Caerphilly Prospective Study
The Caerphilly Prospective Study (CaPS) is a longitudinal study of men in South Wales, UK.[@zotero-15398;@elwood2013] Blood lipids (TC, LDL-c, HDL-c and TG) were measured at baseline in 1979-1983, and from Phase III (1989-1993) onwards, a battery of cognitive tests was introduced. Dementia outcomes were determined using data obtained during Phase V (2002-2004), giving between 19 and 25 years of follow-up. The cohort contains data on dementia outcomes sub-classified as vascular and non-vascular dementia. Data was available for all covariates of interest except for ethnicity, BMI, prevalent IHD and _Apo_$\mathcal{E}4$ status. While the CaPS study collected data on height/weight (from which BMI can be calculated as $\frac{weight}{height^2}$) and prevalent heart disease, these were not available from the DPUK version of the CaPS data.
<!----------------------------------------------------------------------->
#### EPIC Norfolk
The European Prospective Investigation of Cancer (EPIC) - Norfolk is a population-based cohort, containing men and women recruited from 35 general practices in Norfolk between 1993 and 1998.[@riboli1997; @riboli2002] Dementia was ascertained at the 5th Health Check-up (2016-2018), providing between 18 and 25 years of follow-up. The added evidential value of the EPIC cohort is small, given that the data obtained contains only 8 dementia events. The cohort contains no information on dementia subtype, while all covariates of interest bar _Apo_$\mathcal{E}4$ were available.
<!----------------------------------------------------------------------->
#### Whitehall II
The Whitehall II study is a prospective cohort study of men and women recruited between 1985 and 1989 from the civil service in London.[@marmot2005] Lipid measurements were available from the third wave of the study, conducted between 1991 and 1993. The cohort is linked with the Hospital Episode Statistics (HES) database, a database containing details of participant events at NHS hospitals in England, which was used to capture dementia outcomes.[@zotero-15403] HES data was available up to March 2015, providing between 22 and 24 years of follow-up. Information was available on dementia type, which was classified as all-cause dementia, Alzheimer's disease or vascular dementia. This data source contained details on all covariates of interest except for _Apo_$\mathcal{E}4$ status.
Of note, the Whitehall II cohort was analysed in one of the included studies identified by the systematic review presented in Chapters \@ref(sys-rev-methods-heading) & \@ref(sys-rev-results-heading),[@tynkkynen2018] meaning that a comparison between the published result and the analysis reported here was possible.
<!----------------------------------------------------------------------->
### Covariate definition & missing data {#ipd-covar-definition}
Based on available data across cohorts, smoking was classified as never/ever/current, while alcohol consumption was classified as never/ever. Ever use in this case refers to participants who do not currently smoke/drink alcohol but did so in the past. Education was categorised into 4 levels (None, O-levels, A-levels, Degree). BMI was treated as a continuous variable while presence of vascular co-morbidities was treated as dichotomous.
A key consideration in the definition of covariates across cohorts was the classification of age. The Whitehall II study, the largest cohort to which I had access, only shared age data in five-year age bands (e.g., 40-44, 45-49, etc.). To ensure comparability across the cohorts, I created identical categories in the CaPS and EPIC data. This grouped age variable was then used in all subsequent analyses.
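
A minimal sketch of how continuous ages can be grouped into matching five-year bands is shown below. The example ages, the range of the breaks and the variable names are illustrative assumptions rather than the values used in the analysis.

```r
# Illustrative continuous ages, as might be recorded in a single cohort
ages <- c(40, 44, 45, 52, 59, 63)

# Five-year bands; the overall range (35 to 69) is an assumption for this example
breaks <- seq(35, 70, by = 5)
labels <- paste(head(breaks, -1), head(breaks, -1) + 4, sep = "-")

# right = FALSE gives left-closed intervals, so 40 falls in "40-44" and 45 in "45-49"
age_band <- cut(ages, breaks = breaks, right = FALSE, labels = labels)

# Integer coding of the bands, for models that treat age group as a
# one-step-per-band variable
age_step <- as.integer(age_band)
```

Using left-closed intervals ensures that an age on a band boundary is assigned to the band that starts at that age.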
Missing values in collected variables were common across the cohorts. A matrix of covariates for each included cohort, describing both missing variables and the proportion of missing values within collected variables, can be seen in Appendix \@ref(appendix-ipd-covariate-matrix). The Whitehall II and EPIC cohorts contained data on all but one covariate of interest, while three further missing covariates were identified in the CaPS cohort, namely prevalent ischemic heart disease, BMI and ethnicity. No included cohort provided information on the _Apo_$\mathcal{E}4$ status of participants, and so this variable could not be adjusted for in the analysis.
<!----------------------------------------------------------------------->
### Analytical results
#### Descriptive statistics
Across the three cohorts, `r n_total_ipd` participants were included in this analysis (Whitehall II = 8208, EPIC = 1115, CaPS = 2512). All cohorts contained data on the four lipid fractions of interest (or sufficient data from which to calculate them) and on all-cause dementia outcomes. The only other dementia outcome examined across cohorts was vascular dementia, which was reported in the CaPS and Whitehall II studies. The definitions of dementia outcomes used across cohorts can be seen in Appendix \@ref(appendix-ipd-dementia-def). Cumulatively, there were `r n_ipd_dementia` cases of all-cause dementia, with `r n_ipd_vasdem` further classified as vascular dementia. Summary statistics for each cohort are provided in Table \@ref(tab:covariateSummary-table).
<!----------------------------------------------------------------------------->
(ref:covariateSummary-caption) __Characteristics of IPD cohorts__ - Summary of characteristics for cohorts included in the IPD analysis. Variables not available from a cohort are denoted by "-". See Appendix \@ref(appendix-ipd-covariate-matrix) for details on the proportion of missing data within collected variables.
(ref:covariateSummary-scaption) Summary of characteristics of IPD cohorts
```{r covariateSummary-table, message=FALSE, results="asis", echo = FALSE}
```
<!----------------------------------------------------------------------->
<!-- TODO Note here about how the IHD variable in CAPS has a note saying there might be an issue with the variables, while the Whitehall study uses narrow range of codes to define CHD. EPIC is self-reported angina/mi/arrythmia -->
#### Main effects
The results from the main effect analysis of each lipid fraction on all-cause and vascular dementia can be seen in Figures \@ref(fig:mainEffectDem) & \@ref(fig:mainEffectVad), respectively. There was weak evidence for an association of any lipid level with either all-cause or vascular dementia, with the exception of a harmful association between raised triglycerides and vascular dementia (`r ipd_vasdem`, Figure \@ref(fig:mainEffectVad)). For the sole cohort containing data on the Alzheimer's disease outcome (Whitehall II), there was weak evidence for an association of this outcome with any lipid fraction (results shown in comparison with previous analysis of this cohort in Figure \@ref(fig:whitehallComparisonAd)).
<!----------------------------------------------------------------------------->
(ref:mainEffectDem-cap) __IPD meta-analysis of all-cause dementia__ - Using Model 1 (adjusted for age, sex, smoking, alcohol, education and diabetes), an IPD random-effects meta-analysis was applied to investigate the association of a 1-SD increase in each lipid fraction with all-cause dementia outcomes.
(ref:mainEffectDem-scap) IPD meta-analysis of all-cause dementia
```{r mainEffectDem, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:mainEffectDem-cap)', out.width='100%', fig.scap='(ref:mainEffectDem-scap)'}
knitr::include_graphics(file.path("figures/ipd/main_Dementia.png"))
```
<!----------------------------------------------------------------------------->
<!----------------------------------------------------------------------------->
(ref:mainEffectVad-cap) __IPD meta-analysis of vascular dementia__ - Using Model 1 (adjusted for age, sex, smoking, alcohol, education and diabetes), an IPD random-effects meta-analysis was applied to investigate the association of a 1-SD increase in each lipid fraction with vascular dementia outcomes. Note that the vascular dementia outcome was only available in the CaPS and Whitehall II cohorts (see Table \@ref(tab:covariateSummary-table)).
(ref:mainEffectVad-scap) IPD meta-analysis of vascular dementia
```{r mainEffectVad, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:mainEffectVad-cap)', out.width='100%', fig.scap='(ref:mainEffectVad-scap)'}
knitr::include_graphics(file.path("figures/ipd/main_vasdem.png"))
```
<!----------------------------------------------------------------------------->
Estimates from the common-set-adjusted model (Model 1: adjusted for age, sex, smoking, alcohol, education and diabetes) were comparable to the fully-adjusted model (Model 2: Model 1 further adjusted for ethnicity, prevalent ischemic heart disease, and BMI) for the effect of lipids on all-cause dementia in cohorts reporting all covariates of interest (Figure \@ref(fig:ipdModelComparison)).
<!----------------------------------------------------------------------------->
(ref:ipdModelComparison-cap) __Comparison of partially- and maximally-adjusted models__ - Results from the common-set-adjusted (Model 1) and fully-adjusted (Model 2) analyses for the all-cause dementia outcome were compared for the two cohorts containing a full set of covariates (EPIC, Whitehall II).
(ref:ipdModelComparison-scap) Comparison of partially and maximally adjusted results
```{r ipdModelComparison, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:ipdModelComparison-cap)', out.width='100%', fig.scap='(ref:ipdModelComparison-scap)'}
knitr::include_graphics(file.path("figures/ipd/main_model_comparison.png"))
```
<!----------------------------------------------------------------------------->
<!----------------------------------------------------------------------->
#### Interaction effects
Given the minimal effect of further adjustment in the maximally-adjusted model, the common-set-adjusted model (Model 1) was used to perform the interaction analyses.
The maximally-adjusted model (Model 2) could have been employed in the analysis of the effect of sex, as both EPIC and Whitehall II had a full complement of covariates (CaPS contains only men and so was excluded from this analysis). However, I decided to employ the same underlying model across the interaction analyses to aid comparability between the age and sex interaction estimates.
For all-cause dementia, there was no evidence of an interaction of lipid levels with either age group or sex (Figures \@ref(fig:interactionDementiaAge) & \@ref(fig:interactionDementiaSex)).
<!----------------------------------------------------------------------------->
(ref:interactionDementiaAge-cap) __Meta-analysis of age-lipid interaction terms for all-cause dementia__ - Age was grouped into 5-year age bands, and the effect estimates presented represent the OR per 1-step increase in age group. Estimates were obtained using the common-set-adjusted model (Model 1: age, sex, smoking, alcohol, education, diabetes).
(ref:interactionDementiaAge-scap) Meta-analysis of age-lipid interaction terms for all-cause dementia
```{r interactionDementiaAge, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:interactionDementiaAge-cap)', out.width='100%', fig.scap='(ref:interactionDementiaAge-scap)'}
knitr::include_graphics(file.path("figures/ipd/interaction_age_dementia.png"))
```
<!----------------------------------------------------------------------------->
<!----------------------------------------------------------------------------->
(ref:interactionDementiaSex-cap) __Meta-analysis of sex-lipid interaction terms for all-cause dementia__ - Estimates were obtained using the common-set-adjusted model (Model 1: age, sex, smoking, alcohol, education, diabetes) and refer to the effect of male sex on the lipid/all-cause dementia association. The CaPS cohort was excluded from this analysis because it contains only a single sex and therefore provides no information on the lipid-sex interaction.
(ref:interactionDementiaSex-scap) Meta-analysis of sex-lipid interaction terms for all-cause dementia
```{r interactionDementiaSex, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:interactionDementiaSex-cap)', out.width='100%', fig.scap='(ref:interactionDementiaSex-scap)'}
knitr::include_graphics(file.path("figures/ipd/interaction_sex_dementia.png"))
```
<!----------------------------------------------------------------------------->
<!-- TODO Using fixed effect meta-analyses, there is now an effect of male gender on LDL/all-cause dementia relationship. Need to discuss if sticking with fixed -->
For vascular dementia, I was only able to explore the effect of age, as only the CaPS and Whitehall II cohorts contained details on vascular dementia as an outcome. As discussed above, the CaPS data contained a single sex, which meant it was excluded from the exposure-sex analysis, leaving Whitehall II as the sole eligible study.
<!----------------------------------------------------------------------------->
(ref:interactionVascularAge-cap) __Meta-analysis of age-lipid interaction terms for vascular dementia__ - Age was grouped into 5-year age bands, and the effect estimates presented represent the OR per 1-step increase in age group. Estimates were obtained using the common-set-adjusted model (Model 1: age, sex, smoking, alcohol, education, diabetes).
(ref:interactionVascularAge-scap) Meta-analysis of age-lipid interaction terms for vascular dementia
```{r interactionVascularAge, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:interactionVascularAge-cap)', out.width='100%', fig.scap='(ref:interactionVascularAge-scap)'}
knitr::include_graphics(file.path("figures/ipd/interaction_age_vasdem.png"))
```
<!----------------------------------------------------------------------------->
<!----------------------------------------------------------------------->
## Discussion
### Summary of findings
This analysis requested data from `r n_applied$total` data sources, but only obtained data from three, all of which were based in the United Kingdom. No evidence for an effect of lipids on the risk of dementia or related outcomes was identified, except for a harmful association of raised triglycerides with risk of vascular dementia. Similarly, there was little evidence for an interaction of the effect of lipid levels on dementia outcomes with participants' age (grouped into 5-year bands) or sex.
A detailed comparison of the findings presented above with the existing evidence base identified by the systematic review (Chapters \@ref(sys-rev-methods-heading) & \@ref(sys-rev-results-heading)) is presented as part of the triangulation exercise in Chapter \@ref(tri-heading). As such, this discussion will not provide a detailed comparison of the results of this analysis with other published literature, except to compare between this analysis and previously published results using the same data source.
<!----------------------------------------------------------------------->
### Limitations
#### Low response rate to request for data
The obvious key limitation of this analysis is the very low response rate to requests for data access, which may bias the results if there are systematic differences in the association of lipids with dementia outcomes between cohorts that share data and those that do not.[@ahmed2012] Whether or not to press ahead with an IPD analysis in the absence of all (or even most) data is a personal decision, and some previous analyses have highlighted where they decided not to pursue an IPD analysis.[@jaspers2014] For the purposes of this thesis, the decision was made to conduct the IPD analysis because it provided training in application of IPD methods in addition to providing new evidence that will be incorporated into the triangulation exercise detailed in Chapter \@ref(tri-heading).
A low response rate is not unexpected, given that a review of IPD studies published between 1987 and 2015 found that fewer than half managed to obtain data from greater than 80% of studies, and that in many cases, the exact percentage of studies for which data was obtained was not accurately reported.[@nevitt2017] However, it is assumed that the ~10% response rate encountered in this analysis is at the lower end of the scale.
There are many likely reasons for this low response rate to requests for data access. In general terms, there are several well-documented barriers that prevent data from being made readily available, including concerns regarding participant privacy, fear of "scooping" or "parasitic" behaviour, and a lack of trust between primary and secondary researchers.[@vanpanhuis2014] More specific to this analysis, IPD meta-analyses that include studies other than randomised controlled trials have less success in obtaining IPD from previously published studies.[@nevitt2017] Additionally, while no evidence is available on whether the characteristics of the researcher who is requesting data access influence the response rate and eventual decision, there is the possibility that my position as a PhD student meant I was less likely to elicit a positive response than a well-known senior academic might. The timing of the requests for data access, coinciding with a global pandemic, may also have affected the response rate as researchers prioritised COVID-related work.
Finally, in investigating the potential reasons for the low response rate, I discovered that the method of contact used (email) has been shown to be less successful in eliciting responses from authors when compared with telephoning (which was not used in this analysis).[@danko2019] One of the reasons for this may be that the email addresses reported on publications are more likely to be out of date for older publications. Anecdotally, post-hoc investigation of a subset of cohorts revealed that several corresponding/first authors were no longer at the same institution as when the study was reported, and as a result, were unlikely to have access to the institutional email address listed on the study publication. Despite attempts to track authors as they move between institutions, out-of-date contact details may have contributed to the low response rate.
The obstacles to data access described above are in theory what the DPUK was built to address. However, even with the help of the streamlined application process afforded by the DPUK, accessing sufficient data was a challenge. The response rate among DPUK cohorts a year after application was just 50%. In addition, some cohorts responded saying that the proposed study question was already under investigation by another group, and that they would not share the data on this basis. In light of this, a centralised database of ongoing analyses being performed using DPUK data would be of enormous help, reducing research waste and providing opportunities for collaboration. Finally, the DPUK process would be aided by a clearer distinction between those cohorts that are "DPUK native" (i.e., where a copy of the data is already held on DPUK servers) versus externally hosted, seeing as the response time for externally hosted cohorts is likely to be much longer.
<!----------------------------------------------------------------------->
#### Uncontrolled confounding
A key limitation of this analysis is the potential for uncontrolled confounding. Across all cohorts, adjustment for _Apo_$\mathcal{E}4$ was not possible as I did not have access to genetic data on participants. _Apo_$\mathcal{E}4$ is a strong risk factor for both increased LDL-c levels and Alzheimer's disease,[@bennet2007;@safieh2019] and failing to adjust for this factor means residual confounding in the LDL-c fraction results is likely.
More generally, systematically missing variables required a trade-off between inclusion of cohort data and appropriate control for confounding. The final choice of the common-set-adjusted model for use in the interaction analyses means there is the potential for residual confounding. However, sensitivity analysis comparing the common-set-adjusted (Model 1) and fully-adjusted (Model 2) models in cohorts with a full complement of covariates indicated that further adjustment for ethnicity, prevalent IHD and BMI had minimal impact on the effect estimates. Of note, this is comparable with the analysis of the CPRD data presented in the previous chapter, where adjustment for variables beyond age and sex had a limited impact on the observed effect estimates (see Section \@ref(cprd-impact-additional-covar)).
<!----------------------------------------------------------------------->
#### [Regression dilution bias]{.correction}
[A related issue is regression dilution bias, which occurs when random measurement error in the exposure variable biases the obtained effect estimates towards the null.]{.correction}[@macmahon1990; @hutcheon2010] [In the case of this analysis, the use of a single measurement to define lipid levels may introduce random measurement error into the analysis. As a result, the observed relationship between baseline lipid levels and dementia outcomes could be underestimated. In future research, this bias could be addressed through the averaging of multiple measurements over a given time period to more accurately define the baseline exposure.]{.correction}[@hutcheon2010]
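
The direction of this bias can be made concrete under the classical measurement error model, which is assumed here purely for illustration and was not fitted in this analysis. Let the measured lipid level $W$ equal the true long-term level $X$ plus independent random error $U$, with variances $\sigma^2_X$ and $\sigma^2_U$ (these symbols are introduced for this sketch only). The coefficient obtained by regressing the outcome on $W$ is then approximately

$$
\beta_{\text{obs}} \approx \lambda \, \beta_{\text{true}}, \qquad \lambda = \frac{\sigma^2_X}{\sigma^2_X + \sigma^2_U}, \qquad 0 < \lambda \leq 1 .
$$

The relationship is exact for linear regression and holds only approximately for the log-odds ratio in logistic regression. Because $\lambda \leq 1$, the observed association is pulled towards the null. Averaging $k$ independent measurements reduces the error variance to $\sigma^2_U / k$, which moves $\lambda$ closer to 1.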
<!----------------------------------------------------------------------->
#### Comparison with a previous analysis
For a single cohort (Whitehall II), a previously published analysis was available. Tynkkynen _et al._[@tynkkynen2018] analysed the association of blood lipids and risk of all-cause dementia and Alzheimer's disease across several cohorts, including the Whitehall II cohort. In an attempt to validate my approach, I compared the results from the maximally-adjusted model (Model 2) in this analysis with those reported by Tynkkynen _et al._ for both all-cause dementia and Alzheimer's disease.
The results for the Alzheimer's disease outcome were comparable (Figure \@ref(fig:whitehallComparisonAd)). However, for the all-cause dementia outcome (Figure \@ref(fig:whitehallComparisonDementia)), I identified a discrepancy in the association of triglycerides with all-cause dementia estimated by the two analyses. Tynkkynen _et al._[@tynkkynen2018] found a protective effect of triglycerides on this outcome (`r estimate(0.69,0.56,0.85,"HR")`) while this analysis found evidence for a harmful effect (`r estimate(1.26,1.12,1.41,"OR")`).
<!----------------------------------------------------------------------------->
(ref:whitehallComparisonDementia-cap) __Comparison of two analyses of the Whitehall II cohort (all-cause dementia)__ - A published analysis by Tynkkynen _et al._[@tynkkynen2018] had previously examined the association of lipid fractions and all-cause dementia in the Whitehall II cohort. The results of the two analyses are shown above and were broadly comparable, except for the triglyceride fraction. Potential reasons for this discrepancy are discussed in the main text.
(ref:whitehallComparisonDementia-scap) Comparison of two analyses of the Whitehall II cohort (all-cause dementia)
```{r whitehallComparisonDementia, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:whitehallComparisonDementia-cap)', out.width='100%', fig.scap='(ref:whitehallComparisonDementia-scap)'}
knitr::include_graphics(file.path("figures/ipd/whitehall_comparison_dementia.png"))
```
<!----------------------------------------------------------------------------->
<!----------------------------------------------------------------------->
<!----------------------------------------------------------------------------->
(ref:whitehallComparisonAd-cap) __Comparison of two analyses of the Whitehall II cohort (Alzheimer's disease)__ - A published analysis by Tynkkynen _et al._[@tynkkynen2018] had previously examined the association of lipid fractions and Alzheimer's disease in the Whitehall II cohort. The results of the two analyses are shown above and were broadly comparable.
(ref:whitehallComparisonAd-scap) Comparison of two analyses of the Whitehall II cohort (Alzheimer's disease)
```{r whitehallComparisonAd, echo = FALSE, results="asis", fig.pos = "H", fig.cap='(ref:whitehallComparisonAd-cap)', out.width='100%', fig.scap='(ref:whitehallComparisonAd-scap)'}
knitr::include_graphics(file.path("figures/ipd/whitehall_comparison_ad.png"))
```
<!----------------------------------------------------------------------------->
Investigation revealed several possible reasons for this discrepancy. The first is that while this analysis uses multiple imputation to address missing data, the Tynkkynen _et al._ analysis does not describe how missing data was handled. However, the reduced precision in the reported estimates suggests a complete case analysis. This interpretation is supported by a comparison of summary statistics, which showed that my analysis had substantially more dementia events (n = 287 in this analysis vs. n = 114 in Tynkkynen _et al._), suggesting that incomplete records were discarded in the previous analysis. [A further potential cause of the discrepancy is the use of different effect metrics in the two analyses (OR in this analysis, hazard ratios (HR) in Tynkkynen _et al._), though when the outcome is rare, the two measures are expected to be similar (see Section]{.correction} \@ref(sys-rev-analysis-overview) [).]{.correction} Finally, the covariates adjusted for in each analysis were similar, though the previous analysis had access to genetic data allowing it to adjust for _Apo_$\mathcal{E}4$ status. However, given that _Apo_$\mathcal{E}4$ is a risk factor for increased LDL-c rather than triglycerides,[@bennet2007] it seems unlikely that additional adjustment for this variable is responsible for the discrepancy in the findings observed.
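
The rare-outcome approximation referred to above can be checked numerically. The sketch below is illustrative only: it assumes proportional hazards and complete follow-up, and the hazard ratio and baseline risks are hypothetical values rather than estimates from either analysis. The helper `or_from_hr` is defined for this example and converts a hazard ratio into the odds ratio implied at the end of follow-up.

```r
# Illustrative only: odds ratio implied by a hazard ratio at end of follow-up,
# assuming proportional hazards and no censoring.
or_from_hr <- function(hr, baseline_risk) {
  # Cumulative hazard in the reference group that yields the baseline risk.
  cum_hazard_ref <- -log(1 - baseline_risk)
  # Risk in the comparison group under proportional hazards.
  risk_exposed <- 1 - exp(-hr * cum_hazard_ref)
  odds_exposed <- risk_exposed / (1 - risk_exposed)
  odds_ref <- baseline_risk / (1 - baseline_risk)
  odds_exposed / odds_ref
}

hypothetical_hr <- 1.5                        # assumed value, not from the data
baseline_risks <- c(0.01, 0.05, 0.10, 0.30)   # assumed cumulative risks
implied_or <- or_from_hr(hypothetical_hr, baseline_risks)

print(data.frame(baseline_risk = baseline_risks,
                 implied_or = round(implied_or, 3)))
```

Under these assumptions the implied OR stays close to the HR when the baseline risk is low and drifts away from the null as the outcome becomes more common. The two measures always lie on the same side of the null.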
<!----------------------------------------------------------------------->
### Strengths
While this analysis did not manage to obtain and analyse a large proportion of identified data, a central strength of this analysis is the use of a systematic approach to identify and attempt to contact relevant cohorts. Furthermore, it also enabled incorporation of two previously unanalysed datasets - the CaPS and EPIC Norfolk cohorts - thus providing additional evidence that is used in the triangulation exercise reported in Chapter \@ref(tri-heading).
A further strength is the ability of this analysis to investigate the effect of participant characteristics (namely age and sex) on the association of blood lipids and dementia outcomes. To the best of my knowledge, no previous review has examined the interaction of these factors with the observed associations.
Finally, this analysis provides new evidence on a previously unexplored outcome (vascular dementia), adding to the extremely small evidence base for this outcome identified by the systematic review presented in Chapters \@ref(sys-rev-methods-heading) & \@ref(sys-rev-results-heading). This is relevant even in the previously analysed Whitehall II cohort because the previous analysis by Tynkkynen _et al._ did not report on this outcome.
<!----------------------------------------------------------------------->
### Reflections on the process
In hindsight, attempting to undertake a large-scale IPD meta-analysis as part of a larger PhD project may have been overly ambitious. Data harmonisation between cohorts in an IPD analysis is an often under-appreciated challenge,[@levis2021] and in line with this, data cleaning for this analysis took substantially longer than expected. While the cohort response rate was substantially lower than expected, given the time and resources required for the cleaning and harmonisation of data from just three cohorts, a situation in which all 37 cohorts responded positively would have been logistically challenging within the scope of a PhD.
<!----------------------------------------------------------------------->
### Future work
While it is tempting to suggest that an IPD analysis of lipid levels be reattempted, without empirically guided approaches to increase the response rate, this may just result in a similarly small set of studies as described here. In line with the limitations considered earlier in this chapter, future methodological work could formally consider the effect of requester characteristics (sex, location, career stage) on response rates in IPD analyses.
Additionally, the production of detailed guidance for handling cases where covariates are systematically missing would represent a useful contribution to the topic. Much of the literature around IPD analysis is focused on the synthesis of RCTs, where additional covariate information is needed primarily for the assessment of treatment-covariate interactions rather than the adjustment of the effect estimate for confounding. Given the wide availability of non-randomised cohorts, improved guidance on this challenge would support future work.
Finally, movement towards increased use of unique and persistent researcher identifiers, such as those offered by the ORCID programme,[@nature2009] would help with contact issues. Researchers move institutions regularly as their contracts come to an end, and so the institutional contact details provided on publications are frequently out of date.
<!----------------------------------------------------------------------->
## Summary
- In this chapter, I performed an IPD meta-analysis to investigate the effect of blood lipid levels on the risk of incident dementia. There was a very low response rate to requests for data access, resulting in the analysis of three relevant cohorts.
- I found little evidence for an association of any lipid with either all-cause or vascular dementia, except for an increased risk of vascular dementia associated with raised triglycerides. Similarly, there was little evidence that the association of blood lipids and dementia outcomes varied by participant age or sex.
- I discussed potential reasons for the low response rate and explored other limitations of this analysis. I highlighted the contribution of this work to the wider topic via the analysis of previously unexplored cohorts (CaPS & EPIC) and outcomes (vascular dementia). Finally, I recommended that future research could formally investigate the impact of the characteristics of the requesting researcher on data access rates.
- The new evidence produced by this analysis will be incorporated, along with evidence identified or produced in the previous chapters, into the triangulation analysis presented in the following chapter.