Acceptance and barriers pertaining to a decision support system for multiple clinical conditions
screen and requiring a user to actively open a recommendation. However, none of the
participants felt that the system fulfilled that promise, and the system was experienced
as interruptive despite our efforts to the contrary.
Topic | Quote
Always on top | “That was my biggest frustration, the [floating] bar on the left side.”
Always on top | “...also because the thing ([CDSS]) kept getting in the way, just it being in the screen was interruptive...”
Integration with host system | “... it has been said before but it would be great if the EHR remembers our actions [e.g. prescribing medication / ordering a test]. So that when you spend time thinking about your decision, that decision actually ends up in our EHR.”
The third free text question, “What were the most important reasons for not opening the
recommendations?” was answered by more than 75% of GPs with an answer related to
“Lack of time”. The second most common answer was “Too many alerts”. Two responses
that were mentioned less often were “Recommendation not related to the patient or
presented at the wrong time” and “Recommendation was not useful”.
Information quality
Information quality was the second-most mentioned dimension, with the relevance of
the recommendations as the most important topic. Many GPs felt that recommendations
that weren’t based on a specific diagnosis, e.g. only based on age or gender, were often
irrelevant and took up too much time to be worthwhile. For example, recommendations
that triggered on age alone were perceived as less valuable than those limited to patients
with a particular diagnosis. These generic recommendations were usually preventative
in nature. Participants also noted that many of the simple recommendations overlapped
with protocolized and standardized care that was already being provided by physician
assistants, such as reminders relating to diabetes foot care. Therefore, these
recommendations did not improve care.
Topic | Quote
Relevance | “There should be a distinction between alerts that are based on a diagnosis and alerts that are not”
Relevance | “...or just [triggering] on age, when you have a healthy active older person in front of you, that is completely useless.”
Relevance | “...we should look at what care is already being offered by local nurses and physician assistants; I think many of these [simpler] alerts should be removed because we are already aware of those patients.”
Service quality
Service quality was rarely mentioned. Participants seemed satisfied with both the
documentation provided and the availability of support, although they indicated that the user
manual was too long (22 pages including many images).
Chapter 7
Topic | Quote
Documentation | “Now, I must say that I found the instruction film very clear… and it was also fairly short, not too long, it was very nice.”
Intended Use
The dimension “Intended Use” included barriers and facilitators to using the system and
following the system’s advice. The most important topic for this dimension was a lack
of time during the appointment to perform the suggested actions. Some participants
indicated that they had scheduled follow-up appointments to address the additional
recommendations. The number of recommendations was also mentioned as a factor,
although participants did not agree on whether the number of recommendations should
be limited as long as all recommendations were relevant.
Topic | Quote
Time | “Due to simple time pressure I often didn’t use it, and dragged the window away to a point where I no longer saw it.”
Time | “But that plays into a different problem… We have now, and function now, with increasingly complex patients, and higher demand on care, more responsibilities too… and we still cram this into 10 minutes.”
Follow-up appointments | “I thought that was good, the [feeling of] ‘Oh yeah, that’s a lot, that’s really polypharmacy, I should do something about that.’ I used it as an alert and then asked people to come back to further explore the issue.”
Net benefit
Participants felt that the system did offer some benefit to themselves and their patients
(10% of coded phrases). Participants felt that it reminded them to perform tasks such as
medication review, or inclined them to review the patient record more thoroughly.
Topic | Quote
Better/worse care | “I liked being obliged to do it - [to know] where precisely you stand, is there something wrong - I have everything in view. Done! But then you do want it to work flawlessly.”
Better/worse care | “Now, I have the feeling that it did have some benefit. That I carefully reviewed the whole patient, otherwise I wouldn’t have done that.”
Requested features
Participants mentioned several features that they felt would improve the system.
Participants wanted positive feedback when they had completed a task recommended by the
system. The system would simply remove the recommendation from the list when the
task was completed, but participants said they would instead prefer if the recommendation
changed colour or gave some other indication that the task had been done correctly.
Another requested feature was the ability to add reminders to a “to do” list, so that they
could be dealt with outside of the time constraints of the patient visit. Finally, clinicians
wanted to be able to customize the recommendations: removing recommendations
that they did not think were useful, and ideally adding their own recommendations to
the list.
Topic | Quote
Positive feedback | “It would be nice if we would get a small confirmation that we’re doing the right thing, like a green notification or something?”
Table 1 summarizes our findings as recommendations, each with a justification.
Table 1. Summary of findings and justification
Recommendation | Justification
Thoroughly test usability in final setting. | Users experienced the “always on top” nature of the system to be highly annoying. This could have been prevented by more thoroughly testing the system with end users in the final setting.
Integrate, as tightly as possible, with the host system, while avoiding overlap with existing alerts. | Users were disappointed by the fact that accepting a recommendation did not result in the recommendation being effectuated in their system. They saw this as an opportunity to reduce administrative load. Enabling this behaviour can increase user acceptance. Of note, studies have shown that CDSS integrated in EHRs are less effective; this is likely due to existing alerts in the EHR, so overlap should be avoided.
Implement alerts based on specific criteria. | Our users strongly agreed that alerts based on demographics alone were significantly less useful than those based on diagnoses, medication or lab data. To reduce alert fatigue and increase user acceptance, alert triggering criteria should be as specific as possible.
Allow user prioritization or alert selection. | Users indicated that simultaneously presenting alerts for different disease areas was not effective for them. They reported that they would rather switch from alert to alert for 1 month at a time, and focus on one area.
Allow for different modes of presentation. | Users indicated that they would often prefer to be able to go over the entire list of alerts that was shown during the day, rather than only being presented with a list when the patient file was open. They would have preferred to use some quiet time to go over all patient alerts and select the ones that they felt were of importance.
Provide high quality training material through different channels. | Our users appreciated the different training opportunities (face to face, online video, PDF). It allowed all users to be informed about the system in a way they preferred. Other studies showed that lack of knowledge about the CDSS can limit effectiveness, so emphasizing training is an easy way to increase user acceptance.
Allow for positive feedback in the system. | Although of lesser importance than the other points mentioned, users indicated that they would also appreciate getting positive feedback to offset the alerts indicating they were “doing something wrong”. Not all users agreed on this point; some indicated that providing high quality care was enough reward in itself. We therefore recommend that this option be made configurable.
Discussion
This study used both qualitative and quantitative methods to identify barriers for usage
of a primary care clinical decision support system. Despite strenuous efforts to create a
non-interruptive CDSS, the system was nonetheless perceived as interruptive. Although
the window displaying the recommendations was generally minimized to a thin bar, users
did not like that it was “always on top”. They were divided on the utility of recording a
reason for declining a recommendation, which was required for some recommendations.
They noticed that the CDSS was not fully integrated into their EHR, leading to the need to
do work twice (first accept the recommendation in the CDSS and then perform the action
in the EHR). Participants felt that there were too many recommendations and that the
relevance of the recommendations varied. They specifically noted that preventative care
recommendations based only on age tended to be irrelevant, while risk-based
recommendations based on more specific criteria such as a diagnosis were more relevant. The other
main barrier to use, although not directly related to the CDSS, was lack of time to perform
the suggested actions during the patient visit. One user handled this by scheduling
additional visits. Despite these problems, users felt the system did make some improvement
to care, particularly by reminding users of tasks that would otherwise be forgotten and in
encouraging more thorough assessment of older patients. Users remained positive about
the utility of CDSS in their practice. Participants felt that the system would be improved
by adding specific, visible feedback for tasks performed correctly, allowing
recommendations to be moved to a task list that could be handled outside of the patient visit, and
facilities for customizing which recommendations appeared.
Strengths and limitations
Two researchers performed all qualitative analyses and results were compared, reducing
the risk of bias in this process. Use of established frameworks for understanding system
usage (UTAUT and IS Success model [17, 20]) allows structuring of the results, while
additional use of emergent categories ensures that no important topics are missed. Using a
survey allowed us to efficiently gather feedback from all users, and following this with a
focus group allowed us to gather a more nuanced understanding of the users’ experience
and views.
However, this study does have some limitations. First, the survey instrument was not
validated, although it was based on the validated UTAUT model of system use and
content validity was established by two GPs. Our sample size was fairly small (37 GPs in
the participating practices) and limited to a subset of the GPs in the country, but survey
response rate was very high (all but two GPs completed the survey, and all participants
invited to the focus group attended). We selected participants for the focus group based
on their responses to the survey rather than randomly. This likely excluded users who
were less interested in giving feedback about the system, but this was a conscious
decision in order to maximize the utility of the focus group.
Comparison to other studies
Many studies on CDSS failed to show user uptake and effectiveness in daily practice.
‘NHGDoc’, an EHR-integrated CDSS co-developed by the Dutch College of General
Practitioners, was implemented in over 65% of Dutch GP practices for various systems but
had a usage rate of only 0.24%.[22] A focus group revealed that the main barrier in that study
was lack of awareness of the CDSS and its capabilities. Their large scale implementation
likely made it harder to properly train participants, something that wasn’t an issue in our
current study. Barriers that did play a part in usage of both CDSS related to high intensity
of recommendations and the number of recommendations perceived as irrelevant. Other
barriers that were mentioned by GPs in both studies included lack of customizability and
the system not functioning optimally, leading to participants quickly giving up on the
system. Although not explicitly mentioned by our participants, alert fatigue [23] might
have played an important role in the lack of usage we found. GPs received a median of
4 notifications per patient, and although unsolicited popups were not used, the
notifications were visible at all times. A recent trial by Cook et al. [24] also suspected that
alert fatigue acted as a barrier to usage, as did Lugtenberg et al.[22, 24] However, popups
can be more effective in reducing error, despite often being perceived as annoying. The
challenge lies in identifying which recommendations warrant popups.
Another possible factor for lack of effectiveness mentioned by recent reviews is that
CDSSs integrated in an EHR were less effective than stand-alone CDSS implementations,
both for CDSS related to drug prescribing and for CDSS in general.[25] One explanation for this
finding is that EHR systems tend to have many alerts already. Users of these systems
may already be experiencing alert fatigue, and new alerts may be more likely to be
ignored than alerts introduced as part of an entirely new system.[10] The relative ease
of integrating CDSS into an EHR may also lead to a lower threshold for adding an alert
into the system, leading to the inclusion of less relevant alerts, which in turn aggravates
alert fatigue.[26] An additional reason could be that an integrated system uses EHR
data to provide recommendations and these data may be incomplete or inaccurate.[27]
Therefore, lack of effectiveness may also be related to repeated erroneous
recommendations.[28] To put it simply, without high quality data, CDSS cannot provide accurate
recommendations.
Interpretation, implications, impact
These results are likely to be useful to other CDSS researchers and system designers
who are attempting to address the problems of interruption and alert fatigue in decision
support.[29, 30] Although our system was carefully designed to be non-interruptive, it
was not perceived as such by the users. The “always on top” feature contributed to this
impression, as well as expanding on mouse over, which was perceived as a popup and
hence “interruptive”. An additional prototyping stage may have revealed these problems
earlier and allowed consideration of different design choices. Users were also aware of
the incomplete integration with the electronic patient record system, particularly when
the CDSS asked them to document care but that documentation was not carried over to
the EHR. Despite the fact that integrating CDSS in existing EHRs might reduce
effectiveness, we are confident that tight EHR integration is the way forward. Tight integration
means only one system providing recommendations and the ability to perform actions
directly from a recommendation (e.g. ordering a blood test).
Perhaps most interesting is that the users differentiated general recommendations which
triggered on age alone (e.g. ‘vitamin D prescription for elderly patients’) from
recommendations relying on more specific criteria, such as a diagnosis. The latter were perceived
as much more relevant. There seemed to be some overlap between whether the GPs felt
they had enough time to handle the recommendations, whether there were “too many”
recommendations, and whether the recommendations were perceived to be relevant.
Participants were willing to make time to handle more recommendations if they were all
considered highly relevant to patient care. Thus, stricter selection of recommendations
and prioritizing recommendations may increase the perceived usefulness of the system.
Despite the problems with the system, users did see some value in it and felt it
contributed to patient care. Further, they felt decision support was a good idea and indicated
that they would be willing to try an improved version of the system.
Conclusion
Decision support systems can meet an important need in future healthcare, with its many
guidelines and high administrative load, and participants in our study acknowledged the
potential these systems hold for healthcare. However, implementing these systems in
daily practice for multiple domains remains challenging. Prioritization, user
customization, tight EHR integration and strict selection of recommendations might improve CDSS
effectiveness. The lack of time to handle recommendations during the patient encounter
may be partly addressed by allowing users to move recommendations to a task list or
through other modes of presentation (e.g. e-mail). More focussed research on features of
multi-domain decision support systems is required to guide vendors towards effective
real world implementations.
Acknowledgements
P. Keijser and J.M. van Es, for providing GP demographics data
Funding
DA was funded by an unrestricted grant from Boehringer Ingelheim. AAH and SM were
funded by ZonMw (The Netherlands Organization for Health Research and Development)
for the ICOVE (#311020302) projects.
Competing interests
None
References
1. Shortliffe EH, Cimino JJ. Biomedical informatics: computer applications in health care and biomedicine. Springer Science & Business Media; 2013.
2. Roshanov PS, Misra S, Gerstein HC, Garg AX, Sebaldt RJ, Mackay JA, et al. Computerized clinical decision support systems for chronic disease management: a decision-maker-researcher partnership systematic review. Implement Sci. 2011;6:92.
3. Hemens BJ, Holbrook A, Tonkin M, Mackay JA, Weise-Kelly L, Navarro T, et al. Computerized clinical decision support systems for drug prescribing and management: a decision-maker-researcher partnership systematic review. Implement Sci. 2011;6:89.
4. Bright TJ, Wong A, Dhurjati R, Bristow E, Bastian L, Coeytaux RR, et al. Effect of clinical decision-support systems: a systematic review. Ann Intern Med. 2012;157(1):29-43.
5. Moja L, Kwag KH, Lytras T, Bertizzolo L, Brandt L, Pecoraro V, et al. Effectiveness of Computerized Decision Support Systems Linked to Electronic Health Records: A Systematic Review and Meta-Analysis. Am J Public Health. 2014;104(12):E12-E22.
6. Mutasingwa DR, Ge H, Upshur REG. How applicable are clinical practice guidelines to elderly patients with comorbidities? Can Fam Physician. 2011;57(7):E253-E62.
7. Matui P, Wyatt JC, Pinnock H, Sheikh A, McLean S. Computer decision support systems for asthma: a systematic review. NPJ Prim Care Respir Med. 2014;24:14005.
8. Jeffery R, Iserman E, Haynes RB, Team CSR. Can computerized clinical decision support systems improve diabetes management? A systematic review and meta-analysis. Diabet Med. 2013;30(6):739-45.
9. Anchala R, Pinto MP, Shroufi A, Chowdhury R, Sanderson J, Johnson L, et al. The role of Decision Support System (DSS) in prevention of cardiovascular disease: a systematic review and meta-analysis. PLoS One. 2012;7(10):e47064.
10. Roshanov PS, Fernandes N, Wilczynski JM, Hemens BJ, You JJ, Handler SM, et al. Features of effective computerised clinical decision support systems: meta-regression of 162 randomised trials. BMJ. 2013;346:f657.
11. Kawamoto K, Houlihan CA, Balas EA, Lobach DF. Improving clinical practice using clinical decision support systems: a systematic review of trials to identify features critical to success. BMJ. 2005;330(7494):765.
12. Medlock S, Wyatt JC, Patel VL, Shortliffe EH, Abu-Hanna A. Modeling information flows in clinical decision support: key insights for enhancing system effectiveness. J Am Med Inform Assoc. 2016.
13. van der Ploeg E, Depla MF, Shekelle P, Rigter H, Mackenbach JP. Developing quality indicators for general practice care for vulnerable elders; transfer from US to The Netherlands. Qual Saf Health Care. 2008;17(4):291-5.
14. Arts DL, Abu-Hanna A, Buller HR, Peters RJG, Eslami S, van Weert HCPM. Improving stroke prevention in patients with atrial fibrillation. Trials. 2013;14.
15. Eslami S, Askari M, Medlock S, Arts DL, Wyatt JC, van Weert HCPM, et al. From assessment to improvement of elderly care in general practice using decision support to increase adherence to ACOVE quality indicators: study protocol for randomized control trial. Trials. 2014;15.
16. Dutch College of General Practitioners Guideline Development Group for Atrial Fibrillation. Guideline Atrial fibrillation (second partial revision). Huisarts & Wetenschap. 2013;56(8):392-401.
17. Venkatesh V, Morris MG, Davis GB, Davis FD. User acceptance of information technology: Toward a unified view. MIS Quarterly. 2003:425-78.
18. Medlock S, Eslami S, Askari M, Brouwer HJ, van Weert HC, de Rooij SE, et al. Attitudes and experience of Dutch general practitioners regarding computerized clinical decision support. Stud Health Tech Informat. 2013;186:56-60.
19. Zhang Y, Wildemuth BM. Qualitative Analysis of Content. 2006. Available from: https://www.ischool.utexas.edu/~yanz/Content_analysis.pdf.
20. DeLone WH, McLean ER. Information systems success: The quest for the dependent variable. Information Systems Research. 1992;3(1):60-95.
21. VERBI Software Consult Sozialforschung GmbH. MAXQDA, software for qualitative data analysis. 12 ed. 2016.
22. Lugtenberg M, Weenink JW, van der Weijden T, Westert GP, Kool RB. Implementation of multiple-domain covering computerized decision support systems in primary care: a focus group study on perceived barriers. BMC Med Inform Decis Mak. 2015;15:82.
23. Kesselheim AS, Cresswell K, Phansalkar S, Bates DW, Sheikh A. Clinical decision support systems could be modified to reduce ‘alert fatigue’ while still minimizing the risk of litigation. Health Aff (Millwood). 2011;30(12):2310-7.
24. Cook DA, Enders F, Caraballo PJ, Nishimura RA, Lloyd FJ. An automated clinical alert system for newly-diagnosed atrial fibrillation. PLoS One. 2015;10(4):e0122153.
25. Hemens BJ, Holbrook A, Tonkin M, Mackay JA, Weise-Kelly L, Navarro T, et al. Computerized clinical decision support systems for drug prescribing and management: A decision-maker-researcher partnership systematic review. Implement Sci. 2011;6(1):89.
26. McCoy AB, Waitman LR, Lewis JB, Wright JA, Choma DP, Miller RA, et al. A framework for evaluating the appropriateness of clinical decision support alerts and responses. J Am Med Inform Assoc. 2012;19(3):346-52.
27. Mukherjee M, Wyatt JC, Simpson CR, Sheikh A. Usage of allergy codes in primary care electronic health records: a national evaluation in Scotland. Allergy. 2016.
28. Hersh WR, Weiner MG, Embi PJ, Logan JR, Payne PR, Bernstam EV, et al. Caveats for the use of operational electronic health record data in comparative effectiveness research. Med Care. 2013;51(8 Suppl 3):S30-7.
29. Payne TH, Hines LE, Chan RC, Hartman S, Kapusnik-Uner J, Russ AL, et al. Recommendations to improve the usability of drug-drug interaction clinical decision support alerts. J Am Med Inform Assoc. 2015;22(6):1243-50.
30. Li SY, Magrabi F, Coiera E. A systematic review of the psychological literature on interruption and its patient safety implications. J Am Med Inform Assoc. 2012;19(1):6-12.
Chapter 8
Development of a regression model
and neural network to predict stroke
and comparison to the CHA2DS2-VASc
in a large dataset
D.L. Arts, H.A. van den Ham, T. van Staa, H.C.P.M. van Weert, A. Abu-Hanna
Abstract
Aims
International guidelines for atrial fibrillation use the CHA2DS2-VASc to determine
anticoagulation strategies. This risk score performs poorly at classifying low risk patients,
leading to overtreatment which should be avoided due to the risk of adverse events
and increased healthcare cost. We aim to improve stroke risk prediction by using logistic
regression and neural networks. We will investigate if patients can benefit from more
precise predictions and thus more tailored treatment for atrial fibrillation. This in turn
should result in reduced overtreatment of low risk patients.
Methods and Results
We trained a logistic regression (LR) model and a neural network (ANN) to predict stroke
in a large cohort without anticoagulant treatment. We compared stroke predictions by
these models with CHA2DS2-VASc based predictions. Low and high risk groups were
created to compare performance in these groups. The Net Reclassification Index (NRI) was
calculated to investigate the number of overtreatment cases that could be avoided by
using prediction models. The ANN performed best (c-index: 0.67, 95% CI [0.63, 0.70]),
followed by the LR model (c-index: 0.66, 95% CI [0.63, 0.70]) and the CHA2DS2-VASc
(c-index: 0.64, 95% CI [0.60, 0.67]). The ANN reduced overtreatment of low risk patients
by 59%, followed by the LR with 56%. Using the ANN for stroke prediction could have
prevented 12 intracerebral hemorrhages and 58 major bleeding events per 100,000 patients.
Undertreatment increased by less than 1% for both models.
Conclusion
Logistic regression models and neural networks can improve stroke risk prediction and
reduce overtreatment when compared to the CHA2DS2-VASc.
Development of a regression model and neural network to predict stroke
Introduction
Patients with atrial fibrillation (AF) are at increased risk for stroke.[1] Oral anticoagulants
(OAC), such as warfarin and the new oral anticoagulants (NOACs), can reduce this risk
but may have serious side effects.[2, 3] Determining which patients should receive
this medication has been a topic of discussion for many years. Most recent guidelines
recommend prescribing OAC for all but the lowest risk categories to allow for easy
implementation and optimal stroke risk reduction. To determine which patients should receive
OAC, stroke risk scores are used to predict the individual risk of stroke for each patient, which
is then balanced against possible side effects. Several such schemes have been developed in
the past and these are used in international guidelines.[4, 5]
Currently, the CHA2DS2-VASc is the most widely used stroke risk score: it consists of 7
variables that can add up to a total of 9 points and generally performs better at predicting
stroke than the CHADS2.[4] Due to the high net benefit of treatment with antithrombotic
medication, the stroke risk cut off for this treatment has been lowered over the past few
years.[6] The most recent ESC guidelines recommend treatment for patients scoring 1
point or more. The result is that almost any patient over 65 years old with AF should
receive antithrombotic medication. While this treatment strategy might be optimal for
most AF patients, it results in overtreatment for some, namely low risk patients. The issue
here is the coarseness of stroke risk scores.[7, 8] This lack of precision inevitably results
in guidelines recommending treatment for low risk patients that is not properly tailored
to their risk profile. A secondary issue with stroke prevention in AF is low guideline
adherence. Several studies have found that a large proportion of patients is not treated in
accordance with recent guidelines.[9, 10] While we recognize that this topic is of the
utmost importance, the scope for this study is limited to improving stroke prediction
accuracy.
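To make the coarseness of an additive score concrete, the sketch below sums the CHA2DS2-VASc points for one patient using the standard point weights. The `Patient` class and its field names are hypothetical and only serve this illustration; they do not reflect how the study data were structured.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    age: int
    female: bool
    heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    prior_stroke_or_tia: bool = False
    vascular_disease: bool = False


def cha2ds2_vasc(p: Patient) -> int:
    """Sum the CHA2DS2-VASc points (0 to 9) for one patient."""
    score = 0
    score += 1 if p.heart_failure else 0        # C: congestive heart failure
    score += 1 if p.hypertension else 0         # H: hypertension
    if p.age >= 75:                             # A2: age 75 or older
        score += 2
    elif p.age >= 65:                           # A: age 65 to 74
        score += 1
    score += 1 if p.diabetes else 0             # D: diabetes mellitus
    score += 2 if p.prior_stroke_or_tia else 0  # S2: prior stroke/TIA
    score += 1 if p.vascular_disease else 0     # VA: vascular disease
    score += 1 if p.female else 0               # Sc: sex category (female)
    return score
```

Because every risk factor contributes a fixed whole number of points, a 66-year-old man without comorbidities and a 74-year-old man without comorbidities receive the same score of 1, which illustrates the lack of precision discussed above.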
Statistical methods like logistic regression (LR) and artificial neural networks (ANN) have
been around for many years and play an important role in certain fields of medicine.[11]
For example, ANN are currently used for the prediction of breast cancer and analysis of
MRI images.[12, 13] Furthermore, these methods have already been put to use in the
prediction of AF from ECGs, showing excellent accuracy.[14] Overall, however, their use
is rare due to a lack of expertise and experience. We hypothesize that the increasing
number of stroke risk factors allows for more accurate predictions when combined in models
than in simple additive scores like the CHA2DS2-VASc, as these models can better mimic the
complex real life associations between risk factors.
We aim to investigate if stroke risk prediction can be improved by developing a logistic
regression model and neural network. We will investigate if patients can benefit from
more precise predictions and thus more tailored treatment for atrial fibrillation. This in
turn should result in reduced overtreatment of low risk patients.
Methods
Figure 1 contains a visual overview of the analyses performed for this study. We trained
a logistic regression model and a neural network and compared their predictions for
stroke with CHA2DS2-VASc based predictions. Low and high risk groups were created to
compare performance in these groups.
[Figure 1: flowchart. Initial dataset → impute missing values → complete cases → random split into training (85%) and validation (15%) sets. Training: AIC stepwise (backwards) selection, then train LR with selected variables; ANN grid search for optimal decay, size and inputs, then train ANN with optimal setup. Validation: calculate CHA2DS2-VASc (C2V) and determine low and high risk groups; predict stroke using C2V, ANN and LR; determine model performance (c-index and Brier score); calculate NRI in low and high risk groups with C2V as old model and ANN and LR as new models.]
Figure 1. Structured overview of analyses performed.
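The performance measures named in Figure 1 (c-index, Brier score and NRI) can be illustrated with a minimal sketch. It assumes a binary stroke outcome (1 = stroke, 0 = no stroke), for which the c-index equals the area under the ROC curve, and a two-category NRI based on a low/high risk split. The function names and the exact NRI variant are assumptions of this sketch; the source does not specify how the authors implemented these calculations.

```python
from typing import Sequence


def c_index(y: Sequence[int], p: Sequence[float]) -> float:
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; ties in predicted risk count as half-concordant."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def nri(y: Sequence[int], old_high: Sequence[bool], new_high: Sequence[bool]) -> float:
    """Two-category net reclassification: old_high and new_high flag whether
    each patient is classified as high risk by the old and the new model."""
    up_e = down_e = up_n = down_n = 0
    n_events = sum(y)
    n_non_events = len(y) - n_events
    for yi, old, new in zip(y, old_high, new_high):
        moved_up = int(new and not old)
        moved_down = int(old and not new)
        if yi == 1:
            up_e += moved_up
            down_e += moved_down
        else:
            up_n += moved_up
            down_n += moved_down
    # Events should move up, non-events should move down.
    return (up_e - down_e) / n_events + (down_n - up_n) / n_non_events
```

In this formulation, a non-event that moves from the high to the low risk group counts as an avoided case of overtreatment, while an event that moves down counts against the new model.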
Data source
The Clinical Practice Research Datalink (CPRD) contains anonymized information on over 11
million patients registered at over 600 general practices in the United Kingdom from
1998 to 2012. Information is continuously recorded for each patient, including a record
of each consultation, diagnoses, prescribed medicines, and basic demographic data.
The geographical distribution and size of general practices represented in the Clinical
Practice Research Datalink are largely representative of England and Wales, and people
registered in the database are representative of the UK population in terms of age and
sex. The quality of data is subject to rigorous checks and regular audits.[15, 16] About
50% of the practices in CPRD have been linked to other datasets in England such as the
Hospital Episode Statistics (HES). The HES includes records of inpatient hospitalizations,
such as date of admission and discharge, diagnoses and procedures done. The current
study only used the practices that have been linked to HES to ensure high quality data.
Risk factors were based on diagnostic codes unless otherwise specified below. Diagnostic
codes were extracted from both HES (ICD-10 codes) and CPRD (Read-codes). For renal
failure and proteinuria, lab test records were used in addition to diagnostic codes. Renal
dysfunction was defined as an estimated glomerular filtration rate (eGFR) of <60
ml/min/1.73 m² or a diagnosis of renal failure. Patients that did not have a lab value or a
diagnostic code for renal failure or proteinuria were assumed to have normal values.
Smoking was defined as ‘current smoker’. All risk factors were assessed at baseline.
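The renal dysfunction rule above, including the assumption that patients without a lab value or diagnostic code are normal, can be written as a small function. This is a sketch only; the function and parameter names are hypothetical and not taken from the study code.

```python
from typing import Optional


def renal_dysfunction(egfr: Optional[float], has_renal_failure_code: bool) -> bool:
    """Apply the definition in the text.

    egfr is in ml/min/1.73 m2; None means no lab value was recorded.
    """
    if has_renal_failure_code:
        return True
    if egfr is None:
        # No lab value and no diagnostic code: assumed to be normal.
        return False
    return egfr < 60
```

Treating missing values as normal keeps all patients in the analysis, but it can misclassify patients whose renal function was simply never measured.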
Study population
Patients aged ≥18 years with a record of AF and no prior warfarin/NOAC or heparin
prescription were included. Patients with a record of rheumatic valve disease or valvular
repair/replacement were excluded. The first occurrence of AF at least 12 months after
the inclusion in the database was used as an index date. The outcome of interest was
ischemic stroke recorded either in CPRD (according to Read coding) or HES (according
to ICD-10 codes). Patients were censored at either start of warfarin or NOAC use, stroke,
death or end of follow up.
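The censoring rule amounts to taking the earliest of the listed dates and recording whether that date was a stroke. A minimal sketch follows, with hypothetical function and parameter names; keeping the stroke when it falls on the same date as another event is an assumption of this sketch, not something the source states.

```python
from datetime import date
from typing import Optional, Tuple


def end_of_observation(
    end_of_follow_up: date,
    stroke: Optional[date] = None,
    oac_start: Optional[date] = None,
    death: Optional[date] = None,
) -> Tuple[date, bool]:
    """Return (date observation stops, True if it stopped because of a stroke)."""
    candidates = [(end_of_follow_up, False)]
    if stroke is not None:
        candidates.append((stroke, True))
    for other in (oac_start, death):
        if other is not None:
            candidates.append((other, False))
    # Earliest date wins; on a tie the stroke is kept (assumption of this sketch).
    return min(candidates, key=lambda c: (c[0], not c[1]))
```

A patient who starts warfarin before a later stroke is therefore censored at the start of warfarin, and that stroke is not counted as an outcome.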
Variable selection
Variable selection for our prediction model was based on recent literature on predictors
for stroke in AF and on availability in CPRD.
Associations with the outcome measure were calculated in univariate logistic regression
models using stroke during follow up as the outcome.
Missing data
Missing data (BMI, smoking, ethnicity) were imputed with the MICE library for R using
Gibbs sampling, to allow for inclusion of all cases into the analyses.[17]
Risk prediction models
Logistic Regression (LR) and Artificial Neural Networks (ANN) were used to predict the risk
of stroke. Stroke during follow up was used as dependent variable.
In the logistic regression model, variable selection was performed using backwards
selection based on the Akaike Information Criterion (AIC). The model was trained on a
random 85% sample of the dataset (training set) and the remaining 15% was used to
validate the model (validation set). This validation set was stored in a separate location
and was only used for final validation.
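A random 85%/15% split of this kind could be sketched as follows. This is an illustration in Python rather than the R code used in the study; the function name, the fixed seed and the use of a simple shuffle are assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(
    records: Sequence[T], train_fraction: float = 0.85, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Randomly split records into a training set and a validation set.

    The seed is fixed only to make the sketch reproducible (an assumption).
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]
```

The validation part would then be set aside and left untouched until the final evaluation.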
The main reason for using Neural Networks is the fact that they can automatically
model complex non-linear relationships. This can improve predictions when compared
to simpler models that pose stringent assumptions (e.g. linearity between covariate and
outcome).
We used the R library “NeuralNet” to create and train a neural network. The same training
and validation set that were used to train and validate the LR were used for the ANN. The
network was initially created with a default setup of 1 ‘hidden layer’ and 1 ‘node’. This
initial setup was optimized by way of a ‘grid search’. This approach systematically evalu-
ates the performance of a network architecture by modifying parameters that determine
how the network functions.
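The idea of a grid search can be sketched generically in Python. The parameter names (`hidden_layers`, `nodes`) and the `evaluate` callback are hypothetical; in practice `evaluate` would train a network with the given settings and return a performance measure on held-out data.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def grid_search(
    param_grid: Dict[str, List], evaluate: Callable[..., float]
) -> Tuple[dict, float]:
    """Try every combination of parameter values and keep the best-scoring one.

    `evaluate` is a user-supplied function (hypothetical here) that returns a
    score where higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A call such as `grid_search({"hidden_layers": [1, 2], "nodes": [1, 3, 5]}, evaluate)` would evaluate all six architectures and return the best one with its score.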
These models were compared to the CHA2DS2-VASc. The CHA2DS2-VASc uses 7 variables
to predict stroke (congestive heart failure, hypertension, age between 65 and 74 years
and age ≥75 years, prior stroke/TIA, vascular disease, diabetes mellitus, female sex).
Patients can score from 0 to 9 points. The European Society of Cardiology recommends
OAC (warfarin/NOACs) for patients scoring 1 point or more.[4, 18, 19]
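For reference, the score can be computed as in the following Python sketch. The point weights are not listed in the text above; they follow the standard published scheme (2 points each for age ≥75 and prior stroke/TIA, 1 point for each other item). The function names are hypothetical.

```python
def cha2ds2_vasc(
    age: int,
    female: bool,
    chf: bool,
    hypertension: bool,
    diabetes: bool,
    prior_stroke_tia: bool,
    vascular_disease: bool,
) -> int:
    """Compute the CHA2DS2-VASc score (0 to 9) using the standard weights."""
    score = 0
    score += 1 if chf else 0
    score += 1 if hypertension else 0
    if age >= 75:
        score += 2
    elif age >= 65:
        score += 1
    score += 1 if diabetes else 0
    score += 2 if prior_stroke_tia else 0
    score += 1 if vascular_disease else 0
    score += 1 if female else 0
    return score


def oac_recommended(score: int) -> bool:
    """Binary rule described in the text: OAC for 1 point or more."""
    return score >= 1
```

For example, a 70 year old man with hypertension scores 2 points and would be recommended OAC under this rule.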
Both methods used the same initial variables. The LR model and ANN were built using
R.[20]
Low and high risk groups
As net benefit of antithrombotic medication for patients at high risk for stroke is great,
most room for improvement is expected to be in low risk patients. To separate patients
into a low risk and a high risk group, we calculated a one year stroke risk using Cox
regression (figure 1). AF patients with an estimated 1 year stroke risk below 1.7% were
classified as low risk, the others as high risk. The 1.7% stroke rate is based on a study that
determined the cut-off annual stroke rate where OAC treatment leads to net benefit.[21]
Predictive performance analyses
To compare predictive performance, the CHA2DS2-VASc was used as a continuous vari-
able in a logistic regression model with stroke during follow up as outcome. Likewise the
continuous predicted values from the LR model and the ANN were used. The C-statistic,
weighted for follow up time, was used to compare the discriminating ability of each
method.[22] Furthermore, the net reclassification index (NRI) was calculated to provide
a more clinical and easier to interpret measure of model improvement. The Brier score
was calculated to assess the accuracy of the predictions. It measures the mean squared
difference between the predicted probability assigned to the possible outcomes and the
actual outcome. The lower the Brier score, the better the model accuracy.
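As a small illustration of the definition above, the Brier score for a binary outcome can be computed as follows. This is a Python sketch with a hypothetical function name, not the code used in the study, and it ignores the weighting by follow up time.

```python
from typing import Sequence


def brier_score(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sum(squared_errors) / len(squared_errors)
```

For predictions 0.1, 0.9, 0.8 and 0.3 with outcomes 0, 1, 1 and 0, the score is (0.01 + 0.01 + 0.04 + 0.09) / 4 = 0.0375.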
Net Reclassification Index
The Net Reclassification Index (NRI) is a measure for evaluating the improvement in
prediction performance of prediction models. It reflects the reclassification of cases (event
NRI) and non-cases (non-event NRI) as a result of a new model when compared to an old
model.[23] Although its use has been under discussion[24, 25], it remains one of the
easier to interpret measures for prediction performance. The NRI consists of two sub
scores: the event NRI and the non-event NRI. The event NRI represents the
patients with the outcome of interest (cases) that have been correctly reclassified from
non-case to case, when comparing predictions from the old model to the new model.
The non-event NRI works the other way; it represents the patients without the outcome of
interest (non-cases) that have been correctly reclassified from case to non-case by the
new model. The total NRI is a weighted combination of the sub scores.
The higher the NRI, the better the new model performs compared to the old model.
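The two components can be sketched for binary classifications (1 = classified as case, 0 = classified as non-case). This Python sketch uses hypothetical names and takes the total as the simple sum of the two components, which is the usual unweighted definition and an assumption here; the study itself used the R package mentioned below.

```python
from typing import List, Sequence, Tuple


def net_reclassification(
    old: Sequence[int], new: Sequence[int], outcome: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (event NRI, non-event NRI, total NRI) for binary classifications."""

    def net_upward(pairs: List[Tuple[int, int]]) -> float:
        # Proportion moved up minus proportion moved down by the new model.
        if not pairs:
            raise ValueError("need at least one event and one non-event")
        up = sum(1 for o, n in pairs if n > o)
        down = sum(1 for o, n in pairs if n < o)
        return (up - down) / len(pairs)

    events = [(o, n) for o, n, y in zip(old, new, outcome) if y == 1]
    non_events = [(o, n) for o, n, y in zip(old, new, outcome) if y == 0]
    nri_event = net_upward(events)            # upward moves are correct for events
    nri_non_event = -net_upward(non_events)   # downward moves are correct for non-events
    return nri_event, nri_non_event, nri_event + nri_non_event
```

If the old model classifies everyone as a case, the non-event NRI equals the proportion of non-events that the new model moves to non-case, which is the quantity of interest for overtreatment.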
The R package “nricens” was used for generating NRIs. To calculate the NRI for the
CHA2DS2-VASc, a binary variable of the CHA2DS2-VASc was created, making all CHA2DS2-
VASc scores > 0 correspond with a binary value of 1, thus reflecting ESC-guideline recom-
mendations for prescribing OAC for scoring > 0 points.
The CHA2DS2-VASc was used as ‘old’ model, and the ANN and the LR model were used as
‘new’ models. Using this approach we ensured we assessed effects pertaining to the actual
clinical situation, i.e. whether or not a physician would prescribe OAC for a specific patient.
Reduction in adverse events
To calculate how many adverse events we can prevent by reducing overtreatment, we
assume a number needed to harm (NNH) of 909 and 208 for intracranial hemorrhage
(ICH) and major bleeding events respectively (based on the 2007 study by Hart et al.).[26]
We can calculate how many events can be prevented by taking the proportion of
patients correctly reclassified as non-event, multiplied by the number of patients in the
low risk group, divided by the NNH.
Results
A total of 60594 patients were included in our analyses with a total of 125297 patient
years and a mean follow up duration of 1 year. Patients were on average 74 years old (SD
12.11), over half were male (51%), approximately 15% had suffered a previous stroke
and 6% suffered a stroke during follow up (Table 1). The annual stroke rate was 3%.
The mean CHA2DS2-VASc was 3.30 (Median: 3.00; IQR:[2.00, 4.00]) with 93% of patients
scoring 1 point or more, meaning they should receive OAC according to ESC guidelines.
1604 patients were classified as low risk (18%) and 7468 patients as high risk (82%) in
the validation set.
Table 1. Characteristics of study population demographics
Follow up until censoring (years)*
  Mean                          2.07
  Median                        0.74
  Interquartile range           0.13 - 3.12
                                No stroke during follow up    Stroke during follow up
                                (n = 56843)                   (n = 3751)
Age
  Mean (SD)                     74.0 (12.2)                   79.8 (9.0)
  < 65                          11506   20%                   236    6%
  65-75                         14245   25%                   678    18%
  75-85                         20019   35%                   1618   43%
  85+                           11073   19%                   1219   32%
Gender
  Male                          29640   52%                   1448   39%
  Female                        27203   48%                   2303   61%
Smoking
  Current smoker                8913    16%                   508    14%
  Non-smoker or unknown         47930   84%                   3243   86%
BMI
  Unknown                       8134    14%                   755    20%
  < 20                          2534    4%                    202    5%
  20-25                         14688   26%                   1037   28%
  25-30                         18574   33%                   1177   31%
  > 30                          12913   23%                   580    15%
Ethnicity
  White                         44788   79%                   3059   82%
  Other*                        773     1%                    53     1%
  Unknown                       11282   20%                   639    17%
Social economic status
  0-20% (most deprived)         12877   23%                   825    22%
  21-40%                        14656   26%                   941    25%
  41-60%                        12145   21%                   782    21%
  61-80%                        10194   18%                   709    19%
  81-100% (least deprived)      6971    12%                   494    13%
CHA2DS2-VASc score
  0                             3969    7%                    45     1%
  1                             6405    11%                   130    3%
  >1                            46469   82%                   3576   95%
  Mean (SD)                     3.24 (1.8)                    4.15 (1.7)
Medical history
  History of stroke             7770    14%                   1146   31%
  Vascular comorbidity          17405   31%                   1231   33%
  Hypertension                  30819   54%                   2207   59%
  Diabetes                      6917    12%                   480    13%
  Congestive heart failure      9855    17%                   716    19%
  Renal failure                 15773   28%                   1217   32%
  Proteinuria                   1659    3%                    97     3%
Study population characteristics. N and percentage of total population are reported unless
otherwise indicated.
Variable selection
Variable selection is displayed in Table 2. Gender, age, history of stroke and diabetes
were selected for every model. Hypertension was not selected for the LR model and ANN,
vascular disease was not selected by the LR and Cox models. Congestive heart failure
was only selected for the LR model. Only the LR model selected ethnicity as a relevant
predictor. Smoking was selected as a strong predictor for the ANN. Table 3 displays odds
ratios for the selected logistic regression model. Figure 2 shows a graphical representa-
tion of the ANN.
Table 2. Variable selection
Variables Model
Gender LR ANN COXPH CHADS2 CHA2DS2-VASc Univariate LR
Age
History of stroke xx x x P < 0.001
Vascular comorbidity
Hypertension xx x x x P < 0.001
Diabetes
Congestive heart failure xx x x x P < 0.001
Renal failure
Proteinuria x x P < 0.01
Social economic status
BMI (categorical) xx x P < 0.001
Smoking status
Ethnicity xx x x x P = 0.25
x x x P < 0.01
xx P < 0.001
P = 0.24
P < 0.05
x P < 0.001
xx P = 0.30
P =0.10
Variables for logistic regression (LR) and Cox proportional hazards models were selected using AIC
based stepwise backwards selection. Artificial Neural Network (ANN) input variables were selected using
a grid search, i.e. testing each combination of variables on a training set and evaluating performance.
Prediction and misclassification
In the validation dataset, the ANN performed best (weighted c-index: 0.67 95% CI
[0.63,0.70]), followed closely by the LR model (weighted c-index: 0.66 95% CI [0.63,0.70])
and the CHA2DS2-VASc (weighted c-index: 0.64 95% CI [0.60,0.67]) (table 4). These dif-
ferences were not statistically significant. All models had a Brier score of 0.06.
Table 5 shows the Net Reclassification Index. On the overall NRI, the LR model and the
ANN performed similarly (NRI: 0.19 95% CI [0.18, 0.2] for the LR and [0.19, 0.2] for the
ANN) compared to the CHA2DS2-VASc risk score. In the low risk group, the ANN performed
best (NRI: 0.58 95% [0.55, 0.6]) followed by the LR (NRI: 0.56 95% CI [0.54, 0.59]) com-
pared to the CHA2DS2-VASc risk score. The non-event NRI (NRIn) in the low risk group is
the most interesting statistic to study as it reflects patients that were accurately down-
coded in the low risk group, i.e. reduced overtreatment. The ANN showed an NRIn of 0.59
(95% CI [0.56, 0.61]), the LR model showed an NRIn of 0.57 (95% CI [0.55, 0.60]).
Using the ANN for stroke prediction could have prevented 1 ICH event and 5 major bleeding
events in the validation group, and roughly 7 ICH events and 35 major bleeding events in
the total study population. These prevented events come at no additional cost (economic
or in terms of harm). This is calculated by taking the proportion of patients correctly reclassified
as non-event (0.59) and multiplying that with the number of patients in the low risk
group (1604) and finally dividing by the NNH.
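The arithmetic for the validation group can be retraced with a short Python sketch. The inputs are the values stated in the text (non-event NRI of 0.59, 1604 low risk patients, NNH of 909 for ICH and 208 for major bleeding); the function name is hypothetical.

```python
def events_prevented(nri_non_event: float, n_low_risk: int, nnh: float) -> float:
    """Proportion correctly reclassified as non-event times group size, divided by NNH."""
    return nri_non_event * n_low_risk / nnh


ich = events_prevented(0.59, 1604, 909)              # about 1.04
major_bleeding = events_prevented(0.59, 1604, 208)   # about 4.55
print(round(ich), round(major_bleeding))             # prints "1 5"
```

This reproduces the 1 ICH event and 5 major bleeding events reported for the validation group.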
Table 3. Odds ratios for univariate associations with stroke during follow up of individual risk factors
Variables Odds ratio Confidence interval
History of stroke 2.78 [2.58-2.99]
Age in years (per year) 1.05 [1.05-1.05]
Diabetes 1.06 [0.96-1.17]
Congestive heart failure 1.12 [1.03-1.22]
Renal failure 1.25 [1.16-1.34]
Smoking status 0.86 [0.82-0.9]
BMI 0.96 [0.96-0.97]
Gender is male 0.58 [0.54-0.62]
Odds ratios and 95% confidence intervals were calculated using univariate logistic regression models
in R.
Table 4. Model performance
                       Weighted c-index   Confidence interval   Brier score   Confidence interval
CHA2DS2-VASc           0.64               [0.60,0.67]           0.06          [0.05,0.06]
Logistic Regression    0.66               [0.63,0.70]           0.06          [0.05,0.06]
Neural network         0.67               [0.63,0.70]           0.06          [0.05,0.06]
Discriminatory ability of stroke prediction methods (C-indices) and model accuracy (Brier score),
weighted by follow up duration.
Discussion
In this study we aimed to improve stroke prediction in atrial fibrillation overall and for
low risk groups. Neural networks performed best on a separate validation set, closely
followed by logistic regression. Both scored higher than the CHA2DS2-VASc risk score, although
differences in the weighted C statistic were not significant. When examining reclassification, the ANN
performed better in the low risk group, although differences with the LR model were
marginal.
As the CHA2DS2-VASc classifies most patients in the (moderate to) high risk groups
(93%), there is little to be gained in the high risk groups in terms of reclassification.
Consequently, much can be gained in the low risk group by downcoding non-events. In
the low risk group, the ANN correctly reclassified 59% of the CHA2DS2-VASc cases as a
non-case and the LR model 57% of CHA2DS2-VASc cases.
This study shows CHA2DS2-VASc performs poorly for low risk patients, resulting in
unnecessary (over)treatment in clinical practice. Relatively poor performance of the
CHA2DS2-VASc, and other stroke risk prediction schemes, has been found in several large
studies, showing similar c-indices.[7, 27, 28] Other studies report higher c-indices for
CHA2DS2-VASc. Differences in the predictive ability can be explained by different types of
statistical analysis or differences in study population and data processing or a different
definition of stroke (with or without a TIA).[5, 29]
[Figure 2: network diagram with eight input nodes I1 to I8 (previous stroke, age, diabetes,
gender, renal failure, smoking, vascular disease, proteinuria), five hidden nodes H1 to H5,
nodes labelled B1 and B2, and one output node O1 (stroke).]
Figure 2. The final neural network, colors indicate the relative importance of each variable, red being
the most important, green the least. The thickness of each line indicates the (weight of) the influence
the nodes have on each other.
Table 5. Net reclassification of CHA2DS2-VASc to new models
All patients
Base model      New model   NRIe    95% CI            NRIn   95% CI          NRI    95% CI
CHA2DS2-VASc    LR          0       [-0.01, 0]        0.19   [0.19, 0.2]     0.19   [0.18, 0.2]
CHA2DS2-VASc    ANN         -0.01   [-0.01, 0]        0.2    [0.19, 0.21]    0.19   [0.19, 0.2]
LR              ANN         0       [0, 0]            0.01   [0, 0.01]       0.01   [0, 0.01]
Low risk group (yearly stroke risk < 1.7%)
Base model      New model   NRIe    95% CI            NRIn   95% CI          NRI    95% CI
CHA2DS2-VASc    LR          -0.01   [-0.01, 0]        0.57   [0.55, 0.6]     0.56   [0.54, 0.59]
CHA2DS2-VASc    ANN         -0.01   [-0.01, -0.01]    0.59   [0.56, 0.61]    0.58   [0.55, 0.6]
LR              ANN         0       [0, 0]            0.01   [0, 0.02]       0.01   [0, 0.02]
High risk group (yearly stroke risk > 1.7%)
Base model      New model   NRIe    95% CI            NRIn   95% CI          NRI    95% CI
CHA2DS2-VASc    LR          0       [0, 0]            0.11   [0.1, 0.12]     0.11   [0.1, 0.12]
CHA2DS2-VASc    ANN         0       [-0.01, 0]        0.12   [0.11, 0.12]    0.11   [0.11, 0.12]
LR              ANN         0       [0, 0]            0.01   [0, 0.01]       0      [0, 0.01]
Higher scores indicate better performance of the new model when compared to the old model.
NRIn: The net percentage of persons without the event of interest correctly classified downward
NRIe: The net percentage of persons with the event of interest correctly classified upward
The main purpose of the CHA2DS2-VASc has been to identify truly low risk patients and
studies have shown warfarin and NOACs to be beneficial for all but the lowest risk catego-
ries (CHA2DS2-VASc = 0).[4] Nevertheless, this study shows that improvements can still be
made in stroke prediction despite the low cutoff for a net benefit. Both logistic regression
and neural networks are capable of making more accurate predictions, especially in low
risk groups, resulting in less overtreatment and thus fewer bleeding events and lower costs.
[29] We found two studies that investigated potential benefits of using other statistical
methods for stroke prevention in AF. These studies showed that current diagnosis and
treatment can be improved by using modeling (in this case, neural networks and Bayes-
ian statistics).[30, 31] Due to the differences in methodology we cannot directly compare
our results to these studies.
Positive results using modeling methods are also confirmed in another study where the
Framingham risk prediction schemes were compared to more advanced methods.[32]
We attribute the lack of difference in performance between the LR and the ANN to the
fact that many available risk factors are binary, and these data do not have the ‘depth’
from which an ANN can benefit by making complex non-linear relationships between risk
factors. Thus the most likely reason for the enhanced predictive performance of the LR
and ANN is the more accurate weighting of variables that these models are capable of,
as opposed to a ‘sum’ of risk factors. Future studies on the use of these models in stroke
prediction should attempt to include more risk factors as continuous variables.
Variable selection
Studies have mentioned that consistency of AF predictors is suboptimal. Hypertension,
vascular disease, heart failure, diabetes and gender are all mentioned, in separate stud-
ies, as having lower predictive power than core attributes for stroke prediction, namely:
previous stroke and age.[7, 27, 33-35] In this study, age, gender, history of stroke and dia-
betes were consistently selected as predictors of stroke. Hypertension (treated), although
part of both CHADS2 and CHA2DS2-VASc, has been under discussion in other papers. Its
lack of consistency is often attributed to differences in interpretation of blood pressure
measurements or missing data.[7, 35] This could also explain the lack of consistency in
our results. Renal failure was selected in 2 of the 3 models and has been suggested as
a predictor by several other studies which our data appears to support.[5, 33, 36] Smok-
ing, which was selected for both the LR model and the ANN is not often mentioned in
the literature on stroke prediction and warrants further investigation. Lastly, proteinuria,
which has been mentioned as a potential predictor for stroke, was only selected for the
ANN. Again this could be due to interpretation of lab results or missing data. It should
be noted that renal function is usually tested when there is indication to do so, and we
assume there are no renal issues when no tests are documented. This clearly is a strong
assumption.
The above leads us to conclude that to develop truly accurate and consistent stroke risk
prediction models, future research into predictors of stroke should focus on predictors
that can be consistently determined from routinely collected data. This is even more
relevant as we seek to incorporate biomarkers and patient experience into stroke predic-
tion models.[6] Consistently good external validation of complex prediction scores might
be a tall order, and the future may lie in locally optimized prediction models that are
based on the local interpretation of routinely collected data.[37]
Strengths and limitations
This study used a large cohort of 60594 AF patients, naïve for anticoagulation, to
compare different means of predicting stroke risk. It is the first study to compare the
CHA2DS2-VASc to complex prediction models for stroke prediction in a large, routinely
collected dataset. It therefore represents a real world scenario more accurately, as these
prediction methods will mainly be implemented in Electronic Health Record systems.
Linkage to HES data ensured only confirmed episodes of stroke were included in the
dataset. By assessing reclassification in low and high risk groups we mimicked the clinical
situation where only truly low risk patients stand to benefit from being withheld OAC, as
opposed to other studies that focused on prediction accuracy rather than net benefit.
The ESC guideline formally does not recommend oral anticoagulation for female patients
with lone AF (3% of the study population), which might have resulted in a small number
of CHA2DS2-VASc non-cases being classified as cases in this study. We did not include
bleeding risk assessment into our models due to missing variables, while this is essential
in clinical practice. Future studies should investigate whether it is feasible to integrate
stroke risk and bleeding risk into a single model and provide physicians with information
that displays net benefit of OAC over time for individual patients.
The CPRD dataset consisted of only patients who did not already use OAC, which could
lead to selection bias, i.e. favoring patients that for some reason did not get OAC prescribed.
Patients were censored at the start of OAC treatment, loss to follow up or death.
As antithrombotic treatment improved through the years, patients included in later years
may have been followed for a shorter period of time. This could have reduced the number
of high risk patients in the dataset in later years. Despite these shortcomings, it should
be noted that we currently do not have better methods to investigate novel methods for
stroke prediction, as it would be unethical to withhold OAC from patients with AF.
The use of random test sets and a separate validation set ensured we limited the chance
of over fitting our models and optimized external validity of our results, although the
latter can only truly be established by external validation. Temporal validation was attempted
but not feasible due to differences in stroke rate between temporally sampled
training and validation sets, which is likely due to the short follow up period for patients
at the end of the dataset. It is possible that other statistical methods are better at
stroke prediction than LR models or ANN, which requires further investigation.
Implications for practice
The findings of this study lead us to conclude that the use of LR or ANN, specifically for
decision support, should be investigated. It would be safe to assume that most decision
support systems are guideline based, although no studies explicitly investigated this
fact. These guidelines contain traditional prediction and classification schemes like the
CHA2DS2-VASc, which are developed for use in daily practice and are thus kept simple.
In modern day medicine, daily practice revolves around a computer, which is capable
of much more than calculating a simple cumulative prediction scheme. This will become
even more relevant when more complex predictors like biomarkers are added to prediction
schemes. By fully utilizing prediction models, prediction of disease and classification
of patients can be improved. This has been demonstrated by several studies using regression,
Bayesian and neural network methodologies. It is, however, paramount to seek
external validation before implementing computer based prediction and classification
models, as it is in traditional schemes.[38, 39]
Conclusion
Logistic regression models and neural networks can improve stroke risk predictions and
reduce overtreatment when compared to the CHA2DS2-VASc, thus reducing the number
of adverse events caused by oral anticoagulation. Future studies will investigate use of
these methods in other datasets, but we suggest more studies focus on use of prediction
models to classify patients. Furthermore, we argue that to create widely usable, stable
(stroke) prediction schemes, research should focus on predictors that can consistently be
determined from routinely collected data, or define how these are to be extracted from
routinely collected data.
Author contributions
RvR provided the data, performed data cleaning and contributed to the manuscript.
TvS helped with the initial planning of the study and significantly contributed to the
manuscript. DA performed the analyses, and wrote the manuscript. HvW significantly
contributed to the manuscript and helped interpreting findings. AA provided input for
the statistical analyses and significantly contributed to the manuscript.
References
1. Wolf PA, Abbott RD, Kannel WB. Atrial fibrillation as an independent risk factor for stroke: the Framingham Study. Stroke. 1991;22(8):983-8.
2. Miller CS, Grandi SM, Shimony A, Filion KB, Eisenberg MJ. Meta-Analysis of Efficacy and Safety of New Oral Anticoagulants (Dabigatran, Rivaroxaban, Apixaban) Versus Warfarin in Patients With Atrial Fibrillation. Am J Cardiol. 2012;110(3):453-60.
3. Hylek EM, Go AS, Chang YC, Jensvold NG, Henault LE, Selby JV, et al. Effect of intensity of oral anticoagulation on stroke severity and mortality in atrial fibrillation. N Engl J Med. 2003;349(11):1019-26.
4. Lip GY, Nieuwlaat R, Pisters R, Lane DA, Crijns HJ. Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach: the euro heart survey on atrial fibrillation. Chest. 2010;137(2):263-72.
5. Singer DE, Chang YC, Borowsky LH, Fang MC, Pomernacki NK, Udaltsova N, et al. A New Risk Scheme to Predict Ischemic Stroke and Other Thromboembolism in Atrial Fibrillation: The ATRIA Study Stroke Risk Score. Journal of the American Heart Association. 2013;2(3).
6. Lip GYH. Stroke and bleeding risk assessment in atrial fibrillation: when, how, and why? Eur Heart J. 2013;34(14):1041-9.
7. van Staa TP, Setakis E, di Tanna GL, Lane DA, Lip GYH. A comparison of risk stratification schemes for stroke in 79 884 atrial fibrillation patients in general practice. J Thromb Haemost. 2011;9(1):39-48.
8. van den Ham HA, Klungel OH, Singer DE, Leufkens HGM, van Staa TP. Comparative Performance of ATRIA, CHADS2, and CHA2DS2-VASc Risk Scores Predicting Stroke in Patients With Atrial Fibrillation: Results From a National Primary Care Database. J Am Coll Cardiol. 2015;66(17):1851-9.
9. Lip GYH, Lane DA. Stroke Prevention in Atrial Fibrillation A Systematic Review. Jama-Journal of the American Medical Association. 2015;313(19):1950-62.
10. Arts DL, Visscher S, Opstelten W, Korevaar JC, Abu-Hanna A, van Weert HCPM. Frequency and Risk Factors for Under- and Over-Treatment in Stroke Prevention for Patients with Non-Valvular Atrial Fibrillation in General Practice. PLoS One. 2013;8(7).
11. Lisboa PJ, Taktak AFG. The use of artificial neural networks in decision support in cancer: A systematic review. Neural Netw. 2006;19(4):408-15.
12. Meinel LA, Stolpen AH, Berbaum KS. Breast MRI lesion classification: Improved performance of human readers with a backpropagation neural network computer-aided diagnosis (CAD) system. Journal of magnetic …. 2007(Query date: 2014-01-02).
13. Priddy KL, Keller PE. Artificial neural networks: An introduction: books.google.com; 2005.
14. Kara S, Okandan M. Atrial fibrillation classification with artificial neural networks. Pattern Recognition. 2007;40(11):2967-73.
15. Khan NF, Harrison SE, Rose PW. Validity of diagnostic coding within the General Practice Research Database: a systematic review. Br J Gen Pract. 2010;60(572).
16. The clinical practice research datalink, [Available from: http://www.cprd.com/intro.asp.
17. Stef van Buuren KG-O. mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software. 2011;45(3):1-67.
18. Camm AJ, Lip GYH, De Caterina R, Savelieva I, Atar D, Hohnloser SH, et al. 2012 focused update of the ESC Guidelines for the management of atrial fibrillation. Eur Heart J. 2012;33(21):2719-47.
19. January CT, Wann LS, Alpert JS, Calkins H, Cigarroa JE, Cleveland JC, Jr., et al. 2014 AHA/ACC/HRS Guideline for the Management of Patients With Atrial Fibrillation: Executive Summary: A Report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines and the Heart Rhythm Society. Circulation. 2014;130(23):2071-104.
20. Team RDC. R: A Language and Environment for Statistical Computing. 2011.
21. Eckman MH, Singer DE, Rosand J, Greenberg SM. Moving the Tipping Point The Decision to Anticoagulate Patients With Atrial Fibrillation. Circulation-Cardiovascular Quality and Outcomes. 2011;4(1):14-21.
22. Kremers WK. Concordance for Survival Time Data: Fixed and Time-Dependent Covariates and Possible Ties in Predictor and Time 2007 30-11-2015. Available from: http://www.mayo.edu/research/documents/biostat-80pdf/doc-10027891.
23. Pencina MJ, D’Agostino RB, Sr., D’Agostino RB, Jr., Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27(2):157-72; discussion 207-12.
24. Vickers AJ, Pepe M. Does the Net Reclassification Improvement Help Us Evaluate Models and Markers? Ann Intern Med. 2014;160(2):136-+.
25. Leening MJ, Vedder MM, Witteman JC, Pencina MJ, Steyerberg EW. Net reclassification improvement: computation, interpretation, and controversies: a literature review and clinician’s guide. Ann Intern Med. 2014;160(2):122-31.
26. Hart RG, Pearce LA, Aguilar MI. Meta-analysis: Antithrombotic therapy to prevent stroke in patients who have nonvalvular atrial fibrillation. Ann Intern Med. 2007;146(12):857-67.
27. Fang MC, Go AS, Chang Y, Borowsky L, Pomernacki NK, Singer DE, et al. Comparison of risk stratification schemes to predict thromboembolism in people with nonvalvular atrial fibrillation. J Am Coll Cardiol. 2008;51(8):810-5.
28. Group SRiAFW. Comparison of 12 risk stratification schemes to predict stroke in patients with nonvalvular atrial fibrillation. Stroke. 2008;39(6):1901-10.
29. Olesen JB, Lip GYH, Hansen ML, Hansen PR, Tolstrup JS, Lindhardsen J, et al. Validation of risk stratification schemes for predicting stroke and thromboembolism in patients with atrial fibrillation: nationwide cohort study. Br Med J. 2011;342.
30. Shanthi D, Sahoo G, Saravanan N. Designing an artificial neural network model for the prediction of thrombo-embolic stroke. International Journals of Biometric and Bioinformatics (IJBB). 2009;3(1):10-8.
31. Letham B, Rudin C, McCormick TH, Madigan D, editors. An Interpretable Stroke Prediction Model using Rules and Bayesian Analysis. AAAI (Late-Breaking Developments); 2013.
32. Kennedy EH, Wiitala WL, Hayward RA, Sussman JB. Improved Cardiovascular Risk Prediction Using Nonparametric Regression and Electronic Health Record Data. Med Care. 2013;51(3):251-8.
33. Olesen JB, Lip GYH, Kamper AL, Hommel K, Kober L, Lane DA, et al. Stroke and Bleeding in Atrial Fibrillation with Chronic Kidney Disease. N Engl J Med. 2012;367(7):625-35.
34. Hughes M, Lip G. Stroke and thromboembolism in atrial fibrillation: a systematic review of stroke risk factors, risk stratification schema and cost effectiveness data. Thromb Haemost. 2008;99(2):295-304.
35. Hart RG, Pearce LA, Albers GW, Connolly SJ, Friday GH, Gage BF, et al. Independent predictors of stroke in patients with atrial fibrillation - A systematic review. Neurology. 2007;69(6):546-54.
36. Marinigh R, Lane DA, Lip GY. Severe renal impairment and stroke prevention in atrial fibrillation: implications for thromboprophylaxis and bleeding risk. J Am Coll Cardiol. 2011;57(12):1339-48.
37. Yu S, Farooq F, van Esbroeck A, Fung G, Anand V, Krishnapuram B. Predicting readmission risk with institution-specific prediction models. Artif Intell Med. 2015;65(2):89-96.
38. Moons KGM, Kengne AP, Grobbee DE, Royston P, Vergouwe Y, Altman DG, et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart. 2012;98(9):691-8.
39. Bleeker SE, Moll HA, Steyerberg EW, Donders ART, Derksen-Lubsen G, Grobbee DE, et al. External validation is necessary in prediction research: A clinical example. J Clin Epidemiol. 2003;56(9):826-32.
Chapter 9
General discussion
This thesis consists of three parts. In part A of this thesis we investigated guideline
adherence to the guideline for atrial fibrillation (AF) of the Dutch College of General
Practitioners (NHG) and appraised the guideline to identify barriers to adherence. We
found low adherence to the guideline resulting in both over and under treatment. Dur-
ing our appraisal of the guideline we identified several potential improvements for the
structure and contents of the guideline. Next we searched the literature and categorized
reasons for intentional non-adherence in a systematic review. Reasons for guideline
non-adherence were frequent and sometimes valid, pertaining to patient preferences
and contra-indications.
Part B presents the main study of this thesis: The Expert-AF trial. We implemented a clini-
cal decision support system (CDSS) to improve adherence to the Dutch NHG AF guideline
and evaluated its effectiveness in a cluster randomized trial. We could not show a sig-
nificant increase in guideline adherence as a result of the implementation of the system.
We did, however, identify valuable insights into barriers to usage in a mixed-methods
evaluation of the system. In Part C we take a first step towards better prediction of stroke
using complex prediction models, and show that these models hold promise for the
future of stroke prediction. Chapter 10 contains a summary of each individual chapter.
The Expert-AF trial
To gain insights into real-life performance of interventions, the use of pragmatic trials
should be promoted (1). We designed the Expert-AF trial, in which a CDSS for prevention
of stroke in patients with AF was tested. We randomized clusters of GPs and offered
real-time decision support in a non-obtrusive way. Furthermore, we planned to freeze
the GP EHR system when a GP deviated from the advice. This condition was meant to
force the GP to explain why he or she deviated from the guideline.
A strength of the Expert-AF project was its pragmatic nature: a multi-domain clinical
decision support system implemented in the busy daily practice of the GP. This made
it difficult for GPs to establish whether they were randomized in the intervention or the
control group. The fact that the Expert-AF project was embedded in a multi-domain CDSS
might also have worked against it; it greatly increased the number of alerts presented,
which in turn might have resulted in alert fatigue (3).
We were required to work with the vendors of the GP Electronic Health Record (EHR)
systems. This proved to be a hurdle, as we were not able to implement every feature that
we required in the way we wished. We were, for instance, not allowed to freeze the
GP’s EHR system when requiring the GP to enter a reason for deviating from a recommendation,
a feature that has been shown to increase CDSS effectiveness and could have
provided us with valuable information about possible shortcomings of the guideline (2).
Furthermore, discontinued support for the CDSS plugin meant that we could not include
more GPs in the trial, limiting representativeness of the sample. Despite these limita-
tions, this trial provided us with valuable insights that can guide future designers of
clinical decision support systems.
Implications for science and practice
On guidelines
This thesis has shown that guideline adherence in atrial fibrillation is poor (4). However,
there are valid reasons to deviate from guidelines and perfect adherence is not desirable,
considering the current quality of guidelines (5). Expert opinion and experience often
fill in the gaps that guidelines currently leave open, especially in older patients with
multimorbidity and a short life expectancy. Although some doctors refer to “cookbook
medicine” when discussing guidelines, that doesn’t do justice to the concept of EBM, in
which external evidence from properly conducted research should be used in conjunction
with clinical experience and expert opinion to provide the best care possible (6). That is
by no means cookbook medicine, as the physician and patient ultimately determine the
treatment strategy together. Therefore, real-world studies are necessary when studying
the proper use of guidelines. This is especially true because many guidelines have several
limitations and cannot be easily applied to complex individual patients (7-9). As it stands,
we should accept that perfect guideline adherence is currently unattainable in complex
patients. Thus, expert opinion and patient preference should dictate prioritization in
cases where guidelines cannot.
On decision support
Healthcare is becoming more complex and demanding (10). Properly implemented
guidelines can improve clinical practice (11, 12). Many look to CDSS to improve guideline
adherence, but currently these systems cannot fulfill that promise. While some studies
have shown the effectiveness of CDSS, mainly in a single-disease setting, many other
studies, including our Expert-AF trial, have failed to do so (13, 14). The qualitative evalu-
ation of our trial revealed many important barriers that limited effectiveness of our inter-
vention. These barriers were also identified in similar trials and reviews studying CDSS
effectiveness (15-17). The most important barriers we have identified in this thesis relate
to lack of time (for the GP), lack of context, and lack of functionality. To remove these
barriers, much work is needed. First, we need to address the issue of lack of time and alert
fatigue. How can we effectively present the physicians with gaps in current patient care,
without overloading them with recommendations during their already busy day (18)?
Alert prioritization, customization, and different modes of presenting alerts can be a step
in the right direction. These features will allow the user to prioritize specific disease
areas over time and tailor recommendations to personal preferences. This will reduce the
number of alerts shown at one time and optimize integration into a personal workflow.
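As an illustration of the alert prioritization and customization described above, the following is a minimal sketch, not the Expert-AF implementation. The `Alert` and `UserPreferences` structures, the 1 to 5 urgency scale, and the per-domain weights are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    """One CDSS recommendation (hypothetical structure for this sketch)."""
    domain: str   # disease area, e.g. "atrial fibrillation"
    message: str
    urgency: int  # assumed scale: 1 (low) to 5 (high)


@dataclass
class UserPreferences:
    """Per-GP settings (hypothetical)."""
    domain_weights: dict = field(default_factory=dict)  # domain -> weight, default 1.0
    muted_domains: set = field(default_factory=set)     # domains the GP has switched off
    max_alerts: int = 3                                 # cap on alerts shown at one time


def select_alerts(alerts, prefs):
    """Return at most prefs.max_alerts alerts, highest weighted urgency first."""
    # Customization: drop alerts from domains the user has muted.
    visible = [a for a in alerts if a.domain not in prefs.muted_domains]
    # Prioritization: rank by urgency multiplied by the user's domain weight.
    ranked = sorted(
        visible,
        key=lambda a: a.urgency * prefs.domain_weights.get(a.domain, 1.0),
        reverse=True,
    )
    return ranked[: prefs.max_alerts]


if __name__ == "__main__":
    demo_alerts = [
        Alert("atrial fibrillation", "Start oral anticoagulation", 4),
        Alert("medication review", "Yearly medication revision", 2),
        Alert("fall risk", "Assess risk of falling", 3),
    ]
    demo_prefs = UserPreferences(domain_weights={"fall risk": 2.0}, max_alerts=2)
    for alert in select_alerts(demo_alerts, demo_prefs):
        print(alert.domain, "-", alert.message)
```

Capping the list and letting the user weight or mute domains are the two levers that limit the number of alerts shown at one time; how weights should be chosen in practice is an open question that the lab tests proposed in this chapter would need to answer.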
To further increase CDSS effectiveness, patients should be involved in prioritization of
health issues. Personal Health Records (PHR) and pre-visit questionnaires can be used to
inform patients about issues that were detected by the CDSS and allow them to prioritize
these issues before meeting with their physician (online or offline) (19).
Realistic, controlled lab tests (including simulated patients) should be used to determine
how to implement these features. These lab tests should precede further real-world trials,
as implementation in real-world systems may restrict the freedom needed to perform
robust trials with reproducible outcomes throughout different fields of medicine. Results
from these simulations can be used to expand on current theoretical frameworks, such as
the Two Stream Model (20). Once we have established and confirmed properties of effec-
tive CDSS in various simulated clinical settings, we should proceed with real-world trials,
based on a solid theoretical framework. Researchers and vendors can use the theoretical
knowledge of “what works” to slowly transform existing systems into platforms that
allow implementation of all features that make CDSS effective. Collaboration between
international research groups and standardization of research protocols is preferable.
This can result in best practices that are more generalizable and allow for distribution of
tasks, resulting in better use of research funding.
On complex prediction models in clinical practice
Current risk stratification scores are often based on data from old trials, and individual
risk factors are subject to interpretation (21). High-quality data should never be wasted,
and ideally we would use all relevant data, old and new, global and local, to create the
best disease prediction models possible. By “local” data, we mean data that are collected
in one region or even in one hospital. These data can be used to fine-tune existing models
and can account for geographic differences and local definition of risk factors. Using these
locally optimized models can improve risk stratification accuracy (22). The increasing
interest in “personalized medicine” (inclusion of, e.g., biomarkers to determine treatment
for individual patients and adapting interventions to a complex personalized context)
increases the need for complex prediction models that require the computational power
of modern computers (23-25). These models allow for the complex non-linear relation-
ships between risk factors and (bio)markers that personalized medicine requires. Studies
have shown that stroke prediction in patients with AF can benefit from the incorporation
of biomarkers (26-28). Net benefit of anticoagulation in this patient group can be high,
but side effects of anticoagulants are severe, making this domain a good candidate for
early trials on computerized personal prediction models.
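To make the idea of fine-tuning an existing model with local data concrete, the sketch below shows one simple form of model updating: shifting the intercept of an existing risk model so that its mean predicted risk matches the locally observed event rate. This is an illustrative assumption; the thesis does not state that this particular updating method was used, and the function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def intercept_shift(predicted_risks, outcomes, tol=1e-9):
    """Find the log-odds shift that makes the mean shifted prediction equal
    the locally observed event rate (found by bisection)."""
    target = sum(outcomes) / len(outcomes)
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean_pred = sum(expit(logit(p) + mid) for p in predicted_risks) / len(predicted_risks)
        # The mean prediction increases monotonically with the shift.
        if mean_pred < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def recalibrate(predicted_risks, delta):
    """Apply the intercept shift to each prediction of the original model."""
    return [expit(logit(p) + delta) for p in predicted_risks]


if __name__ == "__main__":
    original = [0.1, 0.2, 0.3, 0.4]   # risks from an existing ("global") model
    local_outcomes = [0, 1, 1, 1]     # events observed in the local population
    delta = intercept_shift(original, local_outcomes)
    print(recalibrate(original, delta))
```

An intercept shift only corrects the overall risk level; it leaves the ranking of patients unchanged. Accounting for local definitions of risk factors, as discussed above, would require re-estimating coefficients or refitting the model on local data.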
Decision support systems can and should be used to implement personalized medicine
through these prediction models, as traditional methods of decision support, such as
paper guidelines, will not be able to effectively guide patient-specific medical decision
making. Decision support will become more effective using personalized predictions and
simultaneously involve patients in the decision-making process.
Future perspectives
Healthcare complexity and costs will continue to increase: Personalized medicine will
make caring for individual patients more complex, aging populations will require more
long-term care, expensive new treatments will continue to emerge, and administrative
requirements will likely increase. Computerized, locally optimized guidelines will likely
play an important role in dealing with these changes. Figure 1 provides an overview
of the concepts we mention hereafter; its sections are referenced by numbers in square
brackets. Standardized guideline development and increased scientific knowledge may
result in higher-quality guidelines, and medical societies [1] may start publishing their
own “e-guidelines” [4], consisting of decision rules using crisp (unambiguous) definitions
of medical terms (29, 30). Standardized EHR data may allow for continuous [8], real-time
feedback loops with primary outcome data that can be used to identify gaps in (e-)guidelines
and improve prediction of disease and therapy effects (Figure 1). Increased sharing
of higher-quality research data [2] may enable learning algorithms for early disease
prediction [3] that can be implemented uniformly in EHRs (31). Early patient involvement
will allow decision support systems [5] to provide recommendations that are deemed
relevant for individual patients. The use of computerized complex prediction models
will allow for visualization of patient-specific net benefit (and risks) over time, enabling
informed shared decision making [6] and patient-tailored treatment [7]. This may result
in more patient-centered care, and (interested) patients should be able to prioritize which
gaps in their care plan require the most attention (32). Tight integration with EHR systems
may reduce administrative load and allow easier execution of recommendations (33, 34).
Natural language processing may allow for more free-flowing documentation of care by
healthcare providers, as opposed to the structured documentation they are currently
required to perform (35, 36).
Figure 1. Future perspectives
Combining these improvements may finally result in fulfilling the long-standing expecta-
tions for healthcare IT: supporting and improving care.
Closing thoughts
The future as described above is certainly attainable, but requires determined action
from patients, researchers, healthcare providers, vendors, healthcare organizations, and
governments. Until we can accurately process free text notes, healthcare providers and
vendors will have to invest in improving the quality of EHR data. Currently, EHR data is
often unstructured and not usable for high-quality decision support (37, 38). Govern-
ments should require that health data is standardized and accessible. Healthcare provid-
ers need to accept that structured documentation of care is required to move forward.
EHR vendors should invest in standardizing data that are being collected in their systems
using detailed clinical models (39, 40).
In the Netherlands, only a small number of EHR vendors are active in the market, due to
risk avoidance and procurement influenced by peers. This potentially limits vendors’
willingness to make the lasting changes that are needed for effective decision support.
The future of healthcare depends on being able to support healthcare providers in daily
practice, and computers offer one of the most promising avenues in this regard. Only when
vendors truly cooperate with researchers will we be able to implement effective decision
support across healthcare.
Conclusion
Are guidelines and computers the future of medical decision making? This thesis has
shown there is a long road ahead of us before these tools might replace expert opinion
and experience. But, seeing the growing (scientific) interest, need, and collaboration on
these topics, we can expect them to help us improve care and reduce costs in years to
come, by helping us practice evidence-based medicine.
References
1. Roland M, Torgerson DJ. Understanding controlled trials: What are pragmatic trials? BMJ.
1998;316(7127):285.
2. Roshanov PS, Fernandes N, Wilczynski JM, Hemens BJ, You JJ, Handler SM, et al. Features
of effective computerised clinical decision support systems: meta-regression of 162
randomised trials. BMJ. 2013;346:f657.
3. Ash JS, Sittig DF, Campbell EM, Guappone KP, Dykstra RH. Some unintended consequences of
clinical decision support systems. AMIA Annu Symp Proc. 2007:26‑30.
4. Arts DL, Visscher S, Opstelten W, Korevaar JC, Abu-Hanna A, van Weert HCPM. Frequency
and Risk Factors for Under- and Over-Treatment in Stroke Prevention for Patients with
Non-Valvular Atrial Fibrillation in General Practice. PLoS One. 2013;8(7).
5. Arts DL, Voncken AG, Medlock S, Abu-Hanna A, van Weert HC. Reasons for intentional
guideline non-adherence: A systematic review. Int J Med Inform. 2016;89:55‑62.
6. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence based medicine:
what it is and what it isn’t. BMJ. 1996;312(7023):71‑2.
7. Wyatt KD, Stuart LM, Brito JP, Carranza Leon B, Domecq JP, Prutsky GJ, et al. Out of Context:
Clinical Practice Guidelines and Patients With Multiple Chronic Conditions: A Systematic
Review. Med Care. 2014;52:S92-S100.
8. Mutasingwa DR, Ge H, Upshur REG. How applicable are clinical practice guidelines to elderly
patients with comorbidities? Can Fam Physician. 2011;57(7):E253-E62.
9. Feuerstein JD, Akbari M, Gifford AE, Cullen G, Leffler DA, Sheth SG, et al. Systematic review:
the quality of the scientific evidence and conflicts of interest in international inflammatory
bowel disease practice guidelines. Aliment Pharmacol Ther. 2013;37(10):937‑46.
10. Plsek PE, Greenhalgh T. The challenge of complexity in health care. BMJ.
2001;323(7313):625‑8.
11. Grimshaw JM, Russell IT. Effect of clinical guidelines on medical practice: a systematic
review of rigorous evaluations. The Lancet. 1993;342(8883):1317‑22.
12. Lugtenberg M, Burgers JS, Westert GP. Effects of evidence-based clinical practice guidelines
on quality of care: a systematic review. Qual Saf Health Care. 2009;18(5):385‑92.
13. Moja L, Kwag KH, Lytras T, Bertizzolo L, Brandt L, Pecoraro V, et al. Effectiveness of
Computerized Decision Support Systems Linked to Electronic Health Records: A Systematic
Review and Meta-Analysis. Am J Public Health. 2014;104(12):E12-E22.
14. Roshanov PS, Misra S, Gerstein HC, Garg AX, Sebaldt RJ, Mackay JA, et al. Computerized
clinical decision support systems for chronic disease management: A decision-maker-researcher
partnership systematic review. Implement Sci. 2011;6(1):92.
15. Lugtenberg M, Weenink JW, van der Weijden T, Westert GP, Kool RB. Implementation of
multiple-domain covering computerized decision support systems in primary care: a focus
group study on perceived barriers. BMC Med Inform Decis Mak. 2015;15:82.
16. Cook DA, Enders F, Caraballo PJ, Nishimura RA, Lloyd FJ. An automated clinical alert system for
newly-diagnosed atrial fibrillation. PLoS One. 2015;10(4):e0122153.
17. Sittig DF, Wright A, Osheroff JA, Middleton B, Teich JM, Ash JS, et al. Grand challenges in
clinical decision support. Journal of Biomedical Informatics. 2008;41(2):387‑92.
18. Kesselheim AS, Cresswell K, Phansalkar S, Bates DW, Sheikh A. Clinical decision support
systems could be modified to reduce ‘alert fatigue’ while still minimizing the risk of litigation.
Health Aff (Millwood). 2011;30(12):2310‑7.
19. Wald JS, Businger A, Gandhi TK, Grant RW, Poon EG, Schnipper JL, et al. Implementing
practice-linked pre-visit electronic journals in primary care: patient and physician use and
satisfaction. J Am Med Inform Assoc. 2010;17(5):502‑6.
20. Medlock S, Wyatt JC, Patel VL, Shortliffe EH, Abu-Hanna A. Modeling information flows in
clinical decision support: key insights for enhancing system effectiveness. J Am Med Inform
Assoc. 2016.
21. Gage BF, Waterman AD, Shannon W, Boechler M, Rich MW, Radford MJ. Validation of clinical
classification schemes for predicting stroke: Results from the national registry of atrial
fibrillation. JAMA. 2001;285(22):2864‑70.
22. Yu S, Farooq F, van Esbroeck A, Fung G, Anand V, Krishnapuram B. Predicting readmission risk
with institution-specific prediction models. Artif Intell Med. 2015;65(2):89‑96.
23. Zethelius B, Berglund L, Sundström J, Ingelsson E, Basu S, Larsson A, et al. Use of Multiple
Biomarkers to Improve the Prediction of Death from Cardiovascular Causes. N Engl J Med.
2008;358(20):2107‑16.
24. Schilsky RL. Personalized medicine in oncology: the future is now. Nat Rev Drug Discov.
2010;9(5):363‑6.
25. Ginsburg GS, McCarthy JJ. Personalized medicine: revolutionizing drug discovery and patient
care. Trends Biotechnol. 2001;19(12):491‑6.
26. Hijazi Z, Oldgren J, Siegbahn A, Granger CB, Wallentin L. Biomarkers in atrial fibrillation: a
clinical review. Eur Heart J. 2013;34(20):1475‑80.
27. Hijazi Z, Lindbäck J, Alexander J, Hanna M, Hylek E, Lopes R, et al. A New Biomarker Based
Risk Score For Predicting Stroke Or Systemic Embolism In Atrial Fibrillation: The Best Risk
Score. J Am Coll Cardiol. 2014;63(12_S).
28. Vílchez JA, Roldán V, Hernández-Romero D, Valdés M, Lip GYH, Marín F. Biomarkers in atrial
fibrillation: an overview. Int J Clin Pract. 2014;68(4):434‑43.
29. Peleg M. Computer-interpretable clinical guidelines: A methodological review. Journal of
Biomedical Informatics. 2013;46(4):744‑63.
30. World Health Organization. WHO handbook for guideline development: World Health
Organization; 2014.
31. Etzioni R, Urban N, Ramsey S, McIntosh M, Schwartz S, Reid B, et al. The case for early
detection. Nat Rev Cancer. 2003;3(4):243‑52.
32. Charles C, Gafni A, Whelan T. Shared decision-making in the medical encounter: What does it
mean? (or it takes at least two to tango). Soc Sci Med. 1997;44(5):681‑92.
33. Lenz R, Reichert M. IT support for healthcare processes – premises, challenges, perspectives.
Data & Knowledge Engineering. 2007;61(1):39‑58.
34. Roshanov PS, Misra S, Gerstein HC, Garg AX, Sebaldt RJ, Mackay JA, et al. Computerized
clinical decision support systems for chronic disease management: a decision-maker-researcher
partnership systematic review. Implement Sci. 2011;6:92.
35. Meystre SM, Haug PJ. Comparing Natural Language Processing Tools to Extract Medical
Problems from Narrative Text. AMIA Annual Symposium Proceedings. 2005;2005:525‑9.
36. Ford E, Carroll JA, Smith HE, Scott D, Cassell JA. Extracting information from the text of
electronic medical records to improve case detection: a systematic review. J Am Med Inform
Assoc. 2016.
37. Chan KS, Fowles JB, Weiner JP. Review: Electronic Health Records and the Reliability and
Validity of Quality Measures: A Review of the Literature. Med Care Res Rev. 2010;67(5):503‑27.
38. Parsons A, McCullough C, Wang J, Shih S. Validity of electronic health record-derived quality
measurement for performance monitoring. J Am Med Inform Assoc. 2012;19(4):604‑9.
39. Coyle JF, Mori AR, Huff SM. Standards for detailed clinical models as the basis for medical
data exchange and decision support. Int J Med Inform. 2003;69(2–3):157‑74.
40. Goossen W, Goossen-Baremans A, van der Zel M. Detailed Clinical Models: A Review. Healthc
Inform Res. 2010;16(4):201‑14.
Chapter 10
English summary
Improving medical decision making
In this thesis we have investigated guideline non-adherence, the implementation of a
decision support system to improve adherence, and its evaluation. Furthermore, we
investigated usage of complex prediction models to improve stroke prediction in patients
with atrial fibrillation. Below a brief summary of each chapter is provided.
Chapter 1
In Chapter 1 we introduced the concepts of evidence based medicine (EBM), clinical
practice guidelines (CPG), clinical decision support systems and complex prediction
models. EBM is the current gold standard for clinical practice, as ideally diagnostic and
treatment strategies should be based on a synthesis of empirical studies from groups
of patients, as opposed to one physician’s opinion. It implies using the best available
evidence, in the form of CPGs, to make decisions in relation to patients’ circumstances
and preferences. However, guideline adherence is low throughout medicine: making
good use of guidelines requires high-quality evidence, a willing doctor and patient, the
right tools, circumstances, and time.
Can we improve guideline adherence by supporting physicians by means of clinical deci-
sion support systems? We attempted to answer this question in the field of atrial fibril-
lation (AF), specifically stroke prevention in AF. Guideline non-adherence is prevalent in
this field and the potential for improving patient outcomes is large. Atrial fibrillation is
the most common form of cardiac arrhythmia (abnormal heart rhythm). Its prevalence
is age-dependent and has increased over the last 30 years due to aging populations
and improved diagnostics. It now ranges from 0.7% (age 55-59) to 17.8% (age > 85).
Patients with AF are at high risk for stroke, with up to a five-fold risk compared to patients
without AF. Oral anticoagulation (OAC), such as warfarin and the new oral anticoagulants
(NOACs), can reduce risk of stroke by 60%. Therefore, most patients with AF need to be
treated with antithrombotic medication to reduce their risk of stroke. However, adher-
ence to stroke prevention guidelines in AF is poor. This non-adherence is mainly related
to underuse of OAC in patients with medium to high stroke risk.
Chapter 2
To determine if guideline non-adherence in AF stroke prevention was an issue in the
Netherlands, we investigated general practitioners’ adherence to the Dutch Guideline
for Atrial Fibrillation in Chapter 2. Our findings were in line with other national and inter-
national studies: adherence was low (43%) and some patient groups were overtreated
(31%) and others undertreated (26%). We suspected this low guideline adherence might
have been partially due to a lack of knowledge about the guideline. This provides an
opportunity to employ clinical decision support systems (CDSS) to increase adherence,
as these systems can remind and inform the GP about the guideline through notifications
in their electronic health records system.
Chapter 3
To gain insight into possible guideline-related barriers to adherence, we appraised the
Dutch guideline for atrial fibrillation in Chapter 3. We found that guideline adherence
might improve by involving more stakeholders (e.g. patients and physicians), standard-
izing the layout to allow for easier interpretation and clarifying ambiguous terms that
hamper the operationalization in electronic systems. The latter will allow for easier
implementation of the guidelines in computerized systems such as CDSS. These systems,
in turn, can support physicians with implementing the guideline.
Chapter 4
We studied reasons why physicians intentionally deviate from guidelines in Chapter 4.
We found that, depending on the guideline, there are many valid reasons, as adjudicated
by peers, for guideline non-adherence. Reasons often were related to contra-indications
(that were not mentioned explicitly in the guideline) or patient preference. The latter
seems to agree with our findings in Chapter 3, where more stakeholder involvement
was recommended. Involving more physicians would allow for more extensive
documentation of contra-indications, whilst involving patients might improve patient
adherence with the guideline.
Chapter 5
In Chapter 5 we described the protocol for our randomized controlled trial for the evalua-
tion of a clinical decision support system in general practice: Expert-AF. It was specifically
designed to increase adherence to the atrial fibrillation guideline of the Dutch national
College of General Practitioners and support GPs in providing antithrombotic treatment
to their patients. We created a plugin for the GP Electronic Health Records system (EHR)
together with two vendors. This plugin was displayed on the GP’s screen and could be
expanded to show recommendations for the current patient, i.e. the patient file that was
open on the screen (Figure 1). Clicking on a recommendation opened a larger window
with additional details (Figure 2). We expected to see an absolute increase in guideline
adherence of 10 to 20% in the intervention groups. The study employed a cluster-randomized
design with three groups: one control group and two intervention groups. Both
intervention groups received the same recommendations, but they differed in that GPs in
one intervention group were required to enter a reason when deviating from a
recommendation, whereas GPs in the other intervention group were not asked for a reason.
Atrial fibrillation Clinical Rules
Stroke prevention in AF
Yearly medication revision
Risk of falling
Figure 1. The notification window in its expanded state, showing three notification items.
Stroke prevention in patients with atrial fibrillation
Recommendation
(517) Start oral anticoagulation.
Background
The patient is not receiving antithrombotic treatment for the prevention of stroke. The patient has a CHA2DS2-VAsc score
of more than 1: Antithrombotic treatment is recommended.
Reasons for scoring more than 1 point on CHA2DS2-VAsc:
Patient characteristics:
Age: 95 years old
Gender: Male
Risk factors:
Hypertension
Response to recommendation
I will follow the recommendation
I will ignore the recommendation
Source
This recommendation is based on the Dutch General Practitioners guideline for Atrial Fibrillation.
Figure 2. The window after clicking on a notification item, containing background information, an
actionable recommendation and response buttons that allow the GP to (1) accept or (2) decline
the advice, or (3) close the window (no action).
Chapter 6
The results of the Expert-AF project are described in Chapter 6. The trial ran for 240
days in general practices in Amsterdam South-East. Usage of the system varied greatly
between GPs. Some opened almost all of the recommendations, while others did not
open more than 3 throughout the trial. Overall, usage was low and declined over time.
We were unable to show a significant difference in the increase of guideline adherence
between groups: both groups improved by more than 5% compared to baseline adher-
ence. We suspect the simultaneous introduction of a new guideline for AF might have
played a part here, as well as the low usage of the system.
Chapter 7
In Chapter 7 we employed a mixed-methods evaluation of the reasons for low usage
of the Expert-AF system described in the previous chapter. A survey and a focus group
revealed the main barriers to usage of our system. These were mainly related to lack
of time to follow recommendations, too many alert notifications and limitations of the
system’s functionality. Participants in our study acknowledged the potential that CDSSs
hold for the future of healthcare, but implementing these systems in daily practice for
multiple domains remains challenging. Alert prioritization, user customization, tight EHR
integration and strict selection of alerts might improve CDSS effectiveness.
Chapter 8
In Chapter 8, we attempted to improve stroke prediction for patients with atrial fibrilla-
tion using logistic regression and neural networks. Currently, the CHA2DS2-VASc is used
for stroke prediction in most international guidelines. However, its performance could be
improved to allow for more tailored stroke treatment of patients with AF. The prediction
models we developed demonstrated we could reduce overtreatment in low risk patients,
thus sparing patients from adverse events resulting from antithrombotic medication
use. We also concluded that due to the use of ambiguous terms in prediction scores,
the future of disease prediction in healthcare may lie in creating locally optimized and
validated models using local EHR data. It might be more effective to recreate and validate
a prediction model using the local definition of a risk factor, using EHR data, than to
map it to a generic definition.
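For readers unfamiliar with the score referred to above, the following sketch computes CHA2DS2-VASc from its standard published point weights. The function name and the boolean inputs are assumptions made for illustration; how each risk factor (for example "hypertension") is derived from EHR data is exactly the ambiguity discussed in this chapter and is not resolved here.

```python
def cha2ds2_vasc(age, female, heart_failure=False, hypertension=False,
                 diabetes=False, stroke_tia_te=False, vascular_disease=False):
    """Return the CHA2DS2-VASc score (0 to 9) for one patient.

    Inputs are assumed to be already-adjudicated risk factors; mapping local
    EHR codes onto them is outside the scope of this sketch.
    """
    score = 0
    score += 1 if heart_failure else 0      # C: congestive heart failure
    score += 1 if hypertension else 0       # H: hypertension
    if age >= 75:                           # A2: age 75 or older
        score += 2
    elif age >= 65:                         # A: age 65 to 74
        score += 1
    score += 1 if diabetes else 0           # D: diabetes mellitus
    score += 2 if stroke_tia_te else 0      # S2: prior stroke, TIA or thromboembolism
    score += 1 if vascular_disease else 0   # V: vascular disease
    score += 1 if female else 0             # Sc: sex category (female)
    return score


if __name__ == "__main__":
    # The patient shown in Figure 2: 95-year-old male with hypertension.
    print(cha2ds2_vasc(age=95, female=False, hypertension=True))  # prints 3
```

The example patient scores 2 points for age and 1 for hypertension, which is consistent with the "score of more than 1" stated in the recommendation window of Figure 2.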
Chapter 9
In Chapter 9 we reviewed the Expert-AF trial and discussed implications of this thesis for
current and future clinical practice and research. A strength of the Expert-AF project was
its pragmatic nature: a multi-domain clinical decision support system implemented in
the busy daily practice of the GP. However, it was this pragmatic nature that required us
to work with the vendors of the GP Electronic Health Record (EHR) systems. This proved
to be a hurdle, as we weren’t able to implement every feature that we required in a
way that we wished. Furthermore, discontinued support for the CDSS plugin meant that
we could not include more GPs in the trial, limiting representativeness of the sample.
Despite these limitations, this trial provided us with valuable insights that can guide
future designers of clinical decision support systems.
Furthermore, we concluded that perfect guideline adherence is currently not attainable
and not desirable due to gaps in guidelines and the many valid reasons physicians
have for not adhering to these guidelines completely. Decision support systems can
help reduce non-adherence that results from factors other than those mentioned, by
ensuring physicians are always aware when a guideline is applicable and supporting its
implementation. To fulfil this promise, we need more knowledge on “what works” before
we continue with more real-world trials. We suggest several features that might improve
effectiveness of CDSS, but realistic, controlled lab tests (including simulated patients)
should be used to determine how to implement these features. This will allow for reproducible
results throughout different fields of medicine, as opposed to the heterogeneous
nature of current real-world trials on decision support. Researchers and vendors can use
the theoretical knowledge of “what works” to slowly transform existing systems into
platforms that allow implementation of all features that make CDSS effective.
Next, we discussed complex locally optimized prediction models and how these can
improve risk stratification accuracy and disease prediction. The increasing interest in
“personalized medicine” (inclusion of, e.g., biomarkers to determine treatment for indi-
vidual patients and adapting interventions to a complex personalized context) increases
the need for complex prediction models that require the computational power of modern
computers. We conclude that decision support systems can and should be used to imple-
ment personalized medicine through these prediction models, as traditional methods of
decision support, such as paper guidelines, will not be able to effectively guide patient-
specific medical decision making.
Future perspectives
Healthcare complexity and costs will continue to increase. Proper use of technology is
one of the few possible solutions for these growing issues. We discussed the possibilities
of properly applied technology in the final section of chapter 9. Briefly, standardized
healthcare data could be used to continuously improve e-guidelines, which can be directly
integrated into decision support systems. These data can also be used to develop early
disease prediction algorithms. Complex computerized prediction models can integrate
patient preference with current evidence and allow for informed shared decision making
using visualization of personalized risk over time. These improvements may finally result
in fulfilling the long-standing expectations for healthcare IT: supporting and improving
care.
To reach this goal, all stakeholders should invest in standardizing healthcare data and
systems. Researchers and vendors need to collaborate closely to create systems that are
effective at supporting medical decision making in modern healthcare.
Summary in Dutch (Nederlandse samenvatting)
Improving medical decision making
In this thesis we studied guideline adherence (following the guideline), a clinical
decision support system intended to improve guideline adherence, and the evaluation of
this system. In addition, we investigated whether strokes could be predicted more
accurately with a complex prediction model in patients with atrial fibrillation. Below
we briefly summarize each chapter.
Chapter 1
Chapter 1 introduces the concepts of “evidence based medicine” (EBM), clinical
guidelines, clinical decision support systems and complex prediction models. EBM is the
gold standard for contemporary care, because diagnostics and treatment should be based
on the combined research results of many patients rather than on the opinion of a single
clinician. EBM is about using the most robust scientific results, in the form of clinical
guidelines, to make decisions that fit the circumstances and preferences of the patient.
Nevertheless, guidelines are followed only sparingly in medicine: using guidelines well
requires high-quality scientific evidence, a willing physician and patient, and the right
resources, circumstances and time.
Can guideline adherence be improved by supporting care providers with decision support
systems? We tried to answer this question in the domain of atrial fibrillation, more
precisely in the prevention of stroke in patients with atrial fibrillation. Guidelines are
followed only to a limited extent in this domain, so the potential to improve patient
health is high. Atrial fibrillation is the most common cardiac arrhythmia (abnormal heart
rhythm). Its prevalence (the number of people with a disease in the total population)
depends on age and has risen over the past 30 years because of increasing life expectancy
and improved diagnostics. It now occurs in 0.7% of people aged 55-59 and in 17.8% of
those aged 85 and over. Patients with atrial fibrillation have a higher risk of stroke, up
to 5 times as high. Blood thinners such as vitamin K antagonists and the novel oral
anticoagulants (NOACs) can reduce this risk by 60%. Most patients with atrial fibrillation
should therefore be treated with blood thinners to limit their risk of stroke. Yet
international studies show that the guidelines containing this advice are followed only to
a limited extent. This is mainly because blood thinners are prescribed too rarely to
patients with a moderate to high risk of stroke.
Chapter 2
To determine whether guidelines for patients with atrial fibrillation are also poorly
followed in the Netherlands, in chapter 2 we examined to what extent Dutch general
practitioners follow the atrial fibrillation guideline of the Dutch College of General
Practitioners (Nederlands Huisartsen Genootschap, NHG). Our findings were in line with
other national and international studies: adherence was low (43%) and patients were both
over-treated and under-treated. Our hypothesis was that this was mainly caused by a lack
of knowledge of the guideline. This presents an excellent opportunity to use a clinical
decision support system to increase guideline adherence. Such systems can inform the
general practitioner about the guideline through notifications in their electronic patient
record.
Chapter 3
To gain insight into the guideline-related reasons physicians have for not following the
NHG atrial fibrillation guideline, we systematically appraised the guideline in chapter 3.
We concluded that the guideline could be improved by involving more stakeholders, such as
patients and physicians, by making its presentation clearer and by better defining
ambiguous terms. The last improvement also makes it easier to translate a guideline into
a computer system. Such systems can in turn help the physician follow and carry out the
guideline.
Chapter 4
Chapter 4 examined the reasons physicians have for deliberately deviating from
guidelines. We found that, depending on the guideline, there are many valid reasons not to
follow the guideline when these reasons are assessed by peers. Reasons often related to
contraindications (that were not explicitly mentioned in the guideline) or patient
preferences. The latter appears to agree with the findings in chapter 3, where greater
stakeholder involvement was recommended. Involving physicians could help to record
contraindications for a given treatment correctly, and involving patients would increase
the likelihood that patients agree with the proposed treatment.
Chapter 5
Chapter 5 describes the protocol for our randomized controlled trial (RCT), in which a
decision support system, Expert-AF, was evaluated among Dutch general practitioners. This
system was specifically designed to increase adherence to the Dutch atrial fibrillation
guideline. It supported general practitioners in correctly prescribing blood thinners for
patients with atrial fibrillation. We developed a plug-in for the general practitioners’
electronic patient record together with 2