Published by UCSAlibrary, 2015-06-29 22:49:12

Clinical Methodology (1)

ARTICLE 1:

A Patient-Centered Methodology
That Improves the Accuracy of
Prognostic Predictions in Cancer


Mohammed Kashani-Sabet¹*, Richard W. Sagebiel¹, Heikki Joensuu², James R. Miller III¹

¹ Center for Melanoma Research and Treatment, California Pacific Medical Center and Research Institute, San Francisco, California, United States of America
² Molecular Cancer Biology Program, University of Helsinki, Helsinki, Finland

Abstract

Individualized approaches to prognosis are crucial to effective management of cancer patients. We developed a
methodology to assign individualized 5-year disease-specific death probabilities to 1,222 patients with melanoma and to
1,225 patients with breast cancer. For each cancer, three risk subgroups were identified by stratifying patients according to
initial stage, and prediction probabilities were generated based on the factors most closely related to 5-year disease-specific
death. Separate subgroup probabilities were merged to form a single composite index, and its predictive efficacy was
assessed by several measures, including the area (AUC) under its receiver operating characteristic (ROC) curve. In melanoma, the
patient-centered methodology achieved an AUC of 0.867 in the prediction of 5-year disease-specific death, compared with 0.787
using the AJCC staging classification alone. When applied to breast cancer patients, it achieved an AUC of 0.907, compared
with 0.802 using the AJCC staging classification alone. A prognostic algorithm produced from a randomly selected training
subsample of 800 melanoma patients preserved 92.5% of its prognostic efficacy (as measured by AUC) when the same
algorithm was applied to a validation subsample containing the remaining patients. Finally, the tailored prognostic
approach enhanced the identification of high-risk candidates for adjuvant therapy in melanoma. These results describe a
novel patient-centered prognostic methodology with improved predictive efficacy when compared with AJCC stage alone
in two distinct malignancies drawn from two separate populations.

Citation: Kashani-Sabet M, Sagebiel RW, Joensuu H, Miller JR III (2013) A Patient-Centered Methodology That Improves the Accuracy of Prognostic Predictions in
Cancer. PLoS ONE 8(2): e56435. doi:10.1371/journal.pone.0056435

Editor: Soheil S. Dadras, University of Connecticut Health Center, United States of America

Received August 16, 2012; Accepted January 10, 2013; Published February 27, 2013

Copyright: © 2013 Kashani-Sabet et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by United States Public Health Service Grants CA114337 and CA122947 (to M.K.S.). The funders had no role in study design,
data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: Mohammed Kashani-Sabet owns stock in Melanoma Diagnostics, Inc., and James R. Miller III has ownership interest in MDMS, LLC. There
are no further patents, products in development or marketed products to declare. This does not alter the authors’ adherence to all of the PLOS ONE policies on
sharing data and materials, as detailed online in the guide for authors.

* E-mail: kashani@cpmcri.org

Introduction

The art of prognosis has a long history, as physicians have attempted to understand the clinical behavior of disease. Ancient Egyptians estimated patient survival in order to arrive at an initial conclusion of either “a patient I will treat” or “a patient not to be treated” (the former with a chance of cure and the latter thought to be incurable). More recently, prognostic models have been developed using computerized analyses of large databases of patients with commonly recorded factors in order to predict outcome. In such factor-centered analyses, results are usually stated in terms of relative risks, odds ratios and P-values associated with each factor. In the realm of cancer, staging classifications are developed from these prognostic analyses and constitute the primary means of predicting patient outcomes and of making treatment decisions. However, they are not routinely the products of patient-centered analyses. Assigning a 5-year survival probability to a group of patients in a particular stage of a given malignancy is not the same as providing a separately tailored prognostic probability for each individual patient.

Patient-centered analyses take a different approach. Prognostic conclusions are stated in terms of an individual patient’s probability of experiencing some salient event, such as recurrence or death, and/or the time required to experience it. Prognostic factors do help to determine these probabilities and elapsed times, but the factors themselves are not the primary focus of the analyses. Patient-centered success measures must reflect the accuracy of individual probabilistic predictions rather than the relative potency of the prognostic factors. In addition, patient-centered prognoses must identify and exploit the most relevant factors that can drive clinical decisions for an individual patient. The risk of progression or death may best be predicted by addressing factors beyond those incorporated into the staging classification and by analyzing available prognostic factors in novel ways. In this manuscript, we developed a patient-centered prognostic methodology and applied it to established databases of melanoma and breast cancer patients to determine its predictive accuracy compared with prediction based strictly on initial stage.

Materials and Methods

Ethics Statement
This prognostic analysis was approved by the institutional review boards at the University of California, San Francisco, and at the California Pacific Medical Center. The analysis was based on a chart review of the majority of patients entered into the datasets. Consequently, it was deemed minimal risk by these review committees, and informed consent was not required.

PLOS ONE | www.plosone.org 1 February 2013 | Volume 8 | Issue 2 | e56435

Tailored Prognostic Methodology

Written informed consent was obtained from the patients whose tissues were tested as part of the analysis. These procedures were approved by the aforementioned institutional review boards.

Study Populations
We accumulated a cohort of 1,222 United States patients diagnosed with primary cutaneous melanoma between 1971 and 2006, whose demographic composition appears in Table S1. The mean and median follow-up times were 7.93 years and 7.44 years, respectively.

In addition, we had access to a previously described [1] dataset of 1,225 breast cancer patients from Turku, Finland, with a mean and median follow-up of 9.97 and 8.5 years, respectively. The demographic composition of the breast cancer cohort appears in Table S2.

Analysis of Prognostic Factors
Melanoma. Fifteen prognostic factors were recorded at the time of diagnosis of primary cutaneous melanoma and distributed into two prognostic factor groups. The first factor group comprised six factors: three histological factors incorporated into the current AJCC staging classification (i.e., tumor thickness, ulceration, and mitotic rate) [2], and three clinical factors included in analyses of the AJCC melanoma staging committee (i.e., age, gender, and tumor site) [3,4]. The following nine histological factors were included in a second factor group: histological subtype, Clark level, presence or absence of microsatellites, vascular involvement, regression, degree of tumor vascularity, level of tumor-infiltrating lymphocytes, number of positive lymph nodes, and the within-subgroup initial AJCC stage. The potential prognostic significance of these factors was previously reviewed [5]. The manner in which these additional prognostic factors were defined, measured, and coded was described previously [6,7].

The prognostic impact of nine molecular factors (NCOA3, SPP1, RGS1, WNT2, FN1, ARPC2, PHIP, POU5, and the p65 subunit of NF-κB), constituting a third factor group, was examined in tissues from 375 of the 1,222 melanoma patients using immunohistochemical analysis. The individual role of several of these markers in melanoma progression, including the methods used for immunohistochemical staining and scoring, was previously described [8–11]. The prognostic significance of several of these molecular factors has been validated in other tissue sets or by other investigators [10,12–14].

Breast Cancer. We performed a similar analysis in our cohort of 1,225 breast cancer patients. The available prognostic factors were divided into three groups. The first factor group included patient age, anatomical location of the primary tumor within the breast, size of the primary tumor along its longest dimension (in millimeters), mitotic count, and ulceration of the primary tumor. The second factor group consisted of the following twelve factors: primary tumor type (ductal or lobular), tumor grade, necrosis, tubule formation, nuclear pleomorphism, inflammation, estrogen receptor level (fmol/mg), progesterone receptor level (fmol/mg), bilaterality, T scale value, N scale value, and M scale value. The third factor group consisted of the following two factors: radiation therapy (yes or no) and type of adjuvant therapy, if any.

Statistical Analysis
To develop a patient-centered prognostic algorithm for disease-specific death within 5 years of diagnosis, both the 1,222 melanoma and 1,225 breast cancer patients were first stratified into three risk-defined subgroups, based on AJCC stage at diagnosis, if available, or T, N, and/or M stage. In the melanoma cohort, this resulted in a low-risk subgroup containing 503 patients, an intermediate-risk subgroup containing 423 patients, and a high-risk subgroup containing 296 patients. In the breast cancer cohort, the low-risk subgroup encompassed 552 patients, the intermediate-risk subgroup comprised 387 patients, and the high-risk subgroup included 286 patients. Stratifying both samples into these three subgroups served to maintain sufficient subgroup sizes to support stable statistical estimates, while preserving the rank order of 5-year survival rates by stage inherent in each cohort.

Then, each prognostic factor was transformed, separately within each risk subgroup, via the Scale Partitioning and Spacing Algorithm (SPSA) into a corresponding Univariate Impact Reflecting Index (UIRI), as described in Methods S1.

For each of the nine combinations of prognostic factor group and patient risk subgroup, an individualized prognostic algorithm was developed (described in Methods S1). The algorithm was based on a logistic regression analysis whose dependent variable was experience or non-experience of disease-specific death within five years of diagnosis and whose independent variables were the UIRI values calculated for the risk factors and patient subgroup constituting that combination. A composite prognostic algorithm was then constructed by merging the logistic regression outputs of the three patient risk subgroups, when all risk factors (i.e., their UIRI values) were used as independent variables of the regression.

The prognostic efficacy of the composite algorithm was assessed using three measures: the AUC generated by a receiver operating characteristic (ROC) analysis; its mean individual probabilistic prediction error; and its minimally achievable misclassification rate (the latter two are defined in Methods S1). All reported P values are two-sided.

Results

To develop a patient-centered approach, we analyzed a cohort of 1,222 patients with primary cutaneous melanoma (Table S1) and a separate cohort of 1,225 patients with breast cancer (Table S2).

A Tailored Prognostic Model for Melanoma
Initially, we stratified our melanoma cohort, based primarily on initial stage, into three patient subgroups. The low-risk subgroup had a 94.6% 5-year disease-specific survival (DSS), the intermediate-risk subgroup had a 75.4% 5-year DSS, and the high-risk subgroup had a 49.3% 5-year DSS. The three subgroups had significantly different survival characteristics, when assessed by 5-yr DSS (Kruskal-Wallis test corrected for tied observations, P<0.001) and by Kaplan-Meier analysis (log-rank test, P<0.001, Fig. 1A).

For each prognostic factor group and patient subgroup, we developed a separate prognostic algorithm that best predicted 5-year disease-specific death. Separate algorithms were merged into a single, composite algorithm for each risk subgroup. Each composite algorithm produced a corresponding composite prognostic index. Values of this index were individual probabilities of 5-year disease-specific death assigned by the composite prognostic algorithm to each patient. Under an ROC analysis, the composite index generated an AUC of 0.867 (Fig. 2A). It was able to correctly predict 84.0% of the 5-year disease-specific events, resulting in a misclassification rate of 16.0%.

We compared the prognostic efficacy of the composite index with several other prognostic methodologies. Initially, we assessed the six routinely available prognostic factors by estimating individual probabilities of 5-year disease-specific death from a
multiple logistic regression of these factors. This produced an AUC of 0.762 and a misclassification rate of 21.2% (Table 1).

Next, we performed a dummy-variable logistic regression using AJCC stage alone to assign 5-year disease-specific death probabilities in our melanoma sample and determined its prognostic efficacy. This analysis yielded an AUC of 0.787 (Fig. 2A and Table 1) and reduced the mean absolute probabilistic prediction error (matched-pairs T-test, P<0.001, Table 1).

Then, we included the six prognostic factors and used initial AJCC stage to stratify the 1,222 patients into the three risk subgroups. The individual probability estimates generated by the multiple logistic regression analyses for each subgroup were merged, resulting in an AUC of 0.823 and a further reduction in mean absolute probabilistic error (matched-pairs T-test, P<0.001, Table 1).

We then incorporated the eighteen additional prognostic factors and formed the composite algorithm described above to generate the final prognostic index. Enhancing the model in these ways increased the AUC to 0.867 and further reduced the mean absolute probabilistic error (matched-pairs T-test, P<0.001, Fig. 2A and Table 1).

We then constructed a separate weighted index designed to reflect the relative predictive potency of each prognostic factor in each risk subgroup (Table S3). By this measure, tumor thickness, mitotic rate, tumor vascularity, RGS1 expression level, and FN1 expression level were uniformly potent predictors, with positive weights in all three subgroups.

Figure 1. Panel A. Kaplan-Meier analysis of DSS by prognostic subgroup in the melanoma cohort. Panel B. Kaplan-Meier analysis of DSS by prognostic subgroup in the breast cancer cohort.
doi:10.1371/journal.pone.0056435.g001

A Tailored Prognostic Model for Breast Cancer
We used the identical procedure to develop personalized predictions of 5-year DSS for breast cancer patients, using data from our cohort of 1,225 patients. We stratified the overall cohort into three risk subgroups, based on the AJCC staging criteria for breast cancer. The low-risk subgroup had an 88.6% 5-year DSS, the intermediate-risk subgroup had a 60.2% 5-year DSS, and the high-risk subgroup had a 19.9% 5-year DSS. The three prognostic subgroups had significantly different survival characteristics, when assessed by 5-yr DSS (Kruskal-Wallis test corrected for tied observations, P<0.001) and by Kaplan-Meier analysis (log-rank test, P<0.001, Fig. 1B).

Application of the patient-centered approach to breast cancer patients generated an AUC of 0.907 (Fig. 2B). The final composite prognostic index developed for breast cancer was able to correctly predict 84.1% of the 5-year disease-specific deaths, resulting in a misclassification rate of 15.9%.

The initial factor-centered analysis consisted of five prognostic factors that were as comparable as possible to the factors used in
the melanoma analysis (except for gender, as all patients were
women). Combining these factors via logistic regression and
developing an individually tailored probability of 5-year disease-
specific death resulted in an AUC of 0.743 (Table 2).
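The factor-centered baseline in this comparison is an ordinary multiple logistic regression whose fitted values serve directly as individual 5-year death probabilities. The dependency-free sketch below fits such a model by batch gradient ascent on the log-likelihood. That fitting routine is an illustrative choice, not necessarily the one the authors used. The sketch assumes numeric factor values with no missing data and runs on a hypothetical one-factor toy dataset.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear(beta, row):
    """Intercept plus weighted sum of one patient's factor values."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Unstratified logistic regression fitted by batch gradient ascent.

    X: one list of numeric factor values per patient (no missing values);
    y: 1 = disease-specific death within 5 years, 0 = otherwise.
    Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, yi in zip(X, y):
            resid = yi - sigmoid(linear(beta, row))
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict(beta, X):
    """Individual probability of 5-year disease-specific death per patient."""
    return [sigmoid(linear(beta, row)) for row in X]

# Hypothetical toy data: one numeric factor for eight patients.
X = [[0.5], [1.0], [1.5], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
beta = fit_logistic(X, y)
probs = predict(beta, X)
```

With real data, an established solver such as Newton-Raphson (IRLS) on standardized factors would normally be preferred. The point here is only that every patient receives an individual probability rather than a stage-level rate.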

Next, we performed a dummy-variable logistic regression using
AJCC stage, alone, to assign 5-year disease-specific death
probabilities due to breast cancer and determined its prognostic
efficacy. This analysis yielded an AUC of 0.802 (Fig. 2B) and
reduced mean absolute probabilistic prediction error (matched-pairs T-test, P<0.001, Table 2).
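Consider a logistic regression whose only predictors are dummy variables for stage. Such a model is saturated, so at the maximum-likelihood fit every patient in a stage is assigned that stage's observed 5-year death rate. The sketch below therefore computes those rates directly rather than running an optimizer. The stage labels and outcomes are hypothetical toy values.

```python
from collections import defaultdict

def stage_only_probabilities(stages, died):
    """Fitted probabilities of a stage-only dummy-variable logistic regression.

    With an intercept plus one indicator per non-reference stage the model is
    saturated, so its maximum-likelihood fitted probability for each patient is
    the observed 5-year death rate in that patient's stage. (If a stage has 0%
    or 100% deaths, its coefficient diverges and the fitted value only
    approaches that rate.)
    """
    totals, events = defaultdict(int), defaultdict(int)
    for s, d in zip(stages, died):
        totals[s] += 1
        events[s] += d
    rate = {s: events[s] / totals[s] for s in totals}
    return [rate[s] for s in stages], rate

# Hypothetical toy data: stage label and death within 5 years (1) or not (0).
stages = ["I", "I", "I", "I", "II", "II", "II", "III", "III", "III"]
died = [0, 0, 0, 1, 0, 1, 0, 1, 1, 0]
probs, rate = stage_only_probabilities(stages, died)
```

All patients in a stage share a single probability, so the ROC curve of such an index has only as many operating points as there are stages. This is one plausible reason why stage alone yields a lower AUC than individually tailored probabilities.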

We then stratified the cohort using the three prognostic
subgroups with distinct DSS. The individual probability estimates
generated by the multiple logistic regression analyses for each
subgroup were merged, resulting in an AUC of 0.880 and a
further reduced mean absolute probabilistic error (matched-pairs T-test, P<0.001, Table 2).
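Stratifying before modeling means fitting one model per risk subgroup and then merging the per-patient outputs back into a single index. The sketch below shows that merge step. For brevity, each subgroup model uses one binary factor. A logistic regression on one binary factor is saturated, so its fitted probabilities equal the observed death rates at each factor level. The paper's subgroup models instead used the UIRI values of several factors (Methods S1). All function names and toy data here are hypothetical.

```python
def fit_binary_factor(rows, outcomes):
    """Saturated logistic model for one 0/1 factor: fitted probabilities equal
    the observed death rate at each level (falling back to the subgroup's
    overall rate for a level absent from this subgroup)."""
    overall = sum(outcomes) / len(outcomes)
    rates = {}
    for level in (0, 1):
        ys = [y for r, y in zip(rows, outcomes) if r[0] == level]
        rates[level] = sum(ys) / len(ys) if ys else overall
    return rates

def predict_binary_factor(model, rows):
    """Look up each patient's probability from the fitted level rates."""
    return [model[r[0]] for r in rows]

def stratified_index(subgroups, rows, outcomes, fit, predict):
    """Fit a separate model within each risk subgroup, then merge the
    per-patient probabilities into one index in the original patient order."""
    members = {}
    for i, g in enumerate(subgroups):
        members.setdefault(g, []).append(i)
    index = [None] * len(rows)
    for idx in members.values():
        sub_rows = [rows[i] for i in idx]
        model = fit(sub_rows, [outcomes[i] for i in idx])
        for i, p in zip(idx, predict(model, sub_rows)):
            index[i] = p
    return index

# Hypothetical toy cohort: two risk subgroups, one binary factor per patient.
subgroups = ["low"] * 6 + ["high"] * 5
rows = [[0], [0], [0], [0], [1], [1], [0], [0], [1], [1], [1]]
outcomes = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0]
index = stratified_index(subgroups, rows, outcomes,
                         fit_binary_factor, predict_binary_factor)
```

The same factor level (1) maps to 0.5 in the low-risk subgroup but to about 0.67 in the high-risk subgroup. Stratification thus lets a factor's impact differ by subgroup, which a single pooled model without interaction terms cannot do.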

Finally, we incorporated fourteen additional prognostic factors
and formed the composite algorithm previously described to
generate the final prognostic index. This procedure increased the
AUC to 0.907 and further reduced the mean absolute probabilistic
error (matched-pairs T-test, P<0.001, Fig. 2B and Table 2).
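The composite indices fold in many additional factors, and some of them are missing for some patients. In the melanoma cohort, for instance, the molecular markers were assessed in only 375 of 1,222 patients. The transformation the authors actually used (SPSA, producing UIRI values) is specified in Methods S1 and is not reproduced here. As a loosely analogous stand-in only, the sketch below shows one generic way to put a categorical factor on a common probability scale within a risk subgroup without dropping patients who have missing values. Each observed level is replaced by its observed 5-year death rate, and a missing value is mapped to the subgroup's overall rate. The function name and data are hypothetical.

```python
def impact_encode(levels, died):
    """Hypothetical stand-in, NOT the paper's SPSA/UIRI transformation.

    Replaces each observed level of one factor (within one risk subgroup)
    with the observed 5-year death rate among patients sharing that level.
    A missing value (None) maps to the subgroup's overall death rate, so the
    patient stays in the analysis with a neutral score.
    """
    overall = sum(died) / len(died)
    counts = {}
    for level, d in zip(levels, died):
        if level is None:  # missing: excluded only from the level-rate estimates
            continue
        n, e = counts.get(level, (0, 0))
        counts[level] = (n + 1, e + d)
    rate = {level: e / n for level, (n, e) in counts.items()}
    return [overall if level is None else rate[level] for level in levels]

# Hypothetical toy subgroup: tumor grade (None = not recorded) and outcome.
grade = ["1", "1", "2", "2", "3", "3", None, "3"]
died = [0, 0, 0, 1, 1, 1, 0, 0]
encoded = impact_encode(grade, died)
```

Estimating such rates from the same outcomes the model is later fitted to leaks outcome information, which is one route to the over-fitting the authors acknowledge. Common mitigations are merging sparse levels into larger partitions and estimating the rates on training data only.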

A separate weighted index similarly identified prognostic factors
that were relatively potent predictors of 5-year disease-specific
death in each risk subgroup (Table S4). Thus, mitotic rate and
tumor grade were uniformly potent predictors, with positive
weights in all of the three subgroups.
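Fig. 1 shows Kaplan-Meier (product-limit) estimates of DSS for the risk subgroups of both cohorts. The sketch below reads such an estimate off at the 5-year horizon. It treats disease-specific death as the event and every other end of follow-up as censored. The toy follow-up data are hypothetical. This section also does not state whether each quoted 5-year DSS percentage was a Kaplan-Meier estimate or a crude proportion.

```python
def km_survival_at(times, events, horizon):
    """Kaplan-Meier (product-limit) estimate of survival at `horizon`.

    times: follow-up in years; events: 1 = death from the cancer,
    0 = censored (alive at last contact, lost, or died of other causes).
    Patients censored at the same time as a death are still counted as at
    risk for that death (the usual convention).
    """
    data = sorted(zip(times, events))
    at_risk, surv, i = len(data), 1.0, 0
    while i < len(data) and data[i][0] <= horizon:
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        surv *= 1.0 - deaths / at_risk  # factor is 1.0 when only censoring
        at_risk -= leaving
    return surv

# Hypothetical toy subgroup of eight patients.
times = [1.0, 2.0, 2.5, 3.0, 4.0, 6.0, 7.0, 8.0]
events = [1, 0, 1, 0, 1, 0, 0, 1]
dss_5y = km_survival_at(times, events, 5.0)  # (7/8) * (5/6) * (3/4)
```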

A Split-Sample Validation of the Tailored Prognostic
Methodology in Melanoma

In order to ascertain the reliability of the procedure used to
construct our composite prognostic algorithm, we randomly split
our sample of melanoma patients into a training subsample of 800
and a validation subsample of the remaining patients. Patients in
the two subsamples were divided into three separate risk
subgroups, using exactly the same criteria used to stratify patients
in the total sample.

Next, we constructed a composite algorithm from the training
subsample, using the same procedure applied to the entire cohort.
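The split and the "retained efficacy" figure later reported for it (validation AUC divided by training AUC) can be sketched as follows. The seed is arbitrary and the patient IDs are hypothetical. Only the subsample sizes and the two AUC values come from the text.

```python
import random

def split_sample(patient_ids, n_train=800, seed=2013):
    """Randomly assign n_train patients to training and the rest to validation.

    The seed is an arbitrary, hypothetical choice that makes the split
    reproducible.
    """
    rng = random.Random(seed)
    chosen = set(rng.sample(patient_ids, n_train))
    train = [p for p in patient_ids if p in chosen]
    valid = [p for p in patient_ids if p not in chosen]
    return train, valid

def retained_efficacy(auc_train, auc_valid):
    """Share of the training-sample AUC preserved on the validation sample."""
    return auc_valid / auc_train

ids = list(range(1, 1223))                  # hypothetical IDs, 1,222 patients
train_ids, valid_ids = split_sample(ids)    # 800 training, 422 validation
retained = retained_efficacy(0.853, 0.789)  # AUCs reported in the Results
```

After the split, each subsample would be stratified into the same three risk subgroups, and the per-subgroup models would be fitted on the training patients only. A single random split is the simplest design. Repeated splits or cross-validation would give an estimate that depends less on one particular partition.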


Figure 2. Panel A. ROC plots of 5-year melanoma-specific death probabilities estimated by different logistic regression analyses.
Panel B. ROC plots of 5-year breast cancer-specific death probabilities estimated by different logistic regression analyses. In each panel, curve 1
represents the ROC plot using initial AJCC stage (unstratified), curve 2 the ROC plot stratified by AJCC stage, and curve 3 the ROC plot determined by
the composite weighted index.
doi:10.1371/journal.pone.0056435.g002
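The area under an ROC curve like those in Figure 2 can be computed without drawing the curve. It equals the probability that a randomly chosen patient who died of the cancer within 5 years received a higher predicted probability than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes that statistic. Alongside it, the sketch computes the lowest misclassification rate over all probability cut-offs. That is an assumed reading of the "minimally achievable misclassification rate", which the paper defines in Methods S1. The toy scores and outcomes are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC via its rank (Mann-Whitney) interpretation.

    Probability that a randomly chosen patient with the event (label 1) has a
    higher score than a randomly chosen patient without it (label 0); ties
    count one half. Equals the area under the empirical ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))

def min_misclassification(scores, labels):
    """Lowest misclassification rate over every cut-off c, predicting the
    event when score >= c (an assumed reading of the paper's measure)."""
    cutoffs = sorted(set(scores)) + [float("inf")]
    best = 1.0
    for c in cutoffs:
        errors = sum((s >= c) != (y == 1) for s, y in zip(scores, labels))
        best = min(best, errors / len(labels))
    return best

# Hypothetical toy index: predicted probabilities and observed outcomes.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8]
labels = [0, 0, 1, 0, 1, 1]
auc = roc_auc(scores, labels)  # 8 of 9 event/non-event pairs ranked correctly
mis = min_misclassification(scores, labels)
```

The pairwise loop is quadratic but adequate for cohorts of about 1,200 patients. If scikit-learn is available, `sklearn.metrics.roc_auc_score(labels, scores)` returns the same AUC.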


Table 1. Comparison of predictive accuracy achieved in melanoma through differing prognostic methodologies (N = 1,222).

Prognostic Methodology | AUC | Mean Reduction | T value | P value
Six traditional prognostic factors (unstratified logistic regression) | 0.762 | N/A | N/A | N/A
AJCC stage (dummy variable logistic regression) | 0.787 | 0.015 | 3.67 | <0.001
Six traditional prognostic factors (logistic regression stratified by AJCC stage) | 0.823 | 0.016 | 4.35 | <0.001
Composite index (logistic regression, stratified by AJCC stage, incorporating 18 additional factors) | 0.867 | 0.033 | 9.62 | <0.001

Note: T values and accompanying 2-tail P values refer to reductions in mean absolute probabilistic error achieved relative to the prognostic methodology tabled in the
line immediately above, where each matched-pair T test is applied to the indicated 1,222 matched pairs of individual probabilistic prediction errors.
doi:10.1371/journal.pone.0056435.t001
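The mean reduction and T value columns compare two methodologies patient by patient, as the table note describes. The sketch below assumes that a patient's individual probabilistic prediction error is the absolute difference between the predicted probability and the observed outcome (1 for disease-specific death within 5 years, 0 otherwise). The paper's exact definition is in Methods S1. The toy probabilities are hypothetical.

```python
import math

def abs_errors(probs, outcomes):
    """|p_i - y_i| for each patient (assumed error definition)."""
    return [abs(p - y) for p, y in zip(probs, outcomes)]

def paired_t(errors_before, errors_after):
    """Mean reduction in error and matched-pairs t statistic (n - 1 df)."""
    d = [a - b for a, b in zip(errors_before, errors_after)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in d) / (n - 1))
    return mean, mean / (sd / math.sqrt(n))

# Hypothetical toy cohort: observed outcomes and two methods' probabilities.
outcomes = [0, 0, 1, 1, 0, 1]
p_before = [0.3, 0.4, 0.6, 0.5, 0.2, 0.7]  # e.g. the methodology one row up
p_after = [0.2, 0.3, 0.7, 0.7, 0.1, 0.8]   # e.g. the methodology being tested
mean_reduction, t_value = paired_t(abs_errors(p_before, outcomes),
                                   abs_errors(p_after, outcomes))
```

If SciPy is available, `scipy.stats.ttest_rel(errors_before, errors_after)` returns the same t statistic together with a two-sided P value.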

This algorithm was quite similar to the algorithm produced for the total sample. The composite index generated by the composite prognostic algorithm constructed from the training subsample was found to be superior to the corresponding probabilistic indices derived from the six routinely available prognostic factors and from initial AJCC stage in both the training and validation subsamples by ROC analysis (data not shown).

Finally, we compared the prognostic efficacies achieved by the composite algorithm when applied to the training and validation subsamples. When applied to the 800 patients in the training subsample, it achieved an AUC of 0.853. When applied to the remaining patients in the validation subsample, the same composite algorithm achieved an AUC of 0.789. Thus, the algorithm developed from the training subsample preserved 92.5% (0.789/0.853) of its prognostic efficacy, as measured by AUC, when applied to the validation subsample.

Utility of Tailored Prognostic Methodology for Identifying Patient Subsets for Adjuvant Therapy
We then aimed to assess whether the tailored methodology could be utilized to identify specific prognostic patient subsets for systemic adjuvant therapy. High-dose interferon alpha (IFN) has been the standard adjuvant therapy for melanoma for over a decade. The traditional eligibility criteria for IFN [15–17] include patients with thick primary melanoma (greater than 4.0 mm thick) or node-positive disease. Using these criteria, we identified 492 patients in our melanoma cohort eligible for IFN treatment. We then used our methodology to identify an identical number of patients with the highest individual probabilities of 5-year disease-specific death (excluding stage IV patients). These two subsamples were combined and subsequently partitioned into three mutually exclusive subsets: 129 patients identified only by standard IFN eligibility criteria (group 1); 363 patients identified by both criteria (group 2); and 129 patients identified only by our methodology (group 3). Their survival was analyzed using Kaplan-Meier analysis. Whereas the DSS of groups 2 and 3 was not significantly different, the DSS of group 1 was significantly longer than that of either group 2 or group 3 (Fig. 3; log-rank test, P<0.001).

Table 2. Comparison of predictive accuracy achieved in breast cancer through differing prognostic methodologies (N = 1,225).

Prognostic Methodology | AUC | Mean Reduction | T value | P value
Five prognostic factors (unstratified logistic regression) | 0.743 | N/A | N/A | N/A
AJCC stage (dummy variable logistic regression) | 0.802 | 0.052 | 7.08 | <0.001
Five prognostic factors (logistic regression stratified by AJCC stage) | 0.880 | 0.064 | 11.69 | <0.001
Composite index (logistic regression, stratified by AJCC stage, incorporating 14 additional factors) | 0.907 | 0.037 | 9.77 | <0.001

Note: T values and accompanying 2-tail P values refer to reductions in mean absolute probabilistic error achieved relative to the prognostic methodology tabled in the line immediately above, where each matched-pair T test is applied to the indicated 1,225 matched pairs of individual probabilistic prediction errors.
doi:10.1371/journal.pone.0056435.t002

Figure 3. Kaplan-Meier analysis of DSS of high-risk patients identified by traditional eligibility for high-dose IFN only (curve 1), those identified by both criteria (curve 2), and those identified by the tailored prognostic model only (curve 3).
doi:10.1371/journal.pone.0056435.g003

Discussion

In this manuscript, we describe a patient-centered methodology to determine the prognosis associated with two common and potentially fatal cancers. We demonstrate that use of this approach results in significant improvements over the use of standard prognostic methodologies, when predictive efficacy is measured using AUC, probabilistic prediction errors, and misclassification rates in the prediction of 5-year death due to melanoma or breast cancer.

Use of our tailored prognostic approach resulted in higher AUCs for predicting 5-year cancer-specific death in both malignancies. We also demonstrate that use of this methodology results in improved identification of high-risk candidates for adjuvant therapy in melanoma.

We achieved these improvements: (i) by first stratifying patients into separate risk groups according to initial stage and then executing analyses separately for each group; (ii) by pre-converting all prognostic factors into comparably calibrated indices (UIRIs); (iii) by handling missing observations in a manner that does not require eliminating patients with sparse data from the analysis; and (iv) by incorporating additional prognostic factors not routinely captured in staging schemes, using these same three methodological devices.

In addition, our patient-centered approach differs from traditional prognostic analyses in a number of other ways. Traditional analyses typically focus on the relative prognostic
potency of various factors using multivariate Cox or logistic regression. Yet possessing independent statistical significance does not guarantee that a factor will be prognostically useful for an individual patient [18]. In addition, staging schemes typically provide a survival estimate over a defined time period (e.g., 5- or 10-year survival) for all patients in a distinct substage of the cancer. By contrast, our approach converts prognostic output into tailored individual probabilities of some salient event, such as 5-year disease-specific death. This is the essence of the patient-centered approach. It focuses on individual patient outcomes rather than on the comparative potency of specific prognostic factors. Furthermore, it generates a separate probability of 5-year disease-specific death for each individual patient. It represents a shift in focus from the specific prognostic factors present in certain subgroups of patients to individual patient outcomes. While the role played by prognostic factors remains crucial, the factors now serve as the basis on which individually tailored patient probabilities are calculated. Prognostic factors are no longer the focus of the analysis in terms of which final conclusions are stated.

Since prognostic research usually focuses on identifying factors that provide statistically independent impact with a significant P value, whether alternative analytical procedures can improve prognostic efficacy at the level of individual patient outcomes is infrequently discussed and rarely demonstrated. Here we demonstrate the improvement in AUC achieved by our patient-centered prognostic approach, compared with the use of AJCC stage, in two different malignancies.

Developing tailored prognostic models is an important goal that has been examined by other groups. Cochran et al. [19] identified factors that emerged from logistic regression in a dataset of 1,042 melanoma patients and developed individualized probabilistic risk estimates. Recently, the AJCC Melanoma Task Force developed an electronic tool to predict survival of localized melanoma using multivariate Cox regression analyses of five routinely available prognostic factors [20]. The survival estimates developed in a dataset of 14,760 patients were validated in an independent cohort of 10,974 patients. Significant procedural differences preclude comparisons with the patient-centered methodology described here. Importantly, no details were provided regarding the prognostic efficacy of their approach. However, in our cohort, the patient-centered approach was superior in prognostic accuracy when compared with the use of routinely available prognostic factors alone.

Based on the results presented here, our patient-centered methodology may be of broad-based utility in making individually tailored prognoses for other cancers, as well as for other chronic diseases with significant morbidity. We utilized this methodology to improve prognostic accuracy and risk assessment for adjuvant therapy, but the same approach could also be used to identify patients with differential response to therapy. This may be especially relevant in the current debate about limiting financial resources for health care. Methodologies that improve prognostic accuracy might also be useful in identifying patients who would benefit from receiving expensive and/or toxic therapies for chronic medical conditions.

Our prognostic approach enables the determination of individualized prognoses, even when values for many factors are missing. While it is helpful to have information for all prognostic factors, this is not practical for each individual patient. The patient-centered approach enables the determination of an individual’s prognosis based on whatever data are available. This is in contrast to a typical multivariate logistic or Cox regression, in which complete information on all prognostic factors is typically required for a given patient to be included in the analysis. In addition, our methodology identifies factors of greatest prognostic significance to distinct risk subgroups of patients and suggests which factors (that may be missing) would be most useful to include in a patient’s pathology report (and prognostic assessment).

Datasets for the two malignancies selected to illustrate our patient-centered methodology were not population based. While
population-based datasets are preferable in factor-centered analyses, it is more important in the patient-centered approach to identify patients who are prognostically "similar" to a particular patient whose prognosis is being determined. This distinction is another salient implication of moving from a strictly factor-centered to a patient-centered approach. However, in order to compile a comprehensive set of reference strata containing "similar" patients, it will be necessary to replicate this methodology in larger datasets that sample multiple strata of a general population with a given malignancy.

An important limitation of our patient-centered methodology is the possibility of statistical over-fitting. The same devices incorporated in the methodology that contribute to its improved prognostic accuracy also risk over-fitting the prognostic algorithm to whatever empirical observations are used as training data. To compensate for this, the methodology includes built-in protections against over-fitting: the admissibility criteria applied before a candidate prognostic factor is introduced into the analysis, and the minimum partition sizes established by the algorithm-generating procedure.

It is important to note that much of the improvement in predictive accuracy achieved by our methodology cannot reasonably be attributed to over-fitting. A substantial portion was realized simply by analyzing the modest number of routinely available prognostic and staging parameters in a different manner, prior to incorporating additional factors within the analyses (rows 1 and 2 vs. row 3 in Tables 1 and 2, respectively).

We have departed from the traditional approach to validating individual prognostic markers, in which separate training and validation cohorts are used. Rather, we have developed a novel methodology specifically designed to make prognostic predictions at the individual patient level. This methodology was then shown to improve prognostic accuracy (when compared with initial stage) in two data sets drawn from distinct populations and involving different cancers. In addition, a split-sample reliability analysis of the melanoma cohort revealed that a significant proportion (greater than 90%) of the prognostic accuracy achieved was retained in the validation subsample. Ultimately, however, our methodology would need to be applied to even larger data sets (several thousand patients), both to mitigate excessive over-fitting and to produce a practically useful composite prognostic algorithm that could be used to make individual patient predictions.

Our study differs in its focus from important recent studies aimed at measuring the improvements in prognostic efficacy realizable from adding new biomarkers, especially when AUC is inadequate in its ability to detect changes in absolute risk [21–24]. In the realm of cancer, these techniques have been used to assess breast cancer risk [25]. In our analysis, both ROC plots and probabilistic prediction methods proved adequate to demonstrate the improved efficacy of our tailored prognostic methodology. More importantly, our methodology goes beyond measuring predictive improvements: it offers procedures and devices by which such improvements can be realized.

In conclusion, we have developed a methodology to assign individualized probabilities to a specified focal event (e.g. five-year disease-specific death). This approach resulted in significant improvements in predictive accuracy in two different malignancies when compared with routine prognostic methodologies, and can be used to tailor discussions regarding prognosis and therapy for an individual patient.

Supporting Information

Methods S1 Additional methods not included in the main text.
(DOC)

Table S1 Clinical and histologic characteristics of the melanoma sample (N = 1,222).
(DOCX)

Table S2 Clinical and histologic characteristics of the breast cancer sample (N = 1,225).
(DOCX)

Table S3 Relative weights in differentiating predictive potency of prognostic factors included in the melanoma sample (N = 1,222).
(DOCX)

Table S4 Relative weights in differentiating predictive potency of prognostic factors included in the breast cancer sample (N = 1,225).
(DOCX)

Author Contributions

Conceived and designed the experiments: MKS RWS JRM. Performed the experiments: MKS JRM. Analyzed the data: MKS RWS HJ JRM. Contributed reagents/materials/analysis tools: MKS RWS HJ JRM. Wrote the paper: MKS RWS HJ JRM.

References

1. Joensuu H, Toikkanen S (1995) Cured of Breast Cancer? J Clin Oncol 13: 62–69.
2. Balch CM, Gershenwald JE, Soong SJ, Thompson JF, Atkins MB, et al. (2009) Final version of 2009 AJCC melanoma staging and classification. J Clin Oncol 27: 6199–6206.
3. Balch CM, Soong SJ, Gershenwald JE, Thompson JF, Reintgen DS, et al. (2001) Prognostic factors analysis of 17,600 melanoma patients: validation of the American Joint Committee on Cancer melanoma staging system. J Clin Oncol 19: 3622–3634.
4. Balch CM, Buzaid AC, Soong SJ, Atkins MB, Cascinelli N, et al. (2001) Final version of the American Joint Committee on Cancer staging system for cutaneous melanoma. J Clin Oncol 19: 3635–3648.
5. Zettersten E, Shaikh L, Ramirez R, Kashani-Sabet M (2003) Prognostic factors in primary cutaneous melanoma. Surg Clin North Am 83: 61–75.
6. Kashani-Sabet M, Sagebiel RW, Ferreira CMM, Nosrati M, Miller JR III (2002) Tumor vascularity in the prognostic assessment of primary cutaneous melanoma. J Clin Oncol 20: 1826–1831.
7. Kashani-Sabet M, Sagebiel RW, Ferreira CMM, Nosrati M, Miller JR III (2001) Vascular involvement in the prognosis of primary cutaneous melanoma. Arch Dermatol 137: 1169–1173.
8. Kashani-Sabet M, Shaikh L, Sagebiel RW, Nosrati M, Ferreira CM, et al. (2004) NF-κB in the vascular progression of melanoma. J Clin Oncol 22: 617–623.
9. Rangel J, Torabian S, Shaikh L, Nosrati M, Baehner FL, et al. (2006) Prognostic significance of NCOA3 overexpression in primary cutaneous melanoma. J Clin Oncol 24: 4565–4569.
10. Kashani-Sabet M, Venna S, Nosrati M, Nosrati M, Sucker A, et al. (2009) A multi-marker prognostic assay for primary cutaneous melanoma. Clin Cancer Res 15: 6987–6992.
11. Kashani-Sabet M, Rangel J, Torabian S, Nosrati M, Simko J, et al. (2009) A multi-marker assay to distinguish malignant melanomas from benign nevi. Proc Natl Acad Sci USA 106: 6268–6272.
12. Alonso SR, Tracey L, Ortiz P, Pérez-Gómez B, Palacios J, et al. (2007) A high-throughput study in melanoma identifies epithelial-mesenchymal transition as a major determinant of melanoma metastasis. Cancer Res 67: 3450–3460.
13. Conway C, Mitra A, Jewell R, Randerson-Moor J, Lobo S, et al. (2009) Gene expression profiling of paraffin-embedded primary melanoma using the DASL assay identifies increased osteopontin expression as predictive of reduced relapse-free survival. Clin Cancer Res 15: 6939–6946.
14. Gould Rothberg BE, Berger AJ, Molinaro AM, Subtil A, Krauthammer MO, et al. (2009) Melanoma prognostic model using tissue microarrays and genetic algorithms. J Clin Oncol 27: 5772–5780.
15. Kirkwood JM, Strawderman MH, Ernstoff MS, Smith TJ, Borden EC, et al. (1996) Interferon α-2b adjuvant therapy of high-risk resected cutaneous melanoma: the Eastern Cooperative Oncology Group Trial EST 1684. J Clin Oncol 14: 7–17.

PLOS ONE | www.plosone.org 7 February 2013 | Volume 8 | Issue 2 | e56435


16. Kirkwood JM, Ibrahim JG, Sondak VK, Richards J, Flaherty LE, et al. (2000) High- and low-dose interferon α-2b in high-risk melanoma: first analysis of Intergroup Trial E1690/S9111/C9190. J Clin Oncol 18: 2444–2458.
17. Kirkwood JM, Ibrahim JG, Sosman JA, Sondak VK, Agarwala SS, et al. (2001) High-dose interferon α-2b significantly prolongs relapse-free and overall survival compared with the GM2-KLH/QS21 vaccine in patients with resected stage IIB-III melanoma: results of Intergroup trial E1694/S9512/C509801. J Clin Oncol 19: 2370–2380.
18. Ware JH (2006) The limitation of risk factors as prognostic tools. N Engl J Med 355: 2615–2617.
19. Cochran AJ, Elashoff D, Morton DL, Elashoff R (2000) Individualized prognosis for melanoma patients. Hum Pathol 31: 327–331.
20. Soong SJ, Ding S, Coit D, Balch CM, Gershenwald JE, et al. (2010) Predicting survival outcome of localized melanoma: an electronic prediction tool based on the AJCC melanoma database. Ann Surg Oncol 17: 2006–2014.
21. Cook NR, Buring JE, Ridker PM (2006) The effect of including C-reactive protein in cardiovascular risk prediction models for women. Ann Intern Med 145: 21–29.
22. Pencina MJ, D'Agostino RB Sr, D'Agostino RB Jr, Vasan RS (2008) Evaluating the added predictive ability of a new biomarker: from area under the ROC curve to reclassification and beyond. Stat Med 27: 157–172.
23. Pepe MS, Feng Z, Gu JW (2008) Comments on "Evaluating the added predictive ability of a new biomarker: from area under the ROC curve to reclassification and beyond." Stat Med 27: 173–181.
24. Cook NR, Ridker PM (2009) The use and magnitude of reclassification measures for individual predictors of global cardiovascular risk. Ann Intern Med 150: 795–802.
25. Tice JA, Cummings SR, Smith-Bindman R, Ichikawa L, Barlow WE, et al. (2008) Using clinical factors and mammographic breast density to estimate breast cancer risk: development and validation of a new predictive model. Ann Intern Med 148: 337–347.


ARTICLE 2:

A Viral Discovery Methodology for
Clinical Biopsy Samples Utilising
Massively Parallel Next Generation
Sequencing


Gordon M. Daly1, Nick Bexfield1, Judith Heaney1, Sam Stubbs1, Antonia P. Mayer1, Anne Palser2, Paul
Kellam2, Nizar Drou3, Mario Caccamo3, Laurence Tiley1, Graeme J. M. Alexander4, William Bernal5,
Jonathan L. Heeney1*

1 Department of Veterinary Medicine, The University of Cambridge, Cambridge, United Kingdom, 2 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus,
Hinxton, Cambridge, United Kingdom, 3 The Genome Analysis Centre, Norwich Research Park, Norwich, United Kingdom, 4 Department of Hepatology, Addenbrooke's
Hospital, Cambridge University Hospitals National Health Services Foundation Trust, Cambridge, United Kingdom, 5 Institute of Liver Studies, King’s College London
School of Medicine, King’s College Hospital, Denmark Hill, London, United Kingdom

Abstract

Here we describe a virus discovery protocol for a range of different virus genera that can be applied to biopsy-sized tissue
samples. Our viral enrichment procedure, validated using canine and human liver samples, significantly improves viral read
copy number and increases the length of viral contigs that can be generated by de novo assembly. This in turn enables the
Illumina next generation sequencing (NGS) platform to be used as an effective tool for viral discovery from tissue samples.

Citation: Daly GM, Bexfield N, Heaney J, Stubbs S, Mayer AP, et al. (2011) A Viral Discovery Methodology for Clinical Biopsy Samples Utilising Massively Parallel
Next Generation Sequencing. PLoS ONE 6(12): e28879. doi:10.1371/journal.pone.0028879
Editor: Patricia V. Aguilar, University of Texas Medical Branch, United States of America
Received August 26, 2011; Accepted November 16, 2011; Published December 21, 2011
Copyright: © 2011 Daly et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: GD, NB, SS and AM are funded by the Wellcome Trust. AP and PK are funded by the Wellcome Trust and part of this work is supported by funding from
the European Community's Seventh Framework Programme (FP7/2007–2013) under the project EMPERIE, EC grant agreement number 223498. The funders had
no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: jlh66@cam.ac.uk

Introduction

A variety of methods for identifying unknown viruses have been reported, such as degenerate primer PCR/amplification [1], viral microarrays [2–4] and conventional sequencing. The low abundance of viral sequences relative to total host nucleic acids usually requires the use of viral enrichment and concentration procedures. These include filtration, ultracentrifugation and nuclease treatment, followed by random priming and amplification using the sequence-independent single primer amplification (SISPA) method or variations thereof [1,5–10] and/or the Viral Discovery cDNA-AFLP (VIDISCA) method [11–15]. These approaches have generally been limited to liquid-based samples (body fluids, eluted swabs, culture supernatants and environmental samples). NGS has shown great potential for novel virus discovery [16–19]. The use of NGS alone can be sufficient if the viral nucleic acids are sufficiently abundant relative to host nucleic acids. However, as we confirm here, clinical biopsy samples can present a problem where even the depth of sequencing provided by NGS may be insufficient to generate useful viral sequence contigs by de novo assembly.

We have now established a broadly applicable approach for viral nucleic acid enrichment from small biopsy-sized clinical liver tissue (e.g. Tru-Cut), which, combined with the Illumina NGS platform, could provide an effective tool for viral discovery.

Results

Detection of HCV reads from HCV infected human biopsy samples using the Illumina platform

We analysed frozen hepatitis C virus (HCV) infected Tru-Cut liver biopsies without viral enrichment to ascertain the limits of detection of virus in a small liver biopsy using the Roche 454 and Illumina NGS platforms. Total RNA was extracted from six biopsy samples (RNA integrity number (RIN) values between 6 and 8), and HCV infection was confirmed by PCR. 0.5 μg of RNA from each sample was pooled and underwent SISPA (detailed in Materials and Methods). The minimally amplified pooled material was then mass sequenced on a single Illumina NGS lane. Mapping of the short reads to HCV reference genomes from the Los Alamos HCV database confirmed HCV infection with sub-type 3a. However, the mapping clearly showed a paucity of viral genome coverage (12.5%), with a total of 32 HCV reads out of <8 million (Fig. 1). tBLASTx analysis of the complete dataset of viral fragments against all HCV genomes in the EMBL database did not identify any further HCV reads. The lack of overlapping viral sequence reads prevented de novo assembly of viral contigs, making the Illumina NGS platform with the SISPA protocol alone a potentially ineffective technique for novel virus discovery. The same process using the Roche 454 platform failed to identify any HCV sequences.

PLoS ONE | www.plosone.org 1 December 2011 | Volume 6 | Issue 12 | e28879

Viral Discovery Using Clinical Biopsy Tissue

Figure 1. NGS of HCV biopsy RNA. Extracted and pooled (×6) HCV-infected biopsy RNA was reverse transcribed and amplified prior to Illumina NGS, and the reads were mapped to the HCV reference 3a.NZL. NC_0009824 (Los Alamos HCV database).
doi:10.1371/journal.pone.0028879.g001
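Breadth of genome coverage, such as the 12.5% reported above for the unenriched HCV biopsy pool, is the fraction of reference positions covered by at least one mapped read; the same calculation applies to de novo contigs mapped back to a reference. The Python sketch below illustrates it under stated assumptions: mapped reads are supplied as hypothetical half-open (start, end) intervals in 0-based reference coordinates, the function names are illustrative rather than part of the authors' pipeline, and the coordinates are invented. The overlap check reflects the point above that a read set with no overlapping reads cannot be assembled de novo.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: half-open (start, end), 0-based reference coordinates.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval], genome_length: int) -> List[Interval]:
    """Clip intervals to the reference and merge any that overlap or abut."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        start, end = max(0, start), min(genome_length, end)
        if start >= end:
            continue  # interval lies entirely outside the reference
        if merged and start <= merged[-1][1]:
            # extend the previous covered block instead of opening a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def breadth_of_coverage(intervals: Iterable[Interval], genome_length: int) -> float:
    """Fraction of reference positions covered by at least one interval."""
    covered = sum(end - start for start, end in merge_intervals(intervals, genome_length))
    return covered / genome_length


def any_reads_overlap(intervals: Iterable[Interval]) -> bool:
    """True if at least two intervals share a reference position.

    After sorting by start, any overlapping pair implies an overlap
    between some pair of neighbours, so neighbours are enough to check.
    """
    ordered = sorted(intervals)
    return any(nxt[0] < prev[1] for prev, nxt in zip(ordered, ordered[1:]))


# Illustrative coordinates only (not data from this study).
reads = [(0, 100), (50, 150), (300, 400)]
print(breadth_of_coverage(reads, 1000))  # 0.25
print(any_reads_overlap(reads))          # True
```

In practice, intervals of this kind would come from a read aligner's output; the sketch only shows the arithmetic behind a coverage percentage.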

Liver Cytosol/Pellet fractionation for viral enrichment
To improve on published tissue extraction methods for viral discovery [20], we compared different homogenization procedures. We found the optimal procedure to be an Omni TH homogeniser with a hard-tissue probe (Omni-International), using a short pulse (15 seconds) in cold PBS with a dry-ice freeze/thaw cycle, repeated three times, followed by RNAse and DNAse digestion of the host nucleic acids, as illustrated in Fig. 2a and detailed in Materials and Methods. We found that pestle grinding with Alumina did not break the liver cells as reliably as our freeze/thaw-mechanical approach (determined microscopically).

We compared the two methods using dissected biopsy-sized (2 mm³) fragments of an HCV-infected liver sample, with and without Benzonase as a nuclease to remove host nucleic acids (Fig. 2b). Following nuclease treatment with both Turbo DNAse and RNAse1 (+/− Benzonase) to remove host nucleic acids, we extracted viral genomic material and residual host nucleic acids using the Trizol RNA extraction method with glycogen as a carrier. A modified SISPA protocol (Materials and Methods) was used to amplify the minimum amount of material needed for the Illumina NGS platform (1–3 μg) after cleaning and size fractionation to remove sub-200 bp fragments. By removing the sub-200 bp fragments, we were effectively comparing NGS-readable nucleic acids whilst removing fragments likely to represent residual host material that had survived the nuclease treatment. We determined that the improved cell breakage (determined microscopically) with probe homogenization and freeze/thaw cycles correlated with an improved concentration of HCV nucleic acids, using identical amounts of post-amplification dsDNA as input to an HCV qPCR assay. Furthermore, with both methods, the addition of Benzonase together with RNAse1 and Turbo DNAse reduced the HCV copy number 2–6-fold.

Figure 2. Enrichment methodologies. a: Illustration of the key steps in isolating and enriching viral nucleic acids relative to host nucleic acids prior to sequencing by NGS. Liver tissue cells were broken in an isotonic buffer supplemented with BSA using a hard-tissue probe and an Omni Homogeniser with a dry-ice freeze/thaw step (repeated ×3). To ensure that cell membranes were broken, the lysate was checked microscopically. b: Comparison of Alumina/pestle grinding against hard-tissue probe homogenisation with freeze/thaw cycles (+/− Benzonase). 2 mm³ fragments were dissected from an HCV-infected liver sample. Duplicate tissue samples were used for each protocol. HCV nucleic acids were measured in triplicate by dual-labelled qPCR assay with NIBSC standard.
doi:10.1371/journal.pone.0028879.g002

To validate our extraction/enrichment approach (high-speed tissue homogenization and freeze/thaw cycles) with Illumina NGS, we used four pieces of liver tissue infected with canine parvovirus 2 (CPV2), canine adenovirus 1 (CAV1), human hepatitis B virus (HBV) and hepatitis C virus (HCV). Liver samples were dissected into 2 mm³ pieces (equivalent to approximately half of a Tru-Cut needle biopsy sample). The viral sequence copy number per liver sample (2 mm³) was estimated by qPCR (Fig. 3) to confirm the presence of viral nucleic acids within the liver tissue and to estimate the maximum possible starting viral copy number of the sample and its subsequent processed fractions.

Next, our enrichment procedure and the total RNA and DNA extractions were carried out on separate 2 mm³ dissected fragments of each liver sample (Materials and Methods). Following nuclease treatment with Turbo DNAse and RNAse1 (without Benzonase), the SISPA-based protocol was used to amplify the minimum amount of material needed for the Illumina NGS platform (1–3 μg) after cleaning and size fractionation to remove sub-200 bp fragments. The level of enrichment achieved, per nanogram of amplified material, for all four virus-infected liver samples was quantified by qPCR (Fig. 4). DNA virus enrichment achieved a 10⁴-fold increase in CPV viral nucleic acids relative to the highest non-enriched CPV sample extract (total RNA), and a similar enrichment level for CAV relative to the highest non-enriched CAV sample extract (total DNA). RNA virus enrichment achieved a 10⁶-fold increase in both HBV and HCV nucleic acids relative to the highest non-enriched samples (total RNA).

Illumina NGS viral read and de novo assembly comparison
Based on the qPCR results, ten samples were selected for mass sequencing on the Illumina NGS platform, with one sample library per lane (indicated by an asterisk in Fig. 4). Mapping sequence reads (Fig. 5) revealed complete or near-complete (>95%) coverage of the reference genomes for the virally enriched samples. Viral nucleic acid point-coverage was orders of magnitude (4×10¹–8.5×10⁴) greater than in total RNA (non-enriched) samples processed in the same manner. The increase in viral sequences in enriched samples as a percentage of the total NGS read set is shown in Fig. 6. Mass sequencing of total DNA from CAV- and CPV-infected samples was performed because qPCR data indicated that these samples had a higher viral genome copy number than the RNA transcriptome samples (Fig. 4). This correlated with the NGS read mapping analysis (Figs. 5 & 6) for both the CPV and CAV total DNA samples.

In order to ascertain whether the increased viral read copy number and reference coverage yielded significant improvement in the size and viral reference coverage of the contigs that could be generated by de novo assembly, we utilised the CLC Genomics v4 (Katrinebjerg, 8200 Aarhus N, Denmark) and ABySS [21] de novo assemblers in tandem and with varying stringencies. The data from the unenriched CAV and HCV infected total RNA samples indicated that RNA extraction and SISPA processing alone were insufficient to generate contigs with significant viral reference coverage. The HBV-infected RNA fraction produced two contigs covering one third of the HBV genome. In contrast, the virally enriched samples produced contigs covering between 73–100% of



their reference genome (Fig. 7), with the HBV-enriched preparation producing a single contig spanning 100% of the genome. The canine parvovirus was clearly transcriptionally active, and the contig coverage from the total RNA (non-enriched) fraction was comparable to that of the virally enriched sample. For the CAV and CPV total DNA extracts, the resulting assembled contigs (1–2 per sample) covered 83.6% and 97% of the reference genomes, respectively: lower than, but comparable to, the enriched viral DNA samples.

Figure 3. Quantitative PCR (viral genome copy number in liver samples). A SYBR-green assay and plasmid standard were used for the CAV and CPV samples, whilst the HCV and HBV assays used dual-labelled probes with NIBSC standards. Total RNA was extracted for HCV and total DNA for HBV, CPV and CAV. The genomic-strand copy number for HCV was estimated by performing the RT step in the presence of the reverse primer only.
doi:10.1371/journal.pone.0028879.g003

Discussion

Identifying unknown viruses in clinical samples is technically challenging, especially for solid tissues compared with less complex samples such as body fluids. Additionally, whilst biopsy sampling is routine in the clinic, its use for research purposes can be limited by ethical and safety considerations. Often, the material available for virological analysis is limited to a fraction of a small needle biopsy left over after diagnostic histopathological analysis. Detection of virus is not always possible in serum and plasma, even with nucleic acid tests, whilst the 'occult' virus may still be detectable in the viral reservoir tissue [22–23].

Mass sequencing technologies provide a new avenue for viral discovery, as highlighted by Feng and Palacios [17–18] using the Roche 454 NGS platform. The Illumina NGS platform also has scope for novel viral discovery, particularly as recent technical developments have improved sequence length and quality, now complementing its exceptional sequencing depth. However, our preliminary work (Fig. 1) demonstrated that, in some virally infected clinical biopsy samples, total nucleic acids extracted and processed for NGS with the Illumina platform contain viral sequence at such a low level relative to host nucleic acids that viral reference coverage is sparse with no overlapping reads, precluding the assembly of larger contiguous viral sequences. Such contigs are key to discovering and characterizing novel viruses by nucleotide and amino acid similarity to viral database sequences or by predicted structural domain conservation. Concurrent NGS with the Roche 454 platform failed to sequence any viral nucleic acids that we could detect, and for both platforms the only assembled contigs over 200 bp corresponded to host sequences.

We have shown that our enrichment methodology together with the Illumina NGS platform works for a range of viruses using sample sizes equivalent to a small needle biopsy. Cells in the sample are disrupted and the cytosol is isolated with encapsidated virus intact. The resulting cytosolic fraction can be separated from the cell debris and nuclei, leaving a low-viscosity solution that can be nuclease treated directly or readily size fractionated by filtration methods. qPCR analysis of equivalent samples processed for DNA, for RNA and with the viral enrichment method clearly demonstrates the extent of enrichment (Fig. 4), as does mapping of the subsequent NGS reads to the respective viral genomes to assess coverage (Fig. 5) and target viral sequence as a percentage of the whole NGS read set (Fig. 6). Subsequent de novo assembly of the NGS reads using different assembly algorithms (Fig. 7) demonstrates the advantage of the enrichment procedure over whole RNA transcriptome sequencing. De novo assembly of the enriched samples with two different algorithms produced one to three contigs covering 73–100% of the viral reference genome. For the unenriched RNA samples, only the CPV and HBV samples yielded contigs mapping to the reference virus (97% and 32.5% coverage, respectively). Interestingly, viral DNA genomes from the CAV and CPV samples we used for validation could be reconstituted from total extracted DNA using NGS alone. This is of particular interest if one considers that an unknown (DNA) virus may be latent, in a non-encapsidated, non-replicative and transcriptionally repressed form at a given time point or in a given cell type. This is, of course, hypothetical, but in such a case total RNA and viral enrichment analysis would be unlikely to work. Furthermore, with this methodology, DNA can be concurrently extracted from the nuclei-containing pellet after homogenization and prior to nuclease treatment, so a three-way strategy might be considered when attempting to find an unknown virus in a solid tissue biopsy. Our technique readily allows total RNA, the virus-enrichable cytosolic fraction and total DNA to be extracted with minimal loss of sample and from a minimal sample size.

In summary, our method can enrich a range of virus types that
can be sequenced using the Illumina NGS platform. Furthermore, viral genomes can be largely reconstituted by currently available de novo assembly algorithms. This approach is robust, enabling the use of NGS for the detection and identification of novel viral pathogens from small diagnostic biopsy samples without the requirement to culture or isolate the virus first. We show that this technique works for liver tissue, from which high-quality RNA can be difficult to extract. Moreover, the HCV sample we used for enrichment validation and NGS was approximately 8 mm³ of very hard fibrotic tissue with a low viral genome copy number (Fig. 3), suggesting that the technique could probably be applied successfully to most tissue types, including other fibrotic diseased tissues or joint collagen. Importantly, this methodology is rapid and results in very little loss of sample.
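The enrichment comparisons discussed above reduce to two simple ratios: viral copies per nanogram of amplified material in the enriched fraction relative to the highest non-enriched extract of the same liver sample (the qPCR comparison in Fig. 4), and viral reads as a percentage of the whole NGS read set (Fig. 6). The Python sketch below illustrates both under stated assumptions; it is not the authors' analysis code, the function names are hypothetical, and all input values are invented for illustration.

```python
import math
from typing import Dict


def fold_enrichment(enriched_copies_per_ng: float,
                    non_enriched_copies_per_ng: Dict[str, float]) -> float:
    """Copies/ng in the enriched fraction divided by the highest non-enriched extract."""
    # Baseline is the best of the non-enriched extracts (e.g. total RNA, total DNA).
    baseline = max(non_enriched_copies_per_ng.values())
    if baseline <= 0:
        raise ValueError("non-enriched baseline must be positive")
    return enriched_copies_per_ng / baseline


def percent_viral_reads(viral_reads: int, total_reads: int) -> float:
    """Viral reads as a percentage of all reads in one NGS dataset."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * viral_reads / total_reads


# Hypothetical qPCR values in copies per ng of amplified material (not study data).
baseline = {"total_RNA": 2.0e1, "total_DNA": 5.0e1}
fold = fold_enrichment(5.0e5, baseline)
print(f"{fold:.0f}-fold (about 10^{math.log10(fold):.0f})")  # 10000-fold (about 10^4)
print(percent_viral_reads(1_500, 2_000_000))  # 0.075
```

Reporting the order of magnitude of the ratio mirrors how the text expresses enrichment (for example, 10⁴-fold); with triplicate qPCR measurements, the spread of the replicates would also need to be carried through the ratio.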

Methods

Liver samples and processing
Human liver samples were acquired from the Institute of Liver Studies, King's College Hospital, London, University of London, UK. Samples were obtained with written patient consent. This



Figure 4. Quantitative PCR (viral copy number estimations of prepared samples for NGS). For each viral liver sample, 2 mm³ of liver was used separately for total RNA extraction, total DNA extraction, DNA viral enrichment and RNA viral enrichment. qPCR assays were performed in triplicate using 20 ng of each sample (estimated by PicoGreen assay) as input. The samples subsequently mass sequenced are indicated with an asterisk.
doi:10.1371/journal.pone.0028879.g004

work forms part of a broader project with ethical approval provided by the UK National Research Ethics Service, Cambridge 3 Research Ethics Committee, Cambridge CB21 5XB (REC reference numbers 09/HO306/52, 09/HO306/60) and King's College Hospital Research Ethics Committee, London SE5 9RS (REC reference number 04/Q0703/27).

Canine liver samples were acquired from the Blue Cross Animal Hospital (CPV) and the Royal Veterinary College (CAV), with informed and written owner consent obtained by both centers as per the guidelines of the Royal College of Veterinary Surgeons (RCVS), UK. The CAV and CPV liver samples represented legacy material for which no ethical committee approval was required in the UK.

Individual samples were screened positive for the following viruses (Table 1). After collection, liver tissue was stored at −80°C pending further analysis. Liver samples were divided into sections measuring 2 mm³ and weighing <15 mg, comparable to approximately half a Tru-Cut needle biopsy. Size rather than weight was used for handling reasons, in order to keep the samples as cold as possible and minimize nucleic acid degradation.

No Enrichment. Total RNA extraction was performed on liver tissue (liver biopsy or 2 mm³) using the RNeasy Lipid Tissue Mini Kit (Qiagen) according to the manufacturer's instructions, with on-column DNAse treatment. Total DNA extraction was performed on liver tissue (2 mm³ or nuclear pellet) using the QIAamp DNA Mini Kit (Qiagen) according to the manufacturer's instructions. Liver tissue was homogenized in the relevant denaturant, according to the kit manufacturers' instructions, using a micropestle (Eppendorf).

Enrichment. Liver tissue (2 mm³) was immersed in 250 μl ice-cold 0.7% bovine albumin-supplemented buffered saline (pH 7.2) and homogenised for 15 seconds on ice using an Omni TH Tissue Homogenizer and a disposable (7 mm×110 mm) 'Omni Tip' hard-tissue probe (Omni-International). The resulting homogenate was placed on dry ice for approximately two minutes until frozen, and thawed quickly before returning to ice. Homogenization followed by freezing and thawing was repeated a further two times to disrupt the cells (>90%) while leaving the nuclei intact (>90%), as determined microscopically. Samples were then spun at 600×g for 10 minutes at 4°C to pellet the nuclei and large cellular aggregates. Non-particle-protected viral DNA and RNA were removed from the supernatant by digestion with 30 U of Turbo DNase [Ambion] and 25 U of RNase



Figure 5. NGS virus sequence mapping to reference genomes. To assess the differences between the total RNA and total DNA extracts and the virally enriched samples, the NGS viral reads were mapped to the respective reference genomes.
doi:10.1371/journal.pone.0028879.g005

Figure 6. Virus reference reads as percentage of NGS datasets. Viral reference sequencing reads shown as a percentage of total reads from the Illumina NGS datasets for the differentially processed liver samples.
doi:10.1371/journal.pone.0028879.g006



Figure 7. De novo assembly of viral contigs. NGS reads for infected liver samples and their processed fractions were used to generate contigs
using ABySS and CLC de novo assembly algorithms. Contigs (length over 200 nt) were mapped to the viral reference genomes with the length and
coverage indicated (solid black lines). Regions of the reference genomes not covered by the contigs are indicated with dashed lines.
doi:10.1371/journal.pone.0028879.g007

Table 1. Characteristics of the different viral genera present in the liver tissue used in this study.

Virus   Group  Envelope (+/−)  Genome           Size (kb)
CAV-1   I      −               dsDNA            30.5
CPV-2   II     −               ssDNA (−/+)      5
HCV-3   IV     +               ssRNA (+)        10.5
HBV     VII    +               dsDNA (partial)  3.1

doi:10.1371/journal.pone.0028879.t001

One [Promega] in 1× DNAse buffer (Ambion) and incubated at 37°C for 90 minutes. Viral DNA (virally enriched) was extracted using the High Pure Viral Nucleic Acid Kit (Roche) according to the manufacturer's instructions and eluted with 30 μl water. The DNA virus extraction method uses a polyA carrier, which is not suitable for viral RNA extraction because of the subsequent random priming and amplification. We further established that glycogen was not an efficient substitute for polynucleotides as a carrier in silica-column-based methods. Therefore, viral RNA from the cytosolic extract was extracted using Trizol LS (Invitrogen) according to the manufacturer's instructions, with modifications: 20 μg glycogen was added prior to precipitation, and the sample was left overnight at −20°C and vortexed after 1 hour and 24 hours. The



precipitation was then frozen at −80°C and vortexed again prior to centrifugation at 10,000 × g for 30 minutes at 4°C. The RNA pellet was rinsed three times in 75% ethanol, dried at room temperature and re-suspended in 20 µl water supplemented with 80 U RNaseOUT (Invitrogen). RNA was passed through a NucleoSpin RNA Clean-up XS column (Macherey-Nagel) according to the manufacturer's instructions and eluted with 10 µl water supplemented with 80 U RNaseOUT.

Alumina grinding and the addition of Benzonase were evaluated in a comparison study on an HCV-infected liver sample, using a previously reported protocol [20]. This protocol was adapted for use with a needle-biopsy-sized tissue sample, obviating the need for ultracentrifugation to pellet the virus. Briefly, 2 mm³ of liver tissue was ground with a micropestle (Eppendorf) with 20 mg of 100-mesh Alumina (Sigma-Aldrich) and 250 µl of ice-cold buffered saline (pH 7.2) supplemented with 0.7% bovine albumin. The homogenized sample was spun at 2,500 rpm for 20 minutes at 4°C and passed through a 0.45 micron low-bind filter (Millipore). This was compared to an identically sized sample from the same specimen processed for enrichment using our freeze/thaw protocol described above. Benzonase (20 U; Novagen) was also tested on both processed samples as an additional nuclease to remove host nucleic acids.

Sequence-Independent Single Primer Amplification (SISPA): RT/2nd strand synthesis and product amplification

The protocol was adapted from previously reported SISPA-based protocols [10,21,22] and used the SISPA primers/adaptors:
FR26RV-N: GCCGGAGCTCTGCAGATATCNNNNNN
FR20RV: GCCGGAGCTCTGCAGATATC

First-strand cDNA synthesis was performed with 100 U of SuperScript III and the primer FR26RV-N (20 pmol), following the manufacturer's recommended random primer protocol (Invitrogen, UK), at 50°C for 60 minutes in the presence of RNaseOUT (Invitrogen, UK). cDNA and DNA were incubated with 2.5 U of Klenow DNA polymerase (NEB) at 37°C for 60 minutes, followed by an inactivation step of 75°C for 20 minutes. PCR of the extension products was performed using 5 µl of cDNA or DNA in a total reaction volume of 50 µl containing 2.5 mM MgCl2, 0.2 mM dNTPs, 1× Advantage 2 kit PCR buffer (Takara-Clontech), 0.8 µM primer FR20RV and 1 U Advantage 2 kit Polymerase Mix. Temperature cycling was as follows: 1 cycle of 95°C for 2 minutes, then a minimum of 20 cycles of 95°C for 30 seconds, 65°C for 1 minute and 68°C for 30 seconds, followed by a final extension of 3 minutes at 68°C. Further cycling was used when necessary to generate a final output of 1–3 µg of dsDNA after fractionation and clean-up. Amplified DNA/cDNA was cleaned and fractionated using Chroma-Spin-200 columns (Takara-Clontech) to remove fragments shorter than 200 bp, effectively standardising the samples for Illumina NGS platform sequencing. Integrity and quantity were assessed on a 7500 gel chip (Agilent) using a 2100 Bioanalyzer (Agilent Technologies UK Ltd). The dsDNA concentration was determined using the Quant-iT PicoGreen assay (Invitrogen).

PCR diagnostics

An HCV PCR diagnostic kit (GeneAmp® EZ rTth RNA PCR, AB life technologies) was used according to the manufacturer's instructions to confirm the presence of the virus in the HCV-infected biopsy samples and in the post-transplantation excised HCV-infected liver sample used for the enrichment protocol.

qPCR Assays

Viral nucleotide sequence copy numbers were measured in the amplified material from the various fractions prior to NGS sequencing, and were also used to estimate the viral genome sequence copy number in the original samples. All liver samples were dissected into equally sized 2 mm³ pieces. Equivalent samples were used for the different extraction procedures, and all assays used a Rotor-Gene 6000 qPCR machine (Rotor-Gene Version 6.1 build 93, 2009, Corbett Research/Qiagen). Data were analysed using Rotor-Gene software, version 1.7 (Corbett Research/Qiagen).

CAV and CPV qPCR. CAV and CPV viral sequence copy numbers were estimated relative to a standard, both to determine the genome copy number of CAV and CPV in the liver tissue samples and to obtain a viral nucleic acid copy number estimate for the processed fractions. To produce the standard, amplified fragments of CAV and CPV were individually cloned into pJET1.2 (Fermentas) according to the manufacturer's instructions. Chemically competent E. coli cells (One Shot Top10 E. coli; Invitrogen) were transformed with this construct, and plasmid DNA was extracted from E. coli grown overnight in liquid culture (Plasmid mini kit; Qiagen). Plasmid DNA was linearised and quantified using the Quant-iT PicoGreen assay (Invitrogen). A 10-fold dilution series from 10⁸ copies down to 1 copy was made by diluting the plasmid DNA in polyinosinic-polycytidylic acid (Poly I:C). qPCR was performed in triplicate on each of the standards, on DNA extracted from infected tissue (for genome copy number) and on the processed DNA and cDNA fractions. Amplification was performed using 1 µl of template and 0.3 µM of each primer with the QuantiTect SYBR Green Master Mix (Qiagen) and distilled water to a final volume of 25 µl. After an initial PCR activation step at 95°C for 15 minutes, 45 cycles of amplification were performed, each consisting of 95°C for 15 seconds, 60°C for 30 seconds and 72°C for 30 seconds.

Hexon gene CAV1 primers: (forward) 5′-TGCTGCCACAATGGTCTTAC-3′; (reverse) 5′-CCACAGTGGGGTTTCTGAAC-3′
NS1 gene CPV2 primers: (forward) 5′-GACTGGGAATCGGAAGTTGA-3′; (reverse) 5′-CAATGCCAGCCTTGATCTTT-3′

HCV and HBV qPCR viral genome copy number estimation. These assays are based on previously reported primers and probes [24–25], modified to fit the QuantiTect Virus qPCR kit (Qiagen) according to the manufacturer's instructions. Briefly, the final primer concentrations were 0.4 µM and the final probe concentrations were 0.2 µM. Polymerase activation/strand denaturation was at 95°C for 5 minutes, followed by 45 cycles of two-step cycling at 95°C for 15 seconds and 60°C for 45 seconds. Total DNA or RNA was extracted from 2 mm³ of HBV- or HCV-infected liver, respectively.

HCV (forward) 5′-TGCTAGCCGAGTAGYGTTGG-3′
HCV (reverse) 5′-ACTCGCAAGCACCCTATCAG-3′
HCV (probe) 5′-[JOE] ACCACAAGGCCTTTCGCGAC [BHQ1]-3′
HBV (forward) 5′-CAACCTCCAATCACTCACCAAC-3′
HBV (reverse) 5′-ATATGATAAAACGCCGCAGACAC-3′
HBV (probe) 5′-[CY3.5] TCCTCCAATTTGTCCTGGTTATCGCT [BHQ2]-3′

qPCR standards and controls. HCV: genotype 1a WHO International Standard (NIBSC), 154,881 IU/ml. HBV: Eurohep standard reference 1, genotype A, HBsAg subtype adw, WHO International Standard (NIBSC), 1,000,000 IU/ml. Viral nucleic acids were extracted using the High Pure Viral Nucleic Acid Kit (Roche). Viral load was calculated per ml of eluate with no adjustment for kit extraction efficiency. Triplicate dilutions of the HCV and HBV standards were run at neat and 1 in 5 serial


dilutions together with the samples in triplicate. HCV RNA genome copy number was estimated by using single primers (not pooled) for the reverse transcription step, prior to the addition of the second primer and the PCR cycles, in order to exclude the negative strand from the genome copy estimation. The reverse transcription step was at 50°C for 20 minutes.

HCV and HBV qPCR viral sequence quantitation in processed sample fractions. HCV and HBV viral sequences in each fraction (RNA or DNA virus enriched, total RNA and total DNA) were quantified against NIBSC controls (HCV 06/100 and HBV 97/750) essentially as described previously, with minor modifications [24,26]. All reactions were carried out in a total volume of 25 µl using JumpStart Taq ReadyMix (Sigma). Specific forward and reverse primers (HBV1 and HBV2 for HBV, MAD1 and MAD2 for HCV) were used at a final concentration of 400 nM, and labelled TaqMan probes (BS1 and MAD3 for HBV and HCV, respectively) at 0.2 µM. Following a 10 minute denaturation step at 95°C, 50 cycles of 30 seconds at 60°C and 30 seconds at 95°C were performed.

Illumina NGS protocol

Sequencing was performed using standard Illumina methods. Libraries were created with the Illumina Paired End Genomic DNA Sample Prep kit. Briefly, DNA was sheared into 200–400 bp fragments using a Covaris AFA (Covaris, Woburn, MA), end repaired and given an A-overhang. Illumina paired end adapters were A-T ligated onto the ends of the fragments. Libraries were PCR amplified and each sample was sequenced on one lane of an Illumina GA II sequencer, generating 76 bp paired end reads. For more detail, see Quail et al. [27].

Illumina NGS read set trimming

Read set trimming of the NGS data was performed using the CLC Bio trimming algorithms with the following parameters: failed reads removed on import; ambiguous limit (2); 3′ terminal nucleotides removed (2); homopolymeric tracts of 30 bp removed; SISPA primers removed; and minimum number of nucleotides allowed per read (38 nt).

Illumina NGS viral sequence copy number and reference coverage estimation

To assess overall genome coverage and reference nucleotide coverage, the Illumina reads were mapped to complete viral reference genomes (>95% similarity): PVCCP-N for canine parvovirus, AC_000003 for canine adenovirus and EU155829 for hepatitis B virus. Hepatitis C virus reads were first mapped to the 61 reference genomes from the Los Alamos National Laboratory HCV sequence database and to the HCV genotype 3 reference sequence NC_009824. The HCV subtype 3a consensus genome Ref.3a.DE.HCVCENS1.X76918 was subsequently used for best coverage (<85%) and for sample comparison, using CLC Bio Genomics Workbench v4 (Katrinebjerg, 8200 Aarhus N, Denmark) and BWA [28]. The references were indexed using bwa index -a is, and bwa aln and bwa sampe were used to obtain the paired end alignments. No reads could be mapped to the 5′UTR or to the first 400 bp of the core protein sequence of any of the confirmed or unconfirmed HCV genotypes/subtypes, indicative of a novel subtype as defined by this hypervariable region.

Illumina NGS viral contig de novo assembly

De novo assemblies of viral contigs were generated using the CLC Bio Genomics Workbench v4 (Katrinebjerg, 8200 Aarhus N, Denmark) and ABySS 1.2.7 [21] in tandem. For CLC de novo assembly, paired and unpaired reads were co-assembled with a paired read distance (min/max) of 180/380. For ABySS, a k-mer size of 37 was used and the sequences were assembled as pairs. Contig consensus sequences were mapped to the viral reference genomes to determine total coverage as well as contig size relative to genome size. Sequence alignment to the reference genomes was performed using the sequence alignment algorithms from CLC Genomics Workbench v4.0 and bwa. In the CAV and CPV enriched NGS data sets, viral reads made up more than 20% of the total. De novo assembly was ineffective, probably because of "noise" from accumulated errors in NGS data sets containing millions of target viral reads. This occurred with both de novo assemblers and was overcome by a combination of splitting the read sets into sets of ≤1 million reads and increasing the alignment stringency of the reduced sets.

Author Contributions

Conceived and designed the experiments: GD. Performed the experiments: GD NB SS J. Heaney. Analyzed the data: GD AM AP ND. Contributed reagents/materials/analysis tools: WB GA NB PK MC. Wrote the paper: GD J. Heeney LT.

References

1. Reyes GR, Kim J (1991) Sequence-independent, single-primer amplification (SISPA) of complex DNA populations. Mol Cell Probes 5: 473–481.
2. Palacios G, Quan P, Jabado OJ, Conlan S, Hirschberg DL, et al. (2007) Panmicrobial oligonucleotide array for diagnosis of infectious diseases. Emerg Infect Dis 13: 73–81.
3. Wang D, Urisman A, Liu YT, Springer M, Ksiazek TG, et al. (2003) Viral discovery and sequence recovery using DNA microarrays. PLoS Biol 1: E2.
4. Wang D, Coscoy L, Zylberberg M, Avila PC, Boushey HA, Ganem D, et al. (2002) Microarray-based detection and genotyping of viral pathogens. Proc Natl Acad Sci USA 99: 15687–15692.
5. Allander T, Emerson SU, Engle RE, Purcell RH, Bukh J (2001) A virus discovery method incorporating DNase treatment and its application to the identification of two bovine parvovirus species. Proc Natl Acad Sci USA 98: 11609–11614.
6. Ambrose H, Clewley J (2006) Virus discovery by sequence-independent genome amplification. Rev Med Virol 16: 365–383.
7. Delwart E (2007) Viral metagenomics. Rev Med Virol 17: 115–131.
8. Djikeng A, Halpin R, Kuzmickas R, DePasse J, Feldblyum J, et al. (2008) Viral genome sequencing by random priming methods. BMC Genomics 9: 5.
9. Telenius H, Carter NP, Bebb CE, Nordenskjold M, Ponder BA, et al. (1992) Degenerate oligonucleotide-primed PCR: general amplification of target DNA by a single degenerate primer. Genomics 13: 718–725.
10. Stang A, Korn K, Wildner O, Uberla K (2005) Characterization of virus isolates by particle-associated nucleic acid PCR. J Clin Microbiol 43: 716–720.
11. De Souza Luna LK, Baumgarte S, Grywna K, Panning M, Drexler JF, et al. (2008) Identification of a contemporary human parechovirus type 1 by VIDISCA and characterisation of its full genome. Virol J 5: 26.
12. De Vries M, Pyrc K, Berkhout R, Vermeulen-Oost W, Dijkman R, et al. (2008) Human parechovirus type 1, 3, 4, 5, and 6 detection in picornavirus cultures. J Clin Microbiol 46: 759–762.
13. De Vries M, Deijs M, Canuti M, van Schaik BD, Faria NR, et al. (2011) A sensitive assay for virus discovery in respiratory clinical samples. PLoS ONE 6: e16118.
14. Pyrc K, Jebbink MF, Berkhout B, Van der Hoek L (2008) Detection of new viruses by VIDISCA. Virus discovery based on cDNA-amplified fragment length polymorphism. Methods Mol Biol 454: 73–89.
15. Tan LH, van Doorn R, van der Hoek L, Hien VM, Jebbink MF, et al. (2011) Random PCR and ultracentrifugation increases sensitivity and throughput of VIDISCA for screening of pathogens in clinical specimens. J Infect Dev Ctries 5: 142–148.
16. Feng H, Taylor JL, Benos PV, Newton R, Waddell K, et al. (2007) Human transcriptome subtraction by using short sequence tags to search for tumor viruses in conjunctival carcinoma. J Virol 81: 11332–11340.
17. Feng H, Shuda M, Chang Y, Moore PS (2008) Clonal integration of a polyomavirus in human Merkel cell carcinoma. Science 319: 1096–1100.


18. Palacios G, Druce J, Du L, Tran T, Birch C, et al. (2008) A new arenavirus in a cluster of fatal transplant-associated diseases. N Engl J Med 358: 991–998.
19. Weber G, Shendure J, Tanenbaum DM, Church GM, Meyerson M (2002) Identification of foreign gene sequences by transcript filtering against the human genome. Nat Genet 30: 141–142.
20. Victoria J, Kapoor A, Dupuis K, Schnurr DP, Delwart EL, et al. (2008) Rapid identification of known and new RNA viruses from animal tissues. PLoS Pathog 4: e1000163.
21. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, et al. (2009) ABySS: a parallel assembler for short read sequence data. Genome Res 19: 1117–1123.
22. Castillo I, Pardo M, Bartolome J, Ortiz-Movilla N, Rodríguez-Iñigo E, et al. (2004) Occult hepatitis C infection in patients in whom the etiology of persistently abnormal results of liver-function tests is unknown. J Infect Dis 189: 7–14.
23. De Marco L, Gillio-Tos A, Fiano V, Ronco G, Krogh V, et al. (2009) Occult HCV infection: an unexpected finding in a population unselected for hepatic disease. PLoS ONE 4: e8128.
24. Candotti D, Temple J, Owusu-Ofori S, Allain JP (2004) Multiplex real-time quantitative RT-PCR assay for hepatitis B virus, hepatitis C virus, and human immunodeficiency virus type 1. J Virol Methods 118: 39–47.
25. Hsia CC, Purcell RH, Farshid M, Lachenbruch PA, Yu MW (2006) Quantification of hepatitis B virus genomes and infectivity in human serum samples. Transfusion 46: 1829–1835.
26. Weinberger KM, Bauer T, Bohm S, Jilg W (2000) High genetic variability of the group-specific a-determinant of hepatitis B virus surface antigen (HBsAg) and the corresponding fragment of the viral polymerase in chronic virus carriers lacking detectable HBsAg in serum. J Gen Virol 81: 1165–1174.
27. Quail MA, Kozarewa I, Smith F, Scally F, Stephens PJ, et al. (2008) A large genome center's improvements to the Illumina sequencing system. Nat Methods 5: 1005–1010.
28. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.


ARTICLE 3:

Changes in Clinical Trials
Methodology Over Time: A
Systematic Review of Six Decades
of Research in Psychopharmacology


André R. Brunoni1, Laura Tadini2,3, Felipe Fregni3*

1 Department and Institute of Psychiatry, University of Sao Paulo, Sao Paulo, Brazil, 2 Centro Clinico per le Neuronanotecnologie e la Neurostimolazione, Fondazione
IRCCS Ospedale Maggiore Policlinico, Mangiagalli e Regina Elena, Milan, Italy, 3 Berenson-Allen Center for Noninvasive Brain Stimulation, Beth Israel Deaconess Medical
Center, Harvard Medical School, Boston, Massachusetts, United States of America

Abstract

Background: There have been many changes in clinical trials methodology since the introduction of lithium and the
beginning of the modern era of psychopharmacology in 1949. The nature and importance of these changes have not been
fully addressed to date. As methodological flaws in trials can lead to false-negative or false-positive results, the objective of
our study was to evaluate the impact of methodological changes in psychopharmacology clinical research over the past 60
years.

Methodology/Principal Findings: We performed a systematic review from 1949 to 2009 on MEDLINE and Web of Science
electronic databases, and a hand search of high impact journals, on studies of seven major drugs (chlorpromazine, clozapine,
risperidone, lithium, fluoxetine, lamotrigine and diazepam). All controlled studies published within 100 months after the first trial were
included. Ninety-one studies met our inclusion criteria. We analyzed the major changes in abstract reporting, study design,
participants' assessment and enrollment, methodology and statistical analysis. Our results showed that the methodology of
psychiatric clinical trials changed substantially, with quality gains in abstract reporting, results reporting, and statistical
methodology. Recent trials more often report informed consent and more often use washout periods, an intention-to-treat approach
and parametric tests. Placebo use remains high and unchanged over time.

Conclusions/Significance: The quality of psychopharmacological clinical trials has changed significantly in most of the
aspects we analyzed, with significant improvements in reporting quality and internal validity. These changes have
increased study efficiency; however, despite these advancements, there is still room for improvement in psychopharmacology
clinical trials in aspects such as rating scales, diagnostic criteria and trial reporting.

Citation: Brunoni AR, Tadini L, Fregni F (2010) Changes in Clinical Trials Methodology Over Time: A Systematic Review of Six Decades of Research in
Psychopharmacology. PLoS ONE 5(3): e9479. doi:10.1371/journal.pone.0009479

Editor: Roberta W. Scherer, Johns Hopkins Bloomberg School of Public Health, United States of America

Received September 4, 2009; Accepted February 11, 2010; Published March 3, 2010

Copyright: © 2010 Brunoni et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The authors have no support or funding to report.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: ffregni@bidmc.harvard.edu

Introduction

Clinical trials gained importance in medical research after World War II, when there was a rapid increase in drug development and research. Psychopharmacology is a field that reflects this marked increase in the use of clinical trials. In fact, the modern era of psychopharmacology began only in 1949, when lithium was reintroduced in psychiatry [1], followed by the release of chlorpromazine (1954), imipramine (1958) and several others. These new drugs brought dramatic changes to psychiatric practice and research, as a new study methodology had to be developed for a field that, until then, had virtually no pharmacological therapies. Products of this new methodology included severity rating scales and new diagnostic criteria, which eventually led to the third and fourth editions of the Diagnostic and Statistical Manual of Mental Disorders (DSM) [2]. Meanwhile, medical clinical research itself also experienced advancements such as novel study designs, better methods of blinding and randomization, more sophisticated statistical methods and better definition of outcomes [3].

Presently, psychiatric research faces important challenges. For instance, although psychiatric drugs have distinct mechanisms of action, they seem to have the same efficacy in clinical trials [4]. Moreover, the assessment of outcomes is mostly based upon severity scales that are somewhat subjective [5]. Another issue is that the diagnostic criteria are "operational", meaning that a minimum number of symptoms must be present to fulfill a diagnosis, which does not always reflect clinical practice [6]. Consequently, there is concern about whether psychiatric clinical trials are methodologically adequate and, if not, which aspects of trial design should be further improved [7]. Therefore, it is important to analyze how these aspects have changed over time, in order to understand our current methodological practice and to be able to address whether the results of past trials, which in

PLoS ONE | www.plosone.org 1 March 2010 | Volume 5 | Issue 3 | e9479

Overview of Clinical Trials

many cases support our current therapeutics, are valid. Finally, as more recent clinical studies in psychopharmacology are failing to achieve positive results, new paths for clinical trial design are needed [7].

Therefore, a critical overview of the methodology used in past and current clinical trials can advance psychopharmacologic research. Our aim is to examine the major changes in clinical trial design by reviewing selected studies published in high-impact journals over the past sixty years. The purpose of our study is to work towards a better understanding of the development of psychopharmacological clinical trials, and thereby to identify future directions for their continuous advancement.

Methods

Eligibility Criteria

Because a review of all psychopharmacological drug clinical trials over the past sixty years is unfeasible, we reviewed only studies published in high-impact, influential general medical journals (The New England Journal of Medicine [NEJM], JAMA, Lancet and British Medical Journal) and psychiatric journals (Archives of General Psychiatry, The American Journal of Psychiatry [AJP], The Journal of Mental Science/British Journal of Psychiatry [BJP] and The Journal of Clinical Psychiatry [JCP]). It would also be unfeasible to review all of the drugs currently or ever used in psychiatry; therefore we looked for important psychiatric drugs developed at different time periods that: (1) are currently used in psychiatry (for ease of interpretation of results); (2) are used in psychotic, mood or anxiety disorders (since such disorders rely significantly on psychopharmacological therapies); and (3) were introduced at different times, so as to cover the whole period reviewed. The selected drugs were: lithium (the most effective and frequently used drug for bipolar disorder) [8]; chlorpromazine (one of the most important drugs in the history of psychiatry) [9]; diazepam (the most widely used benzodiazepine) [10]; clozapine (the most effective antipsychotic drug to date) [11]; fluoxetine (the prototypical, most studied antidepressant) [12]; risperidone (the first second-generation antipsychotic introduced) [13]; and lamotrigine (the first drug since lithium to be FDA approved for maintenance treatment of bipolar disorder) [14].

We also looked only for studies published within 100 months after the first retrieved article, the period in which efficacy studies are typically conducted. The exceptions were lithium and clozapine, for which we expanded the search to twenty years, as these drugs were not initially available in the U.S. owing to several deaths reported in relation to their unmonitored use [15]. It should be underscored that three possible strategies were considered in our study: (1) to review all studies over 60 years on one drug only; (2) to review all studies on one mental condition only; (3) the present strategy. However, the first strategy would hinder the review of newer drugs, while older drugs are currently seldom researched for efficacy. The second strategy presupposes that diagnostic criteria remained stable over time, which is not the case: in 60 years, there were 4 editions of the Diagnostic and Statistical Manual of Mental Disorders (DSM) and 5 of the International Classification of Diseases (ICD), with different diagnostic nomenclatures. For instance, the current diagnosis of major depressive disorder did not exist in DSM-II, in which depressed patients would probably have been diagnosed with depressive neurosis; involutional melancholia; manic-depressive illness, depressed type; or neurasthenic neurosis [16]. Moreover, there is no single diagnosis for which different drugs were tested in efficacy trials over this entire period. Finally, the present strategy allowed us to consider several drugs and diagnoses, thus extending the scope of this review of changes over time.

Search and Collection of the Data

Our search strategy is shown in Figures 1, 2 and 3. We considered the following databases: MEDLINE, Web of Science, Cochrane and EMBASE. For drugs introduced before 1970, the first author (ARB) also searched the web sites of the journals containing past issues. The first (ARB) and the second (LT) authors also performed hand searches in the libraries of the University of Sao Paulo Medical School and Harvard Medical School (Countway Medical Library), respectively. Finally, ARB and LT examined reference lists in systematic reviews and retrieved papers, and contacted experts in the field. The keyword used for each drug review was the name of the drug, limited by the time period and by the referred journals (Figures 1, 2, 3). The procedures carried out in this review are consistent with the Cochrane guidelines for reporting systematic reviews and meta-analyses [17] and also with the QUOROM guidelines (Table S1).

The inclusion criteria for each drug were: (1) clinical studies on anxiety, mood or psychotic disorders; (2) all controlled, randomized, interventional trials, whether testing the drug's therapeutic or prophylactic properties (i.e., response/remission or relapse/recrudescence). We excluded: (1) other designs, such as case reports, case series, observational designs or quasi-experimental studies; (2) studies whose primary aim was not to test drug efficacy (e.g., psychometric studies); (3) clinical trials performed for conditions other than those specified (e.g., lithium in hyperactive children [18]); and (4) studies in animals. Since all selected journals are published in English, language restriction was not an issue.

Data Extraction

The first author (ARB) performed the data extraction and entered the extracted variables into the database, while the second author (LT) checked whether the data were correctly recorded. The third author (FF) reviewed a random sample of the articles to recheck for errors in data extraction or interpretation. Disagreements were resolved by consensus. We designed a semi-structured checklist, based on previous methodological reviews of clinical trials [19,20,21,22,23], to address the following aspects:

(1) general characteristics (author names, publication year, journal and sources of financial support);

(2) abstract reporting, in which complete reporting of the background, methods and results in the abstract (yes/no for each one) was considered;

(3) study design, assessing number of centers (uni- vs. multicentric), use of washout (yes vs. no vs. drug-free), use of placebo arm (yes vs. no), study design (2-arm vs. 3-arm vs. other designs) and use of intention-to-treat analysis (yes vs. no);

(4) participants section, assessing the sample size, the reporting of informed consent (yes vs. no) and eligibility criteria (clear vs. unclear), the method for evaluating diagnostic severity (personal judgment vs. rating scales) and for confirming the diagnosis (clinical interview vs. structured questionnaires);

(5) methods section, assessing whether the reported method of randomization was adequate vs. inadequate vs. biased; the method for allocation concealment (adequate vs. inadequate vs. biased); sample size calculation reporting (yes vs. no); and statement of primary hypothesis (adequate vs. inadequate);

(6) results reporting, assessing the reporting of baseline comparisons (adequate vs. inadequate), of adverse effects (adequate vs. inadequate) and of dropout reasons (adequate vs. inadequate); and the use of parametric tests (yes/no);

(7) conclusion section, assessing whether the trial was reported as positive vs. negative vs. unclear; and whether the conclusions


Figure 1. Flow chart for the selection of Risperidone and Fluoxetine studies.
doi:10.1371/journal.pone.0009479.g001

presented were consistent with the results (consistent vs. (12.2%) on risperidone. Most trials were published in the BJP (30
inconsistent vs. dubious). trials, 33%), the JCP (20 trials, 22%) and the AJP (19 trials, 21%).
We did not identify any trials from NEJM. Twenty- four trials
The criteria used for data classification are presented in Table 1. were performed in 1961 or earlier, 23 trials throughout 1962–74,
22 trials throughout 1975–89 and 22 trials from 1990 to 2003.
Data Analysis Also, we were not able to identify the major source of sponsorship
The variables collected were treated as outcome variables, and each one was analyzed separately. “Year” was the main predictor variable, used to assess whether each outcome changed over time. We performed a separate analysis using drug class (3 levels: antipsychotics – clozapine, chlorpromazine and risperidone; mood stabilizers – lamotrigine and lithium; and others – fluoxetine and diazepam) to assess a possible confounding effect of drug class. “Year” was treated both as a continuous and as an ordinal variable (divided into equal quartiles). When it was treated as continuous, logistic regressions were applied; when it was treated as ordinal, we used the chi-square test or Fisher’s exact test. Analyses were performed using Stata statistical software, version 9.0 (StataCorp, College Station, TX, USA) and SPSS software, version 16. As shown below, analyses using both methods yielded quite similar results.

Results

Ninety-one articles were reviewed, 24 (26.7%) on chlorpromazine, 20 (21%) on lithium, 8 (8.9%) on diazepam, 6 (6.7%) on clozapine and lamotrigine each, 16 (17.8%) on fluoxetine and 11

in 48 (52%) of the studies. In 36 studies, we classified the sponsorship as public, while in 7 it was classified as private. The difficulty is that newer trials have many authors, and each one usually has one or more funding sources. For example, one article [24] reported funding from an NIH grant, two foundation award grants, and a public, local mental health grant, and its first author was a member of the speakers’ bureau for four pharmaceutical companies, one of them the sponsor of the tested drug. In such cases, we classified the sponsorship as “unclear”. As this occurred in 52% of the studies, we did not perform further statistical analyses.

The individual characteristics of each trial are presented in the Appendix (Table S2). Table 2 presents the summary characteristics of the reviewed studies, and Table 3 shows the analyses run for categorical variables.

Regarding abstract reporting, the quality of reporting improved over time in all sections of the abstract (background, methods and results; p<0.01 for all analyses) (Figure 4).
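The Methods state that, with year as a continuous predictor, each binary outcome was analyzed by logistic regression (the authors used Stata 9.0 and SPSS 16). As an illustration only, here is a dependency-free Python sketch that fits a one-predictor logistic model by Newton-Raphson and returns the slope B, its standard error and a two-sided Wald p-value, which are the kinds of quantities tabulated in Table 3. The function name, the centering of year and the use of a Wald test are assumptions of this sketch, not details given in the paper.

```python
import math


def _sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic_year(years, outcomes, max_iter=100, tol=1e-10):
    """Fit logit P(outcome = 1) = b0 + b1 * (year - mean_year) by Newton-Raphson.

    Returns (b1, se_b1, wald_p). Centering year keeps the Newton steps well
    conditioned; it shifts the intercept but leaves the slope b1 unchanged.
    Perfect separation is not handled (the estimates would diverge).
    """
    mean_year = sum(years) / len(years)
    x = [yr - mean_year for yr in years]
    b0 = b1 = 0.0

    def score_and_information(c0, c1):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, outcomes):
            p = _sigmoid(c0 + c1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p          # score: gradient of the log-likelihood
            g1 += (yi - p) * xi
            i00 += w              # entries of the 2x2 Fisher information matrix
            i01 += w * xi
            i11 += w * xi * xi
        return g0, g1, i00, i01, i11

    for _ in range(max_iter):
        g0, g1, i00, i01, i11 = score_and_information(b0, b1)
        det = i00 * i11 - i01 * i01
        # Newton step: solve I * delta = g using the explicit 2x2 inverse
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break

    _, _, i00, i01, i11 = score_and_information(b0, b1)
    var_b1 = i00 / (i00 * i11 - i01 * i01)  # lower-right entry of I^-1
    se_b1 = math.sqrt(var_b1)
    z = b1 / se_b1
    wald_p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return b1, se_b1, wald_p
```

With per-trial data, `years` would hold publication years and `outcomes` would be 1 for “adequate” and 0 otherwise, so a positive slope means the log-odds of adequate reporting rise with publication year. The per-trial data themselves (Table S2) are not reproduced here.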

PLoS ONE | www.plosone.org 3 March 2010 | Volume 5 | Issue 3 | e9479

Overview of Clinical Trials

Figure 2. Flow chart for the selection of Clozapine and Lamotrigine studies.
doi:10.1371/journal.pone.0009479.g002
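For the ordinal analysis described in the Methods (year grouped into four periods and tested with the chi-square test), here is a minimal plain-Python sketch of the Pearson statistic for a 2 × k table, with an exact closed-form upper-tail probability for integer degrees of freedom. Applied to the randomization counts in Table 2, it gives χ² ≈ 20.83, the value Table 3 reports for randomization. The function names are illustrative. When some expected counts are small (here the smallest is about 4.1), an exact test such as Fisher’s is generally preferred; that test is not shown.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_i x^i / (1*3*...*(2i+1))
    series, term = 0.0, 1.0
    for i in range((df - 1) // 2):
        series += term
        term *= x / (2 * i + 3)
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * series)


def chi_square_by_period(successes, totals):
    """Pearson chi-square for a 2 x k table ('adequate' vs 'not adequate' per period)."""
    n = sum(totals)
    s = sum(successes)
    stat = 0.0
    for a, t in zip(successes, totals):
        for observed, row_total in ((a, s), (t - a, n - s)):
            expected = t * row_total / n
            stat += (observed - expected) ** 2 / expected
    df = len(totals) - 1
    return stat, df, chi2_sf(stat, df)


# Randomization reported adequately, per period (Table 2):
# 1961 and earlier, 1962-1974, 1975-1989, 1990 and after.
stat, df, p = chi_square_by_period([4, 2, 0, 11], [24, 23, 22, 22])
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.5f}")
```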

In the “participants” section, we noticed a significant improvement in the clarity of eligibility criteria (p<0.01). Examples of unclear eligibility criteria were: “anxiety enough to require a tranquilizer” (comparing Diazepam and Lorazepam) [25]; “the most aggressive and disturbed untreated patients” (comparing Chlorpromazine and Prochlorpromazine) [26]; “patients needing ECT” (comparing Diazepam and Amitriptyline) [27]; and “when chlorpromazine was [considered] the treatment of choice” (comparing Chlorpromazine and ‘Pacatal’). Newer trials also used structured interviews more often to confirm a diagnosis, while older trials relied mainly on clinical interviews (p<0.01). Accordingly, newer trials used severity rating scales more frequently than older trials, which assessed severity based on the physician’s judgment (p<0.01). Performance bias was also possible in some of the studies, as the raters were not blinded to the interventions, which could theoretically favor the experimental arm. Newer trials also performed or reported sample size calculations more often than older trials (p<0.01). The sample sizes of newer studies were marginally larger than those of older studies (p = 0.04 and 0.03 for year as continuous and as ordinal, respectively); however, this difference could be explained by a recent (1995) trial [28] that is twice as large as the next largest study [29]. Finally, newer trials reported or used informed consent more often than older trials (p<0.01). Signs of poor ethical standards were observed in some of the older trials. For example, in one relapse trial of lithium vs. placebo for manic-depressive illness, ambulatory patients had their drug changed to placebo without their knowledge [30].

Figure 3. Flow chart for the selection of Chlorpromazine, Lithium and Diazepam studies.
doi:10.1371/journal.pone.0009479.g003

Regarding study design, a two-arm parallel design was used more often in newer trials than three-arm and other designs (p<0.01) (Figure 5). The number of studies using placebo arms did not change over time (p = 0.13 for year as both continuous and ordinal). Newer studies were also associated with multicentric designs, drug washout before trial onset, and intention-to-treat analyses (p<0.01 for all variables) (Figure 6).

We noticed that six studies reported clearly biased methods of randomization and allocation: alternating admission to the ward [31]; using 25 red and 25 black cards for group assignment [32]; the physician’s judgment of the best therapy (insulin coma or chlorpromazine) [33]; randomization and assignment performed by the hospital pharmacist, “the choice having been made by him at random”, although 45 patients received active drugs and 25 received control tablets [34]; assignment according to the patient’s willingness to undergo weekly blood tests (mandatory when taking clozapine) [35]; and the physician’s judgment of the best therapy (olanzapine or risperidone) [36]. In these cases, although the methods were reported, we classified them as “inadequate” and analyzed them accordingly. The results showed that the reporting of sequence generation methods improved over time (p = 0.01 and p<0.01 for year as continuous and as ordinal, respectively), while the reporting of allocation concealment did not (p = 0.39 and p = 0.08 for year as
continuous and as ordinal, respectively). However, the overall number of trials reporting randomization and allocation methods was low (18% and 10%, respectively). Also, eight trials were neither double-blinded nor single-blinded with external raters. Four of them compared patients receiving pharmacological vs. non-pharmacological treatments (ECT, insulin therapy or psychotherapy) [31,33,37,38]. One used a no-treatment arm [39]; one was initially double-blinded, but patients and physicians discovered the assignment because the pills taken differed in color, size and quantity between arms [32]; one had patients in one group undergoing weekly blood tests while the other group did not [35]; and in another study, patients knew their assignment groups [36]. The other 83 trials used double-blinded or “double-dummy” techniques. Figure 7 visually assesses these changes.

Regarding the results section, newer trials adequately reported “baseline group comparisons” (p<0.01) and “adverse effects” of drugs (p<0.01) more often than older trials, but not “reasons for drop-outs” (p = 0.34 and p = 0.41 for year as continuous and ordinal, respectively). Newer trials also reported p statistics more often than older trials (p<0.01) and used more parametric tests (p<0.01).

Table 1. Criteria used for data classification in the present review.

Abstract reporting
  Background
    Adequate - when a synthesis of the current knowledge and study objectives was provided.
  Methods
    Adequate - when the trial design, the subjects, and the interventions were described.
  Results
    Adequate - when the results, the primary outcome and the main conclusions were described.
Study design
  Wash-out
    Yes - if prior treatments were withdrawn before the trial started.
  Intention to treat
    Yes - if the analysis considered the entire sample, including patients who later dropped out.
  Sample size calculation
    Yes - if a sample size analysis was performed and presented.
  Informed consent
    Yes - if the use of informed consent is described.
Subjects
  Eligibility criteria
    Clear - the study population can be reproduced with the information given.
    Unclear - the study population cannot be reproduced and/or there is evidence of enrollment bias.
  Diagnostic criteria
    Clinical interview - the diagnosis was confirmed by a clinical interview.
    Structured form - the diagnosis was confirmed by using a structured questionnaire.
  Diagnostic severity
    Rating scales - when rating scales were used to assess severity.
    Physician judgment - when the physician judged the degree of improvement and/or severity.
Methods
  Randomization
    Adequate - when the method of sequence generation was reported.
    Inadequate - when the sequence generation method was not reported.
    Evidence of bias - when the method was described but it was biased.
  Allocation
    Adequate - when the method of allocation concealment was reported.
    Inadequate - when the allocation concealment method was not reported.
    Evidence of bias - when the method was described but it was biased.
  Primary hypothesis
    Adequate - the primary hypothesis was clearly stated.
    Inadequate - the primary hypothesis was not stated or was incompletely stated.
Results
  Baseline comparisons
    Adequate - when the groups were compared at baseline.
    Inadequate - when the groups were not compared at baseline.
  Adverse effects
    Adequate - the adverse effects were fully reported.
    Inadequate - the adverse effects were not reported or were partially reported.
  Dropout reasons
    Adequate - the reasons for dropouts were assessed and presented.
    Inadequate - the dropout reasons were not presented or not fully reported.
  p value
    Adequate - the p value of the primary outcome was reported.
Conclusion
  Trial result
    Positive - the authors stated that their main hypothesis was proved.
    Negative - the authors stated that they failed to prove their main hypothesis.
    Unclear - the authors did not clearly state whether or not their main hypothesis was proved.
  Consistency
    Yes - the conclusion is supported by the study results.
    Dubious - lack of trial quality or overinterpretation of results.
    No - there is clear evidence of bias in the study, or the conclusion is clearly not coherent with the results shown.

doi:10.1371/journal.pone.0009479.t001

In the conclusion section, we assessed whether the results were presented as positive or negative, or without a clear statement. We also recorded whether or not the conclusion was supported by the results, according to our previous definitions (Table 1). Some examples of the 35 trials classified as “dubious” were: a lamotrigine vs. placebo trial that concluded the active drug “is associated with superior efficacy” although this was true for some but not all analyses [40]; and a trial comparing acetophenazine vs. diazepam in anxious depression that reported several comparisons and was not able to conclude which one was better [41]. Examples of inconsistent conclusions were: an underpowered trial that compared lithium vs. chlorpromazine in 23 patients with mania and concluded that “lithium is apparently superior (…) in mania”, although the author reported that “lithium was superior on all scales, this was not statistically significant on any(…)”. He explained his conclusion by arguing that “in this study and all previous ones these findings are based on poor methodological techniques…. due to the nature of the illness and the [nature of] the drugs, no reasonable (…) trial can ever be performed” [42]. Another example is a 1959 trial in which the author compared the effects of 4 drugs in geriatric patients with various diagnoses. His severity assessment was based on four dimensions (social, intellectual, mood and thought improvement) and included his own clinical evaluation, a psychologist’s evaluation and the evaluation of the “nurses and psychiatric aides”, performed twice a week for 18 weeks. In the end, though, the author stated that “since it was impossible quantitatively to weigh these fluctuating factors, the final judgment in assessing the patient’s responses was necessarily a clinical decision based on the accumulated data” [43]. Importantly, the 12 studies rated as “inconsistent” had some signs of
methodological flaws. Four were single-blinded, 4 did not report the gender proportion, 3 did not report the mean age of the subjects, none used intention-to-treat analysis, 10 had unclear eligibility criteria, 9 did not report randomization methods and 10 did not state detailed adverse effects.

We observed that newer trials reached conclusions consistent with their results (as opposed to dubious or inconsistent conclusions) more often than older trials (p<0.01), an association that remained significant when the variable “positive or negative results” was entered into the model (p<0.01). We did not observe a particular trend toward more positive results (compared with negative or unclear results) over time (p = 0.16) (Figures 8, 9 and 10).

Finally, we ran separate analyses by drug class to address whether it could explain the differences observed. Of the 24 analyses performed, we observed associations between the drug class “other” and the variables informed consent (p = 0.01), use of placebo (p = 0.01), randomization (p = 0.02), allocation (p = 0.02), baseline comparison (p = 0.04) and consistency of results (p<0.01). However, in all cases the difference was significant only for the group “others”, which included fluoxetine and diazepam, so these results do not properly show a “drug class effect”. Also, since the results were only marginally significant, they are probably false-positive findings.

Table 2. Summary characteristics of the studies.

Characteristic | 1961 and earlier | 1962–1974 | 1975–1989 | 1990 and after
General characteristics
Number of trials | 24 | 23 | 22 | 22
Drug class: Antipsychotic | 24 | 0 | 4 | 13
Drug class: Mood stabilizer | 0 | 15 | 5 | 6
Drug class: Other | 0 | 8 | 13 | 3
Disorder: Psychosis (*) | 23 | 0 | 3 | 12
Disorder: Affective disorders (**) | 0 | 16 | 4 | 6
Disorder: Anxiety disorders | 1 | 7 | 0 | 2
Disorder: Unipolar depression | 0 | 0 | 15 | 2
Centers: Multicentric | 3 | 8 | 6 | 15
Number of subjects: Mean (SE) | 120 (30) | 69 (14) | 86 (15) | 214 (65)
Abstract
Background: Adequate | 2 | 2 | 8 | 17
Methods: Adequate | 8 | 14 | 14 | 21
Results: Adequate | 5 | 5 | 8 | 21
Study design
Wash-out: Reported | 4 | 2 | 12 | 17
Intention-to-treat: Performed | 0 | 1 | 4 | 16
Sample size calculation: Reported | 0 | 1 | 0 | 7
Informed consent: Reported | 0 | 3 | 15 | 19
Number of arms: 2-arm | 6 | 14 | 20 | 18
Number of arms: 3-arm | 4 | 2 | 2 | 2
Subjects
Eligibility criteria: Clear | 2 | 2 | 18 | 22
Diagnostic criteria: Structured form | 0 | 1 | 0 | 9
Diagnostic severity: Rating scales | 12 | 13 | 20 | 19
Methods
Randomization: Adequate | 4 | 2 | 0 | 11
Allocation: Adequate | 5 | 0 | 1 | 3
Primary hypothesis: Adequate | 0 | 4 | 2 | 12
Results
Baseline comparisons: Adequate | 4 | 11 | 11 | 21
Adverse effects: Adequate | 5 | 4 | 11 | 16
Dropout reasons: Adequate | 15 | 14 | 15 | 19
p value: Adequate | 6 | 7 | 10 | 18
Conclusions
Trial result: Positive | 7 | 15 | 12 | 13
Trial result: Negative | 10 | 6 | 8 | 6
Trial result: Unclear | 7 | 2 | 1 | 3
Consistency: Yes | 1 | 11 | 13 | 18
Consistency: Dubious | 14 | 10 | 7 | 4
Consistency: No | 9 | 2 | 1 | 0

All data are presented as the number (count) of trials per period, except the number of subjects, which is presented as mean and standard error.
(*) Includes schizophrenia, “paraphrenia”, “elderly patients with psychosis” and other types of non-affective psychosis.
(**) Includes manic-depressive illness, mania, and bipolar disorder.
doi:10.1371/journal.pone.0009479.t002

Discussion

Our results show that the methodology of clinical trials changed substantially over the past 60 years, with significant improvement in quality of reporting and in internal validity. The gains in reporting quality were seen in abstract reporting, where reports became more complete in all subsections (background, methods and results) over time. Improvement was also observed in results reporting, as p values, effect sizes, baseline group comparisons and adverse effects were more completely reported
over time. Internal validity also increased, since newer studies used more explicit eligibility criteria, objective rating scales, and intention-to-treat analyses. Newer studies also showed less biased methods of randomization and blinding. Accordingly, the conclusions of newer studies were more appropriate and more consistent with their results than those of older trials. Study design also changed in some aspects over time: sample size increased, more studies performed (or reported) sample size calculations, and 2-arm designs replaced 3-or-more-arm designs. Placebo use did not change. We further discuss some topics in which these changes affected the development of clinical trials, and discuss future directions based on these results.

Table 3. Data analysis and study results.

Outcome variable | Level | Year (continuous): B (SE) | p | Year (ordinal): χ² or ANOVA | p | Drug class: χ² or ANOVA | p
Abstract reporting
Background | Adequate (vs. Inadequate) | −0.11 (0.02) | <0.01 | 33.8 | <0.01 | 2.54 | 0.28
Methods | Adequate (vs. Inadequate) | −0.08 (0.02) | <0.01 | 19.4 | <0.01 | 5.4 | 0.07
Results | Adequate (vs. Inadequate) | −0.1 (0.02) | <0.01 | 37.1 | <0.01 | 1.6 | 0.45
Subjects section
Eligibility criteria | Clear (vs. Unclear) | −0.23 (0.05) | <0.01 | 35.4 | <0.01 | 4.03 | 0.13
Diagnostic criteria | Interview (vs. Structured) | 0.15 (0.5) | <0.01 | 17.4 | <0.01 | 5.7 | 0.06
Diagnostic severity | Scales (vs. Judgment) | 0.06 (0.02) | <0.01 | 66.3 | <0.01 | 4.56 | 0.11
Study design
Informed consent | Yes (vs. No) | 0.17 (0.03) | <0.01 | 49.7 | <0.01 | 12.45 | 0.01
Sample size estimation | Yes (vs. No) | 0.14 (0.05) | <0.01 | 19.6 | <0.01 | 3.77 | 0.15
Number of subjects (*) |  | 2.58 (1.24) | 0.04 | 3.06 | 0.03 | 1.83 | 0.17
Number of arms | 2 (vs. 3 and others) | −0.08 (0.02) | <0.01 | 25.8 | <0.01 | 5.08 | 0.08
Use of placebo | Yes (vs. No) | 0.02 (0.01) | 0.13 | 5.7 | 0.13 | 9.8 | 0.01
Wash-out period | Yes (vs. No) | 0.05 (0.02) | <0.01 | 39.96 | <0.01 | 6.32 | 0.17
Centers | Uni (vs. Multicentric) | 0.06 (0.02) | <0.01 | 16.53 | <0.01 | 1.26 | 0.53
Intention-to-treat | Yes (vs. No) | 0.15 (0.03) | <0.01 | 42.6 | <0.01 | 0.56 | 0.75
Methods section
Randomization | Adequate (vs. Inadequate) | 0.05 (0.02) | 0.01 | 20.83 | <0.01 | 8.78 | 0.02
Allocation | Adequate (vs. Inadequate) | −0.02 (0.02) | 0.39 | 6.8 | 0.08 | 7.96 | 0.02
Primary hypothesis | Adequate (vs. Inadequate) | −1.4 (0.26) | <0.01 | 24.34 | <0.01 | 5.78 | 0.55
Results reporting
Baseline comparisons | Adequate (vs. Inadequate) | 0.08 (0.02) | <0.01 | 28.82 | <0.01 | 6.72 | 0.04
Adverse effects | Adequate (vs. Inadequate) | 0.07 (0.02) | <0.01 | 19.37 | <0.01 | 2.91 | 0.23
Reasons for drop-outs | Adequate (vs. Inadequate) | 0.09 (0.03) | 0.34 | 6.1 | 0.41 | 5.63 | 0.21
p value | Adequate (vs. Inadequate) | 0.11 (0.03) | <0.01 | 32.1 | <0.01 | 13.7 | 0.08
Test used | Para (vs. Non-para) | 0.05 (0.2) | <0.01 | 15.06 | <0.01 | 5.57 | 0.06
Conclusion section
Trial result | Positive (vs. others) | −0.02 (0.01) | 0.16 | 10.23 | 0.11 | 14.3 | 0.06
Consistency | Yes (vs. others) | −0.01 (0.2) | <0.01 | 35.8 | <0.01 | 19.2 | <0.01

We used logistic regression to analyze the association between each outcome variable (treated as categorical data) and the predictor variable year (treated as continuous data). We used the chi-square test or Fisher’s exact test for the predictor variables year (treated as ordinal data, divided into quartiles) and drug class (three categories: mood stabilizers, antipsychotics, and others [fluoxetine and diazepam]).
(*) For number of subjects we used one-way ANOVA. B (SE) represents the regression coefficient B and its standard error.
doi:10.1371/journal.pone.0009479.t003

First, some limitations should be addressed. One issue is that we based our results on the published reports; therefore, some of the methodological flaws we encountered may reflect a lack of reporting rather than flawed conduct. Also, publication bias was a potential issue in our study, as we limited it to articles published in high-standard journals.

Figure 4. Changes in abstract reporting over time. Blue, red, and green bars show the number of trials adequately reporting background, methods and results in the abstract, respectively, at each period of time. The number of trials per period was 24 (1961 and earlier), 23 (1962–1974), 22 (1975–1989) and 22 (1990 and after).
doi:10.1371/journal.pone.0009479.g004

We observed that the quality of abstract reporting improved over the past 60 years. One possible explanation is that journal editors and clinical researchers had noticed that reports of statistics, randomization and baseline comparisons were poor [44,45] and proposed a set of guidelines to improve the reporting of clinical trials, which ultimately led to the CONSORT statement [46]. However, our results showed that abstract reporting improved significantly before CONSORT; on the other hand, recent reviews [47,48] of abstract reporting in top impact-factor journals showed improvement after CONSORT as well, and also that many top journals had not been referring to CONSORT or alternative abstract guidelines, or had referred to old CONSORT versions. Thus, another possible reason for this improvement is that the abstract has recently gained importance because it is openly available in web databases, becoming an essential piece of information for deciding whether or not the full manuscript should be read. In fact, frequently only the abstract is read, which argues for a concise abstract showing the main characteristics of the study design (the reader should understand how the main hypothesis was tested by reading the abstract), the main results
presented as clearly and simply as possible, and the future implications of the findings, avoiding overstatements.

Figure 5. Changes in study design over time. Blue bars represent the number of trials performing two-arm studies; red bars are the trials performing three-arm studies. Green bars represent studies using other designs. The number of trials per period was 24 (1961 and earlier), 23 (1962–1974), 22 (1975–1989) and 22 (1990 and after).
doi:10.1371/journal.pone.0009479.g005

Figure 6. Changes in study methodology over time (1). Blue bars represent the number of trials that had a placebo arm at each period of time. Red bars represent the number of studies using intention-to-treat techniques. Green bars represent the number of studies that clearly reported their eligibility criteria. The number of trials per period was 24 (1961 and earlier), 23 (1962–1974), 22 (1975–1989) and 22 (1990 and after).
doi:10.1371/journal.pone.0009479.g006

Moreover, more trials reported the eligibility criteria used, confirmed diagnoses with structured interviews rather than clinical evaluation, and used severity rating scales rather than personal judgment of improvement. Using structured questionnaires improves study validity and reliability, as they are more sensitive for differential diagnosis [49] and show more agreement between raters than unstructured evaluations [50]. Reporting the eligibility criteria and using
severity rating scales allows readers and researchers to assess the targeted sample and thus to evaluate the generalizability of the study results [51]. However, diagnostic criteria standardization can also generate heterogeneous diagnostic groups. For instance, according to DSM-IV criteria, there are 93 different combinations of depressive symptoms [6], reflecting patients with different characteristics who fall under the same DSM-IV “depression” classification.
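The point that structured instruments yield more agreement between raters than unstructured evaluations [50] is usually quantified with a chance-corrected agreement statistic. Here is a minimal sketch of Cohen’s kappa for two raters. The choice of kappa, the function name and the example labels are illustrative assumptions; neither this review nor the text here says which statistic [50] used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # agreement expected if each rater labeled independently at their own base rates
    categories = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical diagnoses given by two raters to the same six patients
rater_1 = ["MDD", "MDD", "GAD", "MDD", "GAD", "GAD"]
rater_2 = ["MDD", "GAD", "GAD", "MDD", "GAD", "MDD"]
print(round(cohens_kappa(rater_1, rater_2), 3))
```

In this made-up example the raters agree on 4 of 6 patients, but half of that agreement is expected by chance, so kappa is only about 0.33. This is why raw percentage agreement tends to overstate reliability.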

Figure 7. Changes in study methodology over time (2). Blue bars represent the number of trials that adequately reported randomization methods at each period of time. Red bars represent the number of studies adequately reporting allocation methods. Green bars represent the studies that adequately stated their primary hypothesis. The number of trials per period was 24 (1961 and earlier), 23 (1962–1974), 22 (1975–1989) and 22 (1990 and after).
doi:10.1371/journal.pone.0009479.g007
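Figure 7 tracks adequate reporting of sequence generation, which Table 1 defines as reporting the method used. The Results list six biased schemes: alternating admission, card draws, and physician or pharmacist choice. A widely used reproducible alternative is permuted-block randomization, sketched here under illustrative assumptions: the block size, arm labels, seed and function name are not from the paper. Generating the list is only half the job, because allocation concealment also requires that the list be kept from the people enrolling patients. With small fixed blocks, the last assignment in a block can also be guessed, which is why varying the block size is often recommended.

```python
import random


def permuted_block_sequence(n_patients, arms=("A", "B"), block_size=4, seed=None):
    """Generate an equal-allocation list using randomly permuted blocks.

    Every complete block contains each arm equally often, so with two arms
    the group sizes never differ by more than block_size / 2 patients.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # a fixed seed makes the list reproducible and auditable
    per_arm = block_size // len(arms)
    sequence = []
    while len(sequence) < n_patients:
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_patients]


allocation = permuted_block_sequence(12, seed=2010)
for patient_id, arm in enumerate(allocation, start=1):
    print(patient_id, arm)
```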


Figure 8. Changes in results reporting over time. Blue bars represent the number of studies that applied parametric tests to their primary outcome at each time period. Red bars represent the number of studies reporting p values at each time period. Green bars represent the number of studies fully reporting adverse effects at each time period. The number of trials per period was 24 (1961 and earlier), 23 (1962–1974), 22 (1975–1989) and 22 (1990 and after).
doi:10.1371/journal.pone.0009479.g008
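Figure 8 tracks the shift toward parametric tests. The Discussion questions whether such tests suit rating-scale items that are ordinal, such as weight loss banded into less than 0.5 kg, 0.5 to 1 kg, and more than 1 kg. A rank-based alternative often used for ordinal data is the Mann-Whitney U test. This sketch uses mid-ranks for the many ties that ordinal items produce and a tie-corrected normal approximation. The item codes and sample data are hypothetical, and the review does not say which tests individual trials used.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test for two independent samples of ordinal scores.

    Uses mid-ranks for ties and the tie-corrected normal approximation, so it
    is meant for moderate sample sizes. Returns (U for sample x, z, p).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted((value, group) for group, sample in enumerate((x, y)) for value in sample)
    ranks = [0.0] * len(pooled)
    tie_sum = 0  # sum of t**3 - t over groups of tied values
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u_x, z, p


# Hypothetical weight-loss item codes: 0 = <0.5 kg, 1 = 0.5-1 kg, 2 = >1 kg
drug = [2, 1, 2, 2, 1, 0, 2, 1, 2, 1]
placebo = [0, 1, 0, 0, 1, 0, 1, 0, 2, 0]
u, z, p = mann_whitney_u(drug, placebo)
print(u, round(z, 2), round(p, 4))
```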

Figure 9. Study outcomes over time. The figure shows the number of studies in which the conclusion was positive (i.e., confirmed the primary hypothesis) (blue bars), negative (did not confirm the primary hypothesis) (red bars) or unclear, when the authors did not present a clear conclusion/interpretation of their results (green bars). The number of trials per period was 24 (1961 and earlier), 23 (1962–1974), 22 (1975–1989) and 22 (1990 and after).
doi:10.1371/journal.pone.0009479.g009


Figure 10. Reliability of study conclusions over time. The figure shows the number of studies in which the conclusion was consistent, i.e., supported by the results (blue bars); inconsistent (red bars); and dubious (green bars), when it depends on a particular interpretation of the data (for instance, post-hoc analysis, multiple outcomes, etc.). The number of trials per period was 24 (1961 and earlier), 23 (1962–1974), 22 (1975–1989) and 22 (1990 and after).
doi:10.1371/journal.pone.0009479.g010
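The Discussion paragraphs that follow link several trends to sample size. They attribute the rise in reported sample size calculations partly to ethical, economic and power concerns. They also note that placebo-controlled studies need smaller samples [13] and that analyzing score changes with parametric tests lowers sample size requirements. Here is a minimal sketch of the standard normal-approximation formulas behind such calculations. The function names, response rates and effect size are hypothetical and do not come from any reviewed trial.

```python
import math
from statistics import NormalDist


def n_per_arm_two_proportions(p_active, p_control, alpha=0.05, power=0.80):
    """Patients per arm to detect p_active vs p_control with a two-sided z-test
    (normal approximation, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_active * (1 - p_active) + p_control * (1 - p_control)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_active - p_control) ** 2)


def n_per_arm_mean_difference(effect_size, alpha=0.05, power=0.80):
    """Patients per arm to detect a standardized mean difference (delta / sigma)
    in a rating-scale score with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Hypothetical response rates: 50% on drug vs 30% on placebo, or vs 40% on an active comparator
print(n_per_arm_two_proportions(0.5, 0.3))
print(n_per_arm_two_proportions(0.5, 0.4))
print(n_per_arm_mean_difference(0.5))
```

With these made-up rates, the placebo contrast needs 91 patients per arm, while the active-comparator contrast needs 385. This shows how a larger expected difference shrinks the required trial.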

Severity rating scales also increase internal validity by addressing drug efficacy either quantitatively (score reduction) or qualitatively (response and remission rates). These rating scales are also useful to screen and recruit patients, assess severity, define predictors of response [52] and, importantly, compare results across different studies. Thus, psychometric scales grant more precision when measuring outcomes. On the other hand, they require proper training to achieve satisfactory inter-rater reliability [53], and they also have limitations. The Hamilton Depression Rating Scale illustrates one: it is excessively weighted toward anxiety and somatic symptoms but has little coverage of important depressive symptoms [7]. Therefore, although diagnostic standardization certainly increased internal validity, there is still considerable room for further diagnostic refinement.

Sample size increased over time; however, this was only marginally significant and could be explained by one trial with a very large sample [28]. On the other hand, the number of multicentric studies also increased, which may also explain this finding. In addition, more trials performed (or reported) sample size calculations, which can be explained by several reasons, such as: (1) ethical and economic issues in enrolling more subjects than necessary for the primary hypothesis; (2) statistical improvements over time, allowing a more precise estimation of sample size; (3) an increase in scientific rigor over time, as researchers are required to state their primary hypothesis a priori; and (4) concern about negative results due to lack of statistical power.

Regarding study design, we observed that recent trials favor a two-arm design, while older trials favor three-arm and other designs. Possible reasons are: (1) less prior knowledge of drug effects (e.g., carry-over effects); (2) the interest of sponsoring pharmaceutical companies in researching a specific drug; and (3) the scarce use, in the past, of meta-analytic techniques that favor two-arm studies. In addition, we observed that newer trials performed more intention-to-treat analyses, a method used to handle differential dropout between treatment groups, which increases the internal validity of the study [54].

Placebo use did not change and remained high over time. Although a full review of placebo is beyond our scope, two aspects are important: the ethical issues surrounding informed consent and the statistical/methodological importance of placebo in clinical trials. In 1970, Baastrup et al. [30] argued that they would not inform patients that lithium would be changed to placebo because there was still uncertainty about its prophylactic effects. This disregards the principle of autonomy, under which patients themselves have the right to decide whether or not it is in their best interest to, for instance, stop taking a given drug. Another important issue is that the placebo response relative to the active group has increased over time [55], which could theoretically reflect an improvement in internal validity, as robust studies are less susceptible to accidental unblinding. Nevertheless, reasons explaining the past and present high use of placebo include: it maximizes the assay sensitivity of a trial, thereby amplifying the signal [56]; placebo-controlled studies need smaller sample sizes [13]; and the risk of using placebo in psychiatric trials for short periods of time is relatively low [57].

Regarding statistics, we observed that more trials reported p values over time. This trend was also observed in a review of statistical methods in the rehabilitation literature [58], probably reflecting more rigor in data reporting as well as more training in clinical research. In fact, perhaps “forcing” authors (through structured reporting guidelines) to report p values improved their understanding of statistical methods. This is an important issue when the statistics are done by a third-party statistician. We also observed that newer trials used more parametric tests for the primary hypothesis. Parametric tests increase study
efficiency, as such tests are more powerful and outcomes are expressed as score changes rather than response/relapse rates, thereby decreasing sample size requirements. However, there is concern about whether it is appropriate to use parametric tests for psychiatric rating scales, which are constructed from several items whose ranges are not continuous but ordinal (e.g., questions about weight loss are usually divided into less than 0.5 kg, between 0.5 and 1 kg, and more than 1 kg).

Randomization techniques also improved over time; however, the overall number of trials with adequate reporting was quite low, even among newer trials. This is surprising, as inadequate methods of randomization and allocation are considered major sources of bias [59,60]. However, the distinction between trial quality and reporting quality is highly debated in the literature. For instance, Devereaux et al. [61] contacted the authors of 98 randomized controlled trials published after 1997 that failed to report one or more RCT procedures. They found that although many trials failed to report some aspects of trial design, the procedures had in fact been performed in almost all studies. On the other hand, Liberati et al. [62] reviewed 119 trials published from 1963 to 1986 and concluded that the overall low methodological quality of the trials (assessed through a scoring system) improved only mildly after re-checking with the authors; and Schulz et al. [63] assessed trial quality in 250 RCTs and found that poor quality is related to bias. In addition, there is no method of choice for assessing bias and trial quality [64].

Along these lines, we verified several aspects of study design (baseline group comparisons; adverse effects reporting; dropout reasons; type of statistical test used) to assess whether the conclusions presented were consistent. Studies rated as “inconsistent” were of quite low quality, while “consistent” studies had good quality. More than one third of the studies (35 of 91) were rated as “dubious”, and for these we did not draw definite conclusions because of incomplete reporting or tendentious data interpretation. For that reason, we think that an important aim of manuscript publication is to allow different researchers to replicate and thus test the results of the studies, which would also allow readers to interpret these studies critically. To do so, authors must describe the methods of their experiments carefully [65]. Moreover, there is no reason not to report all aspects of the study design fully, particularly now that journal editors and reviewers use structured checklists to assess completeness of reporting and authors are able to address missing points when revising their papers. Finally, the issue of space can always be resolved with supplementary online publication (it is now even possible to point from the methods section to a webpage with detailed methodology). Importantly, our results show that newer trials reported more conclusions in line with their results, reflecting gains in both reporting and quality.

Conclusion

The psychopharmacological revolution observed since 1949 brought significant challenges for psychiatric research, a field that virtually lacked drug treatment at that time. Some changes include the adoption of operational diagnostic criteria and psychometrics, as well as the assimilation of novel breakthrough methods of clinical trial research. As a result, the quality of psychopharmacological clinical trials has changed significantly during the past 60 years in several aspects, such as study design, sampling, randomization, allocation, statistical methods, ethical aspects and reporting. In fact, only the use of placebo remained stable in this period. These changes have increased study efficiency and internal validity by systematically detecting, addressing and eliminating various sorts of bias. However, there is room for further improvement in the development of rating scales and more refined diagnostic criteria, as well as better reporting of some aspects of trial methodology. Therefore, despite the significant advancements observed, with better designed and more reliable trials than in the past, it is still uncertain whether we have achieved the optimal clinical trial methodology.

Supporting Information

Table S1 QUOROM Checklist.
Found at: doi:10.1371/journal.pone.0009479.s001 (0.03 MB DOC)

Table S2 Main characteristics of each study: the drug studied, the name of the author, the year and the journal of publication; the disorder analyzed; the study design; the use of wash-out, run-in, intention-to-treat (ITT) periods and informed consent; the sample size (SS) estimation and the report of the primary hypothesis; the description of methods of randomization, allocation and blinding; the number of patients enrolled (n) and the duration of the trial; the reporting of baseline comparisons between groups, drug adverse effects (AE) and reasons for drop-outs (DO); and, finally, the reporting of p values, score values and effect size (ES) estimation. Chlor = chlorpromazine; Li = lithium; D = diazepam; Cloz = clozapine; Flu = fluoxetine; Risp = risperidone; Lam = lamotrigine; BJP = The British Journal of Psychiatry; BMJ = The British Medical Journal; AJP = The American Journal of Psychiatry; Arch = The Archives of General Psychiatry; JCP = The Journal of Clinical Psychiatry; MDD = major depressive disorder; OCD = obsessive-compulsive disorder; MDI = manic-depressive illness; CO = cross-over.
Found at: doi:10.1371/journal.pone.0009479.s002 (0.40 MB DOC)

Acknowledgments

We are grateful to the two reviewers and the editor for their valuable comments, which we believe improved the manuscript. We are also thankful to Rasheda El-Nazer, who reviewed and copyedited our work.

The references for the articles included in our systematic review are: chlorpromazine [26,29,31–34,37–39,43,66–80]; lithium [15,30,42,81,84–89,92–97,99–101]; diazepam [25,27,41,82,83,90,91]; risperidone [24,28,36,123–127,130,131]; clozapine [35,98,102,103,113,121,122]; lamotrigine [40,128,129,132–134]; and fluoxetine [104–112,114–120].

Author Contributions

Conceived and designed the experiments: ARB FF. Analyzed the data: ARB LT FF. Wrote the paper: ARB LT FF.

References

1. Ban TA (2006) A history of the Collegium Internationale Neuro-Psychopharmacologicum (1957–2004). Prog Neuropsychopharmacol Biol Psychiatry 30: 599–616.
2. Andreasen NC (2007) DSM and the death of phenomenology in America: an example of unintended consequences. Schizophr Bull 33: 108–112.
3. Todd S (2007) A 25-year review of sequential methodology in clinical studies. Stat Med 26: 237–252.
4. Thase ME (2002) Studying new antidepressants: if there were a light at the end of the tunnel, could we see it? J Clin Psychiatry 63 Suppl 2: 24–28.
5. Lecrubier Y (2008) Refinement of diagnosis and disease classification in psychiatry. Eur Arch Psychiatry Clin Neurosci 258 Suppl 1: 6–11.
6. Duffy FF, Chung H, Trivedi M, Rae DS, Regier DA, et al. (2008) Systematic use of patient-rated depression severity monitoring: is it helpful and feasible in clinical psychiatry? Psychiatr Serv 59: 1148–1154.

PLoS ONE | www.plosone.org 13 March 2010 | Volume 5 | Issue 3 | e9479

Overview of Clinical Trials

7. Gelenberg AJ, Thase ME, Meyer RE, Goodwin FK, Katz MM, et al. (2008) The history and current state of antidepressant clinical trial design: a call to action for proof-of-concept studies. J Clin Psychiatry 69: 1513–1528.
8. Bauer M (2004) Review: lithium reduces relapse rates in people with bipolar disorder. Evid Based Ment Health 7: 72.
9. Himwich HE (1958) Psychopharmacologic drugs. Science 127: 59–72.
10. Giusti P, Arban R (1993) Physiological and pharmacological bases for the diverse properties of benzodiazepines and their congeners. Pharmacol Res 27: 201–215.
11. Davis JM, Chen N, Glick ID (2003) A meta-analysis of the efficacy of second-generation antipsychotics. Arch Gen Psychiatry 60: 553–564.
12. Wong DT, Perry KW, Bymaster FP (2005) Case history: the discovery of fluoxetine hydrochloride (Prozac). Nat Rev Drug Discov 4: 764–774.
13. Adam D, Kasper S, Moller HJ, Singer EA (2005) Placebo-controlled trials in major depression are necessary and ethically justifiable: how to improve the communication between researchers and ethical committees. Eur Arch Psychiatry Clin Neurosci 255: 258–260.
14. Weisler RH, Calabrese JR, Bowden CL, Ascher JA, DeVeaugh-Geiss J, et al. (2008) Discovery and development of lamotrigine for bipolar disorder: a story of serendipity, clinical observations, risk taking, and persistence. J Affect Disord 108: 1–9.
15. Fieve RR, Platman SR, Plutchik RR (1968) Use of lithium in affective disorders. 2. Prophylaxis of depression in chronic recurrent affective disorder. Am J Psychiatry 125: 492.
16. DSM-II (1968).
17. Higgins J, Green S (2009) Cochrane Handbook for Systematic Reviews of Interventions Version 5.0.2 (updated September 2009). The Cochrane Collaboration, 2008. Available from www.cochrane-handbook.org.
18. Whitehead PL, Clark LD (1970) Effect of lithium carbonate, placebo, and thioridazine on hyperactive children. Am J Psychiatry 127: 824–825.
19. Boutron I, Guittet L, Estellat C, Moher D, Hrobjartsson A, et al. (2007) Reporting methods of blinding in randomized trials assessing nonpharmacological treatments. PLoS Med 4: e61.
20. Glasser SP, Howard G (2006) Clinical trial design issues: at least 10 things you should look for in clinical trials. J Clin Pharmacol 46: 1106–1115.
21. Hopewell S, Clarke M, Moher D, Wager E, Middleton P, et al. (2008) CONSORT for reporting randomised trials in journal and conference abstracts. Lancet 371: 281–283.
22. Leucht S, Heres S, Hamann J, Kane JM (2008) Methodological issues in current antipsychotic drug trials. Schizophr Bull 34: 275–285.
23. Zlowodzki M, Jonsson A, Bhandari M (2006) Common pitfalls in the conduct of clinical research. Med Princ Pract 15: 1–8.
24. McDougle CJ, Epperson CN, Pelton GH, Wasylink S, Price LH (2000) A double-blind, placebo-controlled study of risperidone addition in serotonin reuptake inhibitor-refractory obsessive-compulsive disorder. Arch Gen Psychiatry 57: 794–801.
25. Haider I (1971) Comparative trial of lorazepam and diazepam. Br J Psychiatry 119: 599.
26. Dransfield GA (1958) A clinical trial comparing prochlorperazine (Stemetil) with chlorpromazine (Largactil) in the treatment of chronic psychotic patients. J Ment Sci 104: 1183–1189.
27. Kay DWK, Fahy T, Garside RF (1970) 7-month double-blind trial of amitriptyline and diazepam in ECT-treated depressed patients. Br J Psychiatry 117: 667.
28. Peuskens J (1995) Risperidone in the treatment of patients with chronic schizophrenia: a multi-national, multi-center, double-blind, parallel-group study versus haloperidol. Br J Psychiatry 166: 712–726.
29. Casey JF, Lasky JJ, Klett CJ, Hollister LE (1960) Treatment of schizophrenic reactions with phenothiazine derivatives: a comparative study of chlorpromazine, triflupromazine, mepazine, prochlorperazine, perphenazine, and phenobarbital. Am J Psychiatry 117: 97–105.
30. Baastrup PC, Poulsen JC, Schou M, Thomsen K, Amdisen A (1970) Prophylactic lithium: double blind discontinuation in manic-depressive and recurrent-depressive disorders. Lancet 2: 326.
31. Boardman RH, Lomas J, Markowe M (1956) Insulin and chlorpromazine in schizophrenia: a comparative study in previously untreated cases. Lancet 271: 487–490.
32. Lomas J (1957) Treatment of schizophrenia: Pacatal and chlorpromazine compared. Br Med J 2: 78–80.
33. Fink M, Shaw R, Gross GE, Coleman FS (1958) Comparative study of chlorpromazine and insulin coma in therapy of psychosis. JAMA 166: 1846–1850.
34. Foote ES (1958) Combined chlorpromazine and reserpine in the treatment of chronic psychotics. J Ment Sci 104: 201–205.
35. Lindenmayer JP, Iskander A, Park M, Apergi FS, Czobor P, et al. (1998) Clinical and neurocognitive effects of clozapine and risperidone in treatment-refractory schizophrenic patients: a prospective study. J Clin Psychiatry 59: 521–527.
36. Ho BC, Miller D, Nopoulos P, Andreasen NC (1999) A comparative effectiveness study of risperidone and olanzapine in the treatment of schizophrenia. J Clin Psychiatry 60: 658–663.
37. King PD (1958) Regressive EST, chlorpromazine, and group therapy in treatment of hospitalized chronic schizophrenics. Am J Psychiatry 115: 354–357.
38. Hamilton M, Smith ALG, Lapidus HE, Cadogan EP (1960) A controlled trial of thiopropazate dihydrochloride (Dartalan), chlorpromazine and occupational therapy in chronic schizophrenics. J Ment Sci 106: 40–55.
39. Abse DW, Dahlstrom WG (1960) The value of chemotherapy in senile mental disturbances: controlled comparison of chlorpromazine, reserpine-pipradrol, and opium. JAMA 174: 2036–2042.
40. Barbosa L, Berk M, Vorster M (2003) A double-blind, randomized, placebo-controlled trial of augmentation with lamotrigine or placebo in patients concomitantly treated with fluoxetine for resistant major depressive episodes. J Clin Psychiatry 64: 403–407.
41. Hollister LE, Overall JE, Pokorny AD, Shelton J (1971) Acetophenazine and diazepam in anxious depressions. Arch Gen Psychiatry 24: 273.
42. Platman SR (1970) A comparison of lithium carbonate and chlorpromazine in mania. Am J Psychiatry 127: 351.
43. Robinson DB (1959) Evaluation of certain drugs in geriatric patients: effects of chlorpromazine, reserpine, pentylenetetrazol U.S.P., and placebo on 84 female geriatric patients in a state hospital. Arch Gen Psychiatry 1: 41–46.
44. Pocock SJ, Hughes MD, Lee RJ (1987) Statistical problems in the reporting of clinical trials. A survey of three medical journals. N Engl J Med 317: 426–432.
45. Matthews JN, Altman DG, Campbell MJ, Royston P (1990) Analysis of serial measurements in medical research. BMJ 300: 230–235.
46. Begg C, Cho M, Eastwood S, Horton R, Moher D, et al. (1996) Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA 276: 637–639.
47. Altman DG (2005) Endorsement of the CONSORT statement by high impact medical journals: survey of instructions for authors. BMJ 330: 1056–1057.
48. Moher D, Jones A, Lepage L (2001) Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA 285: 1992–1995.
49. Zimmerman M, Mattia JI (1999) Psychiatric diagnosis in clinical practice: is comorbidity being missed? Compr Psychiatry 40: 182–191.
50. Williams JW Jr, Noel PH, Cordes JA, Ramirez G, Pignone M (2002) Is this patient clinically depressed? JAMA 287: 1160–1170.
51. Tansella M, Thornicroft G, Barbui C, Cipriani A, Saraceno B (2006) Seven criteria for improving effectiveness trials in psychiatry. Psychol Med 36: 711–720.
52. Johnson T (1998) Clinical trials in psychiatry: background and statistical perspective. Stat Methods Med Res 7: 209–234.
53. Muller MJ, Dragicevic A (2003) Standardized rater training for the Hamilton Depression Rating Scale (HAMD-17) in psychiatric novices. J Affect Disord 77: 65–69.
54. Feinman RD (2009) Intention-to-treat. What is the question? Nutr Metab (Lond) 6: 1.
55. Walsh BT, Seidman SN, Sysko R, Gould M (2002) Placebo response in studies of major depression: variable, substantial, and growing. JAMA 287: 1840–1847.
56. March JS, Silva SG, Compton S, Shapiro M, Califf R, et al. (2005) The case for practical clinical trials in psychiatry. Am J Psychiatry 162: 836–846.
57. Kim SY (2003) Benefits and burdens of placebos in psychiatric research. Psychopharmacology (Berl) 171: 13–18.
58. Schwartz SJ, Sturr M, Goldberg G (1996) Statistical methods in rehabilitation literature: a survey of recent publications. Arch Phys Med Rehabil 77: 497–500.
59. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gotzsche PC, et al. (2009) The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Med 6: e1000100.
60. Schulz KF, Grimes DA (2002) Allocation concealment in randomised trials: defending against deciphering. Lancet 359: 614–618.
61. Devereaux PJ, Choi PT, El-Dika S, Bhandari M, Montori VM, et al. (2004) An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods. J Clin Epidemiol 57: 1232–1236.
62. Liberati A, Himel HN, Chalmers TC (1986) A quality assessment of randomized control trials of primary treatment of breast cancer. J Clin Oncol 4: 942–951.
63. Schulz KF, Chalmers I, Hayes RJ, Altman DG (1995) Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 273: 408–412.
64. Juni P, Altman DG, Egger M (2001) Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ 323: 42–46.
65. Moher D, Schulz KF, Altman D (2005) The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials 2001. Explore (NY) 1: 40–45.
66. Rees WL, Lambert C (1955) The value and limitations of chlorpromazine in the treatment of anxiety states. J Ment Sci 101: 834–840.
67. Salisbury BJ, Hare EH (1957) Ritalin and chlorpromazine in chronic schizophrenia: a controlled clinical trial. J Ment Sci 103: 830–834.
68. Good WW, Sterling M, Holtzman WH (1958) Termination of chlorpromazine with schizophrenic patients. Am J Psychiatry 115: 443–448.
69. Smith JA, Christian D, Rutherford A, Mansfield E (1958) A comparison of triflupromazine (Vesprin), chlorpromazine and placebo in 85 chronic patients. Am J Psychiatry 115: 253–254.
70. Fleming BG, Currie JDC (1958) Investigation of a new compound, BW203, and of chlorpromazine in the treatment of psychosis. J Ment Sci 104: 749–757.
71. Little JC (1958) A double-blind controlled comparison of the effects of chlorpromazine, barbiturate and a placebo in 142 chronic psychotic inpatients. J Ment Sci 104: 334–349.
72. Fleming BG, Spencer AM, Whitelaw EM (1959) A controlled comparative investigation of the effects of promazine, chlorpromazine, and a placebo in chronic psychosis. J Ment Sci 105: 349–358.
73. Walsh GP, Walton D, Black DA (1959) The relative efficacy of Vespral and chlorpromazine in the treatment of a group of chronic schizophrenic patients. J Ment Sci 105: 199–209.
74. King PD, Weinberger W (1959) Comparison of proclorperazine and chlorpromazine in hospitalized chronic schizophrenics. Am J Psychiatry 115: 1026–1027.
75. Gilmore TH, Shatin L (1959) Quantitative comparison of clinical effectiveness of chlorpromazine and promazine. J Ment Sci 105: 508–510.
76. Hurst L (1960) Chlorpromazine and pecazine in chronic schizophrenia. J Ment Sci 106: 726–731.
77. Casey JF, Bennett IF, Lindley CJ, Hollister LE, Gordon MH, et al. (1960) Drug therapy in schizophrenia: a controlled study of the relative effectiveness of chlorpromazine, promazine, phenobarbital, and placebo. Arch Gen Psychiatry 2: 210–220.
78. Casey JF, Lasky JJ, Hollister LE, Klett CJ, Caffey EM (1961) Combined drug therapy of chronic schizophrenics: controlled evaluation of placebo, dextro-amphetamine, imipramine, isocarboxazid and trifluoperazine added to maintenance doses of chlorpromazine. Am J Psychiatry 117: 997.
79. Ashcroft GW, Macdougall EJ, Barker PA (1961) A comparison of tetrabenazine and chlorpromazine in chronic schizophrenia. J Ment Sci 107: 287.
80. Wilson IC, Sandifer MG, McKay J (1961) Double-blind trial to investigate effects of Thorazine (Largactil, chlorpromazine), Compazine (Stemetil, prochlorperazine) and Stelazine (trifluoperazine) in paranoid schizophrenia. J Ment Sci 107: 90.
81. Maggs R (1963) Treatment of manic illness with lithium carbonate. Br J Psychiatry 109: 56.
82. Capstick NS, Corbett MF, Pare CMB, Pryce IG, Rees WL (1965) A comparative trial of diazepam (Valium) and amylobarbitone. Br J Psychiatry 111: 517–519.
83. McDowall A, Owen S, Robin AA (1966) A controlled comparison of diazepam and amylobarbitone in anxiety states. Br J Psychiatry 112: 629.
84. Melia PI (1970) Prophylactic lithium: a double-blind trial in recurrent affective disorders. Br J Psychiatry 116: 621.
85. Spring G, Schweid D, Gray C, Steinberg J, Horwitz M (1970) Double-blind comparison of lithium and chlorpromazine in treatment of manic states. Am J Psychiatry 126: 1306–1310.
86. Stokes PE, Stoll PM, Shamoian CA, Patton MJ (1971) Efficacy of lithium as acute treatment of manic-depressive illness. Lancet 1: 1319.
87. Coppen A, Noguera R, Bailey J, Burns BN, Swani MS, et al. (1971) Prophylactic lithium in affective disorders: controlled trial. Lancet 2: 275.
88. Johnson G, Gershon S, Burdock EI, Floyd A, Hekimian L (1971) Comparative effects of lithium and chlorpromazine in treatment of acute manic states. Br J Psychiatry 119: 267.
89. Prien RF, Caffey EM, Klett CJ (1972) Comparison of lithium carbonate and chlorpromazine in treatment of mania: report of Veterans Administration and National Institute of Mental Health Collaborative Study Group. Arch Gen Psychiatry 26: 146.
90. Wadzisz FJ (1972) Comparison of oxypertine and diazepam in anxiety neurosis seen in hospital out-patients. Br J Psychiatry 121: 507.
91. Marks IM, Gardner R, Viswanat R, Lipsedge MS (1972) Enhanced relief of phobias by flooding during waning diazepam effect. Br J Psychiatry 121: 493.
92. Prien RF, Caffey EM, Klett CJ (1973) Prophylactic efficacy of lithium carbonate in manic-depressive illness: report of Veterans Administration and National Institute of Mental Health Collaborative Study Group. Arch Gen Psychiatry 28: 337–341.
93. Naylor GJ, Donald JM, Lepoidev D, Reid AH (1974) Double-blind trial of long-term lithium therapy in mental defectives. Br J Psychiatry 124: 52–57.
94. Shopsin B, Gershon S, Thompson H, Collins P (1975) Psychoactive drugs in mania: controlled comparison of lithium carbonate, chlorpromazine, and haloperidol. Arch Gen Psychiatry 32: 34–42.
95. Prien RF, Klett CJ, Caffey EM (1974) Lithium prophylaxis in recurrent affective illness. Am J Psychiatry 131: 198–203.
96. Fieve RR, Dunner DL, Kumbarachi T, Stallone F (1975) Lithium carbonate in affective disorders. 4. Double-blind study of prophylaxis in unipolar recurrent depression. Arch Gen Psychiatry 32: 1541–1544.
97. Takahashi R, Sakuma A, Itoh K, Itoh H, Kurihara M, et al. (1975) Comparison of efficacy of lithium carbonate and chlorpromazine in mania: report of Collaborative Study Group on Treatment of Mania in Japan. Arch Gen Psychiatry 32: 1310–1318.
98. van Praag HM, Korf J, Dols LCW (1976) Clozapine versus perphenazine: value of biochemical mode of action of neuroleptics in predicting their therapeutic activity. Br J Psychiatry 129: 547–555.
99. Coppen A, Montgomery SA, Gupta RK, Bailey JE (1976) Double-blind comparison of lithium carbonate and maprotiline in prophylaxis of affective disorders. Br J Psychiatry 128: 479–485.
100. Dunner DL, Stallone F, Fieve RR (1976) Lithium carbonate and affective disorders. 5. Double-blind study of prophylaxis of depression in bipolar illness. Arch Gen Psychiatry 33: 117–120.
101. Watanabe S, Ishino H, Otsuki S (1975) Double-blind comparison of lithium carbonate and imipramine in treatment of depression. Arch Gen Psychiatry 32: 659–668.
102. Gelenberg AJ, Doller JC (1979) Clozapine versus chlorpromazine for the treatment of schizophrenia: preliminary results from a double-blind study. J Clin Psychiatry 40: 238–240.
103. Shopsin B, Klein H, Aaronsom M, Collora M (1979) Clozapine, chlorpromazine, and placebo in newly hospitalized, acutely schizophrenic patients: controlled, double-blind comparison. Arch Gen Psychiatry 36: 657–664.
104. Bremner JD (1984) Fluoxetine in depressed patients: a comparison with imipramine. J Clin Psychiatry 45: 414–419.
105. Chouinard G (1985) A double-blind controlled clinical trial of fluoxetine and amitriptyline in the treatment of outpatients with major depressive disorder. J Clin Psychiatry 46: 32–37.
106. Cohn JB, Wilcox C (1985) A comparison of fluoxetine, imipramine, and placebo in patients with major depressive disorder. J Clin Psychiatry 46: 26–31.
107. Rickels K, Smith WT, Glaudin V, Amsterdam JB, Weise C, et al. (1985) Comparison of 2 dosage regimens of fluoxetine in major depression. J Clin Psychiatry 46: 38–41.
108. Feighner JP (1985) A comparative trial of fluoxetine and amitriptyline in patients with major depressive disorder. J Clin Psychiatry 46: 369–372.
109. Feighner JP, Cohn JB (1985) Double-blind comparative trials of fluoxetine and doxepin in geriatric patients with major depressive disorder. J Clin Psychiatry 46: 20–25.
110. Fabre LF, Putman HP (1987) A fixed-dose clinical trial of fluoxetine in outpatients with major depression. J Clin Psychiatry 48: 406–408.
111. Young JPR, Coleman A, Lader MH (1987) A controlled comparison of fluoxetine and amitriptyline in depressed out-patients. Br J Psychiatry 151: 337–340.
112. Levine S, Deo R, Mahadevan K (1987) A comparative trial of a new antidepressant, fluoxetine. Br J Psychiatry 150: 653–655.
113. Kane J, Honigfeld G, Singer J, Meltzer H (1988) Clozapine for the treatment-resistant schizophrenic: a double-blind comparison with chlorpromazine. Arch Gen Psychiatry 45: 789–796.
114. Debus JR, Rush J, Himmel C, Tyler D, Polatin P, et al. (1988) Fluoxetine versus trazodone in the treatment of outpatients with major depression. J Clin Psychiatry 49: 422–426.
115. Laakmann G, Blaschke D, Engel R, Schwarz A (1988) Fluoxetine vs amitriptyline in the treatment of depressed out-patients. Br J Psychiatry 153: 64–68.
116. Montgomery SA, Dufour H, Brion S, Gailledreau J, Laqueille X, et al. (1988) The prophylactic efficacy of fluoxetine in unipolar depression. Br J Psychiatry 153: 69–76.
117. Perry PJ, Garvey MJ, Kelly MW, Cook BL, Dunner FJ, et al. (1989) A comparative trial of fluoxetine versus trazodone in outpatients with major depression. J Clin Psychiatry 50: 290–294.
118. Pigott TA, Pato MT, Bernstein SE, Grover GN, Hill JL, et al. (1990) Controlled comparisons of clomipramine and fluoxetine in the treatment of obsessive-compulsive disorder: behavioral and biological results. Arch Gen Psychiatry 47: 926–932.
119. Usher RW, Beasley CM, Bosomworth JC (1991) Efficacy and safety of morning versus evening fluoxetine administration. J Clin Psychiatry 52: 134–136.
120. Feighner JP, Gardner EA, Johnston JA, Batey SR, Khayrallah MA, et al. (1991) Double-blind comparison of bupropion and fluoxetine in depressed outpatients. J Clin Psychiatry 52: 329–335.
121. Pickar D, Owen RR, Litman RE, Konicki PE, Gutierrez R, et al. (1992) Clinical and biologic response to clozapine in patients with schizophrenia: crossover comparison with fluphenazine. Arch Gen Psychiatry 49: 345–353.
122. Breier A, Buchanan RW, Kirkpatrick B, Davis OR, Irish D, et al. (1994) Effects of clozapine on positive and negative symptoms in outpatients with schizophrenia. Am J Psychiatry 151: 20–26.
123. Marder SR, Meibach RC (1994) Risperidone in the treatment of schizophrenia. Am J Psychiatry 151: 825–835.
124. Bondolfi G, Dufour H, Patris M, May JP, Billeter U, et al. (1998) Risperidone versus clozapine in treatment-resistant chronic schizophrenia: a randomized double-blind study. Am J Psychiatry 155: 499–504.
125. Wirshing DA, Marshall BD, Green MF, Mintz J, Marder SR, et al. (1999) Risperidone in treatment-refractory schizophrenia. Am J Psychiatry 156: 1374–1379.
126. Breier AF, Malhotra AK, Su TP, Pinals DA, Elman I, et al. (1999) Clozapine and risperidone in chronic schizophrenia: effects on symptoms, parkinsonian side effects, and neuroendocrine response. Am J Psychiatry 156: 294–298.
127. Katz IR, Jeste DV, Mintzer JE, Clyde C, Napolitano J, et al. (1999) Comparison of risperidone and placebo for psychosis and behavioral disturbances associated with dementia: a randomized, double-blind trial. J Clin Psychiatry 60: 107.
128. Calabrese JR, Bowden CL, Sachs GS, Ascher JA, Monaghan E, et al. (1999) A double-blind placebo-controlled study of lamotrigine monotherapy in outpatients with bipolar I depression. J Clin Psychiatry 60: 79.
129. Calabrese JR, Suppes T, Bowden CL, Sachs GS, Swann AC, et al. (2000) A double-blind, placebo-controlled, prophylaxis study of lamotrigine in rapid-cycling bipolar disorder. J Clin Psychiatry 61: 841–850.
130. Azorin JM, Spiegel R, Remington G, Vanelle JM, Pere JJ, et al. (2001) A double-blind comparative study of clozapine and risperidone in the management of severe chronic schizophrenia. Am J Psychiatry 158: 1305–1313.
131. Conley RR, Mahmoud R (2001) A randomized double-blind study of risperidone and olanzapine in the treatment of schizophrenia or schizoaffective disorder. Am J Psychiatry 158: 765–774.
132. Normann C, Hummel B, Scharer LO, Horn M, Grunze H, et al. (2002) Lamotrigine as adjunct to paroxetine in acute depression: a placebo-controlled, double-blind study. J Clin Psychiatry 63: 337–344.
133. Bowden CL, Calabrese JR, Sachs G, Yatham LN, Asghar SA, et al. (2003) A placebo-controlled 18-month trial of lamotrigine and lithium maintenance treatment in recently manic or hypomanic patients with bipolar I disorder. Arch Gen Psychiatry 60: 392–400.
134. Calabrese JR, Bowden CL, Sachs G, Yatham LN, Behnke K, et al. (2003) A placebo-controlled 18-month trial of lamotrigine and lithium maintenance treatment in recently depressed patients with bipolar I disorder. J Clin Psychiatry 64: 1013–1024.