Biostatistics Advance Access published April 5, 2006 Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
The logistic-transform for bounded outcome scores
Emmanuel Lesaffre,1,∗ Dimitris Rizopoulos1 and Roula Tsonaka1
1Biostatistical Centre, Catholic University of Leuven,
U.Z. St. Rafa¨el, Kapucijnenvoer 35, B–3000 Leuven, Belgium
March 16, 2006
Summary. The logistic transformation, originally suggested by Johnson (1949), is ap-
plied to analyze responses that are restricted to a finite interval (e.g., (0,1)), so-called
bounded outcome scores. Bounded outcome scores often have a non-standard distri-
bution, e.g., J- or U -shaped, precluding classical parametric statistical approaches for
analysis. Applying the logistic transformation on a normally distributed random vari-
able, gives rise to a logit-normal distribution. This distribution can take a variety of
shapes on (0,1). Further, the model can be extended to correct for (baseline) covariates.
Therefore, the method could be useful for comparative clinical trials. Bounded outcomes
can be found in many research areas, e.g., drug compliance research, quality-of-life stud-
ies and pain (and pain-relief) studies using visual analogue scores, but all of these scores
can attain the boundary values 0 or 1. A natural extension of the above approach is
therefore to assume a latent score on (0,1) having a logit-normal distribution. Two cases
are considered: (a) the bounded outcome score is a proportion where the true probabili-
ties have a logit-normal distribution on (0,1), (b) the bounded outcome score on [0,1] is
a coarsened version of a latent score with a logit-normal distribution on (0,1). We also
allow the variance (on the transformed scale) to depend on treatment. The usefulness of
our approach for comparative clinical trials will be assessed in this paper. It turns out
to be important to distinguish the case of equal and unequal variances. For a bounded
outcome score of the second type and with equal variances, our approach comes close
to ordinal probit regression. However, ignoring the inequality of variances can lead to
highly biased parameter estimates. A simulation study compares the performance of our
approach with the two-sample Wilcoxon test and with ordinal probit regression. Finally,
the different methods are illustrated on two data sets.
∗email: emmanuel.lesaff[email protected]
© The Author 2006. Published by Oxford University Press. All rights reserved.
For permissions, please e-mail: [email protected]
1
Key words: Barthel index, compliance research, bounded outcome scores, logistic- Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
transform, ordinal probit regression
1. Introduction
Bounded outcome scores are measurements that are restricted to a finite interval, which
can be closed, open or half-closed. Examples of bounded outcome scores can be found
in many medical disciplines. For instance, in compliance research one measures the
proportion of days that patients correctly take their drug, hereafter denoted as pdays.
Another example is the Barthel-index (Mahoney and Barthel, 1965) which is an Activities
on Daily Living (ADL) scale that (in one version) jumps with steps of 5 from 0 (death
or completely immobilized) to 100 (able to perform all daily activities independently).
This scale is often used in stroke trials to measure the recovery of a patient after an acute
stroke. Finally, in pain and pain-relief studies, a visual analogue score (VAS) is used to
measure the psychological state of the subject.
Bounded outcome scores show a variety of distributions, from unimodal to J- and
U -shaped. These peculiar shapes often motivate the use of non-parametric methods,
like the Wilcoxon test (Lesaffre et al., 1993) when comparing two treatments. However,
possibilities for statistical modeling, e.g., when covariate-adjustment is envisaged, are
then limited. Alternatively, a dichotomized version of the score may be constructed and
analyzed using logistic regression. For instance, the Barthel-index could be split at 0.9. A
value above 0.9 implies that the patients are able to perform most of their daily activities,
and hence the dichotomized Barthel-index has a simple interpretation. However, such
an approach has two disadvantages; firstly the choice of the threshold is usually ad hoc,
and secondly reducing the score to a binary variable may reduce the efficiency of the
comparison. Ordinal regression (McCullagh, 1980) is an alternative method to analyze
bounded outcome scores although this ignores the numeric character of the data.
In this paper we explore the use of the logistic transformation first suggested by
Johnson (1949) to model the distribution of bounded outcome scores on (0,1). However,
many outcome scores are defined on a closed interval. In this case, we assume here
that a latent variable with range (0,1) gives rise to an observed score on [0,1]. Bounded
outcomes on [0,1] can be discrete or of a mixed continuous-discrete type. Here, we will
concentrate on the first type and consider two cases: (1) a proportion where the true
probabilities have a logit-normal distribution on (0, 1), and (2) a discrete score ranging
2
from 0 to 1 interpreted as the grouped version of a continuous latent variable on (0,1). We Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
call the first case the binomial-logit-normal approach, and the second case the coarsening
approach. A typical example of a continuous-discrete score on [0,1] would be the visual
analogue scale, which takes values continuously on (0,1) but can also give the extreme
values 0 or 1 with a non-zero probability.
In Section 2 we indicate the usefulness of the logistic transformation for comparative
clinical research with bounded outcomes on (0,1). We then present methods for analyzing
bounded outcomes on [0,1] and focus on the comparison of two treatments. We consider
the cases of equal and unequal variances on the transformed scale. The competitor to
the coarsening approach is the classical ordinal probit (OP) regression. We will show
that OP regression is very similar to the coarsening approach for equal variances, but
with unequal variances it may give a severely biased estimate of the treatment effect.
Section 4 describes a simulation study evaluating the performance of our approaches
for various distributions on [0,1] in comparison with the two-sample Wilcoxon test and
OP regression. Details of the simulation study are given in supplementary material at
http://www.biostatistics.oxfordjournals.org. In Section 5 we illustrate our method first
on pdays, the primary endpoint of the THAMES study, a recent compliance-enhancing
intervention study performed in Belgium. Further, we re-analyzed the primary endpoint
(Barthel index) of the ECASS-1 study, an early placebo-controlled randomized clinical
trial evaluating the effect of a thrombolytic drug on patients with an acute ischemic
stroke. In Section 6 we look at distributions other than the logit-normal, discuss some
other approaches and look at the goodness-of-fit of the logit-normal distribution. Finally,
in Section 7 we summarize our results and make some suggestions for further research in
this area.
2. The logistic transformation and its application to clinical research
Johnson (1949) suggested the logistic transformation Z = α+β log U −a where α, β, a, b ∈
b−U
and U is a score on the interval (a, b). The aim of Johnson was to achieve standard
normality. In the case of proportions, a = 0 and b = 1. Here we take α = 0 and β = 1 and
assume that the logistic transformation achieves a normal density N (µ, σ2). In general,
when Z has density f (z) ≡ f (z; θ) then U has density g(u) ≡ g(u; θ) = f (logit(u)) 1 ,
u(1−u)
where logit(u) = log u . When Z ∼ N (µ, σ2), then U has a logit-normal (LN) distri-
1−u
bution, denoted as LN (µ, σ2) and θT = (µ, σ2).
3
The LN distribution can take very different shapes depending on the choice of µ and
σ2, as is shown in Figure 1.
[Figure 1 about here.]
Note that when µ changes sign this corresponds to mirroring the distribution around
u = 0.5. Hence, the logistic transformation is very well suited to model a variety of
distributions on (0,1). A similar property holds for the Beta family, but Aitchison and
Begg (1976) indicate that the logit-normal distribution is richer and can approximate
any Beta density.
It is clear that when the bounded outcome scores have a LN distribution the analy- Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
sis could be done on the z-scale using classical statistical analyses assuming a Gaussian
distribution. For instance, suppose we wish to compare the effects of control and new
treatments based on a bounded score with distributions LN (µ, σ12) and LN (µ + ∆, σ22),
respectively. When σ12 = σ22 = σ2, a simple unpaired t-test can be calculated on the final
z-values and a 95% confidence interval can be obtained for ∆ (> 0) on the z-scale. The
interpretation of ∆ is more difficult because it represents a location-shift on the trans-
formed scale. Since the logistic transformation is strictly monotone, ∆ = log ν2/(1−ν2) ,
ν1/(1−ν1)
where ν1, ν2 are the medians for the control and new treatment, respectively, on the orig-
inal scale. Figure 2 gives an example of how the location-shift alternative on the Z-scale
is translated into an alternative hypothesis on the observed U -scale, when σ12 = σ22.
[Figure 2 about here.]
The parameter ∆ can also be interpreted in relation to the Wilcoxon-Mann-Whitney
test (Lehmann and D’Abrera, 1998). Assume that Z1 and Z2 are independent random
variables on the transformed scale corresponding to the control and new treatments,
respectively then P (Z2 > Z1) = P∆ = Φ ∆√ , which is also equal to P (U2 > U1) for
σ2
the corresponding original U ’s. Brunner and Munzel (2000) called P∆ the relative effect
of the treatment, which is therefore seen to be determined by the ratio ∆/σ. In general,
the relative effect is equal to F1dF2, where Fj is the cumulative distribution function
of Zj or here equivalently of Uj, (j = 1, 2). Hence, loosely speaking, P∆ determines
the proportion of individuals better off with the new treatment than with the control
treatment. A 95% C.I. for P∆ can be obtained using the Delta method when estimates
for ∆, σ and their covariance matrix are available. If instead transformation to a logistic
4
distribution is envisaged, then ∆/σ can be directly interpreted as a log-odds ratio of
cumulative distribution functions, see Section 6.
When σ1 = σ2 one could use the Welch test (Welch, 1951) on the transformed z-
scale. This test is also called the unpaired t-test for unequal variances. However, it is
well known that ignoring the inequality of the variances, i.e., applying in this case the
classical unpaired t-test, has no great impact on the type I error as long as n1 ≈ n2 (see
e.g., Wetherill, 1960; Murphy, 1967). Further, in this case the relative effect is equal to
P (Z2 > Z1) = P∆ = Φ ∆/ σ12 + σ22 , which can be estimated in a similar manner as
above.
The logistic transformation is useful for power and sample size calculations in a Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
clinical trial with a bounded outcome score U as primary endpoint because the classical
location-shift alternative is most often not appropriate. While power and sample size
calculations are more difficult, they can be realized by first specifying the relative effect
together with σ.
Finally, the logistic transformation is also useful in statistical modeling of bounded
outcome scores on (0, 1). Indeed, the logistic regression model
log U = xT β + σZ, (1)
1−U
with Z ∼ N (0, 1) has been used in various applications (Kieschnick and McCullough,
2003). This approach is especially useful in clinical trial applications when baseline
covariate adjustment is envisaged. Finally, expression (1) can easily be extended to
allow σ to depend on covariates, as for example in (Pourahmadi, 1999).
3. Modeling bounded outcome scores on [0,1]
In this section we primarily focus on bounded outcomes on [0,1] which we denote by Yi
to distinguish them from Ui ∈ (0, 1). Two different types of outcomes are considered
here. Firstly, the bounded outcome scores Yi are observed proportions equal to ri/Ni
(i = 1, . . . , n) whereby ri is the ith count out of Ni units. In this case, Ui represents
the true proportion measured with error by Yi. For instance, in compliance research,
Yi = ri/Ni, whereby ri is the number of days out of Ni on which the ith patient has
correctly taken the drug. Secondly, Yi is a coarsened version of Ui on (0,1). Typical
examples can be found in Quality of Life research (e.g., the Barthel index) where the Yi
denote sums of individual items each measuring an aspect of the subject’s true quality
5
of life Ui. Such a score can be standardized to lie between 0 and 1 whilst taking a finite
number of values.
3.1 Modeling proportions on [0,1]
When the bounded outcome score is a proportion derived from a series of conditionally
(conditional on the subject) independent Bernoulli experiments, then an obvious choice
is to work with a binomial distribution. Namely, in the binomial-logit-normal approach
we assume that,
ri ∼ Bin(Ui, Ni) (i = 1, . . . , n) (2)
with Ui ∼ LN (µ, σ2) and say that ri has a binomial-logit-normal (BLN) distribution. Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
For each value of Ui one observes Ni binary outcomes Wij (j = 1, . . . , Ni) summing up
to ri. For the compliance example, the recorded adherence is the observed proportion
of days that the patients take their medication correctly (with respect to dosage and
timing) in a period of Ni days. In this case, the Ui could be interpreted as the (true but
unobserved) latent adherence of the ith patient to the drug. Observe that this model
is actually a classical measurement error model (Carroll et al., 1995), specifying the
distribution f (Y |U ).
Model (2) can be extended by replacing µ by xiT β, to give a generalized linear mixed
effects model, whereby conditional on Ui the Wij are assumed to be independent. As
indicated above, a further extension allows σ to depend on covariates. Fitting such a
model can be done with, e.g., the SAS procedure NLMIXED or the function lmer() in
package lme4 in R Development Core Team (2005).
3.2 Modeling discrete bounded outcome scores on [0,1]
When Yi is a discrete random variable on [0,1] (e.g., the Barthel index), but not a
proportion, then it is natural to assume that Yi is a grouped version of Ui. As a specific
example, Yi could have been realized from a coarsening mechanism such that Yi = k/m
when (k − 0.5)/m ≤ Ui < (k + 0.5)/m, where (k = 0, . . . , m). At the boundaries, Yi = 0
when 0 < Ui < 0.5/m and Yi = 1 when (m − 0.5)/m < Ui < 1. The above grid of
boundary values is equal for all subjects and is denoted below as:
a0 ≡ 0 < a1 = 0.5/m < . . . < am = (m − 0.5)/m < a(m+1) ≡ 1
However, our approach also allows a grid varying with the subjects.
6
The framework of coarsened data has been formalized by Heitjan and Rubin (1991)
and Heitjan (1993). In their terminology we consider here only deterministic coarsening.
More formally we assume that as(i) ≤ Ui < as(i)+1 when Yi is recorded. For the likelihood
this implies the following expression
n as(i)+1
L(θ; y) = g (ui; θ) du, (3)
i=1 as(i)
where g(·) is the probability density function of the logit-normal distribution. This leads
to the likelihood:
n zs(u(i)) − xTi β −Φ zs(l()i) − xiT β , (4)
σ σ
L(θ; y) ≡ L(β, σ; y) = Φ
i=1
where zs(l()i) = logit(as(i)), zs(u(i)) = logit(as(i)+1) and Φ(·) is the distribution function of the Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
standard normal distribution. At the boundaries, i.e., a0 = 0 and a(m+1) = 1, the values
of zs(i) become −∞ and ∞, respectively. When σ depends on covariates, the above
expression needs to be adapted. For instance, in the two-group comparison expression
(4) splits up in two parts, one with σ1 (first treatment) and the other with σ2 (second
treatment). For obvious reasons we have called this the coarsening (CO) approach either
assuming equal variances or allowing unequal variances.
The maximum likelihood estimates for this model can easily be obtained using stan-
dard numerical optimisation procedures such the BFGS algorithm (Lange, 2004).
3.3 Ordinal regression modeling versus coarsening approach
An interesting feature of the CO model is its resemblance to ordinal regression models.
Bounded outcomes Yi could also be regarded as ordinal response variables taking values in
{0, 1, . . . , m}, where i = 1, . . . , n. The OP regression model (McCullagh, 1980), specifies
that the following equation should hold:
P (Yi ≤ j) = Φ(θj − xiT γ) (i = 1, . . . , n; j = 0, . . . , m), (5)
where −∞ < θ0 ≤ θ1 ≤ . . . ≤ θ(m−1) < ∞, are unknown ordered cut points that need to
be estimated from the data.
Suppose that, after standardization to the [0,1] interval, Yi satisfies the coarsening
7
model of the previous section and σ does not depend on covariates. Then equations (3,
4) imply that
P (Yi ≤ aj) = Φ z(j+1) − xiT β (i = 1, . . . , n; j = 0, . . . , m), (6)
σ
with zj = logit(aj). Consequently, with θj ≡ zj/σ and γ ≡ β/σ, the coarsening model Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
equates to a classical OP regression model. However, in the coarsening model the cut
points are known up to a translation and scale factor, whereas in the OP regression model
they need to be estimated under constraints. Since maximum likelihood estimators are
consistent, θˆj ≈ zj/σˆ for large n so that in this case the two approaches will give similar
parameter estimates, up to a proportionality factor. One might expect the coarsening
approach to be more efficient, since only two parameters need to be estimated, rather
than m + 1 cut points. We give some numerical results in Section 4.
When σ depends on covariates, e.g., when σ1 = σ2 for the two treatment groups, the
CO model as specified by expression (4) will give biased parameter estimates and thus
also a biased estimate of the treatment effect. The same is also true for OP regression. In
fact, for a binary Yi the true treatment effect is equal to β1/σ2, where β1 is the regression
coefficient of the binary treatment indicator (0 for control and 1 otherwise). On the other
hand, the estimate from the OP regression model will converge to (z1/σ1 −z2/σ2)+β1/σ2,
where z1 = logit(a1) when no covariate adjustment is involved. When covariates are
involved their regression coefficients will also be distorted. In Section 3.2 we have seen
that we can adapt the CO approach quite easily to allow for unequal variances. The OP
regression model can also be extended to accommodate this more general case, as has
been done by Johnson and Albert (1999). Their generalization is quite similar to the
CO model with a variance depending on covariates, but cannot be fitted using standard
software.
When the cut points differ between individuals, as in Heitjan and Rubin (1991) and
Heitjan (1993), expression (4) can also be extended easily. The corresponding general-
ization of the OP regression model again leads to the approach of Johnson and Albert.
Finally, when the bounded outcome score is a proportion, we have suggested to fit
the data using the BLN model. Clearly, in this case the OP regression model is not
equivalent to our approach because it does not take into account the variability with
which the proportions ri/Ni are determined.
8
4. Simulation study Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
In this section we describe a simulation study that we have performed to evaluate
our proposals in Sections 3.1 and 3.2 for analyzing bounded outcome scores on [0,1]
in the two-group situation and under the location shift-alternative on the transformed
scale. For more details we refer to supplementary material, which can be found at
http://www.biostatistics.oxfordjournals.org.
4.1 Set up of the simulation study
We have divided the simulation study according to the type of the bounded outcome
score on [0,1]. For a proportion, we have compared the Wilcoxon test and the BLN
approach, and in some cases also the OP regression model, despite its being not strictly
appropriate for proportions. For coarsened data, we have compared the Wilcoxon test,
the OP regression model and the CO-approach. A variety of scenarios were considered,
all involving two-group comparisons.
One of the main purposes of the simulation study is to show that including covariates
can greatly increase the power in detecting a treatment effect when dealing with bounded
outcome scores. Therefore, we considered cases with and without covariates. Further,
we included a variety of distributions on [0,1]. Three different treatment effects were
evaluated, which could be classified as low, moderate and large. Finally, we also varied
the study size, but in all cases specified n1 = n2.
For a proportion, we compared the probability of the type I error and the power
of the three approaches. For coarsened data, we additionally determined the estimated
treatment effect, except of course for the Wilcoxon test.
For coarsened data on [0,1], we considered both the case of equal variances (σ1 = σ2)
and unequal variances (σ1 = σ2), specifically σ1 = 2σ2 and σ2 = 2σ1. Consequently, we
included two versions of the CO approach: (a) assuming equal variances (CO1-approach)
and (b) allowing for unequal variances (CO2-approach). While OP regression is a popular
approach in this setting, the possibility of unequal variances is often neglected. Therefore,
we have included the additional case of unequal variances for coarsened data to highlight
the impact of ignoring inequality of variances. On the other hand, for proportions the
BLN approach is actually the only appropriate method, and can also be easily extended
to the case of unequal variances. Thus, in this case extensive empirical comparison with
9
OP regression is unnecessary. Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
To determine the performance of the different approaches, we used the following
software: (a) Wilcoxon test: the R-function wilcox.test(); (b) OP regression: R-
function polr() from package MASS; (c) BLN approach: generalized mixed-effects model
(SAS proc NLMIXED and R-function lmer() from package lme4) and (d): CO approach:
R-function grouped() from package grouped written by the authors (available from
CRAN http://cran.r-project.org).
4.2 Simulation results and discussion
4.2.1 Proportions on [0,1] When no covariates are involved the performance of the
Wilcoxon test is nearly identical to that of the BLN model. The Pr(type I error) is close
to 0.05 for all cases even for a sample size of n1 = n2 = 20. Further, we observed the
expected positive association of the power with the sample size and the effect size ∆/σ.
However, the power also seems to depend on the shape of the distribution. In particular,
we observed that the U -shaped distribution yields in general a higher power, followed by
the unimodal and the J-shaped distributions. An explanation of this phenomenon lies
in the fact that the latent score Ui (true probability of success) is not known but only
the observed proportion yi. When all true proportions are relatively close to 0 or to 1
the observed proportions will be relatively close to each other. A proof of this is seen in
the power of the Wilcoxon test which shows a similar behavior.
When a significant baseline covariate is included, the Wilcoxon test had a much lower
power than the covariate adjusted parametric models, and mainly as the effect of the
covariate and σ increases. Finally, the BLN and OP models behaved similarly with a
slight inferiority for the latter.
4.2.2 Coarsened data on [0,1] First we summarize the results when there are no
covariates. When σ1 = σ2, the type I error was well preserved for all approaches. Fur-
ther, overall the power of the CO2-approach was less than for the other approaches,
which is natural because the other approaches are developed under the assumption of
equal variances. In all cases the treatment effect was estimated without bias. When
σ1 = σ2, the type I error was well preserved for the CO2-approach, but was sometimes
severely increased for the other approaches. For the CO1- and the OP regression mod-
els, the reason is that the treatment effect is sometimes estimated with a large bias.
10
The anti-conservative character of the Wilcoxon test is explained by its relationship Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
with ordinal logistic regression (McCullagh, 1980). The power of the CO2-approach was
sometimes much less than for the other approaches, but this can be explained by their
anti-conservative character. Indeed, the power of the other approaches was even higher
than the corresponding power obtained from the Welch-test, i.e. when no coarsening is
involved.
When covariates are available, the first and obvious conclusion is that the power can
be greatly improved depending on the relationship of the covariates with the response.
Apart from that, the conclusions are similar to those reported.
4.2.3 Discussion We expected the BLN approach, and especially the CO approach,
to be more powerful than OP regression since the latter requires more parameters to be
estimated. However, the simulation results showed only small differences. We attribute
this to the low correlation of all estimated cut points (except for θ0) with the estimated
regression parameters in the OP regression model.
When the variances are unequal in the treatment groups, our simulations indicated
that the Wilcoxon test, the CO1 approach and classical OP regression yield seriously
distorted type I errors. The two latter approaches also produced severely biased treat-
ment effects, even with n1 = n2. In Wetherill (1960) theoretical calculations revealed
that the type I error and the power of the Wilcoxon test are fairly insensitive to unequal
variances provided n1 = n2. However, we observed that when the data are coarsened
the performance of the Wilcoxon test is severely affected even when the sample sizes are
equal, probably due to the large number of ties. The sensitivity of the CO1 approach
and classical OP regression is explained by the fact that for non-linear models misspecifi-
cation of the (co)variance structure has an impact on the correct estimation of the mean
parameters (see e.g. Butler and Louis, 1992).
5. Applications
5.1 A compliance-enhancing intervention study: THAMES study
Recently, an open-label, multicenter compliance-enhancing intervention (THAMES)
study was completed in Belgium to measure the effect of a program of pharmaceutical
care, designed to enhance adherence to atorvastatin treatment. Four well-defined dis-
11
tricts were identified, two in Flanders (northern Belgium) and two in Wallonia (southern Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
Belgium). In both Flanders and Wallonia, all pharmacists in one of the districts were to
apply measures to improve compliance and enhance persistence, whereas in the second
district no such measures were taken. There were 187 patients in the intervention group
and 182 patients in the control group. All pharmacists were equipped with the MEMS
system, an electronically monitored pharmaceutical package designed to compile the dos-
ing histories of ambulatory patients taking oral medications (Urquhart, 1997). The total
study duration was 12 months. The number of visits to the pharmacy ranged from 5 to
13. At each visit, the patient’s dosing history was checked by means of the electronic
monitoring system. The period between the first and second visit was considered to be
the baseline period. More details on the setup of this intervention study can be found in
Vrijens et al. (2005).
The primary efficacy parameter of the THAMES study was adherence to prescribed
therapy in the post-baseline period whereby adherence was defined for each patient
as the proportion of days during which the MEMS record showed that the patient had
opened the pill container correctly. This variable was also estimated at baseline (baseline
adherence). Finally for the calculation of the “post-baseline adherence” the post-baseline
period was arbitrarily cut-off at day 300.
5.1.1 Baseline covariates The THAMES study could not be randomized due to prac-
tical difficulties. Therefore, we need to compare the baseline covariates of the intervention
and control groups. In Table 1 we compared: gender, age, weight, work status (unem-
ployed versus employed), a cardiovascular risk score (Vrijens et al., 2005), family history
of CHD and the pdays at baseline with the appropriate statistical techniques.
[Table 1 about here.]
The adherence at baseline is of particular interest and the Wilcoxon test gives a significant
result (p = 0.011). The reasons for this significant difference at the start are not clear,
but it requires that the imbalance at the start needs to be taken into account. We were
aware of the potential dangers of correcting for baseline covariates in the presence of
imbalance at baseline (Wainer and Brown, 2004). However, for illustrative purposes we
still performed an ANCOVA type of analysis in the next subsection.
12
5.1.2 Efficacy comparison Initially, we checked for a treatment effect without cor- Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
recting for any baseline covariates. Both the Wilcoxon test and the BLN model gave
a significant intervention effect with p < 0.001. Figure 3 shows histograms for the two
treatment groups with superimposed kernel estimates and fitted logit-normal distribu-
tions.
[Figure 3 about here.]
The logit-normal distributions provide a good fit to the observed data. Since ∆ˆ = 1.054,
and σˆ = 1.848, the estimated effect size ∆ˆ /σˆ is equal to 0.57 with 95% C.I. = (0.36, 0.79),
supporting the intervention effect. The same conclusion can be drawn from Pˆ∆ = 0.66
with 95% C.I. (0.60, 0.71).
We then re-analyzed the data taking into account baseline covariates, including the
logit of baseline adherence. Table 2 shows the results for the BLN model.
[Table 2 about here.]
The effect of intervention and of baseline adherence are both highly significant (p <
0.001). The estimated value of ∆ decreased from 1.054 to 0.818 after taking into account
the imbalance at baseline. Further, the estimated value of σ decreased from 1.848 to
1.433, because the inclusion of covariates decreased the residual variability. As a result,
the intervention effect remained at ∆ˆ /σˆ = 0.57 with 95% C.I. (0.35, 0.79) and Pˆ∆ = 0.66
with 95% C.I. (0.60, 0.72). An additional analysis, allowing for unequal variances for the
latent adherence score, resulted in practically identical parameter estimates and almost
equal variances.
Thus, we can conclude that the intervention significantly improves the adherence of
the patients despite the fact that the two groups differed in adherence already at baseline.
Finally, we did not fit an OP regression model, for reasons stated above.
5.2 A stroke trial: ECASS-1 study
The ECASS-1 stroke study is a double-blind randomized parallel study designed to
compare the effect of alteplase (a thrombolytic drug) and placebo in patients with an
acute ischaemic stroke. In total 545 patients were randomly allocated to the alteplase
13
arm (268 patients) and to the control arm (277 patients). The primary outcome is patient Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
status at 3 months assessed by the Barthel index standardized such that the values lie
in [0, 1]. The score could be considered as a reflection of a latent scale which measures
the ability to cope with a handicap resulting from, say, a cerebrovascular stroke. In this
way it is plausible that the latent score for most if not all patients is less than 1. This
is medically supported since an observed score of 1 does not necessarily imply complete
neurologic recovery of the patient. Also, patients who survived a stroke with an observed
score of zero may have a true latent score close to, rather than equal to, zero, whereas
non-survivors can be considered as having a true score of zero. One way to treat “death”
in the analysis is to regard it as a separate class and hence to distinguish it from the
zero values of survivors. However, in a clinical trial context this approach does not give
a clear picture of the overall effect of the treatment. For this reason we prefer to analyze
the Barthel index using the approach outlined in Section 3.2.
In accordance with Lesaffre et al. (1993) we first performed a Wilcoxon test. This
gives a non significant treatment effect (p = 0.104). Observe that this analysis addresses
the more practical question of how beneficial the thrombolytic treatment is overall, com-
bining ability with mortality.
To apply the coarsening model, we assumed for the Barthel index the coarsening
mechanism outlined in Section 3.2, with m = 20. Without baseline covariates the coars-
ening model assuming equal variances gives ∆ˆ = 0.51 and σˆ = 4.96 yielding a small,
non-significant effect size, namely 0.10 (95% C.I. = (−0.08, 0.28)). The estimated rela-
tive effect is also close to 0.50, i.e., Pˆ∆ = 0.53 with 95% C.I. of (0.48, 0.58). However,
there is strong evidence that the two variances are unequal. A likelihood ratio test
comparing the CO1 with the CO2 model is highly significant (p = 0.0015). The two
variances are estimated as σˆ1 = 4.26 (control) and σˆ2 = 5.91 (alteplase). The treatment
effect now becomes ∆ˆ = 0.99, with (95% C.I. = (−0.01, 2.00), p = 0.053). We again
observe that histograms for the two treatment groups (figure not shown) are well fitted
by logit-normal distributions now assuming unequal variances.
In a second step the baseline covariates gender and age are introduced into the model.
Again, there is evidence for unequal variances. The p-value for the likelihood ratio test
is now 0.0017. Table 3 presents the results of the CO2-approach.
[Table 3 about here.]
14
The two standard deviations are now estimated as σˆ1 = 4.04 (control) and σˆ2 = 5.58 Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
(alteplase). However, while the estimate of ∆ remained at 0.99, including the covariates
resulted in a significant treatment effect (p = 0.042). We do not wish to overemphasize
the significant result of the treatment effect. There does seem to be some effect of
alteplase on average, but treating acute stroke patients with alteplase may also give more
variable outcomes, either very bad (completely disabled or died) or very good (completely
independent of others). The estimated relative effect now becomes Pˆ∆ = 0.57 with 95%
C.I. of (0.53, 0.61).
Finally, we also performed an ordinal probit regression. Nineteen cut points needed
to be estimated. With the same covariates, the estimated treatment effect is equal to
0.13 with 95% C.I. of (−0.05, 0.32) and p-value equal to 0.16. Hence, the results of the
OP regression model and the CO1-approach are quite close, but they differ considerably
from the CO2-approach.
6. Alternative approaches
In this section we will consider only the case of equal variances. When the logistic
transformation yields a logistic distribution on the z-scale, then the effect-size ∆/σ can
be interpreted as a log-odds ratio for the cumulative distributions on the z-scale, i.e.
∆ = log F0(z)/(1 − F0(z)) , (7)
σ F∆(z)/(1 − F∆(z))
where F0(z), F∆(z) are the cumulative logistic distributions under H0 and Ha, respec-
tively. Thus, a generalization of the proportional odds model to continuous data is
obtained. Further, a more robust approach would be to use the logistic-t distribution
instead of the logit-normal (Lange et al., 1989). Of course, for scores on (0,1) a part
of the attractiveness of the approach is lost because we would need to replace classical
techniques, like the t-test, by more sophisticated ones.
Another extension to the logit-normal distribution is obtained by changing the logistic
transformation. Aitchison and Lauder (1985) suggested a Box-Cox transformation. For
the analysis of clinical trial data, we suspect that this extra flexibility has little to offer.
Trigonometric functions like arctan (suitably scaled to [0,1]) provide another class of
transformations to normality. Dexter and Chestnut (1995) experimented with the arc
sine square root transformation for the analysis of VAS pain data. This is an interesting
transformation since it attains its boundary values. Finally, one referee pointed out that
15
OP regression allows the possibility that an arbitrary transformation of the underlying Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
distribution is normally distributed.
A rather different approach has been suggested in Quality of Life Research (Grooten-
dorst, 2000). In this approach a classical distribution, such as the normal distribution,
is assumed to be censored at the boundaries 0 and 1 on the original scale.
7. Conclusion
The logistic-transform is one of the most used transformations in statistics. Hence, we do
not claim originality when proposing the BLN and CO approaches. However, we argue
that the strategy to analyze bounded outcome scores as laid down in this paper has
some advantages over other well established approaches, like ordinal probit and logistic
regression. First, our strategy makes the distinction between the two types of bounded
outcome scores. Secondly, we believe that explicitly drawing the relationship with a
latent normal variable on the transformed scale will help practitioners in planning a
clinical trial even when they decide to use an ordinal regression model for the analysis.
Thirdly, our approach is quite flexible. The extension to unequal variances could be
quite important in practice. Other extensions, like allowing for a varying grid of cut
points, are also easy to do. Finally, we have developed an approach to perform sample
size calculation based on the coarsening approach, see Tsonaka et al. (2005). Given
that the power of the OP regression model is close to that of the coarsening approach in
the case of equal variances, we believe that this approach is also useful for sample size
calculations for ordinal regression.
With respect to further research, we believe that models for repeated bounded out-
come scores using either a multivariate approach as suggested by Aitchison and Shen
(1980) or incorporating random effects could be useful. In this context it would also be
useful to draw the connection with random effects ordinal probit regression. Further,
it is of interest to develop an approach which can handle a bounded outcome score as
a covariate in a regression model. Recently, Liang et al. (2005) have published a re-
lated paper that deals with a bounded covariate. Finally, we intend to explore various
procedures to analyze VAS-data, again allowing the variance to depend on covariates.
16
Acknowledgements Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
The authors acknowledge support from the Interuniversity Attraction Poles Program
P5/24 − Belgian State − Federal Office for Scientific, Technical and Cultural Affairs.
The authors also thank Pfizer Belgium for the permission to use the data of the THAMES
study and Boehringer Ingelheim for the permission to use the ECASS-1 data. Finally,
the authors also acknowledge the comments of a referee, the associate editor and the
editor which improved the readability of the paper considerably.
References
Aitchison, J. and Begg, C. (1976). Statistical diagnosis when cases are not classified with
certainty. Biometrika 63, 1–12.
Aitchison, J. and Lauder, I. J. (1985). Kernel density estimation for compositional data.
Applied Statistics 34, 129–137.
Aitchison, J. and Shen, S. (1980). Logistic-normal distributions: Some properties and
uses. Biometrika 67, 261–272.
Brunner, E. and Munzel, U. (2000). The non-parametric Behrens-Fisher problem: as-
ymptotic theory and small-sample approximation. Biometrical Journal 42, 17–25.
Butler, B. and Louis, T. (1992). Random effects models with non-parametric priors.
Statistics in Medicine 11, 1981–2000.
Carroll, R., Ruppert, D. and Stefanski, L. (1995). Nonlinear Measurement Error Models,
volume 63 of Monographs on Statistics and Applied Probability. Chapman and Hall,
New York.
Dexter, F. and Chestnut, D. (1995). Analysis of statistical tests to compare visual analog
scale measurements among groups. Anesthesiology 82, 896–902.
Grootendorst, P. (2000). Censoring in statistical models of health status: What happens
when one can do better than ’1’. Quality of Life Research 9, 911–914.
Heitjan, D. (1993). Ignorability and coarse data: Some biomedical examples. Biometrics
49, 1099–1109.
Heitjan, D. and Rubin, D. (1991). Ignorability and coarse data. Annals of Statistics 19,
2244–2253.
Johnson, N. (1949). Systems of frequency curves generated by methods of translation.
Biometrika 36, 149–176.
Johnson, S. and Albert, J. ([pp. 161-3], 1999). Ordinal Data Modelling. Springer, New
York.
17
Kieschnick, R. and McCullough, B. (2003). Regression analysis of variates observed on Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
(0,1): percentages, proportions and fractions. Statistical Modelling 3, 193–213.
Lange, K. (2004). Optimization. Springer-Verlag, New York.
Lange, K., Little, R. and Taylor, J. (1989). Robust statistical modeling using the t
distribution. Journal of the American Statistical Association 84, 881–896.
Lehmann, E. L. and D’Abrera, H. J. M. (1998). Nonparametrics: statistical methods
based on ranks. Prentice Hall, Uppler Saddle River, N. J.
Lesaffre, E., Scheys, I., Fr¨ohlich, J. and Bluhmki, E. (1993). Calculation of power and
sample size with bounded outcome scores. Statistics in Medicine 12, 1063–1078.
Liang, L., Shao, J. and Palta, M. (2005). A longitudinal measurement error model with
a semi-continuous covariate. Biometrics 61, 824–830.
Mahoney, F. and Barthel, D. (1965). Functional evaluation: The Barthel Index. Mary-
land State Medical Journal 14, 56–61.
McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical
Society, Series B 42, 109–142.
Murphy, B. (1967). Some two-sample tests when the variances are unequal: a simulation
study. Biometrika 54, 679–683.
Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal
data: Unconstrained parametrisation. Biometrika 86, 677–690.
R Development Core Team (2005). R: A language and environment for statistical comput-
ing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
Tsonaka, R., Rizopoulos, D. and Lesaffre, E. (2005). Power and sample size calculations
for discrete bounded outcome scores. Submitted.
Urquhart, J. (1997). The electronic medication event monitor: Lessons for pharma-
cotherapy. Clinical Pharmacokinetic 32, 345–356.
Vrijens, B., Belmans, A., Matthys, K., Klerk, E. D. and Lesaffre, E. (2005). Effect of
intervention through a pharmaceutical care program on patient adherence with pre-
scribed once-daily atorvastatin. Accepted for publication in Pharmacoepidemiology
and Drug Safety.
Wainer, H. and Brown, L. (2004). Two statistical paradoxes in the interpretation of group
differences: Illustrated with medical school admission and licensing data. American
Statistician 58, 117–123.
Welch, B. (1951). On the comparison of several mean values. Biometrika 38, 330–336.
Wetherill, G. (1960). The Wilcoxon test and non-null hypotheses. Journal of the Royal
Statistical Society 22, 402–418.
18
Density Logit−Normal Distribution Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
012345
mean=0, sigma=1
mean=0, sigma=3
mean=2, sigma=1
mean=−1, sigma=1
0.0 0.2 0.4 0.6 0.8 1.0
U
Figure 1. Different logit-normal distributions
19
H0 H0 Ha Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
g(u)
0.0 0.5 1.0 1.5 2.0 2.5
f(z)
0.0 0.1 0.2 0.3
Ha ∆
0.0 0.2 0.4 0.6 0.8 1.0 -4 -2 0 2 4
u z
Figure 2. Correspondence of the location-shift alternative hypothesis on the transformed
Z-scale to the corresponding alternative hypothesis on the observed U -scale
20
Density Kernel Density Estimation Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
0 2 4 6 8 10 Fitted Logit−Normal distribution
based on the Binomial model
Density
0 2 4 6 8 10 0.0 0.2 0.4 0.6 0.8 1.0
Adherence for Intervention Group
Kernel Density Estimation
Fitted Logit−Normal distribution
based on the Binomial model
0.0 0.2 0.4 0.6 0.8 1.0
Adherence for Placebo Group
Figure 3. THAMES study: Proportion of correct dosing days
21
Table 1 Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
THAMES study: Comparison of baseline covariates for difference in the two groups. For
categorical variables frequencies (percentages) and for continuous variables the mean, are
reported.
Variable Levels Intervention Control p-value
Total number − 187 182 −
Adherence − 0.93 0.90
Gender 0.011
Age Males 102(55%) 85(45%) 0.161
Weight − 62 60 0.231
Work − 77 78 0.735
Card. Risk 0.029
Fam. Hist. Unemployed 46(25%) 64(35%) 0.013
− 12.74 10.52 0.911
No
145(78%) 142(78%)
22
Table 2 Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
THAMES study: Parameter estimates, standard errors and p-values for the BLN model for
the Compliance data.
Variable Estimate Std. Error p-value
Intercept 0.545 0.249 0.029
Baseline 0.648 0.051
Intervention 0.818 0.161 <0.001
Gender 0.199 <0.001
Age −0.171 0.011
Weight 0.009 0.006 0.391
Work 0.005 0.233 0.403
Card. Risk 0.012 0.429
History −0.240 0.191 0.302
σ −0.021 0.062 0.064
−0.444 0.021
1.433
23
Table 3 Downloaded from http://biostatistics.oxfordjournals.org/ by guest on June 9, 2016
ECASS-1 study: Parameter estimates, standard errors and p-values for the CO2-approach
model for the Barthel index.
Variable Estimate Std. Error p-value
2.099 0.311 <0.001
Intercept 0.992 0.486
Treatment 0.450 0.042
Gender −0.121 0.020 0.79
Age −0.126 0.267 <0.001
σ1 0.444
σ2 4.045
5.577
24