Tutorials in Biostatistics
Tutorials in Biostatistics
Volume 1: Statistical Methods in
Clinical Studies
Edited by
R. B. D’Agostino,
Boston University, USA
Copyright ? 2004 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester,
West Sussex PO19 8SQ, England
Telephone (+44) 1243 779777
Email (for orders and customer service enquiries): [email protected]
Visit our Home Page on www.wileyeurope.com or www.wiley.com
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval
system or transmitted in any form or by any means, electronic, mechanical, photocopying,
recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents
Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90
Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the
Publisher. Requests to the Publisher should be addressed to the Permissions Department, John
Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England,
or emailed to [email protected], or faxed to (+44) 1243 770571.
This publication is designed to provide accurate and authoritative information in regard to the
subject matter covered. It is sold on the understanding that the Publisher is not engaged in
rendering professional services. If professional advice or other expert assistance is required,
the services of a competent professional should be sought.
Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore
129809
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 0-470-02365-1
Typeset by Macmillan India Ltd
Printed and bound in Great Britain by Page Bros, Norwich
This book is printed on acid-free paper responsibly manufactured from sustainable forestry
in which at least two trees are planted for each one used for paper production.
Contents
Preface vii
Preface to Volume 1 ix
Part I OBSERVATIONAL STUDIES/EPIDEMIOLOGY
1.1 Epidemiology
Computing Estimates of Incidence, Including Lifetime Risk: Alzheimer’s Disease in the 3
Framingham Study. The Practical Incidence Estimators (PIE) Macro. Alexa Beiser,
Ralph B. D’Agostino, Sr, Sudha Seshadri, Lisa M. Sullivan and Philip A. Wolf
The Applications of Capture-Recapture Models to Epidemiological Data. Anne Chao, 31
P. K. Tsay, Sheng-Hsiang Lin, Wen-Yi Shau and Day-Yu Chao
1.2 Adjustment Methods
Propensity Score Methods for Bias Reduction in the Comparison of a Treatment to a 67
Non-Randomized Control Group. Ralph B. D’Agostino, Jr.
1.3 Agreement Statistics
Kappa Coe cients in Medical Research. Helen Chmura Kraemer, 85
Vyjeyanthi S. Periyakoil and Art Noda
1.4 Survival Models
Survival Analysis in Observational Studies. Kate Bull and David J. Spiegelhalter 107
Methods for Interval-Censored Data. Jane C. Lindsey and Louise M. Ryan 141
Analysis of Binary Outcomes in Longitudinal Studies Using Weighted Estimating 161
Equations and Discrete-Time Survival Methods: Prevalence and Incidence of
Smoking in an Adolescent Cohort. John B. Carlin, Rory Wolfe, Carolyn Co ey
and George C. Patton
Part II PROGNOSTIC/CLINICAL PREDICTION MODELS
2.1 Prognostic Variables
Categorizing a Prognostic Variable: Review of Methods, Code for Easy Implementation
and Applications to Decision-Making about Cancer Treatments. Madhu Mazumdar
and Jill R. Glassman 189
v
vi CONTENTS
2.2 Prognostic/Clinical Prediction Models
Development of Health Risk Appraisal Functions in the Presence of Multiple Indicators:
The Framingham Study Nursing Home Institutionalization Model. R. B. D’Agostino,
Albert J. Belanger, Elizabeth W. Markson, Maggie Kelly-Hayes and Philip A. Wolf 209
Multivariable Prognostic Models: Issues in Developing Models, Evaluating Assumptions
and Adequacy, and Measuring and Reducing Errors. Frank E. Harrell Jr.,
Kerry L. Lee and Daniel B. Mark 223
Development of a Clinical Prediction Model for an Ordinal Outcome: The World 251
Health Organization Multicentre Study of Clinical Signs and Etiological Agents of
Pneumonia, Sepsis and Meningitis in Young Infants. Frank E. Harrell Jr.,
Peter A. Margolis, Sandy Gove, Karen E. Mason, E. Kim Mulholland,
Deborah Lehmann, Lulu Muhe, Salvacion Gatchalian, Heinz F. Eichenwald and
the WHO/ARI Young Infant Multicentre Study Group
Using Observational Data to Estimate Prognosis: An Example Using a Coronary Artery
Disease Registry. Elizabeth R. DeLong, Charlotte L. Nelson, John B. Wong,
David B. Pryor, Eric D. Peterson, Kerry L. Lee, Daniel B. Mark,
Robert M. Cali and Stephen G. Pauker 287
Part III CLINICAL TRIALS
3.1 Design 317
Designing Studies for Dose Response. Weng Kee Wong and Peter A. Lachenbruch
3.2 Monitoring 335
Bayesian Data Monitoring in Clinical Trials. Peter M. Fayers, Deborah Ashby and
Mahesh K. B. Parmar
3.3 Analysis 353
379
Longitudinal Data Analysis (Repeated Measures) in Clinical Trials. Paul S. Albert 397
423
Repeated Measures in Clinical Trials: Simple Strategies for Analysis Using Summary
Measures. Stephen Senn, Lynda Stevens and Nish Chaturvedi
Strategies for Comparing Treatments on a Binary Response with Multi-Centre Data.
Alan Agresti and Jonathan Hartzel
A Review of Tests for Detecting a Monotone Dose-Response Relationship with
Ordinal Response Data. Christy Chuang-Stein and Alan Agresti
Index 443
Preface
The development and use of statistical methods has grown exponentially over the last two
decades. Nowhere is this more evident than in their application to biostatistics and, in par-
ticular, to clinical medical research. To keep abreast with the rapid pace of development,
the journal Statistics in Medicine alone is published 24 times a year. Here and in other
journals, books and professional meetings, new theory and methods are constantly presented.
However, the transitions of the new methods to actual use are not always as rapid. There
are problems and obstacles. In such an applied interdisciplinary ÿeld as biostatistics, in which
even the simplest study often involves teams of researchers with varying backgrounds and
which can generate massive complicated data sets, new methods, no matter how powerful and
robust, are of limited use unless they are clearly understood by practitioners, both clinical and
biostatistical, and are available with well-documented computer software.
In response to these needs Statistics in Medicine initiated in 1996 the inclusion of tutorials
in biostatistics. The main objective of these tutorials is to generate, in a timely manner, brief
well-written articles on biostatistical methods; these should be complete enough so that the
methods presented are accessible to a broad audience, with su cient information given to
enable readers to understand when the methods are appropriate, to evaluate applications and,
most importantly, to use the methods in their own research.
At ÿrst tutorials were solicited from major methodologists. Later, both solicited and unso-
licited articles were, and are still, developed and published. In all cases major researchers,
methodologists and practitioners wrote and continue to write the tutorials. Authors are guided
by four goals. The ÿrst is to develop an introduction suitable for a well-deÿned audience (the
broader the audience the better). The second is to supply su cient references to the literature
so that the readers can go beyond the tutorial to ÿnd out more about the methods. The ref-
erenced literature is, however, not expected to constitute a major literature review. The third
goal is to supply su cient computer examples, including code and output, so that the reader
can see what is needed to implement the methods. The ÿnal goal is to make sure the reader
can judge applications of the methods and apply the methods. The tutorials have become ex-
tremely popular and heavily referenced, attesting to their usefulness. To further enhance their
availability and usefulness, we have gathered a number of these tutorials and present them in
this two-volume set.
Each volume has a brief preface introducing the reader to the aims and contents of the
tutorials. Here we present an even briefer summary. We have arranged the tutorials by subject
matter, starting in Volume 1 with 18 tutorials on statistical methods applicable to clinical
studies, both observational studies and controlled clinical trials. Two tutorials discussing the
computation of epidemiological rates such as prevalence, incidence and lifetime rates for
cohort studies and capture–recapture settings begin the volume. Propensity score adjustment
methods and agreement statistics such as the kappa statistic are dealt with in the next two
tutorials. A series of tutorials on survival analysis methods applicable to observational study
data are next. We then present ÿve tutorials on the development of prognostics or clinical
prediction models. Finally, there are six tutorials on clinical trials. These range from designing
vii
viii PREFACE
and analysing dose response studies and Bayesian data monitoring to analysis of longitudinal
data and generating simple summary statistics from longitudinal data. All these are in the
context of clinical trials. In all tutorials, the readers is given guidance on the proper use of
methods.
The subject-matter headings of Volume 1 are, we believe, appropriate to the methods. The
tutorials are, however, often broader. For example, the tutorials on the kappa statistics and
survival analysis are useful not only for observational studies, but also for controlled clinical
studies. The reader will, we believe, quickly see the breadth of the methods.
Volume 2 contains 16 tutorials devoted to the analysis of complex medical data. First, we
present tutorials relevant to single data sets. Seven tutorials give extensive introductions to and
discussions of generalized estimating equations, hierarchical modelling and mixed modelling.
A tutorial on likelihood methods closes the discussion of single data sets. Next, two exten-
sive tutorials cover the concepts of meta-analysis, ranging from the simplest conception of a
ÿxed e ects model to random e ects models, Bayesian modelling and highly involved models
involving multivariate regression and meta-regression. Genetic data methods are covered in
the next three tutorials. Statisticians must become familiar with the issues and methods rele-
vant to genetics. These tutorials o er a good starting point. The next two tutorials deal with
the major task of data reduction for functional magnetic resonance imaging data and disease
mapping data, covering the complex data methods required by multivariate data. Complex and
thorough statistical analyses are of no use if researchers cannot present results in a meaningful
and usable form to audiences beyond those who understand statistical methods and complexi-
ties. Reader should ÿnd the methods for presenting such results discussed in the ÿnal tutorial
simple to understand.
Before closing this preface to the two volumes we must state a disclaimer. Not all the
tutorials that are in these two volumes appeared as tutorials. Three were regular articles.
These are in the spirit of tutorials and ÿt well within the theme of the volumes.
We hope that readers enjoy the tutorials and ÿnd them beneÿcial and useful.
RALPH B. D’AGOSTINO, SR. EDITOR
Boston University
Harvard Clinical Research Institute
Preface to Volume 1
This ÿrst volume of Tutorials in Biostatistics is devoted to statistical methods in clinical
research. By this we mean statistical methods applied to medical problems involving human
beings, either as members of populations or groups in observational and epidemiological
research or as participants in clinical trials. The tutorials are divided into three parts. Here
we brie y mention the general themes of each part and the articles within them.
Part I is on observational studies and epidemiology. These articles clarify the uniqueness
and complications that arise from observational data and present methods to obtain meaningful
and unbiased inferences. Section 1.1 is devoted to epidemiology and contains two tutorials.
The ÿrst, by Beiser, D’Agostino, Seshadri, Sullivan and Wolf, presents a thorough discussion
of epidemiological event rates such as incidence rates and lifetime risks, clariÿes issues such
as competing risks in the calculation of these rates and includes computer programs to carry
out computations. The second tutorial, by Chao, Tsay, Lin, Shau and Chao, describes the
computation of epidemiological rates for capture-recapture data such as would be obtained
from multiple surveys attempting to estimate the disease prevalence rate for, say, hepatitis A
virus or diabetes in a population. The issue of minimizing biases is discussed. Section 1.2, on
adjustment methods, contains one article by Ralph D’Agostino, Jr., on the use of propensity
scores for reducing bias in treatment comparisons from observational studies. This tutorial
has become a standard reference for propensity scoring. The article from Section 1.3, on
agreement statistics, by Kraemer, Periyakoil and Noda, covers in detail the use of the kappa
statistic in medical research.
Section 1.4 presents three tutorials devoted to survival methods applicable to observational
studies. First, Bull and Spiegelhalter clearly identify the complications and other issues in-
volved in using survival methods in observational studies (in contrast to clinical trials). Interval
estimation and binary outcomes in longitudinal studies are then developed in the next two tu-
torials by Lindsey and Ryan and by Carlin, Wolfe, Co ey and Patton. The latter two tutorials
have uses beyond survival analysis in observational studies. We group them in this part of
the volume mainly for convenience. The reader should quickly see the broader applicability
of the methods and not be limited by our classiÿcation.
Part II is concerned with prognostic/clinical prediction models and contains two sections.
Here the aim is to present methods for developing mathematical models that can be used to
identify people at risk for an outcome such as the development of heart disease or for the
prognosis of subjects with certain clinical characteristics such as cancer tumour size. Some
of these tutorials have become major references for clinical prediction model development.
Section 2.1 contains one article by Mazumdar and Glassman on categorizing prognostic vari-
ables. The question is often how best to dichotomize a diagnostic variable so that it can be
used in a clinical setting. Issues such as multiple testing often render useless such ’obvious’
methods as trying to ÿnd the best cut point. A careful review of the ÿeld is presented and
helpful suggestions abound.
ix
x PREFACE TO VOLUME 1
Section 2.2, ‘Prognostic/Clinical Prediction Models’, presents in four detailed tutorials meth-
ods for developing and evaluating multivariable clinical prediction models. The ÿrst, by
D’Agostino, Belanger, Markson, Kelly-Hayes and Wolf, illustrates how to deal with a large
set of potentially useful prediction variables. Methods such as principal components analysis
and hierarchical variable selection methods for survival analysis are highlighted. The next two
articles have Frank Harrell as the ÿrst author and deal in detail with developing prediction
models for time to event, binary and ordinal outcomes. (The ÿrst is authored by Harrel, Lee
and Mark, and the second by Harrell, Margolis, Gove, Mason, Mulholland, Lehmann, Muhe,
Gatchalian and Eichenwald.) Questions of model development are explored completely, as
are issues of making predictions and concerns about validation. The last tutorial in Section
2.2 deals with estimating prognosis based on observational data such as are obtainable in a
registry. It is authored by DeLong, Nelson, Wong, Pryor, Peterson, Lee, Mark, Cali and
Pauker. These four tutorials are among the best literature sources for the development and
appropriate use of clinical prediction models.
Part III is on clinical trials and contains three sections. While given as tutorials, the articles
of this section are innovative in understanding as well as in the presentation of the issues and
methods. Section 3.1 contains a clever article by Wong and Lachenbruch on the optimal design
of dose response studies. Section 3.2, on monitoring in clinical trails, contains an article on
Bayesian data monitoring by Fayers, Ashby and Parmar pointing to the beneÿts of a Bayesian
analysis even in this setting. Section 3.3 contains four articles on analysis. These bring together
ideas and methods available for use, but not presented elsewhere with such completeness and
clear focus. They ÿll a serious void and add wonderfully to the ÿeld. The ÿrst, by Albert, deals
with longitudinal clinical trial data analysis. The second, by Senn, Stevens and Chaturvedi,
deals with generating simple summary numbers from repeated measures studies so that the
analysis and interpretations of the study are intuitive and meaningful. The next article, by
Agresti and Hartzel, concerns binary data from multi-centre trials. Lastly, Chuang-Stein and
Agresti discuss dose responses with ordinal data.
We hope these 18 tutorials will be of use to readers.
Part I
OBSERVATIONAL
STUDIES/
EPIDEMIOLOGY
1.1 Epidemiology
Tutorials in Biostatistics Volume 1: Statistical Methods in Clinical Studies Edited by R. B. D’Agostino
? 2004 John Wiley & Sons, Ltd. ISBN: 0-470-02365-1
3
4 A. BEISER ET AL.
ALZHEIMER’S DISEASE IN THE FRAMINGHAM STUDY 5
6 A. BEISER ET AL.
ALZHEIMER’S DISEASE IN THE FRAMINGHAM STUDY 7
8 A. BEISER ET AL.
ALZHEIMER’S DISEASE IN THE FRAMINGHAM STUDY 9
10 A. BEISER ET AL.
ALZHEIMER’S DISEASE IN THE FRAMINGHAM STUDY 11
12 A. BEISER ET AL.
ALZHEIMER’S DISEASE IN THE FRAMINGHAM STUDY 13
14 A. BEISER ET AL.
ALZHEIMER’S DISEASE IN THE FRAMINGHAM STUDY 15
16 A. BEISER ET AL.
ALZHEIMER’S DISEASE IN THE FRAMINGHAM STUDY 17
18 A. BEISER ET AL.
ALZHEIMER’S DISEASE IN THE FRAMINGHAM STUDY 19
20 A. BEISER ET AL.
ALZHEIMER’S DISEASE IN THE FRAMINGHAM STUDY 21
22 A. BEISER ET AL.
ALZHEIMER’S DISEASE IN THE FRAMINGHAM STUDY 23
24 A. BEISER ET AL.
ALZHEIMER’S DISEASE IN THE FRAMINGHAM STUDY 25
26 A. BEISER ET AL.
ALZHEIMER’S DISEASE IN THE FRAMINGHAM STUDY 27
28 A. BEISER ET AL.
ALZHEIMER’S DISEASE IN THE FRAMINGHAM STUDY 29
30 A. BEISER ET AL.
Tutorials in Biostatistics Volume 1: Statistical Methods in Clinical Studies Edited by R. B. D’Agostino
? 2004 John Wiley & Sons, Ltd. ISBN: 0-470-02365-1
31
32 A. CHAO ET AL.
CAPTURE-RECAPTURE MODELS 33
34 A. CHAO ET AL.
CAPTURE-RECAPTURE MODELS 35
36 A. CHAO ET AL.
CAPTURE-RECAPTURE MODELS 37
38 A. CHAO ET AL.
CAPTURE-RECAPTURE MODELS 39