Companion to Psychiatric Studies


Genetics in relation to psychiatry 8

Douglas H R Blackwood Walter J Muir

Introduction

Since the full sequence of the human genome was published in 2003 there have been dramatic developments in understanding how genes contribute to human diseases across all branches of medicine, including psychiatry, opening up new possibilities in the future for improved diagnosis, treatment and perhaps even prevention of some common psychiatric illnesses. The extent of genetic variation across the genome, and the subtle differences in gene expression that underlie much of human variation, is much greater than previously thought, and a principal aim of psychiatric genetics is to unravel the complex interplay between environmental factors and variations within genes underlying many types of mental illness. Over 19 000 genes related to human illness have been catalogued (Online Mendelian Inheritance in Man, OMIM; http://www.ncbi.nlm.nih.gov/omim) and new technologies, including high-throughput genotyping and sequencing, are likely to facilitate further gene discovery. In psychiatry this may lead to new ways to classify illnesses based on neurobiology and make possible targeted treatments and early detection of people at risk of psychiatric illness, leading to improved outcomes if appropriate interventions are available. Some of the best recently established findings in psychiatry relate to the role of genes in determining the risk of developing a disorder and discovering the neurobiological pathways within which these genes operate. The aim of this chapter is not to give in detail the genetic findings for every psychiatric illness, which are found in other chapters of this volume; instead, the focus is on the basic mechanisms that underpin these findings and the approaches that psychiatrists can take to analyse the contribution of genetics to psychiatric disorder.

The genetic code and how it is transmitted

Inheritance can be interpreted in several ways. Culture is transmitted from one generation to the next through the experience of a social environment where information can be passed on over generations orally and through media including books, photographs, recordings and other such artefacts. The inheritance of genetic factors is mediated by the generational passage of the chemical deoxyribonucleic acid (DNA), packaged into 46 chromosomes and organised into the estimated 25 000 genes that make up the human genome. The structure of DNA, the well-known double helix first described by Watson and Crick in 1953 (Watson & Crick 1953), is built around four nucleotide bases, adenine (A), thymine (T), cytosine (C) and guanine (G), arranged in two antiparallel strands where phosphate bonds link individual bases along each strand and the strands bind to one another through hydrogen bonds according to the base-pairing rule – A with T, C with G. This structure provides a copying mechanism that ensures remarkable fidelity of the DNA coding sequence during replication, and this flow of information from parents to offspring dictates the development and maintenance of the human form.
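The base-pairing rule is what makes faithful copying possible: each strand fully specifies its partner. The short Python fragment below is a purely illustrative sketch (the sequence and function name are invented for this example, not taken from the chapter); it simply derives the partner strand from the A–T, C–G pairing.

    # Minimal illustrative sketch of the A-T / C-G base-pairing rule.
    # The sequence and function name are invented for illustration only.
    PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

    def partner_strand(strand: str) -> str:
        """Return the base-paired partner of a DNA strand."""
        return "".join(PAIRING[base] for base in strand)

    print(partner_strand("ATGCCGTA"))   # TACGGCAT
    # Because each strand fully specifies the other, copying each strand of the
    # double helix yields two identical daughter duplexes during replication.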
The structure of the chromosome

Present in the nucleus of every autosomal cell in the human body are 46 rod-shaped chromosomes that can be made visible under a light microscope by suitable staining methods. Cytogenetics has an essential role in the diagnosis of many common inherited conditions caused by alterations in chromosomal number or structure and has also been a useful tool in mapping genes implicated in psychiatric illnesses. Chromosomes are long linear molecules of DNA which, in the cell nucleus, are coiled together with histones and other chromosome-associated proteins to form chromatin structures: the DNA double helix wraps around a protein core (the nucleosome) in a beads-on-a-string fashion, the beads coil into a 30-nm chromatin fibre, and this fibre in turn is formed into twisted loops that radiate from a protein scaffold (Fig. 8.1).

Fig. 8.1 The various levels of DNA compaction at metaphase: from the level of the whole chromosome (upper left) to the duplex DNA helix itself (lower right). (Diagram labels: telomere, centromere, compacted chromatin, chromosomal scaffold, radial loop containing 100 kbp DNA, nucleosome, DNA double helix.)


The 46 human chromosomes consist of 22 matched autosomes and a pair of sex chromosomes (X and Y). By convention the autosomal pairs are numerically labelled from 1 to 22 according to size, with chromosome 1 being the largest. Females carry two copies of X and males have one X and one Y chromosome. Thus a normal complement or karyotype is 46XX in a woman and 46XY in a man, the digits indicating the total chromosome number or diploid count. Human chromosomes are routinely examined in lymphocytes obtained from a venous blood sample and visualised by light microscopy when cell division is arrested after culture for 3 days. Chromosomes in the metaphase stage of cell division at the time when growth is arrested show a clearly recognisable macrostructure after staining with specific dyes (e.g. Giemsa), producing the classical banded appearance at metaphase.

At this stage of division each chromosome consists of two sister chromatids joined at the centromere, which divides the chromosome into two arms: short (termed 'p') and long ('q'). The centromere contains a particular type of repeated DNA (alpha-satellite) and specific proteins which have a role in moving chromosomes along microtubules during cell division. At both ends of chromosomes are the telomeres, specialised structures where a repeated sequence (TTAGGG) caps the free chromosome end, acting to halt replication. During cell division there is loss of telomeric DNA, and shortening of telomeres is part of the normal aging process in cells, leading eventually to cellular malfunction.

The cell cycle and mitosis

All the somatic cells in the body are derived from a single fertilised ovum (zygote) and cells proliferate by the process of mitotic division, allowing differentiation, growth and repair of tissues throughout the lifespan. The cell cycle describes the different stages in the life of a cell and includes mitosis and the time between mitoses, called interphase, when the cell is not dividing. Some cells, including fully differentiated neurons, do not divide and remain permanently at interphase.

During interphase and preceding mitosis DNA is synthesised in the nucleus of the cell, leading to a doubling of the genome when the DNA of each chromosome replicates, creating two identical copies or sister chromatids. Mitosis is the process by which these chromatids supply each of the new daughter cells with a diploid complement of 46 chromosomes. In the first stage a mitotic spindle forms, composed of two centrosomes positioned at the poles of the cell from which radiates a network of microtubules forming an array of fibres linking the centrosomes. The chromatids become attached to the microtubules of the mitotic spindle, and in the final stages of mitosis the sister chromatids of each chromosome separate and are pulled apart to opposite poles of the cell. Finally cell division is completed by the complete separation of the two daughter cells in a process called cytokinesis. Errors in mitosis leading to uneven distributions of chromosomes are a feature of some cancer cells, and variations in the function of genes involved in parts of the cell cycle are candidates for a role in psychiatric illness including schizophrenia and bipolar disorder.

Meiosis is the special form of nuclear division unique to diploid germ cells that leads to the formation of gametes, each containing a haploid number of 23 chromosomes. The first stage of meiosis is a reduction division of the germ cell whereby the chromosome number is halved, followed by a second meiotic division, essentially similar to mitosis, and the formation of four haploid spermatids or ova originating from the same original germ cell. During the first stage of meiosis each pair of homologous chromosomes becomes linked at their centromeres. Each part of this doublet, now termed a sister chromatid, exchanges DNA segments by a process termed synapsis with the formation of bridges or chiasmata. In a complex series of steps mediated by specific proteins that bind to a single strand of the DNA duplex and recognise a complementary strand in the DNA of the other homologous chromosome, a junction is formed, followed by breakage, crossover and reunion of the DNA strands. Recombination occurs about 50 times across the whole genome at each meiosis, thus approximately two events per chromosome – in effect shuffling the sequence of DNA. Each paired sister chromatid unit has a specific protein-containing disk (kinetochore) at its fused centromere region which interacts and binds with a microtubule-based apparatus that radiates from two anchoring structures (poles) spaced apart within the cell. Through the addition or subtraction of molecular subunits to microtubules the spindle apparatus can 'pull' or 'push' chromosomes within the cell, either bringing them to the equator of the spindle (congression) as a prerequisite for recombination or separating them to opposite poles (segregation) prior to the formation of daughter cells. The kinetochores are orientated towards one pole or the other, and after recombination the individual members of the homologous chromosome pair are drawn towards the poles.


Such segregation is a random event, introducing a mixture of paternally and maternally derived chromosomes into the germline and adding another element of adaptive variation into the transmitted genome. Meiosis I is completed, after the formation of two clusters of segregated chromosomes, by the disappearance of the spindle apparatus, the formation of new nuclear membranes and finally by the ingression of the cell membrane in a closing purse-string fashion to separate into two daughter cells. The second meiotic division (meiosis II) does not involve DNA duplication, so that each gamete contains only half of the original chromosomal material. The sister chromatids that were fused at the centromere during meiosis I are split and pulled (random orientation with respect to the poles ensuring random selection) to two new poles by the spindle apparatus of meiosis II. Four gametes are thus produced from each original cell, each containing only 23 chromosomes (normally either 23,X or 23,Y). In the male germline all four gametes are retained as sperm; however, in the formation of ova only one of the four gametes survives as an ovum; the others form polar bodies.

The above description is simplified, and a large number of complex interactions occur involving molecular motors, checkpoint systems that control progress from one phase to the next, the spindle apparatus and kinetochores. Meiosis is illustrated in diagrammatic form in Fig. 8.2.

Fig. 8.2 Diagrammatic representation of the behaviour of one chromosome pair during meiosis. Stage 1: DNA replication. Stage 2: Nuclear membrane dissolves, spindle apparatus forms, duplicated chromosomes fuse at centromeres (forming bivalents) and move to the equator, and recombination occurs. Stage 3: Homologous chromosomes move to poles, and cell divides in two. Stages 4 and 5: No further duplication, but a reduction division to form four haploid gametes.

The process of genetic recombination, or crossing over, during the first stage of meiosis is a remarkable mechanism for generating genetic diversity, permitting adaptation to changing environments, because it ensures that each of the four gametes derived from one germ cell will carry a unique assortment of chromosomal DNA and no two gametes are likely to contain exactly the same DNA sequences. Aberrations in recombination are a frequent cause of chromosomal abnormalities leading to diseases including Down syndrome, and are also of fundamental importance for gene mapping strategies and linkage studies, which are discussed in a later section.

The gene

This is the basic unit of genetic information that lies within each chromosomal DNA sequence. Originally the term 'gene' was used in an abstract sense to explain why some traits seemed to be inherited. In fact the relationship between genes and proteins was conceived well before the structure of DNA was known. The number of genes making up the human genome is currently estimated to be in the region of 25 000. The gene is now taken to mean a stretch of DNA that contains the genetic code, read as triplets of nucleotides (adenine, thymine, guanine, cytosine), necessary to direct the assembly of amino acids into the proteins making up the human proteome. The emphasis is on functionality – a gene is that segment of the chromosome that directs the formation of a functional product. Gene expression leading to protein synthesis involves the steps of gene transcription to messenger RNA (mRNA), taking place in the cell nucleus, and translation of the mRNA, exported from the nucleus to the cell cytoplasm, where transfer RNA (tRNA) is involved in the assembly of amino acids into protein sequences in ribosomes. There has recently been much interest in another class of genes, RNA genes, that do not code for proteins but direct the formation of RNA molecules that are not translated but are end-products in themselves with structural, enzymatic or regulatory functions. Important amongst these are various small RNA transcripts (Moazed 2009) that include both small inhibitory RNA (siRNA), which inhibits expression, an innate biological function that also makes it a useful experimental tool, and micro RNAs (miRNA), one of which has recently been shown to alter glutamatergic signalling in the NMDA pathway (Coyle 2009). Small RNAs act via direct mechanisms on DNA or mRNA or via alterations in the DNA-associated histone architecture (see below).

Within the gene the basic unit of information is the codon – a triplet of nucleotide bases whose presence in a sequence will lead to a specific amino acid insertion into a protein. The genetic code maps DNA codons to amino acids, and the code has a degree of redundancy (degeneracy), with several different codons linked to individual amino acids (a triplet can be taken from four nucleotides in 64 different ways, but there are only 20 different amino acids). Some codons do not code for amino acids, but dictate operational events during protein formation such as initiation and termination of polypeptide chain assembly (start and stop codons; start codons also code for methionine, the initial amino acid in any polypeptide; Fig. 8.3).

Fig. 8.3 The genetic code in mRNA. The left-hand side of each column indicates the coding triplets, arranged by the second nucleotide in each triplet (U [uracil], C, A or G); the amino acid coded for is given on the right using the standard chemical abbreviations. Note the start codon AUG (methionine) and the stop codons UAA, UAG and UGA. The code can easily be referred 'back' to DNA or 'forward' to tRNA triplets.
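The arithmetic behind this degeneracy is easy to check. The Python fragment below is an illustrative sketch only (the names are invented for the example): it enumerates every possible triplet of the four RNA bases, giving the 64 codons that must be shared among the 20 amino acids and the stop signals.

    # Illustrative sketch: four bases read three at a time give 64 codons.
    from itertools import product

    BASES = "UCAG"
    codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
    print(len(codons))   # 64, i.e. 4 ** 3
    # Only 20 amino acids (plus the stop signals) need to be encoded, so several
    # codons necessarily map to the same amino acid - the redundancy (degeneracy)
    # described above.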


An open reading frame (ORF) is a complete set of codons bracketed by start and stop codons. The fidelity of DNA is obviously of prime importance to cellular function, and a variety of mechanisms exist to preserve it. Ionising radiation or reactive oxidative molecules can induce double-stranded breaks. These can be repaired by homologous recombination (one key protein in this process, BRCA1, is defective in an inherited form of breast cancer) or by a special repair mechanism (non-homologous end-joining). Ultraviolet light exposure leads to DNA damage by causing dimerisation of bases on a single DNA strand, and a specific photoreactive enzyme system exists to reverse this. In some cases a bypass mechanism exists ('sloppy copiers') which simply skips over damage and allows replication to continue, but at the expense of introducing mutations into the protein. Cells are usually unable to cope with extensive damage to the genome, however, and apoptosis, or programmed cell death, mechanisms are activated during the cell cycle to eliminate damaged cells from the healthy cell population.

Transcription: DNA to mRNA

Transcription, the first step of gene expression, is the transmission of the genetic code from DNA to RNA by an RNA polymerase enzyme and takes place in the nucleus. To describe transcription a nomenclature based on the sugar chemistry of DNA has developed. The fifth carbon of deoxyribose (the 5′-carbon; pronounced 'five prime') is bound to a phosphate moiety, the third to a hydroxyl group. Phosphodiester bonds are formed between the 3′ of one sugar and the 5′ of the next, leaving the first 5′ free (the '5′-end') and the last 3′ of the DNA free (the '3′-end'). This gives a direction to the DNA sequence. Thus transcription of DNA into RNA begins at the 5′-end of DNA and proceeds in the 3′ direction along what is defined as the coding or sense strand of DNA, and the second DNA strand, aligned 3′ to 5′, is described as noncoding or antisense. Normally a sequence of bases termed the untranslated region (UTR) exists on either side of the open reading frame. The UTR is transcribed into mRNA but not translated into protein.

RNA, normally single stranded, differs chemically from DNA in its sugar backbone, which is ribose instead of deoxyribose, and by the replacement of thymine with uracil bases. During transcription the duplex DNA is partly unwound to allow access to the key enzyme DNA-dependent RNA polymerase (RNApol). There are three forms of RNApol in eukaryotes: RNApol I, which creates ribosomal RNA (rRNA); RNApol II, which creates messenger RNA (mRNA); and RNApol III, which creates transfer RNA (tRNA). The initial event is binding of an initiation protein to a non-transcribed promoter sequence (normally containing a TATA nucleotide motif or box) near the 5′-end of the open reading frame. A complex of proteins that includes transcription factors (co-activators) and activator proteins that link to more distant enhancer elements in the DNA sequence itself regulates transcriptional activity and also positions RNApol correctly at the start of the open reading frame. Abnormalities in these regulatory proteins, or in the DNA motifs in the promoter and enhancer elements that they bind, have been increasingly recognised as pathogenic. Figure 8.4 summarises the gene unit and its regulatory elements. Not shown on this diagram are silencer motifs that act to downregulate transcription by binding specific proteins (repressors). Taken together, this complex set of interacting proteins serves to position and stabilise RNApol at the transcription start site and to offer many targets for cellular fine-tuning of transcriptional activity.
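The relationship between the two DNA strands and the transcript can be sketched in a few lines of Python. This is an illustrative example only (the sequences and names are invented): the polymerase reads the template (antisense) strand, and the resulting mRNA matches the coding (sense) strand read 5′ to 3′, with uracil in place of thymine.

    # Illustrative sketch: the transcript matches the coding (sense) strand
    # with U in place of T; the polymerase actually reads the template strand.
    DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
    TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

    coding_strand = "ATGGCTTAA"                      # invented fragment, 5' -> 3'
    template_strand = "".join(DNA_COMPLEMENT[b] for b in coding_strand)

    mrna = "".join(TEMPLATE_TO_RNA[b] for b in template_strand)
    print(mrna)                                      # AUGGCUUAA
    assert mrna == coding_strand.replace("T", "U")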

Post-transcriptional modification and regulation of mRNA

All DNA within an open reading frame is initially transcribed, forming a primary transcript of RNA which in turn undergoes a series of modifications creating messenger RNA (mRNA), which is transported to the cytoplasm where peptide synthesis takes place in the process called translation. These stages of RNA processing are capping, polyadenylation and splicing. Capping involves the addition of a methylguanosine structure to the 5′-end of RNA during transcription, and immediately after transcription a polyadenylate tail (a sequence of repeated adenine nucleotides) is added at the 3′-end. These are essential to mRNA stability, protecting it from exonuclease enzyme attack as well as facilitating export from the nucleus and efficient translation.


Fig. 8.4 The basic elements of eukaryotic gene structure and regulation. (Diagram labels: core promoter containing the TATA-box and transcription start point; open reading frame of exons and introns; regulatory region with enhancer elements and enhancer-binding motifs; co-activator complex binding basal transcription factors; activator proteins binding enhancer motifs; RNA polymerase positioned by the TATA-binding protein (T) and basal transcription factors (tf1, tf2); binding factors coordinate RNApol activity at the transcription start site; the DNA loops back to allow distant enhancer elements to bind activator proteins at the promoter site.)

Splicing is the removal from the primary transcript of the non-coding sequences termed introns and the splicing together of the exons to form mature mRNA, in a complex process involving a special group of proteins and small nuclear ribonucleoproteins (snRNPs) that recognise terminal intronic sequences. Alternative splicing introduces another important regulatory step in gene expression because a single gene transcript might be processed in one of several different ways, generating different mRNAs which in turn lead to the production of different proteins. About a third of genes undergo alternative splicing, and disruption to normal splicing by mutations in key genes may lead to the production of proteins that damage the cell, causing pathology. Exons are often spliced together in different ways by tissue-specific signals, creating different isoforms with diverse functions in different organs. In this way the mammalian genome vastly increases its repertoire of protein types above the relatively limited number of human genes available. Neurons have a specific splicing regulatory system, dysfunction in which can lead to several disorders. Chromosome 17-linked frontotemporal dementia and parkinsonism is associated with mutations in the tau gene, leading to aberrant function of tau in microtubule assembly. Eleven exons are normally spliced into six different protein isoforms. Exon 10 has a number of repeated microtubule-binding domains, and various mutations are associated with increased spliced-in copies of this exon, leading to neurodegenerative disease. In spinal muscular atrophy there are mutations or small deletions at the exon splice sites, producing an abnormal form of the survival of motor neuron protein (SMN), which causes pathology. Other neurological disorders including spinocerebellar ataxia 8 and amyotrophic lateral sclerosis may also involve splicing dysfunction.
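The splicing step itself can be pictured as selecting and joining sub-sequences of the primary transcript. The Python sketch below is illustrative only: the transcript and the exon coordinates are invented, and the point is simply that different exon choices (alternative splicing) give different mature mRNAs from the same gene.

    # Illustrative sketch: splicing exons out of a primary transcript.
    # The transcript (introns in lower case) and exon coordinates are invented.
    primary_transcript = "AAAGGGcccccTTTCCCgggggAAATTT"
    exons = [(0, 6), (11, 17), (22, 28)]     # (start, end) positions of the exons

    def splice(transcript: str, kept_exons) -> str:
        """Join the chosen exons, discarding the intervening intronic sequence."""
        return "".join(transcript[start:end] for start, end in kept_exons)

    full_mrna = splice(primary_transcript, exons)                   # all three exons
    skipped = splice(primary_transcript, [exons[0], exons[2]])      # middle exon skipped
    print(full_mrna)   # AAAGGGTTTCCCAAATTT
    print(skipped)     # AAAGGGAAATTT
    # Tissue-specific signals that favour one exon combination over another
    # yield different isoforms of the protein from the same gene.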

Translation: mRNA to polypeptide

Once mRNA has moved to the cytoplasm (stripped of splicing proteins and actively transported through nuclear pores), it associates with ribosomes in the final steps towards protein synthesis. Again the process is complex. Cytoplasmic translational factors must bind the mRNA to allow it to interact with the ribosome. In some cases inhibiting factors will prevent its translation (e.g. ferritin mRNA is usually inhibited by aconitase, which binds with the initial 30 bp, forming a loop that cannot link to ribosomes; only in the presence of cellular iron, which binds aconitase, is this inhibition removed). Normally a large multiprotein initiation complex forms that positions the mRNA correctly in the ribosomal groove.

A type of RNA termed transfer RNA (tRNA), found in the cytoplasm, is involved in the incorporation of amino acids into a peptide chain in the ribosomal complex. Amino acids are attached to specific tRNA molecules by activating enzymes and shuttled to the ribosomal complex. The ribosome moves along the mRNA and at each step the amino acid dictated by the genetic code is provided by the appropriate tRNA and joined by a peptide bond to the growing polypeptide chain. There are three stages to translation:

• initiation, in which the leading strand of the mRNA is orientated exactly to the initiation protein complex and binds the initiation tRNA (carrying methionine); this is critical to correct reading of the transcribed open reading frame;

• elongation, in which the mRNA codon adjacent to the initiation codon is exposed to tRNA binding, selecting the correct tRNA codon to match and placing its amino acid directly next to methionine, with which it then bonds through a peptide linkage; and

• conformational change in the ribosome, brought about by the first two steps, that releases the first tRNA from its amino acid and the ribosome itself and moves the second tRNA into its place, exposing a new codon of mRNA.

Thus the process moves stepwise linearly through the mRNA sequence to the termination step, the stop codon, at which time the polypeptide chain and mRNA are released from the complex, again in the presence of specific protein release factors.

The control of gene expression from DNA to polypeptide can thus be seen as a complex and highly regulated pathway. The gene unit itself can vary tremendously in size, from the very small genes for human tRNAs of around 100 base pairs long to the very large genes of some structural proteins such as dystrophin (2 Mbp and 79 exons) which, when disrupted, leads to Duchenne and Becker muscular dystrophies. Most genetic disorders involve disruption to the primary DNA sequence, either as large-scale events such as chromosomal trisomies, monosomies and genomic disorders, or on the small scale as mutations within genes. However, changes in gene expression resulting from alterations in transcription, splicing and translation may be an important component of the genetic risk factors for many common disorders.
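A minimal sketch of this stepwise reading, under the simplifying assumptions of the earlier ORF definition (translation begins at the first AUG and stops at a stop codon), is shown below. The codon table is deliberately reduced to a handful of entries for illustration; a real implementation would use the full 64-codon table of Fig. 8.3.

    # Illustrative sketch: reading an open reading frame codon by codon.
    # CODONS is a deliberately truncated subset of the genetic code.
    CODONS = {
        "AUG": "Met",   # start codon, also codes for methionine
        "UUU": "Phe", "GCU": "Ala", "GAA": "Glu",
        "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
    }

    def translate_orf(mrna: str) -> list[str]:
        """Translate from the first AUG to the first stop codon (read 5' to 3')."""
        start = mrna.find("AUG")
        if start == -1:
            return []                      # no start codon, nothing is translated
        peptide = []
        for i in range(start, len(mrna) - 2, 3):
            amino_acid = CODONS.get(mrna[i:i + 3], "?")
            if amino_acid == "STOP":
                break
            peptide.append(amino_acid)
        return peptide

    print(translate_orf("GGAUGUUUGCUGAAUAGCC"))   # ['Met', 'Phe', 'Ala', 'Glu']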


The genome outside genes

Chromatin is the term used to describe the nucleic-acid–protein complex that gives structure as well as function to the chromosome. In the stage of the cell cycle following mitotic cell division, large stretches of chromatin become de-condensed and disperse throughout the nucleus, and genes are embedded within this 'true' chromatin or euchromatin. Other areas do not de-condense but remain compacted at all stages of the cell cycle. This is in general termed heterochromatin, subclassified as constitutive when it is present in all cell types and facultative when it is heterochromatic only in certain cells, such as the Barr body, which represents the inactivated X chromosome in females. Constitutive heterochromatin predominates around the centromeres and telomeres of the chromosome. The centromeres are bracketed by long stretches of large repeats, several hundred nucleotides in length, termed satellite DNA, which form dense heterochromatic non-coding regions or C-bands. This DNA replicates at a different time of the cell cycle (late S phase) from euchromatin, and has large blocks of repeated DNA sequences. Although, historically, heterochromatin was considered inert, more recently it has been shown to have a host of regulatory functions, especially those related to epigenetic programming (see below).

It is now very clear that the histone proteins that package DNA are important in maintaining the distinct regional architecture of chromosomes and in the dynamic regulation of gene expression. Histone methyltransferase enzymes selectively methylate histones at lysine residues, and this methylation is vital for maintaining the transcriptional silence of heterochromatin, forming part of a complex system that maintains the heterochromatic state. This system may also play a role in epigenetic control of transcriptional activity. The histone tail has other modifiable regions, including a 'bromodomain' that interacts with de-acetylases, and hypoacetylated histone (H4) is another hallmark of heterochromatin. Histone modification is therefore a key modulator of transcriptional activity in euchromatin and heterochromatin, and is also crucial to the formation and maintenance of the parental-sex-specific methylation that underlies the phenomenon of imprinting.

Up to 90% of euchromatin is transcriptionally inactive at some loci and is highly repetitive in sequence. The polymorphic nature of this DNA has made it useful in linkage studies and it has attracted its own nomenclature. Small tandem DNA repeats a few nucleotides long are termed microsatellites, and those of several tens of nucleotides, minisatellites. Also present are transposable elements, making up about 45% of all DNA. Long interspersed nuclear elements (LINEs) are an ancient feature, around 6 kbp long, that make up 20% of the genome and contain all the necessary information for self-transposition. Embedded within them are Alu sequences of about 300 bp, forming some 10% of the genome. Other repeat elements include short interspersed nuclear elements (SINEs) and long terminal repeats (LTRs or retroposons). One major role of methylation could be to silence transcription of these elements. However, repeat sequences can have unwanted effects. Low-copy repeats (LCRs or segmental duplications) form over 5% of the genome, are 10–400 kbp long and show very high sequence identity. Their homology can lead to aberrant recombination at meiosis, resulting in segmental duplications and deletions, which are nowadays grouped under the title of copy number variation (CNV) and whose clinical manifestations, with the larger CNVs, include genomic disorders such as Prader–Willi syndrome, velocardiofacial syndrome and many others (Lupski 2007).

Gene mutations

At the simplest level, mutation can be taken to mean any alteration from normal in the DNA sequence of chromosomes. Various sizes of mutation can be delineated, ranging from duplications and deletions of whole chromosomes (trisomy and monosomy), through segmental changes involving short parts of chromosomes (partial monosomy and trisomy), to micro-deletions, duplications or inversions involving only one or a few genes (CNV). Large chromosome variants are covered in Chapter 20, and the focus here will be on DNA changes affecting single gene units.

From the discussion of the genetic code it is clear that the open reading frame of a gene must be transcribed accurately by the polymerase enzyme for the correct amino acids to be assembled into polypeptide. Any change in the sequence of base pairs of DNA within the ORF can lead to errors in reading, and such mutations usually involve one (point mutations) or a few bases. Some mutations arise during the process of recombination, some are caused by errors in the systems that preserve DNA fidelity, and some by external mutagenic agents including radiation and certain chemicals. Clearly mutations occurring in the germline can be transmitted to offspring and could underlie heritable illness. In non-coding regions of a gene, including the introns, mutations are less likely to have biological effects, but the immense number of both coding and non-coding polymorphisms now found to be present throughout the genome are valuable tools to study patterns of inheritance and discover genes involved in illnesses.

Most mutations within the ORF causing changes in transcription involve the substitution of one base for another, and 50% of substitutions lead to missense mutations, where a triplet coding for one amino acid is altered to one that codes for a different amino acid. Depending on the position of the amino acid within the polypeptide and the type of amino acid substituted, this can result in increased or decreased protein activity or sometimes complete loss of function. Nonsense mutations (30%) arise when the substitution changes the codon to a stop signal; these are often pathological and usually lead to a decreased level or complete absence of mRNA. The mechanism for silencing the production of the abbreviated mRNA is unclear. The exon position of the nonsense mutation is also important; those in the terminal exons have sometimes been found to have less effect. Nonsense mutations may also produce 'exon skipping', where the exon containing the stop codon is completely bypassed during transcription and an aberrant mRNA is formed. Silent mutations are those that have no effect on the phenotype. This can be the result of the redundancy built into the genetic code, as when a mutation substitutes an alternative triplet for the same amino acid (synonymous mutation). Insertions and deletions of bases, however, can often lead to drastic consequences if they are frameshift mutations, causing disruption of the entire downstream sequence of the reading frame and resulting in an aberrant mRNA. The disruption is maximal when the deletion or insertion is not a multiple of three.
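The different classes of point mutation just described, and the frameshift effect of an insertion that is not a multiple of three, can be seen directly by re-reading a sequence in triplets. The Python sketch below is illustrative only; the mini codon table is a deliberately reduced subset of the genetic code and the sequences are invented.

    # Illustrative sketch: the effect of mutations seen by re-reading triplets.
    # MINI_CODE is a deliberately small subset of the genetic code; codons
    # missing from this subset are shown as '?'.
    MINI_CODE = {
        "AUG": "Met", "UUU": "Phe", "UUA": "Leu", "GAU": "Asp", "GAA": "Glu",
        "UAU": "Tyr", "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
    }

    def read_triplets(mrna: str) -> list[str]:
        """Read an mRNA in frame as consecutive codons."""
        return [MINI_CODE.get(mrna[i:i + 3], "?") for i in range(0, len(mrna) - 2, 3)]

    print(read_triplets("AUGUUUGAUUAUUAA"))   # ['Met', 'Phe', 'Asp', 'Tyr', 'STOP']

    # Missense: one base change swaps a single amino acid (GAU Asp -> GAA Glu).
    print(read_triplets("AUGUUUGAAUAUUAA"))   # ['Met', 'Phe', 'Glu', 'Tyr', 'STOP']

    # Nonsense: the substitution creates a premature stop codon (UAU -> UAA).
    print(read_triplets("AUGUUUGAUUAAUAA"))   # ['Met', 'Phe', 'Asp', 'STOP', 'STOP']

    # Frameshift: a single-base insertion (not a multiple of three) disrupts
    # every downstream codon and here destroys the stop codon as well.
    print(read_triplets("AUGUUUAGAUUAUUAA"))  # ['Met', 'Phe', '?', 'Leu', 'Leu']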


Mutations in cytosine-p-guanine (CpG) dinucleotide pairs account for up to a third of all single-base-pair mutations. The human genome is heavily methylated at CpG pairs (including within exons), and mutation arises through conversion of methyl-cytosine to thymine on the coding DNA strand at these sites, except at specific CpG-dense clusters termed CpG islands, which usually lie near or within the promoters of actively expressed genes.

Mutations can also cause differences in the way mRNA is spliced, and splice-junction mutations are a common cause of human disorder, accounting for about 10% of all mutations. There may be failures in the splicing mechanisms, or mutations may create new splice sites in aberrant positions. The cell may attempt to compensate for loss of a splice site by splicing at a second, illegitimate site (a cryptic splice site), or by exon skipping (as in spinal muscular atrophy). Mutations in the 5′ and 3′ UTRs can also be pathogenic: 5′ disruption can lead to disturbance of translational efficiency (as in Charcot–Marie–Tooth syndrome), and 3′ disruption to changed mRNA stability. Finally, mutations in remote gene promoter and regulatory elements, sometimes up to hundreds of kilobase pairs away from the main gene unit, are increasingly recognised.

Patterns of inheritance

Complementary genes on both the paternally and maternally derived autosomes in a cell are normally both transcribed (biallelic expression), and a gene mutation on the chromosome derived from one parent usually leads to a reduction (haploinsufficiency), not absence, of a gene product. The level of product may be sufficient for normal cell function and cause no phenotypic consequences unless the complementary gene is also mutated. This situation would present as a classical recessive inheritance pattern. At the other extreme, when silencing or downregulation of one gene of a pair is functionally uncompensated by the other, the resulting illness would show a dominant pattern (Fig. 8.5).

Fig. 8.5 Autosomal inheritance (classical autosomal dominant and classical autosomal recessive pedigrees). Affecteds are in black. Chromosomes are given for certain individuals and show mutated (shaded) and wild-type alleles. Squares denote males and circles females. A central dot indicates an unaffected carrier. Consanguinity is shown by a double bar.

Mendel studied phenotype patterns in plant breeding experiments and suggested that quantal genetic factors were being inherited. Disorders due to the inheritance of a single mutated gene (monogenic disorders) are often said to be Mendelian. Recessive conditions, which occur in about 2:1000 births, rely on mutations in both genes of the pair, and the affected person is usually homozygous for the same mutation. If the gene mutations in the paternally and maternally derived chromosomes differ, the offspring is said to be a compound heterozygote. The chances of inheriting rare gene mutations from both parents are increased if the parents are themselves closely related, and an increased rate of consanguinity is often found in recessive disorders. With more common recessive disorders, such as cystic fibrosis, present in 1:2500 live births, the frequency of mutations in the population is high, and this is explained either by frequent new mutations or by a reproductive advantage conferred by the heterozygous state. If both parents carry one copy of the mutant gene then, on average, one in four of their children will be affected. No cases may be detected in previous generations, the disease being apparent only in the children.

Dominant disorders, which occur in 7:1000 live births, are more common than recessive disorders and are more likely to be associated with late-onset conditions. Fully dominant conditions result in a phenotype when the person is heterozygous or homozygous for the mutant gene. Affected individuals are found in every generation, and on average half of any sibship are affected. Fully dominant disorders are rare, and usually the clinical outcome is variable, especially in heterozygous individuals, because of reduced penetrance, measured as the proportion of people heterozygous for the mutation who show phenotypic features.

Mutations in genes on the sex chromosomes form a third Mendelian group. Males can only pass on their Y chromosome to their sons, so that there will be no male-to-male vertical inheritance in X-linked conditions, but affected males are more common in X-linked pedigrees because the mothers of affected males must be carriers of the gene mutation, and will be either unaffected if the condition is X-linked recessive or affected if the condition is X-linked dominant. A typical presentation is shown in Fig. 8.6, and various sex-linked disorders are considered in detail in Chapter 20.

Fig. 8.6 Classical X-linked inheritance. In this case the inheritance is recessive. Affecteds are in black; they are all male. Females are either normal or disease-free carriers (the latter are shown with a central dot). The symbols follow the convention of Fig. 8.5. Chromosomes are given for certain individuals and show mutated (shaded) and wild-type alleles on the X chromosomes. The Y chromosomes do not express the given allele.
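The segregation ratios quoted above (one in four affected offspring when both parents are unaffected carriers of a recessive mutation; on average half of a sibship affected when one parent is heterozygous for a fully dominant mutation) follow from enumerating the possible combinations of parental alleles. The Python sketch below is illustrative only; the allele symbols and function name are invented for the example.

    # Illustrative sketch: expected fraction of affected offspring obtained by
    # enumerating the combinations of parental alleles (a Punnett square).
    # 'm' is a mutant allele and '+' a wild-type allele; names are invented.
    from itertools import product

    def affected_fraction(parent1, parent2, dominant: bool) -> float:
        """Each parent passes on one of its two alleles with equal probability."""
        offspring = list(product(parent1, parent2))
        if dominant:
            affected = [g for g in offspring if "m" in g]          # one copy suffices
        else:
            affected = [g for g in offspring if g == ("m", "m")]   # two copies needed
        return len(affected) / len(offspring)

    carrier = ("m", "+")        # unaffected heterozygous carrier
    unaffected = ("+", "+")
    print(affected_fraction(carrier, carrier, dominant=False))    # 0.25 - recessive
    print(affected_fraction(carrier, unaffected, dominant=True))  # 0.5  - dominant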


Sometimes one gene may interact with another gene on the same or a different chromosome to influence the expression of a phenotype, a situation termed epistasis. In psychiatry an interesting example of this is the effect on the risk of dementia of interactions between variations in the gene for apolipoprotein E and variations in the gene for amyloid protein. Mutations in completely different genes may cause the same clinical picture (locus heterogeneity). For example, early-onset Alzheimer's disease can be caused by mutations in the genes for presenilin I (chromosome 14), presenilin II (chromosome 1) and amyloid precursor protein (chromosome 21). Some diseases may result from different pathogenic mutations within one single gene (allelic heterogeneity). Over 600 different mutations (mainly missense) causing cystic fibrosis have been found in the cystic fibrosis transmembrane conductance regulator gene (CFTR).

Non-Mendelian inheritance

Quantitative trait loci (QTLs)

Many human traits and disorders have strong heritability, with a genetic contribution shared by several (oligogenic) or many (polygenic) genes. These phenotypes, termed quantitative traits and described under a liability/threshold model, are the result of the additive or interactive effects of different loci, where variation in any single gene is neither necessary nor sufficient on its own to account for the phenotype. Quantitative trait loci (QTLs) are genes that contribute quantitatively to the variance of a continuous trait, as distinct from single gene mutations that may have a major effect on risk of disease. Adult height is a good example of a quantitative trait because it is highly heritable, with up to 90% of height variation in the population explained by genetic variation. Rare single gene mutations causing extreme changes in height have been found, but these make only a small contribution to the distribution of height in a general population. However, genome-wide association analyses have identified 20 contributing genetic variants, each of small effect size, and it is likely that many more risk genes await discovery. Common conditions and attributes including hypertension, obesity, cognitive abilities and personality dimensions are understood as quantitative traits, and this model explains aspects of the heritability of common psychiatric conditions including depression, anxiety, schizophrenia and bipolar disorder. Detecting genes with small effect in common disorders requires large, well-defined study populations, and genome-wide association studies involving large numbers of single nucleotide polymorphism (SNP) markers have provided confirmation that many genetic variants with small effect sizes, in genes with a wide range of functions, make additive contributions to the risk of schizophrenia and bipolar disorder.

However, it is also becoming apparent that a substantial proportion of genetic risk in these psychiatric disorders can be explained by multiple rare variations in single genes with a large effect size in some individuals and families, suggesting that more than one model of inheritance is needed to explain the complex heritability of psychiatric illnesses.

Dynamic mutations

Many inherited disorders do not fall neatly into Mendelian classes, due to a number of different genetic mechanisms that have a bearing on the phenotype. Anticipation describes the situation when a disease has an earlier age-at-onset and increased severity in succeeding generations. One possible cause is dynamic mutation, involving the expansion of repeat sequences during gamete formation leading to alteration of gene expression. The classical dynamic mutation is the triplet repeat expansion in the FMR1 gene on chromosome X associated with fragile X syndrome. The non-coding repeat in fragile X syndrome, at the 5′ untranslated end of exon 1 of FMR1, leads to gene silencing by methylation and is fully described in the chapter on learning disability psychiatry. Repeats occurring in the ORF of genes, termed coding dynamic mutations, often have CAG as the repeated triplet, which then codes for a polyglutamine tract within the protein. This is the case in Huntington's disease, where the huntingtin gene on the short arm of chromosome 4 has over 35 CAG triplets within the first exon, and the size of the repeat is related to the age-at-onset of the disorder, explaining up to 60% of the variance. The aberrant huntingtin protein contains a polyglutamine sequence that leads to a gain of function with pathological consequences. The altered protein may interact with nuclear transcription factors and similar proteins to disrupt the expression of other genes. It also forms abnormal complexes with cytoplasmic and nuclear proteins that may be toxic and play a role in neurodegeneration.
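Because the clinically relevant quantity in these coding dynamic mutations is the length of the repeat tract, a first analysis step is simply to count consecutive CAG triplets, as in the following illustrative Python sketch (the sequences are invented and are not real huntingtin sequence).

    import re

    # Illustrative sketch: counting consecutive CAG triplets in a coding sequence.
    # The example sequences are invented, not real huntingtin (HTT) sequence.
    def longest_cag_tract(dna: str) -> int:
        """Length, in repeats, of the longest uninterrupted run of CAG triplets."""
        runs = re.findall(r"(?:CAG)+", dna)
        return max((len(run) // 3 for run in runs), default=0)

    expanded_allele = "ATG" + "CAG" * 40 + "CCGCCA"   # hypothetical expanded allele
    normal_allele = "ATG" + "CAG" * 20 + "CCGCCA"     # hypothetical unexpanded allele
    print(longest_cag_tract(expanded_allele))   # 40 - in the range associated with disease
    print(longest_cag_tract(normal_allele))     # 20 - below the ~35-repeat threshold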


Most known CAG expansion diseases have a clinical picture involving neurodegeneration and typically are also late-onset. The myotonic dystrophy type 1 repeat (CTG) is located in the 3′ UTR of the gene, and the pathology is thought to arise from accumulation of abnormal mRNA in the nucleus that then interferes with splicing and other functions. Myotonic dystrophy type 2 is associated with an untranslated tetranucleotide repeat (CCTG) in the first exon of a completely different gene, ZNF9 on chromosome 3. The strikingly similar clinical picture may result from similar patterns of mRNA accumulation leading to general effects on the function of the cell nucleus. Apart from the neurodegenerative conditions and disorders associated with learning disability, such as myotonic dystrophy and fragile X syndrome, no convincing evidence has yet emerged for the involvement of repeat expansions in any form in general psychiatric illness.

Epigenetics, imprinting and parent-of-origin effects

Epigenetic processes take place during development and lead to stable changes in the ability of cells to transcribe DNA. These alterations to chromosomes are heritable from parent to child but do not involve direct mutations in the DNA itself (Suzuki & Bird 2008). There are many interacting routes by which such controls on expression are effected: chromatin remodelling, histone alteration and direct DNA methylation. The latter two may sometimes act sequentially to establish the stable state represented by chromatin remodelling. Thus DNA methylation may be the signal for histone modification, which may then act to recruit other chromatin-remodelling proteins, which rearrange chromatin into a stable unexpressed state. However, it is apparent that the whole system is dynamic and reversible in its interplay. External influences can also alter histone modification (e.g. drugs such as sodium valproate), linking environmental effects to gene expression (McKechanie & Muir 2009). DNA methylation occurs mainly at CpG sequences and follows a developmental route that differs in males and females. It functions as a signal that recruits cellular factors leading to the transcriptional silencing of associated coding regions of DNA. Originally it may have developed as a repressor of unwanted transcription of the widespread transposable elements that occur in the mammalian genome, as well as to differentially regulate developmental gene expression. In the zygote the genome inherited from the father is actively stripped of methylation within hours of fertilisation, whereas methylation of the genome inherited from the mother passively decreases in later cleavage divisions. This de-methylation seems to be an essential 'slate-cleaning' process, removing most (but importantly not all) of the inherited parental chromosomal DNA methylation patterns. An extensive de novo re-methylation follows, which then decreases in a tissue-specific fashion, releasing coding regions from their inactive states to produce the necessary proteins for cellular proliferation and differentiation. After methylation erasure, differences between the developmental routes in male and female embryos lead to sexually dimorphic methylation patterns in somatic cells. In germ cells, however, further reprogramming is needed to remove those methylation patterns that escaped the first general round of de-methylation. This is followed by sex-specific re-methylation, completed in developing sperm at a very early stage (premitotic) but much later in ova (the premeiotic stage). Thus methylation is one way in which the cell can 'mark' or 'imprint' its DNA in a sex-specific pattern.

It has been pointed out that DNA methylation is a system of 'cellular memory' that senses that a silent state is to be stabilised and invokes the mechanisms required for this (Jaenisch & Bird 2003). However, the actual primary marks that set this in play are still unclear. In both somatic and germ cells the establishment of the correct methylation pattern is very important, and several disorders can result from its disruption. Some chromosomal regions that escape the first round of embryonic de-methylation end up with differential imprinting between loci inherited from mother or father. In such cases monoallelic gene expression is the normal outcome. Such genes usually come in clusters, and there is often a local DNA region that acts as a specific imprint control centre, as in the Prader–Willi/Angelman region (see Chapter 20). Disruption of this specific imprinting pattern by deletions, uniparental disomy (both chromosomes of a pair originating from only one parent) or imprinting centre mutations can result in these syndromes. Other genes that are normally differentially imprinted include several important fetal growth factors such as the insulin-related series. Disruption of epigenetic mechanisms may have consequences for cognitive development and, more generally, epigenetic mechanisms may underlie many non-Mendelian features of psychiatric disorders including discordance rates in monozygotic twins, age-at-onset effects, sex-specific expression and parent-of-origin effects. The childhood neurodevelopmental disorder with features of autism, Rett syndrome, is caused by mutations in the MECP2 gene on chromosome X. The MeCP2 protein binds to methylated DNA at CpG islands, where it is involved in the regulation of the expression and silencing of genes. Recent findings in mouse models of Rett syndrome show that the neurological features are reversible, holding out hope for therapy in the future (Guy 2007).

Mitochondrial (cytoplasmic) inheritance

The mitochondria contain DNA that resides outside the nucleus and thus does not segregate in meiosis. In man the mitochondrial DNA (mtDNA) is small (around 16.5 kbp), has been entirely sequenced and contains genes encoding about 13 proteins particularly involved with oxidative phosphorylation. Since sperm contribute no mitochondria to the zygote, inheritance is purely maternal, through the mtDNA contained in the mitochondria of the ovum. However, most cells have several copies of this genome, and mutations in mitochondrial genes leading to disruption of the mitochondrial respiratory chain have been associated with a variety of human neurodegenerative diseases, typically with features that include encephalopathy, dementia, ataxia, deafness and ophthalmoplegia. The mutation may only occur in some of the copies of mtDNA in cells, a situation termed heteroplasmy, leading to very variable expression of these conditions.


Population genetics

Family, adoption and twin studies

A disease with a significant genetic causation will tend to aggregate in families, and comparison of rates of illness in relatives with the general population rate will provide a measure of the strength and nature of the genetic contribution. Clustering of disease in families may be entirely attributable to shared environment, for example when infections are related to poor living conditions, or entirely due to shared genetic inheritance as in the case of Huntington’s disease where all affected individuals in a family have inherited a mutated huntingtin gene. However, most psychiatric conditions are influenced by a combination of interacting genetic and environmental factors. The main strategies used to dissect the relative contributions of nature and nurture in family members are twin and adoption studies.

Family studies

In general the association between a genetic risk factor and a disease in a population can be expressed as a relative risk (RR), defined as the ratio of the incidence of disease in an ‘exposed’ group and an ‘unexposed’ group and measured by studying the illness in population cohorts. To study aggregation of disease within families, it is useful to compare the relative risk of illness in specific groups of relatives with the general population risk. Data is often most easily obtained from siblings, and the relative risk (or risk ratio) for relatives (λR) or specifically for siblings (λs) is frequently used to evaluate the strength and significance of familial aggregation of a disease. λs is defined as the recurrence risk to siblings compared with the risk of the disorder in the general population. A high value of λs reflects a strong genetic effect: for example, in a single gene disorder with dominant inheritance such as Huntington’s disease, λs is about 5000; and in cystic fibrosis, a recessive disorder, λs is around 500. Few psychiatric illnesses show such clear-cut familial aggregation (Table 8.1). For schizophrenia λs is about 10, and in complex disorders λs greater than 2 is generally taken to indicate a significant genetic component. Caution is needed when evaluating λ because the value depends on the population prevalence of the disease and in general a strong genetic effect in a common disease will generate a smaller λ than the same effect in a rare disease.

Table 8.1 Relative risk (λR) of common psychiatric conditions, derived from family studies

  Condition                                   Relative risk in relatives
  Attention deficit hyperactivity disorder    55
  Autism                                      45
  Schizophrenia                               10
  Bipolar disorder                            7
  Alcoholism                                  6
  Generalised anxiety disorder                2–5
  Anorexia                                    2–4
  Unipolar depression                         1.5–3

  After McGuffin et al (2002).
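The arithmetic behind λs is simple enough to set out directly. The short Python sketch below is purely illustrative: it uses the approximate schizophrenia figures quoted above (a sibling risk of about 10% against a population risk of about 1%) rather than data from any particular study, and it also shows why the same absolute sibling risk produces a larger λ when the disorder is rarer in the population.

# A minimal sketch of the sibling relative risk (lambda_s) calculation.
# The risk figures are the approximate schizophrenia values quoted in the
# text (about 10% in siblings versus about 1% in the general population);
# they are illustrative, not data from a specific study.

def sibling_relative_risk(risk_in_siblings, population_risk):
    """lambda_s = recurrence risk in siblings / general population risk."""
    return risk_in_siblings / population_risk

if __name__ == "__main__":
    lambda_s = sibling_relative_risk(risk_in_siblings=0.10, population_risk=0.01)
    print(f"lambda_s for schizophrenia ~ {lambda_s:.1f}")  # about 10
    # The same absolute sibling risk in a rarer disease gives a larger lambda,
    # which is why lambda values are only comparable between disorders of
    # similar population prevalence.
    print(f"lambda_s if population risk were 0.1%: {sibling_relative_risk(0.10, 0.001):.0f}")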

Family and population data have been extensively studied in schizophrenia. In a highly influential book reviewing the results of several family studies Gottesman (1991) concluded that the risk of developing schizophrenia in relatives of probands with the disorder is significantly increased among all classes of relatives. In summary the life-time risk of developing schizophrenia and related illness was around 1% in the general population, 10% among first-degree relatives, dropping to about 3% in second-degree relatives. Offspring both of whose parents had schizophrenia had a 40–50% chance of becoming ill.

Family and population studies in bipolar disorder have also confirmed an increased risk of illness in relatives of bipolar probands, with an estimated relative risk in first-degree relatives of 5–10 (Mortensen et al 2003).

In a recent population-based study of bipolar disorder and schizophrenia in Sweden, data contained in a national register of hospital discharges between 1973 and 2004 was matched with a national register of data on families to create a cohort of over 9 million people including over 35 000 probands with schizophrenia and 40 000 with bipolar disorder (Lichtenstein et al 2009). Analyses were made of rates of illness among different classes of relatives and the contributions to liability of genetic and environmental factors were estimated by comparing rates in offspring, siblings, half-siblings and both biological and adoptive parents. The relative risk in siblings (λs) was 9 for schizophrenia and 7.9 for bipolar disorder. Heritability was 64% for schizophrenia and 59% for bipolar disorder and shared environmental effects were small (4.5% and 3.4%, respectively). A novel and striking finding was the comorbidity between schizophrenia and bipolar disorder, due in large part to genetic effects shared by both conditions. The relative risk for schizophrenia was 3.9 among siblings of bipolar probands and the relative risk for bipolar disorder among siblings of probands with schizophrenia was 3.7. Genetic factors shared between schizophrenia and bipolar disorder have also been detected in recent genetic association data, raising the possibility that future classifications of common psychiatric disorders based on biology and genetics may challenge current nosologies based largely on a description of symptoms by patients.

Analysis of segregation

If a disease, measured as a discrete trait, is found to be heritable, it is often useful to know whether or not the disease follows Mendelian rules, or if more complex modes of inheritance need to be considered. The laws of Mendelian inheritance allow precise predictions of the expected number of affected and unaffected offspring of affected parents, measured as segregation ratios, and the statistical methods of segregation analysis applied to data on the observed distribution of a disease in families lead to predictions of whether the disease is caused by one or more loci.


Studies of segregation of illness in families with most psychiatric disorders, including schizophrenia, bipolar disorder and depression, are not consistent with a model of illnesses caused by variants in a few major genes, and a multifactorial threshold model gave the best fit to observed family data in schizophrenia (McGue & Gottesman 1989). In bipolar disorder some segregation studies support a model in which single genes of relatively large effect cause illness in some families (Rice et al 1987) while other studies support a polygenic model explaining illness as the result of additive or interacting gene variants, each one alone being neither sufficient nor necessary for illness to develop (Craddock et al 1997). However, linkage studies in single large families with schizophrenia or bipolar disorder indicate that illness can sometimes be attributed to the effect of a single locus (Venken & Del-Favero 2007) and accumulating data from association studies in schizophrenia, bipolar disorder and depression are consistent with genetic heterogeneity and a major role for common variants with small effect and also a substantial contribution to genetic risk made by rare penetrant variants.
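Formal segregation analysis uses likelihood-based methods and must correct for how families were ascertained, but the underlying logic can be illustrated with a toy example. The Python sketch below, using invented offspring counts, simply asks whether the observed numbers of affected and unaffected children of affected parents fit the 1:1 segregation ratio expected for a fully penetrant autosomal dominant disorder.

# Illustrative segregation check: do observed offspring counts fit the 1:1
# affected:unaffected ratio expected for a fully penetrant autosomal
# dominant disorder? Real segregation analysis uses likelihood methods and
# corrects for ascertainment; this is only a toy goodness-of-fit example.
import math

def chi_square_1df_p(chi2):
    """P-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def dominant_segregation_test(affected, unaffected):
    total = affected + unaffected
    expected = total / 2.0                      # 1:1 ratio under dominance
    chi2 = ((affected - expected) ** 2 + (unaffected - expected) ** 2) / expected
    return chi2, chi_square_1df_p(chi2)

if __name__ == "__main__":
    # Hypothetical counts pooled across families with one affected parent.
    chi2, p = dominant_segregation_test(affected=34, unaffected=66)
    print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
    # A small p-value means the simple fully penetrant dominant model fits
    # poorly, pointing towards reduced penetrance or a multifactorial model.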
were addressed by the cross-fostering design comparing the
Adoption studies

Adoption studies are one of the most powerful ways to disentangle genetic from environmental influences on a disease. There are three main designs of adoption studies, and the choice of design will depend on the available methods of ascertainment.

• Parent as proband. One of the first studies of the adopted offspring of mothers diagnosed with schizophrenia was carried out in Oregon, where 47 individuals, adopted shortly after birth when their mothers were receiving institutional care, were traced in adulthood (Heston 1966). Rates of illness among these adoptees were compared with those in 50 adopted offspring of mothers without psychiatric illness. The striking finding of the Oregon Adoption Study was a significant increase in schizophrenia in the adoptees whose mothers were schizophrenic: 5/47 compared with 0/50 in the control group. The Finnish Adoption Study, a large ongoing national study of adoptees with high and low genetic liability for schizophrenia, found a 6.7% lifetime prevalence of schizophrenia among the offspring of mothers with schizophrenia, significantly greater than the 2% prevalence in a control low risk group of adoptees. The liability in adoptees also extended to a broad spectrum of psychotic and non-psychotic disorders (Tienari et al 2000). Further study of adoptive families has highlighted the importance of a healthy family rearing environment as protection against a pathological outcome, establishing an interplay between genetic risk and family environment (Wynne et al 2006).

• Adoptee as proband. In this approach, adopted children who become ill are ascertained and rates of illness are compared in their biological and their adoptive families. Studies carried out in Denmark, where national registers facilitate the tracing of adopted children, showed that 20% of 118 biological relatives and only 6% of 224 adoptive relatives had a diagnosis of schizophrenia. This difference between adoptive and biological relatives was significant (Kety et al 1994). To remove any doubts about the reliability of earlier diagnostic methods, Kendler et al (1994) re-analysed the data from the Danish study, applying strict DSM-III criteria, and confirmed a diagnosis of schizophrenia in 8% of first-degree relatives of schizophrenic adoptees, contrasting with only 1% among relatives of control adoptees with no history of schizophrenia. The possibility of strong shared environmental influences in utero was addressed by Kety (1976), who studied the rate of illness in a group of paternal half siblings of schizophrenic adoptees and demonstrated an increased incidence of schizophrenia in paternal half siblings that could not be attributed to pre- and perinatal effects.

• Cross-fostering design. Children adopted shortly after birth will still have experienced the pre- and perinatal environment provided by their biological mother and after adoption may suffer greater stress by virtue of being an adoptee. These potential limitations of adoption studies were addressed by the cross-fostering design comparing the rate of illness in two groups of adoptees: one group has ill parents and after adoption has been raised by well parents; the second group has well biological parents but has been brought up in a family where a parent has become ill. Children adopted into a home where an adoptive parent becomes ill do not have an increased risk of illness.

Adoption studies in bipolar disorder have similarly confirmed increased rates of affective disorder in biological compared with adoptive relatives of adoptees.

Twin studies

Monozygotic (MZ) or identical twins result from a single fertilised ovum and therefore share all genes, whereas dizygotic (DZ) or fraternal twins are the result of the implantation of two separate fertilised ova and generally share about 50% of genes and are no more alike than other siblings. Since, in general, twins share a very similar cultural, family and educational environment, a comparison of MZ and DZ twins allows an estimate of genetic as well as environmental contributions to their phenotype. Concordance rate measures the similarity of phenotype between twins. If both members of a pair of twins develop a disease they are said to be concordant for that condition. For a fully genetic disease showing a dominant pattern of inheritance, concordance will be 100% in MZ twins and around 50% in DZ twins. In the case of a recessive disorder, DZ concordance will be about 25%. When a disease is entirely due to environmental causes we expect to find no difference in concordance rates in MZ and DZ twin pairs. Most psychiatric disorders are likely to be a result of both genetic and non-genetic factors, and concordance rates in MZ twins may be quite small, but a significant genetic contribution will be indicated by the comparison of MZ and DZ rates. The simplest way to measure this is pairwise concordance, defined as the number of concordant twin pairs divided by the total number of pairs studied. More commonly the probandwise concordance is quoted, and this is the number of affected co-twins with an affected proband divided by the total number of co-twins in the study.


The mode of ascertainment of the sample of twins used in the study is very important and unless an entire population has been systematically screened, probandwise concordance will be different from pairwise concordance because some twins will be counted twice if they have been independently ascertained for probandwise analysis.

In a classic twin study of schizophrenia, Gottesman & Shields (1972) found in their sample 11 concordant and 11 discordant pairs of MZ twins and 3 concordant and 30 discordant DZ pairs of twins. This gives a pairwise concordance rate of 11/22 = 50% for MZ and 3/33 = 10% for DZ twins. In the same sample, probandwise concordance was calculated to be 58% for MZ twins and 12% for DZ twins. The difference between the two methods of analysis arose because in 4/11 pairs of concordant MZ twins both of the twins were ascertained independently, and so were in effect counted twice when calculating probandwise concordance. Similarly 1/3 pairs of concordant DZ twins were ascertained independently. This illustrates the importance of ascertainment in twin studies. Both analyses yielded significant differences between MZ and DZ concordance, proving a genetic effect. An important study of the offspring of MZ twins discordant for schizophrenia found that the children of unaffected co-twins inherited the same increased risk of schizophrenia as their cousins who were offspring of the affected twin (Kringlen & Cramer 1989). From a meta-analysis of 12 published twin studies the heritability in liability to schizophrenia was 81% and there was also a small but significant contribution to risk from shared environmental influences. The interpretation of ‘environmental’ risk in twin studies requires explanation. Concordance rates for schizophrenia between MZ twins are reported to be around 50%, but it would be wrong to conclude that half of the variation in phenotype can be attributed to peri- or postnatal causes because environment may include not only family upbringing and life events but extends to a great variety of biological processes including prenatal exposures to infections, toxins and nutrients and epigenetic risk factors such as DNA methylation that influence the expression of genes in an individual but do not involve changes in the genome (Sullivan et al 2003).
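The pairwise and probandwise calculations described above can be reproduced directly from the Gottesman & Shields figures. The Python sketch below does so; the only inputs are the pair counts quoted in the text and the number of concordant pairs in which both twins were independently ascertained (4 of 11 MZ pairs and 1 of 3 DZ pairs).

# Pairwise and probandwise concordance from the Gottesman & Shields (1972)
# figures quoted above. 'Doubly ascertained' pairs are concordant pairs in
# which both twins entered the sample as probands and are therefore counted
# twice in the probandwise method.

def pairwise_concordance(concordant_pairs, discordant_pairs):
    return concordant_pairs / (concordant_pairs + discordant_pairs)

def probandwise_concordance(concordant_pairs, discordant_pairs, doubly_ascertained):
    affected_cotwins = 2 * doubly_ascertained + (concordant_pairs - doubly_ascertained)
    total_cotwins = affected_cotwins + discordant_pairs
    return affected_cotwins / total_cotwins

if __name__ == "__main__":
    for zygosity, conc, disc, double in [("MZ", 11, 11, 4), ("DZ", 3, 30, 1)]:
        print(f"{zygosity}: pairwise = {pairwise_concordance(conc, disc):.1%}, "
              f"probandwise = {probandwise_concordance(conc, disc, double):.1%}")
    # Compare with the 50%, 58%, 10% and 12% figures quoted above
    # (the text rounds 3/33 up to 10%).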
place between pairs of homologous chromosomes during the
In bipolar disorder, twin studies yield essentially similar findings as in schizophrenia. MZ concordance rates of 40–60% are significantly greater than DZ concordance rates. Because unipolar depression is common in the general population, heritability can be measured from large community-based twin registers, and estimates of heritability of 30–40% were made from the Virginia Twin Register (Sullivan et al 2000). A further observation from this cohort was an almost complete correlation between generalised anxiety disorder and major depression, indicating that the same genetic factors contribute to depression and anxiety (Kendler et al 1992).

Mapping and finding genes

When family, adoption and twin studies have identified a substantial genetic contribution to a disease, different strategies can be followed for identifying genes. A ‘functional’ approach requires the selection of candidate genes to be directly examined for mutations. The selection of candidate genes is usually based on knowledge of the biology or pharmacology of the disease; but because, for most psychiatric conditions, we have no clear understanding of the underlying neurobiology, the range of possible candidates could include most of the 10 000 or more genes expressed in the brain. A ‘positional’ strategy aims to identify the approximate chromosomal location of genes using the methods of linkage analysis in families, association studies in populations or mapping of cytogenetic anomalies in individuals. A positional approach has been successful in identifying genes responsible for many single-gene disorders – including cystic fibrosis, Huntington’s disease, muscular dystrophy and some familial cases of Alzheimer’s disease – and does not rely on prior knowledge of the biology of the disease, permitting the discovery of previously unknown genes.

Isolated populations that have low levels of out-breeding can be used to detect rare recessive conditions by examining regions inherited in common from shared ancestors (homozygosity mapping). This also reduces the problem of locus heterogeneity where several causal genes may give rise to the same phenotype. Consanguinity has helped map many recessive genes (Botstein & Risch 2003). Severity of disease also tends to correlate strongly with severity of the underlying gene mutation. For many diseases, identifying the mutation underlying the severe form has revealed other mutations that have milder outcomes (e.g. Dystrophin mutations in Duchenne muscular dystrophy and the milder Becker form). In fact it is such severe high-risk mutations that are selected for in most linkage studies, and subsequent gene cloning leads to identification of more frequent, but milder mutations.

Linkage analysis

Linkage studies look for the co-segregation of polymorphic DNA markers with disease in families. Studies on single large pedigrees, on many small two-generation families or on large numbers of pairs of affected siblings are widely adopted strategies.

The basis of linkage analysis is the recombination that takes place between pairs of homologous chromosomes during the first stage of meiosis. On average, recombination takes place at two places on each chromosome during every meiosis, and the recombination fraction (denoted by θ) is the probability that a recombination event will occur between two markers or between a marker and a disease locus. The recombination fraction is a useful measure because over short distances it is a measure of the physical distance between two markers on a chromosome. When two genes are physically far apart on a chromosome they will usually become separated by recombination during meiosis and will assort independently of each other just as if they were on separate chromosomes (Mendel’s Law of Independent Assortment). In general, the closer a polymorphic marker is to a disease locus, the more likely it is that the two will remain together from one generation to the next, because the chance of a recombination taking place between them is proportional to the physical distance that separates them. If two points are separated by a million base pairs of DNA (one megabase, Mb) then recombination will occur between them roughly once in every 100 meioses, which is 100 generations, and the statistical unit to describe this rate of recombination is termed the centimorgan (cM).


The effect of recombination is analogous to cutting a pack of cards. The chance that two cards will be separated by the cut is proportional to how far apart they are in the pack, and two consecutive cards are the least likely to be separated by repeated cutting. Genes far apart on the same chromosome co-segregate randomly and have a 50:50 chance of remaining on the same chromosome following meiosis. In a family linkage study the recombination fraction (θ) will therefore lie between 0 (indicating that a polymorphic marker is a physical part of the gene responsible for the disease so the marker and the gene never become separated) and 0.5 (completely independent assortment of marker and gene). For analysis of linkage the recombination fraction between a marker and disease locus can be measured directly in a family or group of families simply by counting the number of recombinant individuals, divided by the total number of offspring. Figure 8.7 illustrates the segregation of two markers in a family and the principles of linkage analysis. In large families calculation of linkage is complex and the recombination fraction is estimated by the method of maximum likelihood and calculated using a variety of ‘linkage’ programmes (Haines & Pericak-Vance 1998). The conventional statistical method to test for linkage is to calculate the LOD (Log of the Odds) score from the recombination fraction. It is conventionally accepted that a LOD score of 3 (odds in favour of linkage of 1000:1) is considered proof of linkage and conversely a LOD score of −2 (odds against linkage of 100:1) is accepted as exclusion.
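For the simplest case – phase-known, fully informative meioses – the LOD score can be written down directly from the counts of recombinant and non-recombinant offspring. The Python sketch below uses hypothetical counts and ignores the complications (unknown phase, missing genotypes, incomplete penetrance) that real linkage programs, such as those described by Haines & Pericak-Vance (1998), are designed to handle.

# A minimal LOD score sketch for phase-known, fully informative meioses:
# with R recombinants and NR non-recombinants the likelihood at a
# recombination fraction theta is theta**R * (1 - theta)**NR, and the LOD
# score is log10 of that likelihood relative to free recombination
# (theta = 0.5). This shows only the core idea, not a full linkage analysis.
import math

def lod_score(recombinants, non_recombinants, theta):
    n = recombinants + non_recombinants
    if theta == 0 and recombinants > 0:
        return float("-inf")            # a single recombinant excludes theta = 0
    likelihood = (theta ** recombinants) * ((1 - theta) ** non_recombinants)
    return math.log10(likelihood / (0.5 ** n))

if __name__ == "__main__":
    # Hypothetical counts: 2 recombinants among 20 scored meioses.
    for theta in (0.05, 0.1, 0.2, 0.3, 0.4):
        print(f"theta = {theta:.2f}  LOD = {lod_score(2, 18, theta):.2f}")
    # A maximum LOD of 3 or more is conventionally taken as evidence of
    # linkage; a LOD of -2 or less excludes linkage at that value of theta.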
linkage studies, so it is safe to assume that some or all of these
Fig. 8.7 The segregation of two polymorphic markers at two hypothetical points that lie close together on an autosome in a small family. Both markers are bi-allelic; for example, they might represent a single-nucleotide polymorphism (SNP). The allelotypes at locus A are (arbitrarily) labelled 1 and 2, and those at B are labelled 3 and 4. In this example the two loci are tightly linked together, and within this pedigree there are no examples of recombination between the homologous chromosomes. However, the presence of different allele combinations in those marrying into the pedigree suggests that this would not be the case if we could examine a considerable number of meioses. In addition to being close to each other on the chromosome, the markers have a combination of alleles (shaded 1, 3) present in all individuals with the proposed disorder (the black pedigree symbols). Linkage analysis tests statistically whether such apparent segregation is likely to have occurred by chance. If this is unlikely, then a locus that increases the susceptibility of the diathesis lies close to the markers on the chromosome. Also note that, although apparently autosomal dominant, the disorder is not fully penetrant, and an unaffected individual who must carry the susceptibility locus occurs in the middle generation.

Linkage analysis is an important tool for the analysis of genetic loci, exploiting the immense amount of sequence variation found across the human genome. Typically a genome-wide screen for linkage in a group of families multiply affected by a disease employs a large number of polymorphic markers, microsatellites or single nucleotide polymorphisms (SNPs), chosen to be evenly spaced at intervals of less than 10 cM across all chromosomes.

Affected pairs of siblings are easier to recruit than large pedigrees, and if very large cohorts are studied the approach is suited to the detection of genes of small effects predicted under a polygenic model. However, if substantial locus heterogeneity is present, this approach requires unrealistically large samples and in this situation linkage in single extended pedigrees can be informative. A large number of linkage projects using both strategies in schizophrenia, bipolar disorder and depression have been completed in the past two decades, and evidence for linkage, supported by more than one study, has emerged in several chromosomal regions. Two meta-analyses of linkage studies in schizophrenia using different statistical approaches have combined the results of over 20 genome-wide linkage scans (Badner & Gershon 2002; Lewis et al 2003) and identified several chromosomal regions expected to house genes contributing to the development of illness. Badner and Gershon found strongest evidence for susceptibility loci on chromosomes 8p, 13p and 22q. Lewis identified regions in descending order of probability on chromosomes 5q, 3p, 11q, 6p, 1q, 22q, 8p, 20q and 14p. Two regions on chromosomes 8p and 22q were highlighted by both meta-analyses, which both demonstrated consistency across linkage studies, so it is safe to assume that some or all of these regions contain genes that increase susceptibility to schizophrenia in some populations.

Similarly, meta-analyses have been conducted with the large number of genome-wide linkage studies in bipolar disorder.


Badner and Gershon (2002) identified regions on chromosomes 13q and 22q, and Segurado et al (2003), combining data from 18 genome-wide linkage scans, found no region with genome-wide significance but suggestive linkage at 9p, 10q and 14q. Fewer linkage studies have been carried out in depression, but in cohorts with recurrent early onset major depressive disorder linkage has been detected at loci on chromosomes 2, 12 and 15, and evidence for linkage on chromosomes 2 and 12 was stronger in women than men, suggesting that some susceptibility loci are sex specific (Abkevich et al 2003; Zubenko et al 2003; Levinson et al 2007).

Some genes of apparently large effect in schizophrenia and bipolar disorder have been detected in studies of extended families, although these may be relatively rare causes of illness in the general population. In a single family, carriers of a reciprocal translocation t(1;11)(q42.2;q21) that was stably inherited in a large Scottish pedigree were shown to have very high rates of major psychiatric illness when compared with non-carriers. The strongest evidence for linkage (LOD score of 7.1) was found with a phenotype that included both schizophrenia and affective psychosis (Blackwood et al 2001). Candidate genes for schizophrenia, including Disrupted In Schizophrenia 1 (DISC1), were detected by cloning the translocation breakpoint (Millar et al 2000). Linkage and association were confirmed with schizophrenia and depression in several different populations (Chubb et al 2008; Muir et al 2008), and DISC1 appeared to have a role in neurodevelopment. The observation that loss of function of DISC1 impairs proliferation of neural progenitor cells in regions of the brain including the dentate gyrus suggests a mechanism by which pathways involving DISC1 may contribute to psychiatric illness (Mao et al 2009).

Family linkage studies have identified several chromosome regions likely to harbour genes implicated in bipolar disorder. The task of finding genes in these regions, using methods of linkage disequilibrium mapping and direct sequencing of candidate genes, is not trivial, because linkage typically has low resolution for locating genes and defines a broad chromosome region often spanning many genes.
tion at some known candidate genes including DISC1 and
Association studies

Linkage analysis is performed on families with more than one affected relative whereas association studies compare the frequency of alleles of a DNA marker in populations of patients and healthy controls. In the search for genes these two approaches are entirely complementary. The idea of association is simpler and is well known through the long-established association between HLA subtypes and some common diseases, including diabetes, rheumatoid arthritis and ankylosing spondylitis. When one allele of a DNA polymorphism, for example a SNP or a microsatellite marker, is found more commonly in a disease population than controls the marker and disease are said to be associated or in ‘linkage disequilibrium’. This could occur because the polymorphic marker itself is a variant that directly influences the phenotype. One example is the association of late-onset Alzheimer’s disease with ApoE4. The frequency of ApoE4 is about 0.4 in individuals with Alzheimer’s disease compared with 0.15 in controls. This QTL increases the liability to develop dementia but is neither sufficient nor necessary to cause illness.
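The strength of such an association is usually summarised as an odds ratio. As a rough, purely illustrative calculation using the ApoE4 allele frequencies just quoted (about 0.4 in cases and 0.15 in controls), the Python sketch below derives the allele-wise odds ratio both from the frequencies and from an equivalent hypothetical 2 × 2 table of allele counts.

# Allele-based odds ratio for a case-control association, using the
# approximate ApoE4 allele frequencies quoted above. The counts below are a
# hypothetical illustration of the corresponding 2 x 2 allele table, not
# data from any published study.

def allele_odds_ratio(freq_cases, freq_controls):
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls

def odds_ratio_from_counts(a, b, c, d):
    """a/b = risk/other alleles in cases, c/d = risk/other alleles in controls."""
    return (a * d) / (b * c)

if __name__ == "__main__":
    print(f"OR from allele frequencies: {allele_odds_ratio(0.40, 0.15):.2f}")   # about 3.8
    # Equivalent 2 x 2 table for 500 case and 500 control chromosomes:
    print(f"OR from counts:             {odds_ratio_from_counts(200, 300, 75, 425):.2f}")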


Linkage disequilibrium may also occur between a disease and a polymorphic marker situated very close to the disease-related gene but not involved in causing the disease phenotype. When the marker and the disease gene are physically close on the genome they are less likely to become separated by recombination over many generations. In this situation, association between a disease and a polymorphic marker in a population can be explained by a founder effect, when a significant proportion of people with the disease are the descendants of a single founder person who introduced the disease to the population many generations previously. Descendants with the disease will have inherited the disease-related gene together with the DNA sequence immediately surrounding that mutation. The more closely to the gene a DNA polymorphism is located, the less likely it is to be separated by recombination, and the association of the polymorphism with disease will remain over many generations. The ideal setting for an association study would thus be a completely isolated island population where the disease had been introduced many generations previously by a single founder and all the people with the disease being studied were descended from that person. In such a situation the mutation causing the disease would lie within the same haplotype of markers in all cases. In practice, association studies are successfully carried out, for example in European populations, provided cases and controls are carefully matched for age and ethnic origin. Linkage disequilibrium typically extends over very short distances of a few hundred kilobases, giving much higher resolution for mapping genes than can be achieved by linkage strategies.

Very many candidate gene association studies have been performed in psychiatric illnesses, but the field entered a new phase when genome-wide association studies (GWAS) became feasible, using arrays containing up to one million SNPs to genotype samples of several thousand cases and controls. GWAS data from the limited number of studies published to date for schizophrenia and bipolar disorder have revealed important new insights into susceptibility loci. A combined analysis of three studies in bipolar disorder totalling over 4000 cases and 6000 controls confirmed association at some known candidate genes including DISC1 and identified novel regions including the ankyrin-G (ANK3) gene on chromosome 10q and the voltage-gated calcium channel gene CACNA1C on chromosome 12, suggesting that variation in neuronal ion channels may have a fundamental role in bipolar disorder (Ferreira et al 2008). Two GWAS in schizophrenia have independently led to the discovery of an important role for rare microdeletions and microduplications (called copy number variations or CNVs) in schizophrenia (International Schizophrenia Consortium 2008; Stefansson et al 2008). Both studies confirmed an already well-described link between schizophrenia and the 22q11.2 deletion syndrome (also called velocardiofacial syndrome) (Karayiorgou et al 1995) but also identified novel regions with deletions and duplications associated with schizophrenia on chromosomes 1q and 15q. The frequency of rare CNVs greater than 100 kb in length (those occurring in less than 1% of the population) was greater in schizophrenia than in controls. This observation is a major step towards understanding the nature of genetic susceptibility factors in common psychiatric illnesses by demonstrating that substantial contributions to risk of illness come from both common and rare variants.

Cytogenetic studies

Cloning disrupted genes from rare chromosomal rearrangements has been a very fruitful approach for a wide variety of inherited neurological conditions because abnormalities of chromosomes can precisely pinpoint the position of disrupted genes. Examples of the success of this approach are:

• the discovery of the novel genes DISC1 and DISC2 at the breakpoint of a translocation on chromosome 1 in a family with schizophrenia, and the role of the phosphodiesterase PDE4B which interacts with DISC1 (Millar et al 2005);

• the analysis of candidate genes including COMT, PRODH2 and the G-protein-coupled receptor kinase 3 (GRK3) in the region on chromosome 22 deleted in the velocardiofacial syndrome;

• the discovery of DIBD1 (Disrupted in Bipolar Disorder 1), a mannosyltransferase gene disrupted by a translocation breakpoint on chromosome 11 and one of the first novel genes described with a proposed role in bipolar disorder (Baysal et al 2002); and

• the discovery of a role for the glutamate receptor gene GRIK4 and the transcription factor gene NPAS3 in schizophrenia and bipolar disorder following the examination of chromosomal breakpoints (Pickard et al 2008, 2009).

Further reading

Alberts, B., Johnson, A., Lewis, J., et al., 2002. Molecular biology of the cell, fourth ed. Garland Science – Taylor & Francis Group.
Haines, J.L., Pericak-Vance, M.A. (Eds.), Genetic analysis of complex disease, second ed.
Nussbaum, R.L., McInnes, R.R., Willard, H.F. (Eds.), 2007. Thompson & Thompson: Genetics in medicine, seventh ed. Saunders/Elsevier, Philadelphia.
Online Mendelian Inheritance in Man, OMIM (TM). McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD and National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD. http://www.ncbi.nlm.nih.gov/omim/.
Ott, J., 1999. Analysis of human genetic linkage. Johns Hopkins University Press, Baltimore.
Genetic Medicine, Johns Hopkins University,

References

Abkevich, V., Camp, N.J., Hensel, C.H., et al., 2003. Predisposition locus for major depression at chromosome 12q22-12q23.2. Am. J. Hum. Genet. 73, 1271–1281.
Badner, J.A., Gershon, E.S., 2002. Meta-analysis of whole-genome linkage scans of bipolar disorder and schizophrenia. Mol. Psychiatry 7, 405–411.
Baysal, B.E., Willett-Brozick, J.E., Badner, J.A., et al., 2002. A mannosyltransferase gene at 11q23 is disrupted by a translocation breakpoint that co-segregates with bipolar affective disorder in a small family. Neurogenetics 4, 43–53.
Blackwood, D.H., Fordyce, A., Walker, M.T., et al., 2001. Schizophrenia and affective disorders – cosegregation with a translocation at chromosome 1q42 that directly disrupts brain-expressed genes: clinical and P300 findings in a family. Am. J. Hum. Genet. 69, 428–433.
Botstein, D., Risch, N., 2003. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat. Genet. 33 (Suppl.), 228–237.
Chubb, J.E., Bradshaw, N.J., Soares, D.C., et al., 2008. The DISC locus in psychiatric illness. Mol. Psychiatry 13, 36–64.
Coyle, J.T., 2009. MicroRNAs suggest a new mechanism for altered brain gene expression in schizophrenia. Proc. Natl. Acad. Sci. U. S. A. 106 (9), 2975–2976.
Craddock, N., Van Eerdewegh, P., Reich, T., 1997. Single major locus models for bipolar disorder are implausible. Am. J. Med. Genet. 74, 18–20.
Ferreira, M.A., O’Donovan, M.C., Meng, Y.A., Jones, I.R., et al., 2008. Collaborative genome-wide association analysis supports a role for ANK3 and CACNA1C in bipolar disorder. Nat. Genet. 40 (9), 1056–1058.
Gottesman, I.I., 1991. Schizophrenia genesis: the origins of madness. Freeman, New York.
Gottesman, I.I., Shields, J., 1972. Schizophrenia and genetics: a twin study vantage point. Academic Press, New York.
Guy, J., Gan, J., Selfridge, J., Cobb, S., Bird, A., 2007. Reversal of neurological defects in a mouse model of Rett syndrome. Science 315 (5815), 1143–1147.
Haines, J.L., Pericak-Vance, M.A., 1998. Approaches to gene mapping in complex human diseases. John Wiley & Sons, New York.
Heston, L.L., 1966. Psychiatric disorders in foster home reared children of schizophrenic mothers. Br. J. Psychiatry 112, 819–825.
International Schizophrenia Consortium, 2008. Rare chromosomal deletions and duplications increase risk of schizophrenia. Nature 455, 237–241.
Jaenisch, R., Bird, A., 2003. Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat. Genet. 33 (Suppl.), 245–254.
Karayiorgou, M., Morris, M.A., Morrow, B., et al., 1995. Schizophrenia susceptibility associated with interstitial deletions of chromosome 22q11. Proc. Natl. Acad. Sci. U. S. A. 92, 7612–7616.
Kendler, K.S., Neale, M.C., Kessler, R.C., et al., 1992. Major depression and generalized anxiety disorder: same genes (partly) different environments? Arch. Gen. Psychiatry 49, 716–722.
Kendler, K.S., Gruenberg, A.M., Kinney, D.K., 1994. Independent diagnoses of adoptees and relatives as defined by DSM-III in the provincial and national samples of the Danish Adoption Study of Schizophrenia. Arch. Gen. Psychiatry 51, 456–468.
Kety, S.S., 1976. Studies designed to disentangle genetic and environmental variables in schizophrenia: some epistemological questions and answers. Am. J. Psychiatry 133, 1134–1137.
Kety, S.S., Wender, P.H., Jacobsen, B., et al., 1994. Mental illness in the biological and adoptive relatives of schizophrenic adoptees. Replication of the Copenhagen Study in the rest of Denmark. Arch. Gen. Psychiatry 51, 442–455.
Kringlen, E., Cramer, G., 1989. Offspring of monozygotic twins discordant for schizophrenia. Arch. Gen. Psychiatry 46, 873–877.
Levinson, D.F., Evgrafov, O.V., Knowles, J.A., et al., 2007. Genetics of recurrent early-onset major depression (GenRED): significant linkage on chromosome 15q25-q26 after fine mapping with single nucleotide polymorphism markers. Am. J. Psychiatry 164, 259–264.


Lewis, C.M., Levinson, D.F., Wise, L.H., et al., 2003. Genome scan meta-analysis of schizophrenia and bipolar disorder, part II: schizophrenia. Am. J. Hum. Genet. 73, 34–48.
Lichtenstein, P., Yip, B.H., Bjork, C., et al., 2009. Common genetic determinants of schizophrenia and bipolar disorder in Swedish families: a population-based study. Lancet 373, 234–239.
Lupski, J.R., 2007. Structural variation in the human genome. N. Engl. J. Med. 356, 1169–1171.
Mao, Y., Ge, X., Frank, C.L., et al., 2009. Disrupted in schizophrenia 1 regulates neuronal progenitor proliferation via modulation of GSK3beta/beta-catenin signaling. Cell 136, 1017–1031.
McGue, M., Gottesman, I.I., 1989. A single dominant gene still cannot account for the transmission of schizophrenia. Arch. Gen. Psychiatry 46, 478–480.
McGuffin, P., Owen, M.J., Gottesman, I.I., 2002. Psychiatric genetics and genomics. Oxford University Press, Oxford.
McKechanie, A.G., Muir, W.J., 2009. Can epigenetics help in the discovery of therapeutics for psychiatric disorders, especially schizophrenia? Expert Opinion on Drug Discovery (in press).
Millar, J.K., Pickard, B.S., Mackie, S., et al., 2005. DISC1 and PDE4B are interacting genetic factors in schizophrenia that regulate cAMP signaling. Science 310, 1187–1191.
Millar, J.K., Wilson-Annan, J.C., Anderson, S., et al., 2000. Disruption of two novel genes by a translocation co-segregating with schizophrenia. Hum. Mol. Genet. 9, 1415–1423.
Moazed, D., 2009. Small RNAs in transcriptional gene silencing and genome defence. Nature 457 (7228), 413–420 [Review].
Mortensen, P.B., Pedersen, C.B., Melbye, M., et al., 2003. Individual and familial risk factors for bipolar affective disorders in Denmark. Arch. Gen. Psychiatry 60, 1209–1215.
Muir, W.J., Pickard, B.S., Blackwood, D.H., 2008. Disrupted-in-Schizophrenia-1. Curr. Psychiatry Rep. 10 (2), 140–147.
Pickard, B.S., Knight, H.M., Hamilton, R.S., et al., 2008. A common variant in the 3’UTR of the GRIK4 glutamate receptor gene affects transcript abundance and protects against bipolar disorder. Proc. Natl. Acad. Sci. U. S. A. 105 (39), 14940–14945.
Pickard, B.S., Christoforou, A., Thomson, P.A., et al., 2009. Interacting haplotypes at the NPAS3 locus alter risk of schizophrenia and bipolar disorder. Mol. Psychiatry 14 (9), 874–884.
Rice, J., Reich, T., Andreasen, N.C., et al., 1987. The familial transmission of bipolar illness. Arch. Gen. Psychiatry 44, 441–447.
Segurado, R., Detera-Wadleigh, S.D., Levinson, D.F., et al., 2003. Genome scan meta-analysis of schizophrenia and bipolar disorder, part III: bipolar disorder. Am. J. Hum. Genet. 73, 49–62.
Stefansson, H., Rujescu, D., Cichon, S., et al., 2008. Large recurrent microdeletions associated with schizophrenia. Nature 455, 232–236.
Sullivan, P.F., Kendler, K.S., Neale, M.C., 2003. Schizophrenia as a complex trait: evidence from a meta-analysis of twin studies. Arch. Gen. Psychiatry 60, 1187–1192.
Sullivan, P.F., Neale, M.C., Kendler, K.S., 2000. Genetic epidemiology of major depression: review and meta-analysis. Am. J. Psychiatry 157, 1552–1562.
Suzuki, M.M., Bird, A., 2008. DNA methylation landscapes: provocative insights from epigenomics. Nat. Rev. Genet. 9, 465–476.
Tienari, P., Wynne, L.C., Moring, J., et al., 2000. Finnish adoptive family study: sample selection and adoptee DSM-III-R diagnoses. Acta Psychiatr. Scand. 101, 433–443.
Venken, T., Del-Favero, J., 2007. Chasing genes for mood disorders and schizophrenia in genetically isolated populations. Hum. Mutat. 28, 1156–1170.
Watson, J.D., Crick, F.H., 1953. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature 171, 737–738.
Wynne, L.C., Tienari, P., Nieminen, P., et al., 2006. I. Genotype-environment interaction in the schizophrenia spectrum: genetic liability and global family ratings in the Finnish Adoption Study. Fam. Process 45, 419–434.
Zubenko, G.S., Maher, B., Hughes 3rd, H.B., et al., 2003. Genome-wide linkage survey for genetic loci that influence the development of depressive disorders in families with recurrent, early-onset, major depression. Am. J. Med. Genet. B Neuropsychiatr. Genet. 123B, 1–18.


CHAPTER 9

Research methods, statistics and evidence-based practice

Andrew M McIntosh, Michael Sharpe, Stephen M Lawrie

Introduction

Much of contemporary medicine remains on comparatively shaky scientific foundations, and it is only through carefully conducted study that we will be able to further improve on what we can currently offer our patients. That is why we need research. Laboratory-based basic science and population-based epidemiology aim to deliver a greater understanding of the causes of disease, with the prospect of earlier detection, better treatment and prevention. Therapeutic and health services research seeks to refine currently available treatments, optimise their delivery to those most likely to benefit and be able to test new therapies as they arise. Progress may seem slow, but that is inevitable if the science is to be rigorously conducted and evaluated. The application of epidemiological and statistical principles to everyday clinical decisions used to be described as clinical epidemiology (Sackett et al 2005) and has led to the development and implementation of evidence-based medicine. In this chapter, we shall describe the principles of research measurement, the architecture of clinical research, the statistics used to interpret the results from these studies and finally the application of these findings to clinical practice.

Measurement

Types of data

Measurement is key to research. Measurement produces data, which may be either discrete or continuous. Discrete data can only take a limited set of possible values (e.g. socioeconomic status, sex) whereas continuous data have a potentially infinite series of values (e.g. height, temperature). In practice, the accuracy of measurement usually means that continuous data such as height are measured to the nearest unit and are actually discrete variables. Discrete data can be further subdivided into four types according to their properties:

• nominal or categorical;

• ordinal;

• interval; and

• ratio.

Nominal or categorical data have values which are qualitatively different from one another but have no particular order (e.g. diagnosis). When there are only two possible values (e.g. gender), nominal data are sometimes also referred to as dichotomous. Ordinal data have a logical order in their values, but the differences between each of the values are not necessarily equal and the measurement has no true zero point (e.g. social class I–V, ward observation level). Interval and ratio data have very similar properties; in both, the differences between values are equal (e.g. the difference between a value of 1 and 2 is the same as the difference between 33 and 34). Ratio data have the additional property of having a true zero point (i.e. a complete absence of the quantity), though this distinction is rarely important. Distinguishing the various data types may seem like an irrelevant abstraction, but is essential in determining the correct method of data description and analysis. Discrete interval, discrete ratio and continuous data can be analysed using a group of related techniques called parametric statistics (see below). Data which are nominal or ordinal are analysed using non-parametric or distribution-free methods.

A further distinction is made between variables that are dependent and those that are independent. Dependent variables are the characteristics we wish to investigate, whereas an independent variable refers to the data classification. For example, in a study where one compared the whole brain volume of patients with various psychiatric diagnoses, brain volume would be the dependent variable and diagnosis would be the independent variable.


Validity and reliability of measurement

• The validity of an instrument refers to whether it measures what it purports to measure (Box 9.1).

• The reliability (or reproducibility) of a measure is based on whether repeated measurements of the same data give similar results (Box 9.2).

There are several methods of determining the reliability and validity of a measure.

Validity

Validity of measurement can be assessed in many ways, but perhaps the simplest form of validity is when something ‘looks right’. When a scale ‘looks right’ on inspection then it is said to have face validity. A related measure of validity is content validity. This refers to the individual items or content of a scale. For example an anxiety scale containing many items about psychosis has poor content validity.

With a new measurement technique there is often an existing ‘gold standard’ measure against which it can be compared. If the new measurement gives consistently similar results to the ‘gold standard’ it can be said to be valid. This form of validity is called criterion validity, which can be further subdivided according to whether the measures give similar results at one point in time (‘concurrent validity’) or one predicts the other (‘predictive validity’) in the future (Last 1995). If there is no clear gold standard then one has to compare the new measure with another existing measure of uncertain validity to see if it gives the results one could reasonably expect. For example, a new scale to measure depression might be compared to others. This property of having an appropriate relationship with other measures is known as construct validity. A final measure of the validity of a scale is whether the items on a scale are related to one another. This is known as internal consistency and can be measured using Cronbach’s alpha. For more information about Cronbach’s alpha the reader is referred to Bland & Altman (1997). For further information about the validity of scales the reader should consult Bland & Altman (2002).

Box 9.1
Types of validity

Face validity
Content validity
Criterion validity – concurrent / predictive
Construct validity
Internal consistency
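As an illustration of the internal consistency index mentioned above, the short Python sketch below computes Cronbach’s alpha for a handful of invented questionnaire items; it follows the standard formula, alpha = k/(k−1) × (1 − sum of item variances / variance of total scores), rather than any particular published scale.

# A small sketch of Cronbach's alpha as an index of internal consistency.
# The item scores below are invented purely to show the calculation.
from statistics import pvariance

def cronbach_alpha(items):
    """items: list of equal-length lists, one list of scores per item."""
    k = len(items)
    total_scores = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(pvariance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / pvariance(total_scores))

if __name__ == "__main__":
    # Four hypothetical questionnaire items rated by six respondents.
    items = [
        [3, 4, 2, 5, 4, 3],
        [2, 4, 2, 5, 3, 3],
        [3, 5, 1, 4, 4, 2],
        [2, 4, 2, 5, 4, 3],
    ]
    print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")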
Box 9.2
Reliability

Can be between methods or raters (inter-rater) or within the same instrument or rater repeated on more than one occasion (intra-rater, test–retest)
Can be graphically represented on a Bland–Altman plot
Should not be measured by a simple correlation between scores

Reliability

Researchers and clinicians often wish to obtain data from patients. Sometimes obtaining the information may be a long and protracted process (e.g. full clinical interview). On other occasions it may entail the risk of harm to the patient (e.g. direct measurement of the CSF levels of drugs). In order to obtain the information more quickly, or with less risk to the patient, new methods may emerge whose level of agreement with the old techniques needs to be assessed. Measurements using a given technique should also be repeatable from researcher to researcher. For example, during a 10-year cohort study of people with schizophrenia it would be unlikely for the same researchers to make all the measurements. We therefore need to be satisfied that, given the same material, the researchers would have a high level of agreement on its measurement. This form of reliability is sometimes known as inter-rater reliability. The method of quantification of reliability between different raters (inter-rater) or between different techniques depends on the data type. For categorical data (e.g. whether someone has schizophrenia or not) the reliability of the results is usually measured using the kappa (κ) statistic.

This method can be extended to situations where diagnoses or measurements are made in three or more ranked categories (e.g. no, borderline and definite diagnoses of mental illness), in which case kappa can be modified, or weighted, to account for partial agreement. The interested reader should consult Streiner & Norman (1996) for further details.
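For two raters the unweighted kappa can be computed directly from their paired ratings, as in the Python sketch below; the diagnostic labels and ratings are invented for illustration.

# Cohen's kappa for two raters assigning the same categorical diagnosis,
# computed from a list of paired ratings. The example ratings are invented.
from collections import Counter

def cohens_kappa(rater1, rater2):
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    freq1, freq2 = Counter(rater1), Counter(rater2)
    expected = sum((freq1[c] / n) * (freq2[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

if __name__ == "__main__":
    r1 = ["scz", "scz", "bipolar", "scz", "none", "bipolar", "scz", "none", "scz", "none"]
    r2 = ["scz", "bipolar", "bipolar", "scz", "none", "bipolar", "scz", "none", "none", "none"]
    print(f"kappa = {cohens_kappa(r1, r2):.2f}")  # agreement beyond chance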


Quantification of the reliability of interval, ratio or continuous measurements uses other methods. The most misused method is to consider the association or correlation between the ratings of two or more observers using the Pearson or Spearman correlation coefficient. This approach is unsuitable because one investigator may consistently under-rate or over-rate compared with his colleague, and yet the degree of correlation between their scores could be very high. A useful alternative to this approach is to use the intra-class correlation coefficient (ICC). The ICC distinguishes the bias in the measurement from random variation of the instrument and produces a more accurate and representative value of reliability. It can also be adapted to situations where there are several observers and can be easily calculated from an analysis of variance (ANOVA) table (Streiner & Norman 1996).

While no statistical test is a substitute for clear graphical representation, a common error is to represent agreement between two or more investigators by plotting the observations on a graph where the values given by each investigator are represented on each axis. This approach is correlational and can lead to misleading results. A more informative approach is to plot the difference between the investigators’ measurements against the mean of the measurements (Fig. 9.1). Plotting the difference against the average will demonstrate whether the difference between the two methods is generally very small or very large. A mean difference between the methods can be calculated and, in situations where the mean difference is not zero, whether or not one method gives systematically higher or lower scores than the other can be evaluated. Limits of agreement can be added around the average difference (as d ± 2s) as a further refinement by calculating the mean difference (d) and its standard deviation (s). When these limits of agreement do not incorporate any clinically meaningful difference, the two methods may be used interchangeably. Further information about this approach is available from Bland & Altman (1986).

Fig. 9.1 Bland–Altman plot showing the differences between two measurements.

From the simulated example, illustrated in Fig. 9.1, of two raters both using the same 60-point scale, the limits of agreement (reference range for difference) are −7.57 to 5.57, and the mean difference = −1.000. Since the reference range for the difference includes differences of up to 7 points on a 60-point scale, the inter-rater agreement may be too unsatisfactory for the results of one rater to be considered interchangeable with that of the other. Similar techniques can be used to assess the reliability of a method repeated twice or more on the same material. As the variability between measurements is likely to be less, some of the sources of variation have been removed, and the methods above will need to be adapted for this purpose.
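The limits of agreement described above are easily computed from paired scores. The Python sketch below uses invented ratings from two hypothetical raters (it does not reproduce the simulated 60-point example in Fig. 9.1) and reports the mean difference with d ± 2s limits.

# Bland-Altman style limits of agreement for two raters: mean difference d
# plus or minus 2 standard deviations of the paired differences. The scores
# below are invented for illustration.
from statistics import mean, stdev

def limits_of_agreement(scores_a, scores_b):
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    d, s = mean(diffs), stdev(diffs)
    return d, (d - 2 * s, d + 2 * s)

if __name__ == "__main__":
    rater_a = [34, 41, 28, 55, 47, 39, 22, 50]
    rater_b = [36, 40, 31, 54, 50, 41, 25, 49]
    d, (lower, upper) = limits_of_agreement(rater_a, rater_b)
    print(f"mean difference = {d:.2f}")
    print(f"limits of agreement: {lower:.2f} to {upper:.2f}")
    # If this interval spans clinically important differences, the two
    # raters' scores should not be treated as interchangeable.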
In psychiatric research reliable diagnosis can be problematic (see Chapter 1). This is because there are currently no confirmatory laboratory investigations, clinical practice varies and several different diagnostic criteria exist. Structured clinical interviews based on standard diagnostic criteria have greatly improved the reliability of psychiatric diagnosis (e.g. the Present State Examination or PSE, the Structured Clinical Interview for DSM or SCID) and symptom severity (e.g. the Positive and Negative Syndrome Scale or PANSS).

Types of clinical research

There are essentially three classes of clinical research (Feinstein 1985):

• Descriptive observational studies – these usually describe the characteristics of a clinical population without a control group, and are, therefore, suitable only for hypothesis generation rather than testing. Examples include case reports and series, audit and surveys. Cohort studies where there is only one group sometimes also come under this heading.

• Analytical observational studies – these generally compare two subject groups, cases and controls, and are suitable for hypothesis testing. Ideally, the study and control groups are similar in all but the characteristic of main interest, i.e. demographically similar, but with and without disease (in case–control studies) or having been subject to a particular exposure or not (in cohort studies).

• Experimental studies – these can be used to directly infer causation or to assess the effect of treatment, as something is given or done to an experimental group but not to a control group (of patients). Controlled trials of treatment are clinical experiments.

These distinctions are, however, not absolute. Some audits and surveys include control groups, while some cohort studies (e.g. of the prognosis of a disease) do not. Similarly, some so-called ‘clinical trials’ are not actually experiments but merely uncontrolled case series. There is also another type of research that does not fit easily into this scheme – so-called secondary research – that is, systematic reviews and meta-analyses of existing research findings. These can generate the most reliable results of all (see below) and are increasingly used to inform the development of clinical guidelines.

Descriptive studies

Case reports and case series

These are simple descriptions of clinical observations, sometimes with results of investigations, in one or a small group of patients. The patients tend to be opportunistically identified from a single location, a process subject to all kinds of selection biases and hence the results are likely to be unrepresentative. Furthermore, in the absence of a control group observed associations may be coincidental (Grimes & Schulz 2002). For example, it was quite common 10–15 years ago to see reports of two or more ‘specific’ delusional syndromes in a particular patient when the patient may simply have had an unusual case of schizophrenia. Case studies are susceptible to misdiagnosis, bias and measurement error. This often makes them unreliable and unrepeatable. Nevertheless, case reports have had, and continue to have, an important role. Many of the clinical entities we still recognise today were initially described in this way. For rare diseases the description of the common features in a number of cases may be the best available evidence. It is also possible to seek out particular cases to disprove general theories – if, for example, one or preferably more patients have a disease but not the proposed risk factor, that factor can be ruled out as a general cause (Farmer 1999). Uncontrolled trials of a series of cases may obviate further study in clinical trials. If trials to evaluate a new drug have included a total of 1000 patients, an adverse reaction which occurs in 0.1% of treated patients may well not have occurred at all by chance alone. Hence, uncommon and even some common adverse effects are more likely to be detected in uncontrolled postmarketing surveillance.

However, case reports and case series should generally be regarded only as means of identifying priorities or hypotheses for further and more comprehensive study (Box 9.3). It is worth noting that clinical experience is essentially a series of extended case series, in which one is most likely to remember the most dramatic successes and failures (an example of so-called recall bias).

Box 9.3
Characteristics, strengths and limitations of case reports and series
Describe the clinical features or outcome in one or more patients
Strengths
Arise from clinical practice
Quick and easy
May have important general implications
Limitations
Apparent associations may be due to coincidence
Non-systematic retrospective studies
Unreliable results: misdiagnosis, measurement error, bias and confounding (see below)

Clinical audit

Audit, as the 'examination of the extent to which a condition, process or performance conforms to pre-determined standards or criteria' (Last 1995), is not strictly speaking a research method, but is included here as it typically uses research methods and arguably should be done routinely by every medical practitioner. The key features include the measurement of clinical performance against a standard, some attempt to improve this performance and then a re-audit, to 'close the audit cycle', to assess any improvement. Whilst it may be argued, however, that differences between patients before and after some initiative prove its effect, that is not necessarily so; such 'historical controls' are notoriously prone to bias as there are so many other influences on medical services, patients' well-being and outcomes that may vary over time. The same concerns are pertinent in so-called 'mirror image' studies of the effects of treatments in what are essentially uncontrolled trials.

Almost any aspect of clinical care can be audited, namely:

• service structure, e.g. personnel, equipment;
• a process of care, e.g. note-keeping, investigation use and utility, treatments; and
• clinical outcomes, e.g. morbidity, mortality.

Clinical outcomes are arguably the most important, as they measure local clinical effectiveness, but can be difficult to define and measure. Outcomes are rarely systematically recorded in hospital records. Current drives to national bench-marking, hospital performance league tables and even routine outcome measurement are usually biased by the case mix, different practices in particular hospitals, and the difficulties in selecting and reliably rating appropriate assessment tools. It is regrettable that more effort has not been made to systematically address these problems.

Surveys

Survey is an ambiguous word, being merely 'the systematic collection of information' (Last 1995). Indeed, one sees at least three different uses of the word in the medical literature. It is relatively common to see the term 'questionnaire survey', particularly in accounts of often hastily cobbled together questions being sent out in the hope of eliciting opinions, and usually obtaining low, uninformative response rates. The term survey also applies to more rigorous epidemiological surveys to determine the prevalence of a disease and other types of cross-sectional study. These should not be confused with surveillance studies, e.g. the active or passive monitoring of adverse drug effects in phase IV 'clinical trials'. The key feature of cross-sectional studies is that they simultaneously ascertain the presence or absence of disease and the presence or absence of an exposure, at one particular point in time. They are, therefore, studies of prevalence or frequency and sometimes called so, and cannot reliably distinguish between cause and effect.

Surveys need to pay particular attention to potential biases in population sampling. In an attempt to ensure that the population studied is representative, researchers may take a random sample from, for example, census data, the electoral roll or even a telephone directory. All of these are probably better than attempting to accost people in the street, but will, of course, miss those who move frequently, do not pay their 'community charge' or do not have a telephone – and psychiatric patients are probably over-represented in each category. Patients will be difficult to contact at home if they are currently hospitalised or in prison. Making false associations through 'sampling bias' is known as Berkson's fallacy (or bias). For example, the original description that hospitalised schizophrenic patients tended not to have epilepsy led to the introduction of ECT, whereas it is now clear that people with schizophrenia are, if anything, more likely to suffer from epileptic fits than the general population – the sampling bias presumably arising because people with both epilepsy and schizophrenia either had their schizophrenia unrecognised or were treated for epilepsy in different hospitals.

Close attention must also be paid to how potential participants are approached. In large studies, such as epidemiological surveys of rare illnesses, it is standard to send out postal questionnaires because interviewing everybody would be too costly. Edwards et al (2002) systematically reviewed randomised controlled trials to compare methods of maximising response rates to postal questionnaires; they found that response was approximately doubled when questionnaires were sent out by recorded delivery, were short, were designed to be of interest to participants and when monetary incentives were offered. Response rates were also increased by such factors as personalising questionnaires and letters, using stamped returned envelopes, contacting participants before sending questionnaires, follow-up contact, providing non-responders with a second questionnaire and even using coloured ink.
Questionnaires that included sensitive questions were less likely to be returned. Consideration of these issues is crucial to obtaining an adequate response rate (i.e. at least 70% and preferably 80%). Telephone or personal contact for relatively brief interviews are alternatives that may be more suitable for patients with psychiatric illness.

A particular type of cross-sectional survey, the ecological or correlational study, avoids the potential problems of response bias by examining pre-existing data on exposures and outcomes in populations rather than individuals. Usually, routinely collected data on disease rates is compared with data on the general level of exposure between particular geographical regions (or in regions over time in 'secular trend analysis' or 'time series studies'). If two or more computerised records are used, this is known as 'record linkage'. These approaches obviously have the additional advantages of being relatively cheap and quick, as well as the more dubious advantage of being able to examine an almost limitless range of potential associations. The price to pay is the inability to link exposure to outcome in individuals or to control adequately for confounding factors. In other words, an association discovered in an ecological study may not apply at an individual level and may arise from confounding by other factors. Incorrect conclusions from associations in ecological studies are called the 'ecological fallacy'. For example, deprived inner city areas have a high proportion of Afro-Caribbean residents and higher than average rates of suicide, but individual Afro-Caribbeans do not appear to. The association probably occurs because inner cities have more people from ethnic minorities and more people who live alone, and only the latter are at increased risk of suicide. These false associations can sometimes be corrected for statistically but depend upon the pertinent information having been collected. Where it is not, and indeed in general, any such associations need to be confirmed in analytical studies.

In summary, descriptive observational research is comparatively quick and easy, and useful to develop new hypotheses, but cannot reliably test hypotheses or discern causal relationships. The main features of the individual types of study are summarised in Table 9.1. Those interested in further information are referred to excellent introductory (Hennekens & Buring 1987) or more advanced (Feinstein 1985) epidemiology textbooks, to a series of recent articles in the Lancet (Grimes & Schulz 2002) or to Lawrie et al (2000).

Table 9.1 Summary features of descriptive observational studies

Type of study | Essential features | Advantages | Disadvantages
Case report | Observations in a single case | Cheap and easy way of generating hypotheses | Liable to coincidence, error, bias and confounding
Case series | Disease characteristics in a number of cases | May be the best information on very rare diseases | No comparison group, so cannot test hypotheses
Audit | Examines service provision and outcome | Gives information on service delivery | Unreliable estimate of effectiveness
Qualitative study | Elicits opinion | Can illuminate complex issues | May be unreliable
Cross-sectional study | Measures rates of disease | Identifies patterns of disease | Cannot distinguish cause and effect
Ecological study | Measures associations of disease | Can use prerecorded data | Describes populations rather than individuals

Analytical studies

Case-control studies

In a case-control study, individuals with a particular condition or disease (the cases) are selected and compared with individuals without the condition or disease (the controls). The case–control comparison is the frequency of previous exposures or attributes potentially relevant to the development of the condition under study. Therefore, disease status is determined before exposure status (Fig. 9.2). The development and increasing use of case-control studies, such that they are now the most common type of study published in medical journals, has largely followed their initial use in cancer epidemiology. Their relatively simple approach can identify important effects. They are particularly suitable in rare diseases or those that take a long time to develop, as in most psychiatric disorders. Many potential exposures can be investigated simultaneously (Box 9.4).

They do, however, have many problems that limit their value. Paramount amongst these is the appropriate identification and recruitment of the cases and controls. Case-control studies are population based, and participants should be (but frequently are not) representative of their respective populations. All too often, patients are recruited as a 'convenience sample' of hospital-based patients who are prepared to take part. They are then compared with a similar convenience control sample of 'willing volunteers', e.g. hospital workers. The cases and controls will, therefore, differ in a very lengthy list of attributes and exposures (wealth, intelligence, adversity, etc.). This may lead to spurious associations being observed. For example, early structural brain imaging studies in schizophrenia were particularly likely to find case-control differences
because the patients tended to be males with relatively severe illnesses, whilst controls were hospital staff who were 'supernormal'. In such instances, case-control studies may find factors that relate to the severity of a disease rather than its aetiology. These types of selection bias can be minimised by true random sampling of, for example, all patients and controls in the community, or 'nesting' a case-control study within a cohort study. Unfortunately these techniques make the study more difficult and expensive. Sampling from cases who are in contact with medical services is the most practical and common method but demands that researchers pay careful attention to potential sources of bias (Lewis & Pelosi 1990). This selection bias can be compounded by response bias if only some of the patients and controls choose to participate.

Fig. 9.2 In a case-control study disease status is established before exposure status; vice-versa in a cohort study.

Box 9.4
Case-control studies
Compare cases and controls on the relative frequencies of one or more exposures
Strengths
Relatively quick and inexpensive
Suitable for rare diseases
Can evaluate distant and multiple exposures
Limitations
Unsuitable for rare exposures
Susceptible, in particular, to selection bias
Susceptible to recall bias and reverse causality

Case-control studies also have to deal with various types of 'information bias' as the exposure is being assessed retrospectively. In particular, if subjects are asked whether or not they have been exposed, 'recall bias' commonly arises because those affected 'search after meaning' in seeking an explanation for their illness. This can apply equally to relatives or carers. Perhaps the best example of this in the psychiatric literature comes from studies of the association between life events and depression, where depressed mood may also influence the recall of particular events. For this reason, life events researchers now focus on so-called 'independent events' such as bereavement, and especially those that can be verified as preceding the outcome, to avoid the possibility of 'reverse causality' (see below). A similar bias can occur in the researchers (so-called 'observer bias') if they are not blind to whether the patient is a case or control. Some of these problems can be reduced, but not avoided entirely, by careful measure of exposure status.

One of the common methods for dealing with selection bias is to match cases and controls on one or more important characteristics such as gender. This has the advantage of improving the power of the study, but runs the risk of seriously limiting the number of suitable subjects. It can also cause 'over-matching' and thereby reduce the ability to find associations, and makes it impossible to examine the effects of the matched variables statistically. Similarly, it is usually better to measure any potential confounder and to correct for it statistically unless its effect is particularly large and can only be quantified with difficulty. Either approach, however, requires that the investigators know in advance what the confounders are or might be.

For these reasons, exposures are, in general, more reliably related to particular diseases in prospective cohort studies. Schulz & Grimes (2002a) have, however, recently suggested five guidelines for investigators planning (or appraising) case-control studies:

• explicitly define case diagnosis and eligibility criteria;
• select controls from the same population as cases, independent of the exposure of interest;
• blind the data gatherers to case or control status (or at least to the main hypothesis of the study);
• train data gatherers to elicit exposure in a similar manner from cases and controls, e.g. using memory aids to facilitate and balance recall in both; and
• address confounding factors, either in the design stage or during analysis.

Cohort studies

Cohort studies first classify subjects according to whether or not they have been exposed to a suspected risk factor, or have a particular attribute, and then follow the exposed and non-exposed for a period of time (often years) to compare the frequency of an outcome such as disease (see Fig. 9.2). Prospective follow-up obviously needs to be of sufficiently long duration, have sufficient subjects develop the condition of interest and not lose too much information from dropouts to reliably test the effect of exposure on the development of the disease. The principal advantage of cohort studies is that it is clear that the exposure predated the onset of disease, and this can be measured without any bias in relation to disease status. For this reason, it can be argued that poorly understood diseases should first be evaluated in cohort studies, and once potential confounders have been reliably identified,
case-control studies can then proceed more economically (although the reverse is usually the case). Cohort studies also have the ability to study multiple possible outcomes from a single exposure. They are, however, expensive and time-consuming, usually include relatively few cases and are unsuitable for studying rare diseases (unless the outcome is very common amongst those exposed, i.e. the attributable risk percentage is high).

It should be noted that cohort studies do not necessarily need a control group. Even in aetiological studies, it is possible to follow-up exposed subjects (e.g. on the basis of genetic risk) with a plan to conduct a nested case-control study of those who do and do not develop the outcome of interest. This will, however, be unable to examine the effect of the selected exposure other than in a 'dose–response' analysis or as interactions with other putative risk factors. More commonly, the prognosis of a disease or the adverse events associated with a particular treatment have no need of a control group. Such studies do, however, have to be sufficiently large and lengthy to be of value, and in particular, they need to consist of representative samples of patients with a particular disorder or exposed to a particular treatment. It is also preferable that they commence from first onset or first treatment, respectively, to avoid selecting potentially atypical cases.

It is possible to conduct retrospective cohort studies, where the cohort has already been exposed and has developed the disease or not. These are particularly useful if the latent period between exposure and disease is long, where researchers would be likely to lose contact with large numbers of participants in a prospective study. This approach has, for example, been adopted to study neurodevelopment in children who go on to develop psychotic disorders, using data collected from national surveys of child development and linking them to registers of hospital admission. Such studies are relatively easy to do, but obviously depend upon the existence of good quality data. This data has usually been collected for other purposes and may be incomplete or deficient in information on potential confounders.

Cohort studies are mainly liable to two types of bias. First, participants who drop out of the study or move away and become uncontactable are likely to differ from those who do not, often because they have developed the condition of interest. As a general rule, follow-up rates of less than 80% are liable to deliver unreliable results. Second, the assignment of disease or outcome status can be subject to 'observer bias' unless such outcomes are rigorously defined and outcome assessors are blind to exposure status. In psychiatric studies, this demands at least a structured psychiatric interview. Cohort studies are less liable to the other main sources of bias because of the prospective collection of data (Box 9.5).

Box 9.5
Summary of the essential features of cohort studies
Typically, those exposed or not are identified and then followed-up to compare frequency of subsequent disease
Strengths
Ideal for rare exposures
Can examine many outcomes of a single exposure
Generally less liable to bias than other observational studies, particularly recall bias and reverse causality
Limitations
Can be expensive and lengthy
Usually unsuitable for rare diseases
Losses to follow-up can affect validity

Experimental studies

A clinical trial is one kind of experiment. It is any type of study designed to establish the effects of a particular therapeutic intervention. This includes uncontrolled trials, phase I (development) and phase IV (surveillance) studies in evaluating new treatments – see Pocock (1983) for more details. Only (randomised) controlled trials are truly experimental. The randomised control trial (RCT) is the gold standard of clinical medical therapeutic experimentation, but there are an increasing number of variants which we will describe here (Table 9.2) once we have considered some general issues (Everitt & Wessely 2003).

General issues

What is the aim of the study?

The question addressed needs to be clearly formulated and clinically relevant (i.e. the answer would help clinicians and their patients to make therapeutic decisions), ethically acceptable and not have been satisfactorily answered already (e.g. in a systematic review or meta-analysis).

Which patients are to be studied?

Are they representative of the population being studied? For example, many trials in psychiatry have excluded patients with comorbid substance abuse with the result that the study results may be inapplicable to many of the patients that psychiatrists see in practice. Similarly excluding involuntarily detained patients has the effect that psychiatrists treating the most severely ill patients are often doing so in the absence of any reliable evidence.

Which interventions and how?

Any kind of therapeutic intervention can be tested in a trial: this includes psychotherapy and systems of delivery of care as well as the more familiar drug trials. Whatever the intervention it is essential to be clear about the nature of the treatment being tested. For example in a drug trial, one needs to consider the dose, type, formulation and route of administration of the medication. An important choice has to be made between allowing only fixed doses and allowing clinicians in the trial to tailor the dose to a particular patient. Fixed-dose trials generally give results that are easier to interpret, but they are obviously different from how a drug is used in clinical
practice, and many patients will receive non-optimal doses. Flexible dosing is more clinically representative, but difficult if not impossible to achieve without unblinding the responsible clinicians. One must also decide what, if any, other drugs are permissible for participating patients. For example, antipsychotic trials usually permit the concomitant prescription of one or more anticholinergic drugs and 'rescue' medication for behavioural disturbance.

Table 9.2 Summary of the features of clinical trials

Type of study | Essential features | Advantages | Disadvantages
Uncontrolled 'trials' | All subjects are given one treatment | Cheap and easy | No controls
Controlled trials | Two treatments are compared | Relatively straightforward | No randomisation
Randomised control trials | Random allocation of treatment | Randomisation reduces selection bias and confounding | Expensive and time-consuming
Cluster trials | Groups of individuals are randomised | Can assess the efficacy of certain health services | Often difficult to find enough clusters to give adequate power
Crossover trial | Subjects are their own controls | Can study treatment of rare, chronic disorders | Historical controls; order effects; carryover effects
N-of-1 trial | A single subject | Can establish effectiveness in an individual patient | As above, only applicable to certain types of chronic diseases

How long?

Many trials in psychiatry are short, lasting 6 weeks or less. This is simply because they are easier to do. Long trials are complex, expensive and liable to suffer from large numbers of people dropping out of the trial for various reasons. Clinical practice should, however, be informed by long-term clinical trials, particularly if we advise our patients to take a treatment long term.

Is blinding possible?

Controlled trials may be open (if the doctors and patients both know what treatment is given), single-blind (if the patient is not told what treatment he or she is getting), double-blind (if the treating doctor does not know, or cannot work out, what treatment his or her patient is getting) or triple-blind (if, in addition, the outcome assessors do not know what treatment a patient has received). The purpose of single-, double- and triple-blinding is to improve the reliability of trial findings, by reducing observer bias. Blinding is, however, rarely entirely complete. Patients and doctors can quite often guess whether that patient has received an active treatment according to whether they have responded or not and whether they have side-effects. Single- and double-blinding are especially difficult, if not impossible, in non-drug trials. Even in comparative drug trials, particular side-effects can suggest that a specific drug has been administered: e.g. trials of the tricyclic antidepressants have commonly included anticholinergic compounds in the placebo, and trials which use this approach generally find less advantage of tricyclics over placebo than those which have not (Moncrieff et al 1998). Having independent outcome assessors can go some way to mitigating these problems, but raters may still guess which treatment the patient had. A useful method to quantify the success or otherwise of blinding in a trial is to simply ask patients, clinicians and/or outcome raters what treatment they think a particular patient has received – if blinding has been successful, they should score no better than chance (Even et al 2000).
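To illustrate the chance comparison described above, here is a minimal sketch (the counts are invented and the exact binomial tail is just one reasonable way of doing the comparison; this code is not taken from Even et al 2000):

```python
from math import comb

def upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Invented example: raters guess the allocation of 60 patients in a two-arm
# trial and get 38 right; how surprising is that if they were merely guessing?
n_guesses, n_correct = 60, 38
print(f"P(>= {n_correct}/{n_guesses} correct by chance) = {upper_tail(n_correct, n_guesses):.3f}")
```

A small probability here suggests the raters could tell, to some extent, which treatment patients had received, i.e. that blinding was incomplete.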

Which outcomes should be measured and how?

It is essential to choose a primary outcome. Clinical outcomes can be measured categorically or continuously. Dichotomous categorical outcomes are probably the most clinically meaningful. 'Dead or alive' is the standard and least subject to bias primary outcome, but these events are too rare for practical use in psychiatric studies. Typical but less clear alternatives are 'readmitted or not', 'recovered or not', 'relapsed or not' and 'still on treatment or not'. In many psychiatric trials, recovery or relapse have been determined using arbitrary cut-off points on symptom severity or behavioural scales. For example, recovery from acute illness is quite often measured by a percentage reduction in symptom severity. Psychiatric research also tends to use many different rating scales (many of dubious validity, reliability etc., which is why we mention so few in this chapter), which can make it difficult to compare the outcomes of trials. For example, the first 2000 controlled trials in schizophrenia used 640 different rating scales (Thornley & Adams 1998).

Controlled trials

Uncontrolled trials are a means of establishing whether a treatment works at all and what sort of adverse effects are prominent, but an unreliable one. The absence of a control group means that some, or even all, of these apparent effects could be attributable to many different causes (the type of patient, severity of disease, a true treatment effect, placebo effect, etc.). If any treatment is given or taken with enough
enthusiasm some people are likely to benefit. Controlled trials are obviously necessary if a particular treatment is to be evaluated against placebo or against a pre-existing treatment, but even controlled studies tend to overestimate therapeutic benefits. For example, there were numerous reports of apparent benefits from renal dialysis in schizophrenia until a series of randomised control trials found no beneficial effects (Carpenter et al 1983). The main potential problem with non-randomised controlled trials is that patients getting the new treatment tend to be selected (consciously or not) to have a slightly less severe illness and/or better prognosis, so that the beneficial effects of the new drug or procedure are typically overestimated by 30% (Schulz et al 1995; Juni et al 2001). This was the main reason for the 'invention' of the randomised control trial.

Randomised controlled trials

Randomisation has two main purposes. First and foremost, it evenly distributes both the known and, more importantly, the unknown confounders (e.g. age, sex, prognostic factors) of any observed therapeutic effect between the treatment groups. Second, it avoids the potential selection biases described above. This benefit depends on allocation concealment, however. To be properly randomised, a trial should use a randomisation schedule that is not predictable. For example, if patient treatment is allocated by date of birth, it is possible for investigators to bias the random allocation process by finding reasons to include or exclude particular patients from one or other treatment as they will know in advance which the patient will be allocated to.

Effective randomisation is important. Chalmers et al (1983) reported that 24% of studies with randomisation that allowed 'easy cheating' found a significant effect, whereas a significant effect was found in only 9% of studies in which it was hard to subvert randomisation. The best way to prevent interference with random allocation is to use an independent computer-based system. Once a patient has consented to participate, an investigator contacts the randomisation service (by telephone, fax or computer) and the patient's baseline data are entered. Only then will the computer allocate the patient using an algorithm and the operator informs the investigator.

Special methods of randomisation are sometimes used to ensure that trial groups are balanced in terms of numbers or patient characteristics.

• Block randomisation simply randomises patients in groups (of four, six, etc.) to ensure that numbers are equal in the two groups.
• Stratified randomisation allocates patients on the basis of prognostic variables to ensure that these are evenly distributed, but requires an additional schedule for each stratum.
• An alternative is minimisation (or adaptive) randomisation in which each patient is allocated to a particular group by minimising any differences in important variables as each particular patient is entered into the trial (Pocock 1983).
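As a minimal sketch of the first of these methods, block randomisation can be generated in a few lines (the block size, arm labels and use of Python's standard random module are illustrative choices, not a recommendation for real trials, where an independent and concealed service should hold the schedule):

```python
import random

def block_randomise(n_blocks, block_size=4, arms=("A", "B")):
    """Permuted-block allocation: each block contains equal numbers of each
    arm, so the group sizes can never drift far apart during recruitment."""
    schedule = []
    per_arm = block_size // len(arms)
    for _ in range(n_blocks):
        block = list(arms) * per_arm
        random.shuffle(block)
        schedule.extend(block)
    return schedule

# Illustrative use: five blocks of four give a schedule for 20 patients.
print(block_randomise(n_blocks=5))
```

Stratified randomisation amounts to running a separate schedule like this within each stratum; minimisation, by contrast, needs the running totals of the important variables each time a patient is entered.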
The problem with all these procedures is that they can potentially increase the chances of revealing the allocation (Schulz & Grimes 2002b). They are, however, essential in small trials. In large trials, of say more than 500 patients, properly conducted simple randomisation will usually, but not always, generate sufficiently even groups by chance alone.

Cluster randomised trials

One other important variant of randomisation is to randomise subjects in groups or clusters rather than as individuals. These so-called 'cluster randomised trials' are most commonly used in evaluating more global aspects of health services than one particular treatment or where allocation of individual subjects is not practicable (Gilbody & Whitty 2002). For example, educating general practitioners about the detection and treatment of depression is ideally suited to cluster randomisation, as general practitioners tend to work in group practices and it would be difficult in practice to educate one partner without that education being communicated to others in the practice. The main disadvantage of cluster randomisation is that the unit of randomisation is effectively the unit of analysis, so that large numbers of clusters are required to give the trial adequate power.

Explanatory and pragmatic approaches to trial design

The question addressed by a trial might be primarily to see if a treatment can work and study the mechanism of therapeutic action or it may be whether it does work in actual practice. The former approach is referred to as an explanatory or efficacy trial and the latter as a pragmatic or effectiveness trial. An explanatory trial is typically a small trial, with highly selective entry criteria, which focuses on one specific issue and often uses intermediate or surrogate outcome measures. A 'pragmatic' trial is usually larger, with broad entry criteria, in which the interventions can feasibly be provided in routine healthcare and the outcome is a major clinical event (Hotopf et al 1999). In practice there are not really two distinct types of trial but rather a spectrum of approach according to the question to be addressed and what is practical (McMahon 2002).

One type of pragmatic trial is the patient preference trial. In this design, patients who are not willing to be randomised, because they have a preference for a particular treatment, are not randomised but are given that treatment and are followed-up as in the trial. The idea is that the results in these patients (who would otherwise not have participated) can be compared with those who did agree to be randomised; see Bedi et al (2000) for an example. The problem with this method is that although recruitment may be increased, and generalisability possibly enhanced, the results are likely to be more biased and less reliable than if all eligible patients had been randomised. Another type of pragmatic trial is to only randomise patients for whom the clinician is uncertain about the best treatment. If a particular patient's response to a
particular drug is already known, then it makes little sense to run the risk of randomising them to a known ineffective treatment. The main advantage of this is that the trial addresses a definite area of clinical uncertainty; the potential disadvantage is that the generalisability of the findings is reduced.

Crossover trials

These are trials in which all participants receive two or more interventions, one after the other, with the order randomly allocated (Louis et al 1984). Crossover trials are an option in relatively rare diseases where the numbers of available participants may not permit a randomised parallel-group controlled trial as two sets of comparison can be made. There are, however, also disadvantages of crossover trials: it is difficult to ensure that the trial is long enough to ensure therapeutic effects are manifest in a reasonable number of participants, but short-lived enough to avoid contaminating the subsequent crossover phase when participants are given the alternative treatment. While for some trials it may be possible to include a no-treatment 'washout period' this introduces new difficulties with sudden cessation of potentially effective treatments and in ensuring the washout period is long enough to be effective, but short enough to be ethical. The one type of crossover trial which is probably underutilised is the N-of-1 trial in which a particular patient is given two treatments blind. This has some potential in individual patients in which it is simply not known which treatment they should receive, but obviously requires a consenting patient and cooperation from the local pharmacy.

Specific statistical issues in trials

Multiple hypothesis testing

A power calculation is necessary before any trial to determine how many participants will be required to detect a specified difference between the treatment groups at a given level of statistical significance. One therefore needs to define the primary outcome measure and to choose a minimal important difference to detect. This is defined in terms of proportions for a dichotomous outcome or mean difference for a continuous measure (with estimated standard deviation).
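As a worked sketch of such a calculation for a dichotomous primary outcome (the usual normal-approximation formula for comparing two proportions; the response rates, significance level and power below are arbitrary illustrations, not recommendations):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate number of patients per arm needed to detect a difference
    between proportions p1 and p2 with two-sided alpha and the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Illustrative example: to detect 50% versus 35% recovery with 80% power
# at a two-sided 5% significance level.
print(n_per_group(0.50, 0.35))  # about 167 patients per arm
```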
Secondary outcomes must also be considered. These might include measures of adverse effect frequency, levels of functioning or quality-of-life measures, as 'surrogate outcome measures'. Multiple outcome measures may be thought of as giving better coverage of therapeutic response, but they also increase the chances of finding statistically significant differences by chance alone because a number of statistical tests were done. Multiple hypothesis testing is acceptable if these were pre-specified hypotheses, but some sort of statistical correction for multiple testing should usually be made. Exploratory testing, sometimes known as 'data dredging', is unacceptable unless clearly acknowledged. Such analyses are and should be seen as hypothesis-generating for further investigations (Pocock et al 2002). Perhaps the best example of this problem comes from the ISIS-2 study, which found among other things that aspirin reduced mortality in acute myocardial infarction; an absurd (but educational) subgroup analysis by astrological sign showed a particularly strong effect for those born under Capricorn, and no effect in Librans and Geminians (Peto et al 1993).

Intention-to-treat analysis

This refers to the approach of including all the participants randomised in the trial, regardless of whether they completed the trial in the analysis (McMahon 2002). It is distinguished from analysis on only those who completed the trial; so-called per protocol or 'completer' analysis. Intention to treat analysis is essential in pragmatic trials. However, this means that some patients included in the analysis may have missing outcome data. One approach to estimating this missing data is to use the 'last observation carried forward' (LOCF), i.e. the last available measure for a particular subject is used as their final measure. This practice is, however, dubious for both statistical and clinical reasons. As most patients who drop out from a trial do so either because the drug has been ineffective and/or side-effects of the treatment cannot be tolerated, even allowing for those who drop out for other reasons, they should be regarded as treatment failures. Taking the LOCF, therefore, may unpredictably over- or underestimate the benefits of the treatment depending on the treatment and condition evaluated. This may be trivial if dropouts are relatively few, but for example, most of the new 'atypical' antipsychotics have dropout rates of more than 30% in 6–12 weeks, and for one or two of the drugs the dropout rate is more than 50%.
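A minimal sketch of what LOCF actually does to a single subject's ratings (the scores are invented; the point is mechanical, not a recommendation):

```python
def locf(scores):
    """Replace missing (None) visits with the last observed value."""
    filled, last = [], None
    for value in scores:
        last = value if value is not None else last
        filled.append(last)
    return filled

# Invented example: symptom ratings at weeks 0-6 for a subject who drops out after week 3.
weekly_scores = [32, 28, 25, 24, None, None, None]
print(locf(weekly_scores))  # [32, 28, 25, 24, 24, 24, 24]
```

The week 3 score of 24 then stands in for the subject's 'final' outcome, whether or not they would in fact have deteriorated after dropping out, which is exactly why the practice is dubious.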
Systematic reviews and meta-analysis

Any one study – no matter how large or well conducted – needs independent replication. If, as is often the case, study results vary, systematic reviews and meta-analyses can help deliver consensus. These are examples of 'secondary research' in that they synthesise the available relevant evidence in a given research area. They are generally seen as increasingly important because, provided they are done well, they provide the single most reliable piece of evidence on a particular topic. Systematic reviews identify and cite studies in a pre-specified and reliable way; while meta-analyses provide an overall numerical effect across several studies. Done properly, systematic reviews are important for two main reasons. First, as alluded to above, any individual study, no matter how rigorously conducted, analysed and interpreted, can only be an estimate of the true underlying effect and needs to be independently replicated; but subsequent studies often apparently disagree with the original for a wide variety of reasons. Second, the amount of literature in many areas is now so vast that doctors need access to high-quality review articles which reliably and conveniently provide an up-to-date summary. In addition, systematic reviews and meta-analyses are a crucial first step for planning future research and providing the evidence-base for valid clinical guidelines.

Raising awareness about the importance of these methods, and stimulating researchers to refine the techniques, has arguably been the greatest achievement of the evidence-based medicine 'movement' thus far. Traditional 'narrative' reviews, in journals or books, are generally based on a selective citation and/or reading of the literature, and books in particular tend to be out-of-date. A review is a study of studies and just as prone to selection bias. For example, thrombolysis for acute myocardial infarction could have been known to be beneficial about 10 years before textbooks were saying so if someone had done a systematic review and meta-analysis (Antman et al 1992). There is no such dramatic example in psychiatry, but there are numerous treatments which are not routinely used despite good evidence for their efficacy (e.g. education and family interventions for schizophrenia).

Systematic reviews have the following key components:

• the formulation of a specific question;
• prespecification of the types of article to be included and excluded;
• prespecification of the outcome(s) of interest and important potential confounders or effect modifiers, and of any planned subgroup comparisons;
• use of several search strategies to identify articles from a number of sources (chiefly computerised databases), supplemented by searching through the references of included studies and preferably by contacting researchers for any published or unpublished studies of which they are aware; this is sometimes augmented by a hand search of key journals and/or by searching the 'grey literature' in, for example, textbooks and theses; and
• identifying the relevant articles and extracting the relevant data.

Properly conducted systematic reviews are, therefore, time-consuming, protocol-driven studies in their own right; studies of publications rather than patients. The key feature is that such a review be repeatable, i.e. given your methods other researchers would identify the same articles and draw the same conclusions. Reliability is augmented if two or more researchers independently examine the (electronic) searches to identify papers for possible inclusion and search through such papers for data to be extracted. Oxman et al (1994) helpfully devised eight questions for assessing the quality of research reviews (Box 9.6). The Department of Health (2001) psychotherapy guidelines, for example, included only reviews that met at least six of these eight criteria.

Box 9.6
Guidelines for assessing research reviews
Were the questions and methods clearly stated?
Were comprehensive search methods used to locate relevant studies?
Were explicit methods used to determine which articles to include in the review?
Was the validity of the primary studies assessed?
Was the assessment of the primary studies reproducible and free from bias?
Was variation in the findings of the relevant studies analysed?
Were the findings of the primary studies combined appropriately?
Were the reviewers' conclusions supported by the data cited?

Contacting other researchers for any unpublished studies is desirable because 'publication bias' is a large threat to the validity of a systematic review. Publication bias is the tendency for researchers to be more likely to write up, and (especially leading) journals to publish positive rather than negative studies. As such, if one only searches for or finds published studies it is possible that the summary estimate will be biased – although this can be checked statistically (see Gilbody & Song 2000). Publication bias is more likely if one searches only one database (e.g. Medline is biased towards North American studies, whilst BIDS and Psychlit are more biased towards European studies). This is also the case if one extracts only English language articles ('language bias') as there are empirical examples of European researchers reporting negative results in their native languages and positive results in English (although the reverse has also been described).

A tendency of researchers to multiple publication can have the same effect. It is alarming, but nevertheless true, that some researchers seem to adopt the strategy of conducting essentially one study and then serially publishing it over many years and sometimes even decades. The most dramatic example of this we are aware of is that one antipsychotic drug trial has been published in one way or another in almost 100 separate publications (Gilbody & Song 2000). Indeed, we suggest that if systematic reviewers do not find examples of multiple publication bias, their search probably has not been detailed enough. Publication bias may become less of a problem than it used to be, particularly for randomised control trials, as there are now a number of registers of completed and ongoing clinical trials (e.g. in the Cochrane Library), but it remains a particular problem for reviewers of observational research (Egger et al 2001). There are many other potential sources of bias in systematic reviews, and these must be very carefully addressed to minimise bias in estimating the overall effects (Egger et al 2001).

Meta-analyses mathematically combine the results of different studies to give one summary estimate of a given effect. Different studies may find apparently incompatible results for several reasons, including inadequate power, measurement error and different liabilities to bias and confounding. Simple 'vote-counting' summary techniques, comparing the number of positive studies to the number of 'negative' studies, are unreliable as they do not include an assessment of study size or quality and themselves have low power. Large studies tend generally to be of higher quality and provide more precise estimates of any given effect than smaller studies and should have relatively greater influence on a summary effect. This is achieved by 'weighting' studies according to their size and/or quality. The choice of techniques to perform and interpret meta-analyses is wide, and anyone undertaking a meta-analysis should have expert statistical advice and support. Under some circumstances, if there is substantial heterogeneity between studies, which cannot be explained, it may not be appropriate to generate an overall estimate of effect at all. In other cases, for example where the review combines a number of studies with poor methodology (e.g. trials with poor allocation concealment) and yields an apparent benefit of 20–30%, seemingly clinically worthwhile effects can merely be due to bias.
In other words, the treatment may have no effect and the apparent effect could simply be due to poor methodology.
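The weighting described above is most commonly done by inverse-variance weighting, sketched here for a fixed-effect summary (the effect sizes and standard errors are invented, and a real meta-analysis would also examine heterogeneity and consider a random-effects model):

```python
from math import sqrt

def pooled_effect(effects, standard_errors):
    """Fixed-effect meta-analysis: weight each study by 1 / SE^2."""
    weights = [1 / se**2 for se in standard_errors]
    summary = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    summary_se = sqrt(1 / sum(weights))
    return summary, summary_se

# Invented example: three trials reporting a standardised mean difference.
effects = [0.45, 0.20, 0.30]
standard_errors = [0.20, 0.10, 0.15]
estimate, se = pooled_effect(effects, standard_errors)
print(f"Pooled effect = {estimate:.2f} (SE {se:.2f})")
```

The middle study, with the smallest standard error, dominates the summary, which is the sense in which larger and more precise studies carry more weight.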
While no one can seriously challenge the desirability of systematic reviews, there remain many critics of meta-analysis. Meta-analysis certainly has its limitations, but not all of the most frequent criticisms are justified. Some 'authorities' state that meta-analysis is akin to 'combining apples and oranges' with the only result being a 'fruit salad'. Notwithstanding the fact that apples and oranges share many important similarities, there are now specific techniques which can and should be employed in any meta-analysis to assess whether studies can be meaningfully combined (tests for 'heterogeneity'). A more reasonable criticism, of 'rubbish in, rubbish out', is patently true but is not a reason for not doing meta-analyses. The result one gets from combining studies will depend on the quality of the originals, but it is far better to objectively assess study quality than to resort to the alternative of 'experts' citing the strengths of favoured studies, which are often their own, and denigrating others. Indeed, there are a number of reliable ways of measuring study quality, and comparing the effects in high- and low-quality studies is an important part of a meta-analysis. It may, however, be best to focus on specific indices of quality (e.g. allocation concealment) rather than on generic quality scores (Juni et al 2001). This may even be the most important part of an observational meta-analysis in that clues to why studies find particular results can inform future studies of that issue.

More pertinent criticisms of meta-analyses include that they are unreliable if the studies they include cover few outcome events and if there were only a small number of small studies. Importantly, since meta-analysis was devised to deal with the results from randomised control trials, there are still no ways of satisfactorily combining the results of randomised and non-randomised clinical trials, and the liabilities to bias in observational research require great caution in interpreting a meta-analysis of observational studies. A particular problem occasionally arises where a meta-analysis disagrees with the results from a large RCT or when two or more meta-analyses of the same issue find opposing results. Nonetheless, the advantages of meta-analysis generally outweigh the problems. There are now standardised methods of reporting the results of observational and experimental meta-analyses (Moher et al 1999; Stroup et al 2000) which are complementary to similar articles on the reporting of individual randomised control trials and serve as excellent introductions for potential reviewers. Particularly interested readers are referred to an excellent book (Egger et al 2001) that discusses all the foregoing in far greater detail.

Causal inference

Factors to consider as possible explanations for research findings

Any association found in medical research could theoretically be attributable to one or more of the following explanations: chance, reverse causality, bias and/or confounding.

Chance

Chance or 'random error' is related to issues of study power and statistical significance. Suffice to say here that if a study is insufficiently powered, whether the result is positive or 'negative', chance remains a likely explanation. Further, studies which do not find a statistically significant association are, in fact, neutral rather than negative studies unless they have been specifically powered to be able to show 'non-equivalence', and this generally requires several hundreds of subjects in each group. In other words, a lack of evidence of an effect is not the same as evidence of a lack of effect.

Reverse causality

Reverse causality simply refers to the situation where an effect (e.g. disease) actually leads to the apparent cause (e.g. an exposure) rather than vice versa. This is particularly likely in case-control and descriptive studies. For example, the original descriptions of an association between schizophrenia and low social class have subsequently been attributed to social downward drift after the onset of the disorder (although there are now suggestions that living in a city or overcrowding may also have an aetiological role). Similarly, non-independent life events can often follow depression rather than precede it.

Bias

Bias can be defined as 'any process at any stage of inference which tends to produce results or conclusions that differ systematically from the truth' (Sackett 1979). There are numerous subtypes of bias that can arise at any stage of research, and Sackett mentions 35, but there are two broad categories: selection (or recruitment) bias and information (measurement or observation) bias. These can be usefully subdivided according to whether they are primarily introduced by researchers or subjects themselves (Box 9.7).

Box 9.7
The main types of bias in medical research
Selection bias
Sampling: non-representative identification/recruitment of subjects
Response: unrepresentative participation by subjects
Information bias
Interviewer: differential data recording in subject groups by researchers
Recall: historical data is selectively filtered by subjects

Selection bias stems from an absence of comparability between the groups being studied. This tends to arise because researchers recruit from an unrepresentative population of subjects, e.g. hospital inpatients (sampling bias). The only way to avoid this is to randomly sample from population registers, but such attempts can always fail because of non-representative participation by subjects (response bias).
Random samples are those in which every individual in a population has an equal chance of being included or excluded in a study, but non-response is far from random. As a rule, people are more likely to participate in research if they perceive a likely benefit for themselves or people like them, but the determinants of research participation have been poorly studied. These issues are particularly important in case-control studies as they do not have the advantage that surveys and cohort studies do of being able to identify a representative sample of controls from among large numbers of non-cases. Randomised control trials use the same patients as potential subjects and controls prior to randomisation. At least in such cases, it is possible to compare those who are identified and participate with those who do not in terms of possibly important confounders. An alternative to attempt random sampling in case-control studies is to use the 'snowball' technique to recruit subjects (ask those who do participate to nominate others they know of with the design characteristics who might also take part). Similarly, it is sometimes appropriate to recruit as controls genetic or non-genetic relatives or acquaintances of those who are affected.

Important subtypes of sampling bias include:

• admission (Berkson) bias – where hospitalisation rates differ for particular exposure/disease groups such that the relationship between exposure and disease is distorted in hospitalised patients;
• referral filter bias – when patient referrals from primary to secondary care increase the concentration of rare exposures and severe diseases;
• diagnostic purity bias – where the exclusion of comorbidity results in a non-representative sample;
• membership bias – where group affiliation is used to identify subjects, e.g. members of a patients' organisation, hospital staff controls;
• historical control bias – where secular changes in disease definition, exposures, treatments, etc. render such controls incomparable; and
• ascertainment bias – where two groups of subjects are recruited in different ways and differ because of this.

Important subtypes of response bias include:

• non-respondent bias – as non-respondents are often those who are most ill – and the reverse situation of volunteer bias;
• unacceptable disease bias; and
• missing data bias – e.g. on sensitive questions about sexuality, etc.

Information (measurement or observation) bias arises through the systematic misclassification of disease or exposure, or both, by researchers and the instruments they use or subjects themselves. If people know they are being observed, they tend to normalise their behaviour and minimise any perceived deviation from the norm ('attention bias'). Examples of this include the so-called 'Hawthorne effect'. Similar principles may at least partly underlie the often remarkably beneficial effects of simply asking patients, e.g. with eating disorders, to monitor certain aspects of behaviour.

Interviewer bias arises when researchers are not blind to exposure or disease status and tend to alter their approach, unconsciously or not. If the exposure is measured after disease, 'exposure suspicion bias' can influence both the intensity and the outcome of a search for exposures in affected subjects. Conversely, 'diagnostic suspicion bias' can arise if researchers strive harder to detect disease in those known to have been exposed. Generally, all observers, researchers and patients, tend to make observations that concur with their expectations ('expectation bias').

These problems can be reduced by using self-administered questionnaires and computerised assessments, but these are probably not suitable for illiterate and psychotic individuals. Highly structured interviews, searching for unambiguous or relatively 'hard' information and blinding researchers to subject group membership are alternatives. Lay interviewers may be preferable to medical interviewers as they are more likely to follow instructions and less likely to use their own 'judgement', but this is obviously not possible if the information that is sought requires medical training to elicit.

Recall bias is the equivalent of interviewer bias, but introduced by subjects rather than researchers. Subjects generally tend to alter their responses in the direction they perceive is desired by the investigator ('obsequiousness bias'). Patients and their relatives are likely to 'search after meaning' for possible exposures to explain their disease (also called 'rumination bias'). A good psychiatric example is the maternal recall of obstetric complications in their schizophrenic children – although this can potentially be cross-checked with obstetric records. Conversely, if subjects know they have been exposed, they may be more likely to report symptoms of a disease.

In theory, any bias can increase or decrease the strength of an association, but the investigators' desire to find a positive result means that such biases tend to produce more false positives than negatives. As a rule, bias is introduced by poor research techniques and can be minimised, though probably not entirely avoided, by careful consideration of these issues in study design. It is difficult to measure bias and impossible to correct for it once it has occurred in a particular study. It is, however, sometimes possible to examine potential roles of bias in systematic reviews and meta-analyses of several studies combined.

Confounding

Confounding arises when the effects of two processes are not separated. It can be defined as 'the distortion of the apparent effect of an exposure brought about by the association with other factors that can influence the outcome' (Last 1995). Put another way, confounding mixes the effects between an exposure and a disease and a third factor associated with both. This can lead to a false association (positive confounding) or obscure a true association (negative confounding). A confounder is therefore an independent risk or protective factor for a disease that varies systematically with another exposure. Further, a confounder is in a triangular relationship with the other variables, rather than on the causal pathway (Fig. 9.3). For example, there is increasing evidence that being born and raised in cities is a risk factor for schizophrenia.
[Fig. 9.3 Schematic diagrams of (A) confounding – RF2 confounds the relationship between RF1 and the disease; (B) intervening/mediator variable – RF2 is the mediator between RF1 and the outcome; (C) effect modifier/moderating variable – different levels of RF2 (a … n) modify the effect of RF1 on the likelihood or severity of the outcome; and (D) independent risk factors – RF1 and RF2 independently cause the outcome. RF, risk factor.]

For example, there is increasing evidence that being born and raised in cities is a risk factor for schizophrenia. The social drift of previous generations of people with schizophrenia could, however, account for this association, i.e. genetic liability to schizophrenia and urban drift could positively confound the apparent association between urban upbringing and the disease. Age, sex and social class are common confounders. Confounders can, however, only exert effects if they differ between study groups, and confounding can sometimes be reduced in the design stage or measured and controlled.
Controlling confounding

The common methods for reducing confounding are:
• randomisation (as in RCTs);
• restriction; and
• matching.
Restriction is done by selecting only subjects who have a particular range of values of a potential confounder, e.g. young age. Matching, on the other hand, at least in ‘individual matching’, selects case-control pairs with similar properties, e.g. age and sex. Rather confusingly, ‘frequency matching’ lies somewhere between restriction and individual matching, as the values of particular variables are ‘balanced’ overall between two groups. Matching reduces confounding and improves statistical power, but is not to be used lightly. Identifying potential subjects becomes more difficult; one cannot study the association between the disease and exposure on the matched variables (although this may be possible with balanced groups, and interactions with matched variables can be studied), and if one inadvertently matches on variables that are not confounders this will actually tend to reduce statistical power. In general, therefore, it is preferable to measure a potential confounder and control for it statistically. Matching is, however, preferable if it is difficult to accurately measure or classify confounders (e.g. genetic liability to depression).

Statistical control of confounding involves two methods. The first is a ‘stratified analysis’, where a potential confounder such as age is treated by separating subjects into age groupings, or ‘strata’. Standardised mortality rates are an example of this. The second, although related, approach is to conduct multivariate analyses using regression. As before, however, if it is difficult to measure a confounder accurately, attempted statistical control for it is likely to leave some ‘residual confounding’. This is a particular problem when so-called proxy measures are used, e.g. using paternal social class as an index of childhood environment. If, as is common, adjustment for confounding reduces the association, residual confounding should be considered. On the other hand, confounding can be over-controlled if two variables are strongly related or if an apparent confounder actually lies on the causal pathway. This is a particular potential problem in psychiatric studies, as our knowledge of the mechanisms of most psychiatric disorders is rather limited. Statistical control for confounding should, therefore, always present both uncorrected and adjusted values of associations.
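As a rough illustration of stratification – not from the original text, and using entirely hypothetical counts – the sketch below compares a crude risk ratio with stratum-specific risk ratios; when an association disappears within strata of a third variable, confounding is a likely explanation.

# Hypothetical counts: (cases, total) for exposed and unexposed subjects,
# within two strata of a potential confounder. The risk ratio is 1.0 in
# each stratum, yet the crude (unstratified) analysis suggests an association.
strata = {
    'confounder present': {'exposed': (40, 100), 'unexposed': (36, 90)},
    'confounder absent':  {'exposed': (10, 100), 'unexposed': (20, 200)},
}

def risk(cases, total):
    return cases / total

for name, s in strata.items():
    rr = risk(*s['exposed']) / risk(*s['unexposed'])
    print(name, round(rr, 2))                    # 1.0 in both strata

exposed_cases = sum(s['exposed'][0] for s in strata.values())
exposed_total = sum(s['exposed'][1] for s in strata.values())
unexposed_cases = sum(s['unexposed'][0] for s in strata.values())
unexposed_total = sum(s['unexposed'][1] for s in strata.values())
crude_rr = risk(exposed_cases, exposed_total) / risk(unexposed_cases, unexposed_total)
print('crude', round(crude_rr, 2))               # about 1.29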
The language of risk

A risk factor is ‘an attribute or exposure that is associated with an increased probability of an outcome, such as disease’ (Last 1995). The term is, however, rather loosely used and merits clarification and subdivision. Offord & Kraemer (2000) suggest that there are three different types of risk factor:
• a fixed marker – a risk factor that cannot be changed, e.g. sex;
• a variable marker – a risk factor that can be manipulated but, even so, does not change the risk of an outcome; and
• a causal risk factor – which is both manipulable and changes the probability of an outcome.
For example, poverty is a risk factor for conduct disorder, but evidence thus far suggests that increasing income does not reduce the risk; whereas improving parenting practices does appear to reduce the incidence. This refinement of terms, which may seem rather abstract, has important implications. First, risk factors are not necessarily causal. Second, risk factors may differ in different populations and at different points in the history of a disease. Greater attention to these definitions might well help to improve our understanding and management of psychiatric disorder: aetiological research and screening programmes might best focus on fixed or variable (trait) markers, while pathophysiological and therapeutic studies should arguably focus on causal risk factors and state-related changes.
Similar complexities need to be appreciated about variables that potentially intervene between risk factors and outcomes (Kraemer et al 2001). Strictly speaking, intermediate or intervening variables are ‘mediator variables’, which are caused to vary by independent variables and then cause variations in dependent variables (Last 1995). Distinguishing this from confounding clearly depends upon available biological knowledge. In contrast, ‘effect modifiers’ or ‘moderators’ vary the exposure effect across different levels of that variable. Any distinction between mediators and moderators in many research reports is commonly ambiguous (but see Fig. 9.3). Kraemer et al (2001) suggest definitions of what they call proxy risk factors and overlapping risk factors (both of which are confounders), independent risk factors, mediators and moderators (which are not).

Criteria for causality

Establishing causation in multifactorial disorders is clearly complex and difficult. Old models, such as Koch’s postulates and causes being ‘necessary and sufficient’, are inadequate. Bradford Hill indirectly proposed some criteria for deciding whether an exposure caused an outcome (Box 9.8). Of these, perhaps the most important are the establishment of a temporal sequence, demonstrating that the cause precedes the outcome, and an increase in the risk as the degree of exposure increases (a dose–response relationship). The strength of the association, i.e. the size of the odds ratio or relative risk, is another important guide. Some authorities recommend that newly identified exposures or risk factors should only be taken seriously if the odds ratio is at least 3, at least until replication. Replication with different study designs and different methods is crucial. Finally, any observed association should have biological plausibility, based on what is known about pathophysiology.

Box 9.8  Bradford Hill’s criteria for causation
Temporal sequence
Dose–response relationship
Strength of association
Consistency
Biological plausibility

Developing your own research ideas

Doing research is intellectually stimulating and rewarding, the best way of appreciating the strengths and limitations of different approaches, and can even generate valuable new knowledge. An enquiring mind, careful planning and a determination to finish are enough for many research projects. Personal and financial assistance are not essential for descriptive and case-control studies, or systematic reviews, although they do help. On the other hand, sizeable grants and research teams are generally required for cohort studies, most clinical trials and technological investigations. We shall quickly run through the main stages in designing a research project (but see e.g. Lawrie et al 2000 for further details).

1. Have an idea  This may arise from clinical observations, discussion with other staff or reading the journals. Published articles often discuss what further studies are required. Simply attempting to replicate a particular study with minor alterations to clarify one issue can be an important contribution. Turning the idea into a question may help focus the idea, by clarifying the aims and hypotheses, and suggest a research design. It can be helpful to think in terms of the ‘four part clinical question’: patient problem, exposure, outcome and control subjects or interventions. Note that one aim and hypothesis is generally preferable, as it is more likely to be thoroughly addressed than when attempting to answer two or more.

2. Review the relevant literature  Truly original ideas are very rare. Getting up to speed with the available research is often the most time-consuming process in preparing a study, but is time well spent. It will identify what needs to be done, identify the strengths and weaknesses of particular research designs, what measures are potentially relevant, etc. If there is no recent good review article then you should write one or, preferably, do a systematic review. If you are thinking about a therapeutic question, look at the Cochrane Library. If you are prepared to do a systematic review, consider a Cochrane review, as you will get considerable methodological support and training to do it.

3. Consider the advantages and disadvantages of the various research designs  You must realistically evaluate what you are able to do. What sort of patients and controls, and how many of them, do you have access to? It is wise to study the sorts of patients you are likely to see anyway in your clinical work. How much time will you have? If, for example, you have one 4-hour session a week for research, and each subject’s measures will take an hour, it will take you at least 6 months to recruit and assess 50 patients and 50 controls. Even this can only start once you have ethical approval (let alone grant funding) and have conducted a pilot study. Will you have any help? One assistant will halve the time required and will be able to ensure that any measures have ‘inter-rater reliability’ if that is required.

4. Write a research protocol (Box 9.9)  At this stage it is worth consulting an experienced researcher and/or statistician if you have not already done so. Statistical and methodological support is very important at all stages of a study. Good research supervision will help to focus your idea, clarify the design and use appropriate methods of analysis. A detailed protocol is essential for properly planning the study and will serve as a template for any ethical approval or grant applications you need to apply for.
A protocol starts with a title and introduction. This should introduce the issue you are addressing, briefly summarise the main findings and limitations of previous studies, and describe the aims of your study. If possible write a specific hypothesis to test. You may find it helpful, in writing the introduction, to ask yourself why this study needs to be done, at this time and in this way.
Box 9.9  A typical study protocol and suggestions on how to complete it
Title – Keep it brief
Introduction – Why do the study? Previous and pilot studies. Aims and/or hypotheses
Methods – Design (e.g. case-control); Subjects (who, where from and how defined), inclusion and exclusion criteria; Measures (demographics, disease descriptors, exposures and outcomes); Power calculation and statistical tests
References – Essential references only
Patient information and consent forms – In lay language

The protocol should then describe the precise methods you intend to employ, particularly how you will identify and define subjects, the measurements you plan to make and the plan for statistical analysis of your findings. Cases and controls should be as representative as possible of the entire populations from which they come. Controls are usually even more difficult to identify and recruit. Any psychiatric diagnoses must be verified, preferably by actually interviewing all the subjects – although this will take about an hour or more with most of the available structured psychiatric interviews.
Inclusion and exclusion criteria are usually based on diagnosis and demographics, such as age. Do not make inclusion criteria too restrictive or exclusions too numerous, as your study will lose representativeness and the ability to examine any differences within your patient group, and may well suffer from the ‘disappearing patient’ phenomenon (sometimes called ‘Lasagna’s law’). Inclusions are best kept to the diagnosis of interest, and exclusions limited to important potentially confounding conditions such as, for example, neurological disease.
You must then describe the measurements you plan to make. Measures should ideally be reliable, valid, objective and standard. Only directly relevant data should be collected, to minimise the amount of time it takes you and the subjects to complete the study and to ensure that you get good quality data (sometimes less is more!). Using at least some of the measures in some of the previous studies may aid comparison. Some of the more commonly used scales are described in the next section of this chapter. Basic demographics (age, sex, social class) can be obtained from simply asking the subjects, although some descriptors (e.g. intelligence) will require specific scales. Relevant disease parameters, such as duration, medication and number of admissions, can be easily and fairly reliably obtained from case notes. Symptom severity is generally worth measuring and may be the outcome of primary interest. These variables may modify any effects you discover, either in reality or as confounders. Important potential exposures and confounders need to be measured, as do outcomes. If you are interested in something for which there is no scale or questionnaire, it is probably better to devise one yourself than to forcefully adapt something else. Devised instruments should be focused, brief and clear. If you need a simple measure of severity, for example, do not be afraid to use a simple 0 to 10 scale. You will also need to give some thought to how the data will be recorded, stored, checked and analysed. This and indeed all other aspects of your study can and should be examined in a pilot study.

5. Do a pilot study  This may use up time and subjects, but is a crucial test of whether your aims, subject recruitment plans and methods are realistic, as it is better to identify problems at this stage. A pilot study may give you the best available information for a power calculation to determine exactly how many subjects you will need to examine in the study. Further, many grant-giving bodies and ethical committees now require evidence from pilot studies to ensure that proposed studies are feasible.

6. Plan the statistical analysis  This should be done before you start collecting data. The analysis should address your question. The actual tests used will depend on the types and likely distribution of your data (i.e. parametric or non-parametric statistics), whether you are simply describing subjects or comparing them (i.e. descriptive or inferential statistics), the number of groups and any planned analysis of potential confounders (e.g. by subgroups, analyses of covariance, regression) – see next section.

7. Write a patient information sheet and consent form  Writing these will help you present your research idea in the sort of lay language that is required for ethical approval submissions (which always require information and consent sheets and often place the greatest emphasis on them) and may be required for some grant-giving bodies.

8. Keep your data safe and backed up  Store your protocol, correspondence, any patient records and collected data in a locked, fireproof filing cabinet. Make regular copies or backups of your data and store these separately from your computer, preferably in a fireproof safe. All too often, computers are stolen, or floods and fire damage research offices, and data from important studies are irretrievably lost. Unfortunately, most researchers do not start to make regular backups until they have suffered some kind of major data loss.

9. Write your paper  A good protocol will provide you with the bulk of the introduction and methods sections of any papers you write.
The purpose of a paper is to communicate the essence of a research project so that it can be appraised in the context of others. An introduction should briefly summarise what is known and what needs to be found out. A detailed methods section should be written so as to allow your study to be exactly replicated (if anyone so desires). The results section should focus on answering your question and only give important positive and negative findings. The discussion should consider the results, strengths and limitations, and implications of your findings in the context of other studies. While you are waiting for the research to be published, you could always try to answer the new questions it will inevitably raise.

Statistics

The term statistics literally means ‘numerical data’, and statistics as a discipline is the science of assembling and interpreting numerical data (Bland 2000). Some form of numerical analysis is useful in most research, and all doctors should be able to understand and interpret the findings presented in medical journals that inform their clinical practice.
Almost all statistics are based on the premise that study of a sample of people with a condition allows the inference of something more general about the population from which they came. To do this, studies need to be valid (i.e. well designed and conducted) and present clear results using the appropriate tests. Statistics is concerned not merely with the results of an experiment or study, but also with its design. Badly designed studies are unlikely to yield reliable results, no matter how sophisticated the statistical analysis may be.
Most of the features determining the validity of a study have already been covered in this chapter. This section will explain the most commonly used descriptive and inferential statistics relevant to data analysis and interpretation. By descriptive statistics we mean the summary, tabulation and graphical display of numerical information. By inferential statistics, we mean the inferences about the population that are drawn from the sample.

Probability and risk

Probability and risk are interchangeable terms used to quantify uncertainty or the chance of an observation (or event) occurring. Probability always takes a value between 0, indicating that an event cannot occur, and 1, where it is a certainty. The probability of an event can be easily understood by referring to an unbiased six-sided die. If we ask ourselves the probability of throwing a 6, almost intuitively we come up with the solution 1 in 6, or 0.167. Without realising it we are undertaking the following calculation:

Probability = number of events / number of possible events

We know that each side of the die is equally likely (if the die is unbiased) and that there are six sides. Therefore the probability that any side will turn face up is 1 in 6. The probabilities of events in medicine are rarely quite as simple as this, although the principles are the same. Consider a study of the probability of readmission following an acute episode of schizophrenia. In a study of 100 people over 1 year, let us say that 40 are readmitted. What is the probability of readmission over 1 year? From the above calculation we see that it is 40 (the number of events) divided by 100 (the number of possible events). Therefore the probability or risk of readmission is 0.4 or 40%.
There are alternative ways of stating the possibility that an event will occur. Those who are familiar with betting will recognise the term odds. This statistic is simply the ratio of events of interest to non-events. To take the example of the die, the odds of throwing a 6 are 1 (the number of events) divided by 5 (the number of non-events), or 0.2. Taking the example of relapse in schizophrenia, we can see that the odds of relapse are 40 (the number of events) divided by 60 (the number of non-events), or 0.67. These examples illustrate several important points. The first is that we do not have to throw a die 100 times to calculate the probability of throwing a 6; we already have prior knowledge of its behaviour or probability distribution (i.e. it has a uniform distribution). In the second example, we do not know the probability distribution of readmission in schizophrenia, so we determine it empirically. Secondly, note that the probability and odds are quite similar in the first example of the die, whereas in the second example they are quite different. This is because the event of interest in the first example is less frequent than in the second example. For very rare events the odds provide a good approximation to the probability and are sometimes preferred because of their mathematical properties, particularly for logistic regression (for further reading see the reference list). Thirdly, while probabilities range from 0 to 1, odds range from 0 to infinity and are often plotted on a logarithmic scale.
Where there is only one group of people and we wish to give an estimate of the chance with which an event will occur, the odds or probability will usually suffice. Odds and probability can be converted into one another using the following formulae:

odds = P / (1 − P)   (probability of an event / probability of a non-event)
P = o / (1 + o)   (P, probability; o, odds)
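These conversions are easy to check by computer; the following short sketch (an illustration, not part of the original worked example) uses the readmission figures above.

def odds_from_probability(p):
    # odds = p / (1 - p)
    return p / (1 - p)

def probability_from_odds(o):
    # probability = o / (1 + o)
    return o / (1 + o)

# 40 of 100 patients readmitted: probability 0.4, odds 40/60
print(round(odds_from_probability(0.4), 2))     # 0.67
print(round(probability_from_odds(40 / 60), 2)) # 0.4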
If we want to know the probability of two independent events (i.e. the outcome of one event will not influence the probability of the other), we multiply their probabilities together. Therefore, the probability of throwing two 6s in a row is 1/6 times 1/6, i.e. 1/36. If two events are mutually exclusive, however (i.e. one event precludes the other), then the probabilities are added. For example, the probability of throwing a six or a five is 1/6 plus 1/6, which equals 2/6 or 1/3. Similarly, in a study where 40 out of 100 people relapse and 60 do not, the probability of either relapse or non-relapse is 40/100 plus 60/100, i.e. 1.
In a clinical trial we generally compare the probability or odds of one or more outcomes in the two groups. If the treatment is efficacious, the probability of a beneficial event (like recovery) will be greater in the treated (or experimental) group than it is in the control group. These probabilities, or risks, are sometimes called the experimental event rate (EER) and control event rate (CER) respectively. The experimental event rate is defined as the probability of an event in the experimental group, and the control event rate is defined as the probability of an event in the control group. These terms are applicable to both undesirable and desirable events.
Consider, for example, a trial where patients are randomised to receive either haloperidol or placebo. One hundred people are treated in the haloperidol group and 100 are treated with placebo. Fifty people in the placebo group relapse over the course of the 6-week trial and 20 relapse in the haloperidol-treated group. We can represent the data in the form of a 2 × 2 (contingency) table, as shown in Table 9.3.

Table 9.3 Contingency table showing relapse in haloperidol- and placebo-treated patients
               Relapse    Non-relapse    Total
Haloperidol       20           80         100
Placebo           50           50         100

The probability (or risk) of relapse in the haloperidol-treated group is 20/100 or 0.2. This is called the experimental event rate. Alternatively we could also say that the odds of relapse were 20/80 or 0.25.
The probability of relapse in the placebo-treated group is 50/100 or 0.5. This is called the control event rate. Alternatively, we could also say that the odds of relapse are 0.5/0.5 or 1.
These probabilities or odds can be combined to give a single measure of treatment effect. The simplest and most useful things to do are to divide the probabilities or odds in the two groups, giving the relative risk and odds ratio, respectively. Alternatively we could subtract the probabilities from each other, giving an absolute risk or risk difference.
In the above example EER = 0.2 and CER = 0.5. Dividing these two risks we get 0.2/0.5 or 0.4 (40%). This is known as the relative risk (RR).

Relative risk = EER / CER

Alternatively one could also say that the risk of relapse is reduced by 60% in the experimental group relative to the control group. This is known as the relative risk reduction (RRR).

RRR = 1 − RR, or RRR = (CER − EER) / CER

Given the odds of relapse are 0.25 in the experimental group and 1 in the control group, their ratio (the odds ratio) is 0.25 divided by 1, or 0.25.

Odds ratio = odds of the event in the experimental group / odds of the event in the control group

Another way of summarising the difference between groups is to subtract the individual estimates of the probability of outcomes from one another. It makes no mathematical sense to subtract odds from other odds, so we will concentrate on the probabilities or risks. In the example above, the risk of relapse (EER) in the experimental group was 0.2 (or 20%) and the risk of relapse in the control group (CER) was 0.5. By subtracting these figures we can also say that the risk difference is 0.3 (30%). This is called the risk difference or absolute risk reduction (ARR).

Absolute risk reduction = CER − EER

Absolute risk is a more clinically useful measure of treatment efficacy than the relative risk because the relative risk is relatively insensitive to the underlying or absolute risk in untreated individuals. If in our example the EER were 0.02 instead of 0.2, and the CER were 0.05 instead of 0.5, the relative risk would remain the same and would suggest to the naïve reader that the treatment confers a considerable treatment benefit. However, because the risk to untreated individuals is comparatively low (0.05 or 5%), the benefits of treatment in clinical practice would be less impressive. This is best illustrated by the ARR or risk difference, which in this case would be only 0.03 (3%) instead of 0.3 (30%). In other words, we would need to treat about 34 people to prevent one relapse instead of just 3.3. These figures are calculated by taking the reciprocal of the ARR to give the number needed to treat (NNT): NNT = 1/ARR. These statistics and their applications are discussed in greater detail below.
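Using the figures in Table 9.3, the treatment-effect measures described above can be reproduced in a few lines. This sketch simply restates the worked example in Python and adds nothing to it.

# Relapse data from Table 9.3 (haloperidol vs placebo)
eer = 20 / 100    # experimental event rate: relapse on haloperidol
cer = 50 / 100    # control event rate: relapse on placebo

rr = eer / cer                        # relative risk = 0.4
rrr = (cer - eer) / cer               # relative risk reduction = 0.6
arr = cer - eer                       # absolute risk reduction = 0.3
nnt = 1 / arr                         # number needed to treat, about 3.3
odds_ratio = (20 / 80) / (50 / 50)    # 0.25

print(rr, rrr, arr, round(nnt, 1), odds_ratio)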

Incidence and prevalence

Epidemiology is the study of disease in populations. In many cases epidemiologists study populations by taking what they hope is a representative sample and inferring something about the population from which it was drawn. In order to do this, two numerical concepts must be introduced: incidence and prevalence. Incidence is the number of new cases arising in a given population in a defined time period. For instance, schizophrenia has an incidence of about 2 cases per 10 000 population per year. Incidence is in fact an event rate, as are the experimental and control event rates mentioned above. Prevalence is the number of people within a population who have the disease of interest at any given time. The prevalence of schizophrenia is sometimes quoted as 0.5 per 100 (0.5%), meaning that at any time point 5 in 1000 people fulfil diagnostic criteria for schizophrenia. Prevalence can be measured at a single point in time (point prevalence) or over a given time period (e.g. 1-month prevalence). Incidence and prevalence are related to each other through duration. When a disease is chronic, prevalence will be high relative to incidence. When a disease is acute and short-lived, the incidence may be high, but because few cases persist, prevalence will remain low.
Using data on incidence, epidemiologists can investigate candidate risk factors. For example, when two populations exist, one of which is exposed to an agent (e.g. sheep dip) while the other is not, the incidence of a disease (e.g. depression) can be measured in both groups. If the incidence of depression is greater in the exposed group, this suggests that sheep dip may cause depression. It would, however, be unusual for the results of an epidemiological study to be as simple as this.
Frequently the two populations may differ on a number of other variables which may be associated with both the disease and the exposure, in which case the study is said to be confounded. Quite often, the measurement of the disease frequency itself may be biased by a number of factors. Nonetheless, subject to these considerations, the ratio of the incidence in the exposed group to the incidence in the unexposed group can be referred to as the relative risk. Similarly, the subtraction of one risk from the other is referred to as the absolute risk (or risk difference) and has similar properties to the absolute risk reduction mentioned earlier.
The size of a relative risk or odds ratio is not, however, the only determinant of the importance of a putative risk factor for the incidence or prevalence of a disease. Analogously to the EER of 0.02/0.2 example above, a rare risk factor with a relative risk of 10 will be less important at a population level than a common risk factor with a relative risk of 2 (think, for example, about family history and urban upbringing/season of birth in schizophrenia). The concept of population attributable risk quantifies this relationship: assuming causality, if an attributable risk is 10%, then removing that risk factor from the general population could avoid 10% of cases.

Descriptive statistics

It is difficult to make sense of a particular dataset without summarising its main characteristics in a meaningful way. In particular we often wish to know what constitutes a typical value and the spread or distribution of other values around that number. There are several numerical and graphical methods of summarising datasets. We will consider the numerical methods first.

Measures of central tendency or location

The average (or measure of central tendency or location) is a general term for the typical value from a distribution and can be measured in a number of ways. First of all, let us consider that we have measured the height of 10 people in the street and we have obtained the following values (cm):

100, 120, 98, 132, 80, 140, 160, 138, 122, 100

Mean

The mean is simply the sum of all the values divided by the number of values. In our example this equals:

(100 + 120 + 98 + 132 + 80 + 140 + 160 + 138 + 122 + 100) / 10 = 119 cm

Algebraically,

mean (x̄) = Σx / n

where Σx is used to denote the sum of all observations, and n is the total frequency or number of values.

[Fig. 9.4 Symmetrical distribution.]

Median

The median, or middle value, can be calculated by ranking the numbers from smallest to largest and taking the middle one. If there is an even number of values, the median is calculated by taking the arithmetic mean of the two middle values. Ranking our data:

80, 98, 100, 100, 120, 122, 132, 138, 140, 160

the median is 121.

Mode

The mode is the easiest measure of central tendency to calculate as it is simply the most common value. As all of the numbers in our dataset occur once, with the exception of 100 which occurs twice, the mode is 100.
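For readers who prefer to check such calculations by computer, the standard library reproduces this worked example; the sketch below is illustrative and not part of the original text.

import statistics

heights = [100, 120, 98, 132, 80, 140, 160, 138, 122, 100]

print(statistics.mean(heights))     # 119
print(statistics.median(heights))   # 121 (mean of the two middle ranked values)
print(statistics.mode(heights))     # 100 (the only value occurring twice)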

The relative benefits of the mean, median and mode are not apparent until one considers different datasets and the effect of extreme values on each estimate. For example, if socioeconomic status were measured in 100 individuals on an ordinal scale of 1 to 6, the mean social class would be somewhat meaningless, as the differences between each point are not equal. The median would give a much more meaningful estimate of central tendency. The mean also gives a poor estimate of central tendency when there is skew in a distribution (i.e. when there is a disproportionately large number of either small values or large values). These situations are illustrated graphically in Figs 9.4 to 9.6. In situations where the distribution is symmetrical (e.g. normal distribution, t-distribution), the mean, median and mode all take the same value and are equally good measures of central tendency. In skewed data, however, these statistics diverge.

[Fig. 9.5 Positively skewed distribution.]
[Fig. 9.6 Negatively skewed distribution. (The differences between the mean, median and mode in this figure have been exaggerated for illustrative purposes.)]

Measures of dispersion

As well as describing the central location of a dataset, a statement of the data spread or dispersion is also helpful. There are several ways of doing this, and the best method often depends on the type of data and its distribution. The simplest method of describing dispersion is simply to give the range of values from smallest to largest. This measure is called the range and can be usefully represented on a ‘box and whisker plot’ (Fig. 9.7). Another method, less influenced by extreme values, is to rank the distribution from smallest to largest value and divide the distribution into equal parts called quantiles. The most commonly used methods divide the distribution up into 4 quarters using 3 quartiles, or into 100 parts using 99 percentiles. By stating the difference from the first quartile (Q1) to the third quartile (Q3) we obtain the interquartile range (Q3–Q1) or IQR. Occasionally this range is further divided by 2 to obtain the semi-interquartile range. It is worth noting also that the second quartile and 50th percentile will always be equal to the median. This method is more robust to skewness but lacks the many useful mathematical properties of other methods.

[Fig. 9.7 Box-and-whisker plots of negatively skewed, positively skewed and normally distributed data.]

A more popular method of determining spread is to measure how much each value differs from the mean and divide it by the number of values in the distribution. If we were to sum all of the differences we would, however, arrive at a value of 0 for every data set. Therefore the sign (+ or −) of each deviation is ignored. Each deviation from the mean is then added together for every value and then divided by the number of observations. This gives the mean deviation. Another method of overcoming the problem of differences summing to 0 is to square each deviation from the mean first. This value is always positive and is sometimes referred to as the sum of squares. This figure is then divided by the total number of values minus 1 to give the mean square about the mean. This measure has the more familiar name of the variance. Algebraically,

Variance = Σ(x − x̄)² / (n − 1)

The standard deviation (SD) is simply the square root of this value. The reason for dividing the squared deviations by n − 1 rather than n is that, given that you know the mean, the number of independent ways a data set can vary is always one less than the number of values. To illustrate, if a distribution of numbers had only one value, any measure of dispersion would be meaningless. Where a distribution has two values, it is possible to calculate how much each one varies around the mean, but dividing by n (2) would give a very misleadingly small estimate of the population variance from which the sample was drawn. n − 1 is sometimes called the degrees of freedom. When a distribution is skewed, the variance or standard deviation may mislead, as they are based on the deviations around the mean. When the mean is a poor estimate of central tendency, the variance and standard deviation will also be poor estimates of dispersion.
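The same height data illustrate the variance and standard deviation; note the divisor of n − 1. This is a sketch using Python’s statistics module, not part of the original text.

import statistics

heights = [100, 120, 98, 132, 80, 140, 160, 138, 122, 100]

print(statistics.variance(heights))          # 594 (sum of squares / (n - 1))
print(round(statistics.stdev(heights), 1))   # about 24.4

# The same calculation written out in full
mean = sum(heights) / len(heights)
sum_of_squares = sum((x - mean) ** 2 for x in heights)
print(sum_of_squares / (len(heights) - 1))   # 594.0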

Table 9.4 summarises descriptive statistics appropriate for various types of data.

Table 9.4 Descriptive statistics for various types of data
                        Nominal        Ordinal (i.e. ranked)     Interval or ratio
Central tendency        Proportion     Median                    Mean
Dispersion (spread)     NA             Interquartile range       Variance or SD

Descriptive statistics are usually quoted in published papers and can sometimes show that data are not normally distributed. This is particularly important if the investigators have gone on to use parametric statistics when they should have either transformed the data (e.g. by taking the log or square root of each value) or used a distribution-free (i.e. non-parametric) method. The simplest method of evaluating a data set for skewness is to look at the mean value and the range. If the range is asymmetrical about the mean, it is likely, although not certain, that the variable is skewed.

For example, if a variable has a mean of 5 and a range of 2–40 it is likely that the distribution is positively skewed, although the range represents the difference or interval between the largest and smallest values, which are by definition somewhat atypical. An alternative method is to examine the first and third quartiles, as these will also be symmetrical about the mean or median if the distribution is normal. A final quantitative method for detecting skewness is to examine the mean and standard deviation. Where a variable is normally distributed, the mean plus or minus two standard deviations will contain 95% (approximately) of the values. It follows therefore that if the mean is less than the value of the standard deviation then the data are likely to be skewed. These methods can be useful for showing that data are skewed, although if they do not suggest skewness, one cannot assume that the data are normally distributed. Overall, perhaps the best method to detect skew is to examine the data visually (e.g. with a histogram or box-and-whisker plot; see Fig. 9.7). Other methods, such as normal probability plots, can also be very useful, although they are beyond the scope of this chapter. Interested readers should consult Altman (1991) for further information.

Inferential statistics – introduction

In the previous section we considered the various ways in which data may be described. In many circumstances, however, one wishes to know whether two or more groups of measurements are different, or to be more precise, whether the difference is likely to be true or likely to have arisen by chance.
Conventionally, when we conduct a statistical test we are testing the statement, or null hypothesis (H0), that there is no difference between two or more groups. When we conduct a test for association (correlation or regression analysis) we test for no association. Any p-value from a statistical test is simply the probability that the difference or association between variables is due to chance (i.e. a false positive). The arbitrary threshold for statistical significance is p = 0.05 (5/100 or 1 in 20), although the threshold can be set at any value, sometimes called alpha. When a statistical test is significant at p < 0.05 we reject the null hypothesis that there is no difference or association. If one finds no association or difference, then the null hypothesis cannot be rejected. Note that this is not the same as showing no association or difference (a true negative), as we may find no apparent difference when our study is not large enough or measurement is too imprecise (insufficient statistical power). This often happens when the spread of values is very large or when the effect one is trying to detect is very small. The probability that a study will not find a difference should one exist (false negative, or type II error) is sometimes called beta. 1 − β is the probability that we will find a significant difference should one exist and is sometimes referred to as the power of a study. The above concepts are demonstrated in Table 9.5.

Table 9.5 Hypothesis testing and statistical power
                                             Population: there is a difference              Population: no difference
Study finds a difference (reject H0)         Correct (true positive); power = 1 − β         Incorrect (false positive); type I error, p = α
Study finds no difference (accept H0)        Incorrect (false negative); insufficient       Correct (true negative)
                                             power; type II error, p = β

The power of a study should always be considered in advance to ensure that a study is going to be large enough to reliably address a given research question. Underpowered studies are both unethical and a waste of effort. In order to calculate the power of a study and the number of people required to reject the null hypothesis, we need to set the level of statistical significance we require (usually p = 0.05) and the likely size of the treatment effect. If one requires a very high level of statistical significance (e.g. p < 0.01) one needs to recruit a far greater number of participants. Similarly, if comparatively small effects are envisaged, a much larger number of participants will be required than if the effect is very large. The relationship between sample size, significance and statistical power can be represented in the form of a graph (Fig. 9.8). Power is given on the right-hand y-axis, and the p-values of 0.01 or 0.05 are represented by the two diagonal lines. The left-hand y-axis is labelled ‘standardised difference’ and is a measure of the size of the anticipated effect. By setting a ruler on the expected effect and running it to the other side for the required study power, one can read off the number of study participants required to detect this result at p = 0.05 or 0.01.

[Fig. 9.8 Nomogram for the calculation of sample size. (From Gore & Altman 1982. Reproduced with permission of D G Altman and Wiley-Blackwell.)]
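Sample size can also be estimated directly. The sketch below uses the common normal-approximation formula n = 2(z(1 − α/2) + z(power))²/d² per group; this is an approximation offered here for illustration rather than a method specified in the text.

import math
from scipy.stats import norm

def n_per_group(standardised_difference, alpha=0.05, power=0.8):
    # n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2, rounded up
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / standardised_difference ** 2)

print(n_per_group(0.5))               # about 63 per group at p = 0.05, 80% power
print(n_per_group(0.5, alpha=0.01))   # a considerably larger sample at p = 0.01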

There are a number of methods for calculating the standardised difference. When there are two groups of patients and the mean value and standard deviation in each group are known, the standardised difference is given by:

Standardised difference = (x̄1 − x̄2) / pooled standard deviation

The means and standard deviations of each group will not be known with total accuracy (otherwise there would usually be no point in doing the study), but they can often be estimated from existing data. Cohen’s d, Hedges’ g and Glass’s Δ are related measures of standardised difference (or effect size) and are sometimes quoted in meta-analyses. For further information regarding the standardised difference, the interested reader should consult Egger et al (2001).
In addition to the nomogram given in Fig. 9.8, there are a number of more accurate and reliable methods of determining sample size. Further details are available in Pocock (1983) and in the statistical software Epi-Info, G*Power, STATA and SAS.

One-tailed and two-tailed tests of significance

In the above examples we have referred to the probability of rejecting the null hypothesis as being alpha or the p-value. In rejecting the null hypothesis one is usually inferring that two or more groups are unequal in terms of a dependent variable (by dependent variable we mean the variable that is being compared between the groups), but that the difference might lie in either direction (e.g. group A > B or vice versa). Either difference is likely to be of some interest. For example, in a trial of olanzapine versus risperidone we would be interested to know whether olanzapine was better than risperidone, but we would be equally interested in the reverse. Occasionally, researchers have preconceptions about the direction of the difference and only test for a significant difference in one direction. The null hypothesis then becomes H0: ‘olanzapine is not better than risperidone’ instead of H0: ‘there is no difference’. In order to test whether there is a significant difference in one direction only, a one-tailed test (so called because only one tail of the distribution of differences is examined) is performed. If the investigator wishes to test for a difference in either direction, a two-tailed test should be conducted. However, one-tailed tests are almost never appropriate, as we are almost always interested in a difference in either direction (Bland & Altman 1994). The effect of using a one-tailed test is to increase (i.e. double) the chance of finding a significant result, and this can lead to spurious positive findings (type I error).

Multiple significance testing

Many published studies describe several significance tests. If many statistical tests are conducted, around one in twenty will show a significant result at the p < 0.05 level by chance alone. In an attempt to correct for the chance finding of a positive result, multiple significance tests are sometimes ‘corrected’. The Bonferroni adjustment is perhaps the most commonly used of many methods for correcting significance tests.
False positive results occur (by definition) with a frequency of around 5% when the significance level is set to p = 0.05. When we conduct one significance test and find it to be significant, we are in fact saying that ‘the probability of finding this result, or one that is more extreme, is less than one in twenty if the null hypothesis is true’. Therefore, if two independent tests are conducted simultaneously and the null hypothesis is true, the chance that at least one of them will be significant is 1 − (the probability that they are both non-significant) = 1 − (0.95)², or 0.1 (approximately). The probability of a false positive increases with each independent test and approximates to the following relationship:

Probability that one or more significance tests will be positive when the null hypothesis is true ≈ significance level (α, usually 0.05) × number of individual significance tests

There are a number of approaches to this problem. The most conservative is the Bonferroni method, which corrects for multiple significance testing by setting a higher threshold for statistical significance. The result of the correction is that the significance level for each individual test (p corrected), when multiplied by the number of individual tests, will be 0.05. By rearranging the above formula, we can see that if we conduct n independent tests and we want an overall significance level of 0.05, then the significance level required for individual findings will need to be (approximately):

p corrected = 0.05 / number of significance tests

Therefore, if we test two hypotheses, the level of statistical significance required for each comparison will be approximately 0.025.
Criticisms have been made of the Bonferroni method for several reasons (Perneger 1998). First, multiple hypothesis testing in a study is frequently done on variables which are not really independent. For example, in a trial of chlorpromazine versus placebo which measured clinicians’ global impression and patients’ scores on the PANSS, it would be unreasonable to consider these two significance tests as independent from one another. Therefore, a Bonferroni adjustment may be too stringent. Secondly, the Bonferroni adjustment is concerned with the null hypothesis that all null hypotheses are true simultaneously, a situation which is rarely of interest to researchers. Finally, using the Bonferroni adjustment to reduce the probability of false positives (type I error) will increase the probability of a false negative (type II error).
A similar problem arises in post hoc testing, where researchers wish to look for significant findings when they did not set out to do this from the beginning (a priori). Such a post hoc analysis can often be very useful, as previously unrecognised relationships can be explored and the need to conduct a further study can be determined. Secondly, if we conduct a significance test using three groups, we can test the hypothesis that the group means are different, but the significance test itself cannot tell us where the significant difference lies. However, the more statistical tests one conducts, the greater the chance of finding a false positive result, and this must be borne in mind when conducting any analysis. There are in fact many statistical approaches to this difficulty. The Scheffé test and Tukey’s Honestly Significant Difference are commonly used, but many others are available (SPSS lists more than 20). The details of each test are too complex to cover here but should be checked before their use, as some tests make assumptions about the nature of the underlying variables.

Confidence intervals

P-values tell the investigator how likely it is that the difference found in a study is due to chance. Studies often quote very small p-values in the hope that this demonstrates the certainty of their result. However, the p-value takes no account of the precision of the estimate and the likely range of plausible values that the value might take. For example, a treatment might be better than placebo at improving scores on the Hamilton Depression Rating Scale and be significant at p = 0.01 but have a wide range of possible effect sizes, some of which may not be clinically significant.
Confidence intervals are the range of plausible values that a variable may take in the ‘real world’ or population as a whole. For example, if a clinical trial showed that chlorpromazine was more effective than placebo for preventing relapse with a relative risk of 2.3 (with a 95% confidence interval of 1.2 to 3.5), we could be 95% certain that the true treatment effect lay between 1.2 and 3.5. The result is also statistically significant, as the confidence interval does not overlap 1 (the point of no effect or equal risk), though the range of possible risk ratios stretches from 3.5 (potentially very clinically important) to 1.2 (somewhat less clinically impressive). The result could also be said to be somewhat imprecise. By increasing the number of people in the study we could obtain a more precise estimate of the treatment effect. Alternatively we could combine several studies together in a meta-analysis, producing a summary measure of treatment effect with improved precision (and hence a narrower confidence interval). This can be shown in the form of a forest plot (Fig. 9.9). Each square represents the study size, its midpoint represents the effect size found in the individual study, and the horizontal bar represents the 95% confidence interval around the estimate. Studies which have larger sample sizes give rise to smaller confidence intervals. None of the trials shown in the forest plot is significant in its own right, but when combined they give a more precise and statistically significant result. Further discussion of meta-analysis statistics is given later in this chapter.

[Fig. 9.9 Forest plot of several simulated trials showing summary estimate and 95% confidence intervals (overall risk ratio 0.67, 95% CI 0.51 to 0.88).]

Statistical distributions

Many variables in medicine follow a known distribution. For example, height and IQ follow an approximately normal distribution, which is more or less symmetrical around a central mean. In a normal distribution it is also possible to say that 95% of possible values are between the mean minus 1.96 × the standard deviation and the mean plus 1.96 × the standard deviation. Normally distributed variables also enable us to perform parametric statistical tests, which have greater power than their non-parametric equivalents. Even if raw data in one study are not normally distributed, the sample means in several studies generally are (this is sometimes called the central limit theorem). For example, if we drew repeated samples of 40 people for a clinical study, the point estimates of each variable obtained from each sample would have a normal distribution with a mean equal to the true treatment effect in the population. The spread of the sample mean in this special case is also known as the standard error. The standard error is frequently misunderstood, as it is easy to confuse it with the standard deviation. In fact the standard error is the standard deviation of the sample mean. It may help to think of the standard error as a ‘unit of uncertainty’. If we conduct a small study we can expect the standard error to be large. As we increase the sample size the standard error will reduce, as the study provides a more precise estimate of effect. Confidence intervals are constructed from the standard error and are equal to the point estimate from the study plus or minus 1.96 × the standard error.
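As a simple sketch (with hypothetical numbers, not from the text), a 95% confidence interval can be constructed from any point estimate and its standard error as follows.

def ci_95(point_estimate, standard_error):
    # 95% CI = point estimate +/- 1.96 x standard error
    margin = 1.96 * standard_error
    return point_estimate - margin, point_estimate + margin

# e.g. a mean difference of 4.0 rating-scale points with a standard error of 1.5
print(ci_95(4.0, 1.5))   # approximately (1.06, 6.94)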
Sometimes other distributions are useful in medicine and psychiatry. Where events occur with a certain frequency over time or space (e.g. telephone calls to a switchboard or radioactive emissions) the variables frequently follow a Poisson distribution. When the result of a study is a proportion, the variables usually follow a binomial distribution. When a continuous variable is measured and its difference estimated between two groups (e.g. brain volume in schizophrenia vs controls) the mean difference follows a t-distribution. In most cases, however, as the sample size increases, the binomial, Poisson and t-distributions approximate to the normal distribution.

Parametric and non-parametric statistics

When we complete a study we usually want to do one of two things:
• test the hypothesis that some measurement is different between two or more groups of people; and
• test the hypothesis that there is a relationship between two or more variables.

In order to test these hypotheses we have to re-frame each between the groups. In order to do this, we multiply the row
question as a null hypothesis (H0: there is no difference or
association), decide which p-value would lead us to reject this total by the column total for each cell and divide it by the total
hypothesis and then conduct the appropriate statistical test.
sample size, although in this case it is intuitive that the
The correct choice of statistical test will depend on certain
parametric assumptions: expected values will each be 30. Once we have the observed

• continuous data or at least interval discrete data; and expected values from each cell, we can calculate the value
• each unit of data collection should be independent of any
of 2 using the formula:
other;
• data should be normally distributed; and w
• the variance of each group should be approximately equal.
2 test
Where data meet these criteria we can use relatively powerful
parametric statistical tests which use all of the data values, but x
where these criteria are not met we may need to use non-
parametric statistics (distribution free tests) which require the • Is distribution free
conversion of raw values into ranks before analysis (Siegel 1988).
• Compares expected with observed frequencies
Parametric statistics are relatively robust to minor departures
from a normal distribution, contrary to the common mispercep- • Should be modified when cell values are small
tion that normality is the greatest underlying requirement.
Where data are not normally distributed, it is often possible to • The value of 2 and the degrees of freedom should always be
transform them to normal distribution by taking the log or squar-
ing each value. As a large number of transformations are possible, w
the interested reader should consult the suggestions for further
reading at the end of this chapter for further information. stated

Testing for differences w12df ¼ X ðO À EÞ2 ¼ 1:2 ðp ¼ 0:27Þ
E

Because the probability of falsely rejecting the null hypothesis

is 0.27, we cannot assume a difference in the population from

which the sample was drawn. In practice w2, and the resulting

p-value, are almost always calculated by computer.

Further refinements to the 2 test need to be made when

w

the cell values are very small. When any expected frequency

falls below 5, Yates’s continuity correction or Fisher’s exact

test should be used. Usually these tests produce more conser-

vative results and are less likely to lead to a false rejection of

Chi-squared (w2) test the null hypothesis when cell values are small. The 2 test

w

(and its modifications) should be used where you want to test

The 2 test is one of the most important statistical tests. It is for a difference in the proportions between two or more groups.

w The values must take the form of absolute frequencies or counts

one of the most commonly quoted tests in published papers, and not percentages, which falsely inflate sample sizes.

and other more complicated analyses can be derived from it.

The w2 test is a non-parametric test which is most com-

monly used to test whether the proportion of people with or The t-test

without a certain characteristic differs between two or more The t-test is one of the most common statistics quoted in
medical research. There are two common uses of the t distribu-
independent groups. For example, the proportion of people tion. The two-sample t-test is used to examine differences
in the means between two populations, provided there are
improving on a certain drug, or the numbers of people of male independent samples from each, when the data are continuous
or at least interval and approximately normally distributed
gender in a group of healthy controls compared with patients with equal variances. Data from the same group measured on
two separate occasions or from two different groups, where
with schizophrenia. When conducting a 2 analysis, it can each individual member of one group is matched on key char-
acteristics with an individual member of the other group,
w should be analysed by considering differences in scores
between occasions, or between members of each pair. In such
be helpful to represent the data in the form of a 2 Â 2 table. cases the paired t-test is applicable. You can also use a one-
sample t-test to compare a sample mean with a known pop-
Consider the data, shown in Table 9.6, from a study compar- ulation mean; for example, you could compare the mean age
of patients with dementia with the mean age of all elderly
ing the sexes of people with schizophrenia and healthy con- patients in a general hospital.

trols. The proportion of male subjects in both groups differs If your sample size is large, you can safely use the t-test
even if some of the underlying assumptions are violated.
slightly; the null hypothesis is ‘that there is no difference in The t-test is a parametric test which is relatively robust to
departures from the usual assumptions underlying parametric
the proportions in the population from which this sample

was drawn’. In order to test the hypothesis we need to calcu-

late the expected table values if there were no true difference

Table 9.6 Example of subject group by gender

Male Female Total
60
Schizophrenia 27 33 60
Healthy controls 33 27 120
Total 60 60

180

Research methods, statistics and evidence-based practice CHAPTER 9

However, there are modifications to the two-sample t-test that can be used if the assumption of equality of variance is violated.

The test statistic involves the calculation of the following:

    t = Observed difference in means / Standard error of the observed difference

with n1 + n2 − 2 degrees of freedom (where n1 + n2 is the total number of people in both samples combined). This tests the null hypothesis that there is no difference in the means of the two populations from which the samples were drawn. t becomes larger (and is more likely to be significant) as the difference in means increases, or when the standard error decreases (e.g. when the sample size increases). The significance of t can be obtained from tables, although it is usually calculated within the various statistical packages available.

t-test
• Is a parametric statistical test
• Compares the mean values of two independent groups
• Is relatively robust to departures from parametric assumptions
• The value of t and the degrees of freedom should always be stated
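A minimal sketch of the two-sample case, again assuming Python and SciPy, with invented scores used purely for illustration:

    # Two-sample t-test (equal variances assumed, as in the classical test).
    # ttest_rel and ttest_1samp cover the paired and one-sample uses above;
    # equal_var=False gives Welch's modification when variances are unequal.
    from scipy import stats

    group_a = [23, 27, 31, 29, 25, 30]   # invented scores
    group_b = [19, 22, 26, 24, 21, 23]   # invented scores

    t, p = stats.ttest_ind(group_a, group_b, equal_var=True)
    print(t, p)   # report t, the degrees of freedom (n1 + n2 - 2 = 10) and p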
Mann–Whitney U-test

The Mann–Whitney U-test can be used when the aim is to show a difference between two groups in the value of an ordinal, interval or ratio variable. It is the non-parametric version of the t-test, which can be used for interval, ratio or continuous data unless there are large departures from the parametric assumptions. The process of calculating the test statistic is very simple but would use many lines of text to demonstrate here. The interested reader should refer to Bland (2000) for a clear and concise account of its derivation. It is worth noting that the test can detect differences in the spread as well as the location (median) of two variables, even when the medians are very similar (Hart 2001). Therefore, when presenting the results of Mann–Whitney tests, the median of each group should be presented along with a description of the skewness of each sample (e.g. with a box plot). The Mann–Whitney test also assumes that the two groups are independent. Where the measurements are paired (i.e. are two measurements from the same individual), the Wilcoxon matched-pairs test should be used instead.

Mann–Whitney U-test
• Is distribution free
• Is based on ranked values
• Is used to compare two independent groups
• The value of U and the degrees of freedom should always be stated
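A corresponding sketch for ranked data (SciPy again assumed; the ratings are invented):

    # Mann-Whitney U-test for two independent groups of ordinal ratings.
    from scipy.stats import mannwhitneyu

    group_a = [3, 5, 4, 6, 2, 5, 7]   # invented ordinal ratings
    group_b = [2, 3, 3, 4, 1, 2, 4]

    U, p = mannwhitneyu(group_a, group_b, alternative='two-sided')
    print(U, p)   # report U and p alongside each group's median and spread

    # For paired measurements from the same individuals, scipy.stats.wilcoxon
    # implements the Wilcoxon matched-pairs test mentioned above.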

Analysis of variance (ANOVA)

All of the previous tests have concerned two groups of observations. When there are three or more groups, one tests the null hypothesis that there is no difference in the group means by examining the variances.

Table 9.7 Ages of three groups of four patients

                 Ages             Mean    Variance
Depressed        47, 52, 58, 66   55.75   66.9
Bipolar          25, 28, 32, 45   32.5    77.7
Schizophrenic    18, 27, 32, 38   28.75   71.6

Mean of whole sample = 39, variance = 214.5

Consider the data, shown in Table 9.7, from three groups of four patients. There are several sources of variation in this sample. First, there is the total variation of the whole sample. To calculate this value one calculates the sample variance, ignoring the group to which each measurement belongs. This is sometimes called the total mean squares (MStotal) or total variance. Second, there is the variation of the group means about a grand mean of all observations. This is sometimes called the between-groups variance (MSbetween) or sometimes MStreat. Finally, there is the variation between the group measurements and their individual group means. This is sometimes called the within-groups mean squares or residual mean squares (MSresidual) or within-groups variance. The relationship between these sources of variation is:

    SStotal = SSbetween + SSresidual, where MS = SS / df

If each of the groups is drawn from the same population with equal population means, the between-groups variance will be comparable to the within-groups variation (MSresidual). Alternatively, if the three groups have different population means, the between-groups variation will be large compared with the within-groups variance. In order to test which of these situations is more likely, we use the test statistic F:

    F = Variation between samples / Variation within samples = MSbetween / MSresidual

In the example above the total variance is calculated from the whole sample (as if they were not in groups). The between-groups sum of squares is the sum of squared deviations between the group means and the overall mean, multiplied by the number of observations in each group. The residual sum of squares is usually calculated by subtraction. Mean squares are the sums of squares (SS) divided by the appropriate degrees of freedom.

Most statistical packages will, when performing an ANOVA, produce an output similar to that shown in Table 9.8. The sum of squares is just the sum of the squared differences between each value and its corresponding mean. By dividing by the degrees of freedom, we can calculate the within-groups (MSresidual), between-groups (MSbetween) and total variance.

Table 9.8 Analysis of variance table

                  Sum of squares   Degrees of freedom   Mean square   F      Sig.
Between groups    1711.500         2                    855.75        11.9   0.003
Within groups     648.500          9                    72.1
Total             2360.000         11                   214.5

In the above example F = 11.9, which is significant (p = 0.003, i.e. less than 0.05). We can therefore reject the null hypothesis that there is no difference between the groups.
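Using the ages in Table 9.7, the one-way ANOVA can be reproduced in a few lines of Python (SciPy assumed; this is an illustration, not the method used to prepare the original tables):

    # One-way analysis of variance on the Table 9.7 ages.
    from scipy.stats import f_oneway

    depressed     = [47, 52, 58, 66]
    bipolar       = [25, 28, 32, 45]
    schizophrenia = [18, 27, 32, 38]

    F, p = f_oneway(depressed, bipolar, schizophrenia)
    print(F, p)   # approximately F = 11.9, p = 0.003, matching Table 9.8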
ANOVA can also be extended to the analysis of data which can be classified in a number of ways. For example, in an observational study measuring memory performance scores, patients may be classified by diagnosis, sex and treatment. If one wished to compare memory score by diagnosis, a one-way ANOVA (as above) could be conducted. However, if gender and treatment also affected memory score, the difference might not be due solely to the effect of diagnosis alone. To avoid this potential pitfall, a factorial ANOVA can include any number of the factors in a single experiment. The resulting analysis could give the effect of each factor independently, but can also provide information about interactions between factors. For example, a factorial ANOVA could detect that memory scores may be impaired in males with schizophrenia but not females, whereas a one-way ANOVA might fail to detect any differences.

ANOVA
• Is a parametric statistical test
• Tests the null hypothesis that the mean values of three or more independent groups are equal
• The test statistic F is the ratio of the between-groups to within-groups variance
• The value of F and the two degrees of freedom should always be stated
• ANOVA has a non-parametric equivalent called the Kruskal–Wallis test

An analysis of variance can be extended to include paired values from the same samples, when it is called a repeated measures ANOVA. Where data can be classified in several ways (e.g. by group and gender) the appropriate statistical test is the factorial ANOVA. An analysis of covariance (ANCOVA) is used when we wish to see if the mean of a variable differs across three or more groups, while taking into account a possible confounder. If, for example, one examined cognition in the three groups of subjects above, their performance may be confounded by their premorbid general intellectual ability or their age. In order to take these factors into account we can either use a regression analysis (see later) or use ANCOVA, where IQ or age or both would be covariates.

Table 9.9 summarises tests for differences, for various data types.

Table 9.9 Summary of tests for differences

                          Categorical          Ordinal (ranked)          Interval or continuous
Two groups
  Unpaired                Chi-squared test     Mann–Whitney U-test       Independent t-test
  Paired                  McNemar test         Wilcoxon matched pairs    Paired t-test
Three or more groups
  Unordered & unpaired    Chi-squared test     Kruskal–Wallis test       Analysis of variance (ANOVA)
  Paired                  Cochran's Q test     Friedman test             Repeated measures ANOVA

See Altman (1991) or Swinscow & Campbell (1996) for further details.

Testing for association

Testing for an association between two variables is a common analysis in medical statistics and one that is sometimes misused. Often such analyses are undertaken in the hope that one variable causes a change in another, but the direction of effect cannot be inferred solely from the results of the analysis (i.e. association is not causation). Two related techniques are available: correlation and regression. Correlating two variables is the simplest form of analysis and looks for a linear association between two variables (e.g. whole brain volume and IQ, or age and MMSE score); it can be conducted by a variety of parametric and distribution-free methods. Regression involves many of the underlying principles of correlation and is often extended to take account of several variables simultaneously.

Correlation

Consider the sample data, given in Table 9.10, from patients in an inpatient ward in whom performance IQ and duration of psychosis in months were measured. If we plot a graph of these values (Fig. 9.10) we can see that they appear to be related to one another.

In order to show a relationship we need to test the null hypothesis that there is no association between the two variables. If the parametric assumptions are met, we can calculate Pearson's product–moment correlation coefficient. If these assumptions are not met we can use Spearman's rank correlation coefficient, which is the corresponding distribution-free test. We will here calculate both for the same data set.

Table 9.10 Sample data on duration of psychosis and performance IQ

Duration of psychosis (months)    20   14   17   24   49   120   80   63   34   70
Performance IQ                   140  139  150  115   75    71  102   99  120  140

Fig. 9.10 Scatter plot of performance IQ against duration of psychosis, with line of best fit shown.

Pearson's correlation coefficient (r) can take any value from −1 to +1. A correlation coefficient of 1 would indicate perfect positive correlation (both values rise together), whereas a correlation coefficient of −1 indicates perfect negative correlation. A correlation coefficient of 0 suggests that there is no relationship between the two variables. Pearson's correlation coefficient is calculated using the method of least squares, which tries to minimise the differences between each data point and a line of best fit. The line of best fit is shown in Fig. 9.10. For the dataset shown above, the correlation coefficient is −0.69 and the significance of the result is p = 0.026. This shows that duration of psychosis and performance IQ have a moderate to strong negative relationship with each other and that the relationship is significant at p < 0.05. In other words, as the duration of psychosis goes up, the performance IQ goes down, and the null hypothesis (that the correlation coefficient is zero and there is no relationship) can be rejected.

Pearson's correlation coefficient
• Is a parametric statistical test
• Measures the observed association between two variables
• Its significance or confidence interval should always be stated
• Is derived using the method of least squares

Spearman's rank correlation coefficient can be calculated using the same dataset and is not dependent on a normal distribution of values. The value will lie between −1 and +1 and its interpretation is similar to that of Pearson's coefficient. In this case Spearman's correlation coefficient is −0.64, p = 0.044. The result is still significant, although slightly less so than before. This reflects the fact that distribution-free tests tend to have less power to detect associations or differences than parametric tests, and yield more conservative estimates when the data are normally distributed. Spearman's coefficient, however, tends to exaggerate the association between variables when there are many tied values, in which case other measures may be more appropriate.

Spearman's correlation coefficient
• Is a distribution-free test
• Converts raw values to ranks before measuring their association
• Tends to inflate the strength of the association when there are many tied values, in which case other tests may be more appropriate

The simple correlation examples shown above can be extended to the situation where there is a third variable. In our example, this might be age or premorbid IQ. It is possible to calculate a partial correlation coefficient to take account of this confounder.

Table 9.11 summarises statistical tests for association.

Table 9.11 Statistical tests for association

Data                           Test
Categorical data               Kappa
Ranked data                    Spearman's rank correlation coefficient
Continuous or interval data    Pearson's correlation coefficient
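Both coefficients for the Table 9.10 data can be obtained with a short script (Python and SciPy assumed; the variable names are ours):

    # Pearson's and Spearman's correlation coefficients for Table 9.10.
    from scipy.stats import pearsonr, spearmanr

    duration = [20, 14, 17, 24, 49, 120, 80, 63, 34, 70]       # months
    perf_iq  = [140, 139, 150, 115, 75, 71, 102, 99, 120, 140]

    r, p = pearsonr(duration, perf_iq)
    rho, p_rank = spearmanr(duration, perf_iq)
    print(r, p)          # approximately -0.69, p = 0.026
    print(rho, p_rank)   # approximately -0.64, p = 0.044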

Regression analysis

Regression analysis is the study of relationships between two or more variables and is usually conducted for the following reasons:

• when we want to know whether any relationship between two or more variables actually exists;
• when we are interested in understanding the nature of the relationship between two or more variables; and
• when we want to predict a variable given the value of others.

In its simplest form regression analysis is very similar to correlation; in fact the underlying mathematical models are virtually identical. Regression analysis can, however, be used where there are many explanatory variables and where various data types are used together. The general regression model is:

    Y = a + bX1 + cX2 + ... + error

where a is a constant, X1, X2, etc. are the predictor variables, and the error term is the difference between the observed and predicted value of Y. A practical example of the above equation using the performance IQ data might take the following form:

    IQ = 149 − 0.57 × duration of psychosis

The error term is omitted here and is assumed to have a mean of 0. The distances between each data point and the line of best fit summarising their relationship are called the residuals. These are the differences between the observed and predicted values and are a measure of the unexplained variation. The model can be extended to more complicated examples, e.g. brain volume using the variables diagnosis, height and IQ. The equation might take the following form:

    Total brain volume = 10 × diagnosis + 0.03 × height + IQ/20

Diagnosis is a categorical variable, and therefore it makes no sense to allocate a number to each diagnostic category as there is no order in the categories. We therefore have to include a number of 'dummy variables', each one indicating the presence or absence of a diagnosis. The example above would be a suitable model when only one diagnosis is considered, as the variable diagnosis will only have to take values of 1 or 0.

If we are interested in a potential interaction between two variables (e.g. we might think that IQ is related to brain volume in healthy controls but not in people with schizophrenia) we can examine this by including the diagnosis × IQ interaction as another explanatory variable in the regression equation. If we had further information about IQ we might want to include this in our regression analysis. The printout from the statistical software might look like that in Table 9.12. The table is labelled ANOVA and it shows the mean squares about the regression model (similar to the between-groups variance), the residual mean squares (unexplained variance), their ratio F and its significance. What it does not tell us is whether there is an interaction between duration of psychosis and IQ, or whether the addition of IQ to our model is better than having duration of psychosis as the only predictor variable. To test the first hypothesis, that there is an interaction between IQ and duration of psychosis, we would need to expand our model to include an interaction term.

Table 9.12 Analysis of variance table for a regression analysis in SPSS

ANOVA
Model           Sum of squares   df   Mean square   F        Sig.
1  Regression   6098.446         2    3049.223      22.744   0.001
   Residual     938.454          7    134.065
   Total        7036.900         9

If we wanted to see which model is best (in terms of how much variance or R² is explained overall) we need to either add or take away predictor terms to see which model fits the data best. Most statistical packages have a variety of methods for doing this. The most common methods are called forward entry, backward entry and stepwise. Forward entry is a method of regression analysis whereby the predictor variable most significantly associated with the dependent variable is included in the model first, and if other predictor variables are also significantly associated with the dependent variable, they are entered into the model. Backward entry regression enters all of the terms into the regression equation first and removes successive terms if they do not predict the dependent variable. Stepwise regression is a combination of forward and backward entry methods.

The table in the regression analysis was titled ANOVA because regression and ANOVA use virtually identical underlying models. For instance, one could conduct a regression analysis where IQ was the dependent variable and duration of psychosis was the predictor. If we had done that we would have arrived at the same answer as an ANOVA.

There are, however, limitations to multiple regression. For example, as we enter more terms into our regression analysis, it becomes more and more difficult to interpret the results. In such cases clear descriptive statistics become invaluable. Further, in the above example we have only dealt with a situation in which the dependent variable is at least interval, ratio or continuous. When our dependent variable is an outcome (e.g. dead or alive) then we need to use a closely related technique called logistic regression. Other more complex models are available but are beyond the scope of this chapter (see Altman 1991 for more details).

Linear regression
• Is a parametric statistical test
• Tests the null hypothesis that there is no relationship between a predictor variable and a dependent variable
• Uses the test statistic F to test for the significance of the regression model used
• Can incorporate interaction terms
• The value of F, the degrees of freedom and R² should be stated
• Can often be helpfully combined with the use of graphs or other descriptive statistics
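A sketch of the simple regression of performance IQ on duration of psychosis, assuming Python with the pandas and statsmodels packages (neither is mentioned in the original text); the interaction syntax in the final comment is illustrative only:

    # Ordinary least squares regression of performance IQ on duration of psychosis.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        'duration': [20, 14, 17, 24, 49, 120, 80, 63, 34, 70],
        'perf_iq':  [140, 139, 150, 115, 75, 71, 102, 99, 120, 140],
    })

    model = smf.ols('perf_iq ~ duration', data=df).fit()
    print(model.params)     # intercept and slope (slope approximately -0.57)
    print(model.summary())  # includes the ANOVA-style F test, its p-value and R-squared

    # Further predictors are simply added to the formula, e.g.
    # 'perf_iq ~ duration + age', or 'perf_iq ~ duration * age' for an
    # interaction (age is a hypothetical extra variable here).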

Survival analysis

Survival analysis, as its name implies, was originally related to the drawing of inferences from numerical data about the length of life. However, the methods of survival analysis may
be applied to the amount of time elapsing before any particular event, such as relapse, in the history of an individual. The quantity which is the subject of a survival analysis is the 'time to outcome', or survival time, of the individuals under study (Altman & Bland 2002). The survival time is the difference between two times or dates. The terminal event in psychiatric applications will not usually be death of the subject, but some other kind of event, e.g. re-hospitalisation.

Perhaps the most important thing to appreciate about survival analysis is that by the end of any study the event will probably not have occurred in all patients. We will not know when or even whether they experience the event, only that they have not yet done so by the end of the study. Patients may also be lost to follow-up during the course of a study, or may experience an event which is not the event of interest but which means that their data are in effect 'censored'. An example of this might be the death of a patient from a lung tumour during the course of an antipsychotic trial to prevent relapse in schizophrenia. In survival analysis, it is assumed that those patients lost to follow-up have the same prognosis as those remaining in the study.

The aim of survival analysis is to model the survival experience of individuals and to estimate associated quantities of interest. Models may include explanatory variables of several types: group membership, a discrete variable which might indicate different treatment regimens; a continuous variable such as age, which can be adjusted for in group comparisons; and other variables which may be of primary interest in themselves, or may be potential confounders of the relationships in question, and therefore need to be taken into account. Before conducting any of these analyses it is always helpful to graph the survival function against time, using the Kaplan–Meier survival curve (Bland & Altman 1998). This approach graphs the proportion of subjects surviving beyond any specified follow-up time (time p) as S(t), estimated from the following equation:

    S(t) = ((r1 − d1) / r1) × ((r2 − d2) / r2) × ... × ((rp − dp) / rp)

where r is the number of patients alive before a given time and d denotes the number who died at that time. This equation becomes much clearer when you consider the graph of S(t) against time, as shown in Fig. 9.11.

Fig. 9.11 Kaplan–Meier survival curve.

Significance tests applied to survival analysis

The purpose of undertaking a survival analysis and plotting a Kaplan–Meier curve is usually to demonstrate that the time to some event is greater in one group than in another. In such cases it is usual to perform a significance test of the null hypothesis that the survival times are equal. The most common method of comparing two or more survival functions is the log rank test. This test is distribution-free and effectively yields a value which can be checked against statistical tables in order to give the significance level of the result. The log rank test assumes that survival times are at least ordinal and that the risk of one group relative to another does not change with time. This 'relative risk' is sometimes called the hazard function or ratio, and the assumption that it is constant is called the proportional hazards assumption. The terms relative risk and hazard ratio, though they may be similar for specific time points, should not be used synonymously. For a more detailed consideration of this area the interested reader should consult Parmar & Machin (1995).

More complex methods of analysing survival data may be very useful, especially when two groups differ in the presence of one or more prognostic factors. Ideally, their effects should be corrected for in a type of multiple regression. The most common regression model applied to survival analysis is called Cox's proportional hazards.

Survival analysis
• Is useful when the outcome is time to an event
• Assumes that dropouts have the same prognosis as those remaining in the study
• A Kaplan–Meier curve illustrates survival data
• Methods of data analysis include the log rank test and Cox's proportional hazards
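As an illustration only (the times, censoring indicators and the choice of the lifelines package are all our assumptions, not part of the original text), a Kaplan–Meier estimate and log rank test might be obtained as follows:

    # Kaplan-Meier estimation and a log rank comparison of two invented groups.
    # event = 1 means the event (e.g. relapse) was observed; 0 means censored.
    from lifelines import KaplanMeierFitter
    from lifelines.statistics import logrank_test

    time_a, event_a = [5, 8, 12, 16, 23, 30], [1, 1, 0, 1, 0, 1]
    time_b, event_b = [4, 6, 9, 11, 15, 20], [1, 1, 1, 0, 1, 1]

    km = KaplanMeierFitter()
    km.fit(time_a, event_observed=event_a)
    print(km.survival_function_)   # S(t) for group A; km.plot() draws a curve like Fig. 9.11

    result = logrank_test(time_a, time_b,
                          event_observed_A=event_a, event_observed_B=event_b)
    print(result.p_value)          # tests H0: the two survival curves are equal

    # Cox's proportional hazards model is available in the same package as
    # lifelines.CoxPHFitter, fitted on a data frame of durations, events and covariates.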
Multivariate statistics

Multivariate statistics refers to analyses in which there are multiple (more than one) dependent variables. This situation often arises in medicine when more than one outcome measurement is made, or sometimes when the same measurement is repeated on more than one occasion. A common mistake in the analysis of such data is to perform a series of one-way ANOVAs for each dependent variable separately. The problem with that approach is that every one-way ANOVA increases the chances of at least one type I error (false positive). In addition, the dependent variables are often correlated with one another, and including each variable in the same analysis can provide the researcher with more information (e.g. interactions between variables) than if several analyses are conducted separately. To overcome these problems, a series of multivariate techniques have been devised. For a more detailed account of them all, the reader should consult Norman & Streiner (1999).

Multivariate analysis of variance

In order to compare two or more group means we would conventionally conduct a one-way ANOVA. In more complicated situations, where there are several independent variables, a factorial ANOVA would be the appropriate technique. If one wishes to compare means between several groups while controlling for a confounder (e.g. when measuring current IQ in people with various diagnoses, one might wish to control for premorbid IQ) the appropriate model is called an analysis of covariance or ANCOVA. Finally, when there are two or more dependent variables the amended ANOVA model is called a multivariate analysis of variance or MANOVA, which can also be adapted to control for confounding variables (MANCOVA). The general mathematical model used in all of these approaches is very similar and is often referred to as the general linear model (GLM). For simplicity's sake the details have not been included in this chapter, although further details can be found in Hand & Taylor (1987).

Multivariate ANOVA
• Is a parametric statistical technique
• Is based on the general linear model
• Is a relatively computer-intensive technique
• A common test statistic is Hotelling's T², although others exist
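A minimal sketch of a MANOVA with two correlated dependent variables, assuming Python with pandas and statsmodels (the data and variable names are invented):

    # Multivariate analysis of variance: two memory scores across three groups.
    import pandas as pd
    from statsmodels.multivariate.manova import MANOVA

    df = pd.DataFrame({
        'group':   ['depression'] * 4 + ['bipolar'] * 4 + ['schizophrenia'] * 4,
        'memory1': [12, 14, 11, 13, 10, 9, 11, 10, 8, 7, 9, 8],
        'memory2': [22, 25, 21, 24, 19, 18, 20, 19, 15, 14, 16, 15],
    })

    manova = MANOVA.from_formula('memory1 + memory2 ~ group', data=df)
    print(manova.mv_test())   # Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, etc.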
Factor analysis

Another method of dealing with large datasets is the technique of factor analysis. Factor analysis is a method of data reduction in which many variables are collapsed into a smaller number of different variables called factors. The variables can be reduced in this way when two or more variables are highly correlated and can effectively be replaced by a single factor without the loss of much information. Factors are effectively condensed statements of relationships between a set of variables and are sometimes referred to as latent traits or variables, and also as hypothetical constructs.

One of the best known uses of factor analysis was in the field of intellectual ability. Spearman observed that the performances of people on a wide variety of tests of intellectual ability were highly correlated. Spearman, among others, thought that the reason for this finding was that performance on tests of intellectual ability could be explained by a single trait we now know as IQ. Evidence from factor analysis, and from other research, suggests that human abilities can indeed be explained partly by such a single factor. As another example, research in the field of personality variation has also shown, using the technique of factor analysis, that human personality can be thought of as having five latent dimensions.

A frequent criticism of factor analysis is that it is highly exploratory and provides several possible solutions, given the same data set. This criticism has some merit, since factor analysis involves several steps and, at each step, more than one technique is often available. The first step in factor analysis is to construct a correlation matrix. This is a table with all of the individual variables listed along the top and side of the table, with each individual cell showing the correlation coefficient between the two variables at the intersecting row and column. It may be obvious even at this stage that several variables are highly correlated with one another and might be explained by a single factor. The second step in the analysis is to extract a small number of factors from the correlation matrix, where each factor is expressed as a linear combination of the measured variables.

The factors may be extracted in a way which maximises the amount of variance explained by the factor. The amount of variance accounted for by a factor is known as its eigenvalue. If a factor's eigenvalue is less than 1, it is worse than a single variable at explaining the overall variance. If, however, the eigenvalue is equal to the number of variables, the factor explains all of the variance. In practice, researchers often keep the factors with eigenvalues of 1 or more and discard the rest. Another approach is to take the first few factors which explain the largest amount of variance and, when the amount of variance explained by successive factors diminishes, ignore subsequent factors. This second approach is usually performed with the aid of a scree plot (Kline 1994). The final step in factor analysis is to rotate the factors. This step is usually performed, first, because some factor loadings will be negative and difficult to interpret, and secondly because the measured variables may 'load onto' two or more factors. Factor rotations are usually orthogonal (i.e. they minimise the correlation between factors) or oblique (they allow extracted factors to correlate with one another). One of the most common rotations used is the Varimax rotation. This technique is an orthogonal rotation which maximises the variance explained by each rotated factor (hence the name), such that the factors obtained are uncorrelated.

Factor analysis, as outlined above, can be used for data reduction or data exploration. A further technique has evolved, called confirmatory factor analysis (CFA) or structural equation modelling, which can be used to test hypotheses about the underlying factor structure.

Factor analysis
• May be obtained by several methods
• Can be used to simplify large data sets
• Produces latent variables or hypothetical constructs
• Rotations may be oblique or orthogonal
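The first two steps described above – the correlation matrix and the eigenvalues used to decide how many factors to keep – can be illustrated with a few lines of Python/NumPy on simulated data built from two hypothetical latent traits (dedicated packages then perform extraction and rotation):

    # Eigenvalues of a correlation matrix for six variables generated from
    # two latent traits plus noise (entirely simulated data).
    import numpy as np

    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 2))                       # two latent traits
    loadings = np.array([[0.8, 0.0], [0.7, 0.1], [0.6, 0.2],
                         [0.0, 0.8], [0.1, 0.7], [0.2, 0.6]])
    scores = latent @ loadings.T + 0.5 * rng.normal(size=(200, 6))

    corr = np.corrcoef(scores, rowvar=False)                 # step 1: correlation matrix
    eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]    # step 2: eigenvalues, largest first
    print(eigenvalues.round(2))   # typically two eigenvalues exceed 1 here (Kaiser criterion)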

Cluster analysis

Cluster analysis is the name given to a set of techniques which ask whether data can be grouped into categories on the basis of their similarities or differences. It began when biologists started to classify plants on the basis of their various phyla and species and wanted to derive a less subjective technique. It has been applied to diagnostic classification in a similar way. To take a theoretical example, conventional categories of functional psychotic illness (depression, bipolar disorder and schizophrenia) are thought by many to be somewhat unsatisfactory concepts. They do not predict outcome particularly well and seem to share many risk factors. A researcher might attempt to collect data from people with psychosis and empirically derive their own categories using cluster analysis. The mathematical details are too complicated to explain here, but essentially the researcher should first decide whether to use a hierarchical or a partitioning method. Hierarchical methods involve the measurement of various variables on each subject. These variables are then compared between subjects and the clusters are derived in such a way as to minimise the differences ('Euclidean distance') between members within a category and to maximise the differences between people belonging to different categories. Each category can then be further subdivided into lower-order categories represented with a dendrogram (Fig. 9.12). Partitioning techniques assume each category is unique from all the others and are less commonly used.

Fig. 9.12 Dendrogram of hypothetical cluster analysis of psychotic patients.

Common multivariate techniques in psychiatry
• Multivariate analysis of variance
• Factor analysis
• Cluster analysis
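A hierarchical cluster analysis of the kind described above can be sketched with SciPy (the patient ratings below are invented; a real analysis would start from clinical measurements):

    # Agglomerative (hierarchical) clustering on Euclidean distances.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(1)
    ratings = rng.normal(size=(8, 4))     # 8 hypothetical patients, 4 symptom measures

    tree = linkage(ratings, method='ward', metric='euclidean')
    labels = fcluster(tree, t=3, criterion='maxclust')   # cut the tree into 3 clusters
    print(labels)

    # scipy.cluster.hierarchy.dendrogram(tree) would draw a tree diagram
    # like Fig. 9.12 (plotting requires matplotlib).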
Meta-analysis statistics

A meta-analysis produces a weighted average of the results from two or more studies. A meta-analysis is often conducted alongside a systematic review, but the terms are not synonymous. A meta-analysis of several studies may be misleading without a systematic review, since non-systematic reviews are more prone to bias, particularly in selectively citing studies which support a particular point of view. A meta-analysis of such studies will then provide a spuriously precise estimate of any overall effect, failing to take into account other studies. Publication bias is also a threat to the validity of a meta-analysis, even when there has been a previous systematic review. Publication bias is the tendency for small, usually negative studies to remain unpublished. Finally, for studies to be meaningfully combined, the results must be broadly similar from study to study. When study results vary more than one would expect by chance, the results are said to demonstrate statistical heterogeneity. Finding heterogeneity should prompt investigators to search for an explanation of it. Causes include differences in the characteristics of study participants and differences in methodology.

There are in fact a range of meta-analytic methods from which to choose. The choice of method will depend upon the measure of effect used in the individual studies and the presence or absence of heterogeneity. If the results of individual studies do not show heterogeneity, fixed-effects analyses should be used. These assume that there is a single underlying effect and that each individual study is an unbiased estimator of that effect. When heterogeneity is present, random-effects analyses should be used. These do not assume a single underlying treatment effect but estimate an average effect across all studies. Random-effects analyses produce a single overall estimate of treatment effect which is generally less precise than the corresponding fixed-effects analysis, since the heterogeneity is also incorporated into the confidence interval of the overall effect estimate.

Fixed-effects analysis

Fixed-effects meta-analysis is a two-step process. The first step is to calculate a common unit of treatment effect, usually an odds ratio or relative risk of an event, or a difference in two means for continuous data, for each individual study. The second stage is to calculate a summary statistic which is a weighted average of the results from individual studies. The weights used are usually the inverse of the variance (the square of the standard error) from the individual studies. Larger studies, which provide a more precise estimate of overall treatment effect and have a relatively small variance, are given more weight than smaller trials with larger variances. This method can be expressed algebraically:

    y_IV = Σ(w_i × y_i) / Σ w_i

where y_IV is the pooled result using fixed-effects (inverse variance) analysis, y_i is the result of individual studies and w_i is the weight given to individual studies.

    w_i = 1 / SE(y_i)²

where SE(y_i) is the standard error of the results of individual studies. The heterogeneity statistic Q is calculated as:

    Q = Σ w_i (y_i − y_IV)²

Random-effects models

Random-effects analyses, sometimes referred to as DerSimonian and Laird random-effects models (DerSimonian & Laird 1986), do not assume one underlying treatment or other effect. The effect sizes from individual studies are assumed to be normally distributed with variance τ². The individual weights (w_i) for each included study are then:

    w_i = 1 / (SE(y_i)² + τ²)

where SE(y_i) is the standard error of the results of individual studies. And the pooled overall effect is:

    y_DL = Σ(w_i × y_i) / Σ w_i

where y_DL is the pooled result using random-effects (DerSimonian and Laird) analysis, y_i is the result of individual studies and w_i is the weight given to individual studies.
As heterogeneity (and therefore τ²) increases, the study weights given to individual studies will become more similar, and relatively more weight will be given to smaller studies compared with fixed-effects models. Random-effects models are more conservative than fixed-effects models, giving wider confidence intervals around the overall summary estimate.
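The fixed- and random-effects calculations can be written out directly. The sketch below assumes Python/NumPy, uses invented log relative risks, and applies the usual DerSimonian and Laird moment estimator of τ² (that estimator is not spelled out in the text above):

    # Inverse-variance fixed-effects and DerSimonian-Laird random-effects pooling.
    import numpy as np

    y  = np.array([-0.30, -0.10, 0.05, -0.25, -0.40])   # invented study effects (log RR)
    se = np.array([ 0.12,  0.20, 0.30,  0.15,  0.25])   # their standard errors

    w = 1 / se**2                                  # fixed-effects weights
    y_iv = np.sum(w * y) / np.sum(w)               # pooled fixed-effects estimate
    Q = np.sum(w * (y - y_iv)**2)                  # heterogeneity statistic

    k = len(y)                                     # number of studies
    tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

    w_star = 1 / (se**2 + tau2)                    # random-effects weights
    y_dl = np.sum(w_star * y) / np.sum(w_star)     # pooled random-effects estimate
    print(y_iv, Q, tau2, y_dl)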
Measuring publication bias

In order to assess whether the sample of studies you have obtained is likely to be biased because of selective publication or identification, a number of graphical and numerical methods are available. The simplest method of all is to construct a funnel plot. Essentially a funnel plot is a plot of the study effect size against its precision. The effect size is usually measured as a mean difference or standardised difference for continuous data, or a relative risk or odds ratio for dichotomous or event-like data. Relative risks and odds ratios are usually plotted on a log scale, so that effect sizes favouring an effect are plotted an equal distance away from the line of no effect as those showing an equal effect in the opposite direction.

Consider Figs 9.13 and 9.14. In Fig. 9.13, we have 'found' 16 studies, some of which are large and have tight confidence intervals, and some of which are small and have wide confidence intervals. Overall the pooled estimate of treatment effect is non-significant, although its confidence interval is quite narrow. In Fig. 9.14 we present the same trials but leave out those with small sample sizes that find a negative result. By removing the small negative findings – i.e. introducing publication bias – we have changed the result of the whole meta-analysis to favour an overall effect. The funnel plot shows a corresponding gap or void in its lower right-hand corner, where one would expect to find the small negative studies.

Fig. 9.13 Funnel and forest plot on all of 16 studies (MH pooled relative risk 0.87, 95% CI 0.73 to 1.04).

Fig. 9.14 Funnel and forest plot on studies where small negative findings have been removed (MH pooled relative risk 0.79, 95% CI 0.66 to 0.95).

Further techniques have been developed to quantify funnel plot asymmetry. The interested reader should consult Egger et al (2001) for further details.

Concerning meta-analysis
• Fixed-effects analyses – assume no or low heterogeneity and a single underlying effect
• Fixed-effects analyses – usually weight the results by the reciprocal of their variance
• Random-effects analyses – allow for heterogeneity, to give an average treatment effect across studies
• Random-effects analyses – incorporate heterogeneity as well as study variance into the weights given to individual studies
• Publication bias may be suggested by funnel plot asymmetry
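A funnel plot is straightforward to draw once effect sizes and standard errors have been extracted. The following sketch assumes Python with matplotlib and uses invented study results:

    # Funnel plot: study effect size (log relative risk) against its precision.
    import numpy as np
    import matplotlib.pyplot as plt

    log_rr = np.array([-0.25, -0.10, 0.08, -0.35, 0.15, -0.05, -0.20, 0.30])
    se     = np.array([ 0.10,  0.15, 0.28,  0.32, 0.35,  0.08,  0.22, 0.40])

    plt.scatter(log_rr, se)
    plt.axvline(0.0, linestyle='--')   # line of no effect on the log scale
    plt.gca().invert_yaxis()           # most precise (largest) studies at the top
    plt.xlabel('Log (relative risk)')
    plt.ylabel('Standard error')
    plt.title('Funnel plot')
    plt.show()
    # A 'hollow' lower corner on one side suggests possible publication bias.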

Concluding remarks

Figure 9.15 summarises basic statistical tests for various data types. This section has attempted to introduce the reader to the main statistical techniques used in medical and psychiatric research. It is, however, difficult to fully appreciate some of the issues discussed without access to your own dataset and a suitable software package. We encourage readers to do this, once they have a grasp of the fundamentals, but also advise them to seek out sensible research design and statistical advice at an early stage. Statistical testing should be planned and should use research methods which are likely to yield a reliable and unbiased answer. No amount of analysis after the event can compensate for poor study design. Contrary to the cliché, statistics cannot be used to show anything one wants them to.

Fig. 9.15 Summary of basic statistical tests for various data types. Summarise data graphically and tabulate, then choose a test:
• Qualitative data (i.e. nominal or ordinal)
  – Nominal data (count or proportion): chi-squared test (2 or 3+ groups); McNemar test (paired data)
  – Ordinal data (e.g. social class): Mann–Whitney U test (2 groups); Wilcoxon rank sum (paired data); Kruskal–Wallis test (3+ groups); Spearman's rank correlation
• Quantitative data (i.e. interval, ratio or continuous)
  – Parametric assumptions* not met: use the tests listed for ordinal data
  – Parametric assumptions* met: independent t-test (2 groups); paired t-test (paired data); ANOVA (3+ groups); Pearson's correlation coefficient
*Assumptions underlying parametric statistics (approximately normal distribution, independence, equality of variance, at least interval data) are described in more detail in the text.

Evidence-based clinical practice

Clinicians are under increasing pressure from a number of sources to keep up-to-date with the research literature and to ensure that their clinical practice is as effective as possible. The rapid expansion of the internet means that patients have increased access to knowledge about healthcare and understandably expect their doctors to be fully informed. Purchasers of healthcare expect maximum value for each

