The technique is more clearly seen as a four part process.
1 The unknown word
≡ 2 The first language keyword
≡ 3 A mental image combining the meaning of the unknown word and the meaning of the keyword
≡ 4 The meaning of the unknown word
Here are some examples. The keywords have been chosen from a variety of languages including
English. Bird and Jacobs (1999) suggest that for languages like Chinese with very limited
syllable structure, it may also be useful to choose keywords not only from the first language but
from known words in the second language.
fund ≡ fun (Thai) meaning "teeth" ≡ Imagine a fund of money being eaten by a set of teeth ≡ a supply of money for a special purpose
candid ≡ can, the English word meaning a container ≡ Imagine a can with a label that honestly shows its contents ≡ honest and truthful
core ≡ hor (Serbo-Croat) meaning "choir" ≡ Think of a choir standing on the core of an apple ≡ the most important or central part
Step 2 provides a word form link between the unknown word and the keyword. Step 3 provides a
meaning link between the keyword and the meaning of the unknown word. Thus the whole
sequence provides a link from the form of the unknown word to its meaning.
The unknown word because of its formal similarity to the key word prompts recall of the
keyword. The keyword prompts recall of the image combining the keyword meaning and the
meaning of the unknown word. This image prompts recall of the meaning of the unknown word
and completes the set of links between the form of the unknown word and its meaning.
Instead of an image at step 3, some experimenters (Pressley, Levin and McCormick, 1980) have
used a sentence which describes what the image might be, for example, "There is a pin in the
pintu". The keyword technique can be used with ready made keywords and images as in the
examples above. This is generally recommended for younger learners and seems to work as well
as self created keywords and images (Hall, 1988; see Gruneberg and Pascoe, 1996 for a
discussion of this). Some researchers (Fuentes, 1976; Ott, Butler, Blake and Ball, 1973) found
that learners in the control group were spontaneously using keyword-like techniques.
There has been considerable research on the keyword technique. It has been found that the
technique works with
1. learners of differing achievement (Levin, Levin, Glasman and Nordwall, 1992; McDaniel
and Pressley, 1984) although learners with low aptitude may find it more difficult to use
the technique (McGivern and Levin, 1983)
2. learners at a variety of grade levels including very young children (Pressley, Samuel,
Hershey, Bishop and Dickinson, 1981)
3. elderly learners (Gruneberg and Pascoe, 1996)
4. educationally disadvantaged learners.
The technique has been used with a wide range of languages: English speakers learning English
words and learning Spanish, Russian, German, Tagalog, Chinese, Hebrew, French, Italian, Greek,
and Latin; Dutch learners learning Spanish; and Arabic speakers learning English.
The keyword technique can be used in L1 or L2 learning, for learning the gender of words
(Desrochers, Gelinas and Wieland, 1989; Desrochers, Wieland and Coté, 1991), and with
learners working in pairs or individually (Levin, Levin, Glasman and Nordwall, 1992). When it
is used for L1 learning, the unknown word is an L1 word and the keyword is usually a higher
frequency L1 word, for example, cat could be the keyword for catkin.
The experiments evaluating the keyword technique have compared it with
1. rote learning
2. use of pictures (Levin, McCormick, Miller, Berry and Pressley, 1982)
3. thinking of images or examples of the meaning (instantiation) (Pressley, Levin, Kuiper,
Bryant and Michener, 1982)
4. context (the unknown word is placed in sentence contexts and the meaning of the word is
provided) (Moore and Surber, 1992; Brown and Perry, 1991)
5. added synonyms (the meaning is accompanied by other known synonyms) (Pressley,
Levin, Kuiper, Bryant and Michener, 1982)
6. guessing from context (McDaniel and Pressley, 1984).
The keyword technique usually performs better than these other methods, and where it does not,
it performs at least as well as them.
The keyword technique has positive effects on both immediate retention and long term retention
(one week to ten years). This finding is not consistent as there are a few studies which suggest
that long term retention is not good with the keyword technique (Wang, Thomas, Inzana and
Primicerio, 1993; Wang and Thomas, 1992; Wang and Thomas, 1995) and so such learning may
need to be closely followed by some additional meetings with the words. The case study
described by Beaton, Gruneberg and Ellis (1995) shows that even after ten years without
opportunity for use, some memory for words learned by the keyword technique remains. Without
any revision 35% of the words were remembered with correct spelling and 50% correct or with
some small spelling errors. After ten minutes spent looking at the vocabulary list around 75%
were recalled correctly or with minor errors and after one and a half hours revision almost 100%
of the 350 words were recalled correctly. This relearning is a very sensitive test of retained
knowledge.
The effect of the keyword technique is not limited to receptive recall of a synonym. Studies have
shown it to be effective for recall of definitions (Levin, Levin, Glasman and Nordwall, 1992; Avila
and Sadoski, 1996), in sentence completion tasks (Avila and Sadoski, 1996), in story
comprehension (Avila and Sadoski, 1996; Pressley, Levin and Miller, 1981; McDaniel and
Pressley, 1984), in writing sentences using the words studied (McDaniel and Pressley, 1984),
and in productive recall (Gruneberg and Pascoe, 1996; Pressley, Levin, Hall, Miller and Berry,
1980). The keyword needs to overlap a lot in form with the unknown word for productive recall
to be successful (Ellis and Beaton, 1993) and repetition may be more effective (Ellis and Beaton,
1993). Learners find using the keyword technique an enjoyable activity (Gruneberg and Sykes,
1991) and can achieve large amounts of learning with it (Gruneberg, 1992: 180; Gruneberg and
Jacobs, 1991) with some learners learning 400 words in 12 contact hours and 600 words in four
days. It is unlikely that these rates could be sustained but they represent very useful initial
achievements.
To be effective, learners need extended training with the keyword technique. Hall (1988) spent a
total of three hours over a period of four weeks training learners in the use of the keyword
technique and even this was probably not enough time. As with all the major vocabulary learning
strategies, learners need to be brought to a level of skill and confidence in the use of the strategy
where they find it just as easy to use the strategy as not use it. If their grasp of the strategy is
unsure, then it will be rarely used. A fault with many of the experimental studies of the keyword
technique is that training seems to have been very short or is not described clearly in the reports.
Several studies show that the keyword technique works well on some words (usually where
keywords are easy to find) and not so well on others (Hall, 1988). It would be interesting to see if
extended training in the keyword technique results in ease of use with most unknown words or if
there are still problems finding keywords for many words and with some languages whose
syllable structure differs greatly from the first language. Gruneberg's Linkword books provide
keywords for a wide range of vocabulary indicating that the only limit on finding a keyword
could be the learner's imagination. In the books the learners are encouraged to spend about 10
seconds thinking of the image so that there really is visualisation.
The results of the experiments on the keyword technique are not unanimous, but there is a very
large amount of evidence supporting its use, and if it is fitted into a balanced programme any
possible weaknesses, such as long term retention and availability for productive use, will be
lessened.
Research on the keyword technique has continued at a rate far in excess of its importance in
learning, particularly when one considers the other areas of vocabulary learning where we lack
the support of experimental findings. The keyword studies now number well over one hundred.
The learners also need to work out a spaced repetition schedule for working on the cards as
described in Chapter 3.
Training learners in the use of word cards
The research reviewed in this chapter has shown that there is value in learning vocabulary using
word cards. This learning, however, must be seen as part of a broader programme involving
other kinds of direct learning as well as the strands of meaning-focused input, meaning-focused
output, and fluency development.
The research shows that there are ways of maximizing learning and learners need to know about
these and know how to make use of them in their learning. Some of Griffin's (1992) studies
suggest the importance of informing learners about how to go about learning, so that factors like
transfer of learning, serial position in a list, and item difficulty are taken into account to suit the
language learning goal.
1. Learners should know about the importance of retrieval in learning and how word cards
encourage this by not allowing the word form and meaning to be seen simultaneously.
They should know about receptive retrieval and productive retrieval.
2. Learners should know the value of repeating and spacing learning and should include long term
review in their learning.
3. Learners should know what information to include on their word cards, particularly a
sentence context or some useful collocations.
4. Learners should know what words to choose to put on their cards, giving particular
attention to high frequency words.
5. Learners should know what to do with each word, rehearsing its spoken form and using
mnemonic techniques like the keyword technique whenever a word is difficult to
remember.
6. Learners should keep changing the order of the cards, avoiding serial learning and putting
more difficult items at the beginning of the pack so that they get more attention. They
should re-form packs, taking out words that are now known and including new items.
7. Learners should use small packs of cards in the early stages of learning and use bigger
packs when the learning is easier.
8. Learners should be aware of interference effects between semantically and formally related
words and avoid including such related items in the same pack.
9. Learners should make deliberate efforts to transfer the learning from word cards to
meaning-focused language use.
10. Learners should know how to monitor and reflect on their own learning, and adapt their
learning procedures on the basis of this reflection.
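Points 6 and 7 above describe a procedure for managing a pack of cards, and it can be sketched in code. The following is only an illustrative sketch, not a procedure taken from the research: the Card fields, the use of a count of failed retrievals as a measure of difficulty, and the known_after threshold of three successful retrievals in a row are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Card:
    word: str          # word form on one side of the card
    meaning: str       # meaning on the other side
    misses: int = 0    # failed retrievals (assumed measure of difficulty)
    streak: int = 0    # successful retrievals in a row


def record_retrieval(card, recalled):
    """Update a card after the learner tries to retrieve the hidden side."""
    if recalled:
        card.streak += 1
    else:
        card.misses += 1
        card.streak = 0


def reform_pack(pack, new_cards, known_after=3, rng=None):
    """Take out known words, add new items, and change the order.

    The cards are shuffled to avoid serial learning and then sorted so
    that cards with more misses come first. The sort is stable, so cards
    with the same number of misses keep their shuffled order.
    """
    rng = rng or random.Random()
    kept = [c for c in pack if c.streak < known_after]
    kept.extend(new_cards)
    rng.shuffle(kept)
    kept.sort(key=lambda c: c.misses, reverse=True)
    return kept
```

A learner's session would call record_retrieval for each card and then reform_pack before the next session. Keeping the list of cards short in the early stages corresponds to point 7.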
Some of these points are easy to learn and require only a little explanation and discussion.
Others, like the use of mnemonic devices, choosing words to go on the cards, avoiding
interference, and transferring knowledge require much more time and attention. This training can
involve:
1. Understanding what should be done. This can be tested by quizzes.
2. Observing and hearing about others' learning experiences and discussing strengths and
weaknesses of what was observed.
3. Performing learning tasks using word cards and reporting and reflecting on the experience.
4. Monitoring and training others in the use of word cards.
This training requires planning and a suitable allocation of time. The principle of spaced retrieval
should be applied to the training procedure and teachers should plan a mini-syllabus spread over
several weeks to train learners in the effective use of word cards.
Teachers should be able to justify to themselves and to others the value of spending time training
learners in the use of word cards. These justifications could include the following points:
1. The word card strategy can be applied to both high frequency and low frequency words. It
is a widely applicable strategy.
2. Direct deliberate learning is faster and stronger than incidental learning.
3. Direct learning can help incidental learning by raising consciousness of particular words
and providing knowledge that can be enriched and strengthened through incidental
meaning-focused learning.
4. Learners differ greatly in their skill at direct learning. Training is likely to reduce these
differences.
5. Learners spontaneously do direct learning but they do not always do it efficiently. Training
can increase their efficiency.
Learning using word cards should not be seen as an alternative to other kinds of learning. It
should be seen as a useful and effective complement and simply one part of a well balanced
vocabulary-learning programme.
The three word study strategies of using word parts, dictionary use, and using word cards are
important in helping learners quickly increase their vocabulary size. The deliberate nature of the
strategies results in substantial gains. When these are supplemented by opportunities to meet and
use these words in listening, speaking, reading and writing, then the vocabulary programme has a
very strong base.
9 Chunking and collocation
The term "collocation" is used to refer to a group of words that belong together, either because
they commonly occur together, like "take a chance", or because their meaning is not obvious
from the meaning of their parts, as with "by the way" or "to take someone in" (to trick them).
A major problem in the study of collocation is determining in a consistent way what should be
classified as a collocation. This is a problem because collocations occur in a variety of general
forms and with a variety of relationships between the words that make up the collocation. In this
book, the term "collocation" will be used to loosely describe any generally accepted grouping of
words into phrases or clauses.
From a learning point of view, it makes sense to regard collocations as items frequently
occurring together and with some degree of semantic unpredictability. These two criteria justify
spending time on collocations because of the return in fluency and nativelike selection.
Collocation is often described as a "Firthian" term (Kjellmer, 1982: 25; Fernando, 1996: 29), but
Palmer used it many years earlier and produced a substantial report on English collocations.
Palmer (1933: 4) used a restricted definition of collocation, focusing mainly on items whose
meaning is not obvious from their parts:
Each [collocation] ... must or should be learnt, or is best or most conveniently
learnt as an integral whole or independent entity, rather than by the process of
piecing together their component parts.
Palmer discussed several terms including idiom, heteroseme, phrase, formula but decided on
collocation because it was not a completely new word (Palmer refers to a use in 1750 noted in
the Oxford English Dictionary), it had not become definitely associated with other meanings, it
was an international word in that it was made of Latin parts, and it could be used in a variety of
disciplines.
There is a range of arguments put forward for giving attention to word groups and some of them
go to the heart of what it means to know a language. Here is a brief list of these arguments. We
will look at each of them more fully in the rest of this chapter.
(1) Language knowledge is collocational knowledge. N. Ellis (in press) argues that although it is
possible for linguists to discover grammar rules in instances of language, language knowledge
and language use can be accounted for by the storage of chunks of language in long term
memory and by experience of how likely particular chunks are to occur with other particular
chunks, without the need to refer to underlying rules. Language knowledge and use is based on
associations between sequentially observed language items. This viewpoint sees collocational
knowledge as the essence of language knowledge.
(2) All fluent and appropriate language use requires collocational knowledge. Pawley and Syder
(1983) argue that the best way to explain how language users produce nativelike sentences and
use the language fluently is that in addition to knowing the rules of the language, they store
hundreds of thousands of preconstructed clauses in their memory and draw on them in language
use. Thus each word in the language is likely to be stored many times - once as a single item and
many times in memorized chunks.
(3) Many words are used in a limited set of collocations and knowing these is part of what is
involved in knowing the words. In some cases the collocations are so idiomatic that they could
only be stored as memorised chunks. In others there are general collocational rules (or
prosodies).
Considering the role of collocational knowledge in language learning raises an important
recurring issue in language study, namely, how much of language learning and language use is
based on underlying abstract patterns and how much is based on memorized sequences? When
we hear or produce a sentence like "It's really great to see you!", do we subconsciously perceive
its underlying grammatical structure, do we see it as two or more previously stored chunks "It's
really great" "to see you", or do we see it as one stored unanalysed chunk that we recognise or
produce when needed? The answer to this question should affect what collocations we give
attention to and the way we deal with them in language classrooms. In this chapter we are
concerned with collocation but the argument about the units of language knowledge and the way
they fit together applies at all levels of language. Let us look first at the units.
Chunking
In an influential paper, Miller (1956) distinguished "bits" of information from "chunks" of
information. Our ability to make reliable one dimensional judgements, such as classifying tones,
brightness and size, seems to be limited to around seven bits of information. Coincidentally, the
span of immediate memory seems to be limited to the same number of items. We can overcome
this limitation by chunking the information. Bits of information are formed into chunks by the
process of "recoding", that is creating larger meaningful chunks. These recoded items need to be
able to be accessed fluently as units in order for them to act as chunks.
N. Ellis (in press) sees the learning of collocation as one level of "chunking", that is, the
long-term storing of associative connections (p.5). This chunking occurs at all levels of language,
and in both spoken and written forms. Table 9.1 has examples from written language.
TABLE 9.1. EXAMPLES OF CHUNKING AND DIFFERENT LEVELS OF WRITTEN LANGUAGE
Level         Type of chunking                       Examples
Letters       Each letter is processed as a unit     p is processed as a unit, not as a
              not as a set of separate strokes.      small circle and a descending
                                                     stroke on the left hand side.
Morphemes     Each morpheme is processed as          play is processed as a unit not as
              a unit rather than a set of letters.   a combination of p, l, a, y.
Words         Complex words are processed            player is processed as a unit not
              as a unit rather than several          as a combination of two units
              morphemes.                             play and -er.
Collocations  Collocations are processed as a        a player with promise is
              unit not as a group of two or          processed as a unit.
              more words.
Chunking can develop in two directions. Memorized unanalysed chunks can be later analysed, or
smaller chunks can be grouped into larger chunks. For the moment however let us look at
chunking as a process that starts with knowledge of the smallest parts. These small parts are later
chunked to become bigger parts and so on. When learning to read another language which uses a
different script, for example an Arabic speaker learning to read English, the smallest units will be
the parts or strokes making up the letters. Distinguishing d, b, p, and g will require a lot of
practice. When the learner can see each letter as a unit rather than having to look carefully at the
parts to distinguish the letters, then one level of chunking has occurred. Similarly at a higher
level, that is a level involving more or bigger chunks, a reader may be able to recognise
particular words without having to look carefully at each letter. Common combinations have
been chunked as morphemes or words.
Chunking typically occurs where the same parts are often observed occurring together. In some
cases this occurs solely because of frequency. For example, words like the and soon occur very
frequently and may be thus more efficiently treated as one chunk rather than a sequence of
letters. In some cases, parts are often observed as occurring together because they represent a
regular pattern in the language. For example, the sequence spl represents a regular initial
consonant cluster in English following the pattern /s/ + voiceless plosive + /l/ or /r/.
The advantages and disadvantages of chunking
The main advantage of chunking is reduced processing time, that is, speed. Instead of having to
give close attention to each part, the chunk is seen as a unit which represents a saving in time
needed to recognize or produce the item. Instead of having to refer to a rule or pattern to
comprehend or produce the chunk, it is treated as a basic existing unit.
The main disadvantage of chunking is storage. There are many more chunks than there are
components of chunks, and if the chunks are also stored in long term memory then there will be
a lot of items to store. There may also be difficulty in finding an item in the store.
If chunks are learned as unanalysed units, then a major disadvantage of chunking is that the parts
of the unit are not available for creative combination with other parts. For example, if "Please
make yourself at home" is learned as an unanalysed unit, then the parts "make yourself ..." and
"at home" are not available from this chunk to use in other patterns such as "Make yourself
comfortable", "I really feel at home here" and so on.
The alternative to chunking is rule based processing. In productive language use, this means
recreating an item each time it is used. The best researched language area on this issue is word
building, that is, the use of complex words. When we produce a word like "unable" or
"unambiguousness" do we create these words from their parts each time we use them (un + able,
un + ambigu + ous + ness) or do we simply retrieve them as already created previously stored
complete units? There is a very large amount of research that attempts to answer this important
question (see Marslen-Wilson, Tyler, Waksler and Older (1994) for reviews). At present, the
research evidence shows that high frequency complex units like "unable" are stored as whole
chunks. Low frequency complex items like "unambiguousness" are recreated by rules each time
we need them. If this explanation is correct then it represents a nice compromise between the
advantages and disadvantages of chunking. High frequency items are chunked and stored
separately thus reducing processing time. As we have seen, a small number of high frequency
items account for a large proportion of use. Low frequency items are not stored as chunked units,
thus reducing the need for lots of storage. As we have seen, there is a very large number of low
frequency items which account for a very small proportion of use. This recreation takes
processing time but does not happen frequently. It is likely that this efficient frequency based
balance of storage of chunks and rule based creation or analysis runs through all levels of
language.
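The compromise between storage and rule based creation can be illustrated with a small sketch. It is a toy model only, not a claim about how the mental lexicon is implemented: the affix lists, the analyse function, the Lexicon class, and the threshold of three uses are all hypothetical and chosen just to handle the two example words.

```python
from collections import Counter

PREFIXES = ("un",)            # toy rule set, assumed for illustration
SUFFIXES = ("ness", "ous")    # toy suffixes, stripped from the end


def analyse(word):
    """Split a complex word into its parts using the toy affix rules."""
    front, back = [], []
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix) + 2:
            front.append(prefix)
            word = word[len(prefix):]
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                back.insert(0, suffix)
                word = word[:-len(suffix)]
                changed = True
    return front + [word] + back


class Lexicon:
    """Store frequently used items whole; rebuild rare items by rule."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # uses needed before an item is stored
        self.uses = Counter()
        self.stored = {}             # items kept as complete chunks

    def process(self, word):
        self.uses[word] += 1
        if word in self.stored:
            return "retrieved as a chunk", self.stored[word]
        parts = analyse(word)        # costs processing time on every use
        if self.uses[word] >= self.threshold:
            self.stored[word] = parts
        return "built by rule", parts
```

In this sketch a word that is met often, such as unable, is soon stored and then retrieved directly, while a word met once, such as unambiguousness, is rebuilt from un + ambigu + ous + ness and takes up no storage.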
As chunks become bigger, their frequency of use becomes lower. There will be a point where the
frequency of collocations of a certain length is so low that it is not efficient to store them as a
chunk. This is a general principle and there will be exceptions where a long collocation is stored
as a chunk because an individual uses it frequently. Poems, songs and some speeches are
probably also stored in this way.
TABLE 9.2. FREQUENCY, STORAGE AND PROCESSING OF COMPLEX ITEMS
Type of vocabulary  Number of different words  Coverage of text         Treatment
High frequency      A few items                A large proportion of    Store as complete items
words               (not many to store)        text
                                               (too much to process)
Low frequency       Many items                 A small proportion of    Apply the rules to
words               (too many to store)        text                     create them each time
                                               (not much to process)    they are used
This explanation however still does not tell us what the rules are and if there is an interaction
between rules and chunks. That is, are rule based chunks easier to learn? To examine these
issues, let us now look in more detail at each of the three positions on collocation that were
briefly described at the beginning of this chapter.
Language knowledge is collocational knowledge
The strongest position taken on the importance of collocational knowledge is that it is essential
because the sequential probabilities of language items are the basis of learning, knowledge and
use.
In several papers Ellis (in press; Ellis and Schmidt, 1997) argues that a lot of language learning
can be accounted for by associations between sequentially observed language items, that is,
without the need to refer to underlying rules. The major factor affecting this learning by
association is frequency of meeting with instances of language use (the power law of practice).
By having chunks of language in long term memory, language reception and language
production are made more effective.
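The idea of associations between sequentially observed items can be made concrete by counting which word follows which in the language that has been met. The sketch below is only an illustration of that idea, not a model proposed by Ellis: the function name follow_probabilities and the small set of sentences are assumptions made for the example.

```python
from collections import Counter, defaultdict


def follow_probabilities(sentences):
    """Estimate how likely each word is to follow another.

    Every adjacent pair of words in the observed sentences is counted,
    and the counts are turned into proportions for each first word.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for first, second in zip(words, words[1:]):
            counts[first][second] += 1
    return {
        first: {second: n / sum(following.values())
                for second, n in following.items()}
        for first, following in counts.items()
    }


observed = ["take a chance", "take a break", "take a chance now", "by the way"]
probabilities = follow_probabilities(observed)
```

With more meetings of a sequence, its count grows and the association strengthens, which is the role that frequency plays in the account above. No grammar rule is consulted at any point.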
If we accept this view of the role of collocational knowledge being the basis of language learning
and use, then all collocational sequences, both regular and idiomatic, are important for learning,
with the most frequent ones being the most important. Although the direct formal study of
collocations has a role to play in this learning (Ellis, in press), most learning will take place
through meaning focused receptive and productive language use.
Fluent and appropriate language use requires collocational knowledge
Pawley and Syder (1983) consider that the best explanation of how language users can choose
the most appropriate ways to say things from a large range of possible options (nativelike
selection), and can produce language fluently (nativelike fluency) is that units of language of
clause length or longer are stored as chunks in the memory. They suggest that this explanation
means that most words are stored many times, once as an individual word and numerous times in
larger stored chunks.
The "puzzle" of nativelike selection is that by applying grammar rules it is possible to create
many grammatically correct ways of saying the same thing. However only a small number of
these would sound nativelike. For example, all the following are grammatically correct.
Please close the window.
I desire that the window be closed.
The closing of the window would greatly satisfy me.