Foundations of S
Language P
Statistical Natural
Processing
劉成韋
OUTL
Introduction
Methodological Prelimin
Supervised Disambiguat
Dictionary-based Disam
Unsupervised Disambigu
What is a Word Sense?
Further Reading
LINE
naries
tion
mbiguation
uation
Introduct
Problem
A word is assumed to ha
senses.
Task
To make a forced choice
the meaning of each usa
Based on the context of
In fact
A word has various some
unclear whether to and w
them.
tion (1/2)
ave a finite number of discrete
e between these senses for
age of an ambiguous word.
use.
ewhat related senses, but it is
where to draws lines between
Introduct
However, the senses ar
defind.
For Example:bank
The rising ground border
坡)
As establishment for the
exchange, or issue of mo
credit, and for facilitating
funds.(銀行)
tion (2/2)
re not always so well-
ring a lake, river, or sea...(邊
custody(保管), loan
oney, for the extension of
g the transmission of
Methodological Pr
Supervised learning
Know the actual status fo
which one train
Can usually be seen as a
Unsupervised learning
Don’t know the classifica
training example
Can thus often be viewed
reliminaries (1/3)
or each piece of data on
a classification task
ation of the data in the
d as a clustering task.
Methodological Pr
Pseudo
For testing the performa
Large number of occurre
disambiguated by hand
Time intensive
Laborious task
Pseudo-words
Conflating two or more w
Such as replaces all bana
banana-door
reliminaries (2/3)
o-words
ance of these algorithms
ences has to be
words
ana and door in a corpus by
Methodological Pr
Upper and lower bou
The estimation of upper
A way to make sense of
A good idea for those wh
evaluation sets for comp
The upper bound used
performance
We can’t expect an autom
The lower bound is the
simplest possible algorit
Assign all contexts to the
reliminaries (3/3)
unds on performance
r and lower bounds
performance figures
hich have no standardized
paring systems.
is usually human
matic procedure to do better
performance of the
thm
e most frequent sense
Supervised Di
A disambiguated corpus
There is a training set wh
ambiguous word is anno
Bayesian classification
< Gale et al. 1992 >
Treats the context of occ
without structure
An information-theoretic
< Brown et al. 1991 >
Looks at only one inform
which may be sensitive t
isambiguation
s is available for training
here each occurrence of the
otated with a semantic label
currence as a bag of words
c approach
mative feature in the context,
to text structure
Supervised Di
Bayesian Class
Each context word
Contributes potentially u
sense of the ambiguous
A Bayes classifier applie
when choosing a class
For each cases, choose t
The rule minimize the pr
Bayes decision rule
P(s'| c) > P(sk | c) for sk ≠ s'
isambiguation
sification (1/4)
useful information about which
word is likely to be used with it
es the Bayes decision rule
the class with the highest prob.
robability of error
'
Supervised Di
Bayesian Class
We want to assign the a
sense s′ , given the con
s' = arg max P(sk | c)
sk
= arg max P(c | sk ) P(sk )
P(c)
sk
= arg max P(c | sk )P(sk )
sk
= arg max[log P(c | sk ) + log
sk
isambiguation
sification (2/4)
ambiguous word w to the
ntext c
Baye’s Rule
g P(sk )] log
Supervised Di
Bayesian Class
Gale et al.’s classifier, th
An instance of a particula
Naïve Bayes assumption
P(c | sk ) = P({vj | vj inc}| s
All the context and linear
Each word is independen
Actually it’s not true, such
The simplifying assumpti
isambiguation
sification (3/4)
he Naïve Bayes classifier
ar kind of Bayes classifier
n
∏sk ) = vjinc P(vj | sk )
r ordering of words is ignored
nt of another
h as “president”
ion makes it more effective
Supervised Di
Bayesian Class
With the Naïve Bayes as
Decision rule for Naïve B
decide s’ if
s'= argmaxsk [log
P(vj | sk ) = C(v j , sk ) P(s
C(sk )
Choose s ' = arg maxs
isambiguation
sification (4/4)
ssumption :
Bayes
∑gP(sk ) + vjinc logP(vj | sk )]
sk ) = C(sk )
C(w)
sk score(sk )
Supervised Di
An information-theo
It tries to find a single c
reliably indicates which
word is being used.
Instead of use informatio
context, such as Bayes c
isambiguation
oretic approach (1/5)
contextual feature that
sense of the ambiguous
on from all words in the
classifier
Supervised Di
An information-theor
Two senses of the word
Prendre une mesure Æ t
Prendre une decision Æ
Flip-Flop algorithm <Bro
Let {t1,…,tm} be the tra
word
Let {x1,…xm} be the po
For prendre {t1,…,tm} Æ
For prendre {x1,…xm} Æ
isambiguation
retic approach (2/5)
d prendre
take a measure
make a decision
own et al.>
anslations of the ambiguous
ossible values of the indicator
Æ {take, make, rise, speak}
Æ {mesure, note, exemple,
decision, parole}
Supervised Di
An information-theo
Flip-Flop Algorithm:
find a random partition P
while (improving) do
find partition Q={Q1
that maximi
find partition P={P1
that maximi
end
Each iteration of the alg
mutual information I(P;
I (P;Q) = ∑ ∑ p
t∈P x∈Q
isambiguation
oretic approach (3/5)
P={P1,P2} for {t1,…, tm}
1, Q2} of {x1,…,xn}
izes I(P;Q)
1, P2} of {t1,…, tm}
izes I(P;Q)
gorithm increases the
;Q) monotonically.
p(t, x) log p(t, x)
p(t) p(x)
Supervised Di
An information-theor
The initiation partition p
P1={take, rise} P2={ma
Let’s assume prendre is
Q1={measure, note, exe
Q2={decision, parole}
Since this partition will m
The 2nd partition p
P1={take} P2={male, s
isambiguation
retic approach (4/5)
p
ake, speak}
s translated by take, so
emple}
maximize I(P;Q)
speak, rise}
Supervised Di
An information-theor
Disambiguation
For the occurrence of the
the value of the indicator
If the value is in Q1, ass
if the value is in Q2, assi
isambiguation
retic approach (5/5)
e ambiguous word, determine
r
sign the occurrence to sense 1
ign the occurrence to sense 2
Dictionary-Based
If we have no informati
categorization of a word
Relying on the senses in
Disambiguation based o
Thesaurus-based disam
Disambiguation based o
second-language corpus
One sense per discourse
collocation
d Disambiguation
ion about the sense
d
dictionaries and thesauri.
on sense definitions
mbiguation
on translations in a
s
e, one sense per
Dictionary-Based
disambiguation-based on
D1,..., Dk the dictionar
sreepnrseessenste1 ,d..a.s, tshke of th
definition. bag
v j is the word occurring
Ethvej is the dictionary defi
sense definitions of
d Disambiguation
n sense definitions (1/3)
ry definitions of the
he ambiguous word w,
of words occurring
g in the context c of w
inition of v j (union of all
v j)
Dictionary-Based
disambiguation-based on
The algorithm:
Given a context c for a
For all senses s1,…,sk of
score(sk ) = overlap
that is, overlap ( word set o
word set of dictionary defi
end
Choose the sense with hig
d Disambiguation
n sense definitions (2/3)
word w
f w do
( Dk , U v j in c Ev j )
of dictionary definition of sense Sk,
inition of Vj in context c )
ghest score.
Dictionary-Based
disambiguation-based on
Example ( Two Senses
Senses
S1 tree at
S2 burned stuff the
com
Score
S1 S2
0 1 This cigar b
The ash is o
1 0 leaf.
d Disambiguation
n sense definitions (3/3)
of ash ):
Definition
tree of the olive family
e solid residue left when
mbustible material is burned
Context
burns slowly and creates a stiff ash
one of the last trees to com into
Dictionary-Based
Thesaurus based di
This exploits the semantic c
thesaurus like Roget’s.
Semantic categories of the
Ædecide the semantic cate
Æthen decide which word s
(Walker,1987):Each word
subject codes which corresp
meanings.
For each subject code, we
(from the context) having th
We select the subject code
count.
d Disambiguation
isambiguation (1/2)
categorization provided by a
words in a context
egory of the context
sense are used
d is assigned one or more
ponds to its different
count the number of words
he same subject code.
e corresponding to the highest
Dictionary-Based
Thesaurus based di
Walker’s Algorithm
comment : given con
for all senses sk of
∑score(sk ) = v j
end
choose s, s.t. s, = arg
The unit value is either
d Disambiguation
isambiguation (2/2)
ntext c
w do
j in c ζ (t(sk ), v j )
g max sk score(sk )
1 or 0
Dictionary-Based
disambiguation-based on tran
(1
This method makes use
correspondences in a bi
First language
The one for which we wa
Second language
Target language in the b
For example, if we want
based on German corpu
language, and the Germ
d Disambiguation
nslations in a second-language
1/3)
e of word
ilingual dictionary.
ant to do disambiguation
bilingual dictionary
t to disambiguate English
us, then English is the 1st
man is the 2nd language.
Dictionary-Based
disambiguation-based on tran
(2/
For the word “interest” :
Sense1
Definition legal sha
Translation Beteiligun
Collocation acquire an in
Translation Beteiligung e
d Disambiguation
nslations in a second-language
/3)
:
1 Sense2
are attention, concern
ng Interesse
nterest show interest
erwerben Interesse zeigen