Word sense disambiguation
By Wikipedia,
the free encyclopedia,
http://en.wikipedia.org/wiki/Word_sense_disambiguation
In computational
linguistics, word sense disambiguation (WSD)
is the process of identifying which sense
of a word
is used in any given sentence,
when the word has a number of distinct senses.
For example, consider two of the distinct senses
that exist for the word bass:
- a type of fish
- tones of low frequency
and the sentences:
- I went fishing for some sea bass
- The bass line of the song is very moving
To a human, it is obvious that the first sentence uses
the word bass in the former sense (a type of fish) and that
the second sentence uses it in the latter sense (tones of
low frequency). Developing algorithms to replicate this
human ability can often be a difficult task.
Difficulties
One problem with word sense disambiguation is deciding
what the senses are. In cases like the word bass
above, at least some senses are obviously different. In
other cases, however, the different senses can be closely
related (one meaning being a metaphorical
or metonymic
extension of another), and in such cases division of words
into senses becomes much more difficult. Different dictionaries
will provide different divisions of words into senses. One
solution some researchers have used is to choose a particular
dictionary, and just use its set of senses. Generally, however,
research results using broad sense distinctions have
been much better than those using narrow ones, so most researchers
ignore fine-grained distinctions in their work.
Another problem is inter-judge variance.
WSD systems are normally tested by having their results
on a task compared against those of a human. However, humans
do not agree on the task at hand: given a list of senses
and sentences, humans will not always agree on which
sense a word is used in. A computer cannot be expected
to give better performance on such a task than a human (indeed,
since the human serves as the standard, the computer being
better than the human is incoherent), so human performance
serves as an upper bound. Human performance, however, is
much better on coarse-grained than on fine-grained distinctions,
which is a further reason why research on coarse-grained
distinctions is most useful.
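As a rough illustration of inter-judge variance, the sketch below computes the fraction of items on which two human judges chose the same sense. The function name and the sample annotations are hypothetical and are not taken from any published evaluation.

```python
def observed_agreement(labels_a, labels_b):
    """Fraction of items on which two judges chose the same sense."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both judges must label the same items")
    if not labels_a:
        raise ValueError("need at least one labelled item")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical sense labels for four occurrences of "bass".
judge_1 = ["fish", "music", "music", "fish"]
judge_2 = ["fish", "music", "fish", "fish"]

agreement = observed_agreement(judge_1, judge_2)  # 3 of 4 items match
```

A system scored against one judge's labels can only be expected to approach the level at which a second judge would agree with the first.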
Approaches
As in all natural
language processing, there are two main approaches to
WSD — deep approaches and shallow approaches.
Deep approaches presume access to a comprehensive body
of world
knowledge. Knowledge, such as "you can go fishing for
a type of fish, but not for low frequency sounds" and "songs
have low frequency sounds as parts, but not types of fish",
is then used to determine in which sense the word is used.
These approaches are not very successful in practice, mainly
because such a body of knowledge does not exist in a computer-readable
format, outside of very limited domains. However, if such
knowledge did exist, then deep approaches would be much
more accurate than the shallow approaches. There is also
a long tradition in computational
linguistics of trying such approaches in terms of coded
knowledge, and in some cases it is hard to say clearly whether
the knowledge involved is linguistic or world knowledge.
The first attempt was that by Margaret Masterman and her
colleagues, at the Cambridge Language Research Unit in England,
in the 1950s. This attempt used as data a punched-card version
of Roget's Thesaurus and its numbered "heads", as an indicator
of topics and looked for repetitions in text, using a set
intersection algorithm. It was not very successful, as is
described in some detail in (Wilks, Y. et al., 1996), but
had strong relationships to later work, especially Yarowsky's
machine learning optimisation of a thesaurus method in the
1990s.
Shallow approaches don't try to understand the text. They
just consider the surrounding words, using information such
as "if bass has words sea or fishing
nearby, it probably is in the fish sense; if bass
has the words music or song nearby, it is
probably in the music sense." These rules can be automatically
derived by the computer, using a training corpus of words
tagged with their word senses. This approach, while theoretically
not as powerful as deep approaches, gives superior results
in practice, due to the computer's limited world knowledge.
It can, however, be confused by sentences like The dogs
bark at the tree, which contains the word bark
near both tree and dogs.
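The kind of surrounding-word rule quoted above can be sketched as a small lookup. This is only an illustration: the cue words are the ones named in the text, while the function name, the scoring by simple overlap, and the choice to return None when no cue is present are assumptions.

```python
import re

# Cue words taken from the example rule in the text.
SENSE_CUES = {
    "fish": {"sea", "fishing"},
    "music": {"music", "song"},
}


def guess_sense(sentence, cues=SENSE_CUES):
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & cue_words) for sense, cue_words in cues.items()}
    best = max(scores, key=scores.get)
    # With no cue word present there is no evidence either way.
    return best if scores[best] > 0 else None
```

Applied to the two sentences from the introduction, this returns the fish sense for the first and the music sense for the second. A sentence containing cues for both senses would be decided arbitrarily, which is the weakness the bark example points to.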
These approaches normally work by defining a window of
N content words around each word to be disambiguated
in the corpus, and statistically analyzing those N
surrounding words. Two shallow approaches used to train
and then disambiguate are Naïve
Bayes classifiers and decision
trees. In recent research, kernel-based methods
such as support
vector machines have shown superior performance in supervised
learning. Over the last few years, however, there has not
been any major improvement in the performance of any of these
methods.
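A minimal sketch of the window-plus-classifier idea, assuming a tiny hand-made training set and stopword list (both hypothetical), might look as follows. It extracts up to N content words on each side of the target word and trains a Naive Bayes classifier with add-one smoothing; real systems use far larger corpora and richer features.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; anything not listed counts as a content word.
STOPWORDS = {"the", "a", "of", "for", "is", "i", "some", "very", "we", "from", "has"}


def context_window(sentence, target, n=3):
    """Return up to n content words on each side of the target word."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    content = [t for t in tokens if t not in STOPWORDS]
    if target not in content:
        return []
    i = content.index(target)
    return content[max(0, i - n):i] + content[i + 1:i + 1 + n]


def train(examples, target, n=3):
    """Count senses and the context words seen with each sense."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, sense in examples:
        sense_counts[sense] += 1
        for word in context_window(sentence, target, n):
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab


def classify(sentence, target, model, n=3):
    """Return the sense with the highest Naive Bayes log-probability."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context_window(sentence, target, n):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# The first two sentences come from the article; the last two are made up.
EXAMPLES = [
    ("I went fishing for some sea bass", "fish"),
    ("The bass line of the song is very moving", "music"),
    ("We grilled the bass from the sea", "fish"),
    ("The song has a heavy bass", "music"),
]
model = train(EXAMPLES, "bass")
```

With this toy model, a new sentence mentioning the sea near bass is assigned the fish sense, and one mentioning a song is assigned the music sense.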
It is instructive to compare the word sense disambiguation
problem with the problem of part-of-speech
tagging. Both involve disambiguating or tagging
words, be it with senses or parts of speech. However, algorithms
used for one do not tend to work well for the other, mainly
because the part of speech of a word is primarily determined
by the immediately adjacent one to three words, whereas
the sense of a word may be determined by words further away.
The success rate for part-of-speech tagging algorithms is
at present much higher than that for WSD: the state of the
art is around 95% accuracy or better for tagging, compared
with less than 75% accuracy for word sense disambiguation
with supervised learning. These figures are typical for
English and may be very different for other languages.
Another aspect of word sense disambiguation that differentiates
it from part-of-speech tagging is the availability of training
data. While it is relatively easy to assign parts of speech
to text, training people to tag senses is far more difficult [1].
While annotators can memorize all of the possible parts of speech
a word can take, it is impossible for individuals to memorize
all of the senses a word can take. Thus, many word sense
disambiguation algorithms use semi-supervised
learning, which allows both labeled and unlabeled data.
The Yarowsky algorithm was an early example of such an algorithm.
Yarowsky’s unsupervised
algorithm uses the ‘One sense per collocation’ and the
‘One sense per discourse’ properties of human languages
for word sense disambiguation. From observation, words tend
to exhibit only one sense within a given discourse and within
a given collocation. The corpus is initially untagged.
The algorithm starts with a large corpus, in which it identifies
examples of the given polysemous word, and stores all the
relevant sentences as lines. For instance, Yarowsky uses
the word ‘plant’ in his 1995 paper to demonstrate the algorithm.
Assuming that there are two possible senses of the word, the
next step is to identify a small number of seed collocations
representative of each sense, give each sense a label (for
example, sense A and sense B), and then assign the appropriate
label to all training examples containing the seed collocations.
In this case, the words ‘life’ and ‘manufacturing’ are chosen as
initial seed collocations for senses A and B respectively.
The residual examples (85% - 98% according to Yarowsky)
remain untagged.
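The seed-tagging step can be sketched as follows. The seed words ‘life’ and ‘manufacturing’ come from the text; the corpus lines, the function name, and the decision to leave a line untagged when it contains seeds for both senses are assumptions made for illustration.

```python
def tag_with_seeds(lines, seeds):
    """Split corpus lines into seed-tagged examples and an untagged residual.

    `seeds` maps a seed collocation word to its sense label.
    """
    tagged, residual = [], []
    for line in lines:
        words = set(line.lower().split())
        senses = {sense for seed, sense in seeds.items() if seed in words}
        if len(senses) == 1:
            tagged.append((line, senses.pop()))
        else:
            # No seed, or conflicting seeds: leave the line for later passes.
            residual.append(line)
    return tagged, residual


SEEDS = {"life": "A", "manufacturing": "B"}

# Hypothetical corpus lines containing the target word "plant".
LINES = [
    "plant life on the island",
    "the manufacturing plant closed",
    "the plant grew quickly",
]

tagged, residual = tag_with_seeds(LINES, SEEDS)
```

Here the first two lines receive labels A and B, and the third stays in the residual.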
The algorithm should initially choose representative seed
collocations that distinguish senses A and B accurately
and productively. This can be done by selecting seed words
from a dictionary’s entry for each sense. Collocations
tend to have a stronger effect when they are adjacent to the
target word, and the effect weakens with distance. According
to the criteria given in Yarowsky (1993), seed words that
appear in the most reliable collocational relationships
with the target word are selected. The effect is much
stronger for words in a predicate-argument relationship
than for arbitrary associations at the same distance from
the target word, and is much stronger for collocations with
content words than with function words. That said,
a collocation word can have several collocational relationships
with the target word throughout the corpus, which could give
the word different rankings or even different classifications.
Alternatively, seeds can be chosen by identifying a single defining
collocate for each class and using as seeds only those
contexts containing one of these defining words. A publicly
available database called WordNet can be used as an automatic
source for such defining terms. In addition, words that
occur near the target word with great frequency can be selected
as representative seed collocations. This last approach is not
fully automatic: a human judge must decide which word is
selected for each sense of the target word, but the resulting
seeds will be reliable indicators of the senses.
A decision-list algorithm is then used to identify other
reliable collocations. This training algorithm calculates
the probability P(Sense | Collocation), and the decision
list is ranked by the log-likelihood ratio:
log( P(Sense_A | Collocation_i) / P(Sense_B | Collocation_i) )
A smoothing
algorithm is then used to avoid 0 values. The decision-list
algorithm resolves many problems that arise in a large set of
non-independent evidence sources by using only the most reliable
piece of evidence rather than the whole matching collocation set.
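The ranking step can be sketched as below. Because both probabilities are conditioned on the same collocation, their ratio reduces to a ratio of counts, so the sketch works with counts directly. The additive smoothing constant `alpha`, the function name, and the sample contexts are assumptions; the source does not say which smoothing method is used.

```python
import math
from collections import Counter


def build_decision_list(tagged, alpha=0.1):
    """Rank collocations by |log-likelihood ratio| between senses A and B.

    `tagged` is a list of (context_words, sense) pairs.
    `alpha` is an assumed additive smoothing constant that avoids zero counts.
    """
    counts = {"A": Counter(), "B": Counter()}
    for words, sense in tagged:
        counts[sense].update(set(words))
    rules = []
    for word in set(counts["A"]) | set(counts["B"]):
        # P(A | word) / P(B | word) equals count(A, word) / count(B, word).
        llr = math.log((counts["A"][word] + alpha) / (counts["B"][word] + alpha))
        if llr != 0:
            rules.append((abs(llr), word, "A" if llr > 0 else "B"))
    rules.sort(reverse=True)  # most reliable evidence first
    return rules


# Hypothetical tagged contexts for "plant" (target word removed).
TAGGED = [
    (["life", "animal"], "A"),
    (["life", "species"], "A"),
    (["manufacturing", "equipment"], "B"),
    (["manufacturing", "car"], "B"),
]

rules = build_decision_list(TAGGED)
```

In this toy data the two seed words occur most often with their own sense, so they end up at the top of the list, each pointing to its sense.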
The resulting classifier is then applied to the
whole sample set. Examples in the residual that
are tagged as A or B with probability above a reasonable
threshold are added to the seed sets. The decision-list algorithm
and this adding step are then applied iteratively. As more newly learned
collocations are added to the seed sets, the sense A and
sense B sets grow and the original residual shrinks.
However, these collocations stay in the seed sets only if
their probability of classification remains above the threshold;
otherwise they are returned to the residual for later classification.
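The iterative step can be sketched as a loop that retrains a decision list on the currently tagged examples and then relabels the whole sample set, keeping only labels whose best-matching rule clears a threshold. This is a simplified sketch, not Yarowsky's implementation: the threshold is applied to the log-likelihood score rather than to a probability, and the smoothing constant, helper names, and contexts are hypothetical.

```python
import math
from collections import Counter


def build_rules(tagged, alpha=0.1):
    """Decision list of (score, word, sense), strongest rule first."""
    counts = {"A": Counter(), "B": Counter()}
    for words, sense in tagged:
        counts[sense].update(set(words))
    rules = []
    for word in set(counts["A"]) | set(counts["B"]):
        llr = math.log((counts["A"][word] + alpha) / (counts["B"][word] + alpha))
        if llr != 0:
            rules.append((abs(llr), word, "A" if llr > 0 else "B"))
    return sorted(rules, reverse=True)


def classify(words, rules, threshold):
    """Use only the single strongest matching rule, if it is reliable enough."""
    for score, word, sense in rules:
        if word in words:
            return sense if score >= threshold else None
    return None


def bootstrap(contexts, seeds, threshold=1.0, max_iters=10):
    """Grow the tagged sets from seed collocations until they stop changing."""
    labels = {}
    for i, words in enumerate(contexts):
        senses = {seeds[w] for w in words if w in seeds}
        if len(senses) == 1:
            labels[i] = senses.pop()
    rules = []
    for _ in range(max_iters):
        rules = build_rules([(contexts[i], s) for i, s in labels.items()])
        # Relabel everything; examples below the threshold fall back
        # into the residual (they are simply absent from new_labels).
        new_labels = {}
        for i, words in enumerate(contexts):
            sense = classify(words, rules, threshold)
            if sense is not None:
                new_labels[i] = sense
        if new_labels == labels:
            break
        labels = new_labels
    return labels, rules


# Hypothetical contexts of "plant" with the target word removed.
CONTEXTS = [
    ["life", "animal"],
    ["animal", "species"],
    ["manufacturing", "equipment"],
    ["equipment", "factory"],
    ["species", "rare"],
    ["garden"],
]
labels, rules = bootstrap(CONTEXTS, {"life": "A", "manufacturing": "B"})
```

Starting from two seed-tagged contexts, the labels spread through shared collocates (animal, then species) over successive passes, while the context that shares no collocate with either set stays in the residual.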
At the end of each iteration, the ‘One sense per discourse’
property can be used to help prevent initially mistagged
collocates and hence improve the purity of the seed sets.
In order to avoid strong collocates becoming indicators
for the wrong class, the class-inclusion threshold needs
to be randomly altered. For the same purpose, after intermediate
convergence the algorithm will also need to increase the
width of the context window.
The algorithm will continue to iterate until no more reliable
collocations are found. The ‘One sense per discourse’ property
can be used here for error correction. For a target word
that has a binary sense partition, if the occurrences of
the majority sense A exceed those of the minority sense B by
a certain threshold, the minority occurrences are relabeled
as A. According to Yarowsky, for any sense to be clearly
dominant, there should be at least 4 occurrences of the
target word.
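This correction step can be sketched for the labels found within one discourse. The minimum of 4 occurrences comes from the text; the `margin` parameter stands in for the unspecified threshold and its default value is an assumption, as is the choice to leave untagged occurrences (None) untouched.

```python
from collections import Counter


def one_sense_per_discourse(labels, min_occurrences=4, margin=2):
    """Relabel minority-sense occurrences in one discourse as the majority sense.

    `labels` holds the sense assigned to each occurrence of the target word
    in a single discourse, or None where the occurrence is still untagged.
    """
    counts = Counter(label for label in labels if label is not None)
    if sum(counts.values()) < min_occurrences:
        return list(labels)  # too few occurrences for a clear majority
    ranked = counts.most_common(2)
    majority, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    if top - runner_up < margin:
        return list(labels)  # no clearly dominant sense
    return [majority if label is not None else None for label in labels]
```

For example, a discourse labelled A, A, A, B would be relabelled entirely as A, while a discourse with only two tagged occurrences would be left unchanged.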
When the algorithm converges on a stable residual set,
a final decision list of the target word is obtained. The
most reliable collocations are at the top of the new list
instead of the original seed words. The original untagged
corpus is then tagged with sense labels and probabilities.
The final decision list may now be applied to new data:
the collocation with the highest rank in the list is used
to classify the new data. For example, if the highest ranking
collocation of the target word in the new data set is of
sense A, then the target word is classified as sense A.
Notes
- [1] Fellbaum, Christiane (1997). Analysis of a handtagging task. Proceedings
of ANLP-97 Workshop on Tagging Text with Lexical Semantics:
Why, What, and How? Washington D.C., USA.
References
- Wilks, Y., Slator, B., Guthrie, L. (1996) Electric Words:
dictionaries, computers and meanings. Cambridge, MA: MIT
Press.
- Chou, X. Y. (2007) Yarowsky’s unsupervised algorithm.
Oxford Computing Lab.
Source: http://en.wikipedia.org/wiki/Word_sense_disambiguation
Published - November 2008
Information from Wikipedia
is available under the terms of the GNU Free Documentation
License