Learning Semantic Lexicons from a Part-of-Speech and Semantically Tagged Corpus Using Inductive Logic Programming
Vincent Claveau, Pascale Sébillot, Cécile Fabre, Pierrette Bouillon; 4(Aug):493-525, 2003.
Abstract
This paper describes an inductive logic programming learning method designed to
acquire from a corpus specific Noun-Verb (N-V) pairs (relevant in information
retrieval applications to perform index expansion) in
order to build up semantic lexicons based on Pustejovsky's generative lexicon
(GL) principles (Pustejovsky, 1995). In one of the components of this lexical model,
called the qualia structure, words are described in terms of semantic
roles. For example, the telic role indicates the purpose or function of
an item (cut for knife), the agentive role its creation mode
(build for house), etc. The qualia structure of a noun is
mainly made up of verbal associations, encoding relational information. The
learning method enables us to
automatically extract, from a morpho-syntactically and semantically tagged
corpus, N-V pairs whose elements are linked by one of the semantic relations
defined in the qualia structure in GL. It also infers rules explaining what in
the surrounding context distinguishes such pairs from other N-V pairs that also
occur in sentences of the corpus but are not relevant. Emphasis is placed on the
learning efficiency required to deal with all the available
contextual information and to produce linguistically meaningful rules.
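
As an illustration of the kind of lexicon entry described above, here is a minimal Python sketch of a noun's qualia structure restricted to the two roles named in the abstract. The class `QualiaStructure`, the `lexicon` dictionary, and the function `qualia_role` are hypothetical names introduced for this example only; they are not taken from the paper, and the paper's inductive logic programming method is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class QualiaStructure:
    """Verbs associated with one noun, grouped by qualia role.

    Only the two roles mentioned in the abstract are modelled (assumption).
    """
    telic: Set[str] = field(default_factory=set)     # purpose or function
    agentive: Set[str] = field(default_factory=set)  # creation mode

    def role_of(self, verb: str) -> Optional[str]:
        """Return the role linking this noun to `verb`, or None if there is none."""
        if verb in self.telic:
            return "telic"
        if verb in self.agentive:
            return "agentive"
        return None


# Toy lexicon built from the two examples given in the abstract.
lexicon: Dict[str, QualiaStructure] = {
    "knife": QualiaStructure(telic={"cut"}),
    "house": QualiaStructure(agentive={"build"}),
}


def qualia_role(noun: str, verb: str) -> Optional[str]:
    """Look up whether an N-V pair is linked by a qualia role in the toy lexicon."""
    entry = lexicon.get(noun)
    return entry.role_of(verb) if entry is not None else None
```

In this sketch, `qualia_role("knife", "cut")` yields the telic role, while a pair with no recorded relation yields `None`, mirroring the distinction between relevant and non-relevant N-V pairs.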