Shallow Parsing using Noisy and Non-Stationary Training Material
Miles Osborne;
2(Mar):695-719, 2002.
Abstract
Shallow parsers are usually assumed to be trained on
noise-free
material, drawn from the same distribution as the testing
material. However, when the training set is either noisy or drawn from a different distribution,
performance may be degraded. Using the parsed Wall Street Journal, we
investigate the performance of four shallow parsers (maximum entropy,
memory-based learning, N-grams and ensemble learning) trained using
various types of artificially noisy material. Our first set of results shows that shallow parsers
are surprisingly robust to synthetic noise, with performance gradually
decreasing as the rate of noise increases. Further
results show that no single shallow parser performs best in all noise
situations. Final results show that simple, parser-specific extensions
can improve noise-tolerance.
Our second set of results addresses the question of whether naturally
occurring disfluencies undermine performance more than does a change
in distribution. Results using the parsed Switchboard corpus suggest
that, although naturally
occurring disfluencies might harm performance, differences in
distribution between the training set and the testing set are more
significant.