
Introduction

For more than a decade now, Natural Language has been a common application domain for machine learning algorithms. This article presents ALLiS, a top-down inductive system for learning linguistic structures. Two difficulties arose during the development of the system: a significant amount of noise in the data, and (linguistically motivated) exceptions in the encoding schemata. Learning rules from this kind of data is a challenge for an inductive system. The problem is not new and has been addressed by, among others, [Quinlan(1986)] and [Brunk and Pazzani(1991)]. Noise occurs in training data when a datum is not assigned to the correct class. But noise is not the only problem: Natural Language contains many sub-regularities and exceptions that cannot be considered noise.

This leads us to add a specific mechanism for handling such problems: refinement. Whenever a rule is learned, exceptions to this rule are systematically searched for. The result of this algorithm is a set of rules, each associated with a set of exceptions. In the first part of this article, we evaluate the usefulness of this device and show that it improves results when learning linguistic structures. We also show that, with refinement, some traditional problems that arise when learning sets of rules, such as threshold determination, disappear if appropriate prior knowledge is used.
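The idea of a rule carrying its own exceptions can be illustrated with a small sketch. This is not the ALLiS implementation: the `Rule` class, the token dictionaries, the labels `IN` and `OUT`, and the example rule are all hypothetical and chosen only to show how an exception overrides the rule it is attached to.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical token representation: a word and its part-of-speech tag.
Token = Dict[str, str]


@dataclass
class Rule:
    """A condition that assigns a label, plus exceptions that override it."""
    condition: Callable[[Token], bool]
    label: str
    exceptions: List["Rule"] = field(default_factory=list)

    def apply(self, token: Token) -> Optional[str]:
        # The rule says nothing about tokens it does not match.
        if not self.condition(token):
            return None
        # Exceptions are consulted first; each may carry its own exceptions.
        for exception in self.exceptions:
            result = exception.apply(token)
            if result is not None:
                return result
        return self.label


# Made-up general rule: adjectives are labelled IN (inside a structure),
# with one made-up lexical exception labelled OUT.
general = Rule(
    condition=lambda t: t["pos"] == "JJ",
    label="IN",
    exceptions=[Rule(condition=lambda t: t["word"] == "due", label="OUT")],
)

print(general.apply({"word": "big", "pos": "JJ"}))
print(general.apply({"word": "due", "pos": "JJ"}))
print(general.apply({"word": "run", "pos": "VB"}))
```

In this sketch an exception is only ever tested on tokens that the parent rule already covers, which mirrors the description above of exceptions being searched for once a rule has been learned.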

In the second part, we explore a second way of improving the efficiency of the system, namely the use of prior knowledge. Since Natural Language is a strongly structured object, it is worth investigating whether structural linguistic knowledge can make natural language learning more efficient and more accurate. The utility of (prior) knowledge has been shown for inductive systems [see pazzani92, cardie99integrating]. This article presents some experiments that try to answer the question: what kind of linguistic knowledge can improve learning?

This article is organized as follows. The inductive learning system ALLiS is described first, and a first evaluation using no prior knowledge is given. The results of this experiment without linguistic knowledge serve as the baseline against which the effect of prior knowledge is assessed. The linguistic prior knowledge is then detailed, and we discuss its (positive) effect from both a computational and a qualitative viewpoint. The system has been applied to the shared task of the CoNLL'00 workshop, and we provide a quantitative and qualitative analysis of these results. Finally, we compare our algorithm with related systems, especially FOIDL.

A description of the UPenn tagset used throughout the article is given in Appendix 8.


Hammerton J. 2002-03-13