FOIDL is based on FOIL [see Quinlan(1990)] and differs from it on two points: it employs intensional background knowledge, and it avoids the need for explicit negative examples by relying on the output completeness hypothesis. The system outputs a decision list in reverse order: the first rules learned tend to be more general and can be considered default rules, so they are positioned towards the end of the list [see Webb(1993)]. The most interesting point for us is the way this reverse-ordered decision list is used to learn exceptions.
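To illustrate how such a reverse-ordered decision list behaves at application time, here is a minimal propositional sketch in Python (the rules, the past-tense toy task, and the function names are our own hypothetical examples; FOIDL itself learns first-order clauses):

```python
# A decision list is an ordered list of (condition, action) rules; the first
# rule whose condition matches decides the output.  In FOIDL's output order,
# the rules learned last (the most specific ones, i.e. the exceptions) sit at
# the front, and the rules learned first (the general defaults) sit at the end.
# The rules below are hypothetical toy examples for English past tense.

decision_list = [
    (lambda w: w == "go",        lambda w: "went"),    # exception, learned last
    (lambda w: w.endswith("e"),  lambda w: w + "d"),   # more general rule
    (lambda w: True,             lambda w: w + "ed"),  # default, learned first
]

def apply_decision_list(word):
    """Return the output of the first matching rule."""
    for condition, action in decision_list:
        if condition(word):
            return action(word)

assert apply_decision_list("go") == "went"
assert apply_decision_list("smile") == "smiled"
assert apply_decision_list("walk") == "walked"
```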
The algorithm generating exceptions is quite different from that of ALLiS.
Whenever a learned rule (or clause, in the FOIDL terminology) does not meet the minimum clause-accuracy threshold (similar to our own threshold), the positive examples covered by the rule are prepended to the decision list. It is unclear whether the frequency of these examples has to be greater than 1, the minimum number of examples that a rule must cover.
If the minimum clause-accuracy threshold is met, exceptions can be learned as follows: if the negatives covered by the rule are all examples that are covered by previously learned rules, FOIDL returns them to the set of positive examples to be covered by subsequently learned rules.
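A rough sketch of this mechanism, under a propositional simplification, might look as follows (the function and variable names, the representation of a rule as a boolean predicate, and the placement of the 0.50 threshold reflect our reading rather than FOIDL's actual first-order implementation):

```python
# Sketch of FOIDL-style exception handling for one candidate rule
# (propositional simplification; names and data representation are assumptions).

MIN_CLAUSE_ACCURACY = 0.50   # minimum clause-accuracy threshold

def handle_rule(rule, positives, negatives, decision_list, covered_by_earlier):
    covered_pos = [e for e in positives if rule(e)]
    covered_neg = [e for e in negatives if rule(e)]
    accuracy = len(covered_pos) / max(1, len(covered_pos) + len(covered_neg))

    if accuracy < MIN_CLAUSE_ACCURACY:
        # Rule rejected: its covered positives are prepended to the decision
        # list as maximally specific entries (memorized examples).
        decision_list[:0] = covered_pos
        return

    decision_list.insert(0, rule)
    if covered_neg and all(e in covered_by_earlier for e in covered_neg):
        # The negatives covered here were already handled by previously
        # learned rules: hand them back to the positives so that rules
        # learned later (and placed in front of this one) can re-cover them.
        positives.extend(covered_neg)
```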
As in ALLiS, exceptions to rules are processed through a simple modification of the sets of positive and negative examples, even if the modifications are not comparable. The mechanism used in FOIDL, however, postpones the learning of the exceptions to the rule, and does not explicitly link the two as ALLiS does.
Note that the minimum clause-accuracy threshold, which corresponds to our own threshold, is also set to 0.50, but without further explanation.
We empirically showed (Section 3) that the best result is obtained with this value.
FOIDL also includes a parameter for the minimum number of examples that a rule must cover, which is set to 2.
Like ALLiS, the algorithm can also learn exceptions to the exceptions to a rule. Unlike ALLiS, however, it keeps no explicit relation between a rule and its exceptions.
For computational reasons, we were unfortunately unable to train FOIDL with our training data, which is too large. [Mooney and Califf(1995)] already mentioned memory limitations for some experiments, and used only 500 examples as training data, whereas our training data contains 500,000 words. This problem seems to be mainly due to the mechanism used for handling the output completeness hypothesis. Remember that FOIDL learns first-order decision lists and uses Inductive Logic Programming methods, whereas ALLiS 'only' learns propositional rules (attribute-value formalism). The ILP formalism is far more powerful than the propositional one.
If we go back to TBL, it can also be viewed as handling exceptions in a way similar to FOIDL. Once a rule is learned, it is applied to the training data, potentially generating errors. Subsequent rules can then transform (correct) these errors. These subsequent rules can be considered exceptions to the rule that generated the errors.
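A minimal sketch of this transformation-based view is given below (a greedy loop over hypothetical candidate rules and a crude initial annotation; real TBL rule templates and scoring are considerably richer):

```python
# Minimal sketch of transformation-based learning (TBL) seen as exception
# handling: each rule is applied to the current tags, and later rules correct
# the errors left (or introduced) by earlier, more general rules.
# The tagging task, initial tag, and candidate rules are hypothetical.

def tbl_train(tokens, gold_tags, candidate_rules, max_rules=10):
    """Greedily pick, at each step, the rule that corrects the most errors."""
    tags = ["NN"] * len(tokens)          # crude initial annotation
    learned = []
    for _ in range(max_rules):
        best_rule, best_gain = None, 0
        for rule in candidate_rules:     # rule: (token, tag) -> new tag
            new_tags = [rule(tok, tag) for tok, tag in zip(tokens, tags)]
            gain = sum(n == g for n, g in zip(new_tags, gold_tags)) \
                 - sum(t == g for t, g in zip(tags, gold_tags))
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:
            break
        tags = [best_rule(tok, tag) for tok, tag in zip(tokens, tags)]
        learned.append(best_rule)        # later rules act as exceptions
    return learned
```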
We think it is important to stress that, despite their many differences, these three systems (ALLiS, FOIDL, and TBL) share a common point at a high level: working on natural language, where noise and/or exceptions in the data are quite frequent, they all had to develop an exception mechanism for handling these phenomena.