An Empirical Study of the Use of Relevance Information in Inductive Logic Programming

Ashwin Srinivasan, Ross D. King, Michael E. Bain; 4(Jul):369-383, 2003.


Inductive Logic Programming (ILP) systems construct models for data using domain-specific background information. When using these systems, it is typically assumed that sufficient human expertise is at hand to rule out irrelevant background information. Such irrelevant information can, and typically does, hinder an ILP system's search for good models. Here, we provide evidence that if expertise is available that can provide a partial-ordering on sets of background predicates in terms of relevance to the analysis task, then this can be used to good effect by an ILP system. In particular, using data from biochemical domains, we investigate an incremental strategy of including sets of predicates in decreasing order of relevance. Results obtained suggest that: (a) the incremental approach identifies, in substantially less time, a model that is comparable in predictive accuracy to that obtained with all background information in place; and (b) the incremental approach using the relevance ordering performs better than one that does not (that is, one that adds sets of predicates randomly). For a practitioner concerned with use of ILP, the implication of these findings are two-fold: (1) when not all background information can be used at once (either due to limitations of the ILP system, or the nature of the domain) expert assessment of the relevance of background predicates can assist substantially in the construction of good models; and (2) good "first-cut" results can be obtained quickly by a simple exclusion of information known to be less relevant.