<RULE S='I-NP' ACC='1.00' FREQ='17'> <W C='IN' W='the'/> </RULE>
<RULE S='I-NP' ACC='0.89' FREQ='8'> <W C='NNS' LEFT='1'/> <W C='VBP' /> <W C='VBD' RIGHT='1'/> </RULE>
The first rule is a trivial example of a wrong rule learned because of tagger errors. The second rule deals with the problem of the noun/verb distinction. In this context, a finite verb (VBP) belongs to an NP. This corresponds to a tagging error: the word is tagged VBP but should be tagged NN, as in the following example:
[...] a/DT federal/JJ appeals/NNS court/VBP vacated/VBD an/DT earlier/JJR summary/NN judgment/NN [...]
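To make the rule format concrete, the following sketch applies the second rule to the mis-tagged example above. The dictionary encoding and the matches function are illustrative assumptions about how the attributes are interpreted (C as the required POS tag, W as the required word form, LEFT/RIGHT as offsets from the focus word, S as the chunk tag assigned when the rule fires); the actual system may represent and apply rules differently.

# A minimal sketch of applying a contextual rule to a tagged sentence.
# The attribute interpretation (C = required POS tag, W = required word
# form, LEFT/RIGHT = offset from the focus word, S = chunk tag assigned
# when the rule fires) is an assumption drawn from the examples above.

def matches(rule, sentence, i):
    """Return True if every constraint of `rule` holds around position `i`.

    `sentence` is a list of (word, POS) pairs; each constraint gives an
    offset from the focus position plus optional 'c' (POS) and 'w' (word)
    tests.
    """
    for cons in rule["constraints"]:
        j = i + cons["offset"]          # LEFT='1' -> -1, RIGHT='1' -> +1
        if not 0 <= j < len(sentence):
            return False
        word, pos = sentence[j]
        if "c" in cons and pos != cons["c"]:
            return False
        if "w" in cons and word.lower() != cons["w"]:
            return False
    return True

# The second rule above: an NNS on the left, a VBP in focus, a VBD on the right.
rule = {
    "s": "I-NP",
    "constraints": [
        {"offset": -1, "c": "NNS"},     # <W C='NNS' LEFT='1'/>
        {"offset": 0,  "c": "VBP"},     # <W C='VBP'/>
        {"offset": 1,  "c": "VBD"},     # <W C='VBD' RIGHT='1'/>
    ],
}

# The mis-tagged example: 'court' is tagged VBP although it should be NN.
sentence = [("a", "DT"), ("federal", "JJ"), ("appeals", "NNS"),
            ("court", "VBP"), ("vacated", "VBD"), ("an", "DT")]

for i, (word, _) in enumerate(sentence):
    if matches(rule, sentence, i):
        print(word, "->", rule["s"])    # prints: court -> I-NP

The rule thus assigns I-NP to the mis-tagged word court, which is how such low-frequency rules repair tagger noise.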
The frequency of these rules is generally low (usually less than 5). This explains why it is important to learn rules with low frequency (Section 3): they repair noise.
It is therefore very important to use the same tagger that was used to tag the training data, since a different tagger would not generate exactly the same errors. This might be a way to improve a specific tagger, by learning its errors. We estimate that around 10% of the errors can be learned and fixed.
The second kind of rules consists of the lexicalized rules. As noted in Section 3, the majority of these rules have a low frequency. They generally include the feature W (the word itself), as the following rule shows.
<RULE S='I-NP' FREQ='12'> <W C='VBG' W='operating'/> <W C='NN' RIGHT='1'/> </RULE>
This rule is learned because of the presence of frequent terms such as operating system and chief operating officer. The utility of these lexicalized rules therefore seems to depend strongly on the domain of the corpus used as training data, since such rules offer little possibility of generalization. Rules without lexicalized items depend less on the corpus.
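As an illustration of the W feature, the lexicalized rule above can be written in the same encoding as the earlier sketch and checked against a typical Wall Street Journal context; the encoding, the chosen sentence, and the variable names are again assumptions for illustration, not the system's own representation.

# The lexicalized rule above, in the same dictionary encoding as the earlier
# sketch: the focus word must be 'operating' tagged VBG, and the word to its
# right must be tagged NN; the rule then assigns I-NP to 'operating'.
lexicalized_rule = {
    "s": "I-NP",
    "constraints": [
        {"offset": 0, "c": "VBG", "w": "operating"},  # <W C='VBG' W='operating'/>
        {"offset": 1, "c": "NN"},                     # <W C='NN' RIGHT='1'/>
    ],
}

# A context in which the rule fires: 'operating' is tagged VBG and is
# followed by 'officer'/NN.
sentence = [("the", "DT"), ("chief", "JJ"), ("operating", "VBG"),
            ("officer", "NN"), ("resigned", "VBD")]

i = 2  # focus position of 'operating'
fires = True
for cons in lexicalized_rule["constraints"]:
    word, pos = sentence[i + cons["offset"]]
    if pos != cons.get("c", pos) or word != cons.get("w", word):
        fires = False
print(fires)  # True: 'operating' is assigned I-NP

Because the word test only succeeds on the literal form operating, the rule is of no use on a corpus where this vocabulary is rare, which is the sense in which lexicalized rules are domain dependent.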