This module acquires the semantic relationship rules used to suggest an appropriate relationship wherever an AGROVOC relationship is underspecified (defined too broadly), which is especially common for RT. The rules will be provided by experts and by machine learning.
As shown in Fig. 2, some relationships between terms are represented in AGROVOC completely and consistently, if not as precisely as an ontology requires. In this case, the experts can simply define rules that systematically revise each inappropriate relationship into a new one. The expert will inspect the AGROVOC data and define rules over the concept types given in AGROVOC, as shown in Fig. 2. The constraints of a rule draw on the concept type field, which records the category of a term, such as GC (Geographic term: Country level), GG (Geographic term: above country level), TA (Taxonomic term: Animal), or TP (Taxonomic term: Plant).
Any relationship that satisfies one of the given rules will be revised automatically. For example, consider the following rule:
If X and Y are marked as "T*" in the concept type field, and X BT Y, then X <subclassOf> Y
In the AGROVOC data, Rosaceae and Malus both have the concept type TP and are related by BT. The original BT relationship in "Malus BT Rosaceae" will therefore be replaced by <subclassOf>.
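The expert rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Relation` class, the `CONCEPT_TYPE` mapping, and the function name are hypothetical, and the Thailand and Asia entries are added purely to show a non-matching case; they are not taken from the AGROVOC data discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str  # term X
    rel: str     # relationship label, e.g. "BT" or "RT"
    target: str  # term Y


# Hypothetical excerpt of the concept type field.
# Malus/Rosaceae follow the text; Thailand/Asia are illustrative only.
CONCEPT_TYPE = {
    "Malus": "TP",
    "Rosaceae": "TP",
    "Thailand": "GC",
    "Asia": "GG",
}


def revise_bt_between_taxa(relation, concept_type):
    """Apply: X and Y are of type T*, and X BT Y  =>  X <subclassOf> Y."""
    x_type = concept_type.get(relation.source, "")
    y_type = concept_type.get(relation.target, "")
    both_taxonomic = x_type.startswith("T") and y_type.startswith("T")
    if relation.rel == "BT" and both_taxonomic:
        return Relation(relation.source, "subclassOf", relation.target)
    # Rule does not fire: keep the original relationship unchanged.
    return relation


revised = revise_bt_between_taxa(Relation("Malus", "BT", "Rosaceae"), CONCEPT_TYPE)
unchanged = revise_bt_between_taxa(Relation("Thailand", "BT", "Asia"), CONCEPT_TYPE)
```

Matching on the "T" prefix is one simple reading of the "T*" wildcard; a stricter version could enumerate the allowed taxonomic types explicitly.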
Fig. 2. Examples of term relationships in AGROVOC that could be handled by revision rules formulated by experts
Many terms in the AGROVOC database lack the information needed to define such rules. Moreover, some relationships, especially RT, could be refined into more precise ones, as shown in Fig. 3. In these cases, the rules are prepared by learning from examples.
To prepare the learning examples, we provide an annotation tool that allows the domain expert to manually tag term senses (labelled by a sense id number in WordNet) and to specify the appropriate semantic relationship between them. For example, (Mutton#1 <madeFrom> Sheep#1).
In the case of compound nouns, only the head noun is used. For example, Rice and Rice Flour will be annotated as (Rice#1 <usedToMake> Flour#1).
After preparing the examples, the complete hypernym path of each term will be extracted from WordNet as in the following examples:
{sheep#1, bovid#1, ruminant#1, mammal#1, vertebrate#1, animal#1, organism#1, livingthing#1, object#1, entity#1}
{mutton#1, meat#1, food#2, solid#1, substance#1, entity#1}
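As a rough sketch of the path extraction, the following Python walks child-to-parent links until it reaches the root. The `HYPERNYM_OF` table is a hypothetical stand-in built only from the two example paths above; a real system would read these links from WordNet itself.

```python
# Child -> parent links copied from the two example paths above.
# This small table is a stand-in for WordNet, not a WordNet API.
HYPERNYM_OF = {
    "sheep#1": "bovid#1",
    "bovid#1": "ruminant#1",
    "ruminant#1": "mammal#1",
    "mammal#1": "vertebrate#1",
    "vertebrate#1": "animal#1",
    "animal#1": "organism#1",
    "organism#1": "livingthing#1",
    "livingthing#1": "object#1",
    "object#1": "entity#1",
    "mutton#1": "meat#1",
    "meat#1": "food#2",
    "food#2": "solid#1",
    "solid#1": "substance#1",
    "substance#1": "entity#1",
}


def hypernym_path(sense, hypernym_of):
    """Return the list [sense, parent, grandparent, ..., root]."""
    path = [sense]
    while path[-1] in hypernym_of:
        parent = hypernym_of[path[-1]]
        if parent in path:
            # Guard against malformed input that would loop forever.
            raise ValueError(f"cycle detected at {parent!r}")
        path.append(parent)
    return path


mutton_path = hypernym_path("mutton#1", HYPERNYM_OF)
sheep_path = hypernym_path("sheep#1", HYPERNYM_OF)
```

The sketch assumes a single parent per sense. WordNet senses can have several hypernyms, so a fuller version would need to choose one path or merge several.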
Fig. 3. Some examples of appropriate relationships for learning the revision rules by examples
The hypernym lists given above will be used as the basis of the feature vector, i.e. features_vector{{list of hypernym classes of term1},{list of hypernym classes of term2}}
The features will be converted into a binary representation so that all vectors have equal length. The learning system, C4.5, will then be applied to learn the common ancestral concepts for term1 and term2 and to generate the rules. Fig. 4 shows an example of the data set for training the <usedToMake> relationship. Table 3 shows the revision rules learnt from the training examples.
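One possible reading of the binary conversion is a presence/absence encoding over all hypernym classes seen in the training set, sketched below. The function names are hypothetical, the sheep and mutton paths come from the examples above, and the rice and flour paths are shortened, made-up placeholders rather than real WordNet paths. The C4.5 learning step itself is not shown.

```python
SHEEP_PATH = ["sheep#1", "bovid#1", "ruminant#1", "mammal#1", "vertebrate#1",
              "animal#1", "organism#1", "livingthing#1", "object#1", "entity#1"]
MUTTON_PATH = ["mutton#1", "meat#1", "food#2", "solid#1", "substance#1", "entity#1"]

# Hypothetical, shortened paths used only to show paths of different length.
RICE_PATH = ["rice#1", "plant#2", "entity#1"]
FLOUR_PATH = ["flour#1", "food#1", "entity#1"]

# Each training example is (hypernym path of term1, hypernym path of term2).
EXAMPLES = [(SHEEP_PATH, MUTTON_PATH), (RICE_PATH, FLOUR_PATH)]


def build_vocabulary(examples):
    """Collect every class seen on the term1 side and on the term2 side."""
    vocab1 = sorted({cls for path1, _ in examples for cls in path1})
    vocab2 = sorted({cls for _, path2 in examples for cls in path2})
    return vocab1, vocab2


def encode(path1, path2, vocab1, vocab2):
    """One bit per known class: 1 if the class occurs in the term's path."""
    seen1, seen2 = set(path1), set(path2)
    bits1 = [int(cls in seen1) for cls in vocab1]
    bits2 = [int(cls in seen2) for cls in vocab2]
    return bits1 + bits2


VOCAB1, VOCAB2 = build_vocabulary(EXAMPLES)
VECTORS = [encode(p1, p2, VOCAB1, VOCAB2) for p1, p2 in EXAMPLES]
```

Keeping separate vocabularies for term1 and term2 preserves which side of the relationship a class belongs to, which the learnt rules ("class X is ..., class Y is ...") depend on.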
Fig. 4. Examples of hierarchical data used for training the 'usedToMake' relationship
Table 3. Examples of statistically learnt revision rules.
| No. | Rule | Example |
| 1 | If class X is animal#1 and class Y is meat#1, and X RT Y, then X <usedToMake> Y | Sheep RT Mutton, Swine RT Pork, Calf RT Veal |
| 2 | If class X is plant#2 and class Y is food#1, and X RT Y, then X <usedToMake> Y | Rice RT Rice flour, Oat RT Oatmeal, Sugar Cane RT Cane Sugar |
| 3 | If class X is fruit#1 and class Y is oil#3, and X RT Y, then X <usedToMake> Y | Castor beans RT Castor oil, Cottonseed RT Cottonseed oil |
By applying Rule 1, the original RT relationship in "Chicken RT Chicken meat" will be replaced by <usedToMake>.
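Applying a learnt rule of this form amounts to checking whether the required classes occur in the hypernym paths of the two terms. The sketch below illustrates this for the "Sheep RT Mutton" example of Rule 1, using the hypernym paths given earlier. The function name and argument layout are hypothetical.

```python
SHEEP_PATH = ["sheep#1", "bovid#1", "ruminant#1", "mammal#1", "vertebrate#1",
              "animal#1", "organism#1", "livingthing#1", "object#1", "entity#1"]
MUTTON_PATH = ["mutton#1", "meat#1", "food#2", "solid#1", "substance#1", "entity#1"]


def apply_learned_rule(x_path, rel, y_path, x_class, y_class, new_rel):
    """If X falls under x_class, Y falls under y_class, and X RT Y,
    return the refined relationship; otherwise keep the original one."""
    if rel == "RT" and x_class in x_path and y_class in y_path:
        return new_rel
    return rel


# Rule 1: class X is animal#1, class Y is meat#1, X RT Y => X <usedToMake> Y
refined = apply_learned_rule(SHEEP_PATH, "RT", MUTTON_PATH,
                             "animal#1", "meat#1", "usedToMake")

# With the terms swapped the class constraints fail, so RT is kept.
kept = apply_learned_rule(MUTTON_PATH, "RT", SHEEP_PATH,
                          "animal#1", "meat#1", "usedToMake")
```

Because the test is a membership check on the whole path, the rule fires for any descendant of animal#1 paired with any descendant of meat#1, which is what lets a rule learnt from a few examples generalise to other term pairs.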