Automatic Term Relationship Cleaning and Refinement for AGROVOC
Asanee Kawtrakul (a), Aurawan Imsombut (b), Aree Thunyakijjanukit (c), Dagobert Soergel (d), Anita Liang (e), Margherita Sini (f), Gudrun Johannsen (g), and Johannes Keizer (h)
(a, b, c) Department of Computer Engineering, Kasetsart University, Bangkok, Thailand, {ak, aurawani}@vivaldi.cpe.ku.ac.th
(d) College of Library and Information Services, University of Maryland, College Park, [email protected]
(e, f, g, h) Food and Agriculture Organization (FAO) of the United Nations, Library & Documentation Systems Division, 00100 Rome, Italy, {anita.liang, margherita.sini, gudrun.johannsen, johannes.keizer}@fao.org
Abstract
AGROVOC is a multilingual thesaurus developed and maintained by the Food and Agriculture Organization of the United Nations. Like all thesauri, it contains some explicit semantics, which allow it to be transformed into an ontology or used as a resource for ontology construction. However, most thesauri, AGROVOC included, provide only very broad relationships that lack the semantic precision needed in an ontology, and many of these relationships are incorrectly applied or too broadly defined. Extracting ontological relationships from a thesaurus therefore requires data cleaning and refinement of the semantic relationships. This paper presents a hybrid approach for (semi-)automatically detecting these problematic relationships and for suggesting more precisely defined ones. The system consists of three main modules: Rule Acquisition, Detection and Suggestion, and Verification. The Rule Acquisition module acquires refinement rules both from experts and through machine learning. The Detection and Suggestion module uses noun phrase analysis and WordNet alignment to detect incorrect relationships and, by applying the acquired rules, to suggest more appropriate ones. The Verification module is a tool for confirming the proposed relationships. We are currently testing the method by applying the learning system to some semantic relationships.
Key words: AGROVOC, Data Cleaning, Semantic Relationship Refinement, Noun Phrase Analysis
2. Structural Problems in AGROVOC
2.1 Incorrectly assigned relationships
2.2 Vaguely defined (underspecified) relationships
3. A Hybrid Approach to the Process of Cleaning and Refining Term Relationships
4. The Rule Acquisition Module: Expert-defined Rules and Learning by Example
5. The Detection and Suggestion Module: An Algorithm for Term Relationship Revision
5.1 Overview of the algorithm
5.2 Noun phrase analysis and WordNet alignment
5.3 The Verification Tool