The AGRIS DTD provides a set of elements, refinements and schemes for describing and enforcing the structure that makes up the XML format of a bibliographic record. When creating or exporting AGRIS AP compliant XML documents, it is essential to begin each document with the header shown below.
<?xml version="1.0" encoding="UTF-8"?>
All XML documents must declare that they are XML documents by writing the following XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
This line tells the software receiving the file that its content is XML and that it should be processed according to version 1.0 of the XML specification. The encoding issue is addressed later in this document. Because the declaration is not an XML element containing data, it does not require a closing tag; it must appear at the very beginning of the document.
When marking up documents using a DTD, it is standard practice to include a DOCTYPE declaration so that the processing tools 'know' which DTD the document conforms to. When a validating XML parser validates an XML document against the DTD, it checks that all required elements are present and that no undeclared elements have been added. The hierarchical structure of elements defined in the DTD must be maintained, and the values of all attributes are checked against the types declared for them. In short, the structure of the XML document is defined and validated by the DTD from top to bottom. This helps ensure uniformity among groups of XML documents, such as those harvested by the AGRIS repository from distributed centres around the world.
<!DOCTYPE ags:resources SYSTEM "http://purl.org/agmes/agrisap/dtd/">
The above DOCTYPE declaration for an AGRIS resource document, marked up using the AGRIS DTD, indicates that the document type is ags:resources and that it conforms to the DTD. Requiring that an XML document be validated against the AGRIS DTD ensures the integrity of the data structure. XML documents may be parsed and validated before they are ever loaded by an application.
This declaration points to a PURL (Persistent Uniform Resource Locator), which facilitates the validation, provided that the computer is connected to the Internet. If not, the DTD included in the appendix should be used to validate the XML document.
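The header described above can be sketched in Python as follows. This is a minimal illustration, not part of the AGRIS tooling: the helper name `with_agris_header` and the empty `ags:resources` placeholder are assumptions made for the example, and the standard-library parser used here only checks well-formedness; it does not fetch the DTD or validate against it.

```python
import xml.etree.ElementTree as ET

# The two header lines quoted in the text above.
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'
DOCTYPE = '<!DOCTYPE ags:resources SYSTEM "http://purl.org/agmes/agrisap/dtd/">'


def with_agris_header(root_xml):
    """Prefix a serialized ags:resources element with the declaration and DOCTYPE.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return "\n".join([XML_DECLARATION, DOCTYPE, root_xml])


# Placeholder root element; a real document would contain ags:resource records.
EMPTY_ROOT = '<ags:resources xmlns:ags="http://purl.org/agmes/1.1/"></ags:resources>'

document = with_agris_header(EMPTY_ROOT)

# Well-formedness check only: ElementTree is a non-validating parser.
root = ET.fromstring(document.encode("utf-8"))
```

Checking the document against the DTD itself requires a validating parser, either online through the PURL or offline with the copy of the DTD from the appendix.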
The namespace declarations should come immediately after the DOCTYPE declaration, on the opening tag of the root element. Because a document may mix elements from several vocabularies, and because names from different vocabularies can collide, each namespace declaration maps a prefix to a URI; the mapping applies to the element on which it is declared and to its contents.
<ags:resources
  xmlns:ags="http://purl.org/agmes/1.1/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:agls="http://www.naa.gov.au/recordkeeping/gov_online/agls/1.2">
In the above example, there are four namespace declarations: ags, dc, dcterms and agls. In general, a namespace uniquely identifies a set of names or tags so that there is no ambiguity when tags having different origins but the same names are mixed together. Thus, dcterms:citation is different from ags:citation.
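The effect of the prefix-to-URI mapping can be seen with a namespace-aware parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the sample fragment and the function name `expanded_child_names` are assumptions for illustration only, and the fragment is meant to show name resolution, not a DTD-valid AGRIS record.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: two elements share the local name "citation"
# but belong to different namespaces.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<ags:resources
    xmlns:ags="http://purl.org/agmes/1.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:agls="http://www.naa.gov.au/recordkeeping/gov_online/agls/1.2">
  <ags:citation/>
  <dcterms:citation/>
</ags:resources>"""


def expanded_child_names(xml_bytes):
    """Return the namespace-qualified names of the root's direct children.

    ElementTree reports each name as "{namespace-uri}local-name", so the
    prefix is replaced by the URI it was mapped to.
    """
    root = ET.fromstring(xml_bytes)
    return [child.tag for child in root]


names = expanded_child_names(SAMPLE)
# names[0] is "{http://purl.org/agmes/1.1/}citation"
# names[1] is "{http://purl.org/dc/terms/}citation"
```

The two names differ once the prefixes are expanded, which is why the parser never confuses them.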
The structures used to describe the “AGRIS” class of documents are text only (dc:type, dc:source, etc.), element only (ags:citation, agls:availability, etc.) and mixed content (dc:title or dc:relation). All attributes are character data strings (CDATA), with two exceptions: ags:ARN, which is a unique identifier (ID) for the ags:resource element, and the reserved attribute xml:lang, which, where applied, should be constrained to the three-letter ISO639-2 language code (see note 12).
Within the DTD, the cardinality of elements is indicated with the following operators.
Operator | Status | Occurrence
(no indicator) | Required | One and only one
+ | Required, repeatable | One or more
? | Optional | None or one
* | Optional, repeatable | None, one, or more
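The cardinality operators can be read as minimum and maximum occurrence counts. The following Python sketch encodes the table above; the names `CARDINALITY` and `satisfies` are assumptions introduced for this example.

```python
# Each operator maps to (minimum, maximum) occurrences; None means unbounded.
CARDINALITY = {
    "": (1, 1),      # no indicator: required, one and only one
    "+": (1, None),  # required, repeatable: one or more
    "?": (0, 1),     # optional: none or one
    "*": (0, None),  # optional, repeatable: none, one, or more
}


def satisfies(operator, count):
    """Return True if `count` occurrences are allowed by `operator`."""
    minimum, maximum = CARDINALITY[operator]
    return count >= minimum and (maximum is None or count <= maximum)
```

For instance, `satisfies("+", 0)` is false because a `+` element must occur at least once, while `satisfies("?", 0)` is true.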
This section explains how to encode each element, refinement and scheme to create well-formed XML elements. Each table gives the content model of the element, a template showing how the content should be tagged, the applicable attributes or schemes, and whether each attribute is required.
This attribute replaces the previous AGRIS field for Temporary Record Number (TRN). It has an ID validity constraint, which guarantees that each AGRIS resource is uniquely identified. It is therefore essential to use a numbering system that gives every record a distinct value. ARN is mandatory for all records submitted to AGRIS. The value of this required attribute consists of 12 characters, divided into three groups. A typical ARN will contain:
This is the root element of each record, nested inside the ags:resources document element, and it contains all the other core elements and qualifiers. Five of the core elements are mandatory, namely title, date, subject, language and availability information. It is the most important element, as it contains the rest of the record.
XML content model: (dc:title+, dc:creator*, dc:publisher*, dc:date+, dc:subject+, dc:description*, dc:identifier*, dc:type*, dc:format*, dc:language+, dc:relation*, agls:availability+, dc:source?, dc:coverage*, dc:rights*, ags:citation*)
XML tag: <ags:resource ags:ARN="XF2004000244"> </ags:resource>
XML attributes/schemes: ags:ARN (see 4.3.1); required
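As a rough pre-check before full DTD validation, the occurrence counts in the content model above can be tested with a short Python sketch. It assumes the four namespace URIs declared on the root element; the names `clark` and `cardinality_errors` are hypothetical, and the check looks only at how often each child occurs, not at the order of children or at their own content.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "ags": "http://purl.org/agmes/1.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "agls": "http://www.naa.gov.au/recordkeeping/gov_online/agls/1.2",
}

# Content model of ags:resource, copied from the table above.
CONTENT_MODEL = (
    "(dc:title+, dc:creator*, dc:publisher*, dc:date+, dc:subject+, "
    "dc:description*, dc:identifier*, dc:type*, dc:format*, dc:language+, "
    "dc:relation*, agls:availability+, dc:source?, dc:coverage*, "
    "dc:rights*, ags:citation*)"
)

# Operator -> (minimum, maximum) occurrences; None means unbounded.
LIMITS = {"": (1, 1), "+": (1, None), "?": (0, 1), "*": (0, None)}


def clark(qname):
    """Turn "prefix:local" into ElementTree's "{uri}local" form."""
    prefix, local = qname.split(":")
    return "{%s}%s" % (NS[prefix], local)


def cardinality_errors(record):
    """List the child elements whose occurrence count breaks the model."""
    errors = []
    for qname, operator in re.findall(r"(\w+:\w+)([+?*]?)", CONTENT_MODEL):
        count = len(record.findall(clark(qname)))  # direct children only
        minimum, maximum = LIMITS[operator]
        if count < minimum or (maximum is not None and count > maximum):
            errors.append(qname)
    return errors


# Illustrative record with only a title and a language.
SAMPLE = b"""<ags:resource xmlns:ags="http://purl.org/agmes/1.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/" ags:ARN="XF2004000244">
  <dc:title xml:lang="eng">title of resource</dc:title>
  <dc:language scheme="ISO639-2">eng</dc:language>
</ags:resource>"""

missing = cardinality_errors(ET.fromstring(SAMPLE))
```

For the sample record, `missing` lists the three mandatory elements that are absent: dc:date, dc:subject and agls:availability.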
Enter in this element the title of the document. Enter also, if available, the translated title of the resource (dcterms:alternative).
XML content model: (#PCDATA | dcterms:alternative)*
XML tag: <dc:title xml:lang="eng">title of resource
XML attributes/schemes: xml:lang; required
This element describes all entities (agents) responsible for the resource, i.e. those creating or contributing to it. An agent may be a person (ags:creatorPersonal); an organization, a service or an agency (ags:creatorCorporate); or a conference (ags:creatorConference).
XML content model: (ags:creatorPersonal | ags:creatorCorporate | ags:creatorConference)*
XML tag: <dc:creator>
XML attributes/schemes: -
Enter in the two refinement elements the information about the publisher. These elements provide the name of the individual, group, or organization which controls or publishes the item (ags:publisherName) and its location (ags:publisherPlace).
XML content model: (ags:publisherName | ags:publisherPlace)*
XML tag: <dc:publisher>
XML attributes/schemes: -
Enter in this element the date when the resource was made available. dc:date must be used together with its qualifier (dcterms:dateIssued).
XML content model: dc:date (dcterms:dateIssued)
XML tag: <dc:date>
XML attributes/schemes: scheme (dcterms:W3CDTF)
Enter in this element the subject information about the resource. It can be free-text (dc:subject), come from a classification scheme (ags:subjectClassification) or from a controlled vocabulary (ags:subjectThesaurus).
XML content model: (#PCDATA | ags:subjectClassification | ags:subjectThesaurus)*
XML tag: <dc:subject>
XML attributes/schemes: ags:subjectClassification; required required
This element indicates different descriptive aspects of the resource. These may include a brief statement, annotation, comment, or elucidation concerning any aspect of the resource (ags:descriptionNotes); formally designated version of the data set or information resource being described (ags:descriptionEdition); or an abstract as a summary of a document designed to give the user a clearer idea about the document’s contents (dcterms:abstract).
XML content model: (ags:descriptionNotes | ags:descriptionEdition | dcterms:abstract)*
XML tag: <dc:description>
XML attributes/schemes: dcterms:abstract; optional
Identifiers help to locate and/or identify a resource. Many numbers may be assigned to a document; this element is reserved for standard numbers taken from the item. Some of the numbers may be entered in authorized form. For web resources, the URI (an electronic address starting with, for example, http:// or ftp://) is also placed in this element. Numbers assigned by cataloguing institutions for internal purposes are not entered here, but placed in the agls:availability field.
XML content model: dc:identifier (#PCDATA)
XML tag: <dc:identifier scheme="ags:DOI">DOI id</dc:identifier>
XML attributes/schemes: scheme (ags:IPC | ags:RN | ags:PN | ags:ISBN | ags:JN | dcterms:URI | ags:DOI); optional
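A dc:identifier element restricted to the schemes listed above could be produced as in the following Python sketch. The function name `identifier_element` and the DOI value used in the usage line are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Scheme values copied from the table above.
SCHEMES = {
    "ags:IPC", "ags:RN", "ags:PN", "ags:ISBN",
    "ags:JN", "dcterms:URI", "ags:DOI",
}


def identifier_element(value, scheme=None):
    """Build a dc:identifier element; the scheme attribute is optional."""
    if scheme is not None and scheme not in SCHEMES:
        raise ValueError("unknown identifier scheme: %s" % scheme)
    element = ET.Element("{%s}identifier" % DC)
    if scheme is not None:
        element.set("scheme", scheme)
    element.text = value
    return element


# "10.1000/example" is a made-up DOI used only for illustration.
doi = identifier_element("10.1000/example", scheme="ags:DOI")
```

Rejecting unlisted scheme values early catches typing errors before the record reaches DTD validation.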
Although it is not mandatory, the value of this element should be provided when possible. It explains the nature or genre of the content of the resource and also helps to describe the general categories, functions, genres, or aggregation levels for the content of the resource.
If possible, select the dc:type values from the DCMI Type list (see note 13). If using a local controlled vocabulary of types, make sure the values are whole words that describe the genre of the resource, not codes.
XML content model: dc:type (#PCDATA)
XML tag: <dc:type>DC Types controlled vocabularies</dc:type>
XML attributes/schemes: scheme (dcterms:DCMIType); optional
The extent element (dcterms:extent) is used to indicate the size or duration of the resource. The medium element (dcterms:medium) is used to indicate the material or physical carrier of the resource.
XML content model: dc:format (dcterms:extent | dcterms:medium)*
XML tag: <dc:format>
XML attributes/schemes: -
For this element, it is recommended to enter the three-letter code from ISO639-2 (see note 14). If your local system does not allow you to provide the three-letter code, enter the two-letter code, indicating the scheme as ISO639-1 (see note 15). If a language does not have a code in the selected scheme, enter the full form of the language name without indicating the scheme.
XML content model: dc:language (#PCDATA)
XML tag: <dc:language scheme="ISO639-1">language of resource</dc:language>
XML attributes/schemes: scheme (ISO639-1 | ISO639-2); optional
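The rule for choosing the scheme of dc:language can be sketched in Python as below. The function name `language_element` is hypothetical, and treating any lowercase ASCII string of two or three letters as a code is a heuristic assumed for this example, not an AGRIS rule; the sketch does not check that the code actually exists in ISO639-1 or ISO639-2.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"


def language_element(value):
    """Build dc:language, setting the scheme only when a code is given."""
    value = value.strip()
    element = ET.Element("{%s}language" % DC)
    # Heuristic (assumption): lowercase ASCII letters only means "a code".
    looks_like_code = value.isascii() and value.isalpha() and value.islower()
    if looks_like_code and len(value) == 3:
        element.set("scheme", "ISO639-2")   # preferred three-letter code
    elif looks_like_code and len(value) == 2:
        element.set("scheme", "ISO639-1")   # fallback two-letter code
    # Otherwise the full language name is kept and no scheme is indicated.
    element.text = value
    return element
```

With this sketch, `language_element("eng")` carries the scheme ISO639-2, `language_element("en")` carries ISO639-1, and a full language name is emitted without a scheme attribute.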
This element is used to link one resource to another. It establishes relationships between resources and helps users locate related resources. When using the relation element, it is important to establish the type of relationship by choosing a value from one side of any of the following pairs of relation refinement types.
Availability provides users with a number or code that is uniquely associated with an item, and serves to identify that item within an organization. This number is normally assigned by the organization that holds the item. Since this is local information, availability must include the name or code identifying the institution or repository (ags:availabilityLocation) in which the item is housed and the local number (ags:availabilityNumber) with which the resource is locally accessed.
XML content model: agls:availability (ags:availabilityLocation, ags:availabilityNumber)*
XML tag: <agls:availability>
XML attributes/schemes: -
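Because availability must carry both the holding institution and the local number, a record builder can refuse incomplete input. The Python sketch below illustrates this; the function name `availability_element` and the location and number values in the usage line are invented for the example.

```python
import xml.etree.ElementTree as ET

AGS = "http://purl.org/agmes/1.1/"
AGLS = "http://www.naa.gov.au/recordkeeping/gov_online/agls/1.2"


def availability_element(location, number):
    """Build agls:availability with its location and number children."""
    if not location or not number:
        raise ValueError("both location and number are needed")
    element = ET.Element("{%s}availability" % AGLS)
    # Children are added in the order given by the content model.
    ET.SubElement(element, "{%s}availabilityLocation" % AGS).text = location
    ET.SubElement(element, "{%s}availabilityNumber" % AGS).text = number
    return element


# Placeholder values, not real holdings data.
holding = availability_element("Example Library", "ABC-123")
```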
This element (dc:source) provides the reference to a resource of which the current resource is a part. When cataloguing an analytic, this element is used to identify the monograph of which it is a part. Source information that can go into this element includes the title of the whole, the creators of the whole, etc.
XML content model: dc:source (#PCDATA)
XML tag: <dc:source>additional information of resource</dc:source>
XML attributes/schemes: -
This element (dc:coverage) provides information about the geographical (dcterms:spatial) and temporal (dcterms:temporal) coverage of the resource.
XML content model: dc:coverage (#PCDATA, dcterms:spatial, dcterms:temporal)
XML tag: <dc:coverage>additional information of resource
XML attributes/schemes: dcterms:spatial; optional optional
This element is used to provide a simple human-readable statement of who holds rights over a resource.
XML content model: dc:rights (#PCDATA, ags:rightsStatement, ags:rightsTermsOfUse)
XML tag: <dc:rights>
XML attributes/schemes:
This entry is mandatory when the resource is part of a serial. A serial is defined as a publication, usually bearing a numerical or chronological designation, that is intended to be continued indefinitely. It may be made available on any medium and is issued in successive parts.
XML content model: ags:citation (ags:citationTitle | ags:citationIdentifier | ags:citationNumber | ags:citationChronology)*
XML tag: <ags:citation>
XML attributes/schemes: ags:citationTitle; optional required
11 An instance of an AGRIS AP XML document is given in Appendix B.
12 Codes for the Representation of Names of Languages: http://www.loc.gov/standards/iso639-2/langcodes.html
13 DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-type-vocabulary
14 ISO639-2 three-letter language codes: http://www.loc.gov/standards/iso639-2/langcodes.html
15 ISO639-1 and ISO639-2 codes: http://www.loc.gov/standards/iso639-2/codechanges.html