ANNEX 4: Data and definitions for Chapter 3

A. URCA data definitions and framework

The Urban Rural Catchment Areas (URCA) dataset is a publicly available geospatial dataset that provides a global mapping of the rural–urban continuum.36, 37 It is based on the Global Human Settlement Layer21 and places urban centres on a gradient based on population size and density. As shown in Chapter 3 (Figure A of Box 2), rural locations are assigned a gradient of their own, based on the shortest travel time to urban centres of various sizes. The URCA thus disaggregates rural areas into multiple categories, distinguishing, for example, between locations that are less than 1 hour from an urban centre and those that are farther away. In Chapter 4, the URCA dataset is combined with household survey data for the country case studies.

The URCA approach builds upon central place theory, a set of assumptions and propositions that explains why hierarchically tiered centres are found at certain favoured locations on the economic landscape. For example, retail trade and service activities often tend to cluster. The URCA approach assumes that city size is a proxy for the breadth of services and opportunities provided by an urban centre. It uses travel time as a proxy for cost and adopts an urban hierarchy based on city size to classify rural locations as gravitating around a specific urban centre. This approach allows for: i) capturing the urban hierarchy that exists between urban centres of different sizes in terms of rural locations’ access to services and employment opportunities; ii) defining urban–rural catchment areas (URCAs) in terms of the interconnection between urban centres of different sizes and their surrounding rural areas; and iii) adopting a gridded approach that is easily comparable across countries, yielding a dataset for the whole world.

Additionally, the URCA approach makes it possible to identify the share of the population that falls in a specific category of the rural–urban continuum within an administrative unit, rather than placing the entire population in one territory or functional area. This categorization allows for more detailed analyses of consumption and production across the continuum. Table A4.1 describes the basic urban URCA categories; correspondingly, different categories of rural area are attributed to urban areas of different sizes, e.g. rural areas less than 1 hour of travel from a city of more than 5 million people.

Table A4.1 URCA definition of categories across the rural–urban continuum

A table lists the URCA definition of categories across the rural–urban continuum.
NOTE: * Considered as either hinterland or dispersed towns, as they do not gravitate around any urban agglomeration and are hence not part of the rural–urban continuum.
SOURCE: FAO. 2021. Global Urban Rural Catchment Areas (URCA) Grid – 2021. In: FAO. [Cited 4 May 2023]. https://data.apps.fao.org/?share=g-3c88219e20d55c7ce70c8b3b0459001a

In defining the rural URCA categories based on travel time to an urban agglomeration, each time interval is to be considered as closed on the right. For the URCA categories used in the report, this means that:

  • “<1 hour” to any urban centre includes areas located 1 hour or less from a city of any size or town: areas ≤1 hour.
  • “1–2 hours” to any urban centre includes areas located more than 1 hour but less than or equal to 2 hours from a city of any size or town: 1 hour < area ≤2 hours.
  • “>2 hours” to any urban centre includes areas located more than 2 hours from a city of any size or town: areas >2 hours.

Note that this degree of specificity applies throughout Chapter 4, but for readability the text and figures do not spell out the interval boundaries at this level of detail.
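The right-closed interval convention above can be expressed as a short classification rule; the sketch below is illustrative only, and the function name and aggregate category labels are our own.

```python
def urca_interval(travel_time_hours: float) -> str:
    """Assign a travel time (in hours) to one of the three aggregate
    URCA categories used in the report. Intervals are closed on the
    right: a location exactly 1 hour away falls in "<1 hour", and one
    exactly 2 hours away falls in "1-2 hours"."""
    if travel_time_hours <= 1:
        return "<1 hour"
    elif travel_time_hours <= 2:
        return "1-2 hours"
    else:
        return ">2 hours"
```

Note how the boundary cases follow the closed-on-the-right definition: `urca_interval(1.0)` returns `"<1 hour"` and `urca_interval(2.0)` returns `"1-2 hours"`.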

B. Methodological approach and tool for the systematic structural literature review

The systematic review of evidence from scientific studies used for Chapter 3, designed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA),4 was implemented using an integrated research tool, Expert Search Semantic ENriChmEnt (Essence), developed by the FAO Data Lab.

Essence is a web application that can automatically query scientific articles from multiple data sources (Google Scholar, the World Bank, the International Monetary Fund, etc.). These articles, including their full text, are then stored and made available for review through a semantic search engine built on Apache Solr. This allows results to be aggregated and filtered by selecting values automatically identified when the documents are downloaded, or by exploiting annotations added collaboratively.

Advanced methods were used from the tool’s web interface, which permitted filtering the downloaded documents through an algorithm based on an artificial intelligence method that learns from and extends user selections of relevant articles. The approach relies on the manual review of a small subset of documents, which are marked as relevant or not by the users and serve as ground truth. A preliminary text pre-processing and learning step was then executed directly from the web interface, in order to estimate and generalize the linking function between the content (i.e. terms) of the reviewed documents and their relevance status. The learning step was based on linear logistic regression, a classification algorithm used to solve binary classification problems. The logistic regression classifier takes a weighted combination of the input features (the terms in the Tf-idf matrix) and passes it through a sigmoid function, which maps any real number to a number between 0 and 1. The weights of the combination are estimated so as to minimize the distance between the output of the function and the users’ relevance labels for the reviewed documents. After this step, the resulting function was applied to all the downloaded documents, including those not reviewed, each of which was assigned a “score of relevance”. A threshold on this score then made it possible to classify the downloaded documents that had not been manually reviewed as relevant or not.
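The learning step described above can be sketched as follows. This is a minimal illustration, not the Essence implementation: it uses scikit-learn (the source does not name a library), and the example documents, labels and 0.5 threshold are hypothetical.

```python
# Sketch of the relevance-learning step: Tf-idf term weighting
# followed by linear logistic regression (a weighted combination of
# features passed through a sigmoid, yielding a value in (0, 1)).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical ground truth: a small subset of documents manually
# labelled by reviewers (1 = relevant, 0 = not relevant).
reviewed_docs = [
    "rural urban linkages food systems smallholder farmers",
    "urban centres travel time access to services employment",
    "deep sea mineral exploration licensing regimes",
    "satellite hardware thermal vacuum testing procedures",
]
labels = [1, 1, 0, 0]

# Terms are weighted with Tf-idf, forming the input feature matrix.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reviewed_docs)

# Fitting estimates the weights that best link terms to relevance.
clf = LogisticRegression()
clf.fit(X, labels)

# Apply the fitted function to an unreviewed document to obtain its
# "score of relevance"; a threshold then classifies it.
unreviewed = ["food security in peri-urban areas near small cities"]
score = clf.predict_proba(vectorizer.transform(unreviewed))[0, 1]
is_relevant = score >= 0.5  # hypothetical threshold
```

In the iterative workflow the report describes, documents scored near the threshold would be natural candidates for the next round of manual review, since new labels there improve the fitted function the most.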

Through this iterative process, it was possible to review the literature in a few passes, relying on the features available directly from the Essence web interface. The proposed relevance score for documents not yet evaluated by users becomes a filter, permitting users to quickly identify and review the most likely relevant documents and add new examples that help the algorithm better identify documents relevant to the set used in the learning step. This iterative process helps users surface the most relevant documents and improves the accuracy of the model, making it better able to predict the relevance of a document.

For a full description of the implementation of the PRISMA protocol, and the methodological approach for the systematic structural literature review, see de Bruin and Holleman (2023).18
