A wide range of base quantities, units and modes of expression is required in a food composition database, as determined by the specific uses of the data. Generally, compositional data are expressed as mass quantity, for which the kilogram (kg) is the base unit (BIPM, 2003; NIST, 2003b). For food composition purposes, this is understood to be weight and, by convention, the data are usually reported per 100 g edible portion. However, data may be expressed on other bases such as serving sizes or domestic units, per 100 ml or per kg, or on the basis of energy (e.g. nutrients per 1 000 kJ), protein (amino acids per 100 g protein), nitrogen (amino acids per g N), total lipid (fatty acids per g total fatty acids) and others.
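The conversions between bases are simple arithmetic on the per 100 g edible portion value. The sketch below assumes the reference value is held per 100 g edible portion. The function names are hypothetical and are used only for illustration.

```python
def per_serving(value_per_100g: float, serving_g: float) -> float:
    """Amount of a constituent in one serving, from a value per 100 g edible portion."""
    return value_per_100g * serving_g / 100.0


def per_1000_kj(value_per_100g: float, energy_kj_per_100g: float) -> float:
    """Express a constituent on an energy basis (amount per 1 000 kJ)."""
    if energy_kj_per_100g <= 0:
        raise ValueError("an energy basis needs a positive energy value")
    return value_per_100g * 1000.0 / energy_kj_per_100g


# Hypothetical food: 1.5 mg iron and 250 kJ per 100 g edible portion, 40 g serving
print(per_serving(1.5, 40.0))   # mg iron per serving: 0.6
print(per_1000_kj(1.5, 250.0))  # mg iron per 1 000 kJ: 6.0
```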
In principle, all specialized user databases can be derived from a main comprehensive reference database. The ways in which data are held and manipulated within any computer data management system, determined by the preferred operating system or data management routine, are not discussed here. However, compilers of a food composition database should be aware of several general issues relating to data capture and data documentation.
The following suggestions are made for data values.
Analytical values. These should be carefully documented so that the primary source of the data can be traced and the analytical methods used can be identified.
Missing values. It is virtually impossible to have complete data sets for all nutrients. The database must therefore identify missing values and alert the user whenever food items with missing values are selected for entry or retrieval. This is particularly important in software that calculates nutrient intakes (or the nutrient composition of recipes): any calculated result that includes missing values needs to be flagged for the attention of the user. Missing values must never be assigned a zero value.
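One way to keep missing values distinct from zero is to hold them as an explicit "no value" marker and carry a flag through every calculation. The following is a minimal sketch, assuming missing values are stored as `None`; the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def total_with_missing_flag(amounts: List[Optional[float]]) -> Tuple[float, bool]:
    """Sum the amounts of one nutrient contributed by several food items.

    A missing value is held as None, never as 0.0, so it cannot silently
    lower the total. The second element of the result tells the caller
    that at least one value was missing and the total must be flagged.
    """
    known = [a for a in amounts if a is not None]
    has_missing = len(known) < len(amounts)
    return sum(known), has_missing


total, incomplete = total_with_missing_flag([0.5, None, 1.0])
if incomplete:
    print(f"{total} (incomplete: includes foods with missing values)")
```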
Zero. This value may be used when it has been shown analytically that a constituent is not present in the food sample. Strictly speaking, “zero” means that any amount present is below the detection or quantification limit of the method of measurement used. Zero may also be used to indicate that the amount present is below the nutritionally significant level, but the designation “trace” is preferable in these circumstances. An exception is where there is good reason to believe that none of the constituent is present, for example vitamin B12 in plant foods. In such cases analyses may not be required, and the source or origin of the values may be recorded as “assumed” or “presumed” zero.
Trace. “Trace” signifies that the constituent is present, but at a level that cannot be measured adequately. It may also be used when the level is judged to be nutritionally insignificant. It is desirable to define these limits in the database documentation. In many food composition databases trace is expressed as “T” or “tr”, and it is often the only acceptable non-numeric entry in a data value field. Table 9.1 contains some suggestions for more formal limits for the various constituents, based, albeit intuitively, on the methods in current use.
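The distinction between missing, zero, trace and a citable number can be made explicit when an analytical result is entered. The sketch below assumes a result of exactly 0.0 means "not detected" and that each constituent has a trace limit taken from Table 9.1; the names `TRACE` and `to_database_entry` are hypothetical.

```python
from typing import Optional, Union

TRACE = "tr"  # the only non-numeric entry allowed in a value field

Entry = Union[float, str, None]


def to_database_entry(result: Optional[float], trace_limit: float) -> Entry:
    """Map an analytical result (per 100 g) to a database entry.

    result: the measured amount, 0.0 if the constituent was not detected,
            or None if it was not analysed at all.
    trace_limit: the "trace = less than" limit for this constituent,
            e.g. 0.06 for major constituents in g/100 g (Table 9.1).
    """
    if result is None:
        return None        # missing: flagged, never stored as zero
    if result == 0.0:
        return 0.0         # shown analytically to be below detection
    if result < trace_limit:
        return TRACE       # present, but too little to cite a number
    return result
```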
Imputed values. In certain circumstances an estimated or imputed value, based on a similar food, may be substituted for a missing analytical value (see Chapter 1). Each imputed value should be fully documented for data type and source/origin.
Calculated values. Values derived by calculation are often used for mixed food dishes, recipes and some processed foods. Such foods should be identified as calculated in their description, and a field should be provided listing the ingredient food records used in the calculations. All values should be fully documented for data type and source/origin.
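A recipe calculation weights each ingredient's value by the amount used and keeps the list of ingredient records alongside the result. This is a minimal sketch that assumes no weight change and no nutrient losses during preparation; real recipe calculations usually need yield and retention corrections, which are not shown. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Ingredient:
    food_id: str            # identifier of the ingredient's food record
    weight_g: float         # weight of the ingredient in the recipe
    value_per_100g: float   # nutrient content of the ingredient per 100 g


def recipe_per_100g(ingredients: List[Ingredient]) -> Tuple[float, List[str]]:
    """Nutrient content per 100 g of a recipe, plus the ingredient records used.

    Assumes the prepared dish weighs the same as the sum of its ingredients
    and that no nutrient is lost during preparation.
    """
    total_weight = sum(i.weight_g for i in ingredients)
    if total_weight <= 0:
        raise ValueError("recipe has no weight")
    nutrient = sum(i.weight_g * i.value_per_100g / 100.0 for i in ingredients)
    return nutrient * 100.0 / total_weight, [i.food_id for i in ingredients]
```

The returned list of `food_id` values is what would populate the ingredient-records field described above.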
If food composition database systems are to be compatible, the mode of data expression must be formalized (Klensin et al., 1989). In most cases, the basis for this should be long-standing nutritional conventions or international agreement on the preferred usage. For cases in which agreement has not been reached, the guidelines in this chapter suggest the most widely used conventions. Interchange and compatibility of data would be facilitated if the data were also more uniformly expressed in original data sources.
Table 9.1 Modes of expression of food composition values in reference and user databases (per 100 g edible portion of food)

| Constituent | Unit | Number of significant digits | Suggested limits in database: value | Suggested limits in database: limit | Trace = less than |
|---|---|---|---|---|---|
| Energy | kJ (kcal) | 3 | 1–999 | ±1 | 0.6 |
| | | | >1000 | ±10 | 6 |
| Major constituents (water, protein, fat, carbohydrates, dietary fibre, alcohol, organic acids) | g | 3 | | ±0.1 | 0.06 |
| Amino acids | mg | 3 | | ±0.1 | 0.06 |
| Fatty acids | g | 3 | | ±0.1 | 0.06 |
| | mg | 3 | | ±0.1 | 0.06 |
| Cholesterol | mg | 3 | | ±1 | 0.6 |
| Inorganic constituents | mg | 3 | 1–9 | ±0.1 | 0.06 |
| | mg | 3 | 10–99 | ±1 | |
| | mg | 3 | >100 | ±10 | |
| | mg | 2 | 100–1000 | ±10 | 6 |
| Vitamins | | | | | |
| Vitamin A: retinol | µg | 3 | | ±1 | 0.6 |
| Vitamin A: carotenes | µg | 3 | | ±1 | 0.6 |
| Vitamin D | µg | 2 | | ±0.1 | 0.06 |
| Vitamin E: tocopherols | mg | 2 | | ±0.01 | 0.006 |
| Vitamin K | µg | 2 | | ±0.1 | 0.06 |
| Group B vitamins | | | | | |
| Thiamin | mg | 2 | | ±0.01 | 0.006 |
| Riboflavin | mg | 2 | | ±0.01 | 0.006 |
| Niacin | mg | 2 | | ±0.01 | 0.006 |
| Vitamin B6 | mg | 2 | | ±0.01 | 0.006 |
| Pantothenic acid | mg | 2 | | ±0.01 | 0.006 |
| Biotin | µg | 2 | | ±0.01 | 0.006 |
| Vitamin B12 | µg | 2 | | ±0.01 | 0.006 |
| Folates | µg | 2 | | ±0.1 | 0.06 |
| Vitamin C | mg | 3 | | ±0.1 | 0.06 |
The basis of expression should be chosen to fit the specific use of the database. The most common basis is g per 100 g of edible portion of food, although expression in terms of portion size or household measures is appropriate for many special-purpose user databases. Expression per kg is less convenient for users and can involve the use of greater numbers of significant figures than can be justified (see below). It is proposed that the 100 g basis be used for food composition data and databases, except for special-purpose databases and certain other items identified below.
Edible portion is itself a value that should be recorded in the database. It refers to the proportion of the raw food, as collected or purchased, that is edible, expressed on the basis of weight. The proportion of edible matter in cooked food is often expressed on the basis of the raw food.
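Recording the edible portion allows values per 100 g edible portion to be converted to values per 100 g of food as purchased. A minimal sketch, assuming the edible portion is stored as a fraction between 0 and 1; the function name is hypothetical.

```python
def as_purchased(value_per_100g_edible: float, edible_fraction: float) -> float:
    """Value per 100 g of food as purchased, from a value per 100 g edible portion."""
    if not 0.0 < edible_fraction <= 1.0:
        raise ValueError("edible fraction must lie in (0, 1]")
    # only edible_fraction * 100 g of every 100 g purchased is edible
    return value_per_100g_edible * edible_fraction


print(as_purchased(10.0, 0.5))  # hypothetical food, half of it inedible waste
```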
Since liquid foods are frequently measured by volume, expression on a 100 g or 100 ml basis could be used. It is desirable to record the density of these foods so that appropriate conversions can be made. Liquids with a high viscosity are usually measured by weight, making this the preferred mode of expression.
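With the density recorded, switching between the 100 g and 100 ml bases is a single multiplication or division, because 100 ml of the food weighs 100 × density grams. A minimal sketch, assuming density is stored in g/ml; the function names are hypothetical.

```python
def per_100ml(value_per_100g: float, density_g_per_ml: float) -> float:
    """Convert a value per 100 g to a value per 100 ml using the food's density."""
    if density_g_per_ml <= 0:
        raise ValueError("density must be positive")
    # 100 ml of the food weighs 100 * density grams
    return value_per_100g * density_g_per_ml


def per_100g(value_per_100ml: float, density_g_per_ml: float) -> float:
    """Inverse conversion: value per 100 ml to value per 100 g."""
    if density_g_per_ml <= 0:
        raise ValueError("density must be positive")
    return value_per_100ml / density_g_per_ml
```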
The last digit cited in the value should reflect the precision of the analysis and values should not be cited in such a way as to give a false impression of the precision with which a constituent can be measured. Because foods vary in composition, it is also fundamentally incorrect to cite values that imply that the composition is defined to a higher level than its natural variation. Significant digits should not be confused with the number of decimal places in a value. For example, the numbers 123, 12.3, 1.23, 0.123, and 0.0123 all have three significant digits.
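Counting significant digits and rounding to a given number of them are easy to get wrong with binary floating point, so the sketch below uses Python's `decimal` module on the value as written. It is a minimal illustration, not a prescribed routine; the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN
from typing import Union


def significant_digits(text: str) -> int:
    """Count significant digits in a number written as text (e.g. '0.0123')."""
    # as_tuple().digits drops leading zeros: '0.0123' -> (1, 2, 3)
    return len(Decimal(text).as_tuple().digits)


def round_sig(value: Union[str, float], digits: int) -> Decimal:
    """Round a value to a given number of significant digits."""
    d = Decimal(str(value))  # str() avoids binary representation noise
    if d == 0:
        return d
    # adjusted() is the exponent of the leading digit: 123 -> 2, 0.0123 -> -2
    exponent = d.adjusted() - digits + 1
    return d.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_EVEN)


for s in ["123", "12.3", "1.23", "0.123", "0.0123"]:
    print(s, significant_digits(s))  # each has three significant digits
```

Note that `round_sig(1234.5, 3)` compares equal to `Decimal("1230")` but prints in exponent form; formatting it with `format(x, "f")` gives plain digits.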
Values for nutrients may be reported in the data source with more significant figures than are needed in a database. When data are captured, the figures are entered without any rounding. At higher levels of data management it is desirable to retain one more significant digit than is necessary in the user database, as outlined in Table 9.1. Where values are being summed for statistical purposes, the conventional rounding rules are appropriate: a value whose last digit is 5 is rounded down when the preceding digit is even (e.g. 0.25 becomes 0.2) and rounded up when the preceding digit is odd (e.g. 0.55 becomes 0.6), so that rounding does not introduce a systematic bias (Snedecor, 1956). It should be remembered, however, that digits beyond those indicated in Table 9.1 may have little analytical meaning and are of minimal nutritional significance.
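The rule described here is round-half-to-even. The sketch below, using Python's `decimal` module, shows why it matters when rounded values are summed: for the ten values 0.05, 0.15, ..., 0.95, rounding half up inflates the total, while rounding half to even reproduces the exact sum. The helper name `round_to` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_to(value: str, step: str, mode: str = ROUND_HALF_EVEN) -> Decimal:
    """Round a value given as text to the precision of `step` (e.g. '0.1')."""
    return Decimal(value).quantize(Decimal(step), rounding=mode)


values = [f"0.{d}5" for d in range(10)]  # 0.05, 0.15, ..., 0.95

exact = sum(Decimal(v) for v in values)
half_even = sum(round_to(v, "0.1") for v in values)
half_up = sum(round_to(v, "0.1", ROUND_HALF_UP) for v in values)

print(exact, half_even, half_up)  # half up is biased upwards
```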
While nomenclature for foods is of crucial importance (Chapter 3), the topic is too wide to be considered here. Food nomenclature, classification and description systems include Eurocode (Arab, Wittler and Schettler, 1987), LanguaL (McCann et al., 1988; Feinberg, Ireland-Ripert and Favier, 1991) and INFOODS (Truswell et al., 1991). Some authors have evaluated and compared the various systems for their advantages and disadvantages (Burlingame, 1998; Ireland and Møller, 2000). Food classification systems can also be based on the Codex Alimentarius, the FAO Agricultural Statistics Databases, the Harmonized System for Trade and the UN System for Classification of Individual Consumption According to Purpose (COICOP). Descriptions and links for all these nomenclature and classification systems can be found on the INFOODS Web site (INFOODS, 2003).
Nomenclature for nutrients (see Chapters 4, 6 and 7) is in the main formalized; the following guidelines are based on international conventions.
Edible matter refers to the proportion of edible matter in the raw food as collected or purchased, expressed on the basis of weight. The proportion of edible matter in cooked food is often expressed on the basis of the raw food.
Water content (moisture content) values are method-dependent (Chapters 6 and 7), but for the most part the differences are of minor nutritional significance. Freeze-drying is the exception; residual water content from this method can affect the accuracy of all other results expressed on a wet-weight basis.
Nitrogen (total) is usually measured by the Kjeldahl or Dumas methods or a modification of these methods.
Protein is usually a calculated value, derived from the total nitrogen value multiplied by a nitrogen conversion factor. Food-specific factors have been elaborated, based on the nature and composition of the proteins contained in different materials (Jones, 1931). For example, the specific factor for almonds is 5.18, while that for milk is 6.38. Jones' factors are still widely used in food composition work (see Table 7.3). In the absence of food-specific factors, the general factor of 6.25 is applied. Some food composition databases use the general factor exclusively for all protein calculations, and in many countries/regions food-labelling regulations require its use (EC, 1990). All other methods for measuring protein are still calibrated against values obtained in this way. It may therefore be useful to include in a food composition database protein values calculated with both the specific factors and the factor 6.25. For some applications, e.g. the formulation of diets against dietary requirements, the factor 6.25 is more appropriate because it is the factor used to derive protein requirements (FAO/WHO/UNU, 1985).
It has been proposed on several occasions (Southgate, 1974; Southgate and Greenfield, 1992; Salo-Väänänen and Koivistoinen, 1996) that protein definitions and methods of determination should be redefined. Many believe that the sum of the amino acids is the most appropriate representation of the protein content of foods (Salo-Väänänen and Koivistoinen, 1996). In all cases, the factor and the nitrogen values should be included in the reference database.
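Keeping nitrogen, the factor and both protein values together in one record lets users choose the appropriate value and lets the data be recalculated if the factor changes. A minimal sketch; the record layout and field names are hypothetical, and the milk factor 6.38 is the one quoted above.

```python
from typing import Optional

GENERAL_FACTOR = 6.25


def protein_record(total_n_g: float, specific_factor: Optional[float] = None) -> dict:
    """Protein per 100 g from total nitrogen, keeping N and the factors used."""
    record = {
        "nitrogen_g": total_n_g,
        "protein_general_g": total_n_g * GENERAL_FACTOR,
        "specific_factor": specific_factor,
        "protein_specific_g": None,  # left empty when no specific factor exists
    }
    if specific_factor is not None:
        record["protein_specific_g"] = total_n_g * specific_factor
    return record


print(protein_record(0.5, 6.38))  # hypothetical milk sample, 0.5 g N per 100 g
```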
Fat (total) refers to the total lipid in a foodstuff, including triacylglycerols. The values are highly dependent on the method used. In the United States, the NLEA (Federal Register, 1990) and FDA (Federal Register, 1993) defined “total fat” as the sum of fatty acids expressed as triglyceride (sic) for nutrition labelling purposes (FDA, 2001).
Total carbohydrate (total “by difference”) is an unsatisfactory expression that should be phased out (FAO/WHO, 1998). It is a derived value, obtained by subtracting the percentages of water, protein, fat and ash from 100 to give the percentage of carbohydrate “by difference”. The result therefore includes all the non-carbohydrate material not measured in the other proximate analyses, as well as the cumulative errors of those measurements. Some food composition databases also subtract alcohol for the relevant foods.
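The calculation itself is a single subtraction, which is exactly why every analytical error in the other constituents ends up in the carbohydrate figure. A minimal sketch with all quantities in g per 100 g; the function name is hypothetical, and alcohol is optional to reflect the practice of some databases.

```python
def carbohydrate_by_difference(water: float, protein: float, fat: float,
                               ash: float, alcohol: float = 0.0) -> float:
    """Total carbohydrate 'by difference', in g per 100 g.

    The result absorbs any unanalysed non-carbohydrate material and the
    cumulative analytical errors of the other measurements.
    """
    return 100.0 - (water + protein + fat + ash + alcohol)


print(carbohydrate_by_difference(80.0, 5.0, 3.0, 1.0))  # hypothetical food
```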
Available carbohydrate is defined as the sum of free sugars (glucose, fructose, sucrose, lactose, maltose), starch, dextrins and glycogen. In reference databases it is useful to include the individual carbohydrate species separately in addition to the summated values for total available (glycemic) carbohydrate. Values for the individual species are increasingly being given in user databases as well, in addition to those for total available carbohydrate. Available carbohydrate and its fractions can be expressed as weight (i.e. anhydrous form) or as monosaccharide equivalents (i.e. including the water of hydration). Available carbohydrate can also be calculated “by difference”, by subtracting a dietary fibre value, preferably “total dietary fibre”, from total carbohydrate by difference.
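The two modes of expression differ only by the water taken up on hydrolysis. The sketch below assumes the conventional approximate factors of 1.05 for disaccharides and 1.10 for starch and glycogen; applying 1.10 to dextrins is an additional assumption. The factors actually used should be documented, and the names here are hypothetical.

```python
from typing import Dict

# Approximate factors converting anhydrous weight to monosaccharide equivalents.
# Dextrins are treated like starch here (an assumption).
MONO_EQ_FACTOR = {
    "glucose": 1.00, "fructose": 1.00,
    "sucrose": 1.05, "lactose": 1.05, "maltose": 1.05,
    "starch": 1.10, "dextrins": 1.10, "glycogen": 1.10,
}


def available_carbohydrate(components: Dict[str, float],
                           as_monosaccharide: bool = False) -> float:
    """Sum available carbohydrate species (g/100 g), optionally as monosaccharide equivalents."""
    total = 0.0
    for name, grams in components.items():
        factor = MONO_EQ_FACTOR[name] if as_monosaccharide else 1.0
        total += grams * factor
    return total


def available_by_difference(total_carb_by_difference: float, dietary_fibre: float) -> float:
    """Available carbohydrate 'by difference': total by difference minus dietary fibre."""
    return total_carb_by_difference - dietary_fibre
```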
Dietary fibre is the focus of considerable scientific dispute regarding the methods for its measurement. Because the values are method-dependent, they need to be identified by the method used. The most widely used method is probably the AOAC total dietary fibre (TDF) method (see Chapter 7), but more specific definitions have also been used, for example the sum of the non-starch polysaccharides and lignin. If the non-starch polysaccharides approach is used, it may be preferable to use this term to identify the values in the database.
Ash (total) refers to the residue after incineration of organic matter. Values are method-dependent, but differences are of little nutritional significance.
Because it is rare to measure proximate or major constituents to an accuracy greater than ±1 percent, three significant figures are a maximum; values should be limited to 0.1 g/100 g, with “trace” defined as less than 0.06 g/100 g.
For inorganic constituents the appropriate elemental names or symbols are used. INFOODS tagnames are equivalent to atomic symbols for elements. Measurement to a precision of ±1 percent is extremely satisfactory, but may not be possible with trace constituents. The limits suggested in Table 9.1 are based on expected analytical limits combined with accepted levels of nutritional significance.
Vitamin is the term used when there are several active forms of an agent with a defined physiological activity, “vitamers” (see Chapter 7). The International Union of Nutritional Sciences (IUNS, 1978) system should be used to record defined chemical species. In the reference database, the values should be listed for each vitamer separately (e.g. the individual carotenoids). Values for total vitamin A activity and total vitamin D activity are calculated values and are therefore best restricted to the user databases, and the factors used in the calculation should be clearly specified. Over time, conversion factors for vitamer activities are likely to change, requiring a recalculation from the individual vitamer data in the reference database. Equivalences given in Chapter 7 should be used for conversion from international units. In general, methods for measuring vitamins are somewhat less precise than those used for inorganic analyses. The limits of expression are shown in Table 9.1. Expression to three significant figures is seen as a reasonable level for citation.
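Because total vitamin A activity is derived from the individual vitamers, it can be recalculated whenever the conversion factors change. The sketch below parameterizes the factors; the two sets shown (retinol equivalents, dividing β-carotene by 6 and other provitamin A carotenoids by 12, and retinol activity equivalents, dividing by 12 and 24) are common conventions used here for illustration, and the equivalences in Chapter 7 should be checked before use. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VitaminAFactors:
    beta_carotene_divisor: float      # µg beta-carotene per µg retinol activity
    other_provitamin_divisor: float   # µg other provitamin A carotenoids per µg activity


RE = VitaminAFactors(6.0, 12.0)    # retinol equivalents
RAE = VitaminAFactors(12.0, 24.0)  # retinol activity equivalents


def vitamin_a_activity(retinol_ug: float, beta_carotene_ug: float,
                       other_carotenoids_ug: float, f: VitaminAFactors) -> float:
    """Total vitamin A activity (µg per 100 g) from the individual vitamers."""
    return (retinol_ug
            + beta_carotene_ug / f.beta_carotene_divisor
            + other_carotenoids_ug / f.other_provitamin_divisor)


# The same reference-database vitamer values give different user-database totals
print(vitamin_a_activity(100.0, 600.0, 120.0, RE))
print(vitamin_a_activity(100.0, 600.0, 120.0, RAE))
```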
Amino acids are referred to by the approved trivial names, or three-letter symbols that are equivalent to INFOODS tagnames. At the reference level, amino acids are usually expressed as mg per g of nitrogen or as g per 16 g nitrogen (approximately 100 g protein), but at the user database level, expression as mg/100 g of food is useful. As with fatty acids, it is often useful to have both modes of expression available for comparative evaluation at all levels of the database system.
If amino acid values at the reference level are expressed in relation to total nitrogen, non-protein and non-amino-acid nitrogen should be deducted from the total nitrogen in order to express values as mg/100 g of food. Expression to three significant figures is seen as appropriate for amino acids cited as mg.
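The conversion from the reference-level basis to the user-level basis multiplies by the nitrogen that is actually present as protein. A minimal sketch, assuming total and non-protein nitrogen are both recorded in g per 100 g food; the function names are hypothetical.

```python
def amino_acid_per_100g_food(mg_per_g_n: float, total_n_g: float,
                             non_protein_n_g: float = 0.0) -> float:
    """Amino acid in mg/100 g food from mg per g nitrogen."""
    protein_n_g = total_n_g - non_protein_n_g  # exclude non-protein nitrogen
    if protein_n_g < 0:
        raise ValueError("non-protein nitrogen exceeds total nitrogen")
    return mg_per_g_n * protein_n_g


def g_per_16g_n(mg_per_g_n: float) -> float:
    """Convert mg per g N to g per 16 g N (approximately g per 100 g protein)."""
    return mg_per_g_n * 16 / 1000
```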
Fatty acids are listed with the chain length and number of double bonds. Systematic names may be needed to define values for specific isomeric fatty acids. Some of the more important isomers, e.g. trans isomers, should be included in the user database. At the data source and reference database levels, values for individual fatty acids are usually expressed as percentages of total fatty acids, since this is the most common form of analytical presentation. At the user database level, values per 100 g of food are required. At all levels of data management both modes of expression are useful for comparative evaluation. Converting percentages of total fatty acids to fatty acids per 100 g of food requires a conversion factor derived from the proportion of the total lipid present as fatty acids (Paul and Southgate, 1978) (Table 9.2). For fatty acids expressed in g per 100 g total fatty acids, precision is best limited to the 0.1 g/100 g level, with trace set at <0.06 g/100 g total fatty acids.
Other constituents are referred to by the recognized chemical terms, using either trivial or systematic names depending on common usage.
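The fatty acid conversion described above chains two steps: total fat times the Table 9.2 factor gives total fatty acids per 100 g food, and the fatty acid's share of total fatty acids is then applied. A minimal sketch; the dictionary holds only a few entries from Table 9.2, and the names are hypothetical.

```python
# A few conversion factors from Table 9.2 (total fat -> total fatty acids)
TOTAL_FA_FACTOR = {
    "milk and milk products": 0.945,
    "poultry": 0.945,
    "fats and oils, all except coconut": 0.956,
    "coconut oil": 0.942,
}


def fatty_acid_per_100g_food(g_per_100g_total_fa: float, total_fat_g: float,
                             conversion_factor: float) -> float:
    """Fatty acid in g/100 g food from g per 100 g total fatty acids."""
    total_fatty_acids_g = total_fat_g * conversion_factor  # g fatty acids per 100 g food
    return g_per_100g_total_fa / 100.0 * total_fatty_acids_g


# Hypothetical dairy product: 8 g fat/100 g, fatty acid at 25 g/100 g total fatty acids
print(fatty_acid_per_100g_food(25.0, 8.0, TOTAL_FA_FACTOR["milk and milk products"]))
```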
Energy value refers to a value for metabolizable energy, derived by calculation from energy-yielding constituents using energy conversion factors (see Chapter 7). The energy values of foods in the user database are often derived by application of conversion factors to the values for proximate or energy-supplying constituents. Direct determination of gross energy values (i.e. heats of combustion) may be useful for some purposes; however, these values cannot be compared with values for metabolizable energy as used in nutrition.
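Metabolizable energy is a weighted sum of the energy-yielding constituents. The sketch below assumes the widely used general factors of 17 kJ/g for protein and available carbohydrate, 37 kJ/g for fat and 29 kJ/g for alcohol, and 4.184 kJ per kcal; the factors recommended in Chapter 7 should be substituted where they differ. The names are hypothetical, and an unknown constituent raises a `KeyError` rather than being silently ignored.

```python
from typing import Dict

# General energy conversion factors, kJ per g (illustrative; see Chapter 7)
KJ_PER_G = {"protein": 17.0, "fat": 37.0, "carbohydrate": 17.0, "alcohol": 29.0}

KJ_PER_KCAL = 4.184


def metabolizable_energy_kj(composition_g: Dict[str, float]) -> float:
    """Energy (kJ/100 g) from energy-yielding constituents in g/100 g."""
    return sum(KJ_PER_G[name] * grams for name, grams in composition_g.items())


def kj_to_kcal(kj: float) -> float:
    return kj / KJ_PER_KCAL


energy = metabolizable_energy_kj({"protein": 10.0, "fat": 5.0, "carbohydrate": 20.0})
print(energy, round(kj_to_kcal(energy)))
```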
Table 9.2 Conversion factors to be applied to total fat to give values for total fatty acids in the fat

| Food | Factor |
|---|---|
| Wheat, barley and rye¹: wholegrain | 0.72 |
| Wheat, barley and rye¹: flour | 0.67 |
| Wheat, barley and rye¹: bran | 0.82 |
| Oats, whole¹ | 0.94 |
| Rice, milled¹ | 0.85 |
| Milk and milk products | 0.945 |
| Eggs² | 0.83 |
| Fats and oils, all except coconut | 0.956 |
| Coconut oil | 0.942 |
| Vegetables and fruit | 0.80 |
| Avocado pears | 0.956 |
| Nuts | 0.956 |
| Beef³: lean | 0.916 |
| Beef³: fat | 0.953 |
| Lamb | take as beef |
| Pork⁴: lean | 0.910 |
| Pork⁴: fat | 0.953 |
| Poultry | 0.945 |
| Brain⁴ | 0.561 |
| Heart⁴ | 0.789 |
| Kidney⁴ | 0.747 |
| Liver⁴ | 0.741 |
| Fish⁵: fatty | 0.90 |
| Fish⁵: white | 0.70 |

Sources:
It is important not to imply great accuracy in the citation of energy values. The convention is based on the following questionable assumptions:
Attempts have been made to derive specific factors for individual foods or food groups, recognizing assumptions a) and c) (Merrill and Watt, 1955), but not b) or d) (Southgate and Durnin, 1970).
Energy values should not be cited to more than three significant digits with a limit of 1 kcal or kJ.