NHANES Dietary Web Tutorial_397 pages

Published by smlneyman, 2019-01-16 01:35:47

NHANES Dietary Web Tutorial - Survey Orientation https://www.cdc.gov/nchs/tutorials/dietary/Preparing/FormatLabel/intro.htm

Format & Label Variables

Purpose

Formats and labels are user-defined tools that provide a convenient way to describe variables in your SAS or SUDAAN output.
Although adding formats or labels to your variables is optional, it is often helpful when reviewing the output from your
analyses. In this module, you will learn about defining and assigning formats and labels to the variables in your dataset.

Task 1: Define and Assign Formats and Labels

Formatting and labeling variables in SAS is optional and does not need to be done for all variables in the dataset. However, it is
especially useful for frequently used variables and for clarity in your output.

Key Concepts about Formats and Labels in NHANES (/nchs/tutorials/Dietary/Preparing/FormatLabel/Info1.htm)
How to Format and Label NHANES Variables (/nchs/tutorials/Dietary/Preparing/FormatLabel/Task1.htm)

Page last updated: May 3, 2013
Page last reviewed: May 3, 2013
Content source: CDC/National Center for Health Statistics
Page maintained by: NCHS/NHANES

Centers for Disease Control and Prevention 1600 Clifton Road Atlanta, GA 30329-4027, USA
800-CDC-INFO (800-232-4636) TTY: (888) 232-6348 - Contact CDC–INFO

Task 1: Key Concepts about Formats and Labels in NHANES

Formats and labels are user-defined tools that provide a convenient way to describe variables and their numeric values in
SAS or SUDAAN output. The use of formatting and labeling is optional, but investigators often rely on these tools because
they help in keeping track of frequently used variables and they add clarity to programming output.

Formatting is used to assign descriptive text names to numeric and character values of a variable. For example, you can
create a format that you name “YESNO.” In this case, “Yes” represents values of 1 and “No” represents values of 2. You
can then apply this format to a variable in the dataset that has the same response categories (i.e., 1 and 2). As a result, in
your output, all of the 1 values of that variable are represented by “Yes” and all of the 2 values of that variable are
represented by “No.”
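
As a minimal sketch of the YESNO example just described: the dataset EXAMPLE and the variable ANSWER are hypothetical names invented for illustration and are not part of any NHANES file.

```sas
* Define a format named YESNO: 1 displays as "Yes", 2 as "No" ;
proc format ;
  value YESNO
    1 = "Yes"
    2 = "No" ;
run ;

* Hypothetical data with response categories 1 and 2 ;
data EXAMPLE ;
  input ANSWER @@ ;
  format ANSWER YESNO. ;
  datalines ;
1 2 2 1
;
run ;

* The stored values remain 1 and 2; only the display changes ;
proc print data=EXAMPLE ;
run ;
```

The format changes only how values are displayed. The underlying data are still numeric, so comparisons such as ANSWER = 1 continue to work.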

Labeling, on the other hand, allows you to assign descriptive titles to variable names. As explained in the Locating
Variables module, variables have names that are an abbreviated series of letters or letters and numbers (e.g., RIDAGEYR,
DI1ICALC). Labeling is a way to flesh out this shorthand with some explanatory detail. For example, the variable
"FOODGRP" can be given the label "Broad food group – based on first digit of USDA food code."

The distinction between formatting and labeling variables is that formats are applied to the values of a variable, whereas
labels are applied to a variable name. Formats must be explicitly defined in the code, and this step is usually done at the
beginning of a program. Labels can be added at any point in the code.

Task 1: How to Format and Label NHANES Variables

Here are the steps to formatting and labeling NHANES variables:

1. Name, define, and apply custom formats; and
2. Apply labels to variables.

Step 1: Name, Define, and Apply Custom Formats

To create custom formats for your dataset, you will need to use PROC FORMAT. Using the VALUE
statement, you first assign a name to a format. Then, you use descriptive text to define the values of the format. Note that
all assigned text names for the values must be enclosed in quotation marks (single or double) in order to be applied properly.

The sample code below, which comes from the "Food Sources" program, shows how to name and define a custom
format. This example uses the format FOODGRPF. (Note that you can assign any name you choose, so long as it meets
the SAS specifications for a valid format name. See the SAS manual for more information.) This format defines values 1
through 9, with each value representing one of the broad food groups based on the USDA food code. FOODGRPF will be
applied as a format to the variable FOODGRP later in this step.

Program to Name and Define a Custom Format

Sample Code

*--------------------------------------------------;
* The PROC FORMAT procedure assigns text names     ;
* to the numeric values of the FOODGRPF format.    ;
*--------------------------------------------------;

proc format ;
  value FOODGRPF
    1 = "Milk & Milk Products"
    2 = "Meat, Poultry, Fish & Mixtures"
    3 = "Eggs"
    4 = "Legumes, Nuts and Seeds"
    5 = "Grain Products"
    6 = "Fruits"
    7 = "Vegetables"
    8 = "Fats, Oils & Salad Dressings"
    9 = "Sugar, Sweeteners & Beverages" ;
run ;

After you have named and defined a format, you apply it to selected variables using the FORMAT statement in the DATA
step of your code. Applying a format to a variable allows you to determine how the values will look in the output (e.g., food
group 1 will be represented by the text “Milk & Milk Products,” food group 2 will be represented by the text “Meat,
Poultry, Fish & Mixtures”). When assigning formats to variables, note that format names always come directly after
variable names and MUST end with a period, as shown in the sample code below.

Program to Apply a Custom Format

Sample Code

*--------------------------------------------------------------;
* The FORMAT statement applies the format FOODGRPF to FOODGRP ;
*--------------------------------------------------------------;

data FDSRC;
set FDSRC;
format FOODGRP FOODGRPF. ;

run ;

IMPORTANT NOTE

In each of the Preparing an Analytic Dataset modules beginning with this module, you will be working with temporary
datasets, which are saved in the WORK folder of your SAS program. The dataset exists only as long as your SAS session
and is deleted when you exit the program. If you would like to save these datasets so that you can return to them at a later
time, you can learn how to do this in the Save a Dataset module at the end of this course.

Step 2: Apply Labels to Variables

Variables are given a text description using a LABEL statement. One way to do this is by using a SAS data step, as shown
below in the sample code from the “Food Sources” program. User-defined labels should always be surrounded by
quotation marks.

Program to Label Variables

Sample Code

*----------------------------------------------------;
* The LABEL statement applies a text description     ;
* to the variable FOODGRP                            ;
*----------------------------------------------------;

data FDSRC;
  set FDSRC;
  label FOODGRP = "Broad food grp based on 1st digit of USDA food code" ;
run ;
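
One way to confirm that the format and label were attached is sketched below. It assumes the FDSRC dataset and FOODGRPF format from the steps above already exist in the current SAS session. The check is not part of the "Food Sources" program.

```sas
* List each variable with its label and assigned format ;
proc contents data=FDSRC ;
run ;

* Tabulate FOODGRP; the categories print as FOODGRPF text ;
proc freq data=FDSRC ;
  tables FOODGRP ;
run ;
```

The PROC FREQ counts here are unweighted and serve only as a visual check of the formatting. They are not survey estimates.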


NHANES Dietary Web Tutorial - Survey Orientation https://www.cdc.gov/nchs/tutorials/dietary/Preparing/Save/intro.htm

Save a Dataset

Purpose

In this module, you will learn how to create a permanent dataset in a SAS library. This will allow you to save the
temporary dataset that you have been working with as a permanent file on your computer so you can continue your analysis at
a later time.

Task 1: Create a Permanent Dataset in a SAS Library

In the previous modules you worked with a temporary dataset that exists only as long as your SAS session. In this module
you'll take that temporary dataset and save it as a permanent file in your SAS library which will be stored between SAS sessions.

Key Concepts about Saving an NHANES Dataset (/nchs/tutorials/Dietary/Preparing/Save/Info1.htm)
How to Save an NHANES Dataset (/nchs/tutorials/Dietary/Preparing/Save/Task1.htm)

Task 1: Key Concepts About Saving NHANES Datasets

Temporary datasets exist in the WORK folder of your SAS program and are deleted when you exit the program.
Permanent datasets are saved in a SAS library and are stored between SAS sessions. A library is a folder that you
designate on your computer to store your SAS files.

Task 1: How to Save NHANES Datasets

Here are the steps to save an NHANES dataset:

1. Save as a permanent dataset; and
2. Check that the dataset was saved to a SAS-accessible library.

Step 1: Save as a Permanent Dataset

In order to save a temporary dataset as a permanent dataset in a SAS library, you will need to use the DATA and SET
statements. The sample code below, which comes from the “Supplement” program, shows how to save the temporary
dataset, DEMOOST, as a permanent SAS dataset, called OSTEO_ANALYSIS_DATA. This permanent dataset will be
saved in the SAS library, NH, which was created in the Download Data Files Module (please note that if you set up your
file structure differently than what was presented in that module, you will need to adjust your SAS program accordingly).

Program to Save a Temporary Dataset

Sample Code

*---------------------------------------------------------------------;
* Use the LIBNAME statement to specify the SAS library that your      ;
* permanent dataset will be saved to (NH). Use the DATA statement to  ;
* define the dataset that will be stored in the NH library. Use the   ;
* SET statement to refer to the temporary dataset.                    ;
*---------------------------------------------------------------------;

libname NH "C:\NHANES\DATA" ;

data NH.OSTEO_ANALYSIS_DATA;
  set DEMOOST;
run ;

Step 2: Check that Dataset was Saved to SAS-accessible Library

To check that your dataset was saved to the NH library:

1. Open the SAS Explorer.
2. When the Explorer opens, open the Library icon.
3. Double-click on the NH library. The contents of the NH library will be displayed in the right-hand window labeled
   "Contents" and should include your new permanent dataset, OSTEO_ANALYSIS_DATA.
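
The same check can also be run in code rather than through the Explorer window. The sketch below assumes the NH library path used above and that the dataset was saved under the name given in the text, OSTEO_ANALYSIS_DATA.

```sas
libname NH "C:\NHANES\DATA" ;

* List the members of the NH library in the output ;
proc datasets library=NH ;
quit ;

* Show the variables stored in the saved dataset ;
proc contents data=NH.OSTEO_ANALYSIS_DATA ;
run ;
```

If the dataset does not appear in the listing, check the log of the DATA step for errors and confirm that the folder in the LIBNAME statement exists.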

UNIT 3

BASIC DIETARY ANALYSES

NHANES Dietary Web Tutorial - Survey Orientation https://www.cdc.gov/nchs/tutorials/dietary/Basic/StatisticalConsideration...

Important Statistical Considerations Regarding Dietary Data Analyses

Purpose

Before beginning an analysis of dietary data, it is important to have a basic understanding of some key statistical principles that
may affect the results. This module gives an overview of measurement error, with a special emphasis on error related to day-
to-day variation in intakes. The module reviews how to check for data symmetry and provides an outline of practical
considerations for data analysis, including data sources that are appropriate for different types of analysis.

Task 1: Explain Usual Intake and Day-to-Day Variation in Dietary Intakes

Twenty-four hour recalls are a primary source of dietary data in NHANES. However, one or two recalls do not accurately
reflect an individual’s true usual (long-run average) intake. Because long-run average or “usual” intake is most often the
measurement of interest, statistical adjustments are often necessary.

Key Concepts about Understanding Usual Intake and Day-to-Day Variation in Dietary Intakes (/nchs/tutorials/Dietary/Basic/StatisticalConsiderations/Info1.htm)

Task 2: Define Measurement Error

This section describes the basic concepts of both random and systematic measurement error and provides examples of each in
dietary data.

Key Concepts about Understanding Measurement Error (/nchs/tutorials/Dietary/Basic/StatisticalConsiderations/Info2.htm)

Task 3: Check for Data Symmetry

Many statistical procedures are based on the assumption that data are normally distributed, and therefore, symmetrically
distributed. However, the distribution of dietary intake data is often skewed because, for any given dietary component
measured on a single day, many people may have zero intake and at least some people may have very large intakes. Therefore,
it is important to check dietary data for symmetry.

Key Concepts about Checking for Data Symmetry (/nchs/tutorials/Dietary/Basic/StatisticalConsiderations/Info3.htm)

Task 4: Identify Analytic Implications of Different Types of Data

Data from the dietary recalls, food frequency questionnaire, and supplement questionnaire each measure different things,
cover different time periods, and are collected differently. Because of this, these various types of data lend themselves to
different types of analyses and each type of analysis requires different statistical assumptions.

Key Concepts about Identifying Analytic Implications of Different Types of Data (/nchs/tutorials/Dietary/Basic/StatisticalConsiderations/Info4.htm)

Task 1: Key Concepts about Understanding Usual Intake and Day-to-Day Variation in Dietary Intakes

Occasionally, dietary research focuses on an acute exposure (that is, intake at a given point in time), such as when
tracking the outbreak of a foodborne illness. Generally, however, for most surveillance, epidemiologic, and behavioral
research purposes, dietary analyses are concerned with measuring usual intake, that is, long-term average daily intake.
This is because dietary recommendations are intended to be met over time and diet-health hypotheses are based on
dietary intakes over the long term.

As noted in previous courses, the main instrument for gathering dietary intake data in NHANES is the dietary recall.
Dietary recalls are rich in details regarding every item consumed (when, how, how much, with what), and for this reason
are considered the main instrument for estimating food and nutrient intakes for the population. However, because they
cover only one or two 24-hour periods, they represent a snapshot in time rather than usual intake, and some of these
snapshots are not typical diet days for the individual.

Due to this day-to-day variation, one (or even a few) 24-hour recall(s) cannot be considered an accurate assessment of
an individual’s true usual intake. Therefore, because long-run, or usual, dietary intake is most often the measurement of
interest, statistical adjustments are often necessary. One exception is that the mean of the population’s
distribution of usual intake can be estimated from a sample of individuals’ 24-hour recalls without sophisticated
statistical adjustment. For more advanced analyses, such as estimating the percentiles of a distribution, sophisticated
adjustment techniques are needed. See the Advanced Dietary Analyses course for more information.

The problem of estimating usual intake from 24-hour recalls can be thought of as a measurement error issue. For more
information on measurement error, see “Task 2: Key Concepts about Understanding Measurement Error.”

Task 2: Key Concepts about Understanding Measurement Error

Types of Measurement Error

Statisticians understand that every survey measurement is an estimate of the true value of the thing being measured—
whether it is dietary intake, physical activity, or some physiologic indicator such as blood pressure. They call the difference
between the measurement and the true value “measurement error,” but in this context, “error” does not mean “mistake.”
Rather, measurement error is understood to be an inherent part of data collection and analysis. Nonetheless, because
truth is the ideal, survey researchers attempt to minimize measurement error when collecting data, and statisticians adjust
for existing error to minimize its effects.

Measurement error can be either random (non-systematic) or biased (systematic). Random error is non-systematic
because it contributes variability but does not influence the sample average. Bias, on the other hand, occurs when
measurements consistently depart in the same direction from the true value.

Figure 1. Examples of bias and/or error

All sampled data contain random errors; some of these are positive and some are negative, but they tend to balance out on average. For
example, individuals do not consume exactly the same amount of energy every day; yet, there is some true usual amount
of energy that they consume over time. If we could obtain perfectly recalled 24-hour dietary data from survey participants,
we would assume that each recall measures the individual’s usual intake with some random error—i.e., that some recalls
will be greater than usual and others less than usual, but that on average they approximate the true usual intake.
Unfortunately, however, the inaccuracies inherent in self-reported intakes are not purely random, and thus, bias is
introduced.
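
One common way to write this down, offered as a sketch and not as the tutorial's own notation, is an additive error model. All symbols here are assumptions introduced for illustration: R_ij is the intake reported by person i on day j, T_i is that person's true usual intake, b_i is a person-level systematic error (bias), and ε_ij is the random day-to-day deviation.

```latex
R_{ij} = T_i + b_i + \varepsilon_{ij},
\qquad \mathrm{E}\left[\varepsilon_{ij} \mid i\right] = 0
```

If b_i = 0, then E[R_ij | i] = T_i, which is the "random error only" case described above: individual recalls miss the usual intake, but they are right on average. If b_i ≠ 0, averaging more recalls removes the ε_ij term but leaves T_i + b_i, so the bias does not average out.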

Bias is potentially more serious than random error because it affects the mean of the sample, and can result in incorrect
conclusions and estimates. The same degree of bias may occur across all individuals in a sample, or differential bias can
be associated with a particular characteristic. For example, there is a general tendency across the population to
under-report dietary intake, on both recalls and food frequency questionnaires. This tendency varies by body weight status of the
individual, such that overweight individuals under-report to a greater degree than do normal weight persons (the small
percentage of the population that is underweight actually has a tendency to over-report their intakes).

Examples of Measurement Error in Dietary Data

The table below shows examples of random error and bias that can be found in each of the major types of dietary data.

Examples of Measurement Error in Dietary Data

Dietary Recall Data
  Random error:
    - Individuals tend to eat more on some days than others, so some 24-hour recalls will reflect greater-than-usual
      intakes while others will reflect less-than-usual amounts
    - Standard nutrient values associated with foods in the database may be somewhat higher or lower than actual
      amounts of nutrients consumed
  Bias:
    - Generalized bias toward under-reporting of total energy
    - May be differential under-reporting of some foods and nutrients and not others; not much is known about this

Food Frequency Questionnaire Data
  Random error:
    - Difficulties with cognition and recall lead to inaccurate and imprecise reporting
  Bias:
    - Generalized bias toward under-reporting of total energy
    - May be differential under-reporting of some foods and nutrients and not others; not much is known about this

Dietary Supplement Data
  Random error:
    - Slight variations in the nutrient content of supplements from dose to dose
    - Reporting errors
  Bias:
    - Differences between actual dietary supplement formulations and what the database states

Implications of Measurement Error in Dietary Data

Measurement error in dietary data has several practical implications. Measurement error can seriously attenuate the
relationship between dietary data and other factors, such as a health outcome. That is, the analyses would be less likely
to indicate a relationship between diet and disease even if one truly existed.
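
The attenuation can be made concrete in the simplest textbook case. The sketch below assumes a single-covariate linear regression with classical random error only. The symbols are introduced here for illustration: β is the true slope relating the outcome to true usual intake T, and the observed intake is R = T + ε, with ε independent of T and of the outcome's error term.

```latex
\beta_{\text{observed}} = \lambda \, \beta,
\qquad
\lambda = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_\varepsilon^2},
\qquad 0 < \lambda \le 1
```

Because λ is at most 1, the estimated slope is pulled toward zero. The larger the day-to-day variance σ_ε² is relative to the between-person variance σ_T², the stronger the attenuation. With bias or several covariates, the effect of measurement error is not necessarily this simple.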

Moreover, as shown in Figure 2 below, the relatively large within-person variation (among the days) in 24-hour recall data,
if left unadjusted, leads to distributions of intake that are wider (red curve below) than distributions of true usual intake
(blue curve below). Because the single-day distribution includes unusual days—such as days of feasting and days of
fasting—the red curve stretches further in each direction, causing it to be flatter and wider than the distribution of true usual
(long-run average) intakes.
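
The widening can be expressed with a simple variance decomposition. This is a sketch under the assumption that within-person deviations are independent across days and of usual intake. The symbols are introduced for illustration: σ_b² is the between-person variance of true usual intake, σ_w² is the within-person day-to-day variance, and k is the number of recall days averaged per person.

```latex
\mathrm{Var}(\text{single-day intake}) = \sigma_b^2 + \sigma_w^2,
\qquad
\mathrm{Var}(\text{mean of } k \text{ days}) = \sigma_b^2 + \frac{\sigma_w^2}{k}
```

The distribution of usual intake has variance σ_b² alone. With k = 1 or 2, as in NHANES, the σ_w²/k term remains substantial, which is why percentiles taken directly from one- or two-day data sit too far out in both tails.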

Finally, the tendency toward underreporting, at least in energy intakes, indicates that reported intakes are also generally
less than true intakes. This underreporting is demonstrated by the fact that the blue curve is to the right of the red curve.

Figure 2. Relationship between reported intake, estimated usual intake, and true usual intake*

*Note: This is a conceptual drawing, not a depiction of real data.

Some of these problems have been addressed with statistical methods of adjustment. Measurement error models can be
used to analyze diet-disease relationships, and methods have been developed to estimate usual intakes that adjust for the
problems associated with large within-person variation. Unfortunately, no standard adjustment currently exists for
correcting for underreporting bias. Therefore, these models and methods require an assumption that 24-hour recalls are
unbiased for usual intake, in spite of biomarker-based evidence to the contrary. Nonetheless, these are the best methods
available and represent state-of-the-art practice. For this reason, it is important to acknowledge these caveats when
reporting analyses.

The green curve in the figure above shows an estimated distribution of intake corrected for within-individual variability
(random error) but not for underreporting (bias). Note that the means of the green and red curves are the same, even
though the overall shapes are different. The sample analyses in this course capitalize on this fact, in that unadjusted
means of the reported intakes are interpreted as the means of the population distribution of usual intake. More
sophisticated techniques are needed to estimate the entire distribution of usual intake, rather than just its mean.

Task 3: Key Concepts about Checking for Data Symmetry

An underlying assumption in many statistical analyses is that the distribution of the data is normal. However, as shown in
the figure below, almost all distributions of dietary data tend to be skewed. For many dietary constituents, a large number
of persons may have zero intakes and a few people may have very large intakes.

Figure 1. Hypothetical distributions of dietary intake in the population

These distributional characteristics are especially important when estimating the distribution of usual intakes. Skewness
does not affect simple analyses such as those described in the Basic Dietary Analyses course. Therefore, no special
corrections are necessary to estimate means of usual intake distributions and statistics based on those means, such as
differences between population subgroups. However, for more complex analyses, such as those described in the
Advanced Dietary Analyses course, skewness must be taken into account.
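
A quick exploratory look at symmetry could be sketched as follows. DIET is a hypothetical dataset name. DR1TKCAL is the Day 1 energy intake variable from the NHANES total nutrient intake file and is used here only as an example.

```sas
* Unweighted, exploratory check of the shape of one intake variable ;
proc univariate data=DIET ;
  var DR1TKCAL ;
  * Histogram with a fitted normal curve for visual comparison ;
  histogram DR1TKCAL / normal ;
run ;
```

The Moments table in the output reports skewness, with values near zero indicating a roughly symmetric distribution. The check ignores the sample weights and survey design, so it describes the sample, not the population.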

Task 4: Key Concepts about Identifying Analytic Implications of Different Types of Data

It is important to keep in mind that data from the dietary recalls, food frequency questionnaire, and supplement
questionnaire each measure different aspects of dietary intake, cover different time periods, and are collected differently.
The fact that each instrument measures a different aspect of dietary intake affects whether or not it can legitimately be
used alone in analyses. The recall data and supplement questionnaire data can be used alone, but each one represents
only a portion of nutrient intake. Neither one alone is sufficient to estimate total nutrient intakes (from both foods and
supplements). Similarly, food frequency data were not designed to be used alone to estimate absolute intakes of foods or
nutrients even though some analytic applications allow them to be used alone. In general, the NHANES food frequency
data are meant to be used as supplementary (covariate) information in modeling data from the 24-hour recalls to estimate
usual intakes when examining them in relation to some other variable of interest.

The reference period also is different for each type of dietary data. Dietary recalls cover intake for a given day—
specifically, the previous 24 hours—although these data can be used to estimate usual intake as well. Supplement data
cover intakes from the previous month, and the food frequency questionnaire covers the previous year.

Each type of dietary data is also gathered differently, which could lead to differences in cognition (comprehension, recall,
decisions and judgment, and response processes) and in how individuals respond. The recall data are gathered by a trained
interviewer who probes about the previous day’s intake, capturing the details of the day’s eating using a multiple pass
method. The Day 1 recall data are collected by a personal interview in the Mobile Examination Center, and the Day 2 recall
data are collected by a telephone interview. The supplement data also are gathered by a trained interviewer asking a series of
questions about type and amount. The FFQ data are gathered by respondents completing a mailed questionnaire that
asks about frequency of intake for a list of foods.

Because of these differences, the various types of dietary data lend themselves to different types of analyses and require
different assumptions (see table below). For example, a single 24-hour recall is sufficient for analyzing mean nutrient
intakes from foods and beverages, whereas both days of data are required when estimating the distribution and
prevalence of nutrient intake from foods and beverages. There may be a sequence effect—that is, that the number and
amount of foods is sometimes less on the first versus subsequent recalls—so when an analysis calls for both 24-hour
recalls, you may want to control for this by adding a variable for recall day (first versus second) to the statistical analysis.
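
One way to prepare data for such an adjustment is to stack the two recalls into one record per person per day, with an indicator for recall day. The sketch below assumes a hypothetical dataset DIET with one row per participant. RECALLDAY, KCAL, and RECALLS_LONG are names invented for illustration. SEQN, DR1TKCAL, and DR2TKCAL are the NHANES respondent identifier and the Day 1 and Day 2 energy variables.

```sas
data RECALLS_LONG ;
  set DIET ;
  * First record: Day 1 recall ;
  RECALLDAY = 1 ;
  KCAL = DR1TKCAL ;
  output ;
  * Second record: Day 2 recall ;
  RECALLDAY = 2 ;
  KCAL = DR2TKCAL ;
  output ;
  keep SEQN RECALLDAY KCAL ;
run ;
```

RECALLDAY can then be entered as a covariate in the model. Participants without a Day 2 recall will have a missing KCAL on their second record, and any such model still needs to account for the survey design and the repeated records per person.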

Components used in means and ratios of group-level means analyses

Dietary Component                        Data Source
Food intakes                             24HR (single day)
Nutrient intakes from foods/beverages    24HR (single day)
Nutrient intakes from supplements        Supplement data

Assumptions and other issues using the 24HR and Supplement data in means and group-level means analyses:

- Assumption that the 24HR has no bias
- Following that assumption, unadjusted mean of reported intakes can reflect mean of the population distribution of
  usual intake
- Outliers may affect standard errors/confidence intervals

Components used in distribution and prevalence analyses

Dietary Component                                                      Data Source
Nutrient intakes from foods/beverages                                  24HR (two days)
Nutrient intakes from supplements or prevalence of supplement intake   Supplement data
Food intakes                                                           24HR (two days)

Assumptions and other issues using the 24HR and Supplement data in distribution and prevalence analyses:

- Assumption that the 24HR has no bias
- Estimation of population distribution of usual intake requires statistical modeling (see Advanced Dietary Analyses
  Course)

Components used in correlation and regression analyses

Dietary Component                        Data Source
Nutrient intakes from foods/beverages    24HR (two days) and FFQ data [1]
Nutrient intakes from supplements        Supplement data
Food intakes                             24HR (two days) and FFQ data

[1] Note that the NHANES FFQ is unlike other FFQs in that it was never intended to be used alone for epidemiological
analyses.

Assumptions and other issues using the 24HR and Supplement data in correlation and regression analyses:

- Assumption that the 24HR has no bias
- Estimation of population distribution of usual intake requires statistical modeling (see Advanced Dietary Analyses
  Course)


NHANES Dietary Web Tutorial - Survey Orientation https://www.cdc.gov/nchs/tutorials/dietary/Basic/EstimateVariance/intro.htm

Calculate Variance, Analyze Subgroups, and Calculate Degrees of Freedom

Purpose

It is often necessary to determine the precision of a point estimate. The variance of an estimate, or its standard error, is
generally used to calculate this precision, and degrees of freedom are used to measure the reliability of a variance estimate.
Variance estimation requires special procedures, and even more so when analyzing subgroups. This module introduces the
basic concepts of variance estimation for NHANES data. You will learn how the multi-stage probability sample design of
NHANES, including the geographic clustering of the sampled individuals, affects variance estimation and which methods are
appropriate to use when calculating variance for NHANES data. You will also learn how to use SUDAAN and SAS syntax to
properly analyze subgroups and determine degrees of freedom.

Task 1: Calculate Variances

This task explains the concept of variance and its importance, and shows how to estimate variances appropriately in NHANES
analyses.

Key Concepts about Calculating Variances in NHANES (/nchs/tutorials/Dietary/Basic/EstimateVariance/Info1.htm)
How to Specify the Survey Design to Obtain Appropriate Variance Estimates Using SUDAAN (/nchs/tutorials/Dietary/Basic/EstimateVariance/Task1a.htm)
How to Specify the Survey Design to Obtain Appropriate Variance Estimates Using SAS (/nchs/tutorials/Dietary/Basic/EstimateVariance/task1b.htm)

Task 2: Analyze Subgroups

Because of the nature of the NHANES survey design, subgroup analysis is more involved than it would be with a simple,
unweighted random sample. This task covers examination of subgroups for basic NHANES dietary analyses in SAS and
SUDAAN.

Key Concepts about Analyzing Subgroups (/nchs/tutorials/Dietary/Basic/EstimateVariance/Info2.htm)
How to Specify the Survey Design to Analyze Subgroups in SUDAAN (/nchs/tutorials/Dietary/Basic/EstimateVariance/task2a.htm)
How to Specify the Survey Design to Analyze Subgroups in SAS (/nchs/tutorials/Dietary/Basic/EstimateVariance/task2b.htm)

Task 3: Calculate Degrees of Freedom

Accurate determination of degrees of freedom is important for performing statistical tests and calculating confidence limits.

Key Concepts about Calculating Degrees of Freedom (/nchs/tutorials/Dietary/Basic/EstimateVariance/Info3.htm)

Task 1: Key Concepts about Calculating Variances in NHANES

Because NHANES is only a sample of the U.S. population (instead of a census), any number computed from NHANES
data is an estimate of the corresponding population number. Therefore, each statistic has some level of sampling error
associated with it, and if estimates were derived from numerous samples, they would not all be the same. That is, this
sampling error would result in a dispersion of the estimates, and this theoretical dispersion is known as the variance.
Typically, the true population variance is unknown and we can only estimate the sampling variance—or a related measure,
the standard error—of a statistic. Standard errors are used to assess the precision of the statistic of interest. See the
NHANES Analytic Guidelines to learn how to interpret these standard errors and determine whether your estimate is
precise enough to report.

Standard statistical software packages calculate variance estimates, but only those designed to handle complex
weighted samples such as NHANES should be used. These include SUDAAN, Stata, and the
survey procedures in SAS and SPSS. These procedures require information on the first stage of the sample design (i.e.,
identification of strata and PSUs) for each sample person. Variance estimates computed using standard statistical
software packages that assume simple random sampling would generally be too low (i.e., significance levels would be
overstated) for the NHANES sample. They would also be biased because they would not account for the differential
weighting and the correlation among sample persons within a cluster. Therefore, the procedures used to analyze
NHANES data should be able to account for the complex sample design when producing variance estimates.

Accounting for the complex sampling design of NHANES is critical when calculating estimates and standard errors of
means, percentages, and other statistics. As explained in the NHANES Survey Design Overview module, NHANES has a
multistage probability design, in which the first two stages (selection from strata and from PSUs) are of primary concern for
variance estimation. Typically, individuals within a PSU are more similar to one another than to those in other PSUs.
Ideally, one would sample fewer people within each PSU and sample more PSUs. However, because of
operational limitations (e.g., cost of moving the Mobile Examination Centers, geographic distances between PSUs),
NHANES can sample only 30 PSUs within a 2-year survey cycle. The sample size is roughly equal across PSUs and
yields about 5,000 examined persons per year. The NHANES sample design uses unequal probabilities of selection in
order to oversample selected individuals and population subgroups. For example, in 1999-2006, individuals ages 65 and
older, African Americans, and Mexican Americans were oversampled, as were low-income whites. All of these complex
sample design factors (PSU stratification, geographic clustering, differential probabilities of selection) affect variance
estimates of the NHANES data.

Together, the strata and the PSUs define the variance units of the sampling design, which should be taken into account to
properly estimate the variance due to sampling error for any statistic computed from NHANES data. However, the true
stratum and PSU identifiers must be kept confidential because the release of data in 2-year cycles makes it easy to identify
them. To protect the confidentiality of data obtained from sample persons, Masked Variance Units (MVUs) are constructed
by aggregating secondary sampling units into groups. Therefore, “sample design” variables on the public release files
(SDMVSTRA and SDMVPSU for strata and PSU, respectively) are provided instead of the real identifiers for purposes of
variance estimation. These variables define MVUs. MVUs are equivalent to Pseudo-PSUs used to estimate sampling
errors in past NHANES.

Using MVUs yields variance estimates that closely approximate those obtained using the real “unmasked” variance units
and therefore are considered satisfactory and appropriate. They have been created for each 2-year cycle of NHANES in
such a way that they can be used for any combination of data cycles without recoding.

For complex sample surveys, exact mathematical formulas for variance estimates are not readily available. Variance
approximation procedures are required to provide reasonable estimates of the magnitude of sampling error. Two variance
approximation procedures that account for the complex sample design are replication methods and Taylor series
linearization methods. The two approaches generally give equivalent results, although for some
specific applications one method may be slightly preferred. Because replication methods tend to be more
cumbersome, NCHS currently recommends the use of the Taylor series linearization methods for variance estimation in all
NHANES surveys. SUDAAN, SAS, Stata, and SPSS procedures can be used to obtain variances estimated by this
method for a variety of statistics, such as means, geometric means, and percentages. In general, you need to identify the
variables that hold the information about the sampling design when using most statistical software packages. In other
words, you need to specify the variables that define the stratum, the PSUs (also called clusters) within each stratum, and
the sampling weight.
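
For orientation, the Taylor series linearization estimate for a weighted mean under a with-replacement design can be written
out as below. This is the standard textbook form, offered as a sketch rather than as the exact computation performed by any
one package. The notation is introduced here for illustration only: h indexes strata, i indexes PSUs within a stratum, j
indexes sample persons within a PSU, w is the sampling weight, y is the analysis variable, and n_h is the number of PSUs in
stratum h.

```latex
\bar{y} = \frac{\sum_{h}\sum_{i}\sum_{j} w_{hij}\, y_{hij}}{\sum_{h}\sum_{i}\sum_{j} w_{hij}},
\qquad
e_{hi} = \frac{\sum_{j} w_{hij}\,\bigl(y_{hij} - \bar{y}\bigr)}{\sum_{h}\sum_{i}\sum_{j} w_{hij}},
\qquad
\bar{e}_{h} = \frac{1}{n_h}\sum_{i=1}^{n_h} e_{hi}

\widehat{\mathrm{Var}}(\bar{y}) = \sum_{h} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \bigl(e_{hi} - \bar{e}_{h}\bigr)^{2}
```

Only the variation between PSU-level totals within each stratum enters the estimate, which is why the stratum identifier,
the PSU identifier, and the weight are the three pieces of design information the software needs.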

The two statistical analysis software packages demonstrated in the tutorial are SUDAAN and SAS, and they each use
slightly different syntax for specifying the design. The key differences are:

SUDAAN code requires your input dataset to be sorted by PSU within a stratum; SAS does not require presorting
data.
SUDAAN specifies design options (i.e., with replacement or without replacement); SAS does not.
SUDAAN specifies the strata and cluster variables in a single nest statement.
SAS uses separate strata and cluster statements, and orders the statements to indicate hierarchy.
Both procedures use a weight statement to specify the sampling weight.

Task 1b: How to Specify the Survey Design to Obtain Appropriate Variance
Estimates Using SAS

The following programming statements are typically used to specify NHANES survey design parameters when using a
SAS procedure with NHANES data. As you will see, any SAS code used for these analyses has four key elements, which
are explained below. Note that the four elements used in SAS are different from those in SUDAAN.

Template for Specifying the Survey Design in SAS

Code                                              Element
proc <SAS procedure> data = <dataset name>;       Element 1
strata SDMVSTRA;                                  Element 2
cluster SDMVPSU;                                  Element 3
weight <appropriate sample weight variable>;      Element 4
<more SAS procedure syntax>;
run;

The four key elements of this code include:

Element 1

The dataset must be identified when using the SAS survey procedures. However, the dataset does not have to be
presorted by the sample design variables as it does in SUDAAN.

Element 2
The “strata” statement names the variable that forms the strata.

Element 3
The “cluster” statement names the variable that identifies the clusters (i.e., PSUs), which are nested within the strata.

Element 4

The “weight” statement tells SAS which sampling weight variable to use. For more information on sampling weights,
see the “Overview of NHANES Survey Design and Weights” and the “Locate Variables” modules.

IMPORTANT NOTE

To calculate the variance appropriately, one of the SAS survey procedures must be used instead of the standard SAS
procedures for simple random samples. The elements in this example identify the most basic statements used in SAS to
account for the complex sample design of NHANES (i.e., strata, PSUs, and weights). Additional options can be added to
these statements to customize the variance estimates, statistics, and output to suit individual analytic needs. Please
consult the SAS manual for specifications on customized options.
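
As an illustration, the template might be filled in as follows for a mean of Day 1 calcium intake. This is a minimal sketch
under stated assumptions, not code from the tutorial: WORK.DIET is a hypothetical dataset name, and DR1TCALC (Day 1 calcium
from the total nutrient intake file) is assumed to be the analysis variable. WTDRD1 is the Day 1 dietary weight used
elsewhere in this tutorial.

```sas
/* Sketch only: WORK.DIET is a hypothetical dataset that already holds   */
/* the design variables, the Day 1 dietary weight, and DR1TCALC.         */
proc surveymeans data=WORK.DIET nobs mean stderr;
  strata  SDMVSTRA;   /* Element 2: masked variance stratum              */
  cluster SDMVPSU;    /* Element 3: masked variance PSU within stratum   */
  weight  WTDRD1;     /* Element 4: Day 1 dietary sample weight          */
  var     DR1TCALC;   /* assumed analysis variable: Day 1 calcium (mg)   */
run;
```

The same three design statements carry over unchanged to the other SAS survey procedures; only the procedure name and
the analysis statements differ.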

Task 2: Key Concepts about Analyzing Subgroups

Sometimes you may wish to analyze only a certain demographic subgroup of interest, such as a particular age range or
gender, or only those survey participants who were tested for a particular diet-related lab analyte, such as serum
carotenoids.

As a general rule, when working in any survey analysis software package, such as SUDAAN or SAS, the dataset used as
input to all procedures should contain all individuals in the sample with non-missing, non-zero values of the appropriate
sample weight variable. That is, you should use the entire dataset (instead of creating a smaller subset of the data) and
then use coding statements to select the subpopulation of interest. Although estimates of descriptive statistics might be
the same if you used a subset of the entire file, the estimated standard errors would not be calculated appropriately. This
is particularly true if the subset is based on a characteristic measured in the survey. For example, it would not be
appropriate to create a smaller data file composed of only those who are diabetic or those who are hypertensive.

The only time that you can create separate datasets for smaller subgroups is when those subgroups are based on specific
values of the variables used in constructing the sample weight (e.g., gender, race/ethnicity, age). It should be noted that if
a smaller dataset is created based on these demographic characteristics, the standard errors may not differ greatly from
the standard errors from the full dataset. However, as a general rule, the full data set should be used with the subgroups
defined in the following manner:

In SUDAAN, it is safest to define a subset of your sample population using the SUBPOPN statement in the
procedure itself.
In SAS, the SURVEYMEANS and SURVEYFREQ procedures have special syntax that can be used to conduct
domain analyses. With other SAS survey procedures, special SAS-provided macros may be used to perform
subgroup analyses, but these analyses are beyond the scope of this course. SAS does not have an equivalent of the
SUBPOPN statement used in SUDAAN.
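
The rule above (keep the full file, flag the subgroup) might be sketched in SAS as follows. The dataset names WORK.DIET
and WORK.DIET_FLAGGED are hypothetical, DR1TCALC is an assumed analysis variable, and the flag here is based on age
alone (RIDAGEYR, the NHANES age-in-years variable); a real analysis would usually also require complete and reliable
recall data before setting the flag to 1.

```sas
/* Sketch only. WORK.DIET is a hypothetical full-sample dataset.         */
data WORK.DIET_FLAGGED;
  set WORK.DIET;
  /* 1 if aged 6-11, 0 otherwise (a missing age evaluates to 0)          */
  INCOH = (6 <= RIDAGEYR <= 11);
run;

proc surveymeans data=WORK.DIET_FLAGGED nobs mean stderr;
  strata  SDMVSTRA;
  cluster SDMVPSU;
  weight  WTDRD1;
  domain  INCOH;      /* separate estimates for INCOH=0 and INCOH=1      */
  var     DR1TCALC;   /* assumed analysis variable                       */
run;
```

Because no observations are dropped, every stratum and PSU stays in the variance computation; the estimates of interest
are simply read from the INCOH=1 rows of the domain output.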

Task 2b: How to Specify the Survey Design to Analyze Subgroups Using
SAS

SAS has four survey analysis procedures. Only two of these, PROC SURVEYMEANS and PROC SURVEYFREQ, allow
for proper subgroup (or domain) analyses. PROC SURVEYMEANS uses a domain statement to specify the subgroup of
interest, whereas PROC SURVEYFREQ specifies the subgroup variable within the tables statement. General templates
for using these two procedures to conduct subgroup analyses in SAS are shown below.

Template for Analyzing Subgroups using PROC SURVEYMEANS

Code                                              Element
proc surveymeans data = <dataset name>;           Element 1
strata SDMVSTRA;                                  Element 2
cluster SDMVPSU;                                  Element 3
weight <appropriate sample weight variable>;      Element 4
domain <domain variable>;                         Element 5
<more SAS procedure syntax>;
run;

Using a domain statement in the PROC SURVEYMEANS procedure allows you to identify the subgroup of interest.

Template for Analyzing Subgroups using PROC SURVEYFREQ

Code                                                          Element
proc surveyfreq data = <dataset name>;                        Element 1
strata SDMVSTRA;                                              Element 2
cluster SDMVPSU;                                              Element 3
weight <appropriate sample weight variable>;                  Element 4
tables <domain variable>*<row variable>*<column variable>;    Element 5
<more SAS procedure syntax>;
run;

Using a tables statement in the PROC SURVEYFREQ procedure allows you to identify the subgroup of interest such that
two-way tables of the row variable by the column variable will be produced for each level of the domain variable.
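
A filled-in version of the PROC SURVEYFREQ template might look like the sketch below. The dataset WORK.DIET_FLAGGED,
the 0/1 cohort flag INCOH, and the 0/1 indicator MILKUSER (1 if any fluid milk was reported on Day 1) are hypothetical
names chosen for illustration; RIAGENDR and WTDRD1 are the gender and Day 1 weight variables used elsewhere in this
tutorial.

```sas
/* Sketch only. WORK.DIET_FLAGGED, INCOH, and MILKUSER are hypothetical. */
proc surveyfreq data=WORK.DIET_FLAGGED;
  strata  SDMVSTRA;
  cluster SDMVPSU;
  weight  WTDRD1;
  /* one RIAGENDR-by-MILKUSER table per level of INCOH, with row percents */
  tables  INCOH*RIAGENDR*MILKUSER / row;
run;
```

The table for INCOH=1 is the one of interest; the table for INCOH=0 is produced as a by-product and can be ignored.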

Task 3: Key Concepts about Calculating Degrees of Freedom

Due to the complex sample design of NHANES, estimates computed from the data are more variable than the sample size
would suggest. Therefore, as reviewed in Task 1, it is important to always calculate the variance, or standard error, that
reflects the sampling design for any estimate obtained from these data.

The variance of an estimate calculated from NHANES data is related to the number of PSUs. Because the number of
PSUs is relatively small in NHANES, hypothesis tests and confidence intervals should be based on a t-distribution rather
than the more commonly used z-distribution. The t-distribution depends directly on the number of degrees of freedom,
which determines the critical value used in tests and confidence limits.
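
To see how much the choice of distribution matters, the two-sided 95% critical values can be compared directly. The sketch
below assumes 15 degrees of freedom (30 PSUs in 15 strata, as in a single 2-year cycle) and uses the SAS QUANTILE
function; the variable names are arbitrary.

```sas
/* Sketch only: compare two-sided 95% critical values.                   */
data _null_;
  df     = 15;                          /* 30 PSUs minus 15 strata       */
  t_crit = quantile('T', 0.975, df);    /* t-distribution critical value */
  z_crit = quantile('NORMAL', 0.975);   /* standard normal critical value */
  put t_crit= 6.3 z_crit= 6.3;          /* writes both values to the log */
run;
```

With 15 degrees of freedom the t critical value is about 2.13, compared with 1.96 for the z-distribution, so confidence
intervals based on the t-distribution are roughly 9% wider.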

Degrees of freedom are calculated by subtracting the number of strata from the number of PSUs, as shown in the equation
below. Therefore, a 2-year survey cycle generally has 15 degrees of freedom, which is calculated by subtracting 15 strata
from 30 PSUs. However, when data are analyzed on a subgroup of sample persons who may not be represented in
all strata and PSUs (e.g., Mexican Americans), the degrees of freedom provided in the output may differ. For example, SAS
Survey procedures, such as PROC SURVEYMEANS, compute the degrees of freedom as the number of PSUs in the non-
empty strata minus the number of non-empty strata. This means that if your data have empty strata (i.e. no persons in the
population for either PSU), the number of degrees of freedom will increase. This is incorrect and SAS is currently working
on correcting this problem. In the meantime, you can use SAS macros that have been developed to get around this issue.
Please see the Continuous NHANES tutorial for more information.
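
One way to check how a subgroup is spread over the design is to count the strata and PSUs that contain at least one
subgroup member with a positive weight, and take the difference. The sketch below does this with PROC SQL. It is an
illustration only, not the SAS macros referred to above: WORK.DIET_FLAGGED and the 0/1 subgroup flag INCOH are
hypothetical names, and the result should be compared with, not substituted for, the degrees of freedom your software
reports.

```sas
/* Sketch only. WORK.DIET_FLAGGED and INCOH are hypothetical names.      */
proc sql;
  select count(distinct SDMVSTRA)                     as n_strata,
         count(distinct catx('_', SDMVSTRA, SDMVPSU)) as n_psus,
         calculated n_psus - calculated n_strata      as df
  from WORK.DIET_FLAGGED
  where INCOH = 1 and WTDRD1 > 0;   /* subgroup members with a weight    */
quit;
```

PSU numbers repeat across strata, so the stratum and PSU identifiers are concatenated before the distinct PSUs are counted.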

Equation for Degrees of Freedom

\[ \text{degrees of freedom} = \text{number of PSUs} - \text{number of strata} \]

The degrees of freedom are inversely related to the relative standard error and directly related to the reliability of an
estimated standard error. As the number of degrees of freedom increases, the relative standard error decreases and the
reliability of the estimate increases. The NHANES guidelines recommended a relative standard error of at most 30%. This
corresponds to at least 22 degrees of freedom.

IMPORTANT NOTE

For more information on degrees of freedom, please visit “Module 12: Variance Estimation” of the Continuous NHANES
Tutorial.

Estimate Population Mean Intakes

Purpose

One of the most frequently cited measures in dietary assessment is mean intake. Because NHANES is a sample of the US
population, weighted sample means are estimates of the population means. This module covers the estimation of mean food
intakes and the estimation of mean nutrient intakes.

Task 1: Estimate Mean Food Intakes

This task reviews the key concepts related to measurement error associated with mean food intakes and the need to group
foods for analysis. It also provides instructions on deriving means and standard errors, using intake of milk consumed as a
beverage and total milk and milk products as examples.

Key Concepts about Estimating Mean Food Intakes (/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/Info1.htm)
How to Estimate Mean Food Intakes Using SUDAAN (/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/Task1a.htm)
How to Estimate Mean Food Intakes Using SAS (/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/Task1b.htm)

Task 2: Estimate Mean Nutrient Intakes from Foods and Beverages

This task explains that nutrient intakes can be derived from dietary recalls, and discusses the measurement error associated
with nutrient intakes estimated from dietary recalls. It also provides details on how to derive means and standard errors, using
intake of calcium from foods and beverages as an example.

Key Concepts about Estimating Mean Nutrient Intakes from Foods and Beverages (/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/Info2.htm)
How to Estimate Mean Nutrient Intakes from Foods and Beverages Using SUDAAN (/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/Task2a.htm)
How to Estimate Mean Nutrient Intakes from Foods and Beverages Using SAS (/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/task2b.htm)

Task 3: Estimate Mean Nutrient Intakes from Supplements

This task describes key concepts related to mean nutrient intake being derived from dietary supplements. It also provides
details on how to estimate population means and standard errors, using intake of calcium from supplements as an example.

Key Concepts about Estimating Mean Nutrient Intakes from Supplements (/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/Info3.htm)
How to Estimate Mean Nutrient Intakes from Supplements Using SUDAAN (/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/Task3a.htm)
How to Estimate Mean Nutrient Intakes from Supplements Using SAS (/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/task3b.htm)

Task 4: Estimate Mean Total Nutrient Intakes

This task provides key concepts regarding combining data from both dietary recalls and the supplement questions to derive
total nutrient intake. It also shows how to estimate population means and standard errors of total nutrient intake.

Key Concepts about Estimating Mean Total Nutrient Intakes (/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/Info4.htm)
How to Estimate Mean Total Nutrient Intakes Using SUDAAN (/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/task4a.htm)
How to Estimate Mean Total Nutrient Intakes Using SAS (/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/Task4b.htm)

Task 1: Key Concepts about Estimating Mean Food Intakes

Estimating mean intakes of selected foods is one of the most commonly conducted analyses of NHANES dietary data. To
obtain complete and accurate results from your analysis, consider the following issues before you begin.

Addressing Random Error and Bias

As noted in an earlier module, the mean of the usual intake distribution in the population is almost always the measure of
interest, and this is estimated using dietary recall data. Although dietary recall data are known to contain random errors,
especially large day-to-day variability, these errors can be generally assumed to cancel out. Therefore, the mean of 1-day
intakes can be used as an estimate of the mean of the usual intake distribution in the population without specific statistical
adjustment if the data are collected evenly throughout the year and the days of the week are evenly represented.

Dietary recall data also are known to contain bias, at least insofar as a tendency toward underreporting of energy. Little is
known regarding the extent to which energy underreporting extends to underreporting of foods. For that reason, and for
practical purposes, the current statistical convention is to assume that the recalls are not biased (i.e., that no
underreporting occurs). However, this assumption is more troubling than the one regarding random error and should be
noted as a limitation or caveat in any analysis of this type.

IMPORTANT NOTE

When estimating the mean of the population distribution of usual dietary intakes from 24-hour recalls, single day data are
sufficient and no specific statistical adjustment is necessary, but an assumption regarding lack of bias is required and
should be acknowledged. The second day of dietary recall is generally not used to estimate means but is used for more
advanced analyses.

Interpreting Measures of Central Tendency

If the data are highly skewed, as dietary data often are, the mean may not represent central tendency well.
You may want to consider using the median instead of, or in addition to, the mean in such an instance.
However, you should know that the simple median of reported intakes from a sample of single 24-hour recalls is not clearly
interpretable with regard to usual intake (as it really represents the median on a given day). For more information on how
to obtain the distribution of usual intake and its associated median, please see the Advanced Dietary Analyses course.

Grouping Foods for Analysis

Because more than 7,000 food codes are used in NHANES, food intake analysis almost always involves grouping like
foods together. Analysts can group foods for their own purposes or use previously developed grouping schemes. One
such scheme is the set of food groups defined by the Food Surveys Research Group (FSRG), which measure food in grams;
another is the MyPyramid food groups, which measure food group equivalent amounts as defined by the MyPyramid Equivalents
Database. For more information about FSRG-defined food groups, the USDA Food Coding Scheme, or the MyPyramid
Equivalents Database, see the Resources for Dietary Data Analysis module of the Survey Orientation Course.

Choosing Whether to Include Non-Consumers

Another consideration with estimating mean food intake is whether you are interested in the mean amount among all
persons in the population, or only a given day’s consumers of the food. That is, you need to decide whether the non-
consumers should be included in the estimation. If you are interested in the per capita amount consumed, you should
include the non-consumers with their intake value at zero; if you are interested in the average amount consumed by users
of the food on days when the food is consumed, you should exclude the non-consumers.

Using Appropriate Statistical Procedures

Means should be examined along with their standard errors, to get an indication of the variation about the mean. Special
statistical procedures are required to get appropriate standard errors when using data from a complex sample such as the
NHANES. In addition, appropriate weighting factors should be applied, so that the data will represent the population as a
whole. See Module 13, Estimate Variance, Analyze Subgroups, and Calculate Degrees of Freedom for further information.

Task 1b: How to Estimate Mean Food Intakes Using SAS

This section describes how to use SAS to estimate mean food intakes along with standard errors. To illustrate this, consumption of milk is used as an example. As explained in the key
concepts section, there are different ways to group foods for analysis, and so it is with examining “milk” intakes. One way is to consider only fluid milk reported separately—not as part of
a combination—and another is to account for all milk and milk products—including milk, yogurt and cheese—whether reported separately or as part of a combination or mixture. In the
programs that follow, consumption of fluid milk not in combination, measured in grams, and consumption of all milk and milk products, measured in cup equivalents, are used as
examples.

The following analyses are for children ages 6-11, and mean intakes are estimated among users. Such estimates answer the question: on average, what quantity is consumed in a given
day by users of the food? Analysts interested in per capita consumption (that is, including zeroes for non-consumers) would need to specify that missing values should be set to zero.
See the full program under Additional Resources for a note about this.

Step 1: Compute Properly Weighted Estimated Means and Standard Errors

Sorting is not a necessary first step in SAS, as it is in SUDAAN. Therefore, properly weighted estimated means and standard errors can be obtained via a single SAS procedure, PROC
SURVEYMEANS.

In the sample below, the NOBS, MEAN, and STDERR options in the PROC SURVEYMEANS statement request that the number of observations, the estimated mean, and its estimated
standard error, respectively, be printed for each analysis variable. The DOMAIN statement designates the combination of variables required to obtain separate estimates by gender
(RIAGENDR) within the cohort of interest (INCOH). INCOH is a variable that has value 1 if the individual is “in the cohort” and zero otherwise. Here, children ages 6 to 11 with complete
and reliable recall data have INCOH=1. The FORMAT statement controls how levels of the RIAGENDR variable are printed on the output. As in the SUDAAN example above, the
weight variable being used is for the dietary recall Day 1 subsample (WTDRD1).

Estimating Mean Intake of Milk as a Beverage, in Grams, and Intake of Total Milk and Milk Products, in Cup Equivalents

Sample Code

*-------------------------------------------------------------------------;
* Use the PROC SURVEYMEANS procedure in SAS to compute properly weighted  ;
* estimated means and standard errors                                     ;
*;
* To properly perform a subdomain analysis, form a 2-way table of INCOH   ;
* by RIAGENDR. In this example, the statistics of interest are those      ;
* where INCOH=1 in the table.                                             ;
*-------------------------------------------------------------------------;

proc surveymeans nobs mean stderr data = CALCMILK;
    strata SDMVSTRA;
    cluster SDMVPSU;
    domain INCOH*RIAGENDR;
    var MILK0 D_TOTAL;
    weight WTDRD1;
    format RIAGENDR GENDER. ;
    title1 "Estimated daily intake of fluid milk drunk by itself as a beverage; and of total milk and milk products" ;
    title2 "children age 6-11, WWEIA, NHANES 2003-2004 - using SAS" ;
run ;

Output of Program

Estimated daily intake of fluid milk drunk by itself as a beverage;
children age 6-11, WWEIA, NHANES 2003-2004 - using SAS

The SURVEYMEANS Procedure

Data Summary

Number of Strata | 15
Number of Clusters | 30
Number of Observations | 10122
Number of Observations Used | 9034
Number of Obs with Nonpositive Weights | 1088
Sum of Weights | 286222757

Statistics

Variable | Label | N | Mean | Std Error of Mean
MILK0 | Fluid milk (g) consumed outside of a combination for consumers | 2332 | 451.443365 | 12.905530
D_TOTAL | Total number of milk group (milk, yogurt & cheese) cup equivalents | 8273 | 1.761007 | 0.048017

Domain Analysis: Gender - Adjudicated*INCOH

Gender - Adjudic. | INCOH | Variable | Label | N | Mean | Std Error of Mean
Male | 0 | MILK0 | Fluid milk (g) consumed outside of a combination for consumers | 1034 | 503.310228 | 22.624207
Male | 0 | D_TOTAL | Total number of milk group (milk, yogurt & cheese) cup equivalents | 3615 | 1.916645 | 0.059658
Male | 1 | MILK0 | Fluid milk (g) consumed outside of a combination for consumers | 143 | 395.289004 | 45.893677
Male | 1 | D_TOTAL | Total number of milk group (milk, yogurt & cheese) cup equivalents | 422 | 2.595515 | 0.162629
Female | 0 | MILK0 | Fluid milk (g) consumed outside of a combination for consumers | 984 | 426.001838 | 13.414503
Female | 0 | D_TOTAL | Total number of milk group (milk, yogurt & cheese) cup equivalents | 3758 | 1.500916 | 0.048613
Female | 1 | MILK0 | Fluid milk (g) consumed outside of a combination for consumers | 171 | 305.434366 | 27.785526
Female | 1 | D_TOTAL | Total number of milk group (milk, yogurt & cheese) cup equivalents | 478 | 2.133974 | 0.141756

Highlights from the output include:

9,034 observations (respondents) were read by the program; 1,088 additional observations were skipped because their sampling weight value was zero (due to recall being
unreliable or person otherwise ineligible).
900 respondents were included in this analysis; of these, 314 reported milk as a beverage; 143 were boys and 171 were girls.
Among consumers, the mean fluid milk intake was 395 g for boys and 305 g for girls. These are estimates of the population mean intake of fluid milk on a given day among 6-11 year old boys and girls.
As noted in the Key Concepts section, these means also represent the mean usual intakes of fluid milk for these age-sex groups in the population.
The mean number of total milk group cup equivalents was 2.60 for boys and 2.13 for girls. These are estimates of the population mean intake of total milk cup equivalents on a
given day among 6-11 year old boys and girls and also represent the mean usual intakes of total milk cup equivalents for these age-sex groups in the population.
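The standard errors of the mean fluid milk intakes were 45.9 g for boys and 27.8 g for girls; those of the mean milk group cup equivalents were 0.16 and 0.14, respectively. See the NHANES Analytic Guidelines for more information on how to interpret standard errors.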
Unlike the SUDAAN output, the SAS output provides data for individuals who are not in the cohort of children ages 6-11. As we are not interested in those data, they can be
ignored.

IMPORTANT NOTE

It is important to note that the analysis above was conducted using only children ages 6-11 who were consumers of milk as a beverage. If, however, all members (i.e., consumers and
non-consumers) of the selected age group were included (total n = 900; 422 males and 478 females), the average amounts would be lower: the mean milk intake would be 145 g for males
and 136 g for females (see the full Milk program in the Additional Resources section for example code). These means represent the per capita consumption.
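
The difference between the mean among consumers and the per capita mean can be illustrated with a small Python sketch. It is not part of the tutorial's SAS programs; the weights and intakes below are hypothetical toy values, and only the point estimates are shown (standard errors would still require the survey procedures described in this module).

```python
# Hypothetical toy records: (sample weight, grams of fluid milk on the recall day).
# A value of 0 marks a non-consumer. None of these numbers come from NHANES.
records = [
    (1500.0, 480.0),
    (2200.0, 0.0),
    (1800.0, 250.0),
    (2500.0, 0.0),
    (2000.0, 0.0),
]


def weighted_mean(pairs):
    """Return sum(w * x) / sum(w) for a list of (weight, value) pairs."""
    total_weight = sum(w for w, _ in pairs)
    return sum(w * x for w, x in pairs) / total_weight


# Mean among consumers: non-consumers are excluded from the calculation.
consumers_only = [(w, x) for w, x in records if x > 0]
mean_among_consumers = weighted_mean(consumers_only)

# Per capita mean: non-consumers stay in the calculation with an intake of zero.
per_capita_mean = weighted_mean(records)

print(f"Mean among consumers: {mean_among_consumers:.1f} g")
print(f"Per capita mean:      {per_capita_mean:.1f} g")
```

Because the zero intakes add weight to the denominator without adding to the numerator, the per capita mean cannot exceed the mean among consumers.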


https://www.cdc.gov/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/Task1b.htm 2/2

12/19/2018 NHANES Dietary Web Tutorial: Estimate Population Mean Intakes: Estimate Mean Nutrient Intakes from Foods and Beverages


Task 2: Key Concepts about Estimating Mean Nutrient Intakes from Foods
and Beverages

Estimating mean nutrient intakes from foods and beverages is one of the most commonly conducted analyses of NHANES
dietary data.

WARNING

In NHANES 1999-2004, the nutrient amounts in the dietary recall interview files reflect only nutrients obtained from foods
and beverages, including sweetened water beverages. They DO NOT include nutrients obtained from plain drinking water.
Beginning in 2005, nutrients from plain drinking water will be included in the data release.

To obtain complete and accurate results from your analysis, consider the following issues before you begin.

Addressing Random Error and Bias

Nutrients can be obtained from diet (i.e., from foods and beverages) or from dietary supplements. Sometimes analysts are
interested in examining nutrient intake from diet, sometimes from supplements, and sometimes from both. This task
relates to estimating nutrient intake from diet only. That is, these estimates do not represent total nutrient intake from all
sources.

Nutrient intakes from foods and beverages are estimated using dietary recall data. Although dietary recall data are known
to contain random errors, especially large day-to-day variability, as noted in the previous module, we typically assume
these errors cancel out when estimating means. Therefore, no specific statistical adjustment is necessary, and the mean
of 1-day intakes can be used as an estimate of the mean of the usual intake distribution in the population.
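
The reasoning above can be checked with a small simulation. The Python sketch below is an illustration under stated assumptions, not an analysis of NHANES data: usual intakes and day-to-day errors are drawn from normal distributions with made-up means and standard deviations, and the errors are assumed to have mean zero.

```python
import random
import statistics

rng = random.Random(2004)  # fixed seed so the sketch is reproducible

# Assumed (hypothetical) population of usual calcium intakes, in mg per day.
usual = [rng.gauss(900.0, 200.0) for _ in range(20000)]

# One recall day = usual intake + zero-mean day-to-day error (assumed SD of 400 mg).
# A normal error is a simplification; it can even produce negative intakes.
one_day = [u + rng.gauss(0.0, 400.0) for u in usual]

mean_usual = statistics.fmean(usual)
mean_one_day = statistics.fmean(one_day)
sd_usual = statistics.stdev(usual)
sd_one_day = statistics.stdev(one_day)

print(f"Mean of usual intakes: {mean_usual:.1f}")
print(f"Mean of 1-day intakes: {mean_one_day:.1f}")
print(f"SD of usual intakes:   {sd_usual:.1f}")
print(f"SD of 1-day intakes:   {sd_one_day:.1f}")
```

In this setup the two means nearly coincide, while the spread of the 1-day intakes is much wider than that of the usual intakes. That is why single-day data are adequate for means but not for describing the distribution of usual intake.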

As noted in the last task, dietary recall data are also known to contain bias, at least insofar as a tendency toward
underreporting of energy. Little is known regarding the extent to which energy underreporting extends to underreporting of
other nutrients. For that reason, and for practical purposes, the current statistical convention is to assume that the recalls
are not biased (i.e., that no underreporting occurs). However, this assumption is more troubling than the one regarding
random error and should be noted as a caveat in an analysis of this type.

Interpreting Measures of Central Tendency

If the data are highly skewed, as dietary data often are, means may not provide a very good representation of central
tendency. You may want to consider using the median instead of, or in addition to, the mean in such an instance.
However, you should know that the simple median of reported intakes from recalls is not clearly interpretable with regard to
usual intake (as it really represents the median on a given day). Information on how to obtain the distribution of usual
intake and its associated median can be found in the Advanced Dietary Analyses course.
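
As a brief illustration of how skewness separates the mean from the median, consider the Python sketch below. The ten intakes are hypothetical and unweighted; an actual NHANES analysis would apply the sample weights.

```python
import statistics

# Hypothetical 1-day intakes (mg) with a long right tail; not NHANES values.
intakes = [310, 420, 450, 500, 530, 560, 610, 700, 1450, 2900]

mean_intake = statistics.mean(intakes)
median_intake = statistics.median(intakes)

print(f"Mean:   {mean_intake}")
print(f"Median: {median_intake}")
```

Two large values pull the mean well above the median, so the mean describes the typical reported intake in this toy sample poorly.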

Using Appropriate Statistical Procedures

Also, as mentioned in the last task, the standard errors of estimated means should be reported, to provide an indication of
the variation about the mean. Special statistical procedures are required to get appropriate standard errors when using
data from a complex sample such as the NHANES. In addition, the appropriate sample weights should be applied
because the inference should be to the population rather than the sample. See “Module 13: Estimate Variance, Analyze
Subgroups, and Calculate Degrees of Freedom” for more information.
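
To give a sense of what such special procedures do, the Python sketch below computes a weighted mean and a design-based standard error using a simplified with-replacement Taylor series linearization, in which variability is measured between PSU totals within strata. It is an illustration under stated assumptions, not a substitute for PROC SURVEYMEANS or SUDAAN: the strata, PSUs, weights, and intakes are hypothetical, and the naive standard error is included only for contrast.

```python
import math
from collections import defaultdict

# Hypothetical toy sample: (stratum, PSU, weight, calcium in mg). Not NHANES data.
sample = [
    (1, 1, 100.0, 900.0), (1, 1, 100.0, 1100.0),
    (1, 2, 100.0, 700.0), (1, 2, 100.0, 800.0),
    (2, 1, 200.0, 1000.0), (2, 1, 200.0, 1200.0),
    (2, 2, 200.0, 600.0), (2, 2, 200.0, 900.0),
]

total_weight = sum(w for _, _, w, _ in sample)
weighted_mean = sum(w * y for _, _, w, y in sample) / total_weight

# Linearized residuals w * (y - mean) / sum(w), summed to PSU totals by stratum.
psu_totals = defaultdict(lambda: defaultdict(float))
for stratum, psu, w, y in sample:
    psu_totals[stratum][psu] += w * (y - weighted_mean) / total_weight

# Between-PSU variance within each stratum, added up over strata.
variance = 0.0
for totals in psu_totals.values():
    psu_values = list(totals.values())
    n_psu = len(psu_values)
    stratum_mean = sum(psu_values) / n_psu
    variance += n_psu / (n_psu - 1) * sum((v - stratum_mean) ** 2 for v in psu_values)

design_se = math.sqrt(variance)

# Naive standard error that ignores weights, strata, and clusters.
intakes = [y for _, _, _, y in sample]
simple_mean = sum(intakes) / len(intakes)
naive_se = math.sqrt(
    sum((y - simple_mean) ** 2 for y in intakes) / (len(intakes) - 1) / len(intakes)
)

print(f"Weighted mean:   {weighted_mean:.1f}")
print(f"Design-based SE: {design_se:.1f}")
print(f"Naive SE:        {naive_se:.1f}")
```

In this toy sample the two standard errors differ noticeably, which is the practical reason the strata, PSU, and weight variables are supplied to the survey procedure.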

IMPORTANT NOTE

When estimating the mean of the population distribution of usual dietary intakes from 24-hour recalls, single day data are
sufficient and no specific statistical adjustment is necessary, but an assumption regarding lack of bias is required and
should be acknowledged. The second day of dietary recall is generally not used to estimate means but is used for more
advanced analyses.


https://www.cdc.gov/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/Info2.htm 2/2

12/19/2018 NHANES Dietary Web Tutorial: Estimate Population Mean Intakes: Task 2b


Task 2b: How to Estimate Mean Nutrient Intakes from Foods and Beverages
Using SAS

This section describes how to use SAS to estimate mean nutrient intakes from food and beverages – that is, using only
data on the dietary recalls – along with standard errors. To illustrate this, consumption of calcium from foods and
beverages by children ages 6-11 is used as an example.

Step 1: Compute Properly Weighted Estimated Means and Standard Errors

Sorting is not a necessary first step in SAS as it is in SUDAAN. Therefore, properly weighted estimated means and
standard errors can be obtained via a single SAS procedure, PROC SURVEYMEANS.

In the sample below, the NOBS, MEAN, and STDERR options in the PROC SURVEYMEANS statement request that the
number of observations, the estimated mean, and its estimated standard error, respectively, be printed for each analysis
variable. The DOMAIN statement designates the combination of variables required to obtain separate estimates by
gender (RIAGENDR) within the cohort of interest (INCOH). INCOH is a variable that has value 1 if the individual is “in the
cohort” and zero otherwise. Here, children ages 6 to 11 with complete and reliable recall data have INCOH=1. The VAR
statement is used to identify the variable of interest. DR1TCALC is a variable available from the NHANES dataset that
represents total dietary calcium (i.e., from foods and beverages, not supplements). The FORMAT statement controls how
levels of the RIAGENDR variable are printed on the output. Note that the strata and PSU variables are identified with
strata and cluster statements, respectively. As in the SUDAAN example above, the weight variable being used is for the
dietary recall Day 1 subsample (WTDRD1).

Estimating Mean Calcium Intake from Foods and Beverages, in Milligrams

Sample Code

*-------------------------------------------------------------------------;
* Use the PROC SURVEYMEANS procedure in SAS to compute properly weighted  ;
* estimated means and standard errors                                     ;
* To properly analyze subgroups using the proc surveymeans procedure, a   ;
* domain statement is used to form a 2-way table of INCOH by RIAGENDR.    ;
* "INCOH" means "in the cohort." In this example, the statistics of       ;
* interest are those where INCOH=1 in the table.                          ;
*-------------------------------------------------------------------------;

proc surveymeans nobs mean stderr data = CALCMILK;
   strata SDMVSTRA;
   cluster SDMVPSU;
   domain INCOH*RIAGENDR;
   var DR1TCALC;
   weight WTDRD1;
   format RIAGENDR GENDER. ;
   title1 "Estimated daily intake of total dietary Calcium";
   title2 "children age 6-11, WWEIA, NHANES 2003-2004 - using SAS";
run;

Output of Program

Estimated daily intake of total dietary Calcium
children age 6-11, WWEIA, NHANES 2003-2004 - using SAS

The SURVEYMEANS Procedure

Data Summary

Number of Strata                                 15
Number of Clusters                               30
Number of Observations                        10122
Number of Observations Used                    9034
Number of Obs with Nonpositive Weights         1088
Sum of Weights                            286222757

Statistics

                                               Std Error
Variable  Label              N          Mean     of Mean
--------------------------------------------------------
DR1TCALC  Calcium (mg)    8894    918.299483   16.587039
--------------------------------------------------------

Domain Analysis: Gender - Adjudicated*INCOH

Gender -                                                           Std Error
Adjudicated  INCOH  Variable  Label              N          Mean     of Mean
----------------------------------------------------------------------------
Male         0      DR1TCALC  Calcium (mg)    3933   1019.861839   21.283685
             1      DR1TCALC  Calcium (mg)     422   1109.561331   48.847753
Female       0      DR1TCALC  Calcium (mg)    4061    801.555808   15.852863
             1      DR1TCALC  Calcium (mg)     478    945.933037   48.341631
----------------------------------------------------------------------------

Highlights from the output include:

900 persons were included in the cohort of interest (INCOH=1); 422 boys and 478 girls.
Unlike SUDAAN, SAS does not print out an overall mean for the cohort of interest, because of the different ways the
subpopulation/subdomain analysis is specified. The value of 918 mg for DR1TCALC is based upon all individuals in
the dataset, not all individuals in the cohort of interest. For this analysis, only the values for INCOH=1 in the output
are of interest.
The average calcium intake among all children in the cohort is not shown. For boys in the cohort, it was 1,110 mg, and for
girls, it was 946 mg. These are estimates of the population mean intake of calcium on a given day among 6-11 year
old boys and girls. As noted in the Key Concepts section, these means also represent the mean usual intakes of
calcium for these age-sex groups in the population.
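
The way a DOMAIN statement partitions the sample can be sketched in a few lines of Python. This is an illustration of the point estimates only, using hypothetical toy records; it does not reproduce the standard errors, which depend on the strata and PSU structure of the full sample and are the reason the full dataset is passed to the survey procedure rather than a subset.

```python
from collections import defaultdict

# Hypothetical toy records, not NHANES data:
# (RIAGENDR, INCOH, WTDRD1, DR1TCALC)
records = [
    (1, 1, 1000.0, 1200.0),
    (1, 1, 3000.0, 1000.0),
    (1, 0, 2000.0, 900.0),
    (2, 1, 1500.0, 800.0),
    (2, 1, 500.0, 1200.0),
    (2, 0, 2500.0, 700.0),
]

weighted_sum = defaultdict(float)
weight_total = defaultdict(float)

# Accumulate weighted sums separately for every (gender, in-cohort) combination.
for gender, incoh, weight, calcium in records:
    domain = (gender, incoh)
    weighted_sum[domain] += weight * calcium
    weight_total[domain] += weight

domain_means = {d: weighted_sum[d] / weight_total[d] for d in weight_total}

# Only the domains with INCOH = 1 are of interest in the tutorial's example.
for (gender, incoh), mean in sorted(domain_means.items()):
    if incoh == 1:
        print(f"RIAGENDR={gender}, INCOH=1: weighted mean = {mean:.1f} mg")
```

Every record stays in the input, and the rows with INCOH = 0 are simply ignored when the results are read.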


https://www.cdc.gov/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/task2b.htm 2/2

12/19/2018 NHANES Dietary Web Tutorial: Estimate Population Mean Intakes: Estimate Mean Nutrient Intakes from Supplements


Task 3: Key Concepts about Estimating Mean Nutrient Intakes from
Supplements

This task relates to estimating mean nutrient intakes from supplements only. That is, these estimates do not include
nutrient intake from dietary (food and beverage) sources.

Nutrient intakes from supplements are estimated using data from the supplement questionnaire. Very little is known
regarding the extent of bias or random errors associated with dietary supplement data. For that reason, and for practical
purposes, the supplement data are generally treated as though none of either type of error occurred. However, the
possibility of both should be noted as a caveat in an analysis of this type.

WARNING

When estimating the mean of the population distribution of usual nutrient intakes from supplement data, no standard
convention for statistical adjustment currently exists.

Unlike the data derived from the recalls, there are no data files that provide total daily amounts of each nutrient across all
supplements. Therefore, this must be calculated for each person first. There are a few key points to note when calculating
supplement intake. First, each supplement could be reported with a different frequency, based on use over the past 30
days, so care must be taken in deriving the intakes from all supplements. Second, the measurement unit for a given
supplement may not be the same across all brands, so conversions may need to be made to combine nutrient values.
Third, nutrients may be listed as compounds and need to be converted to elemental form and amounts (e.g. calcium
carbonate would need to be converted to the corresponding amount of elemental calcium in order to determine total
calcium); however, this is less of a concern with the supplement data releases since 2001 because recent releases have
coded supplements using elemental (common) names rather than compounds.
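
The three points above (reporting frequency, measurement units, and compound versus elemental amounts) can be combined in a small calculation. The Python sketch below is a minimal illustration under stated assumptions: the function names, the record layout, and the three example supplements are hypothetical, and the 30-day reference period and the calcium carbonate conversion are applied in the simplest possible way.

```python
# Mass fraction of elemental calcium in calcium carbonate (CaCO3), about 0.40,
# from the molar masses of Ca (40.078) and CaCO3 (100.086).
CALCIUM_FRACTION_IN_CACO3 = 40.078 / 100.086

# Conversion factors to milligrams for the units used in this sketch.
TO_MG = {"mg": 1.0, "g": 1000.0, "mcg": 0.001}


def daily_average_mg(days_taken, servings_per_day, amount_per_serving, unit,
                     elemental_fraction=1.0):
    """Average daily elemental amount (mg) over a 30-day reference period."""
    amount_mg = amount_per_serving * TO_MG[unit]
    return (days_taken / 30.0) * servings_per_day * amount_mg * elemental_fraction


# Hypothetical supplements reported by one person.
supplement_a = daily_average_mg(30, 1, 500.0, "mg")   # daily, elemental calcium
supplement_b = daily_average_mg(15, 2, 0.3, "g")      # every other day, in grams
supplement_c = daily_average_mg(10, 1, 1250.0, "mg",  # listed as calcium carbonate
                                elemental_fraction=CALCIUM_FRACTION_IN_CACO3)

total_daily_calcium = supplement_a + supplement_b + supplement_c
print(f"Average daily supplemental calcium: {total_daily_calcium:.1f} mg")
```

Summing the per-supplement averages gives one daily amount per person, which is the quantity that then enters the mean calculation.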

The variables needed to calculate mean nutrient intake come from the Supplement Data Files 2, 3, and 4. File 2 provides
usage of the supplement, File 3 provides information on the supplement itself, and File 4 provides all the ingredients in the
supplement. (For more information on merging supplement files, see “Module 8: Merge and Append Datasets, Task 1:
Merge NHANES data.”) Missing data can be a limitation with several of the dietary supplement variables. File 2 contains
the most missing data. The number of cases of missing data and the possible remedies vary by the particular variable, as
follows:

DSD103, which is the number of days the supplement was taken in the past 30 days, has 227 cases of missing data
for 2003-2004. This is because some of the supplements were inadvertently reported in the prescription medication
section of the NHANES interview, and that section does not ask participants how often they took the product in the
past 30 days. Because this variable is needed to determine usual intake, analysts can either impute a number of
days or drop these records from the dataset. Imputation requires an assumption that the supplement was taken
regularly and is usually based on some other information the respondent provided, such as the number of days that
the respondent reported taking certain other types of dietary supplements.

The variables DSD122Q / DSD122U, which capture, in quantity and units, responses to the question, “On the days
you took the supplement, how much did you take?” also have more than 200 missing cases. This is for the same
reason as mentioned in the previous bullet. That is, this question is not asked in the medication portion of the
interview, where some respondents mistakenly report their supplement use. Analysts may want to impute data,
which requires an assumption that the respondent took the serving size listed on the label captured in variables
DSDSERVQ / DSDSERVU.

The supplement name has missing data because no match to the supplement was recorded, no similar supplement
exists on the market, or the recorded name was not comprehensible as denoted in the DSDMTCH variable.
However, these records are kept in the data because it is assumed that individuals did take a supplement, even
though the name is unknown. They should be retained for prevalence estimates. It may be best to exclude these
data from analyses in which mean intakes are being estimated. This action also would reduce missing data for
some other variables.
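
The choice between dropping and imputing records can be sketched as follows. This Python example is only an illustration: missing values are represented as None (the actual coding in the data files may differ), the three records are hypothetical, and the imputed value of 30 days is an arbitrary assumption standing in for whatever rule an analyst derives from the respondent's other answers.

```python
# Hypothetical supplement-use records; None stands for a missing value.
records = [
    {"SEQN": 1, "DSD103": 30, "DSD122Q": 1.0, "DSDSERVQ": 1.0},
    {"SEQN": 2, "DSD103": None, "DSD122Q": None, "DSDSERVQ": 2.0},
    {"SEQN": 3, "DSD103": 12, "DSD122Q": None, "DSDSERVQ": 2.0},
]


def prepare(record, impute_days=None):
    """Return a cleaned copy of the record, or None if it has to be dropped."""
    cleaned = dict(record)
    # Missing amount taken: assume the label serving size was taken.
    if cleaned["DSD122Q"] is None:
        cleaned["DSD122Q"] = cleaned["DSDSERVQ"]
    # Missing number of days: impute if a value is supplied, otherwise drop.
    if cleaned["DSD103"] is None:
        if impute_days is None:
            return None
        cleaned["DSD103"] = impute_days
    return cleaned


# Option 1: drop records with a missing number of days.
dropped_version = []
for record in records:
    cleaned = prepare(record)
    if cleaned is not None:
        dropped_version.append(cleaned)

# Option 2: impute the number of days (30 is an assumption for illustration).
imputed_version = [prepare(record, impute_days=30) for record in records]

print(len(dropped_version), "records kept when dropping")
print(len(imputed_version), "records kept when imputing")
```

Whichever option is chosen, it should be reported alongside the results, because it changes both the sample size and the estimated intakes.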

When estimating mean intake from supplements, analysts should decide whether or not they wish to include calcium from
antacids. Antacids are not included in DSD010 (any dietary supplements taken) or in DSDCOUNT (the number of
supplements taken). However, antacids are included in File 2, so the default is that they will be counted. The variable
DSDANTA indicates whether or not the supplement is an antacid and where the antacid data were collected (i.e. during the
supplement section of the questionnaire or during the prescription medication section of the questionnaire). It is necessary
to be cautious when using antacid data in analyses because subsequent pilot studies and questions fielded in more recent
NHANES suggest antacids are used sporadically and more as a medication than as a supplement. Including these
products in usual intake estimates for calcium may skew results and overestimate usual calcium intake for some
individuals. However, a few antacids were reported during the dietary supplements section of the questionnaire, and they
may be assumed to be taken as a supplement.

Another consideration with estimating mean nutrient intake from supplements is whether you are interested in the mean
amount among all persons in the population, or only users of the supplement. That is, it is necessary to decide whether
non-consumers should be included in the estimation. If you are interested in the per capita amount consumed, you should
include the non-consumers with their intake value at zero. If you are interested in the average amount consumed by users
of the supplement, you should exclude the non-consumers.

Means should be examined along with their standard errors, to get an indication of the variation about the mean. Special
statistical procedures are required to get appropriate standard errors when using data from a complex sample such as the
NHANES. In addition, appropriate sample weights should be applied, so the results will represent the population as a
whole. See “Module 13, Estimate Variance, Analyze Subgroups, and Calculate Degrees of Freedom” for more information.


https://www.cdc.gov/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/Info3.htm 2/2

12/19/2018 NHANES Dietary Web Tutorial: Estimate Population Mean Intakes: Task 3b


Task 3b: How to Estimate Mean Nutrient Intakes from Supplements Using
SAS

This section describes how to use SAS to estimate mean nutrient intakes from supplements, along with standard errors.
To illustrate this, consumption of supplemental calcium by females ages 20 and older is used as an example.

Step 1: Compute Properly Weighted Estimated Means and Standard Errors

Use the PROC SURVEYMEANS procedure to compute properly weighted estimated means and standard errors.

In the sample below, the NOBS, MEAN, and STDERR options in the PROC SURVEYMEANS statement request that the
number of observations, the estimated mean, and its estimated standard error, respectively, be printed for each analysis
variable. The DOMAIN statement designates the combination of variables required to obtain separate estimates by the
cohort of interest (INCOHF20) within each age group (AGEGRP). INCOHF20 is a variable that has value 1 if the individual
is “in the cohort” and zero otherwise. Here, females ages 20 and older have INCOHF20=1. As in the SUDAAN example
above, the weight variable being used is for the MEC subsample (WTMEC2YR).

Use SAS to Estimate Mean Intake of Calcium, in Milligrams, from Supplements among Females Ages 20 Years
and Older

Sample Code

*-------------------------------------------------------------------------;
* Use the PROC SURVEYMEANS procedure in SAS to estimate mean intakes of   ;
* calcium from supplements using complex survey design factors (strata    ;
* and PSU)                                                                ;
*-------------------------------------------------------------------------;

proc surveymeans nobs mean stderr data = DEMOOSTS;
   strata SDMVSTRA;
   cluster SDMVPSU;
   domain INCOHF20*AGEGRP;
   var DAILYAVG;
   weight WTMEC2YR;
   format AGEGRP AGEGRP. ;
run;

Output of Program

Data Summary

Number of Strata                                 15
Number of Clusters                               30
Number of Observations                        10122
Number of Observations Used                    9643
Number of Obs with Nonpositive Weights          479
Sum of Weights                            286222757

Statistics

                                 Std Error
Variable       N          Mean     of Mean
------------------------------------------
DAILYAVG    9618    142.999400    9.643379
------------------------------------------

Domain Analysis: Age of subject*INCOHF20

Age of                                              Std Error
subject  INCOHF20  Variable       N          Mean     of Mean
-------------------------------------------------------------
20-39    0         DAILYAVG     767     62.602567    6.471849
         1         DAILYAVG     885    135.856261   17.035929
40-59    0         DAILYAVG     653    148.634197   17.715262
         1         DAILYAVG     679    251.803537   23.644341
>= 60    0         DAILYAVG     849    197.511972   14.782058
         1         DAILYAVG     898    426.392544   26.575240
-------------------------------------------------------------

Highlights from the output include:

9,643 of the 10,122 observations (respondents) read by the program were used; 479 were skipped because their
sampling weight value was zero (ineligible).
2,462 respondents were in the cohort; 885 were ages 20-39; 679 were ages 40-59; and 898 were ages 60 or older.
Females ages 20-39 reported 136 mg, while those ages 40-59 reported 252 mg, and those ages 60 and older
reported 426 mg. These are estimates of the population mean intake of supplemental calcium on a given day, which
is equivalent to the mean usual intake of supplemental calcium for these age groups in the population.
The standard errors of these means were 17.0, 23.6, and 26.6, respectively. See the NHANES Analytic Guidelines
for more information on how to interpret standard errors.
Unlike the SUDAAN output, the SAS output provides data for individuals who are not in the cohort. As we are not
interested in those data, they can be ignored.


https://www.cdc.gov/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/task3b.htm 2/2

12/19/2018 NHANES Dietary Web Tutorial: Estimate Population Mean Intakes: Estimate Total Nutrient Intakes


Task 4: Key Concepts about Estimating Total Nutrient Intakes

This task relates to estimating total mean nutrient intakes from foods, as reflected in the 24-hour dietary recall, and dietary
supplements.

WARNING

In NHANES 1999-2004, the nutrient amounts in the dietary recall interview files reflect only nutrients obtained from foods
and beverages, including sweetened water beverages. They DO NOT include nutrients obtained from plain drinking water.
Beginning in 2005, nutrients from plain drinking water will be included in the data release.

Estimating total nutrient intake requires using data from both the 24-hour recalls and dietary supplement questionnaire.
(For more information on merging supplement files, see “Module 8: Merge and Append Datasets, Task 1: Merge NHANES
data.”) These two types of data have different reference periods and measurement error characteristics (see Tasks 2 and 3
in this module). Therefore, some data manipulation is required to combine and summarize the data. Also, the study
sample sizes may differ because some supplement users did not complete the dietary recall interview and persons who
completed the dietary recall may not be supplement users. Exploratory analyses are useful to identify the characteristics
of the supplement use and dietary recall samples.

All the key concepts and caveats regarding estimating nutrient intakes from both dietary (foods and beverages) and
supplement sources apply when estimating total nutrient intake. To date, no elegant procedures are available to combine
these two types of data.

WARNING

When estimating the mean of the population distribution of usual nutrient intakes from supplement data, no standard
convention for statistical adjustment currently exists.

For this course, the sample of persons with satisfactory data for both supplements and the Day 1 recall will be selected.
Then, for each person, the average daily nutrient intake from supplements will be determined and added to the nutrient
intake from the 24-hour recall. Finally, a weighted mean of those values will be obtained; in this example, the Day 1
dietary recall weight will be used because it represents the variable that applies to all members of the smallest analysis
subpopulation (See “Module 6: Locate Variables, Task 4: Identify Correct Sample Weights and Their File Locations” for
more information). This method assumes that the sample of persons with satisfactory data on both types of data is
representative of the population.

Because the units of measure are different between recalls and supplement data, ingredient units (DSDUNIT) for each
nutrient of interest on the supplement files will need to be converted to units used in the dietary intake data files. For
example, all calcium units should be converted to milligrams. Most of the supplement units for calcium are in milligrams,
but there are some units in grams that require conversion.

Also, nutrients may be listed as compounds and need to be converted to elemental form and amounts. For example, there
may be some instances of calcium carbonate, which will need to be converted to the corresponding amount of elemental
calcium. This is less of a concern with the supplement data releases since 2001, but this issue occurs periodically in
earlier surveys.

As in the case of estimating nutrient intakes from supplements alone, analysts must consider the possibility of missing data
and whether or not to include antacids (in the case of calcium or magnesium). For further information regarding these
topics, see the previous task (Task 3), “Key Concepts about Estimating Nutrient Intakes from Supplements.”

Means should be examined along with their standard errors, to get an indication of the variation about the mean. Special
statistical procedures are required to get appropriate standard errors when using data from a complex sample such as the
NHANES. In addition, appropriate sample weights should be applied, if the data are being used to represent the
population as a whole. See “Module 13: Estimate Variance, Analyze Subgroups, and Calculate Degrees of Freedom” for
further information.


https://www.cdc.gov/nchs/tutorials/Dietary/Basic/PopulationMeanIntakes/Info4.htm 2/2

12/19/2018 NHANES Dietary Web Tutorial: Estimate Population Mean Intakes: Task 4b


Task 4b: How to Estimate Total Nutrient Intakes Using SAS

This section describes how to use SAS to estimate mean nutrient intakes from all sources – that is, from foods, beverages,
and supplements – along with standard errors. To illustrate this, consumption of calcium by adults ages 20 and older is
used as an example.

Step 1: Sort Data

Sort the previously saved datasets by SEQN and merge them, keeping data only for those individuals from the food
analysis data set. During the MERGE step, create a variable called TOTCALC that is the sum of the 24-hour recall and
supplemental calcium average values.
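
The logic of this merge can be sketched in Python. The example is an illustration with hypothetical SEQN values and intakes, not a translation of the tutorial's program. It keeps only persons present in the recall data and treats the sum as missing whenever either component is missing, which is how the SAS + operator behaves.

```python
# Hypothetical toy inputs keyed by SEQN; None stands for a missing value.
recall = {101: 850.0, 102: 1200.0, 103: 640.0}       # DR1TCALC (mg)
supplements = {101: 300.0, 103: None, 104: 500.0}    # DAILYAVG (mg)

merged = []
for seqn in sorted(recall):             # keep only persons in the recall data
    dr1tcalc = recall[seqn]
    dailyavg = supplements.get(seqn)    # None if absent from the supplement data
    if dr1tcalc is None or dailyavg is None:
        totcalc = None                  # the sum is missing if either part is missing
    else:
        totcalc = dr1tcalc + dailyavg
    merged.append({"SEQN": seqn, "DR1TCALC": dr1tcalc,
                   "DAILYAVG": dailyavg, "TOTCALC": totcalc})

for row in merged:
    print(row)
```

Persons with a recall but no usable supplement value therefore contribute to the mean of DR1TCALC but not to the mean of TOTCALC, which would be consistent with the slightly smaller N shown for TOTCALC than for DR1TCALC in the output for this task.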

Step 2: Compute Properly Weighted Estimated Means and Standard Errors

Use the PROC SURVEYMEANS procedure to estimate mean intakes of calcium from diet, from supplements and from
their combination using complex survey design factors (e.g. strata and PSUs).

The WHERE clause tells SAS to subset the input data set and only include individuals ages 20 and older. We could have
defined an “in cohort” variable and used it as an additional variable in the domain statement as in other analyses in this
course, but because NHANES is designed to be representative of all individuals ages 20 and older, we can use this
shortcut approach.

Use SAS to Estimate Mean Intake of Calcium, in Milligrams, from Diet, Supplements, and Their Combination
among Adults Ages 20 Years and Older

Sample Code

*-------------------------------------------------------------------------;
* Sort the previously-saved data sets by SEQN and merge them, keeping     ;
* data only for those individuals from the food analysis data set.        ;
*-------------------------------------------------------------------------;

proc sort data = NH.CALCMILK out = CALC24(keep=SEQN WTDRD1 DR1TCALC RIDAGEYR);
   by SEQN;
run;

proc sort data = NH.DEMOOSTS out = CALCDS(keep=SEQN SDMVSTRA SDMVPSU RIAGENDR
                                               AGEGRP DAILYAVG);
   by SEQN;
run;

data CALCTD;
   merge CALC24(in = IN24) CALCDS;
   by SEQN;
   if IN24;
   * Create a variable that is the sum of 24HR and supplemental calcium;
   TOTCALC = DR1TCALC + DAILYAVG;
   * Use the LABEL statement to provide descriptive labels;
   label DR1TCALC = 'Calcium (mg) from food and beverage sources on first 24HR'
         DAILYAVG = 'Calcium (mg) from dietary supplements (Estimated daily average)'
         TOTCALC  = 'Estimated total calcium intake on day of first 24HR from all sources';
run;

*-------------------------------------------------------------------------;
* Use the PROC SURVEYMEANS procedure to estimate mean intakes of calcium  ;
* from diet, from supplements, and from their combination using complex   ;
* survey design factors (e.g. strata and PSU)                             ;
*-------------------------------------------------------------------------;

proc surveymeans nobs mean stderr data = CALCTD(where=(RIDAGEYR >= 20));
   strata SDMVSTRA;
   cluster SDMVPSU;
   domain RIAGENDR*AGEGRP;
   var DR1TCALC DAILYAVG TOTCALC;
   weight WTDRD1;
   format RIAGENDR GENDER. AGEGRP AGEGRP. ;
run;

Output of Program

Data Summary

Number of Strata                                 15
Number of Clusters                               30
Number of Observations                         5041
Number of Observations Used                    4448
Number of Obs with Nonpositive Weights          593
Sum of Weights                            205284669

Statistics

                                                                                                       Std Error
Variable  Label                                                                      N          Mean     of Mean
----------------------------------------------------------------------------------------------------------------
DR1TCALC  Calcium (mg) from food and beverage sources on first 24HR               4448    880.130855   16.722099
DAILYAVG  Calcium (mg) from dietary supplements (Estimated daily average)         4438    195.756197   12.013442
TOTCALC   Estimated total calcium intake on day of first 24HR from all sources    4438   1077.855751   26.089415
----------------------------------------------------------------------------------------------------------------

Domain Analysis

Gender -     Age of                                    Std Error
Adjudicated  subject  Variable       N          Mean     of Mean
----------------------------------------------------------------
Male         20-39    DR1TCALC     709   1139.278271   31.958119
                      DAILYAVG     708     57.289018    6.022608
                      TOTCALC      708   1199.794452   34.012767
             40-59    DR1TCALC     615    952.019669   32.736675
                      DAILYAVG     612    155.669226   15.930999
                      TOTCALC      612   1110.782568   37.304728
             >= 60    DR1TCALC     811    825.843492   28.266203
                      DAILYAVG     810    200.238668   15.845799
                      TOTCALC      810   1026.113669   38.492539
Female       20-39    DR1TCALC     827    828.993076   32.424725
                      DAILYAVG     824    128.043456   15.980951
                      TOTCALC      824    959.569450   39.133256
             40-59    DR1TCALC     636    746.093354   26.667954
                      DAILYAVG     636    278.951411   28.043083
                      TOTCALC      636   1025.044765   48.354061
             >= 60    DR1TCALC     850    719.487424   18.471985
                      DAILYAVG     848    426.884605   23.980534
                      TOTCALC      848   1148.799863   28.292083
----------------------------------------------------------------

Highlights from the output include:

Unlike SUDAAN, this analysis has only one set of output. It is sorted by gender, then by age, and then by variable
of interest. These variables are intakes of calcium from foods and beverages reported on the 24-hour recalls,
intakes of calcium from supplements, and intakes of calcium from foods, beverages, and supplements combined.
Females consumed less calcium from foods and beverages, and more calcium from supplements, than did males;
this was true for all age groups. When supplement intakes were combined with those from foods and beverages,
females ages 20-39 and 40-59 consumed less than males of the same age (960 mg vs. 1,200 mg and 1,025 mg vs.
1,111 mg, respectively), whereas among those ages 60 and older females consumed more than males (1,149 mg vs.
1,026 mg). However, to determine whether these differences are statistically significant, see “Module 16: Test
Hypotheses.”
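
The comparison of total intake by gender can be checked directly against the Domain Analysis output. The short Python sketch below is only an illustration: the TOTCALC means are copied from the output above and rounded to two decimals, and the names totcalc_mean and female_minus_male are made up for this example. It compares point estimates only and says nothing about statistical significance.

```python
# Estimated mean total calcium intake (mg), TOTCALC, copied from the
# Domain Analysis output above and rounded to two decimals.
totcalc_mean = {
    "20-39": {"Male": 1199.79, "Female": 959.57},
    "40-59": {"Male": 1110.78, "Female": 1025.04},
    ">= 60": {"Male": 1026.11, "Female": 1148.80},
}


def female_minus_male(means):
    """Return the female-minus-male difference in mean intake for each age group."""
    return {age: round(m["Female"] - m["Male"], 2) for age, m in means.items()}


differences = female_minus_male(totcalc_mean)
for age, diff in differences.items():
    # A negative value means females had the lower estimated mean.
    print(f"{age:>6}: female - male = {diff:+.2f} mg")
```

A negative difference means the estimated mean for females is lower than for males in that age group.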


NHANES Dietary Web Tutorial - Survey Orientation https://www.cdc.gov/nchs/tutorials/dietary/Basic/Ratios/intro.htm

Estimate Ratios

Purpose

Dietary researchers are often interested in questions that involve examining two variables in relation to one another. They
might wish to know the ratio of whole milk to low-fat milk, the proportion of calcium intake contributed by milk, or the
percentage of energy from fat. Ratios, including proportions and percentages, are useful devices for addressing such
relationships.

Task 1: Estimate Ratios

Whenever ratios involve the division of one variable by another—both of them, by definition, having varying values across
individuals in the population—analysts can use different ways to summarize the ratio, and these different calculations can lead
to different answers. This task describes common methods for calculating ratios in dietary analyses, how they differ, and when
to use each.

Key Concepts about Estimating Ratios (/nchs/tutorials/Dietary/Basic/Ratios/Info1.htm)
How to Estimate a Ratio of Means Using SUDAAN (/nchs/tutorials/Dietary/Basic/Ratios/Task1a.htm)
How to Estimate a Ratio of Means Using SAS (/nchs/tutorials/Dietary/Basic/Ratios/Task1b.htm)
How to Estimate a Mean Ratio Using SUDAAN (/nchs/tutorials/Dietary/Basic/Ratios/Task1c.htm)
How to Estimate a Mean Ratio Using SAS (/nchs/tutorials/Dietary/Basic/Ratios/Task1d.htm)

Task 2: Identifying Important Food Group Sources of Nutrients

There are two different ways to consider food sources of nutrients—as “important” vs. “rich” sources. Rich sources are those
foods with the greatest concentration of a nutrient, whereas important sources are those that contribute the most to a
population’s intake. This task describes methods for identifying important sources.

Key Concepts about Identifying Food Group Sources of Nutrients (/nchs/tutorials/Dietary/Basic/Ratios/Info2.htm)
How to Identify Important Food Group Sources of Nutrients Using SUDAAN (/nchs/tutorials/Dietary/Basic/Ratios/Task2a.htm)
How to Identify Important Food Group Sources of Nutrients Using SAS (/nchs/tutorials/Dietary/Basic/Ratios/Task2b.htm)

Page last updated: May 3, 2013
Page last reviewed: May 3, 2013
Content source: CDC/National Center for Health Statistics
Page maintained by: NCHS/NHANES


12/19/2018 NHANES Dietary Web Tutorial: Estimate Ratios: Estimate Ratios


Task 1: Key Concepts about Estimating Ratios

Ratios can be used to depict the value of one variable divided by the value of another. A proportion, often expressed as a
percentage, is a kind of ratio that can be used to represent the value of a single variable for one class divided by the value
for all classes combined.

Whenever multiple ratios are involved—either across many individuals in a group or over numerous days of intake for each
individual—analysts can use different ways to summarize them, and these different calculations can lead to different
answers. This is because the calculations involve both summation and division, and an elementary principle of
mathematics dictates that the order of these operations matters. The mathematical properties of ratios are the same,
whether one is considering simple ratios, proportions, or percentages.

In survey analyses involving multiple dietary recalls per person, consideration of which kind of summary ratio to use must
be made at both the group and individual levels.

Group-level Ratios

At the group level, two different, but equally correct, answers can be given in response to the question “What proportion of
the calcium that is consumed comes from milk?” This is because the question can have two different meanings:

“How much of all the calcium consumed by the group comes from milk?” (Ratio of Means) or
“What is the group’s daily contribution of milk to calcium intake?” (Mean Ratio)

Whenever ratios involve the division of one variable by another—both of them, by definition, free to vary—these two ratios
can be different from one another.
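
The two group-level summaries can be written side by side. The notation below is an assumption introduced only for this illustration and does not appear in the tutorial: for person i, x_i is calcium from milk, y_i is calcium from all foods, r_i = x_i / y_i is the person's own ratio, and w_i is a sample weight (set every w_i to 1 for an unweighted calculation).

```latex
\[
\hat{R}_{\text{ratio of means}}
  = \frac{\sum_i w_i x_i}{\sum_i w_i y_i}
  = \frac{\sum_i (w_i y_i)\, r_i}{\sum_i w_i y_i},
\qquad
\hat{R}_{\text{mean ratio}}
  = \frac{\sum_i w_i r_i}{\sum_i w_i}
\]
```

Both are weighted averages of the individual ratios r_i. The ratio of means gives each person a weight proportional to w_i y_i, while the mean ratio uses w_i alone, so the two coincide only when the extra weighting by the denominator makes no difference, for example when r_i is the same for everyone.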

Ratio of Means

The ratio of means is used to answer questions such as, “How much of all the calcium consumed by the group comes from
milk?” It is calculated by summing the amount of calcium from milk for all persons and then dividing that by the
sum of the calcium from all foods for all persons. The answer would be the same if both the numerator and
denominator were divided by a constant, such as the sample size. Therefore, it can also be calculated by dividing the
group’s mean amount of calcium from milk by the group’s mean total calcium, and for this reason it can be thought of as a
ratio of means.
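
The calculation described above can be sketched in a few lines of Python. This is a minimal, unweighted illustration with made-up intakes for four people, not NHANES data; the variable and function names are hypothetical. A population estimate would also need sample weights and design-based standard errors.

```python
# Hypothetical one-day intakes (mg) for four people; not NHANES data.
milk_calcium = [300.0, 0.0, 600.0, 150.0]
total_calcium = [1000.0, 500.0, 1200.0, 300.0]


def ratio_of_means(numerator, denominator):
    """Sum the numerator over all persons, then divide by the summed denominator."""
    return sum(numerator) / sum(denominator)


def ratio_of_means_from_means(numerator, denominator):
    """Equivalent form: group mean of the numerator over group mean of the denominator."""
    n = len(numerator)
    return (sum(numerator) / n) / (sum(denominator) / n)


rom = ratio_of_means(milk_calcium, total_calcium)
rom_alt = ratio_of_means_from_means(milk_calcium, total_calcium)
print(f"sum / sum   = {rom:.3f}")
print(f"mean / mean = {rom_alt:.3f}")
```

Dividing both sums by the same sample size cancels out, which is why the two functions return the same value.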

The ratio of means yields information about the diet of the population as a whole because both the numerator and the
denominator are computed for the whole population before the ratio is derived. That is, the whole population has only one
aggregate value and the distribution of the ratio among members of the population is not available. However, the ratio of
means can be obtained for various subgroups in the population, if comparisons are warranted. The ratio of means has
been employed to identify important sources of nutrients in the US diet as a whole and to examine diet quality using the
Healthy Eating Index-2005.

Mean Ratio

The mean ratio is used to answer questions such as, “What is the group’s daily contribution of milk to calcium intake?” It
is determined by first calculating the proportion of calcium from milk for each person and then taking an
arithmetic mean of all the proportions. Often, the mean ratio is similar to the ratio of means; however, sometimes they
are quite different, depending on the variability in the ratio, variation in the denominator, and the correlation between the
ratio and the denominator.
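
A small Python sketch can show how the mean ratio is formed and how it can differ from the ratio of means. The intakes are made up for four people (not NHANES data), the calculation is unweighted, and the function names are hypothetical.

```python
# Hypothetical one-day intakes (mg) for four people; not NHANES data.
milk_calcium = [300.0, 0.0, 600.0, 150.0]
total_calcium = [1000.0, 500.0, 1200.0, 300.0]


def mean_ratio(numerator, denominator):
    """Compute each person's ratio first, then average the ratios."""
    ratios = [x / y for x, y in zip(numerator, denominator)]
    return sum(ratios) / len(ratios)


def ratio_of_means(numerator, denominator):
    """Sum numerator and denominator over persons first, then divide once."""
    return sum(numerator) / sum(denominator)


mr = mean_ratio(milk_calcium, total_calcium)
rom = ratio_of_means(milk_calcium, total_calcium)
print(f"mean ratio     = {mr:.3f}")
print(f"ratio of means = {rom:.3f}")
```

In this toy data the two summaries differ (0.325 versus 0.350) because the individual ratios are not the same across people and are related to how much total calcium each person consumed.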

The mean ratio requires that a ratio be calculated for each person before averaging. When the ratio itself varies
among the population, its distribution can be examined, and the ratio can be studied in relation to other
variables. Also, the distribution of ratios provides other summary statistics, such as the median, the ratio at other
percentiles, and the proportion of the population above or below a certain cut-off, in addition to the mean ratio.
Such statistics have been used in tracking progress toward meeting national health objectives.

When the intent is to say something about how the intake varies among the population, or how the ratio relates to
other factors, deriving the ratio for each person before summarizing (as with the mean ratio) is the method of
choice. However, it should be remembered that the generalizability of such an approach is subject to whatever
time period limitations the individual ratios impose. For example, if the individual ratios each represent only a
single day, then the mean ratio can only be used to make inferences “for a given day,” and relating a single day’s
ratio to some other factor is rarely of interest.

Means should be examined along with their standard errors to get an indication of the variation about the mean.
Special statistical procedures are required to get appropriate standard errors when using data from a complex
sample such as the NHANES. In addition, appropriate sample weights should be applied, if the data are being
used to represent the population as a whole. See “Module 13: Estimate Variance, Analyze Subgroups, and
Calculate Degrees of Freedom” for further information.
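
As a rough illustration of why a design-based procedure is needed, the Python sketch below computes a weighted mean and a Taylor series linearization standard error, treating primary sampling units (PSUs) as sampled with replacement within strata. The records, their layout, and the function name weighted_mean_and_se are hypothetical, and the choice of linearization is an assumption made for this example. Real NHANES estimates should come from survey procedures such as those in SAS or SUDAAN.

```python
import math
from collections import defaultdict

# Hypothetical records: (stratum, psu, sample_weight, calcium_mg). Not NHANES data.
records = [
    (1, 1, 2.0, 800.0), (1, 1, 2.0, 1000.0),
    (1, 2, 1.0, 1200.0), (1, 2, 3.0, 600.0),
    (2, 1, 1.0, 900.0), (2, 1, 1.0, 1100.0),
    (2, 2, 2.0, 700.0), (2, 2, 2.0, 1300.0),
]


def weighted_mean_and_se(rows):
    """Weighted mean with a linearization SE (assumes >= 2 PSUs in every stratum)."""
    total_weight = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / total_weight

    # Sum the linearized values w * (y - mean) / total_weight within each PSU.
    psu_totals = defaultdict(float)
    for stratum, psu, w, y in rows:
        psu_totals[(stratum, psu)] += w * (y - mean) / total_weight

    # Group the PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (stratum, _), z in psu_totals.items():
        by_stratum[stratum].append(z)

    # Between-PSU variability within each stratum, summed over strata.
    variance = 0.0
    for totals in by_stratum.values():
        n_psu = len(totals)
        center = sum(totals) / n_psu
        variance += n_psu / (n_psu - 1) * sum((z - center) ** 2 for z in totals)
    return mean, math.sqrt(variance)


mean, se = weighted_mean_and_se(records)
print(f"weighted mean = {mean:.1f} mg, linearized SE = {se:.2f} mg")
```

The standard error here depends on how the PSU totals vary within strata, not on treating the eight records as a simple random sample.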

Individual-Level Ratios

If the mean ratio is being used at the group level, then an individual-level ratio is needed for each person. If you
are using only one observation per person—such as a single 24-hour recall—then there is only one value for the
numerator and one for the denominator and, therefore, only one way to derive the individual-level ratio. If data
were available for each person’s intake on every day over an extended period, then the individual’s daily ratios
would need to be summarized.

As with group-level ratios, two different questions could be posed: “How much of all the calcium consumed by
this person, over time, has come from milk?” or “What is the person’s daily contribution of milk to calcium
intake?” And again, because the ratios would involve the division of one variable by another, these two ratios
could be different from one another. Although long-term intake observations are not available in NHANES,
available data can be modeled to represent usual intake and, in that case, decisions about which individual-level
ratio to use must be made. That topic will be covered in the Advanced Dietary Analysis course. The example of
the single day’s ratio for each person will be used in the next section, to demonstrate the difference between the
group-level ratio of means and mean ratio.
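
The same distinction can be illustrated for one person observed on several days. The three days of intake below are invented for illustration (NHANES does not provide long-term daily observations), and the variable names are hypothetical.

```python
# Hypothetical intakes (mg) for one person on three days; not NHANES data.
milk_by_day = [300.0, 0.0, 450.0]
total_by_day = [1000.0, 800.0, 900.0]

# "How much of all the calcium consumed by this person, over time, has come from milk?"
share_over_time = sum(milk_by_day) / sum(total_by_day)

# "What is the person's daily contribution of milk to calcium intake?"
daily_ratios = [m / t for m, t in zip(milk_by_day, total_by_day)]
mean_daily_share = sum(daily_ratios) / len(daily_ratios)

print(f"share of all calcium over the period = {share_over_time:.4f}")
print(f"mean of the daily shares             = {mean_daily_share:.4f}")
```

Here the first summary is 750 / 2700 and the second is the average of 0.3, 0.0, and 0.5, so the two answers are close but not equal.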

IMPORTANT NOTE

Summary ratios can be calculated in multiple ways, at both the group and individual levels, depending on the
question of interest. At the group level, the ratio of means is the simplest calculation, and if it is used,
individual-level ratios do not need to be calculated. If the mean ratio is used, then an individual-level ratio is
needed. If single-day ratios are being used, it should be noted that the mean ratio represents “a given day” rather
than usual intake. If usual intake estimates are being modeled, then the individual ratio can be either the ratio of
usual intake or the usual ratio of intake (see Advanced Dietary Analysis Course).
