NHANES Dietary Web Tutorial


Task 1: Key Concepts about Measurement Error

The concept of “usual” or long-term average intake is important because dietary recommendations are intended to be met
over time and diet-health hypotheses are based on dietary intakes over the long term. However, there is no perfect dietary
assessment tool to measure usual intake; all self-report dietary assessment instruments are prone to error.

In statistics, an “error” is the deviation of an observed value from the true mean. It is estimated by calculating the residual (i.e., the difference between an observation and the sample mean). The variance is the sum of the squared residuals divided by the sample size (usually N-1, to give an unbiased estimate). Large errors lead to a large variance; small errors lead to a small variance.
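
For reference, the sample variance described here is

\[ s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2, \]

where \(x_i - \bar{x}\) is the residual for observation \(i\) and \(\bar{x}\) is the sample mean.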

When considering variation in dietary intake data, it is important to distinguish variation between people from variation
within people. Between-person variability is a function of the difference between a person’s usual intake and the
population’s usual intake. However, within a person, we also expect variation around his/her usual intake. This type of
variation usually takes two forms: day-to-day variability and measurement error. These are depicted graphically in Figure
1. However, we cannot usually distinguish between these two sources of error, so they are jointly referred to as “within-
person variation”.

Figure 1. Between-person and within-person variation. Between-person variation is represented by the difference
between Person A’s and Person B’s usual intake and the population’s usual intake. The dark blue dots (and jagged line)
represent day-to-day variation in intake, whereas the light blue dots represent the measurement of intake. Taken together,
these comprise within-person variation.

Within-person variation may be random, resulting in an estimate of usual intake that is unbiased (Figure 2a), meaning that
a person’s true usual intake is estimated accurately on average, although with some error. However, measurement errors
also may be systematic, leading to bias (Figure 2b). These are “mistakes” in the measurement. For example, a person
may not report intake of sugar in coffee, but drink many cups of coffee per day, resulting in a biased estimate of sugar
intake.

Figure 2a. Random within-person errors

Figure 2b. Systematic within-person errors

Systematic within-person errors may arise in different ways:

They could occur equally for all NHANES participants. For example, a recipe error would be called a “systematic
additive error,” indicating that a constant error is added to each person’s reported intake. This could lead to over- or
underestimation for all participants by the same amount.
The bias could be related to the true intake of the nutrient or food assessed. This is called “intake-related bias.” For
example, people who consume high amounts of sweets may be less likely to report sweets intake.
Systematic errors also reflect person-specific bias. For example, person A and B may have the exact same intake
of sweets, but person A may accurately report his intake while person B underreports his intake. This could be
related to personal characteristics that are measured, such as obesity, or other unmeasured characteristics.

Like within-person error, between-person error may be random or systematic. When error is random between people, it
results in an unbiased estimate of usual intake for the population. Even with random measurement error within a person, it
is possible to calculate an unbiased estimate for the population, by balancing out overestimation of some individuals with
underestimation for others. With random error, the mean is estimated without bias, but the variance is inflated.

Systematic between-person error may arise if systematic within-person error occurs non-randomly. For example, if the
database was in error for collard greens, but people reported consumption of collard greens to varying degrees, systematic
between-person bias could occur. Of course, with self-report tools like the 24-hour recall and food propensity
questionnaire, systematic between-person error also can result from person-specific bias and intake-related bias.

Various types of bias have different effects on the estimated mean and distribution of usual intakes for a population. When
systematic errors are only additive, the mean of the distribution is shifted, but is otherwise unchanged. Although person-
specific bias results in a biased estimate of the individual’s mean intake, it does not lead to a biased estimate of the group mean: at the group level, the person-specific bias cancels out. It does, however, produce a distribution with a larger variance and a decreased correlation with true intake. Systematic intake-related bias,
however, can shift the mean, and may also change the correlation with true intake. Depending on the direction of the bias
– whether it increases or decreases with intake – the correlation may be stronger or weaker.

Importantly, these types of systematic errors do not usually occur in isolation. When interest is in relating diet to a health parameter, what is often observed is a “flattened slope” effect (Figure 3). Those with the lowest levels of intake tend to overreport, and those with the highest levels of intake underreport; this results from a combination of intake-related bias and systematic error. These errors are often accompanied by person-specific bias, so the direction of the shift of the mean
and the correlation between the assessment tool and truth is not always clear.

Figure 3. The effects of random error on the relationship between usual intake and a health parameter. The black
dots and solid regression line represent the true relationship, and the blue triangles and dashed line represent the
observed attenuated relationship.

Statisticians have proposed models to separate the different sources of error using a measurement error model. When an
unbiased estimate of truth is available, the different types of errors may be estimated.
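
As a hedged illustration (the notation is ours, not the tutorial's), the types of error described above are often combined in a linear measurement error model for a short-term instrument such as the 24-hour recall:

\[ Q_{ij} = \beta_0 + \beta_1 T_i + r_i + \epsilon_{ij}, \]

where \(Q_{ij}\) is person \(i\)'s reported intake on day \(j\), \(T_i\) is that person's true usual intake, \(\beta_0\) represents systematic additive error, \(\beta_1 \ne 1\) represents intake-related bias, \(r_i\) is person-specific bias, and \(\epsilon_{ij}\) is random within-person error. An unbiased instrument corresponds to \(\beta_0 = 0\), \(\beta_1 = 1\), and \(r_i = 0\), leaving only the random error term.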

Among the most frequently used methods of assessing dietary intake are the 24-hour recall and the food frequency
questionnaire (FFQ). The FFQ administered in NHANES 2003-2006 does not include portion-size assessment. The 24-hour
recall and the FFQ have key differences. The FFQ is focused on intake over an extended period. It captures the majority of
a person's diet, but is limited to foods on the instrument. Because of this and cognitive difficulties in recalling typical intake
over a long period, FFQ reports also fail to truly reflect a person's long-term average daily intake.

In contrast to the FFQ, during a recall, people are asked to report everything eaten and drunk during the previous 24
hours. Therefore, 24-hour recalls are generally preferred to the FFQ due to their ability to capture rich details about daily
intake of every item consumed (when, how, how much, with what). Validation studies have shown that the 24-hour recall is
less prone to measurement error than an FFQ. However, the biggest strength of the 24-hour recall also may be
considered its biggest limitation. Because food intake is only captured for one day, and most individuals’ diets vary from
day to day, one day of intake is not sufficient to capture usual intake for an individual. That is, a single recall does not
reflect a person's long-term average daily intake; it represents only a "snapshot in time."

Validation studies have examined reported intakes on 24-hour recalls and FFQs and compared them to biomarkers for
energy and protein to try to understand the structure of measurement error for these self-report instruments. Both 24-hour
recalls and food frequency questionnaires have been shown to be prone to all of the systematic and random sources of
measurement error discussed above when measuring energy and protein (Kipnis et al., 2003; Neuhouser et al., 2008).
Because total energy is prone to error, at least some foods are subject to being reported with error on 24-hour recalls.
However, it is not possible to know the impact of measurement error on other nutrients or individual foods because
unbiased biomarkers are not available for other nutrients. In spite of this, in all of the methods described in the Dietary
Tutorial, we make the assumption that the 24-hour recall is an unbiased instrument, i.e., that it is subject only to random
within-person and between-person error, but not additive and intake-related error. It is important to acknowledge this
limitation of the 24-hour recall data when reporting the results of NHANES dietary intake analyses.

Even random error, however, may affect the estimates of usual intake from one or two 24-hour recalls. Figure 4 illustrates the distribution curves from one 24-hour recall, the average of two recalls, and true usual intake. In surveillance, one may
be interested in examining mean intakes or estimating the fraction of the population above or below a cutpoint. If our
interest is in estimating the mean intake, recall data for one day will be adequate because with random error, the mean is
unbiased. However, random error results in inflated variance. Thus, if interest is in measuring the percentage of the
population whose intakes fall above or below a cutpoint, biased estimates of the prevalence of inadequate or excess intake
will be obtained with only one day of data. Even using the mean of two days will lead to biased estimates of inadequate or
excess intake. Therefore, statistical methods are needed to adjust for measurement error.

Figure 4. Hypothetical distribution of usual intake of a nutrient (black solid line), contrasted with the estimated
distribution from one 24-hour recall (gray dotted dashed line) or two day average of 24-hour recalls (blue dashed
line). The vertical dashed line represents a hypothetical cutpoint of interest.
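
To make the effect of random within-person error concrete, the following minimal simulation sketch (entirely hypothetical values, not NHANES data; written in SAS for consistency with the rest of this tutorial) generates true usual intakes, adds random day-to-day error to form "recalls," and compares the mean, the standard deviation, and the proportion below a cutpoint for the truth, a single recall, and a two-day mean.

data sim;
  call streaminit(20190116);
  do id = 1 to 10000;
    usual = 900 + 250*rand('normal');    /* hypothetical true usual intake (mg/day)   */
    day1  = usual + 400*rand('normal');  /* one 24-hour recall = usual + random error */
    day2  = usual + 400*rand('normal');
    mean2 = (day1 + day2)/2;             /* two-day mean                              */
    below_usual = (usual < 700);         /* hypothetical cutpoint of 700 mg           */
    below_day1  = (day1  < 700);
    below_mean2 = (mean2 < 700);
    output;
  end;
run;

proc means data=sim mean std maxdec=1;
  var usual day1 mean2 below_usual below_day1 below_mean2;
run;

In this sketch the three means agree, but the standard deviations of the one-day and two-day values are inflated relative to the true usual intakes, and the estimated proportions below the cutpoint are biased, mirroring Figure 4.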

With only 2 days of recall data, statistical modeling is needed to account for random measurement error. This course
describes statistical methods for estimating the effects of individual variables on usual intake, estimating the distribution of
usual intake, and estimating usual intake for use in relating it to health parameters.

References:

Kipnis V, Subar AF, Midthune D, Freedman LS, Ballard-Barbash R, Troiano RP, Bingham S, Schoeller DA, Schatzkin A,
Carroll RJ. Structure of dietary measurement error: results of the OPEN biomarker study. American Journal of
Epidemiology 2003 Jul 1;158(1):14-21; discussion 22-6.

Neuhouser ML, Tinker L, Shaw PA, Schoeller D, Bingham SA, Horn LV, Beresford SA, Caan B, Thomson C, Satterfield S,
Kuller L, Heiss G, Smit E, Sarto G, Ockene J, Stefanick ML, Assaf A, Runswick S, Prentice RL. Use of recovery biomarkers to calibrate nutrient consumption self-reports in the Women's Health Initiative. American Journal of
Epidemiology 2008 May 15;167(10):1247-1259.


Task 2: Key Concepts about Statistical Methods that have been used to
Estimate the Distribution of Usual Intake with a Few Days of 24-hour Recalls

Early attempts to compensate for the random error from the use of 24-hour recalls by averaging multiple (two to seven) 24-
hour recalls per respondent were deemed unsatisfactory due to high respondent burden. Moreover, averages over a small
number of days do not adequately represent individual usual intakes due to the large amount of random error. Thus, more
sophisticated methods based on statistical modeling evolved.

A few statistical methods have been developed to estimate the distribution of usual intake in a population and will be
described in detail later in this section. These include the National Research Council (NRC) Method (National Research
Council, 1986), the Iowa State University (ISU Method) (Nusser et al., 1996a), a simplification of the ISU Method called
the Best Power (BP) Method (Dodd, 1996), the Iowa State University Foods (ISUF) Method (Nusser et al., 1996b), and a
statistical method developed at the National Cancer Institute (NCI Method) (Tooze, 2006). Each statistical method makes
the assumption that the 24-hour recall is prone to random, not systematic error. For estimating dietary constituents that
are ubiquitously-consumed (consumed nearly every day by nearly everyone; generally refers to nutrients) and episodically-
consumed (consumed sporadically and not by everyone; generally refers to foods), methods also must meet the following
challenges:

A. Distinguish within-person from between-person variation, and

B. Account for consumption-day amounts that are positively skewed.

For episodically-consumed dietary constituents, two additional challenges must be addressed:

C. Account for reported days without consumption of a dietary constituent, and

D. Allow for the correlation between the probability of consuming a dietary constituent and the consumption-
day amount.

There is often interest in a final challenge, which is applicable to both ubiquitously-consumed and episodically-consumed
dietary constituents:

E. Relate covariate information to usual intake.

With 2 days of 24-hour recalls, all of the statistical methods that have been developed meet Challenge A. Between-person
variation in usual intake represents the variability of usual intake of a dietary constituent in the population. Within-person
day-to-day variability and measurement error are a nuisance for estimating usual intake. Therefore, the statistical methods
isolate the between-person variation and then estimate the distribution of usual intake from the estimate of the between-
person variance. The partitioning done in the statistical models is similar to the partitioning of total variability in a random-effects ANOVA model into variability between individuals and variability within individuals.
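
As a rough sketch of that partitioning (not the full usual-intake methodology, which also handles transformation and the survey design), a random-effects ANOVA can be fit with PROC MIXED to a long-format dataset with one record per person-day. The dataset and variable names below (calcium, drtcalc, seqn) follow the example constructed later in the covariates module and are assumptions here.

proc mixed data=calcium method=reml;
  class seqn;                       /* NHANES respondent ID (one record per person-day) */
  model drtcalc = / solution;       /* intercept-only model for day-level intake        */
  random intercept / subject=seqn;  /* between-person variance component                */
  /* the Residual covariance parameter is the within-person (day-to-day plus
     measurement error) variance */
run;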

Unfortunately in dietary assessment, dietary constituents are rarely normally distributed, yet the statistical methods used
require a normality assumption. Therefore these methods must meet Challenge B, to account for positively (or right)
skewed distributions (having a small number of very large values instead of exhibiting a normal distribution’s symmetry
about its mean). To reconcile the desire to use the statistical properties of the normal distribution with the need to model
inherently non-normal data, statisticians often assume that a normal distribution approximates the distribution of a
(nonlinear) transformation of the observed data, rather than the observed data themselves. For example, if the data have
a highly skewed distribution, then the distribution obtained by taking the logarithm of each observation may be symmetric,
and therefore be better-approximated by a normal distribution. In this example, we say that the data have been
“transformed” to the log scale. For less-skewed data, weaker transformations, such as the square root and cube root, are
often sufficient to achieve approximate normality. If a particular transformation produces normally-distributed data, the
distribution of untransformed data can be described in terms of the normal distribution and the transformation.

The general process used in modeling is illustrated in Figure 5. On the original scale, data are not normally distributed
(A). First, data are transformed to approximate normality (B). With normally distributed data, the distribution can be fully
described by the mean and variance. Next, the within-person variation is removed, leading to a “skinnier” distribution,
reflecting the distribution of usual intake (C). Finally, the data are backtransformed to the original scale (D). The
backtransformation is the expression that relates values on the transformed scale to usual intake on the original scale. All of
the methods use this general approach, although there are differences as to how it is done. The methods vary regarding
the assumption as to whether the intake is unbiased on the original or the transformed scale. When unbiasedness is assumed on the original scale, the methods must apply a “correction factor” so that the mean of the backtransformed data is the same as the mean of the data on the original scale. If unbiasedness is assumed on the transformed scale, the correction
factor is not necessary (for details see Dodd et al., 2006).

Figure 5. An illustration (for folate) of the transformation process used in statistical modeling of usual intake
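
In a hedged sketch of this process (our notation, not the tutorial's), write the transformed recall for person \(i\) on day \(j\) as

\[ g(R_{ij}) = \mu + b_i + w_{ij}, \qquad b_i \sim N(0, \sigma_b^2), \quad w_{ij} \sim N(0, \sigma_w^2). \]

Panel B corresponds to data with total variance \(\sigma_b^2 + \sigma_w^2\); panel C removes the within-person component and keeps the usual-intake distribution \(N(\mu, \sigma_b^2)\); and panel D back-transforms that narrower distribution to the original scale, applying a correction factor when unbiasedness is assumed on the original scale.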

When dietary constituents are consumed episodically, a spike in the distribution at zero will occur as a result of
observations with no consumption on recall days. Therefore, statistical methods for estimating usual intake of episodically-
consumed dietary constituents must meet Challenge C, accounting for reported days without consumption of the dietary
constituent. Furthermore, the probability of consuming a dietary constituent is often positively correlated with the amount
that is consumed on the consumption day. For example, people who have a higher probability of consuming whole grains
tend to eat more of them on the days on which they are eaten. Statistical methods must meet Challenge D, allowing for
the positive correlation between consuming a constituent and the amount consumed on a consumption day.

Finally, for all dietary constituents, there is often interest in incorporating covariates into statistical modeling. This is
Challenge E. Modules 19-21 cover various aspects of this challenge:

Module 19 discusses how to incorporate covariates into modeling to describe differences in intake by personal
characteristics.
Module 20 discusses making estimates of the distribution of usual intake for subpopulations through use of
covariates.
Module 21 discusses how to incorporate covariates to relate usual intake to health parameters using a regression
calibration approach.

Several statistical methods have been used to estimate usual intake of ubiquitously-consumed dietary constituents. A
common method is to use the average of 2 or more days of recall data for a person. This is referred to as the within-person means method in this tutorial. Unfortunately, this
method usually leads to biased estimates of the prevalence of either inadequate or excess food intake because it does not
meet any of the challenges listed above, especially challenge A. The National Research Council (NRC) Method (National
Research Council, 1986) was the first statistical method developed to estimate the distribution of ubiquitously consumed
dietary constituents; it meets challenge A and challenge B when a simple transformation to approximate normality can be
used. Later, researchers at Iowa State University (ISU Method) (Nusser et al., 1996a) proposed the use of a more complex
model, which incorporated a two-step transformation procedure. This procedure, which can be used with complex
datasets, meets challenges A and B, as does a simplification of the ISU Method called the Best Power (BP) Method,
(Dodd, 1996) when a simple transformation to approximate normality is appropriate. None of these methods incorporate
covariates (Challenge E), although the ISU Method allows for preliminary data adjustments such as interview sequence or
day of the week. Because these methods meet challenges A and B, they almost always produce less biased estimates of
usual intake than the within-person mean.

Another important distinction between the NRC Method and the ISU and BP Methods is the assumption regarding whether
the 24-hour recall data is unbiased on the original scale or the transformed scale. The NRC Method assumes
unbiasedness on the transformed scale, whereas the ISU and BP method assume unbiasedness on the original scale.

Until recently, only two methods have been developed to estimate the distribution of usual intake of episodically-consumed
dietary constituents using 2 days of 24-hour recall data. The Iowa State University Foods (ISUF) Method meets
challenges A, B, and C. The premise of the ISUF Method is that usual intake is equal to the probability of consumption on
a given day times the average amount consumed on a "consumption day." It models zero observations separately from
positive (consumption day) observations; the ISU Method is used to model the positive observations. However, the
method does not allow for correlation between probability and amount (Challenge D) and, therefore, is not applicable for
use in modeling dietary constituents that exhibit this positive correlation. Additionally, it cannot incorporate covariate
information regarding usual intake (Challenge E), although it does adjust for day of week and sequence effects. A new
statistical method has been developed at the National Cancer Institute (NCI Method) to meet all five of the challenges
noted above. This method is described in detail in Task 3 of this module.

The macros to fit the NCI method may be downloaded from the NCI website. Software for fitting the ISU method is
available from the Center for Survey Statistics and Methodology at Iowa State University.

References:

Dodd KW. A Technical Guide to C-SIDE: Software for Intake Distribution Estimation. 1996. Technical Report 96-TR 32,
Dietary Assessment Research Series Report 9, Department of Statistics and Center for Agricultural and Rural
Development. Iowa State University.

Dodd KW, Guenther PM, Freedman LS, Subar AF, Kipnis V, Midthune D, Tooze JA, Krebs-Smith SM. Statistical methods
for estimating usual intake of nutrients and foods: a review of the theory. Journal of the American Dietetic Association 2006 Oct;106(10):1640-1650.

National Research Council. Nutrient Adequacy: Assessment Using Food Consumption Surveys. 1986. Washington, DC,
National Academy Press.

Nusser SM, Carriquiry AL, Dodd KW, Fuller WA. A semi-parametric transformation approach to estimating usual nutrient
intake distributions. Journal of the American Statistical Association 1996a;91:1440-1449.

Nusser SM, Fuller WA, Guenther PM. Estimation of usual dietary intake distributions: adjusting for measurement error and
nonnormality in 24-hour food intake data. In: Trewin D, ed. Survey Measurement and Process Quality. New York, NY:
Wiley; 1996b:689-709.


Task 3: Key Concepts about Using a Unified Framework to Estimate Usual
Dietary Intakes

Overview of the NCI Method

In collaboration with colleagues from numerous institutions, the National Cancer Institute (NCI) has developed a unified
framework to predict usual dietary intakes of episodically-consumed or ubiquitously-consumed dietary constituents using
two or more 24-hour recalls for at least a subset of a sample. This method can be used for a variety of general
applications, including:

estimating the distribution of usual episodically-consumed or ubiquitously-consumed dietary constituent intakes for a
population or subpopulation;
examining the relationship between intake of an individual episodically-consumed or ubiquitously-consumed dietary
constituent and some health or disease indicator; and
examining the effects of individual covariates on episodically-consumed or ubiquitously-consumed dietary
constituent consumption.

The NCI method provides one way of estimating usual intake, but it is not the only method available.

Like the ISUF Method, the premise of the NCI method is that usual intake is equal to the probability of consumption on a
given day times the average amount consumed on a "consumption day." The exact methods used for dietary components
that are consumed nearly every day by nearly everyone (ubiquitously consumed) differ from those used for dietary
components that are not (episodically consumed). In general, the former category refers to nutrients and the latter category
refers to foods.
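
For example (hypothetical numbers), if a person has a 0.4 probability of consuming whole grains on any given day and averages 2 ounce-equivalents on consumption days, the usual intake under this premise is

\[ 0.4 \times 2 = 0.8 \ \text{ounce-equivalents per day}. \]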

For episodically-consumed dietary constituents, a two-part model with correlated person-specific effects is used to model
usual intake. The first part of the model estimates the probability of consuming an episodically-consumed dietary
constituent using logistic regression with a person-specific random effect. The second part of the model specifies the
consumption-day amount using linear regression on a transformed scale, also with a person-specific random effect. The
person-specific effects represent the deviation of the individual’s probability of consumption and amount of intake from the
population mean. Because these effects are specific to individuals, they vary only between individuals; therefore, they
capture the between-person variation of usual intake in the population. The two parts of the model, probability and
consumption-day amount, are linked by allowing the two person-specific effects to be correlated and by including common covariates (e.g., age, sex) in both parts of the model. Intake data from 24-hour recalls provide the values for the dependent variable; one or more covariates may be incorporated into the statistical model. The resulting estimated model parameters
can then be used to estimate the final products, depending on the application of interest.
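
A hedged notational sketch of the two parts (symbols are ours, not taken from the NCI documentation):

\[ \text{Part I:}\quad \operatorname{logit} P(R_{ij} > 0) = \mathbf{x}_{ij}'\boldsymbol{\beta}_1 + u_{1i}, \qquad \text{Part II:}\quad g\!\left(R_{ij} \mid R_{ij} > 0\right) = \mathbf{x}_{ij}'\boldsymbol{\beta}_2 + u_{2i} + \epsilon_{ij}, \]

where \(R_{ij}\) is the reported amount for person \(i\) on day \(j\), \(\mathbf{x}_{ij}\) holds the covariates, \(g(\cdot)\) is the transformation to approximate normality, and the person-specific effects \((u_{1i}, u_{2i})\) are bivariate normal with a correlation that links the probability and amount parts.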

For a ubiquitously-consumed dietary constituent, the probability part of the model is not needed because the probability of
consumption is assumed to be one. With a ubiquitously-consumed dietary constituent, however, zero intakes may
occasionally occur. In this case, they are set to one-half of the smallest nonzero value in the sample.
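
A minimal sketch of that replacement rule, assuming a day-level dataset named calcium with intake variable drtcalc (hypothetical names; in practice the rule is applied as part of the modeling workflow):

/* Find the smallest positive intake in the sample ... */
proc sql noprint;
  select min(drtcalc) into :minpos from calcium where drtcalc > 0;
quit;

/* ... and replace occasional zero intakes with half that value */
data calcium;
  set calcium;
  if drtcalc = 0 then drtcalc = &minpos / 2;
run;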

Has the NCI method been validated?

Evidence for the validity of the NCI method, as it relates to estimating the distribution of usual intakes of episodically
consumed dietary constituents, has been published through a series of papers in the Journal of the American Dietetic
Association. Simulation studies show that the NCI method for foods is an improvement over existing methods, including
the two-day mean and ISUF method (Tooze, 2006). Methodology for estimating usual food intakes for use in a regression
analysis -- for example, to examine relationships between diet and health -- have been published in Biometrics (Kipnis et
al., 2009). Analyses establishing the validity of the method to estimate the distribution of usual intakes of ubiquitously-
consumed dietary constituents indicate that the NCI method provides better estimates than using a 2-day mean, and
generally provides results that are very similar to the ISU method (Tooze, 2010), although a thorough investigation of the
comparison across a wide variety of ubiquitously-consumed dietary constituents has not been done.

How are covariates incorporated into the NCI Method?

Covariates, which are incorporated into the NCI method through the two regression models, may be used to explain some
of the variation of usual intake both between and within individuals. Fitting a categorical variable allows the mean usual
intake to vary for each level of the category. For example, incorporating interview sequence as a covariate in the modeling
allows the usual intake to be shifted by a constant amount, depending on whether the data were gathered on day 1 or day
2. Adjusting for this covariate adjusts for the tendency of day 2 recalls to yield lower reported intakes than day 1 recalls, if this were to occur.

In general, two broad classes of covariates may be incorporated into the NCI models: covariates that vary between
individuals, and covariates that may vary within an individual over time. Some examples of this first class of covariates
include demographics or other personal characteristics. These types of covariates may be used to explain variation in
usual intakes or make estimates for a subpopulation of interest. The latter class of covariates may (or may not) differ from
day 1 to day 2 of the recalls. They include weekend effects, interview season effects, and any other variables that may
vary from day to day. The purpose of incorporating these variables into the model is to adjust for differences due to known
day-to-day variation, such as different eating patterns on weekends versus weekdays or by season, and differences in
reporting that may occur with the repeated administration of the dietary recall.

In general, estimating the distribution of usual intake does not require the use of covariates to explain variation between
persons. This is because the total between-person variation (both explained and unexplained) is reflected in the
estimates. However, incorporating covariates may be useful for defining subpopulations for which you would like to
estimate the distribution of usual intake of a dietary component. Covariates also may be used to reflect the distribution of
usual intake in the week or year, rather than in the NHANES sample. For example, by incorporating a weekend covariate,
it is possible to obtain estimates that reflect differences in intake by weekend and weekday (and by season). In
applications relating usual intake to other variables, explaining the unknown variation in usual intake is beneficial, and is
necessary for some procedures, such as regression calibration.

How does the NCI method adjust for weekend or weekday consumption?

The NCI method adjusts in two ways for weekend or weekday consumption of the recall when estimating the usual intake
distribution. First, the weekend vs. weekday indicator variable is incorporated as a covariate in the modeling. Second,
different estimates of intake for weekend and weekday are weighted (by 3/7 and 4/7 respectively) when estimating the
distribution of usual intake. This ensures that the estimates reflect overall intake in the population rather than intake of the
sampled days.
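
In other words (our notation), the usual-intake estimate combines the weekend and weekday estimates as

\[ \widehat{UI} = \tfrac{3}{7}\,\widehat{UI}_{\text{weekend}} + \tfrac{4}{7}\,\widehat{UI}_{\text{weekday}}. \]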

Because an additional post-stratification step is performed for the dietary weights to balance recalls across days of the
week, it may seem as though the NCI method is “overadjusting” for weekend and weekday consumption. However, the
dietary weights are created to balance the sampling over the week. The weekend/weekday covariate is used to model
different levels of consumption by weekend/weekday and to estimate the distribution of usual intake accordingly.

What part does the frequency instrument play in the NCI method? Under what circumstances is it
helpful?

The NCI method involves using two or more 24-hour recalls as well as covariates, which may include data from an FFQ
such as the NHANES 2003-2006 Food Frequency Questionnaire (formerly called Food Propensity Questionnaire). A
frequency instrument can improve the power to detect relationships between dietary intakes as predictor variables and
other variables. The magnitude of improvement depends on the proportion of zeroes in the dietary constituent, with the
FFQ having a greater impact on dietary constituents with a large number of zero intakes (i.e., episodically-consumed
foods). However, when applying the NCI method to estimate usual intake distributions, satisfactory results can generally
be obtained without the FFQ as a covariate.

What are the assumptions of the NCI method?

For a particular episodically-consumed dietary constituent such as a food, if the food is not reported on the 24-hour recall,
it is assumed to indicate no consumption of the food on that day. The NCI method assumes that the 24-hour recall is an
unbiased instrument for measuring usual intake on the original scale -- in other words, that it does not misclassify the
respondent's consumption, and that it provides an unbiased measure of the amount consumed on a consumption day. That is, it assumes that the amount consumed is subject only to random, not systematic, error.

What important caveats are associated with the NCI method?

Many studies have found misreporting of energy intake on both 24-hour recalls and food frequency instruments, almost
always in the direction of underreporting. This suggests that some foods are underreported. Furthermore, there is
evidence for intake-related bias and person-specific bias on 24-hour recalls from studies using recovery biomarkers (i.e., biomarkers that recover intake without systematic bias), which are available for energy and protein. However, without a recovery biomarker for each
dietary constituent of interest, it is not possible to correct for systematic errors on the 24-hour recall.

If only a limited number of repeated 24-hour recalls are available, reliable separation between non-consumers, irregular
consumers, and always-consumers is not possible. Therefore, in the absence of extra information about ever- vs. never-
consumption, the NCI method does not estimate the proportion of non-consumers/always-consumers of a given food.

Fitting the NCI Method

An overview of the steps involved in fitting the NCI method is provided below, and an example of the applications of
interest is found in Table 1. The first step of any of the applications of the NCI method is to fit the statistical model. Then,
depending on the application of interest, the parameters are used in different ways.

Step 1: Fit a two-part statistical model with correlated person-specific effects. Then, use the estimated model
parameters to complete Step 2.

Step 2: Estimate final products depending on application of interest

Table 1. Statistical tasks and applications addressed by the NCI method and the SAS macros available for each
task.

Statistical Task: Estimating the mean and distribution of intake for a population
Typical Application: What proportion of the population has usual intakes above/below a cut-off? What is the intake at the nth percentile?
SAS Macros: MIXTRAN, DISTRIB

Statistical Task: Estimating the mean and distribution of intake for a subpopulation
Typical Application: What proportion of a subpopulation has usual intakes above/below a cut-off?
SAS Macros: MIXTRAN, DISTRIB

Statistical Task: Estimating individual food intake to make etiologic inferences
Typical Application: What proportion of a subpopulation has usual intakes above/below a cut-off?
SAS Macros: MIXTRAN, DISTRIB

Statistical Task: Approximating the effects of individual covariates on food intake
Typical Application: Which variables are associated with the probability to consume a dietary constituent? Which are associated with the consumption-day amount?
SAS Macros: INDIVINT, MIXTRAN
Macros for fitting the NCI Method are available on the NCI website. Version 1.1 of the macros was used in this tutorial.
We recommend that you check this website for macro updates before starting any analysis. Additional details regarding
the macros and additional examples may also be found on the website.

References:

Dodd KW, Guenther PM, Freedman LS, Subar AF, Kipnis V, Midthune D, Tooze JA, Krebs-Smith SM. Statistical methods
for estimating usual intake of nutrients and foods: a review of the theory. Journal of the American Dietetic Association
2006;106(10):1640-1650.

Kipnis V, Midthune D, Buckman DW, Dodd KW, Guenther PM, Krebs-Smith SM, Subar AF, Tooze JA, Carroll RJ,
Freedman LS. Modeling data with excess zeros and measurement error: application to evaluating relationships between
episodically consumed foods and health outcomes. Biometrics 2009; 65:1003-1010.

Subar AF, Dodd KW, Guenther PM, Kipnis V, Midthune D, McDowell M, Tooze JA, Freedman L, Krebs-Smith SM. The
Food Propensity Questionnaire (FPQ): concept, development, and validation for use as a covariate in a model to estimate usual food intake. Journal of the American Dietetic Association 2006;106(10):1556-1563.

Tooze JA, Kipnis V, Buckman DW, Carroll RJ, Freedman LS, Guenther PM, Krebs-Smith SM, Subar AF, Dodd KW. A
mixed-effects model approach for estimating the distribution of usual intake of nutrients: the NCI method. Statistics in Medicine 2010 Nov 30;29(27):2857-2868.

Tooze JA, Midthune D, Dodd KW, Krebs-Smith SM, Subar AF, Carroll RJ, Kipnis V. A new statistical method for estimating
the distribution of usual intake of episodically consumed foods. Journal of the American Dietetic Association
2006;106(10):1575-1587.


Task 4: Key Concepts about Using Balanced Repeated Replication (BRR)

NHANES survey design affects variance estimates

As stated in the module on sampling in NHANES (Continuous Tutorial, Module 10), NHANES has a complex,
multistage, probability cluster design. Typically, individuals within a cluster (e.g., county, school, city, census block) are
more similar to one another than to those in other clusters and this homogeneity of individuals within a given cluster is
measured by the intra-cluster correlation. When working with a complex sample, it is preferable to decrease the amount of
correlation between sample persons within clusters. To achieve this, we recommend sampling fewer people within each
cluster but sampling more clusters. However, because of operational limitations (e.g., cost of moving the survey mobile
examination centers [MECs], and geographic distances between primary sampling units [PSUs]), NHANES can sample
only 30 PSUs within a 2-year survey cycle. The sample size in each PSU is roughly equal and it is intended to yield about
5,000 examined persons per year.

In a complex sample survey setting such as NHANES, variance estimates computed using standard statistical software
packages that assume simple random sampling are generally too low (i.e., significance levels are overstated) and biased
because they do not account for the differential weighting and the correlation among sample persons within a cluster.
Some statistical software packages can incorporate differential weighting, but only a few account for both differential
weighting and the correlation among sample persons.

IMPORTANT NOTE

Standard statistical software packages that assume simple random sampling calculate variance estimates that are
generally too low and biased because they do not account for differential weighting and the correlation among sample
persons within a cluster.

Balanced repeated replication (BRR) is a statistical method for estimating sampling variability of a statistic, taking into
account NHANES’ complex sample design. This method is described in the following section.

Overview of Balanced Repeated Replication

In BRR, half of the sample is used at a time, including one of the two PSUs from each stratum. The variance of the parameter of interest, \(\hat{\theta}\), is estimated by calculating the parameter estimate for a half sample, \(\hat{\theta}_h\), repeating this process for many half samples, and then computing the variance of the different parameter estimates. When the parameter is computed for a half sample, the sample weights of the observations in the included PSUs are doubled. For H half samples, the variance is given by:

Equation 1.

\[ \widehat{\operatorname{Var}}(\hat{\theta}) = \frac{1}{H}\sum_{h=1}^{H}\left(\hat{\theta}_h - \hat{\theta}\right)^2 \]

With S strata, 2^S possible half-sample replicates can be formed. However, it is possible to pick the half samples according to a particular
pattern so that just some of the possible replicates are chosen; this is what is done in BRR. The pattern is from a
Hadamard matrix. The Hadamard matrix is used to select the H “balanced” replicates. The number of replicates that are
needed for BRR is the smallest integer that is divisible by 4 and is greater than or equal to S. For 2 years of NHANES
data, this number is 16.

The method described in equation (1) above is standard BRR. In some situations, a modification of BRR, called Fay’s method, is needed to compute standard errors. In Fay’s method, the sample weights are not set to zero in one half of the sample and doubled in the other half. Instead, they are perturbed by a factor F (a proportion between 0 and 1): the weights in one half sample are multiplied by F and the weights in the other half sample are multiplied by 2-F. For example, when F=0.3, the weights are multiplied by 0.3 (decreased by 70%) in one half sample and by 1.7 (increased by 70%) in the other half sample. The weights given in the dataset demoadv for the advanced dietary tutorial use F=0.3. When Fay’s method is used, the estimated variance is computed as:

Equation 2.

\[ \widehat{\operatorname{Var}}_{\text{Fay}}(\hat{\theta}) = \frac{1}{H\,(1-F)^2}\sum_{h=1}^{H}\left(\hat{\theta}_h - \hat{\theta}\right)^2 \]

With F = 0.3, (1-F)^2 = 0.49, which is the divisor used in the SAS example in the next task.

Modeling the complex survey structure of NHANES requires procedures that account for both differential weighting of
individuals and the correlation among sample persons within a cluster. The NCI method calls the SAS procedure
NLMIXED, which can account for differential weighting by using the replicate statement. The use of BRR to calculate
standard errors accounts for the correlation among sample persons in a cluster. Therefore, NLMIXED (or any SAS
procedure that incorporates differential weighting) may be used with BRR to produce standard errors that are suitable for
NHANES data without using specialized survey procedures.

IMPORTANT NOTE
Note: The SAS procedure NLMIXED requires the use of integer weights.

Reference:

Korn EL, Graubard BI. Analysis of Health Surveys. Wiley, New York, 1999.


Task 4: How to Estimate Standard Errors with Balanced Repeated
Replication (BRR) Using SAS

In this example, we calculate the standard error of the sample mean of day 1 calcium intake from foods and beverages. It is possible to use the SAS procedure surveymeans to calculate the sample mean and corresponding standard error. (In version 9.2, BRR is available as an option, although it cannot be used in combination with the domain statement.) Therefore, the
purpose of this example is not to suggest using BRR for this purpose (although it could be done), but to orient readers to
the BRR technique that is used throughout the Advanced Dietary Tutorial, with the illustration of a simple example. For the
complex models fit in Modules 19-22, BRR is needed to estimate standard errors.

The BRR weights provided in the Advanced Dietary Tutorial dataset demoadv were created by researchers at the Food
Surveys Research Group, Agricultural Research Service, US Department of Agriculture.

All of the examples that use BRR in the Advanced Dietary Tutorial use a common structure – the SAS macro. A SAS
macro is a useful technique for rerunning a block of code when the analyst only wants to change a few variables. In the
case of BRR, it is the sample weights that change in each iteration of the macro call. There are 17 weights in the file. The
weight with the _0 (w0304_0) is used for the point estimate of the mean, and the other weights (w0304_1 to w0304_16)
are used for the BRR procedure. The main attraction of BRR is that it produces standard errors suitable for NHANES data
without using the specialized survey procedures, as long as the differential weighting can be incorporated into each half-
sample estimate. Therefore, in this example we use the proc means procedure.

Step 1: Determine variables of interest

This example uses the demoadv dataset (download at Sample Code and Datasets). This dataset contains a variable
dr1tcalc that has a value of calcium intake from day 1 of the 24-hour recall. It also contains BRR weights (w0304_0 to
w0304_16). A new dataset is created (wwts) that has the people with BRR weights (those who completed the 24-hour
recall).

Step 2: Use a SAS macro and the means procedure to generate a mean and standard error using
BRR

Statements:
%macro BRR184;

proc means data=demoadv noprint;
where sel=1;
var dr1tcalc;
weight w0304_0;
output out=m(drop=_type_ _freq_) mean=m_0;
run;

Explanation: The %macro statement is used to indicate the start of the macro BRR184. The statements between the %macro line and the %mend line will be run by SAS each time the macro is invoked. The proc means procedure is used to estimate the mean of the calcium intake from the first day of 24-hour recalls (dr1tcalc) for the subgroup of interest (sel=1). The first BRR weight (w0304_0) is used for the point estimate of the mean. The output statement saves the mean calcium value (m_0) in a SAS temporary dataset called m.

Statements:
data m;
set m;
mergeby=1;
run;

Explanation: This data step adds a variable "mergeby" to the m dataset that will be used to merge the point estimate of the mean with the BRR runs.

Statements:
%do i = 1 %to 16;

proc means data=demoadv noprint;
where sel=1;
var dr1tcalc;
weight w0304_&i;
output out=tmp(drop=_type_ _freq_) mean=m_&i;
run;

Explanation: The %do statement creates a macro variable &i that indicates which BRR weight is used (i.e., w0304_&i) and names the mean produced by that run (m_&i). The & symbol tells SAS to replace the macro variable &i with its value (i.e., the number 1, 2, ..., 16). The statements between the %do and the %end will be executed 16 times, each time incrementing &i by 1. The proc means procedure calculates the mean using the BRR weights (w0304_1 to w0304_16); the tmp dataset stores the mean for each BRR run.

Statements:
data tmp;
set tmp;
mergeby=1;
run;

Explanation: This data step adds a variable "mergeby" to the tmp dataset that will be used to merge the BRR runs with the point estimate of the mean.

Statements:
data m;
merge m tmp;
by mergeby;
run;

Explanation: This data step merges the BRR runs with the estimate of the mean from the first run.

Statements:
%end;

Explanation: The %end statement indicates the end of the %do processing.

Statements:
data brr;
set m;
array reps (16) m_1 - m_16;
do i=1 to 16;
reps(i) = reps(i) - m_0;
end;
brrse=sqrt(uss(of m_1-m_16)/(16 * .49));
run;

Explanation: The array statement is used for repeated processing of variables. In the do loop, the means from the runs with weights w0304_1 to w0304_16 each have m_0 subtracted from them. Next, the BRR standard error (brrse) is computed by dividing the uncorrected sum of squares (uss function) of these differences by H (16) and (1-F)^2 (0.49), and taking the square root of this value.

Statements:
title 'Mean Calcium Intake of Adults >=50 years';

proc print data=brr;
var m_0 brrse;
run;

Explanation: This code prints the mean and corresponding standard error.

Statements:
%mend BRR184;
%BRR184

Explanation: The %mend statement indicates the end of the BRR184 macro. The statement %BRR184 invokes SAS to run the macro.

Step 3: Interpret Results

The estimated mean daily calcium intake from a single day of recall in adults aged 50 years and older is 780.5 mg. The BRR standard error is 17.7 mg.

Sample program: mod18 task4 SAS.sas

LIBNAME NH "C:\NHANES\DATA";

*-------------------------------------------------------------------------;
* Use the PROC FORMAT procedure to assign text labels to the numeric ;
* values of user-defined formats ;
*-------------------------------------------------------------------------;

proc format;
value sel
1='age >=50 yrs'
2='age <50 yrs';

run;

*--------------------------------------------------------------------------;
* Create a dataset from the permanent dataset libname.demoadv ;
*--------------------------------------------------------------------------;

data demoadv;
format sel sel.;

set nh.demoadv;
if ridageyr ge 50 then sel=1;

else sel=2;
run;

*--------------------------------------------------------------------------;
* Use proc means with the BRR weights to calculate the mean and its        ;
* standard error                                                           ;
*--------------------------------------------------------------------------;

%macro BRR184;

proc means n mean min max data=demoadv noprint; where sel=1;
var dr1tcalc;
weight w0304_0;
output out=m(drop=_type_ _freq_) mean=m_0;
run;

%do i = 1 %to 16;

proc means n mean min max data=demoadv noprint; where sel=1;
var dr1tcalc;
weight w0304_&i;
output out=tmp(drop=_type_ _freq_) mean=m_&i;
run;

data m;
merge m tmp;

run;

%end;

data brr;
set m;
array reps (16) m_1 - m_16;
do i=1 to 16;
reps(i) = reps(i) - m_0;
end;
* The .49 is (1-f)^2;
brrse=sqrt(uss(of m_1-m_16)/(16 * .49));

run;

title 'Mean Calcium Intake of Adults >=50 years';
proc print data=brr;

var m_0 brrse;
run;

%mend BRR184;
%BRR184



Evaluating the Effects of Covariates on Usual Dietary Intake

Purpose

Researchers are often interested in variables that are associated with intake of dietary constituents. For example, they may
want to know whether a personal characteristic, like age, is associated with consumption of a particular food or nutrient. This
type of inference can be made from fitting the model that is used in the method developed by researchers at NCI and elsewhere
(i.e., the “NCI method”). The term “dietary intake” in this module refers to food and beverages reported on the 24-hour recalls.

Task 1: Evaluating the Effects of Covariates on Usual Intake of a Single Ubiquitously-
Consumed Dietary Constituent

When a dietary component is ubiquitously consumed, as is the case for many nutrients and some food groups, a single-part
model can be used to estimate the amount of the dietary component consumed because it is not necessary to estimate the
probability of consumption as in a two-part model. This task describes how to fit the model with covariates and how to evaluate
the effects of the covariates on usual intake.

IMPORTANT NOTE
Many of the statistical methods used in this course are advanced and may require consultation with a statistician. Modules 18-22 require a working knowledge of mixed effects models and experience calling SAS macros. Because Module 18 provides the background information for Modules 19-22, it is advisable to read Module 18 carefully before tackling the other modules.

Key Concepts about Evaluating the Effects of Covariates on Usual Intake of a Single Ubiquitously-Consumed Dietary
Constituent (/nchs/tutorials/Dietary/Advanced/EvaluateCovariates/Info1.htm)
How to Evaluate the Effects of Covariates on Usual Intake of a Single Ubiquitously-Consumed Dietary Constituent (/nchs
/tutorials/Dietary/Advanced/EvaluateCovariates/Task1.htm)
Download Sample Code and Datasets (/nchs/tutorials/Dietary/downloads/downloads.htm)

Task 2: Evaluating the Effects of Covariates on Usual Intake of a Single Episodically-
Consumed Dietary Constituent

When a dietary component is episodically consumed, as is the case for many foods and food groups, a two-part model to
estimate the probability of consumption and the amount of the dietary constituent consumed on the consumption day is used.
This task describes how to fit this two-part model with covariates and how to evaluate the effects of the covariates on usual
intake.

Key Concepts about Evaluating the Effects of Covariates on Usual Intake of a Single Episodically-Consumed Dietary
Constituent (/nchs/tutorials/Dietary/Advanced/EvaluateCovariates/Info2.htm)
How to Evaluate the Effects of Covariates on Usual Intake of a Single Episodically-Consumed Dietary Constituent (/nchs
/tutorials/Dietary/Advanced/EvaluateCovariates/Task2.htm)
Download Sample Code and Datasets (/nchs/tutorials/Dietary/downloads/downloads.htm)



Task 1: Key Concepts about Evaluating the Effects of Covariates on Usual
Intake of a Single Ubiquitously-Consumed Dietary Constituent

The goal of the analysis described in this task is to describe differences in usual intake by personal characteristics or other
covariates. These inferences are made for the mean usual intake. To illustrate, we answer the question, “Does the mean
usual intake of calcium from food and beverages in women differ by race or ethnicity?”

Because of the measurement error that arises from the use of 24-hour recalls to measure usual intake, the model
partitions between-person from within-person variability (see Module 18 "Model Usual Intake Using Dietary Recall Data",
Task 1 for more details on measurement error). To accommodate the skewed consumption amounts, a Box-Cox
transformation is used. In particular, the model used is a mixed effects model with a random person-specific effect and a
built-in Box-Cox transformation to normality. The Box-Cox parameter (lambda) is estimated during the model fitting
procedure at the same time the covariate effects are estimated so that the best transformation is chosen after adjusting for
these effects. The person-specific effect is a latent variable that represents an individual’s tendency to eat a particular
amount of a food. Balanced Repeated Replication (BRR) (Module 18 "Model Usual Intake Using Dietary Recall Data", Task
4) is used to calculate standard errors that account for the complex sampling design of NHANES.
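
To illustrate what the built-in transformation does, the following sketch applies a Box-Cox transformation for a fixed, illustrative lambda. In practice the MIXTRAN macro estimates lambda as part of the model fit; the dataset and variable names (calcium, drtcalc) are those constructed in the next task and are assumptions here.

/* Illustrative Box-Cox transformation with a fixed lambda (hypothetical value).  */
/* MIXTRAN estimates lambda during model fitting; this is only a demonstration.   */
data bc_example;
  set calcium;
  lambda = 0.25;
  if drtcalc > 0 then do;
    if lambda = 0 then bc_intake = log(drtcalc);          /* limiting case      */
    else bc_intake = (drtcalc**lambda - 1) / lambda;      /* Box-Cox transform  */
  end;
run;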


Task 1: How to Evaluate the Effects of Covariates on Usual Intake of a
Single Ubiquitously-Consumed Dietary Constituent

In this example, the association of two covariates—race/ethnicity and age—with calcium intake from food and
beverages in adult women ages 19 years and older is modeled.

This example uses the demoadv dataset (download at Sample Code and Datasets). The variables w0304_0 to
w0304_16 are the weights (dietary weights and Balanced Repeated Replication [BRR] weights) used in the
analysis of 2003-2004 dietary data; the use of BRR is required to calculate correct standard errors. The model is run 17
times, including 16 runs using BRR (see Module 18 "Model Usual Intake Using Dietary Recall Data", Task 4 for more
information). BRR uses weights w0304_1 to w0304_16.

IMPORTANT NOTE

Note: if 4 years of NHANES data are used, 32 BRR runs are required.

A SAS macro is a useful technique for rerunning a block of code when the analyst only wants to change a few variables;
the macro BRR191 is created and called in this example. The BRR191 macro calls the MIXTRAN macro, and calculates
BRR standard errors of the parameter estimates. The MIXTRAN macro obtains preliminary estimates for the values of the
parameters in the model, and then fits the model using PROC NLMIXED. It also produces summary reports of the model
fit.

Recall that modeling the complex survey structure of NHANES requires procedures that account for both differential
weighting of individuals and the correlation among sample persons within a cluster. The SAS procedure NLMIXED can
account for differential weighting by using the replicate statement. The use of BRR to calculate standard errors accounts
for the correlation among sample persons in a cluster. Therefore, NLMIXED (or any SAS procedure that incorporates
differential weighting) may be used with BRR to produce standard errors that are suitable for NHANES data without using
specialized survey procedures.
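
In later steps, the macro computes the BRR standard error of each parameter with Fay's adjustment. A sketch of that calculation (consistent with the brrse line in the macro code below, which divides by 16*.49) is:

BRR standard error = sqrt( [sum over r=1 to 16 of (estimate_r − estimate_0)^2] / (16 × (1 − 0.3)^2) )

where estimate_0 is obtained using the full-sample weight w0304_0, estimate_r is obtained using BRR replicate weight w0304_r, and (1 − 0.3)^2 = 0.49 reflects a Fay adjustment factor of 0.3.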

The MIXTRAN macro used in this example was downloaded from the NCI website. Version 1.1 of the macro was used.
We recommend that you check this website for macro updates before starting any analysis. Additional details regarding
the macro and additional examples also may be found on the website and in the users’ guide.

Step 1: Create a dataset so that each row corresponds to a single person day and define
indicator variables if necessary

First, select only those people with dietary data by selecting those without missing BRR weights.

data demoadv;
set nh.demoadv;
if w0304_0 ne . ;
run ;

The variables DR1TCALC and DR2TCALC are NHANES variables representing total calcium (mg) consumed on days 1
and 2 respectively from all foods and beverages (other than water). To create a dataset with 2 records per person, the
demoadv dataset is set 2 times to create 2 datasets, one where day=1 and one where day=2. The same variable name,
DRTCALC, is used for calcium on both days. This variable is created by setting it equal to DR1TCALC for day 1 and
DR2TCALC for day 2. The datasets also select women ages 19 and older.

data day1;
set demoadv;
if riagendr= 2 and ridageyr>= 19 ;
DRTCALC=DR1TCALC;
day= 1 ;
run ;

data day2;
set demoadv;
if riagendr= 2 and ridageyr>= 19 ;
DRTCALC=DR2TCALC;
day= 2 ;
run ;

Finally, these data sets are appended, and dummy variables are created. To use the NLMIXED procedure, dummy
variables must be created (there is no CLASS statement to create dummy variables as in other SAS procedures). In this
example, the following code was used:

data calcium;
set day1 day2;
eth1=(ridreth1= 1 );
eth2=(ridreth1= 2 );
eth3=(ridreth1= 3 );
eth4=(ridreth1= 4 );
run ;

Because ridreth1 (race/ethnicity) has 5 levels, 4 dummy variables are needed. This code creates a variable (for example,
eth1) that is set to 1 when ridreth1 equals 1 and to 0 otherwise.

IMPORTANT NOTE

Note: if the variable you are using has missing values, these will be coded to zero using the above code; additional code
would need to be added to set them to missing. Also, if you use the "<" symbol in SAS to create a dummy variable, note
that missing values are stored as very large negative numbers, so they will always satisfy the "<" condition and be coded
as 1 instead of being set to missing.
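
One way to keep the dummy variables missing when the covariate is missing is to assign them only for nonmissing values (a sketch using the variables from this example, not code from the tutorial):

data calcium;
set day1 day2;
* assign the race/ethnicity dummies only when ridreth1 is nonmissing, so they stay missing otherwise;
if ridreth1 ne . then do;
eth1=(ridreth1=1);
eth2=(ridreth1=2);
eth3=(ridreth1=3);
eth4=(ridreth1=4);
end;
run;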

Step 2: Sort the dataset by respondent and day

It is important to sort the dataset by respondent and by day of the intake (day 1 and day 2) before fitting the NLMIXED
procedure because the procedure uses this information to estimate the model parameters.
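
For example, a minimal sort step for the calcium dataset created in Step 1 might look like the following (seqn and day are the subject and repeat variables used in the macro call below):

proc sort data=calcium;
by seqn day;
run;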

Step 3: Create the BRR191 macro

The BRR191 macro calls the MIXTRAN macro and computes standard errors of parameter estimates. After creating this
macro and running it one time, it may be called multiple times, each time changing the macro variables.

Create the BRR191 Macro

Statements Explanation

%macro BRR191(data, response, foodtype, The start of the BRR191 macro
subject, repeat, covars_prob, covars_amt, is defined. All of the terms
outlib, modeltype, lambda, seq, weekend, inside the parentheses are the
vargroup, numvargroups, subgroup, macro variables that are used
start_val1, start_val2, start_val3, in the macro.
vcontrol, nloptions, titles, printlevel,
final); Within the BRR191 macro the
%MIXTRAN (data=&data, response=&response, MIXTRAN macro is called. All
foodtype= &foodtype, subject= &subject, of the variables preceded by
repeat=&repeat, covars_prob=&covars_prob, “&” will be defined by the
covars_amt= &covars_amt, outlib=&outlib, BRR191 macro call. The only
modeltype=&modeltype, lambda=&lambda, variable without an “&” is the
replicate_var=w0304_0, seq=&seq, replicate_var macro variable; it
weekend=&weekend, vargroup= &vargroup, is set to w0304_0 for the first
numvargroups=&numvargroups, run.
subgroup=&subgroup,
This data step defines macro
start_val1=&start_val1, variables that will be used in
start_val2=&start_val2, start_val3= the next step of the macro.
&start_val3, vcontrol=&vcontrol,
nloptions=&nloptions, titles= &titles,
printlevel=&printlevel)
data _null_;

format old varA $255. ;

%let I=1; This code recreates the way

that the MIXTRAN macro
%let varamtu= %upcase (INTERCEPT &covars_amt); defines the parameter names,

%do %until ( %qscan (&varamtu,&I, %str ( ))= and makes a list of parameter
%str ()); names that are stored in the
_param_unc_&foodtype (called

%let varb&I= %qscan (&varamtu,&I, %str ( )); &old). It also counts the
number of parameters (&cnt).

%if %eval (&i) le 9 %then %let znum = "0";

%else %let znum='';

num= %eval (&i);

varA= strip( 'A' ||strip(&znum)||strip(num)||
'_' || strip( "&&varb&i." ));

old = trim(old)|| ' ' ||trim(varA);

%let I= %eval (&I+1);

%end ;

%let cnt= %eval (&I-1);

%if &covars_amt= %str () %then %let cnt=1;

call symput( 'old' ,old);
run;

data parms; The dataset
set & outlib.._ param_unc_&foodtype; _param_unc_&foodtype is
array old (&cnt) &old; defined in the MIXTRAN
array new (&cnt) &varamtu; macro. This data step sets the
do k= 1 to dim(new); dataset
_param_unc_&foodtype and
renames the parameters to
their variable names.

new[k]=old[k];

end;

keep &varamtu;

run; Lambda is fixed in the BRR
data _null_ runs. The lambda value from
the first run is saved in a macro
set & outlib.._ param_unc_&foodtype; variable called &lamb.

call symput ( 'lamb' ,a_lambda);

run; This code starts a loop to run
%do run= 1 %to 16 ; the 16 BRR runs.

%MIXTRAN (data=&data, response=&response, Within the BRR191 macro the
foodtype=&foodtype, subject= &subject, MIXTRAN macro is called for
repeat=&repeat, covars_prob=&covars_prob, the BRR run. All of the
covars_amt= &covars_amt, outlib=&outlib, variables preceded by “&” will
modeltype=&modeltype, lambda=&lamb, be defined by the BRR191
replicate_var=w0304_&run, seq=&seq, macro call. The only variable
weekend=&weekend, vargroup= &vargroup, without an “&” is the
numvargroups=&numvargroups, replicate_var macro variable; it
subgroup=&subgroup, start_val1=&start_val1, is set to w0304_&run where
start_val2=&start_val2, start_val3= &run=1 to 16. Notice that the
&start_val3, vcontrol=&vcontrol, &lamb from the previous
nloptions=&nloptions, titles=&titles, dataset is fixed for lambda.
printlevel= 2 )

data _null_; This data step defines macro
format old varA var new $255. ; variables that will be used in
the next step of the macro.

%let I=1; As before, this code recreates
the way that the MIXTRAN
%do %until ( %qscan (&varamtu,&I, %str ( ))= macro defines the parameter
%str ()); names, and makes a list of
parameter names that are
%let varb&I= %qscan (&varamtu,&I, %str ( )); stored in the
_param_unc_&foodtype (called
%if %eval (&i) lt 9 %then %let znum = "0"; &old). It also creates a list of
the intercept and the other
%else %let znum= %str () ; variables in the model with the
BRR run number at the end
num= %eval (&i); (called &var).

varA= strip( 'A' ||strip(&znum)||strip(num)||
'_' || strip( "&&varb&i." ));

old = trim(old)|| ' ' ||trim(varA);

var= strip(strip( "&&varb&i." )|| '_' ||strip(
"&run" ));

new = trim(new)|| ' ' ||trim(var);

%let I= %eval (&I+1);

%end ;

call symput( 'old' ,old);

call symput( 'new' ,new); The dataset
run; _param_unc_&foodtype from
data parmsbrr; the MIXTRAN macro. This
data step sets the dataset
set & outlib.._ param_unc_&foodtype; _param_unc_&foodtype and
array old (&cnt) &old; renames the parameters to
array new (&cnt) &new; their variable names with the
BRR run number at the end.
do k= 1 to dim(new);
new[k]=old[k];
end;
keep &new;
run;

data parms; The point estimates of the
merge parms parmsbrr; parameters are merged with
the BRR runs.

run; After merging, the information
proc datasets nolist; delete parmsbrr; parmsbrr can be deleted.

%end ; The end of the BRR runs.
%let I=1;
This code starts a loop where
%do %until ( %qscan (&varamtu,&I, %str ( ))= the following code is evaluated
%str ()); for the intercept and the other
variables in the model one at a
%let varb&I= %qscan (&varamtu,&I, %str ( time until all variables are
)); evaluated.
data _null_;
This code creates a macro
format var call $255. ; variable with the BRR run
number appended to the
set parms; variable name.

call= "" ;

%do r= 1 %to 16 ;

var = strip(strip( "&&varb&i." )|| '_'
||strip( "&r" ));

call = strip(strip(call)|| ' '
||strip(var));

%end ;

call symput ( 'call' ,call);
run;

data brr; For the 16 BRR runs, the value
format variable $32. ; of the point estimate is
set parms; subtracted from the estimate of
the parameter from the BRR
array reps ( 16 ) &call; run. The standard error is
calculated.

do m= 1 to 16 ;
reps[m] = reps[m] - &&varb&i;

end;
estimate=&&varb&i;

brrse=sqrt(uss(of &call)/( 16 * .49 ));

variable= "&&varb&i" ; The datasets for each variable
keep variable estimate brrse; are appended to the dataset
run; allvars.
proc append base=allvars data=brr; The dataset brr is deleted.

proc datasets nolist; delete brr; The variable I is incremented,
run; and the end of the variable loop
%let I= %eval (&I+1); is defined.
%end ; The final dataset is defined,
data &final; and p-values are calculated.

format pvalue 6.4 ;
set allvars;
t=estimate/brrse;

pvalue= 2 *( 1 -probt(abs(t), 15 )); The final dataset is printed.
proc print; var variable estimate brrse t
pvalue; The dataset parms is deleted.

run; The end of the BRR191 macro
proc datasets nolist; delete parms; is indicated.

run;
%mend BRR191;

Step 4: Run the BRR191 macro to obtain parameter estimates for the covariates of interest from
the model used in the NCI method

Use the BRR191 macro to obtain parameter estimates. It is possible to call the BRR191 macro several times, varying the
values of the parameters each time. For example, the variables of interest could be changed. This merely requires calling
the macro again (using a call similar to that below), not redefining the macro each time.

Run the BRR191 Macro

Statements Explanation

%BRR191(data=calcium, This code calls the BRR191 macro. The dataset
response=DRTCALC,
foodtype=Calcium, calcium defined in Step 1 is used; the macro variable
subject=seqn, response for which you want to model the distribution
repeat=day, is DRTCALC. The macro variable foodtype is used to

covars_amt=ridageyr eth1 label the param dataset. The variable seqn identifies
eth2 eth3 eth4, the subject, and the macro variable repeat defines the
outlib=work,
modeltype=amount, variable that identifies the repeats on the subject,
titles= 1 , printlevel= 2 which is day. The covariates ridageyr eth1 eth2 eth3
, final=nh.m19task1) eth4 are included in the model.

The macro variable outlib specifies the library where
the data are to be stored. In this case, the working
directory, work, was used.

Because this is a ubiquitously consumed dietary
constituent, modeltype= amount is specified. This fits
the amount model.

The macro variable titles saves one line for a title
supplied by the user. The printlevel is 2, which prints
the output from the NLMIXED runs and the summary.

The variable final specifies the name of the final
dataset produced.

Step 5: Interpret parameter estimates for the covariates of interest

Depending on the print level selected, the output from each NLMIXED run will be printed in the output. The first
NLMIXED output (replicate variable w0304_0) is a listing of the point estimates for the estimation of calcium.
However, the standard errors are incorrect because they do not account for the complex sampling design of
NHANES. The other NLMIXED runs are from the BRR replications.

The estimated parameters and the BRR-based standard errors for the variables in the model are below:

Model Fitting for Calcium Intake

Variable Estimate BRRSE t p-value
INTERCEPT 17.054 0.36397 46.855 0.0000
RIDAGEYR -0.0181 0.00484 -3.7532 0.0019
ETH1 1.1710 0.39637 2.9543 0.0098
ETH2 0.2559 0.57850 0.4423 0.6646
ETH3 1.2704 0.32183 3.9475 0.0013
ETH4 -0.2537 0.36923 -0.6870 0.5026

For every increase in age by 1 year, usual calcium consumption declines. This is statistically significant
(p=0.0019).

Non-Hispanic Whites (ETH3) have the highest usual calcium consumption levels (p=0.0013). The reference
category for race/ethnicity is ‘Other/Multi’.

The parameter estimates are on the transformed scale (lambda=0.2618), and must be interpreted in light of this.
It may be useful to backtransform them to the original scale. For example, among consumers, the median intake is
about 178 mg higher for 40 year-old Non-Hispanic Whites compared to Non-Hispanic Blacks. The
backtransformation for the Box-Cox in the model is:
backtransformed median usual intake = ((β0 + β1×Age + β2×Eth1 + β3×Eth2 + β4×Eth3 + β5×Eth4) × λ + 1)^(1/λ)
where λ (lambda) is the Box-Cox parameter, β0 is the intercept (17.0540), β1 is the parameter for age (-0.0181),
and β2 through β5 are the parameters associated with the ethnicity indicator variables. (When data are
transformed, their relative order (including the middle value) is maintained. For a symmetric distribution, the mean
and median are equivalent; therefore, when you backtransform, the backtransformation is to the median, not the
mean, on the original scale, where the distribution is skewed.) To compare 40 year-old Non-Hispanic Whites to
Non-Hispanic Blacks, you calculate:
((17.0540 − 0.0181 × 40 + 1.2704) × 0.2618 + 1)^(1/0.2618) − ((17.0540 − 0.0181 × 40 − 0.2537) × 0.2618 + 1)^(1/0.2618) ≈ 178.
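
As a quick check of this arithmetic, the backtransformation can be computed in a short DATA step (a sketch using the estimates above; this is not part of the tutorial code):

data _null_;
lambda=0.2618;
b0=17.0540; b_age=-0.0181; b_eth3=1.2704; b_eth4=-0.2537;
* backtransform the linear predictor for 40 year-old Non-Hispanic White and Non-Hispanic Black women;
nh_white=((b0 + b_age*40 + b_eth3)*lambda + 1)**(1/lambda);
nh_black=((b0 + b_age*40 + b_eth4)*lambda + 1)**(1/lambda);
diff=nh_white - nh_black;
put diff=; * approximately 178 mg;
run;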

Task 2: Evaluating the Effects of Covariates on Usual Intake of a Single
Episodically-Consumed Dietary Constituent

When a dietary component is episodically consumed, as is the case for many food groups and some nutrients, a two-
part model is used to estimate: (1) the probability of the dietary component being consumed, and (2) the amount of the
dietary component consumed on a consumption day. This task describes how to fit the two-part model with covariates and
how to evaluate the effects of the covariates on usual intake.

The first part of the model estimates the probability of consuming an episodically-consumed dietary constituent using
logistic regression with a person-specific random effect. The second part of the model specifies the consumption-day
amount using linear regression with a built-in Box-Cox transformation, also with a person-specific random effect. Because
of the measurement error that arises from the use of 24-hour recalls to measure usual intake, the amount model partitions
between-person from within-person variability. The Box-Cox parameter (lambda) is estimated during the model fitting
procedure at the same time the covariate effects are estimated. The person-specific effects are latent variables that
represent the deviation of the individual’s probability of consumption and amount of intake from the population mean.
Because these effects are specific to individuals, they vary only between individuals; therefore, they capture the between-
person variation of usual intake in the population. The two parts of the model—probability and consumption-day amount—
are linked by allowing the two person-specific effects to be correlated and by including common covariates (e.g., age, sex)
in both parts of the model. Balanced Repeated Replication (BRR) (Module 18 "Model Usual Intake Using Dietary Recall
Data", Task 4) is used to calculate standard errors.

Task 2: How to Evaluate the Effects of Covariates on Usual Intake of a
Single Episodically-Consumed Dietary Constituent

In this example, the association of race/ethnicity and age with milk intake in adult women older than age 50 years
is modeled.

This example uses the demoadv dataset (download at Sample Code and Datasets). The variables w0304_0 to
w0304_16 are the weights (dietary weights and Balanced Repeated Replication [BRR] weights) used in the analysis of
2003-2004 dietary data that requires the use of BRR to calculate standard errors. The model is run 17 times, including 16
runs using BRR (see Module 18 "Model Usual Intake Using Dietary Recall Data", task 4 for more information). BRR uses
weights w0304_1 to w0304_16.

IMPORTANT NOTE

Note: if 4 years of NHANES data are used, 32 BRR runs are required.

A SAS macro is a useful technique for rerunning a block of code when the analyst only wants to change a few variables;
the macro BRR192 is created and called in this example. The BRR192 macro calls the MIXTRAN macro, and calculates
BRR standard errors of the parameter estimates. The MIXTRAN macro obtains preliminary estimates for the values of the
parameters in the model, and then fits the model using PROC NLMIXED. It also produces summary reports of the model
fit.

Recall that modeling the complex survey structure of NHANES requires procedures that account for both differential
weighting of individuals and the correlation among sample persons within a cluster. The SAS procedure NLMIXED can
account for differential weighting by using the replicate statement. The use of BRR to calculate standard errors accounts
for the correlation among sample persons in a cluster. Therefore, NLMIXED (or any SAS procedure that incorporates
differential weighting) may be used with BRR to produce standard errors that are suitable for NHANES data without using
specialized survey procedures.

The MIXTRAN macro used in this example was downloaded from the NCI website. Version 1.1 of the macro was used.
We recommend that you check this website for macro updates before starting any analysis. Additional details regarding
the macro and additional examples also may be found on the website and in the users’ guide.

Step 1: Create a dataset so that each row corresponds to a single person day and define
indicator variables if necessary

First, select only those people with dietary data by selecting those without missing BRR weights.
data demoadv;
set nh.demoadv;
if w0304_0 ne . ;
run ;

The variables d_milk_d1 and d_milk_d2 are derived variables representing total milk consumed (cup equivalents) on days
1 and 2 respectively using MyPyramid Equivalents (see Module 4 "Resources for Dietary Data Analysis" and Module 9
"Review Data and Create New Variables", Task 4). To create a dataset with 2 records per person, the demoadv dataset is
set 2 times to create 2 datasets, one where day=1 and one where day=2. The same variable name, d_milk, is used for
dairy on both days. It is created by setting it equal to d_milk_d1 for day 1 and d_milk_d2 for day 2. This code also selects
women older than age 50 years.

data day1;
set demoadv;
if riagendr= 2 and ridageyr>= 51 ;
d_milk=d_milk_d1;
day= 1 ;
run ;

data day2;
set demoadv;
if riagendr= 2 and ridageyr>= 51 ;
d_milk=d_milk_d2;
day= 2 ;
run ;

Finally, these data sets are appended, and dummy variables are created. To use the NLMIXED procedure, dummy
variables must be created (there is no CLASS statement to create dummy variables as in other SAS procedures). In this
example, the following code was used:

data milk;
set day1 day2;
eth1=(ridreth1= 1 );
eth2=(ridreth1= 2 );
eth3=(ridreth1= 3 );
eth4=(ridreth1= 4 );
run ;

Because ridreth1 has 5 levels, 4 dummy variables are needed. This code creates a variable (for example, eth1) that is set
to 1 when ridreth1 equals 1 and to 0 otherwise.

IMPORTANT NOTE

Note: if the variable you are using has missing values, these will be coded to zero using the above code; additional code
would need to be added to set them to missing. Also, if you use the "<" symbol in SAS to create a dummy variable, note
that missing values are stored as very large negative numbers, so they will always satisfy the "<" condition and be coded
as 1 instead of being set to missing.

Step 2: Sort the dataset by respondent and day

It is important to sort the dataset by respondent and day because the NLMIXED procedure uses this information to
estimate the model parameters.
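
As in the calcium example, a minimal sort step for the milk dataset might look like the following (a sketch; seqn and day are the subject and repeat variables used in the macro call below):

proc sort data=milk;
by seqn day;
run;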

Step 3: Create the BRR192 macro

The BRR192 macro calls the MIXTRAN macro and computes standard errors of parameter estimates. After creating this
macro and running it one time, you may call it several times, each time changing the macro variables.

Create the BRR192 Macro

Statements Explanation

%macro BRR192(data, response, foodtype, The start of the BRR192 macro
subject, repeat, covars_prob, covars_amt, is defined. All of the terms
outlib, modeltype, lambda,seq, weekend, inside the parentheses are the
vargroup, numvargroups, subgroup, macro variables that are used in
start_val1, start_val2, start_val3, the macro.
vcontrol, nloptions, titles, printlevel,
final); Within the BRR192 macro the
%MIXTRAN MIXTRAN macro is called. All
of the variables preceded by “&”
(data=&data, response=&response, will be defined by the BRR192
foodtype=&foodtype, subject= &subject, macro call. The only variable
repeat=&repeat, covars_prob=&covars_prob, without an “&” is the
covars_amt= &covars_amt, outlib=&outlib, replicate_var macro variable; it
modeltype=&modeltype, lambda=&lambda, is set to w0304_0 for the first
replicate_var=w0304_0, seq=&seq, run.
weekend=&weekend, vargroup= &vargroup,
numvargroups=&numvargroups,
subgroup=&subgroup,

start_val1=&start_val1, This data step defines macro
start_val2=&start_val2, start_val3= variables that will be used in the
&start_val3, vcontrol=&vcontrol, next step of the macro.
nloptions=&nloptions, titles= &titles,
printlevel=&printlevel)
data _null_;

format old varA $255. ;

%let I=1; This code recreates the way

that the MIXTRAN macro
%let varamtu= %upcase (INTERCEPT &covars_amt); defines the parameter names,

%do %until ( %qscan (&varamtu,&I, %str ( ))= and makes a list of parameter
%str ()); names that are stored in the
_param_&foodtype (called &old)

%let varb&I= %qscan (&varamtu,&I, %str ( )); for the amount part of the
model. It also counts the

%if %eval (&i) lt 9 %then %let znum = "0"; number of parameters (&cnt).

%else %let znum= %str ();

num= %eval (&i);

varA= strip( 'A' ||strip(&znum)||strip(num)||
'_' || strip( "&&varb&i." ));

old = trim(old)|| ' ' ||trim(varA);

%let I= %eval (&I+1);

%end ;

%let cnt= %eval (&I-1);

%if &covars_amt= %str () %then %let cnt=1;

call symput( 'old' ,old); The dataset
run; _param_unc_&foodtype is
data parms_amt; defined in the MIXTRAN
macro. This data step sets the
set & outlib.._ param_unc_&foodtype; dataset _param_&foodtype and
array old (&cnt) &old; renames the amount
array new (&cnt) &varamtu; parameters to their variable
names.
do k= 1 to dim(new);
new[k]=old[k];

end;

keep &varamtu;

run; This data step defines macro
data _null_; variables that will be used in the
next step of the macro.
format oldpr varP $255. ;

%let I=1; This code recreates the way
that the MIXTRAN macro
%let varprobu= %upcase (INTERCEPT defines the parameter names,
&covars_prob); and makes a list of parameter
names that are stored in the
%do %until ( %qscan (&varprobu,&I, %str ( ))= _param_&foodtype (called &old)
%str ()); for the probability part of the
model. It also counts the
%let varp&I= %qscan (&varprobu,&I, %str ( )); number of parameters (&cnt).

%if %eval (&i) lt 9 %then %let znum = "0";
%else %let znum= %str (); num= %eval (&i);

varP= strip( 'P' ||strip(&znum)||strip(num)||
'_' ||strip( "&&varp&i." ));

oldpr = trim(oldpr)|| ' ' ||trim(varP);

%let I= %eval (&I+1);

%end ;

%let cntp= %eval (&I-1);

%if &covars_amt= %str () %then %let cntp=1;

call symput( 'oldpr' ,oldpr); The dataset _param_&foodtype
run; is defined in the MIXTRAN
data parms_prob; macro. This data step sets the
dataset _param_&foodtype and
set & outlib.._ param_&foodtype; renames the probability
array old (&cntp) &oldpr; parameters to their variable
array new (&cntp) &varprobu; names.

do k= 1 to dim(new);
new[k]=old[k];
end;
keep &varprobu;
run;

*save lambda; Lambda (the Box-Cox
transformation parameter) is

data _null_; fixed in the BRR runs. The
set & outlib.._ param_&foodtype; lambda value from the first run
call symput ( 'lamb' ,a_lambda); is saved in a macro variable
called &lamb.

run;

*start BRR runs; This code starts a loop to run
the 16 BRR runs.
%do run= 1 %to 16 ;
%MIXTRAN Within the BRR192 macro, the
MIXTRAN macro is called for
(data=&data, response=&response, the BRR run. All of the
foodtype=&foodtype, subject= &subject, variables preceded by “&” will
repeat=&repeat, covars_prob=&covars_prob, be defined by the BRR192
covars_amt= &covars_amt, outlib=&outlib, macro call. The only variable
modeltype=&modeltype, lambda=&lamb, without an “&” is the
replicate_var=w0304_&run, seq=&seq, replicate_var macro variable; it
weekend=&weekend, vargroup= &vargroup, is set to w0304_&run where
numvargroups=&numvargroups, &run=1 to 16. Notice that the
subgroup=&subgroup, start_val1=&start_val1, &lamb from the previous
start_val2=&start_val2, start_val3= dataset is fixed for lambda.
&start_val3, vcontrol=&vcontrol,
nloptions=&nloptions, titles=&titles, This data step defines macro
printlevel= 2 ) variables that will be used in the
data _null_; next step of the macro.

format old var new varA $255. ;

%let I=1; As before, this code recreates
the way that the MIXTRAN
%do %until ( %qscan (&varamtu,&I, %str ( ))= macro defines the parameter
%str ()); names, and makes a list of
parameter names that are
%let varb&I= %qscan (&varamtu,&I, %str ( )); stored in the _param_&foodtype
(called &old). It also creates a
%if %eval (&i) lt 9 %then %let znum = "0"; list of the intercept and the other
%else %let znum= %str () ; variables in the model (called
&var).
num= %eval (&i);

varA= strip( 'A' ||strip(&znum)||strip(num)||
'_' ||strip( "&&varb&i." ));

old = trim(old)|| ' ' ||trim(varA);

var= strip(strip( "&&varb&i." )|| '_' ||strip(
"&run" ));

new = trim(new)|| ' ' ||trim(var);

%let I= %eval (&I+1);

%end ;

%let cnt= %eval (&I-1);

%if &covars_amt= %str () %then %let cnt=1;

call symput( 'old' ,old);

call symput( 'new' ,new); The dataset _param_&foodtype
run; is from the MIXTRAN macro.
data parmsbrr_amt; This data step sets the dataset
_param_&foodtype and
set & outlib.._ param_&foodtype; renames the amount
array old (&cnt) &old; parameters to their variable
array new (&cnt) &new; names with the run number.

do k= 1 to dim(new); The point estimates of the
new[k]=old[k]; parameters are merged with the
end; BRR runs for the amount
keep &new; variables.
run; After merging, the information
data parms_amt; parmsbrr_amt can be deleted.
merge parms_amt parmsbrr_amt; The dataset _param_&foodtype
run; is from the MIXTRAN macro.
proc datasets nolist; delete parmsbrr_amt; This data step sets the dataset
_param_&foodtype and
data _null_; renames the probability
parameters to their variable
format oldpr varpr newpr varP $255. ; names.

%let I=1;

%do %until ( %qscan (&varprobu,&I, %str ( ))=
%str ());

%let varp&I= %qscan (&varprobu,&I, %str ( ));

%if %eval (&i) lt 9 %then %let znum = "0";
%else %let znum= %str () ;

num= %eval (&i);

varP= strip( 'P' ||strip(&znum)||strip(num)||
'_' ||strip( "&&varp&i." ));

oldpr = trim(oldpr)|| ' ' ||trim(varP);

varpr = strip(strip( "&&varp&i." )|| '_'
||strip( "&run" ));

newpr = trim(newpr)|| ' ' ||trim(varpr);

%let I= %eval (&I+1);

%end ;

%let cntp= %eval (&I-1);

%if &covars_amt= %str () %then %let cntp=1;

call symput( 'oldpr' ,oldpr);

call symput( 'newpr' ,newpr); The dataset _param_&foodtype
run; is from the MIXTRAN macro.
data parmsbrr_prob; This data step sets the dataset
_param_&foodtype and
set & outlib.._ param_&foodtype; renames the probability
array old (&cntp) &oldpr; parameters to their variable
array new (&cntp) &newpr; names with the run number.

do k= 1 to dim(new); The point estimates of the
new[k]=old[k]; parameters are merged with the
end; BRR runs for the probability
keep &newpr; variables.
run;
data parms_prob;
merge parms_prob parmsbrr_prob;
run;

proc datasets nolist; delete parmsbrr_prob; After merging, the information
parmsbrr_prob can be deleted.

%end ; The end of the BRR runs.
%let I=1;
This code starts a loop where
%do %until ( %qscan (&varamtu,&I, %str ( ))= the following code is evaluated
%str ()); for the intercept and the other
variables in the amount model
%let varb&I= %qscan (&varamtu,&I, %str ( one at a time until all variables
)); are evaluated.
data _null_;
This code creates a macro
format var call $255. ; variable with the BRR run
number appended to the
set parms; variable name.

call= "" ;

%do r= 1 %to 16 ;

var = strip(strip( "&&varb&i." )|| '_'
||strip( "&r" ));

call = strip(strip(call)|| ' '
||strip(var));

%end ;

call symput ( 'call' ,call);
run;

data brr_amt; For the 16 BRR runs, the value
format variable $32. ; of the point estimate is
set parms_amt; subtracted from the estimate of
the parameter from the BRR
array reps ( 16 ) &call; run. The standard error is
do m= 1 to 16 ; calculated.
reps[m] = reps[m] - &&varb&i;
end;

estimate=&&varb&i;
brrse=sqrt(uss(of &call)/( 16 * .49 ));
variable= "&&varb&i" ;
type= 'AMOUNT' ;
keep variable estimate brrse type;
run;

proc append base=amts data=brr_amt; The datasets for each variable
is appended to the dataset
amts.

proc datasets nolist; delete brr_amt; run; The dataset brr_amt is deleted.

%let I= %eval (&I+1); The variable I is incremented,
%end ; and the end of the variable loop
is defined.

%let I=1; This code starts a loop where

the following code is evaluated

%do %until ( %qscan (&varprobu,&I, %str ( ))= for the intercept and the other

%str ()); variables in the probability

model one at a time until all
%let varp&I= %qscan (&varprobu,&I, %str ( variables are evaluated.

));

data _null_; This code creates a macro
format var callp $255. ; variable with the BRR run
number appended to the
set parms_prob; variable name.

callp= "" ;

%do r= 1 %to 16 ;

var = strip(strip( "&&varp&i." )|| '_'
||strip( "&r" ));

callp = strip(strip(callp)|| ' '
||strip(var));

%end ;

call symput ( 'callp' ,callp);

run; For the 16 BRR runs, the value
data brr_prob; of the point estimate is
subtracted from the estimate of
format variable $32. ; the parameter from the BRR
set parms_prob; run. The standard error is
calculated.
array reps ( 16 ) &callp;

do m= 1 to 16 ;
reps[m] = reps[m] - &&varp&i;

end;

estimate=&&varp&i;

brrse=sqrt(uss(of &callp)/( 16 * .49 ));

variable= "&&varp&i" ;

type= 'PROB' ;

keep variable estimate brrse type;

run; The datasets for each variable
proc append base=probs data=brr_prob is appended to the dataset
probs.
proc datasets nolist; delete brr_prob; run;
%let I= %eval (&I+1); The dataset brr_prob is deleted.
%end ;
data brr; The variable I is incremented,
format type $6. ; and the end of the variable loop
is defined.

The probability and amount
datasets are appended.

set probs amts;

run; The final dataset is printed.
proc print; var variable estimate brrse t The dataset parms is deleted.
pvalue; run;
proc datasets nolist; delete parms; run;

%mend BRR192; The end of the BRR192 macro
is indicated.

Step 4: Run the BRR192 macro to obtain parameter estimates for the covariates of interest from
the model used in the NCI method

Use the BRR192 macro to obtain parameter estimates. Once the macro has been run, it is possible to call the macro
multiple times, varying the values of the parameters each time. For example, the variables of interest could be changed.
This merely requires calling the macro again (using a call similar to that below), not redefining the macro each time.

Run the BRR192 Macro

Statements Explanation

%BRR192(data=milk, This code calls the BRR192 macro. The dataset milk
response=d_milk,
foodtype=milk, defined in Step 1 is used; the macro variable response
subject=seqn, for which you want to model the distribution is d_milk.
repeat=day, The macro variable foodtype is used to label the param
covars_amt=ridageyr dataset. The variable seqn identifies the subject, and the

eth1 eth2 eth3 eth4, macro variable repeat defines the variable that identifies
covars_prob=eth1 eth2
eth3 eth4, outlib=work, the repeats on the subject, which is day. The covariates
modeltype=corr, titles= ridageyr eth1 eth2 eth3 eth4 are included in the amount
1 ,printlevel= 2 part of the model, and the covariates eth1 eth2 eth3 eth4
,final=nh.m19task2) are included in the probability part of the model.

The macro variable outlib specifies the library where the
data are to be stored. In this case, the working directory,
work, was used.

Because this is a food model, modeltype=corr is
specified. This fits the two-part model with correlated
random effects.

The macro variable titles saves 1 line for a title supplied
by the user. The printlevel is 2, which prints the output
from the NLMIXED runs and the summary.

The variable final specifies the name of the final dataset
produced.

Step 5: Interpret parameter estimates for the covariates of interest

Depending on the print level selected, the output from each NLMIXED run will be printed in the output. The first
NLMIXED output (replicate variable w0304_0) is a listing of the point estimates for the estimation of milk (see
“Results from Fitting Correlated Model” in the output file). However, the standard errors are incorrect. The other
NLMIXED runs are from the BRR replications.

The output of the parameter estimates is given below:

Model Fitting for Milk Intake

Type Variable Estimate BRRSE t p-value
PROB INTERCEPT 1.72150 0.57116 3.01406 0.0087
PROB ETH1 0.16051 0.92495 0.17353 0.8646
PROB ETH2 0.00913 0.60280 0.01515 0.9881
PROB ETH3 0.65417 0.66277 0.98702 0.3393
PROB ETH4 -0.20045 0.66135 -0.30309 0.7660
AMOUNT INTERCEPT -1.00918 0.21332 -4.73076 0.0003
AMOUNT RIDAGEYR 0.00407 0.00243 1.67505 0.1146
AMOUNT ETH1 0.24275 0.19596 1.23880 0.2345
AMOUNT ETH2 0.21402 0.22144 0.96652 0.3491
AMOUNT ETH3 0.22668 0.11118 2.03890 0.0595
AMOUNT ETH4 -0.24727 0.17871 -1.38363 0.1867

None of the race/ethnicity variables are significantly associated with the probability of consuming milk. The variable
ETH3, which represents Non-Hispanic Whites, is marginally associated with the amount of milk consumed on the
consumption day (p=0.0595). The reference category for race/ethnicity is ‘Other/Multi’.

Age is positively associated with the amount of milk that is usually consumed; however, it is not statistically
significant (p=0.1146).

The parameter estimates for probability may be interpreted as log odds ratios as in logistic regression.
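
For example (an illustrative calculation, not part of the tutorial text), exponentiating the ETH3 estimate gives exp(0.65417) ≈ 1.92; that is, the estimated odds of consuming milk on a given day for Non-Hispanic Whites are about 1.9 times those of the Other/Multi reference group, although this difference is not statistically significant (p=0.3393).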

The parameter estimates for amount are on the transformed scale (lambda=0.2782), and must be interpreted in
light of this. It may be useful to backtransform them to the original scale. The backtransformation of the Box-Cox
transformation for the model is:
backtransformed median usual intake = ((β0 + β1×Age + β2×Eth1 + β3×Eth2 + β4×Eth3 + β5×Eth4) × λ + 1)^(1/λ)
where λ (lambda) is the Box-Cox parameter, β0 is the intercept (-1.00918), β1 is the parameter for age (0.00407),
and β2 through β5 are the parameters associated with the ethnicity indicator variables. (When data are
transformed, their relative order (including the middle value) is maintained. For a symmetric distribution, the mean
and median are equivalent; therefore, when you backtransform, the backtransformation is to the median, not the
mean, on the original scale, where the distribution is skewed.) For example, for Non-Hispanic Whites age 40 years,
compared to the Other/Multi group, you calculate:
((-1.00918 + 0.00407 × 40 + 0.22668) × 0.2782 + 1)^(1/0.2782) − ((-1.00918 + 0.00407 × 40) × 0.2782 + 1)^(1/0.2782) ≈ 0.1255
and conclude that the consumption-day median intake is about 1/8 cup equivalent higher for 40 year-old Non-
Hispanic Whites compared to the Other/Multi group.

Estimating Population-Level Distributions of Usual Dietary Intake

Purpose

The term “dietary intake” in this module includes foods and beverages reported on the 24-hour recalls. Researchers often are
interested in estimating the distribution of usual intake of dietary components for a population or subpopulation. This module
focuses on using the method developed by researchers at NCI and elsewhere (i.e., the “NCI method”) for this purpose,
comparing and contrasting to other methods, as appropriate. It also directs interested users to references on the Iowa State
University (ISU) and National Research Council (NRC) methods. Due to the different statistical properties of distributions for
ubiquitously-consumed dietary constituents (e.g., nutrients and food groups consumed on a daily or almost daily basis), and
episodically-consumed dietary constituents (e.g., nutrients and food groups that are not consumed every day for more than
about 5% of the population), different models are fit for ubiquitously-consumed and episodically-consumed dietary
constituents. In the NCI method, the model for ubiquitously-consumed dietary constituents is a simple case of the model for
episodically-consumed foods.

IMPORTANT NOTE
Many of the statistical methods used in this course are advanced and may require consultation with a statistician. For Modules
18-22, you will need statistical knowledge of mixed effects models and programming experience with calling SAS macros.
Because Module 18 provides the background information for Modules 19-22, it is advised that you carefully read Module 18
first before tackling the other modules.

Task 1: Estimating Distributions of Usual Intake for a Single Ubiquitously-consumed
Dietary Constituent for One Population or Subpopulation

This task describes the use of statistical methods to estimate the distribution of usual intake for one ubiquitously-consumed
dietary constituent, such as a nutrient that is consumed daily, for a population or for one subpopulation. It also describes the
steps analysts should take to calculate this estimation using the NCI method specifically.

Key Concepts about Estimating Distributions of Usual Intake for a Single Ubiquitously-consumed Dietary Constituent
(/nchs/tutorials/Dietary/Advanced/EstimateDistributions/Info1.htm)
How to Estimate the Distribution of Usual Intake for a Single Ubiquitously-consumed Dietary Constituent for One
Population or Subpopulation using the NCI Method (/nchs/tutorials/Dietary/Advanced/EstimateDistributions/Task1.htm)
Download Sample Code and Datasets (/nchs/tutorials/Dietary/downloads/downloads.htm)

Task 2: Estimating Distributions of Usual Intake for a Single Ubiquitously-consumed
Dietary Constituent for Two or more Subpopulations using a Covariate

This task describes the use of statistical methods, for subpopulations defined by a covariate, to estimate the distribution of
usual intake for one ubiquitously-consumed dietary constituent, such as a nutrient that is consumed daily, using a covariate to
define the subpopulation. It also describes the steps analysts should take to calculate this estimation using the NCI method.

Key Concepts about Estimating Distributions of Usual Intake for a Single Ubiquitously-consumed Dietary Constituent with
a Few Days of 24-hour Recalls for Subpopulations using a Covariate (/nchs/tutorials/Dietary/Advanced/EstimateDistributions
/Info2.htm)
How to Estimate Distributions of Usual Intake for a Single Ubiquitously-consumed Dietary Constituent with a Few Days of
24-hour Recalls for Subpopulations using a Covariate (/nchs/tutorials/Dietary/Advanced/EstimateDistributions/Task2.htm)
Download Sample Code and Datasets (/nchs/tutorials/Dietary/downloads/downloads.htm)

Task 3: Estimating Distributions of Usual Intake for a Single Episodically-consumed
Dietary Constituent

This task describes the use of a two-part model for estimating usual intake of a single episodically-consumed dietary
constituent, such as a food group that is not consumed on a daily basis, and how to estimate the distribution of usual intake of a
single episodically-consumed dietary component using the NCI method.

Key Concepts about Estimating Distributions of Usual Intake for a Single Episodically-consumed Dietary Constituent
(/nchs/tutorials/Dietary/Advanced/EstimateDistributions/Info3.htm)
How to Estimate Distributions of Usual Intake for a Single Episodically-consumed Dietary Constituent using the NCI
Method (/nchs/tutorials/Dietary/Advanced/EstimateDistributions/Task3.htm)

Download Sample Code and Datasets (/nchs/tutorials/Dietary/downloads/downloads.htm)

Task 4: Estimating Population Distributions of Ratios of Usual Intakes of Two Dietary
Constituents that are Ubiquitously Consumed

This task describes the use of a model for estimating the population distribution of the ratio of usual intakes of two
ubiquitously-consumed dietary constituents, such as a nutrient and energy, and how to estimate this ratio using the NCI
method.

Key Concepts about Estimating Distributions of Usual Intake for the Ratio of Two Ubiquitously-consumed Dietary
Constituents (/nchs/tutorials/Dietary/Advanced/EstimateDistributions/Info4.htm)
How to Estimate Distributions of Usual Intake for the Ratio of Two Ubiquitously-consumed Dietary Constituent Using the
NCI Method (/nchs/tutorials/Dietary/Advanced/EstimateDistributions/Task4.htm)
Download Sample Code and Datasets (/nchs/tutorials/Dietary/downloads/downloads.htm)

Task 1: Key Concepts about Estimating Distributions of Usual Intake for a
Single Ubiquitously-consumed Dietary Constituent

Because a 24-hour recall measures intake on only a single day, estimates of usual intake based on 24-hour recalls are
prone to measurement error. Using a simple average of a few days does not adequately represent usual intake. Thus, more
sophisticated methods based on statistical modeling are necessary. All of the statistical methods that have been developed
make the assumption that the 24-hour recall is prone to random, not systematic error. For estimating ubiquitously-
consumed dietary constituents, these methods must meet the following challenges. They must:

A. Distinguish within-person from between-person variation, and
B. Account for consumption-day amounts that are positively skewed.

Four statistical methods have been developed to estimate the distribution of ubiquitously-consumed dietary constituents:
the National Research Council (NRC) method (National Research Council, 1986), the method developed at Iowa State
University (ISU method) (Nusser et al., 1996), a simplification of the ISU method called the Best Power Method (Dodd,
1996), and the NCI method (Tooze, 2006). All of these methods meet challenges A and B.

The four methods differ in terms of the methods for transforming the data, estimating the distribution of usual intake based
on the estimated variance of usual intake, and backtransforming the data. In particular, the ISU method uses a
sophisticated procedure to transform the data to approximate normality. The other methods use a simpler power
transformation; the NCI method uses the Box-Cox transformation estimated within the statistical model. The NRC, ISU,
and Best Power methods use a shrinkage estimator to estimate the distribution of usual intake for the population.

Because it may include covariates in the model and may accommodate episodically-consumed dietary constituents, the
NCI method uses a Monte Carlo procedure to estimate the distribution of usual intake. In this procedure, 100 realizations
of usual intake are generated for each person: a realization of the person-specific effect is drawn from its estimated
distribution in the statistical model, added to the person's linear predictor, and backtransformed to the original scale. The NRC
method does not use a transformation to adjust the backtransformed mean to the mean on the original scale as the other
methods do. For more details on the methods see Module 18 Task 2 and Dodd et al (2006).
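
In simplified notation (a sketch of the idea, not the exact algorithm implemented in the DISTRIB macro), for a person with covariate vector x the procedure draws person-specific effects u_1, ..., u_100 from their estimated distribution and computes

simulated usual intake_j = backtransform(x'β + u_j), j = 1, ..., 100,

where the backtransformation inverts the fitted Box-Cox transformation (the full method also accounts for within-person variability at this step). The weighted collection of simulated values over all sample persons is then used to estimate percentiles of usual intake and the percent of the population below a cutpoint.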

The macros to fit the NCI method may be downloaded from the NCI website. Software for fitting the ISU method is
available from the Center for Survey Statistics and Methodology at Iowa State University.

Task 1: How to Estimate the Distribution of Usual Intake for a Single
Ubiquitously-consumed Dietary Constituent for One Population or
Subpopulation using the NCI Method

The following example shows how the distribution of calcium from foods and beverages can be estimated for women ages
19 years and older.

This example uses the demoadv dataset (download at Sample Code and Datasets). The variables w0304_0 to
w0304_16 are the weights (dietary weights and Balanced Repeated Replication [BRR] weights) used in the analysis of
2003-2004 dietary data that require the use of BRR to calculate standard errors. The model is run 17 times, including 16
runs using BRR (see Module 18, Task 4 for more information). BRR uses weights w0304_1 to w0304_16.

IMPORTANT NOTE

Note: If 4 years of NHANES data are used, 32 BRR runs are required. Additional weights are found in the demoadv
dataset.

A SAS macro is a useful technique for rerunning a block of code when you want only to change a few variables; the macro
BRR201 is created and called in this example. The BRR201 macro calls the MIXTRAN macro and the DISTRIB macro,
and calculates BRR standard errors of the parameter estimates. The MIXTRAN macro obtains preliminary estimates for
the values of the parameters in the model, and then fits the model using PROC NLMIXED. It also produces summary
reports of the model fit.

Modeling the complex survey structure of NHANES requires procedures that account for both differential weighting of
individuals and the correlation among sample persons within a cluster. The SAS procedure NLMIXED can account for
differential weighting by using the replicate statement. The use of BRR to calculate standard errors accounts for the
correlation among sample persons in a cluster. Therefore, NLMIXED (or any SAS procedure that incorporates differential
weighting) may be used with BRR to produce standard errors that are suitable for NHANES data without using specialized
survey procedures. The DISTRIB macro estimates the distribution of usual intake, producing estimates of percentiles and
the percent of the population below a cutpoint.

IMPORTANT NOTE

Note that the DISTRIB macro currently requires that at least 2 cutpoints be requested in order to calculate the percent of
the population below a cutpoint.
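
For example (illustrative values only), passing cutpts=800 1000 and ncutpts=2 to the BRR201 macro below would request the estimated proportion of the population with usual calcium intake below 800 mg and below 1000 mg, in addition to the percentiles.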

The effect of the sequence of the 24-hour recall is removed from the estimated nutrient intake distribution (Day 1 or Day 2
24-hour recall). An adjustment is also made for day of the week the 24-hour recall was collected, dichotomized as
weekend (Friday-Sunday) or weekday (Monday-Thursday). (See Module 18, Task 3 for more information on covariate
adjustment.) BRR (Module 18, Task 4) is used to calculate standard errors.

The MIXTRAN and DISTRIB macros used in this example were downloaded from the NCI website. Version 1.1 of the
macros was used. Check this website for macro updates before starting any analysis. Additional details regarding the
macros and additional examples also may be found on the website.

Step 1: Create a dataset so that each row corresponds to a single person day and define
variables if necessary

Statements Explanation

data demoadv; First, select only those people with dietary data by selecting

set nh.demoadv; those without missing BRR weights.

if w0304_0 ne . ;

run ;

data day1; The variables DR1TCALC and DR2TCALC are NHANES

set demoadv; variables representing total calcium consumed on days 1 and 2,

if riagendr= 2 and respectively, from all foods and beverages (other than water).

ridageyr>= 19 ;

DRTCALC=DR1TCALC; To create a dataset with 2 records per person, the demoadv

day= 1 ; dataset is set 2 times to create 2 datasets, one where day=1

run ; and one where day=2. The same variable name, DRTCALC, is

used for calcium on both days. It is created by setting it equal

to DR1TCALC for day 1 and DR2TCALC for day 2. Adult

data day2; women ages 19 years and older are selected for analysis.

set demoadv;

if riagendr= 2 and

ridageyr>= 19 ;
DRTCALC=DR2TCALC;

day= 2 ;

run ;

data calcium; Finally, these data sets are appended, and day of the week

set day1 day2; dummy variables are created. To use the NLMIXED procedure,
if DAY_WK in ( 1 , dummy variables must be created (there is no CLASS
statement).
6 , 7 ) then

weekend= 1 ;

else if DAY_WK
in ( 2 , 3 , 4 , 5 )
then weekend= 0 ;
run ;

Step 2: Sort the dataset by respondent and day

It is important to sort the dataset by respondent and intake day (day 1 and 2) because the NLMIXED procedure uses this
information to estimate the model parameters.

Step 3: Create the BRR201 macro

The BRR201 macro calls the MIXTRAN macro and DISTRIB macro and computes standard errors of parameter
estimates. After creating this macro and running it 1 time, it may be called several times, each time changing the macro
variables.

Statements Explanation

%include This code reads the MIXTRAN and

'C:\NHANES\Macros\mixtran_macro_v1.1.sas' DISTRIB macros into SAS so that
; these macros may be called.

%include
'C:\NHANES\Macros\distrib_macro_v1.1.sas'
;

%macro BRR201(data, response, foodtype, The start of the BRR201 macro is
subject, repeat, covars_prob, covars_amt, defined. All of the terms inside the
outlib, pred, param, modeltype, lambda,
seq, weekend, vargroup, numvargroups parentheses are the macro
,subgroup, start_val1, start_val2, variables that are used in the
start_val3, vcontrol, nloptions, titles, macro.

printlevel, cutpts, ncutpts, nsim_mc,
byvar, final);

Statements:

%MIXTRAN (data=&data, response=&response, foodtype=&foodtype,
          subject=&subject, repeat=&repeat, covars_prob=&covars_prob,
          covars_amt=&covars_amt, outlib=&outlib, modeltype=&modeltype,
          lambda=&lambda, replicate_var=w0304_0, seq=&seq,
          weekend=&weekend, vargroup=&vargroup,
          numvargroups=&numvargroups, subgroup=&subgroup,
          start_val1=&start_val1, start_val2=&start_val2,
          start_val3=&start_val3, vcontrol=&vcontrol,
          nloptions=&nloptions, titles=&titles, printlevel=&printlevel)

Explanation: Within the BRR201 macro, the MIXTRAN macro is called. All of the variables preceded by & will be defined by the BRR201 macro call. The only variable without an & is the replicate_var macro variable; it is set to w0304_0 for the first run.

Statements:

%DISTRIB (seed=0, nsim_mc=&nsim_mc, modeltype=&modeltype, pred=&pred,
          param=&param, outlib=&outlib, cutpoints=&cutpts,
          ncutpnt=&ncutpts, byvar=&byvar, subgroup=&subgroup,
          subject=&subject, titles=&titles, food=&foodtype);

Explanation: Within the BRR201 macro, the DISTRIB macro is called. All of the variables preceded by & will be defined by the BRR201 macro call. The seed for generating the distribution has been set to 0, which will use the clock to randomly start a sequence. The datasets defined by the macro variables pred and param (_pred_unc_&foodtype and _param_unc_&foodtype) are created in the MIXTRAN run.


Statements:

data dist;
  set &outlib..descript_&foodtype._w0304_0;
  mergeby=1;
  keep &subgroup numsubjects mean_mc_t tpercentile1-tpercentile99
       cutprob1-cutprob&&ncutpts. mergeby;
run;

Explanation: The dataset descript_&foodtype._w0304_0 is defined in the DISTRIB macro. This data step keeps the parameters of interest from that dataset and defines a variable mergeby that will be used later.

Statements:

%do run=1 %to 16;

Explanation: This code starts a loop to run the 16 BRR runs.

Statements:

options nonotes;

Explanation: Notes are turned off to save room in the log.

Statements:

%put ~~~~~~~~~~~~~~~~~~~ Run &run ~~~~~~~~~~~~~~~~~~~~;

Explanation: The run number is printed to the log.

Statements:

%MIXTRAN (data=&data, response=&response, foodtype=&foodtype,
          subject=&subject, repeat=&repeat, covars_prob=&covars_prob,
          covars_amt=&covars_amt, outlib=&outlib, modeltype=&modeltype,
          lambda=&lambda, replicate_var=w0304_&run, seq=&seq,
          weekend=&weekend, vargroup=&vargroup,
          numvargroups=&numvargroups, subgroup=&subgroup,
          start_val1=&start_val1, start_val2=&start_val2,
          start_val3=&start_val3, vcontrol=&vcontrol,
          nloptions=&nloptions, titles=&titles, printlevel=&printlevel)

Explanation: Within the BRR201 macro, the MIXTRAN macro is called again. All of the variables preceded by & will be defined by the BRR201 macro call. The only variable without an & is the replicate_var macro variable; it is set to w0304_&run, where &run equals 1 to 16.

Statements:

%DISTRIB (seed=0, nsim_mc=&nsim_mc, modeltype=&modeltype, pred=&pred,
          param=&param, outlib=&outlib, cutpoints=&cutpts,
          ncutpnt=&ncutpts, byvar=&byvar, subgroup=&subgroup,
          subject=&subject, titles=&titles, food=&foodtype);

Explanation: Within the BRR201 macro, the DISTRIB macro is called. All of the variables preceded by & will be defined by the BRR201 macro call. The seed for generating the distribution has been set to 0, which will use the clock to randomly start a sequence. The datasets defined by the macro variables pred and param (_pred_unc_&foodtype and _param_unc_&foodtype) are created in the MIXTRAN run.
