Published by smlneyman, 2019-01-16 01:35:47

NHANES Dietary Web Tutorial_397 pages


12/19/2018 NHANES Dietary Web Tutorial: Estimating Population-Level Distributions of Usual Dietary Intake: Task 1

Statements Explanation

    data distbrr;
      set &outlib..descript_&foodtype._w0304_&run;
      rename numsubjects=bnumsubjects mean_mc_t=bmean_mc_t
             tpercentile1-tpercentile99=btpercentile1-btpercentile99
             cutprob1-cutprob&&ncutpts.=bcutprob1-bcutprob&&ncutpts.;
      run=&run;
      mergeby=1;
    run;

    data distbrr;
      set distbrr;
      keep &subgroup bnumsubjects bmean_mc_t btpercentile1-btpercentile99
           bcutprob1-bcutprob&&ncutpts. mergeby;
    run;

The dataset descript_&foodtype._w0304_&run is created by the DISTRIB macro. The first data step reads it, renames the estimates of interest with a b prefix (for BRR), and defines a variable mergeby that will be used later for merging. The second data step keeps only those variables.

    proc append base=brr_runs data=distbrr;
    run;
    proc datasets nolist;
      delete distbrr;
    run;

    %end;

The BRR results are appended to a dataset called brr_runs. After the information has been appended to brr_runs, distbrr can be deleted. The %end statement ends the BRR runs.

    proc sort data=dist;
      by &subgroup mergeby;
    run;
    proc sort data=brr_runs;
      by &subgroup mergeby;
    run;

The data are sorted before merging.

https://www.cdc.gov/nchs/tutorials/Dietary/Advanced/EstimateDistributions/Task1.htm 5/10


    data distall;
      merge dist brr_runs;
      by &subgroup mergeby;
      array bvar (*) bmean_mc_t btpercentile1-btpercentile99 bcutprob1-bcutprob&&ncutpts.;
      array varo (*) mean_mc_t tpercentile1-tpercentile99 cutprob1-cutprob&&ncutpts.;
      array dsqr (*) dbmean_mc_t dbtpercentile1-dbtpercentile99 dbcutprob1-dbcutprob&&ncutpts.;
      /* squared difference between each BRR estimate and the full-sample estimate */
      do i=1 to dim(bvar);
        dsqr[i]=(bvar[i]-varo[i])**2;
      end;
    run;

The datasets dist and brr_runs are merged, and the squared difference between each BRR estimate and the corresponding estimate from the first (full-sample) run is computed.

    proc means data=distall sum;
      by &subgroup mergeby;
      var dbmean_mc_t dbtpercentile1-dbtpercentile99 dbcutprob1-dbcutprob&&ncutpts.;
      output out=sums sum=sum_dbmean_mc_t sum_dbtpercentile1-sum_dbtpercentile99
             sum_dbcutprob1-sum_dbcutprob&&ncutpts.;
    run;

The sum of the squared differences is computed.


    data brr;
      set sums;
      array sumt (*) sum_dbmean_mc_t sum_dbtpercentile1-sum_dbtpercentile99
                     sum_dbcutprob1-sum_dbcutprob&&ncutpts.;
      array se (*) se_mean_mc_t se_tpercentile1-se_tpercentile99
                   se_cutprob1-se_cutprob&&ncutpts.;
      do j=1 to dim(sumt);
        se[j]=-1*sqrt(sumt[j]/(16*.49));
      end;
      mergeby=1;
      keep se_mean_mc_t se_tpercentile1-se_tpercentile99
           se_cutprob1-se_cutprob&&ncutpts. &subgroup mergeby;
    run;

The standard errors are computed. Each SE is multiplied by -1 so that it prints in parentheses in the final step.
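The standard-error calculation spread across the distall, sums, and brr steps can be summarized as a single formula. This is a sketch read off the code, not a formula quoted from the NCHS or NCI documentation. Let θ̂_0 be an estimate (a mean, percentile, or cutpoint proportion) from the full-sample run with weight w0304_0, and let θ̂_r be the same estimate from BRR run r. The constant 0.49 equals (1 - 0.3)^2. That matches Fay's variant of BRR with a perturbation factor k = 0.3, but this reading is an assumption inferred from the code.

```latex
\widehat{\mathrm{SE}}(\hat\theta_0)
  = \sqrt{\frac{1}{R\,(1-k)^2}\sum_{r=1}^{R}\left(\hat\theta_r-\hat\theta_0\right)^2},
\qquad R = 16,\quad k = 0.3,\quad (1-k)^2 = 0.49
```

The steps divide the work as follows:
- distall forms each squared difference.
- PROC MEANS adds them up into sums.
- brr takes the square root and multiplies by -1 for printing.

With 4 years of data, which require 32 BRR runs, both the loop bound and the 16 in the divisor would presumably need to become 32.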

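    data toprint1;
      set dist;
      line=1;  * These are the point estimates;
      keep &subgroup numsubjects mean_mc_t tpercentile1-tpercentile99
           cutprob1-cutprob&&ncutpts. line;
    run;

To create the final dataset, the point estimates are saved in a dataset called toprint1. The variable line identifies them as estimates.

    data toprint2;
      /* give the SEs the same names as the point estimates so the two datasets stack */
      set brr(rename=(se_mean_mc_t=mean_mc_t
                      se_tpercentile1-se_tpercentile99=tpercentile1-tpercentile99
                      se_cutprob1-se_cutprob&&ncutpts.=cutprob1-cutprob&&ncutpts.));
      line=2;
      keep &subgroup mean_mc_t tpercentile1-tpercentile99
           cutprob1-cutprob&&ncutpts. line;
    run;

The standard errors are saved in a dataset called toprint2, and the variable line identifies them as standard errors. Because the brr data step names them se_mean_mc_t, se_tpercentile1-se_tpercentile99, and so on, they are renamed to match the point-estimate variables.

    data &final;
      set toprint1 toprint2;
    run;

    proc sort data=&final;
      by &subgroup line;
    run;

The final dataset is created by appending toprint1 and toprint2, and it is then sorted.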


    proc print data=&final split=' ' noobs;
      var &subgroup line tpercentile5 tpercentile10 tpercentile25 tpercentile50
          tpercentile75 tpercentile90 tpercentile95;
      format line line. mean_mc_t tpercentile1-tpercentile99 negparen10.1
             cutprob1-cutprob&&ncutpts. negparen6.2;
      title 'Usual Intake of Calcium';
      title2 'NHANES 2003-04';
    run;
    %mend BRR201;

The final dataset is printed. The NEGPAREN format makes the standard errors (stored as negative values) print in parentheses. The %mend statement marks the end of the BRR201 macro.

Step 4: Run the BRR201 macro to obtain parameter estimates for the covariates of interest from
the model used in the NCI Method

Use the BRR201 macro to obtain parameter estimates. It is possible to call the BRR201 macro several times, varying the
values of the parameters each time. For example, the variables of interest could be changed. This merely requires calling
the macro again (using a call similar to that below), not redefining the macro each time.

Statements Explanation

    %BRR201(data=calcium, response=DRTCALC, foodtype=Calcium, subject=seqn,
            repeat=day, seq=day, covars_amt=, weekend=weekend, outlib=work,
            pred=work._pred_unc_Calcium, param=work._param_unc_Calcium,
            modeltype=amount, titles=1, printlevel=2, cutpts=500 1000 1500,
            ncutpts=3, nsim_mc=100, final=nh.m20task1)

This code calls the BRR201 macro, using the dataset calcium defined in Step 1. The arguments are as follows:
- response: DRTCALC is the variable whose distribution we want to model.
- foodtype: labels the pred and param datasets.
- subject: seqn identifies the subject.
- repeat: day is the variable that identifies the repeated measurements on each subject.
- covars_amt: left empty here, so no covariates are included in the model. Covariates could be specified with this macro variable.

The weekend macro variable includes a weekend effect in the model. The distribution is then calculated by weighting weekdays by 4/7 and weekends by 3/7. The macro variable must be set equal to a variable called weekend in the dataset.
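The 4/7 and 3/7 weights correspond to the number of days in each category: Monday through Thursday (4 days) and Friday through Sunday (3 days). A sketch of the weighting follows. It assumes the weights are applied to each simulated usual intake; the exact computation happens inside DISTRIB and is not shown in this tutorial. T_weekday and T_weekend (notation introduced here) denote the intake predicted with weekend=0 and weekend=1, respectively.

```latex
T = \frac{4}{7}\,T_{\text{weekday}} + \frac{3}{7}\,T_{\text{weekend}}
```

For example, hypothetical predictions of 700 mg on weekdays and 630 mg on weekend days would combine to 4/7 x 700 + 3/7 x 630 = 400 + 270 = 670 mg.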


The macro variable outlib specifies the library where the data are to be stored. In this case, the WORK library was used. Note that the macro variables pred and param must specify the outlib library, and they must use foodtype to identify the food modeled. The example presented here is a ubiquitously-consumed dietary constituent and uses the amount model, so the names of the pred and param datasets from MIXTRAN include the term _unc_.

Because this is a ubiquitously-consumed dietary constituent, modeltype=amount is specified. This fits the amount model.

The macro variable titles reserves 1 line for a title supplied by the user. The printlevel is 2, which prints the output from the NLMIXED runs and the summary.

By specifying cutpoints (cutpts) of 500, 1000, and 1500 mg, the macro will produce an estimate of the proportion of the population below each of these values. Because there are 3 cutpoints, ncutpts is set to 3.

IMPORTANT NOTE

Note that the DISTRIB macro currently requires that at least 2 cutpoints be requested in order to calculate the percent of the population below a cutpoint.

The macro variable nsim_mc specifies the number of pseudo-individuals simulated per respondent when the distribution of usual intake is estimated.
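Percentiles and proportions below a cutpoint are summaries over these simulated values. Below is a sketch of the proportion below a cutpoint c. It assumes, without confirmation from the DISTRIB code, that each of the M = nsim_mc pseudo-individuals simulated for respondent i carries weight w_i/M. Here w_i is the respondent's sampling weight and T_im is the m-th simulated usual intake (notation introduced here).

```latex
\hat{p}(c) = \frac{\sum_{i}\sum_{m=1}^{M} \frac{w_i}{M}\,\mathbf{1}\{T_{im} < c\}}{\sum_{i} w_i}
```

With cutpts=500 1000 1500 and ncutpts=3, this yields three proportions, which the BRR201 code carries as cutprob1-cutprob&&ncutpts. With nsim_mc=100, each respondent contributes 100 simulated values.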

The macro variable final specifies the name of the final dataset produced.

Step 5: Interpret parameter estimates for the variable of interest


Depending on the printlevel selected, the output from each NLMIXED run is printed. The first NLMIXED output (replicate variable w0304_0) lists the point estimates for calcium. However, the standard errors it reports are incorrect for NHANES data; the BRR standard errors should be used instead. The other NLMIXED runs are from the BRR replications. Percentile estimates are also printed for the base run and the BRR runs. Selected percentiles and their standard errors are printed at the end of the output.

The median (i.e., 50th percentile) usual calcium intake for women ages 19 years and older is 692 mg (SE=25 mg). Only 5% of women have usual calcium intakes above 1,296 mg.
Recall that the cutpoints were 500 mg, 1000 mg, and 1500 mg. The estimate for the 1000 mg cutpoint shows that 83% of women consume less than 1000 mg of calcium per day.

IMPORTANT NOTE

Note: Your results may vary slightly, as a random seed is used to estimate the distribution of usual intake. However, they
would not be expected to vary by more than 1%.


12/19/2018 NHANES Dietary Web Tutorial: Estimating Population-Level Distributions of Usual Dietary Intake: Task 2


Task 2: Key Concepts about Estimating Distributions of Usual Intake for a
Single Ubiquitously-consumed Dietary Constituent with a Few Days of 24-
hour Recalls for Subpopulations using a Covariate

Because the dietary assessment queries intake only on a single day or a few days, measures of usual intake from 24-hour recalls are prone to measurement error. A simple average of 2 days does not adequately represent usual intake, so more sophisticated methods based on statistical modeling are necessary. All of the statistical methods that have been developed assume that the 24-hour recall is prone to random, not systematic, error. To estimate usual intake of ubiquitously-consumed dietary constituents, the methods must:

A. Distinguish within-person from between-person variation, and
B. Account for consumption-day amounts that are positively skewed.

The methods developed before the NCI method required stratification to estimate the distribution of usual intake for a subpopulation (see Module 18, Task 2 for more details on these methods). The NCI method, by contrast, can accommodate covariates in the statistical model. It can therefore provide more efficient estimates (i.e., smaller standard errors) of the distribution of usual intake for a subpopulation. These are more efficient than estimates from the earlier methods or from other fully stratified models that do not share parameters across subpopulations.

This Task addresses estimating usual intake distributions for subpopulations defined by a covariate. This approach is useful when usual intake for multiple groups is of interest. When just one subpopulation is of interest, the methods described in Task 1 could be used. Balanced Repeated Replication (Module 18, Task 2) is used to calculate standard errors that are corrected for the complex sampling design of NHANES.
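To make the idea of sharing parameters concrete, the amount model used in the example that follows can be pictured roughly as below. This is an illustrative sketch, not the exact MIXTRAN specification:
- The notation is introduced here.
- The transformation g_λ is assumed to be the Box-Cox transformation controlled by the MIXTRAN lambda parameter.
- day2_ij is a hypothetical indicator for the Day 2 recall.

```latex
g_\lambda(Y_{ij}) = \beta_0 + \beta_1\,\mathrm{sel}_i + \beta_2\,\mathrm{weekend}_{ij}
                  + \beta_3\,\mathrm{day2}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim N(0,\sigma_u^2),\quad \varepsilon_{ij} \sim N(0,\sigma_\varepsilon^2)
```

Here Y_ij is the calcium intake of woman i on recall day j. The person-specific effect u_i carries between-person variation and ε_ij carries within-person variation (challenge A). The transformation addresses positive skewness (challenge B). In this sketch only β_1 differs between the two age groups. The other coefficients and both variance components are estimated from all women together, which is why the standard errors can be smaller than those from separate, fully stratified fits.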

The macros to fit the NCI method may be downloaded from the NCI website.


https://www.cdc.gov/nchs/tutorials/Dietary/Advanced/EstimateDistributions/Info2.htm 1/1


Task 2: How to Estimate Distributions of Usual Intake for a Single
Ubiquitously-consumed Dietary Constituent with a Few Days of 24-hour
Recalls for Subpopulations using a Covariate

The following example shows how the distribution of calcium from foods and beverages can be estimated for women by
age group (at or younger vs. older than age 50 years).

This example uses the demoadv dataset (download at Sample Code and Datasets). The variables w0304_0 to
w0304_16 are the weights (dietary weights and Balanced Repeated Replication [BRR] weights) used in the analysis of
2003-2004 dietary data that require the use of BRR to calculate standard errors. The model is run 17 times, including 16
runs using BRR (see Module 18, Task 4 for more information). BRR uses weights w0304_1 to w0304_16.

IMPORTANT NOTE

Note: If 4 years of NHANES data are used, 32 BRR runs are required. Additional weights are found in the demoadv
dataset.

The effect of recall sequence (Day 1 or Day 2 24-hour recall) is removed from the estimated distribution of nutrient intake. An adjustment is also made for the day of the week on which the 24-hour recall was collected, dichotomized as weekend (Friday-Sunday) or weekday (Monday-Thursday). (See Module 18, Task 3 for more information on covariate adjustment.)

A SAS macro is a useful technique for rerunning a block of code when you want to change only a few variables; the macro
BRR202 is created and called in this example. The BRR202 macro calls the MIXTRAN macro and the DISTRIB macro,
and calculates BRR standard errors of the parameter estimates. The MIXTRAN macro obtains preliminary estimates for
the values of the parameters in the model, and then fits the model using PROC NLMIXED. It also produces summary
reports of the model fit.

Modeling the complex survey structure of NHANES requires procedures that account for both differential weighting of
individuals and the correlation among sample persons within a cluster. The SAS procedure NLMIXED can account for
differential weighting by using the replicate statement. The use of BRR to calculate standard errors accounts for the
correlation among sample persons in a cluster. Therefore, NLMIXED (or any SAS procedure that incorporates differential
weighting) may be used with BRR to produce standard errors that are suitable for NHANES data without using specialized
survey procedures. The DISTRIB macro estimates the distribution of usual intake, producing estimates of percentiles and
the percent of the population below a cutpoint.

IMPORTANT NOTE

Note that the DISTRIB macro currently requires that at least 2 cutpoints be requested in order to calculate the percent of
the population below a cutpoint.

The MIXTRAN and DISTRIB macros used in this example were downloaded from the NCI website. Version 1.1 of the
macros was used. Check this website for macro updates before starting any analysis. Additional details regarding the
macros and additional examples also may be found on the website.

Step 1: Create a dataset so that each row corresponds to a single person day and define
variables if necessary

Statements Explanation

https://www.cdc.gov/nchs/tutorials/Dietary/Advanced/EstimateDistributions/Task2.htm 1/10


    data demoadv;
      format sel sel.;
      set nh.demoadv;
      if w0304_0 ne .;  /* keeps only those with dietary data */
      if ridageyr ge 51 then sel=1;
      else sel=2;
    run;

First, select only those people with dietary data by keeping those without a missing BRR weight. The variable sel identifies the subgroups of interest:
- sel=1: those older than 50 years (51 years and older).
- sel=2: those 50 years old or younger.

    data day1;
      set demoadv;
      if riagendr=2 and ridageyr>=19;
      DRTCALC=DR1TCALC;
      day=1;
    run;

    data day2;
      set demoadv;
      if riagendr=2 and ridageyr>=19;
      DRTCALC=DR2TCALC;
      day=2;
    run;

The variables DR1TCALC and DR2TCALC are NHANES variables representing total calcium consumed on days 1 and 2, respectively, from all foods and beverages (other than water).

To create a dataset with 2 records per person, the demoadv dataset is read twice to create 2 datasets, one where day=1 and one where day=2. The same variable name, DRTCALC, is used for calcium on both days. It is set equal to DR1TCALC for day 1 and to DR2TCALC for day 2. Adult women ages 19 years and older are selected for analysis.

    data calcium;
      set day1 day2;
      if DAY_WK in (1, 6, 7) then weekend=1;  /** should be named 'weekend' **/
      else if DAY_WK in (2, 3, 4, 5) then weekend=0;
    run;

Finally, these datasets are appended, and a day-of-week dummy variable, weekend, is created. Dummy variables must be created to use the NLMIXED procedure because it has no CLASS statement.

Step 2: Sort the dataset by respondent and day

It is important to sort the dataset by respondent and intake day (day 1 and 2) because the NLMIXED procedure uses this
information to estimate the model parameters.
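No code is shown for this step. A minimal sort could look like the following. It assumes the calcium dataset from Step 1, with seqn identifying the respondent (as in the macro call's subject=seqn) and day identifying the recall day.

```sas
/* Sort person-day records by respondent, then by recall day (1, 2) */
proc sort data=calcium;
  by seqn day;
run;
```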

Step 3: Create the BRR202 macro


The BRR202 macro calls the MIXTRAN macro and the DISTRIB macro and computes standard errors of parameter estimates. Once the macro has been defined (by submitting this code once), it may be called several times, each time changing the macro variables.

Statements Explanation

    %include 'C:\NHANES\Macros\mixtran_macro_v1.1.sas';
    %include 'C:\NHANES\Macros\distrib_macro_v1.1.sas';

This code reads the MIXTRAN and DISTRIB macros into SAS so that these macros may be called.

    %macro BRR202(data, response, foodtype, subject, repeat, covars_prob, covars_amt,
                  outlib, pred, param, modeltype, lambda, seq, weekend, vargroup,
                  numvargroups, subgroup, start_val1, start_val2, start_val3,
                  vcontrol, nloptions, titles, printlevel, cutpts, ncutpts,
                  nsim_mc, byvar, final);

This statement starts the definition of the BRR202 macro. All of the terms inside the parentheses are the macro variables that are used in the macro.

    %MIXTRAN(data=&data, response=&response, foodtype=&foodtype, subject=&subject,
             repeat=&repeat, covars_prob=&covars_prob, covars_amt=&covars_amt,
             outlib=&outlib, modeltype=&modeltype, lambda=&lambda,
             replicate_var=w0304_0, seq=&seq, weekend=&weekend,
             vargroup=&vargroup, numvargroups=&numvargroups, subgroup=&subgroup,
             start_val1=&start_val1, start_val2=&start_val2, start_val3=&start_val3,
             vcontrol=&vcontrol, nloptions=&nloptions, titles=&titles,
             printlevel=&printlevel)

Within the BRR202 macro, the MIXTRAN macro is called. All of the variables preceded by & are defined by the BRR202 macro call. The only argument without an & is the replicate_var macro variable; it is set to w0304_0 for the first run.

    %DISTRIB(seed=0, nsim_mc=&nsim_mc, modeltype=&modeltype, pred=&pred, param=&param,
             outlib=&outlib, cutpoints=&cutpts, ncutpnt=&ncutpts, byvar=&byvar,
             subgroup=&subgroup, subject=&subject, titles=&titles, food=&foodtype);

Within the BRR202 macro, the DISTRIB macro is called. All of the variables preceded by & are defined by the BRR202 macro call. The seed for generating the distribution is set to 0, which uses the system clock to start the random number sequence. The datasets named by the macro variables pred and param (_pred_unc_&foodtype and _param_unc_&foodtype) are created in the MIXTRAN run.


    data dist;
      set &outlib..descript_&foodtype._w0304_0;
      mergeby=1;
      keep &subgroup mergeby numsubjects mean_mc_t tpercentile1-tpercentile99
           cutprob1-cutprob&&ncutpts.;
    run;

The dataset descript_&foodtype._w0304_0 is created by the DISTRIB macro. This data step keeps the parameters of interest from that dataset and defines a variable mergeby that will be used later for merging.

    %do run=1 %to 16;
      options nonotes;
      %put ~~~~~~~~~~~~~~~~~~~ Run &run ~~~~~~~~~~~~~~~~~~~~;

This code starts a loop over the 16 BRR runs. Notes are turned off to save room in the log, and the run number is printed to the log.

      %MIXTRAN(data=&data, response=&response, foodtype=&foodtype, subject=&subject,
               repeat=&repeat, covars_prob=&covars_prob, covars_amt=&covars_amt,
               outlib=&outlib, modeltype=&modeltype, lambda=&lambda,
               replicate_var=w0304_&run, seq=&seq, weekend=&weekend,
               vargroup=&vargroup, numvargroups=&numvargroups, subgroup=&subgroup,
               start_val1=&start_val1, start_val2=&start_val2, start_val3=&start_val3,
               vcontrol=&vcontrol, nloptions=&nloptions, titles=&titles,
               printlevel=&printlevel)

Within the loop, the MIXTRAN macro is called again. All of the variables preceded by & are defined by the BRR202 macro call. The only argument without an & is the replicate_var macro variable; it is set to w0304_&run, where &run takes the values 1 to 16.

      %DISTRIB(seed=0, nsim_mc=&nsim_mc, modeltype=&modeltype, pred=&pred, param=&param,
               outlib=&outlib, cutpoints=&cutpts, ncutpnt=&ncutpts, byvar=&byvar,
               subgroup=&subgroup, subject=&subject, titles=&titles, food=&foodtype);

As in the first run, the DISTRIB macro is called with arguments defined by the BRR202 macro call. The seed is set to 0, so the system clock starts the random number sequence. The pred and param datasets (_pred_unc_&foodtype and _param_unc_&foodtype) are those created by the preceding MIXTRAN run.


    data distbrr;
      set &outlib..descript_&foodtype._w0304_&run;
      rename numsubjects=bnumsubjects mean_mc_t=bmean_mc_t
             tpercentile1-tpercentile99=btpercentile1-btpercentile99
             cutprob1-cutprob&&ncutpts.=bcutprob1-bcutprob&&ncutpts.;
      run=&run;
      mergeby=1;
    run;

    data distbrr;
      set distbrr;
      keep &subgroup bnumsubjects bmean_mc_t btpercentile1-btpercentile99
           bcutprob1-bcutprob&&ncutpts. mergeby;
    run;

The dataset descript_&foodtype._w0304_&run is created by the DISTRIB macro. The first data step reads it, renames the estimates of interest with a b prefix (for BRR), and defines a variable mergeby that will be used later for merging. The second data step keeps only those variables.

    proc append base=brr_runs data=distbrr;
    run;
    proc datasets nolist;
      delete distbrr;
    run;

    %end;

The BRR results are appended to a dataset called brr_runs. After the information has been appended to brr_runs, distbrr can be deleted. The %end statement ends the BRR runs.

    proc sort data=dist;
      by &subgroup mergeby;
    run;
    proc sort data=brr_runs;
      by &subgroup mergeby;
    run;

The data are sorted before merging.


    data distall;
      merge dist brr_runs;
      by &subgroup mergeby;
      array bvar (*) bmean_mc_t btpercentile1-btpercentile99 bcutprob1-bcutprob&&ncutpts.;
      array varo (*) mean_mc_t tpercentile1-tpercentile99 cutprob1-cutprob&&ncutpts.;
      array dsqr (*) dbmean_mc_t dbtpercentile1-dbtpercentile99 dbcutprob1-dbcutprob&&ncutpts.;
      /* squared difference between each BRR estimate and the full-sample estimate */
      do i=1 to dim(bvar);
        dsqr[i]=(bvar[i]-varo[i])**2;
      end;
    run;

The datasets dist and brr_runs are merged, and the squared difference between each BRR estimate and the corresponding estimate from the first (full-sample) run is computed.

    proc means data=distall sum;
      by &subgroup mergeby;
      var dbmean_mc_t dbtpercentile1-dbtpercentile99 dbcutprob1-dbcutprob&&ncutpts.;
      output out=sums sum=sum_dbmean_mc_t sum_dbtpercentile1-sum_dbtpercentile99
             sum_dbcutprob1-sum_dbcutprob&&ncutpts.;
    run;

The sum of the squared differences is computed.


    data brr;
      set sums;
      array sumt (*) sum_dbmean_mc_t sum_dbtpercentile1-sum_dbtpercentile99
                     sum_dbcutprob1-sum_dbcutprob&&ncutpts.;
      array se (*) se_mean_mc_t se_tpercentile1-se_tpercentile99
                   se_cutprob1-se_cutprob&&ncutpts.;
      do j=1 to dim(sumt);
        se[j]=-1*sqrt(sumt[j]/(16*.49));
      end;
      mergeby=1;
      keep se_mean_mc_t se_tpercentile1-se_tpercentile99
           se_cutprob1-se_cutprob&&ncutpts. &subgroup mergeby;
    run;

The standard errors are computed. Each SE is multiplied by -1 so that it prints in parentheses in the final step.

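    data toprint1;
      set dist;
      line=1;  * These are the point estimates;
      keep &subgroup numsubjects mean_mc_t tpercentile1-tpercentile99
           cutprob1-cutprob&&ncutpts. line;
    run;

To create the final dataset, the point estimates are saved in a dataset called toprint1. The variable line identifies them as estimates.

    data toprint2;
      /* give the SEs the same names as the point estimates so the two datasets stack */
      set brr(rename=(se_mean_mc_t=mean_mc_t
                      se_tpercentile1-se_tpercentile99=tpercentile1-tpercentile99
                      se_cutprob1-se_cutprob&&ncutpts.=cutprob1-cutprob&&ncutpts.));
      line=2;
      keep &subgroup mean_mc_t tpercentile1-tpercentile99
           cutprob1-cutprob&&ncutpts. line;
    run;

The standard errors are saved in a dataset called toprint2, and the variable line identifies them as standard errors. Because the brr data step names them se_mean_mc_t, se_tpercentile1-se_tpercentile99, and so on, they are renamed to match the point-estimate variables.

    data &final;
      set toprint1 toprint2;
    run;

    proc sort data=&final;
      by &subgroup line;
    run;

The final dataset is created by appending toprint1 and toprint2, and it is then sorted.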


    proc print data=&final split=' ' noobs;
      var &subgroup line tpercentile5 tpercentile10 tpercentile25 tpercentile50
          tpercentile75 tpercentile90 tpercentile95;
      format line line. mean_mc_t tpercentile1-tpercentile99 negparen10.1
             cutprob1-cutprob&&ncutpts. negparen6.2;
      title 'Usual Intake of Calcium';
      title2 'NHANES 2003-04';
    run;
    %mend BRR202;

The final dataset is printed. The NEGPAREN format makes the standard errors (stored as negative values) print in parentheses. The %mend statement marks the end of the BRR202 macro.

Step 4: Run the BRR202 macro to obtain parameter estimates for the covariates of interest from
the model used in the NCI Method

Use the BRR202 macro to obtain parameter estimates. It is possible to call the BRR202 macro several times, varying the
values of the parameters each time. For example, the variables of interest could be changed. This merely requires calling
the macro again (using a call similar to that below), not redefining the macro each time.

IMPORTANT NOTE
Apart from its name, this macro is identical to the BRR201 macro used in Task 1 of this module. Only the call is different.

Statements Explanation

    %BRR202(data=calcium, response=DRTCALC, foodtype=Calcium, subject=seqn,
            repeat=day, seq=day, covars_amt=sel, subgroup=sel, weekend=weekend,
            outlib=work, pred=work._pred_unc_Calcium, param=work._param_unc_Calcium,
            modeltype=amount, titles=1, printlevel=2, cutpts=500 1000 1500,
            ncutpts=3, nsim_mc=100, final=nh.m20task2)

This code calls the BRR202 macro, using the dataset calcium defined in Step 1. The arguments are as follows:
- response: DRTCALC is the variable whose distribution we want to model.
- foodtype: labels the pred and param datasets.
- subject: seqn identifies the subject.
- repeat: day is the variable that identifies the repeated measurements on each subject.
- covars_amt: must include the variable that defines the subpopulations, sel.
- subgroup: also identifies the subpopulations.

The weekend macro variable includes a weekend effect in the model. The distribution is then calculated by weighting weekdays by 4/7 and weekends by 3/7. The macro variable must be set equal to a variable called weekend in the dataset.

The macro variable outlib specifies the library where the data are to be stored. In this case, the WORK library was used. Note that the macro variables pred and param must specify the outlib library, and they must use foodtype to identify the food or nutrient modeled. This is a ubiquitously-consumed dietary constituent and the amount model was used, so the names of the pred and param datasets from MIXTRAN include the term _unc_.

Because this is a ubiquitously-consumed dietary constituent, modeltype=amount is specified. This fits the amount model.

The macro variable titles reserves 1 line for a title supplied by the user. The printlevel is 2, which prints the output from the NLMIXED runs and the summary.

By specifying cutpoints (cutpts) of 500, 1000, and 1500 mg, the macro will produce an estimate of the proportion of the population below each of these values. Because there are 3 cutpoints, ncutpts is set to 3.

IMPORTANT NOTE

Note that the DISTRIB macro currently requires that at least 2 cutpoints be requested in order to calculate the percent of the population below a cutpoint.

The macro variable nsim_mc specifies the number of pseudo-individuals simulated per respondent when the distribution of usual intake is estimated.

The macro variable final specifies the name of the final dataset produced.


Step 5: Interpret parameter estimates for the variable of interest

Depending on the printlevel selected, the output from each NLMIXED run is printed. The first NLMIXED output (replicate variable w0304_0) lists the parameter estimates for calcium. However, the standard errors it reports are incorrect for NHANES data; the BRR standard errors should be used instead. The other NLMIXED runs are from the BRR replications. Percentile estimates are also printed for the base run and the BRR runs. Selected percentiles and their standard errors are printed at the end of the output for each level of the subpopulation variable sel (ages 51 years and older vs. younger than 51 years).

The median usual calcium intake for women ages 19 years and older is 693 mg (SE=25 mg), close to the 692 mg estimated in Task 1. For women ages 51 years and older, the median usual intake is 661 mg (SE=22 mg), and for women younger than age 51 years, the median is 717 mg (SE=31 mg).
Recall that the cutpoints were 500 mg, 1000 mg, and 1500 mg. 85% of women older than age 50 years consume less than 1000 mg of calcium per day.
If the subgroups are run separately (by stratification) rather than using the subgroup option, your results will differ. This is because the subgroup option estimates the model parameters from the full group, with the subgroup categories included in the model.

IMPORTANT NOTE

Note: Your results may vary slightly, as a random seed is used to estimate the distribution of usual intake. However, they
would not be expected to vary by more than 1%.


12/19/2018 NHANES Dietary Web Tutorial: Estimating Population-Level Distributions of Usual Dietary Intake: Task 3


Task 3: Key Concepts about Estimating Distributions of Usual Intake for a
Single Episodically-consumed Dietary Constituent

Because the dietary assessment queries intake only on a single day or a few days, measures of usual intake from 24-hour recalls are prone to measurement error. A simple average of 2 days does not adequately represent usual intake, particularly for episodically-consumed foods and nutrients, so more sophisticated methods based on statistical modeling are necessary. All of the statistical methods that have been developed to estimate usual intake from 24-hour recalls assume that the 24-hour recall is prone to random, not systematic, error (see Module 18, Task 2 for more details on these methods).

Two statistical methods have been developed to model the distribution of usual intake of episodically-consumed foods,
adjusting for measurement error. These are the method developed at Iowa State University for foods (ISUF method), and
the method developed at the National Cancer Institute (NCI method). Both methods operate under the premise that usual
intake is equal to the probability of consumption on a given day times the average amount consumed on a consumption
day.
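Under this shared premise, usual intake is a simple product. The minimal Python sketch below uses hypothetical values (a person who consumes milk on half of all days and drinks 2 cups on those days). Both methods actually estimate the two quantities from a statistical model rather than from raw averages, so this only illustrates the arithmetic.

```python
def usual_intake(prob_consumption: float, mean_amount_on_consumption_day: float) -> float:
    """Usual intake = P(consumption on a given day) x mean amount on a consumption day."""
    if not 0.0 <= prob_consumption <= 1.0:
        raise ValueError("probability of consumption must lie in [0, 1]")
    if mean_amount_on_consumption_day < 0.0:
        raise ValueError("consumption-day amount cannot be negative")
    return prob_consumption * mean_amount_on_consumption_day


# Hypothetical person: milk on 50% of days, 2 cups on a consumption day.
print(usual_intake(0.5, 2.0))  # prints 1.0 (cups per day)
```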

To estimate the usual intake of episodically-consumed dietary constituents, a method must meet several challenges.
These include the challenges for ubiquitously consumed dietary constituents plus additional challenges. It must:

A. Distinguish within-person from between-person variation,
B. Account for consumption-day amounts that are positively skewed, and
C. Account for reported days without consumption of the dietary constituent.

Challenge C is met by estimating the probability of consumption as well as the amount consumed on a consumption day.
For most episodically-consumed foods, there is a positive correlation between the probability and amount. Therefore, a
method must meet the following additional challenge:

D. Allow for the correlation between the probability of consuming a dietary constituent and the consumption-
day amount.

The ISUF method meets challenges A-C, but cannot be used for foods for which probability of consumption is correlated
with amount, and it does not incorporate covariates.

This task describes the use of the NCI method to model the distribution of usual intake of episodically-consumed dietary
constituents. It involves a two-part model with correlated person-specific effects. The first part of the model estimates the
probability of consuming an episodically-consumed dietary constituent using logistic regression with a person-specific
random effect. The second part of the model specifies the consumption-day amount using linear regression on a
transformed scale, also with a person-specific random effect. Parts I and II are then linked by allowing the two person-
specific effects to be correlated and by including common covariates in both parts of the model.
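To make the structure of the two-part model concrete, the sketch below simulates the two correlated person-specific effects and the usual intakes they imply. It is an illustration under stated assumptions, not the NCI MIXTRAN code: it uses a log transform (a Box-Cox parameter of 0) for the amount part, no covariates, and hypothetical parameter values. With a log scale and normal within-person error, a person's mean consumption-day amount is exp(alpha0 + u2 + sigma^2/2), and usual intake is that mean times the person's probability of consumption.

```python
import math
import random


def simulate_usual_intakes(n, beta0, alpha0, sd_u1, sd_u2, rho, sd_within, seed=1):
    """Simulate (u1, u2, usual_intake) for n hypothetical people.

    Part I:  logit P(consume) = beta0 + u1
    Part II: log(amount)      = alpha0 + u2 + e,  e ~ N(0, sd_within^2)
    (u1, u2) are bivariate normal with correlation rho.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        u1 = sd_u1 * z1
        # Construct u2 so that corr(u1, u2) = rho.
        u2 = sd_u2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        prob = 1.0 / (1.0 + math.exp(-(beta0 + u1)))
        # Mean of a log-normal consumption-day amount given u2.
        mean_amount = math.exp(alpha0 + u2 + sd_within ** 2 / 2.0)
        people.append((u1, u2, prob * mean_amount))
    return people


sim = simulate_usual_intakes(n=5000, beta0=0.0, alpha0=0.0, sd_u1=1.0,
                             sd_u2=0.5, rho=0.5, sd_within=0.6)
```

Setting rho to 0 removes the link between the two parts, which is the situation the ISUF method is limited to; a positive rho lets people who consume a food more often also consume more of it on consumption days (challenge D).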

The macros to fit the NCI method may be downloaded from the NCI website.


https://www.cdc.gov/nchs/tutorials/Dietary/Advanced/EstimateDistributions/Info3.htm 1/1


Task 3: How to Estimate Distributions of Usual Intake for a Single
Episodically-consumed Dietary Constituent using the NCI Method

In this example, the distribution of milk intake is estimated for women.

This example uses the demoadv dataset (download at Sample Code and Datasets). The variables w0304_0 to
w0304_16 are the weights (dietary weight and Balanced Repeated Replication [BRR] weights) used in the analysis of
2003-2004 dietary data, which requires BRR to calculate standard errors. The model is run 17 times: 1 run with the
dietary weight w0304_0 and 16 BRR runs with weights w0304_1 to w0304_16 (see Module 18, Task 4 for more information).

IMPORTANT NOTE

Note: If 4 years of NHANES data are used, 32 BRR runs are required. Additional weights are found in the demoadv
dataset.


The effect of the sequence of the 24-hour recall (Day 1 or Day 2 recall) is removed from the estimated intake
distribution. An adjustment is also made for the day of the week on which the 24-hour recall was collected,
dichotomized as weekend (Friday-Sunday) or weekday (Monday-Thursday). (See Module 18, Task 3 for more information on
covariate adjustment.)
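The weekend coding used in Step 1 and the 4/7 and 3/7 weighting that the weekend macro variable applies (described with the BRR203 call in Step 4) can be sketched as follows. This is a hypothetical Python illustration, not the DISTRIB macro's code, and it assumes the NHANES day-of-week codes run from 1 = Sunday to 7 = Saturday.

```python
WEEKEND_CODES = {1, 6, 7}     # Sunday, Friday, Saturday (assumed NHANES coding)
WEEKDAY_CODES = {2, 3, 4, 5}  # Monday through Thursday


def weekend_indicator(day_wk: int) -> int:
    """Return 1 for a Friday-Sunday recall and 0 for a Monday-Thursday recall."""
    if day_wk in WEEKEND_CODES:
        return 1
    if day_wk in WEEKDAY_CODES:
        return 0
    raise ValueError(f"unexpected day-of-week code: {day_wk}")


def day_of_week_weighted(weekday_value: float, weekend_value: float) -> float:
    """Combine weekday and weekend predictions: 4 weekdays and 3 weekend days per week."""
    return (4.0 * weekday_value + 3.0 * weekend_value) / 7.0
```

Grouping Friday with Saturday and Sunday is what makes the 3/7 weekend share line up with a 7-day week.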

A SAS macro is a useful technique for rerunning a block of code when you want to change only a few variables; the macro
BRR203 is created and called in this example. The BRR203 macro calls the MIXTRAN macro and the DISTRIB macro,
and calculates BRR standard errors of the parameter estimates. The MIXTRAN macro obtains preliminary estimates for
the values of the parameters in the model, and then fits the model using PROC NLMIXED. It also produces summary
reports of the model fit.

Modeling the complex survey structure of NHANES requires procedures that account for both differential weighting of
individuals and the correlation among sample persons within a cluster. The SAS procedure NLMIXED can account for
differential weighting by using the replicate statement. The use of BRR to calculate standard errors accounts for the
correlation among sample persons in a cluster. Therefore, NLMIXED (or any SAS procedure that incorporates differential
weighting) may be used with BRR to produce standard errors that are suitable for NHANES data without using specialized
survey procedures. The DISTRIB macro estimates the distribution of usual intake, producing estimates of percentiles and
the percent of the population below a cutpoint.
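The two summaries DISTRIB reports can be illustrated with a small weighted calculation. The sketch below assumes a simple definition of a weighted percentile (the smallest value whose cumulative weight share reaches the requested percentage); the DISTRIB macro may define or interpolate percentiles differently, so treat this as an illustration of the idea, not a reimplementation. The values stand in for simulated usual intakes and the weights for their sampling weights; both are hypothetical.

```python
def weighted_percentile(values, weights, pct):
    """Smallest value whose cumulative share of the total weight reaches pct percent."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and the same length")
    pairs = sorted(zip(values, weights))
    target = pct / 100.0 * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]  # guards against floating-point shortfall at pct=100


def percent_below(values, weights, cutpoint):
    """Weighted percent of values strictly below the cutpoint."""
    total = sum(weights)
    below = sum(w for v, w in zip(values, weights) if v < cutpoint)
    return 100.0 * below / total


# Hypothetical simulated usual intakes (mg/day) and their weights.
intakes = [450.0, 620.0, 700.0, 910.0, 1200.0]
weights = [1.0, 2.0, 1.0, 1.0, 1.0]
print(weighted_percentile(intakes, weights, 50))
print(percent_below(intakes, weights, 1000.0))
```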

IMPORTANT NOTE
Note that the DISTRIB macro currently requires that at least 2 cutpoints be requested in order to calculate the percent of
the population below a cutpoint.

The MIXTRAN and DISTRIB macros used in this example were downloaded from the NCI website. Version 1.1 of the
macros was used. Check this website for macro updates before starting any analysis. Additional details regarding the
macros and additional examples also may be found on the website.

Step 1: Create a dataset so that each row corresponds to a single person-day, and define
variables if necessary

Statements Explanation

https://www.cdc.gov/nchs/tutorials/Dietary/Advanced/EstimateDistributions/Task3.htm 1/8


data demoadv;
  set nh.demoadv;
  if w0304_0 ne .;
run;

First, select only those people with dietary data by selecting those without missing BRR weights.

data day1;
  set demoadv;
  if riagendr=2 and ridageyr>=19;
  d_milk=d_milk_d1;
  day=1;
run;

data day2;
  set demoadv;
  if riagendr=2 and ridageyr>=19;
  d_milk=d_milk_d2;
  day=2;
run;

The variables d_milk_d1 and d_milk_d2 are created variables representing total milk consumed on days 1 and 2,
respectively, from all foods and beverages. (Note: the total milk consumption for each day can be computed based on
the MyPyramid Equivalents Database. See Module 4, Task 2 and Module 9, Task 4 of the dietary tutorial for more
information.) To create a dataset with 2 records per person, the demoadv dataset is set 2 times to create 2 datasets,
one where day=1 and one where day=2. The same variable name, d_milk, is used for milk on both days. It is created by
setting it equal to d_milk_d1 for day 1 and d_milk_d2 for day 2. Adult women ages 19 years and older are selected for
analysis.

data milk;
  set day1 day2;
  if DAY_WK in (1, 6, 7) then weekend=1; /** should be named 'weekend' **/
  else if DAY_WK in (2, 3, 4, 5) then weekend=0;
run;

Finally, these datasets are appended, and a weekend dummy variable is created. Dummy variables must be created
because the NLMIXED procedure has no CLASS statement.
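For readers working outside SAS, the reshaping done by these three data steps, plus the sort required in Step 2, can be sketched in Python. The record layout (dictionaries keyed by the NHANES variable names used above) and the sample records are hypothetical, chosen only to illustrate the wide-to-long conversion.

```python
def to_person_days(people):
    """Turn one record per person into one record per person-day.

    Keeps adult women (riagendr == 2, ridageyr >= 19), stacks day 1 and day 2
    milk intakes into a single d_milk column, and sorts by seqn and day.
    """
    rows = []
    for person in people:
        if person["riagendr"] == 2 and person["ridageyr"] >= 19:
            for day, source in ((1, "d_milk_d1"), (2, "d_milk_d2")):
                rows.append({"seqn": person["seqn"], "day": day, "d_milk": person[source]})
    rows.sort(key=lambda row: (row["seqn"], row["day"]))
    return rows


# Hypothetical records: two adult women and one man (the man is dropped).
sample = [
    {"seqn": 102, "riagendr": 2, "ridageyr": 45, "d_milk_d1": 1.0, "d_milk_d2": 0.0},
    {"seqn": 101, "riagendr": 2, "ridageyr": 30, "d_milk_d1": 0.5, "d_milk_d2": 1.5},
    {"seqn": 103, "riagendr": 1, "ridageyr": 50, "d_milk_d1": 2.0, "d_milk_d2": 1.0},
]
for row in to_person_days(sample):
    print(row)
```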

Step 2: Sort the dataset by respondent and day

It is important to sort the dataset by respondent and intake day (day 1 and 2) because the NLMIXED procedure uses this
information to estimate the model parameters.

Step 3: Create the BRR203 macro

The BRR203 macro calls the MIXTRAN and DISTRIB macros and computes standard errors of parameter
estimates. Once the macro has been defined (by submitting its definition once), it may be called several times, each
time changing the macro variables.

Statements Explanation


%include 'C:\NHANES\Macros\mixtran_macro_v1.1.sas';
%include 'C:\NHANES\Macros\distrib_macro_v1.1.sas';

This code reads the MIXTRAN and DISTRIB macros into SAS so that these macros may be called.

%macro BRR203(data, response, foodtype, subject, repeat, covars_prob, covars_amt,
              outlib, pred, param, modeltype, lambda, seq, weekend, vargroup,
              numvargroups, subgroup, start_val1, start_val2, start_val3, vcontrol,
              nloptions, titles, printlevel, cutpts, ncutpts, nsim_mc, byvar, final);

The start of the BRR203 macro is defined. All of the terms inside the parentheses are the macro variables that are
used in the macro.

%MIXTRAN(data=&data, response=&response, foodtype=&foodtype, subject=&subject,
         repeat=&repeat, covars_prob=&covars_prob, covars_amt=&covars_amt,
         outlib=&outlib, modeltype=&modeltype, lambda=&lambda, replicate_var=w0304_0,
         seq=&seq, weekend=&weekend, vargroup=&vargroup, numvargroups=&numvargroups,
         subgroup=&subgroup, start_val1=&start_val1, start_val2=&start_val2,
         start_val3=&start_val3, vcontrol=&vcontrol, nloptions=&nloptions,
         titles=&titles, printlevel=&printlevel)

Within the BRR203 macro, the MIXTRAN macro is called. All of the variables preceded by & will be defined by the
BRR203 macro call. The only variable without an & is the replicate_var macro variable; it is set to w0304_0 for the
first run.

%DISTRIB(seed=0, nsim_mc=&nsim_mc, modeltype=&modeltype, pred=&pred, param=&param,
         outlib=&outlib, cutpoints=&cutpts, ncutpnt=&ncutpts, byvar=&byvar,
         subgroup=&subgroup, subject=&subject, titles=&titles, food=&foodtype);

Within the BRR203 macro, the DISTRIB macro is called. All of the variables preceded by & will be defined by the
BRR203 macro call. The seed for generating the distribution is set to 0, which uses the clock to start the
random-number sequence. The datasets defined by the macro variables pred and param (_pred_unc_&foodtype and
_param_unc_&foodtype) are created in the MIXTRAN run.

data dist;
  set &outlib..descript_&foodtype._w0304_0;
  mergeby=1;
  keep &subgroup mergeby numsubjects mean_mc_t tpercentile1-tpercentile99
       cutprob1-cutprob&&ncutpts.;
run;

The dataset descript_&foodtype._w0304_0 is defined in the DISTRIB macro. This data step keeps the parameters of
interest from that dataset and defines a variable mergeby that will be used later.


%do run=1 %to 16;

This code starts a loop to run the 16 BRR runs.

  options nonotes;

Notes are turned off to save room in the log.

  %put ~~~~~~~~~~~~~~~~~~~ Run &run ~~~~~~~~~~~~~~~~~~~~;

The run number is printed to the log.

  %MIXTRAN(data=&data, response=&response, foodtype=&foodtype, subject=&subject,
           repeat=&repeat, covars_prob=&covars_prob, covars_amt=&covars_amt, outlib=&outlib,
           modeltype=&modeltype, lambda=&lambda, replicate_var=w0304_&run, seq=&seq,
           weekend=&weekend, vargroup=&vargroup, numvargroups=&numvargroups,
           subgroup=&subgroup, start_val1=&start_val1, start_val2=&start_val2,
           start_val3=&start_val3, vcontrol=&vcontrol, nloptions=&nloptions,
           titles=&titles, printlevel=&printlevel)

Within the BRR203 macro, the MIXTRAN macro is called. All of the variables preceded by & will be defined by the
BRR203 macro call. The only variable without an & is the replicate_var macro variable; it is set to w0304_&run, where
&run equals 1 to 16.

  %DISTRIB(seed=0, nsim_mc=&nsim_mc, modeltype=&modeltype, pred=&pred, param=&param,
           outlib=&outlib, cutpoints=&cutpts, ncutpnt=&ncutpts, byvar=&byvar,
           subgroup=&subgroup, subject=&subject, titles=&titles, food=&foodtype);

Within the BRR203 macro, the DISTRIB macro is called. All of the variables preceded by & will be defined by the
BRR203 macro call. The seed for generating the distribution is set to 0, which uses the clock to start the
random-number sequence. The datasets defined by the macro variables pred and param (_pred_unc_&foodtype and
_param_unc_&foodtype) are created in the MIXTRAN run.

  data distbrr;
    set &outlib..descript_&foodtype._w0304_&run;
    rename numsubjects=bnumsubjects mean_mc_t=bmean_mc_t
           tpercentile1-tpercentile99=btpercentile1-btpercentile99
           cutprob1-cutprob&&ncutpts.=bcutprob1-bcutprob&&ncutpts.;
    run=&run;
    mergeby=1;
  run;

  data distbrr;
    set distbrr;
    keep &subgroup bnumsubjects bmean_mc_t btpercentile1-btpercentile99
         bcutprob1-bcutprob&&ncutpts. mergeby;
  run;

The dataset descript_&foodtype._w0304_&run is defined in the DISTRIB macro. These data steps keep the parameters of
interest from that dataset and rename the variables. They also define a variable mergeby that will be used later.

  proc append base=brr_runs data=distbrr;
  run;

The BRR datasets are appended into a dataset called brr_runs.

  proc datasets nolist;
    delete distbrr;
  run;

After appending the information to brr_runs, distbrr can be deleted.

%end;

The BRR runs end.


proc sort data=dist; by &subgroup mergeby; run;
proc sort data=brr_runs; by &subgroup mergeby; run;

The data are sorted before merging.

data distall;
  merge dist brr_runs;
  by &subgroup mergeby;
  array bvar (*) bmean_mc_t btpercentile1-btpercentile99 bcutprob1-bcutprob&&ncutpts.;
  array varo (*) mean_mc_t tpercentile1-tpercentile99 cutprob1-cutprob&&ncutpts.;
  array dsqr (*) dbmean_mc_t dbtpercentile1-dbtpercentile99 dbcutprob1-dbcutprob&&ncutpts.;
  do i=1 to dim(bvar);
    dsqr[i]=(bvar[i]-varo[i])**2;
  end;
run;

The datasets dist and brr_runs are merged, and the squared difference between each BRR estimate and the
corresponding estimate from the first (full-sample) run is computed.

proc means data=distall sum;
  by &subgroup mergeby;
  var dbmean_mc_t dbtpercentile1-dbtpercentile99 dbcutprob1-dbcutprob&&ncutpts.;
  output out=sums sum=sum_dbmean_mc_t sum_dbtpercentile1-sum_dbtpercentile99
         sum_dbcutprob1-sum_dbcutprob&&ncutpts.;
run;

The sum of squares is computed.

data brr;
  set sums;
  array sumt (*) sum_dbmean_mc_t sum_dbtpercentile1-sum_dbtpercentile99
                 sum_dbcutprob1-sum_dbcutprob&&ncutpts.;
  array se (*) se_mean_mc_t se_tpercentile1-se_tpercentile99
               se_cutprob1-se_cutprob&&ncutpts.;
  do j=1 to dim(sumt);
    se[j]=-1*sqrt((sumt[j])/(16*.49));
  end;
  mergeby=1;
  keep se_mean_mc_t se_tpercentile1-se_tpercentile99 se_cutprob1-se_cutprob&&ncutpts.
       &subgroup mergeby;
run;

The standard errors are computed. Each SE is multiplied by -1 so that it prints in parentheses in the final step.

data toprint1;
  set dist;
  line=1; * These are the point estimates;
  keep &subgroup numsubjects mean_mc_t tpercentile1-tpercentile99 cutprob1-cutprob&&ncutpts. line;
run;

To create the final dataset, the point estimates are saved in a dataset called toprint1. The variable line will
identify them as estimates.


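data toprint2;
  set brr(rename=(se_mean_mc_t=mean_mc_t se_tpercentile1-se_tpercentile99=tpercentile1-tpercentile99
                  se_cutprob1-se_cutprob&&ncutpts.=cutprob1-cutprob&&ncutpts.));
  line=2; * These are the standard errors;
  keep &subgroup mean_mc_t tpercentile1-tpercentile99 cutprob1-cutprob&&ncutpts. line;
run;

The standard errors are saved in a dataset called toprint2, renamed to match the point-estimate variables so that
both print in the same columns. The variable line will identify them as standard errors.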

data &final;
  set toprint1 toprint2;
run;

The final dataset is created by appending toprint1 and toprint2.

proc sort data=&final;
  by &subgroup line;
run;

The final dataset is sorted.

proc print data=&final split=' ' noobs;
  var &subgroup line tpercentile5 tpercentile10 tpercentile25 tpercentile50
      tpercentile75 tpercentile90 tpercentile95;
  format line line. mean_mc_t tpercentile1-tpercentile99 negparen10.1
         cutprob1-cutprob&&ncutpts. negparen6.2;
  title 'Usual Intake of Calcium';
  title2 'NHANES 2003-04';
run;

The final dataset is printed. The format negparen will make the standard errors print in parentheses.

%mend BRR203;

The end of the BRR203 macro is indicated.
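The standard-error step inside BRR203 (the data brr step) divides the sum of squared deviations by 16 × 0.49. That divisor is consistent with Fay's version of BRR with 16 replicates and a perturbation factor of 0.3, since (1 - 0.3)^2 = 0.49. A hedged Python sketch of the same formula follows; the factor 0.3 is inferred from the 0.49 in the macro, and the multiplication by -1 in the macro is only a printing device, so it is omitted here.

```python
import math


def brr_standard_error(full_sample_estimate, replicate_estimates, fay_factor=0.3):
    """Fay's BRR standard error: sqrt(sum((theta_r - theta_0)^2) / (R * (1 - k)^2))."""
    n_reps = len(replicate_estimates)
    if n_reps == 0:
        raise ValueError("need at least one replicate estimate")
    sum_sq = sum((est - full_sample_estimate) ** 2 for est in replicate_estimates)
    return math.sqrt(sum_sq / (n_reps * (1.0 - fay_factor) ** 2))


# Hypothetical median from the base run (w0304_0) and 16 replicate medians.
print(brr_standard_error(0.50, [0.47, 0.53] * 8))  # approximately 0.0429
```

With 32 replicates (4 years of data), the same function applies unchanged because the divisor uses the number of replicate estimates passed in.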

Step 4: Run the BRR203 macro to obtain parameter estimates for the covariates of interest from
the model used in the NCI Method

To obtain parameter estimates, the BRR203 macro may be used. It is possible to call the BRR203 macro several times,
varying the values of the parameters each time. For example, the variables of interest could be changed. This merely
requires calling the macro again (using a call similar to that below), not redefining the macro each time.

IMPORTANT NOTE
This is the same macro used in Task 1 and Task 2 of this module. Only the call is different.

Statements Explanation

%BRR203(data=milk,
        response=d_milk,
        foodtype=milk,
        subject=seqn,
        repeat=day,
        seq=day,
        covars_prob=,
        covars_amt=,
        subgroup=,
        weekend=weekend,
        outlib=work,
        pred=work._pred_milk,
        param=work._param_milk,
        modeltype=corr,
        titles=1,
        printlevel=2,
        cutpts=0.5 1.0 1.5,
        ncutpts=3,
        nsim_mc=100,
        final=nh.m20task3)

This code calls the BRR203 macro. The dataset milk defined in Step 1 is used, and the response variable whose
distribution we want to model is d_milk. The macro variable foodtype is used to label the pred and param datasets.
The variable seqn identifies the subject, and the macro variable repeat defines the variable that identifies the
repeats on the subject, which is day. No covariates or subpopulations are included in the probability or amount
model, although they could be specified with the covars_prob, covars_amt, and subgroup macro variables, respectively.

The weekend macro variable includes a weekend effect in the model, and the distribution is computed by weighting
weekdays by 4/7 and weekend days by 3/7. It must be set equal to a variable called weekend in the dataset.

The macro variable outlib specifies the library where the data are to be stored. In this case, the working directory,
work, was used. It is important to note that the macro variables pred and param must specify the outlib directory,
and they must use foodtype to identify the food modeled.

Because this is a food model, modeltype=corr is specified. This fits the two-part model with correlated random
effects.

The macro variable titles saves 1 line for a title supplied by the user. The printlevel is 2, which prints the output
from the NLMIXED runs and the summary.

By specifying the cutpoints (cutpts) of 0.5, 1.0, and 1.5 cups, the macro will produce an estimate of the proportion
of the population below these values. Because there are 3 cutpoints, this is specified in the ncutpts macro variable.

IMPORTANT NOTE

Note that the DISTRIB macro currently requires that at least 2 cutpoints be requested in order to calculate the
percent of the population below a cutpoint.

The macro variable nsim_mc specifies the number of pseudo-individuals simulated per respondent when estimating the
distribution.

The variable final specifies the name of the final dataset produced.

Step 5: Interpret parameter estimates for the variable of interest

Depending on the printlevel selected, the output from each NLMIXED run will be printed. The first NLMIXED output
(replicate variable w0304_0) lists the parameter estimates for the milk model (see ‘Results from Fitting Correlated
Model’). However, the standard errors in that listing are incorrect; the standard errors from BRR should be used
instead. The other NLMIXED runs are from the BRR replications. Percentile estimates also are printed for the base run
and the BRR runs.
Selected percentiles and standard errors are printed at the end of the output.

The median of usual milk intake for women ages 19 years and older is 0.5 cup equivalents (SE=0.1 cup).
Recall that the cutpoints were 0.5, 1.0, and 1.5 cups of milk per day. From cutpoint 2, we can see that 77%
(SE=3%) of women ages 19 years and older consume less than 1.0 cup of milk per day.
IMPORTANT NOTE
Note: Your results may vary slightly, as a random seed is used to estimate the distribution of usual intake. However, they
would not be expected to vary by more than 1%.


12/19/2018 NHANES Dietary Web Tutorial: Estimating Population-Level Distributions of Usual Dietary Intake: Task 4


Task 4: Key Concepts about Estimating Population Distributions of Ratios
of Usual Intakes of Two Dietary Constituents that are Ubiquitously
Consumed

With daily intake data, such as that collected on 24-hour recalls, the ratio of dietary components may be constructed in one
of two ways:

1. The ratio of intakes may be constructed for each day and the “usual” such ratio determined, or
2. The “usual” intake for each component may be computed, and the ratio of these usual intakes calculated.

The first way is termed the “usual ratio of intakes”, and the second the “ratio of usual intakes.” (For more information about
ratios, see Module 15 and Freedman et al. 2010a in the Key References page.) The NCI method may be used to estimate
either quantity.
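The two quantities generally differ, which is why the choice matters. The Python sketch below uses two hypothetical recall days for one person and plain averages, only to show the arithmetic difference between averaging daily ratios (the idea behind the "usual ratio of intakes") and taking the ratio of average intakes (the idea behind the "ratio of usual intakes"). It does not correct for measurement error as the NCI method does, and it assumes the percent of energy from saturated fat is formed with the standard 9 kcal per gram of fat.

```python
KCAL_PER_GRAM_FAT = 9.0  # standard energy conversion factor for fat


def pct_energy_from_satfat(sfat_g, kcal):
    """Percent of energy from saturated fat for one (intake, energy) pair."""
    return 100.0 * KCAL_PER_GRAM_FAT * sfat_g / kcal


def mean_of_daily_ratios(sfat_days, kcal_days):
    """Average of the day-by-day ratios."""
    ratios = [pct_energy_from_satfat(s, k) for s, k in zip(sfat_days, kcal_days)]
    return sum(ratios) / len(ratios)


def ratio_of_means(sfat_days, kcal_days):
    """Ratio of the average intakes."""
    mean_sfat = sum(sfat_days) / len(sfat_days)
    mean_kcal = sum(kcal_days) / len(kcal_days)
    return pct_energy_from_satfat(mean_sfat, mean_kcal)


sfat_days = [20.0, 40.0]      # hypothetical grams of saturated fat on two recall days
kcal_days = [2000.0, 2500.0]  # hypothetical energy (kcal) on the same days
print(mean_of_daily_ratios(sfat_days, kcal_days))
print(ratio_of_means(sfat_days, kcal_days))
```

Here the averaged daily ratio is about 11.7% of energy, while the ratio of averages is 12.0%.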

The usual ratio of intakes may be computed by calculating the ratio of the dietary components of interest each day, and
applying the methods for a single ubiquitously consumed dietary component, as described in Module 20, Task 1. The ratio
of usual intakes can be estimated by modeling both ubiquitously consumed nutrients simultaneously in a bivariate model.
Modeling the ratio of usual intakes will be described in the remainder of this Task.

The method for estimating the ratio of usual intakes of two ubiquitously-consumed dietary components is similar to the
approach used for the NCI method for estimating a single ubiquitously-consumed dietary component (see Module 20, Task
1), except that the two dietary components are modeled simultaneously. The model assumes that, after appropriate
Box-Cox transformations, the 24-hour recall reported values for the two dietary components follow a bivariate linear
mixed effects model. The model includes an intercept and covariates for each dietary component, plus two
person-specific random effects, one for each of the two ubiquitously-consumed dietary components. The two
person-specific effects are assumed to have a bivariate normal distribution and to be independent of the within-person
error terms.

In this task, we reproduce an analysis of the ratio of saturated fat to energy, described in Freedman et al. (2010a). This
task demonstrates an alternative method to estimate Box-Cox transformation parameters. In Task 1 of this module, the
Box-Cox parameter was estimated simultaneously with the other model parameters using maximum likelihood estimation
for fitting a nonlinear model; the nonlinearity occurred due to the inclusion of the Box-Cox parameter in the model.

In this task, we choose the Box-Cox transformation that minimizes the mean squared error around a straight line fit to a
weighted QQ plot, using the sampling survey weights of each participant, before fitting the model. After choosing the Box-
Cox parameter and transforming the 24-hour recalls, the other parameters can be estimated using a bivariate linear mixed
effects model. (Because nonlinear mixed models can sometimes be numerically unstable, the “transform to linearity”
approach could potentially produce stable estimates when the maximum likelihood approach fails.)
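The "transform to linearity" selection step can be illustrated with a short search over candidate Box-Cox parameters. This is a minimal sketch of the idea, not the BoxCox_Survey macro: the weighted plotting positions, the standardization of the transformed values (so that misfit is comparable across lambda values), and the grid of lambda values from 0 to 1 in steps of 0.01 are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def boxcox(y, lam):
    """Box-Cox transform; lam == 0 means the natural log."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def qq_misfit(values, weights, lam):
    """Mean squared error around the best line in a weighted normal QQ plot,
    after standardizing the transformed values (equals 1 - weighted r^2)."""
    pairs = sorted(zip(values, weights))  # Box-Cox is increasing, so order is kept
    total = sum(w for _, w in pairs)
    normal = NormalDist()
    xs, zs, ws = [], [], []
    cumulative = 0.0
    for y, w in pairs:
        position = (cumulative + w / 2.0) / total  # weighted plotting position
        cumulative += w
        xs.append(boxcox(y, lam))
        zs.append(normal.inv_cdf(position))
        ws.append(w)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_z = sum(w * z for w, z in zip(ws, zs)) / total
    sxz = sum(w * (x - mean_x) * (z - mean_z) for w, x, z in zip(ws, xs, zs))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    szz = sum(w * (z - mean_z) ** 2 for w, z in zip(ws, zs))
    return 1.0 - (sxz * sxz) / (sxx * szz)


def best_lambda(values, weights, grid=None):
    """Pick the lambda on a grid (default 0.00, 0.01, ..., 1.00) with the smallest misfit."""
    if grid is None:
        grid = [k / 100.0 for k in range(101)]
    return min(grid, key=lambda lam: qq_misfit(values, weights, lam))
```

In this task the weights would be each participant's survey weight (w0304_&run in the rununi macro), and the values would be one positive 24-hour recall per person.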

In the bivariate linear mixed effects model, there are two random person-specific effects, one for each
ubiquitously-consumed dietary component, which have a joint normal distribution, and two within-person random errors,
one for each ubiquitously-consumed dietary component, which also have a joint normal distribution, are independent of
the random effects, and are independent across repeats.

The model for each nutrient includes an indicator for whether the reported day was a weekday or a weekend, an indicator
for the sequence number (first versus second) of the report, indicators for each 5-year age group, a person-specific
random effect and a within-subject error term. For children, an extra covariate for sex was included and, for adults, men
and women were analyzed separately. In this model, the covariate for age group was included to allow estimation of
subpopulations by age, similar to the model fit in Module 20, Task 2. (See Module 18, Task 3 for more information on
covariate adjustment.)

As in the NCI method for single nutrients, Monte Carlo simulations are generated using the parameter values estimated
from the bivariate model. The ratio of usual intakes is calculated for each pseudo-individual generated from the Monte
Carlo simulations, and using the sampling weights, the percentiles of the distribution of the ratio of usual intakes are
estimated. The balanced repeated replication method (BRR, see Module 18, Task 4) is used to estimate standard errors
that account for the complex sampling of NHANES.
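The final step, turning simulated pseudo-individuals into percentiles of the ratio of usual intakes, can be sketched as below. The usual intakes and weights are hypothetical placeholders for what Distrib_Bivariate would simulate from the fitted bivariate model; the 9 kcal per gram factor is an assumption about how the percent of energy is formed; and the simple weighted-percentile rule may differ from the one used in Percentiles_Survey. The key point is that the ratio is taken per pseudo-individual from usual intakes, before any percentile is computed.

```python
KCAL_PER_GRAM_FAT = 9.0  # assumed conversion from grams of fat to kcal


def weighted_percentile(values, weights, pct):
    """Smallest value whose cumulative weight share reaches pct percent."""
    pairs = sorted(zip(values, weights))
    target = pct / 100.0 * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]


def ratio_percentiles(usual_sfat_g, usual_kcal, weights, pcts=(5, 25, 50, 75, 95)):
    """Percent of energy from saturated fat per pseudo-individual, then weighted percentiles."""
    ratios = [100.0 * KCAL_PER_GRAM_FAT * s / k for s, k in zip(usual_sfat_g, usual_kcal)]
    return {p: weighted_percentile(ratios, weights, p) for p in pcts}


# Hypothetical pseudo-individuals: usual saturated fat (g), usual energy (kcal), weight.
sfat = [20.0, 30.0, 40.0]
kcal = [2000.0, 2000.0, 2000.0]
wts = [1.0, 1.0, 1.0]
print(ratio_percentiles(sfat, kcal, wts, pcts=(50,)))  # {50: 13.5}
```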

The method described in Freedman et al. (2010a) was extended to estimate the distribution of the HEI-2005 score. This
requires an extra calculation to convert the ratio to the HEI-2005 score in the Monte Carlo step of the procedure.
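The extra HEI-2005 calculation is applied to each pseudo-individual's ratio before percentiles are taken. As a hedged illustration, the sketch below scores the saturated-fat component using the HEI-2005 standards as we understand them (10 points at 7% of energy or less, 8 points at 10%, 0 points at 15% or more, linear in between). These cutoffs are an assumption not stated in this tutorial; check them against the HEI-2005 documentation before relying on them.

```python
def hei2005_satfat_score(pct_energy_satfat: float) -> float:
    """HEI-2005 saturated fat component score (0-10) from percent of energy.

    Assumed standards: <= 7% -> 10 points, 10% -> 8 points, >= 15% -> 0 points,
    linear between those anchors.
    """
    if pct_energy_satfat <= 7.0:
        return 10.0
    if pct_energy_satfat >= 15.0:
        return 0.0
    if pct_energy_satfat <= 10.0:
        # 10 points at 7%, falling linearly to 8 points at 10%.
        return 10.0 - 2.0 * (pct_energy_satfat - 7.0) / 3.0
    # 8 points at 10%, falling linearly to 0 points at 15%.
    return 8.0 - 8.0 * (pct_energy_satfat - 10.0) / 5.0
```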

The macros to fit the NCI method may be downloaded from the NCI website.

https://www.cdc.gov/nchs/tutorials/Dietary/Advanced/EstimateDistributions/Info4.htm 1/2


Task 4: How to Estimate Population Distributions of Ratios of Usual Intakes of Two Ubiquitously-consumed
Dietary Constituents using the NCI Method

In this example, the distribution of the percent of usual energy intake from saturated fat in men will be estimated.

This example uses the demoadv dataset (download at Sample Code and Datasets). The variables w0304_0 to
w0304_16 are the weights (dietary weights and Balanced Repeated Replication [BRR] weights) used in the analysis of
2003-2004 dietary data, which requires BRR to calculate standard errors. The model is run 17 times: 1 run with the
dietary weight w0304_0 and 16 BRR runs with weights w0304_1 to w0304_16 (see Module 18, Task 4 for more information).

IMPORTANT NOTE

Note: If 4 years of NHANES data are used, 32 BRR runs are required. Additional weights are found in the demoadv
dataset.


A SAS macro is a useful technique for rerunning a block of code when you want to change only a few variables; the
macros rununi, runbivar, and rundist are created and called in this example. The rununi macro calls the
NLMixed_Univariate macro and the BoxCox_Survey macro, and fits a univariate model. The runbivar macro calls the
NLMixed_Bivariate macro and fits a bivariate model. Finally, the rundist macro calls the Distrib_Bivariate and
Percentiles_Survey macros and estimates the distribution of the ratio of the ubiquitously-consumed dietary constituent to
energy.

The NLMixed_Univariate, BoxCox_Survey, NLMixed_Bivariate, Distrib_Bivariate and Percentiles_Survey macros used in
this example were downloaded from the NCI website. Version 1.0 of the macros was used. Check this website for macro
updates before starting any analysis. Additional details regarding the macros and additional examples may also be found
on the website.

Step 1: Create a dataset so that each row corresponds to a single person-day

Statements Explanation

https://www.cdc.gov/nchs/tutorials/Dietary/Advanced/EstimateDistributions/Task4.htm 1/18


data adultm;
  format agegrp ageg.;
  set nh.demoadv;
  if riagendr=1 and ridageyr>=19;
  IF RIDAGEYR >= 19 AND RIDAGEYR <= 30 THEN AGEGRP=5;
  IF RIDAGEYR >= 31 AND RIDAGEYR <= 50 THEN AGEGRP=6;
  IF RIDAGEYR >= 51 AND RIDAGEYR <= 70 THEN AGEGRP=7;
  IF RIDAGEYR > 70 THEN AGEGRP=8;
  array a (*) age_19to30 age_31to50 age_51to70 age_71plus;
  do i=1 to dim(a);
    a(i)=0;
  end;
  a(agegrp-4)=1;
  if w0304_0 ne .;
  drop i;
run;

The demoadv dataset is used. Men ages 19 years and older are selected. The ordinal variable agegrp is created.
Next, dummy variables corresponding to the age groups are created (age_19to30, age_31to50, age_51to70,
age_71plus). This type of dummy coding must be used with the MIXTRAN macro because there is no class statement in
the NLMIXED procedure.

data day1;
  set adultm;
  DAY_WK=DR1DAY;
  DRTSFAT=DR1TSFAT;
  DRTKCAL=DR1TKCAL;
  day=1;
run;

data day2;
  set adultm;
  DAY_WK=DR2DAY;
  DRTSFAT=DR2TSFAT;
  DRTKCAL=DR2TKCAL;
  day=2;
run;

The variables DR1TSFAT and DR2TSFAT are NHANES variables representing total saturated fat (g) from all foods and
beverages reported on the 24-hour recalls on days 1 and 2, respectively. To create a dataset with 2 records per
person, the adultm dataset is set 2 times to create 2 datasets, one where day=1 and one where day=2. The same variable
name, DRTSFAT, is used for saturated fat on both days. It is created by setting it equal to DR1TSFAT for day 1 and
DR2TSFAT for day 2. The variables DR1TKCAL and DR2TKCAL are NHANES variables representing total energy consumed on
days 1 and 2, respectively, from all foods and beverages.

data sfat;
  set day1 day2;
  if DAY_WK in (1, 6, 7) then weekend=1;
  else if DAY_WK in (2, 3, 4, 5) then weekend=0;
run;

Finally, these datasets are appended, and a dummy variable is created for weekend days.
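The age-group coding in the adultm step (agegrp 5 to 8, then a(agegrp-4)=1 on a 1-based SAS array) can be mirrored in Python to check the mapping. This is an illustrative sketch only. As the covars_prob list in the rununi macro shows, only age_19to30, age_31to50, and age_51to70 enter the model, so age_71plus acts as the reference group.

```python
AGE_DUMMIES = ("age_19to30", "age_31to50", "age_51to70", "age_71plus")


def age_group(ridageyr: int):
    """Same cut points as the SAS step: 5=19-30, 6=31-50, 7=51-70, 8=71+."""
    if 19 <= ridageyr <= 30:
        return 5
    if 31 <= ridageyr <= 50:
        return 6
    if 51 <= ridageyr <= 70:
        return 7
    if ridageyr > 70:
        return 8
    return None  # under 19: excluded by the subsetting IF in the SAS step


def age_dummies(ridageyr: int) -> dict:
    """One-hot age indicators; SAS element a(agegrp-4) is index agegrp-5 of this 0-based tuple."""
    group = age_group(ridageyr)
    dummies = {name: 0 for name in AGE_DUMMIES}
    if group is not None:
        dummies[AGE_DUMMIES[group - 5]] = 1
    return dummies
```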


Step 2: Sort the dataset by respondent and day

It is important to sort the dataset by respondent and intake day (day 1 and 2) because the NLMIXED procedure uses this
information to estimate the model parameters.

proc sort data=sfat; by seqn day; run;

Step 3: Create a macro to fit a univariate statistical model for a ubiquitously-consumed dietary
constituent

One SAS macro, rununi, is created in Step 3. The rununi macro calls the NLMixed_Univariate and BoxCox_Survey macros
to obtain preliminary estimates for the bivariate model. Once the macro has been defined, it may be called several
times, each time changing the macro variables.

Multiple or no covariates may be used in the modeling. This example uses age group and adjusts the model for weekend
and sequence effects.

Statements Explanation

%macro rununi(data, foodvar, modeltype, nloptions, outlib, run);

The start of the rununi macro is defined. The variables in parentheses are the macro variables that will be used in
the macro.

data nhanes (keep=id age race repeat dayofweek weekend agegrp age_19to30 age_31to50 age_51to70
             age_71plus w0304_&run recall_food);
  set &data;
  id = seqn;
  age = ridageyr;
  race = ridreth1;
  repeat = day;
  dayofweek = DAY_WK;
  weekend = weekend;
  recall_food = &foodvar;
run;

The dataset (&data) is set, and the variables are renamed. Note that day of week and weekend are used in this
example.

proc sort data=nhanes; by id repeat; run;

The data are sorted by id and repeat.

data nhanes;
  set nhanes;
  if (repeat = 2) then repeat2 = 1;
  else repeat2 = 0;
run;

A dummy variable called repeat2 is created. This variable is equal to 1 for day 2 and 0 for day 1, so the sequence
effect parameter only enters the model for day 2.

proc means data=nhanes noprint;
  where (recall_food > 0);
  var recall_food;
  output out=min_a(keep=min_a) min=min_a;
run;

The minimum amount consumed on a consumption day is calculated and saved in a dataset called min_a.


data nhanes;
   /* min_a has one record: read it once so its value is retained for every record */
   if (_n_ = 1) then set min_a;
   set nhanes;
   modeltype = upcase("&modeltype");
   if (modeltype = "ONEPART" & recall_food = 0) then recall_food = min_a / 2;
run;

The minimum amount is added to every record of the dataset. For one-part models (amount models), zero values are set to ½ of the minimum amount.
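The following minimal Python sketch shows the same one-part zero handling. The function name is hypothetical and the amounts are made up. The smallest positive recall plays the role of min_a. Zero recalls are replaced by min_a / 2 only when the model type is ONEPART, compared case-insensitively as upcase() does.

```python
def fill_zeros_one_part(amounts, modeltype):
    """Replace zero recalls by half the smallest positive recall (one-part models only)."""
    # Minimum amount on a consumption day (raises ValueError if no positive recalls)
    min_a = min(a for a in amounts if a > 0)
    if modeltype.upper() != "ONEPART":
        return list(amounts)  # two-part models keep their zeros
    return [min_a / 2 if a == 0 else a for a in amounts]


print(fill_zeros_one_part([12.0, 0.0, 3.0, 8.5], "onepart"))  # [12.0, 1.5, 3.0, 8.5]
```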

data nhanes_boxcox;
   set nhanes;
   by id;
   if (first.id);
   if (modeltype ^= "ONEPART" & recall_food = 0) then delete;
run;

A dataset is created to find the best Box-Cox transformation to normality. It keeps one record (the first recall) per respondent. For two-part models, zero values are also deleted.

%BoxCox_Survey(data = nhanes_boxcox,
               subject = id,
               var = recall_food,
               weight = w0304_&run,
               print = Y,
               ntitle = 2);

The macro BoxCox_Survey is called to find the best Box-Cox transformation.

data _null_;
   set _lambda;
   call symput("lambda", trim(left(put(lambda_recall_food, 4.2))));
run;

The dataset _lambda from the BoxCox_Survey macro is set, and the macro variable lambda is created.
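The macro BoxCox_Survey chooses lambda. The Python sketch below only illustrates the Box-Cox family itself and one common, generic way of choosing lambda: a grid search on the unweighted normal profile log-likelihood. That selection rule, the 0 to 1 grid, and the sample amounts are assumptions for illustration. BoxCox_Survey may use a different criterion, and it also takes the survey weight.

```python
import math


def boxcox(x, lam):
    """Box-Cox transform of a positive value x."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


def profile_loglik(xs, lam):
    """Normal profile log-likelihood of the transformed data, up to a constant."""
    n = len(xs)
    ys = [boxcox(x, lam) for x in xs]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    # The Jacobian term (lam - 1) * sum(log x) makes different lambdas comparable
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(x) for x in xs)


def best_lambda(xs, grid):
    """Grid value with the largest profile log-likelihood."""
    return max(grid, key=lambda g: profile_loglik(xs, g))


grid = [i / 100 for i in range(0, 101, 5)]               # 0.00, 0.05, ..., 1.00
amounts = [5.2, 11.8, 14.0, 18.5, 22.1, 27.9, 33.4, 48.0]  # made-up positive intakes
lam = best_lambda(amounts, grid)
print(f"{lam:4.2f}")  # formatted like put(lambda, 4.2)
```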

proc sort data=nhanes; by id repeat; run;
title2 "Males 19+";

%NLMixed_Univariate(data = nhanes,
                    lambda = &lambda,
                    subject = id,
                    repeat = repeat,
                    response = recall_food,
                    modeltype = &modeltype,
                    covars_prob = repeat2 weekend age_19to30 age_31to50 age_51to70,
                    covars_amt = repeat2 weekend age_19to30 age_31to50 age_51to70,
                    replicate_var = w0304_&run,
                    nloptions = &nloptions,
                    print = Y,
                    ntitle = 3);

The data are sorted to make sure they are in the correct order. The macro NLMixed_Univariate is then called to fit the model. The Box-Cox transformation parameter (lambda) is set equal to the value selected by macro BoxCox_Survey through macro variable &lambda.


data &outlib..parms_u_&foodvar._&run;
   set parms_u;
run;

The dataset outlib.parms_u_&foodvar._&run is created to save the parameters from the model.

data &outlib..pred_u_&foodvar._&run;
   set pred_x_u;
run;

The dataset outlib.pred_u_&foodvar._&run is created to save the predicted values from the model.

proc datasets nolist;
   delete nhanes_boxcox min_a parms_u pred_x_u nhanes;
run; quit;
%mend rununi;

Unneeded datasets are deleted, and the rununi macro ends.

Step 4: Create a macro to fit a bivariate statistical model for two ubiquitously-consumed dietary constituents

One SAS macro, runbivar, is created in Step 4. The runbivar macro calls the NLMixed_Bivariate macro to fit a bivariate model of two ubiquitously-consumed dietary constituents. Once the macro has been created and run, it can be called again as many times as needed, each time with different values of the macro variables.

Multiple or no covariates may be used in the modeling; in this example, age group is used, and the model is adjusted for
weekend and sequence effects.

Statements Explanation

%macro runbivar(data, foodvar1, foodvar2, modeltype, nloptions, outlib, run);

The start of the runbivar macro is defined. The variables in parentheses are the macro variables that will be used in the macro.

data nhanes (keep=id age race repeat dayofweek weekend agegrp
                  age_19to30 age_31to50 age_51to70 age_71plus
                  w0304_&run recall_food1 recall_food2);
   set &data;
   id = seqn;
   age = ridageyr;
   race = ridreth1;
   repeat = day;
   dayofweek = DAY_WK;
   weekend = weekend;
   recall_food1 = &foodvar1;
   recall_food2 = &foodvar2;
run;

The dataset (&data) is set, and the variables are renamed. Note that age, race, repeat, day of week and weekend are used in this example.

proc sort data=nhanes; by id repeat; run;

The data are sorted by the variables id and repeat.

data nhanes;
   set nhanes;
   if (repeat = 2) then repeat2 = 1;
   else repeat2 = 0;
run;

A dummy variable called repeat2 is created. This variable is equal to 0 for day 1 and 1 for day 2, so the parameter only needs to be added to the model for day 2.


proc means data=nhanes noprint;
   where (recall_food1 > 0);
   var recall_food1;
   output out=min_a1(keep=min_a1) min=min_a1;
run;

proc means data=nhanes noprint;
   where (recall_food2 > 0);
   var recall_food2;
   output out=min_a2(keep=min_a2) min=min_a2;
run;

data min_a;
   merge min_a1 min_a2;
run;

The minimum amount consumed on a consumption day is calculated for food 1 and saved in a dataset called min_a1. The same is done for food 2, and the result is saved in a dataset called min_a2. The datasets min_a1 and min_a2 are then merged.

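data nhanes;
   /* min_a has one record: read it once so its values are retained for every record */
   if (_n_ = 1) then set min_a;
   set nhanes;
   modeltype = upcase("&modeltype");
   if (modeltype = "ONEPART" & recall_food1 = 0) then recall_food1 = min_a1 / 2;
   if (recall_food2 = 0) then recall_food2 = min_a2 / 2;
run;

The minimum amounts are added to every record of the dataset. For one-part models (amount models), zero values of food 1 are set to ½ of its minimum amount. Zero values of food 2 are always set to ½ of its minimum amount.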

data init_parms_f1;
   set &outlib..parms_u_&foodvar1._&run;
   rename a_intercept  = A1_intercept
          a_repeat2    = A1_repeat2
          a_weekend    = A1_weekend
          a_age_19to30 = A1_age_19to30
          a_age_31to50 = A1_age_31to50
          a_age_51to70 = A1_age_51to70
          a_logsde     = A1_LogSDe
          a_lambda     = A1_Lambda
          logsdu2      = LogSDu2;
run;

The dataset from the univariate model for food 1 is set, and the parameters are renamed for the bivariate model.


data init_parms_f2;
   set &outlib..parms_u_&foodvar2._&run;
   rename a_intercept  = A2_intercept
          a_repeat2    = A2_repeat2
          a_weekend    = A2_weekend
          a_age_19to30 = A2_age_19to30
          a_age_31to50 = A2_age_31to50
          a_age_51to70 = A2_age_51to70
          a_logsde     = A2_LogSDe
          a_lambda     = A2_Lambda
          logsdu2      = LogSDu3;
run;

The dataset from the univariate model for food 2 is set, and the parameters are renamed for the bivariate model.

data init_parms;
   merge init_parms_f1 init_parms_f2;
   keep A1_intercept A1_repeat2 A1_weekend A1_age_19to30 A1_age_31to50 A1_age_51to70
        A2_intercept A2_repeat2 A2_weekend A2_age_19to30 A2_age_31to50 A2_age_51to70
        A1_LogSDe A2_LogSDe LogSDu2 LogSDu3;
run;

Initial estimates for the two foods or nutrients are combined.

data lambdas (keep=a1_lambda a2_lambda);
   merge init_parms_f1 init_parms_f2;
   call symput("lambda1", trim(left(put(a1_lambda, 4.2))));
   call symput("lambda2", trim(left(put(a2_lambda, 4.2))));
run;

Macro variables lambda1 and lambda2 are created. Note that these macro variables are the same Box-Cox parameters that were selected by macro BoxCox_Survey inside macro rununi (see Step 3).

proc print data=lambdas;
   var a1_lambda a2_lambda;
   title3 "Box-Cox Transformation Parameters";
run;

The Box-Cox parameters are printed.

title2;

Title2 is reset.


%NLMixed_Bivariate(data = nhanes,
                   init_parms = init_parms,
                   lambda1 = &lambda1,
                   lambda2 = &lambda2,
                   subject = id,
                   repeat = repeat,
                   response1 = recall_food1,
                   response2 = recall_food2,
                   modeltype = &modeltype,
                   covars_prob1 = repeat2 weekend age_19to30 age_31to50 age_51to70,
                   covars_amt1 = repeat2 weekend age_19to30 age_31to50 age_51to70,
                   covars_amt2 = repeat2 weekend age_19to30 age_31to50 age_51to70,
                   replicate_var = w0304_&run,
                   nloptions = &nloptions,
                   print = Y,
                   ntitle = 3);

The macro NLMixed_Bivariate is called to fit the model. The Box-Cox transformation parameters (lambda1 and lambda2) are set equal to the values selected by macro BoxCox_Survey via macro variables &lambda1 and &lambda2.

data &outlib..parms_b_&foodvar1._&foodvar2._&run;
   set parms_b;
run;

The dataset outlib.parms_b_&foodvar1._&foodvar2._&run is created to save the parameters from the model.

data &outlib..pred_b_&foodvar1._&foodvar2._&run;
   set pred_x_b;
run;

The dataset outlib.pred_b_&foodvar1._&foodvar2._&run is created to save the predicted values from the model.

proc datasets lib=work nolist;
   delete nhanes min_a1 min_a2 init_parms_f1 init_parms_f2 init_parms lambdas;
run;
quit;

%mend runbivar;

Unneeded datasets are deleted, and the runbivar macro ends.

Step 5: Create a macro to estimate the distribution of a ratio of two ubiquitously-consumed dietary constituents


One SAS macro, rundist, is created in Step 5. The rundist macro calls the Distrib_Bivariate macro and the Percentiles_Survey macro to estimate the distribution of a ratio of two ubiquitously-consumed dietary constituents. Once the macro has been created and run, it can be called again as many times as needed, each time with different values of the macro variables.

Statements Explanation

%macro rundist(foodvar1, foodvar2, modeltype, seedv, outlib, run);

The start of the rundist macro is defined. The variables in parentheses are the macro variables that will be used in the macro.

%global seed;
%let seed = &&seedv;

The seed for the random number generator is set.

data parms;
   set &outlib..parms_b_&foodvar1._&foodvar2._&run;
run;

data pred;
   set &outlib..pred_b_&foodvar1._&foodvar2._&run;
run;

proc sort data=pred; by id repeat; run;

The parameter estimates and predicted values that were calculated in the runbivar macro are set to the parms and pred datasets, respectively. The dataset pred is sorted by id and repeat.


data pred;
   /* parms has one record: read it once so its values are retained for every record of pred */
   if (_n_ = 1) then set parms(keep=a1_repeat2 a2_repeat2 a1_weekend a2_weekend);
   set pred;
run;

data pred (drop=a1_repeat2 a2_repeat2 a1_weekend a2_weekend);
   set pred;
   by id;
   if (first.id);

   if (repeat2 = 1) then do;
      repeat = 1;
      repeat2 = 0;
      pred_x_a1 = pred_x_a1 - a1_repeat2;
      pred_x_a2 = pred_x_a2 - a2_repeat2;
   end;

   if (weekend = 1) then do;
      weekend = 0;
      pred_x_a1 = pred_x_a1 - a1_weekend;
      pred_x_a2 = pred_x_a2 - a2_weekend;
   end;

   day_wgt = 4 / 7;
   output;

   weekend = 1;
   pred_x_a1 = pred_x_a1 + a1_weekend;
   pred_x_a2 = pred_x_a2 + a2_weekend;
   day_wgt = 3 / 7;
   output;
run;

Because weekend was used in the model, two records are created per subject: one for weekdays and one for weekends. The model includes repeat2 as a covariate, and the predicted value is calculated for day 1, i.e., when repeat2 = 0. The model also includes weekend as a covariate. It is set to 0 if the recall was for Monday-Thursday and 1 if the recall was for Friday-Sunday.

For each pseudo-individual, macro Distrib_Bivariate generates usual weekday and usual weekend intakes of saturated fat and of energy. It then calculates usual intake as the weighted average of the usual weekday and usual weekend intakes. The variable day_wgt specifies the weights for weekdays (4/7) and weekends (3/7). These weights can be changed in this macro.

data _null_;
   set pred;
   min_a1 = min_a1 / 2;
   call symput("min_a1", trim(left(put(min_a1, best12.))));
   min_a2 = min_a2 / 2;
   call symput("min_a2", trim(left(put(min_a2, best12.))));
run;

Macro variables min_a1 and min_a2 are created for food 1 and food 2. These variables equal ½ of the minimum amount.
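The weekday/weekend bookkeeping in the pred data step can be mirrored in a short Python sketch. The function names and all numbers are hypothetical; the effect sizes are made up, not NHANES estimates.

1. The repeat2 and weekend effects are removed from a respondent's predicted value to get a day-1 weekday baseline.
2. A weekend record is created by adding the weekend effect back.
3. The two records get weights 4/7 and 3/7.

The last function shows the weighted average that Distrib_Bivariate is described as forming from the usual weekday and usual weekend intakes.

```python
from fractions import Fraction


def expand_records(pred_x, repeat2, weekend, b_repeat2, b_weekend):
    """Return (weekend, pred_x, day_wgt) records for one subject.

    Mirrors the pred data step: remove the day-2 and weekend effects to get a
    day-1 weekday baseline, then add the weekend effect back for a second record.
    """
    if repeat2 == 1:
        pred_x -= b_repeat2   # predicted value for day 1 (repeat2 = 0)
    if weekend == 1:
        pred_x -= b_weekend   # predicted value for a Monday-Thursday recall
    weekday_record = (0, pred_x, Fraction(4, 7))
    weekend_record = (1, pred_x + b_weekend, Fraction(3, 7))
    return [weekday_record, weekend_record]


def usual_intake(weekday_intake, weekend_intake, weekday_wgt=Fraction(4, 7)):
    """Weighted average of usual weekday and usual weekend intake."""
    return weekday_wgt * weekday_intake + (1 - weekday_wgt) * weekend_intake


# Illustrative numbers only (not NHANES estimates)
records = expand_records(pred_x=3.5, repeat2=0, weekend=1, b_repeat2=0.25, b_weekend=0.5)
print(records)
print(usual_intake(20, 27))  # (4*20 + 3*27) / 7 = 23
```

Fractions are used only so the 4/7 and 3/7 weights add up exactly to 1. The SAS step stores them as ordinary floating-point values.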


title2 "Males 19+" ; The minimum amount is added to the

dataset. For one-part models (amount
%Distrib_Bivariate (param = parms, models), zero values are set to ½ of the
predicted = pred,
subject = id, mimimum amount.

modeltype = &modeltype,

nsim_mc = 100 ,
day_wgt = day_wgt,
min_a1 = &min_a1,
min_a2 =

&min_a2,

print = N,

ntitle = 2 );

options notes; Notes are turned back on in the log and
title2; title2 is reset.

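data mcsim;
   set _mcsim;
   t_density = 100 * ((9 * t1) / t2);
run;

A dataset called mcsim is created by setting the dataset _mcsim from the Distrib_Bivariate macro. The usual nutrient density, t_density, is the percentage of energy from saturated fat. Here t1 and t2 are the usual intakes of saturated fat (g) and energy (kcal), and saturated fat provides 9 kcal/g.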
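The density formula is easy to check by hand. The Python sketch below applies it to one illustrative pair of usual intakes (25 g of saturated fat and 2,000 kcal, made-up values). The function name is hypothetical.

```python
KCAL_PER_GRAM_FAT = 9  # energy density of fat used in the SAS step


def pct_energy_from_sat_fat(sat_fat_g, energy_kcal):
    """Percent of energy from saturated fat: 100 * (9 * t1) / t2."""
    return 100 * (KCAL_PER_GRAM_FAT * sat_fat_g) / energy_kcal


print(pct_energy_from_sat_fat(25, 2000))  # 11.25
```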

title2 "Table 1: Percentiles by A dataset called mcsim2 is created from
Age" ; mcsim, defining the subpopulation variable
from the variable agegrp.
data mcsim2;
set mcsim;
Subpopulation = agegrp;
format Subpopulation ageg. ;

run;

%Percentiles_Survey(data = mcsim2,
                    byvar = subpopulation,
                    var = t_density,
                    weight = w0304_&run,
                    cutpoints = 10 12 15,
                    print = N,
                    ntitle = 2);

The macro Percentiles_Survey is called to calculate the percentiles of the usual nutrient density. The cutpoints correspond to 10%, 12%, and 15% of calories from saturated fat.
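The sketch below shows, in Python, what a weighted percentile and a cut-point probability mean for the simulated values. The function names and data are hypothetical. It uses the simplest step-function definition (no interpolation), which is an assumption: Percentiles_Survey may compute percentiles differently.

```python
def weighted_cdf(values, weights, cut):
    """Weighted proportion of values at or below cut, i.e. Prob(X <= cut)."""
    total = sum(weights)
    return sum(w for v, w in zip(values, weights) if v <= cut) / total


def weighted_percentile(values, weights, p):
    """Smallest value whose weighted cumulative share reaches p (0 < p <= 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    running = 0.0
    for v, w in pairs:
        running += w
        if running / total >= p:
            return v
    return pairs[-1][0]


density = [8.0, 11.0, 13.0, 16.0]   # illustrative usual % energy values
wts = [1.0, 1.0, 1.0, 1.0]          # illustrative survey weights
print([weighted_cdf(density, wts, c) for c in (10, 12, 15)])  # [0.25, 0.5, 0.75]
print(weighted_percentile(density, wts, 0.5))                  # 11.0
```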

data pctl;
   set _percentiles;
   by subpopulation;
run;

The dataset pctl is created from the dataset _percentiles produced by the macro Percentiles_Survey.


%if &run = 0 %then %do;

   title2 "Estimated Mean and Percentiles";
   title3 "Men, ages 19+";

   proc print data=pctl label;
      id subpopulation;
      var Mean Pctile5 Pctile10 Pctile25 Pctile50 Pctile75 Pctile90 Pctile95;
      format Mean Pctile5 Pctile10 Pctile25 Pctile50 Pctile75 Pctile90 Pctile95 7.2;
   run;

   title2 "Estimated Cut-Point Probabilities";

   proc print data=pctl label;
      id subpopulation;
      var Prob1-Prob3;
      format Prob1-Prob3 7.2;
      label Prob1 = "Prob(X <= 10)"
            Prob2 = "Prob(X <= 12)"
            Prob3 = "Prob(X <= 15)";
   run;

Summary tables are printed for the point estimate run.

%end;

data &outlib..pctl_b_&foodvar1._&foodvar2._&run;
   set pctl;
   run = &run;
run;

proc datasets nolist;
   delete parms pred mcsim _mcsim mcsim2 pctl _percentiles;
run;
quit;
%mend rundist;

The means, percentiles, and cutpoint proportions are saved for each run. Unneeded datasets are deleted, and the rundist macro ends.

Step 6: Run the rununi, runbivar, and rundist macros

Now that the rununi, runbivar, and rundist macros have been created, they need to be called.

Statements Explanation


%include "C:\NHANES\Macros\ratio.macros.v1\NLMixed_univariate.macro.v1.1.sas";
%include "C:\NHANES\Macros\ratio.macros.v1\boxcox_survey.macro.v1.1.sas";

%rununi(data=sfat, foodvar=DRTSFAT, modeltype=ONEPART,
        nloptions=technique=trureg maxfunc=10000 maxiter=200,
        outlib=work, run=0);

First, the univariate model of saturated fat is run to get starting values. This requires the NLMixed_Univariate macro and the BoxCox_Survey macro, so these are included. The foodvar is set equal to DRTSFAT to fit the univariate model for saturated fat. A one-part model (ONEPART) is specified, because saturated fat is ubiquitously consumed. Options for NLMIXED are set. The macro variable run is set equal to 0 to indicate the base run.

%rununi(data=sfat, foodvar=DRTKCAL, modeltype=ONEPART,
        nloptions=technique=trureg maxfunc=10000 maxiter=200,
        outlib=work, run=0);

The univariate model for energy is run to get starting values.

%include "C:\NHANES\Macros\ratio.macros.v1\NLMixed_bivariate.macro.v1.1.sas";

%runbivar(data=sfat, foodvar1=DRTSFAT, foodvar2=DRTKCAL, modeltype=ONEPART,
          nloptions=technique=trureg maxfunc=10000 maxiter=200,
          outlib=work, run=0);

The bivariate model of saturated fat and energy is run. This requires the NLMixed_Bivariate macro, so it is included.

%include "C:\NHANES\Macros\ratio.macros.v1\distrib_bivariate.macro.v1.1.sas";
%include "C:\NHANES\Macros\ratio.macros.v1\percentiles_survey.macro.v1.1.sas";

%rundist(foodvar1=DRTSFAT, foodvar2=DRTKCAL, modeltype=ONEPART, seedv=0,
         outlib=work, run=0);

The bivariate distribution of saturated fat and energy is estimated. This requires the Distrib_Bivariate macro and the Percentiles_Survey macro, so these are included.


%macro BRR204(run, outlib, foodvar1, foodvar2);
   %do run = 1 %to 16;
      options nonotes;
      %put BRR Run &run;

      %rununi(data=sfat, foodvar=DRTSFAT, modeltype=ONEPART,
              nloptions=maxfunc=10000 maxiter=200, outlib=work, run=&run);

      %rununi(data=sfat, foodvar=DRTKCAL, modeltype=ONEPART,
              nloptions=maxfunc=10000 maxiter=200, outlib=work, run=&run);

      %runbivar(data=sfat, foodvar1=DRTSFAT, foodvar2=DRTKCAL, modeltype=ONEPART,
                nloptions=technique=trureg maxfunc=10000 maxiter=200,
                outlib=work, run=&run);

      %rundist(foodvar1=DRTSFAT, foodvar2=DRTKCAL, modeltype=ONEPART, seedv=0,
               outlib=work, run=&run);

      proc append base=brr_runs data=&outlib..pctl_b_&foodvar1._&foodvar2._&run;
      run;
   %end;
%mend BRR204;

Next, a macro called BRR204 is defined to obtain the datasets needed for BRR. In each of the 16 BRR runs, the models are refit and the distribution is re-estimated with the replicate weight w0304_&run. The output of the Percentiles_Survey macro from each run is appended to a dataset called brr_runs.

%BRR204(foodvar1=DRTSFAT, foodvar2=DRTKCAL, outlib=work)

proc print data=brr_runs; run;

The BRR204 macro is called, and the BRR runs are printed.

Step 7: Calculate BRR SEs and print final estimates

Statements Explanation


data brr_runs2;
   set brr_runs;
   rename mean = bmean
          pctile1-pctile99 = bpctile1-bpctile99
          prob1-prob3 = bprob1-bprob3;
run;

The dataset brr_runs is set to create a new dataset, brr_runs2, and the variables are renamed to indicate they are from BRR runs.

proc sort data=pctl_b_DRTSFAT_DRTKCAL_0; by subpopulation; run;
proc sort data=brr_runs2; by subpopulation; run;

The data are sorted before merging.

data distall;
   merge pctl_b_DRTSFAT_DRTKCAL_0 brr_runs2;
   by subpopulation;
   array bvar (*) bmean bpctile1-bpctile99 bprob1-bprob3;
   array varo (*) mean pctile1-pctile99 prob1-prob3;
   array dsqr (*) dbmean dbpctile1-dbpctile99 dbprob1-dbprob3;
   do i = 1 to dim(bvar);
      dsqr[i] = (bvar[i] - varo[i])**2;
   end;
run;

The datasets pctl_b_DRTSFAT_DRTKCAL_0 (from the rundist macro for the base run, run 0) and brr_runs2 are merged. The squared difference between each BRR estimate and the corresponding base-run estimate is then created.

proc means data=distall sum noprint;
   by subpopulation;
   var dbmean dbpctile1-dbpctile99 dbprob1-dbprob3;
   output out=sums sum=sum_dbmean sum_dbpctile1-sum_dbpctile99 sum_dbprob1-sum_dbprob3;
run;

The sum of squares is computed.


data brr;
   set sums;
   array sumt (*) sum_dbmean sum_dbpctile1-sum_dbpctile99 sum_dbprob1-sum_dbprob3;
   array se (*) mean pctile1-pctile99 prob1-prob3;
   do j = 1 to dim(sumt);
      se[j] = -1 * sqrt((sumt[j]) / (16 * .49));
   end;
   keep mean pctile1-pctile99 prob1-prob3 subpopulation;
run;

Each standard error is computed as the square root of the sum of squares divided by 16 × 0.49. Each SE is multiplied by -1 so that it prints in parentheses in the final step.
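In symbols, the SE for each statistic is sqrt(Σ(θ_r − θ_0)² / (16 × 0.49)), where θ_0 is the base-run estimate and θ_r the estimate from BRR run r. The constant 0.49 equals (1 − 0.3)², which is consistent with Fay's variant of BRR using a Fay coefficient of 0.3. That interpretation is an assumption: the tutorial itself only shows 16 × .49. Below is a minimal Python sketch of the computation with made-up replicate estimates.

```python
import math

FAY = 0.3          # assumed Fay coefficient, consistent with 0.49 = (1 - 0.3) ** 2
N_REPLICATES = 16  # number of BRR runs in BRR204


def brr_se(base_estimate, replicate_estimates, fay=FAY):
    """Fay-BRR standard error: sqrt(sum((theta_r - theta_0)**2) / (R * (1 - fay)**2))."""
    r = len(replicate_estimates)
    ss = sum((est - base_estimate) ** 2 for est in replicate_estimates)
    return math.sqrt(ss / (r * (1 - fay) ** 2))


# Illustrative: every replicate differs from the base run by 0.7
reps = [10.7 if i % 2 == 0 else 9.3 for i in range(N_REPLICATES)]
print(round(brr_se(10.0, reps), 6))  # 1.0
```

The SAS step stores -1 times this value only so that the negparen format prints it in parentheses. The magnitude is the same.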

data toprint1;
   set pctl_b_DRTSFAT_DRTKCAL_0;
   line = 1;
   keep subpopulation mean pctile1-pctile99 prob1-prob3 line;
run;

To create the final dataset, the point estimates are saved in a dataset called toprint1. The variable line identifies them as estimates.

data toprint2;
   set brr;
   line = 2;
   keep subpopulation mean pctile1-pctile99 prob1-prob3 line;
run;

The standard errors are saved in a dataset called toprint2. The variable line identifies them as standard errors.

data nh.m20task4;
   set toprint1 toprint2;
run;

proc sort data=nh.m20task4;
   by subpopulation line;
run;

The final dataset is created by appending toprint1 and toprint2, and it is then sorted.


proc print data=nh.m20task4 split=' ' noobs;
   var subpopulation line mean pctile5 pctile10 pctile25 pctile50 pctile75
       pctile90 pctile95 prob1-prob3;
   format line line. mean pctile1-pctile99 negparen10.1 prob1-prob3 negparen6.2;
   title 'Usual Intake of %Saturated Fat';
   title2 'NHANES 2003-04';
run;

The final dataset is printed. The format negparen makes the standard errors print in parentheses.

Step 8: Interpret estimates

The output from each NLMIXED run is printed. The first NLMIXED output (replicate variable w0304_0) lists the parameter estimates for the univariate model of saturated fat. The standard errors shown there are incorrect; the BRR standard errors computed in Step 7 are used instead. Next come the univariate model for energy and then the bivariate model. The remaining NLMIXED runs are from the BRR replications. Percentile estimates are also printed for the base run and the BRR runs.
The last page of the output gives the estimated percentiles and cutpoint probabilities with standard errors by subpopulation (age group).

The median percent of usual energy from saturated fat for men ages 19 to 30 years is 10.9% (SE = 0.3%).


The cutpoint probabilities correspond to 10%, 12%, and 15% of calories from saturated fat. For men ages 71 and older, 32% (SE = 5%) have a usual intake of saturated fat of 10% of energy or less.

IMPORTANT NOTE

Note: Your results may vary slightly, as a random seed is used to estimate the distribution of usual intake. However, they
would not be expected to vary by more than 1%.


NHANES Dietary Web Data Tutorial - Examine the Relationship Betwe... https://www.cdc.gov/nchs/tutorials/dietary/advanced/ExamineRelationshi...

Examine the Relationship Between Dietary Intake and Some Outcome Measure

Purpose

The term “dietary intake” in this module includes foods and beverages reported on the 24-hour recalls. Researchers are often interested in relating an individual’s usual intake of dietary components to health parameters. Two kinds of dietary constituents have distributions with different statistical properties. Ubiquitously-consumed constituents are nutrients and food groups consumed on a daily or almost daily basis. Episodically-consumed constituents are nutrients and food groups that more than about 5% of the population do not consume every day. Different models are therefore fit for the two types. This module focuses on the method developed by researchers at NCI and elsewhere (the “NCI method”) for both ubiquitously- and episodically-consumed dietary constituents.

IMPORTANT NOTE
Many of the statistical methods used in this course are advanced and may require consultation with a statistician. Modules 18-22 require statistical knowledge of mixed effects models and programming knowledge of calling SAS macros. Because Module 18 provides the background information for Modules 19-22, it is advised that you read Module 18 carefully before tackling the other modules.

Task 1: Describe Regression Calibration

This task provides an overview of regression calibration.

Key Concepts about Regression Calibration (/nchs/tutorials/Dietary/Advanced/ExamineRelationship/Info1.htm)

Task 2: Examine the Relationship between Usual Intake of a Single Ubiquitously-consumed
Dietary Constituent and Some Outcome

This task describes the use of statistical methods to estimate the individual-level predictions of a ubiquitously-consumed
dietary constituent, such as a nutrient that is consumed daily, and relate this intake to a health parameter.

Key Concepts about Examining the Relationship Between Usual Intake of a Single Ubiquitously-consumed Dietary
Constituent and Some Outcome (/nchs/tutorials/Dietary/Advanced/ExamineRelationship/Info2.htm)
How to Examine the Relationship Between Usual Intake of a Single Ubiquitously-Consumed Dietary Constituent and Some
Outcome (/nchs/tutorials/Dietary/Advanced/ExamineRelationship/Task2.htm)
Download Sample Code and Datasets (/nchs/tutorials/Dietary/downloads/downloads.htm)

Task 3: Examine the Relationship between Usual Intake of a Single Episodically-Consumed
Dietary Constituent and Some Outcome

This task describes the use of statistical methods to estimate the individual-level predictions of an episodically-consumed
dietary constituent, such as a food group that is not consumed every day, and relate intake to a health parameter, using the NCI
method.

Key Concepts about Examining the Relationship Between Usual Intake of a Single Episodically-consumed Dietary
Constituent and Some Outcome (/nchs/tutorials/Dietary/Advanced/ExamineRelationship/Info3.htm)
How to Examine the Relationship Between Usual Intake of a Single Episodically-consumed Dietary Constituent and Some
Outcome (/nchs/tutorials/Dietary/Advanced/ExamineRelationship/Task3.htm)
Download Sample Code and Datasets (/nchs/tutorials/Dietary/downloads/downloads.htm)

Page last updated: May 2, 2013
Page last reviewed: May 2, 2013
Content source: CDC/National Center for Health Statistics
Page maintained by: NCHS/NHANES

Centers for Disease Control and Prevention 1600 Clifton Road Atlanta, GA 30329-4027, USA
800-CDC-INFO (800-232-4636) TTY: (888) 232-6348 - Contact CDC–INFO


12/19/2018 NHANES Dietary Web Tutorial: Examine the Relationship Between Dietary Intake and Some Outcome Measure: Task 1


Task 1: Key Concepts about Measurement Error

When the interest is in relating usual intake of a dietary constituent to disease or a biomarker (termed “health parameters” in this module), we must consider the impact of measurement error in dietary assessment on the estimate of the relationship of interest. Random dietary measurement error leads to an attenuated (weakened) estimate of the true relationship and a loss of statistical power. To get an (almost) unbiased estimate of the true relationship, we can use an approach called regression calibration.

Figure 1. The effects of random error on the relationship between usual intake and a health parameter. The black
dots and solid regression line represent the true relationship, and the blue triangles and dashed line represent the
observed attenuated relationship.

Statistically, we can represent the relationship between the health outcome O and true usual intake T as

$$O = \alpha_0 + \alpha_1 T + \varepsilon$$

where α0 is the intercept, α1 is the slope of the regression of O on T, and ε is error. For simplicity, we represent the relationship between O and T as linear. In fact, the health parameter may be continuous or categorical; for example, when modeling a disease outcome, logistic regression might be used. In this statistical model, α1 is the parameter of most interest, as it represents the relationship between true usual intake and the health parameter. Unfortunately, we do not have a measure of T, so α1 cannot be estimated directly. Instead of T, we have R, our 24-hour recall data that are measured with error:

$$O = \tilde{\alpha}_0 + \tilde{\alpha}_1 R + \tilde{\varepsilon}$$

The tildes indicate that the relationships estimated with the data measured with error differ from those that we would have observed if we had a measure of true usual intake. By relating our imperfect measure R to truth T, we can quantify the difference between α1 and $\tilde{\alpha}_1$. First, we define:

$$R = T + U$$

https://www.cdc.gov/nchs/tutorials/Dietary/Advanced/ExamineRelationship/Info1.htm 1/3


where U is the random error associated with R. The above relationship makes an important assumption – that R is
measured with error, but that this error is random, i.e., R provides an unbiased estimate of T. This is the assumption we
make throughout the tutorial for the 24-hour recall.

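Next, we can use the definition of the slope and substitute the formula above for R to show that:

$$\tilde{\alpha}_1 = \frac{\operatorname{cov}(O, R)}{\operatorname{var}(R)} = \frac{\alpha_1 \operatorname{var}(T)}{\operatorname{var}(T) + \operatorname{var}(U)} = \lambda \alpha_1$$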

where λ = var(T)/[var(T) + var(U)]. We assume that T and U are independent and that the measurement error is nondifferential with respect to the outcome. This factor, λ, is termed the attenuation factor, and it is the slope of the regression of T on R. In the case of logistic regression, the true relative risk ($RR_T$) and the relative risk observed when R is used rather than T ($RR_R$) are related, approximately, by the formula:

$$RR_R \approx RR_T^{\lambda}$$
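A small numeric illustration in Python of the two attenuation formulas above. The variances, slope and relative risk are made-up values and the function names are hypothetical. With var(T) = var(U), the recall carries as much noise as signal, so λ = 0.5: the observed slope is half the true slope, and a true relative risk of 2 appears as roughly √2. The relative-risk relation is the approximate one stated in the text.

```python
def attenuation_factor(var_t, var_u):
    """lambda = var(T) / (var(T) + var(U)) under R = T + U."""
    return var_t / (var_t + var_u)


def observed_slope(alpha1, lam):
    """Slope of O on R: the true slope alpha1 attenuated by lambda."""
    return lam * alpha1


def observed_rr(rr_true, lam):
    """Approximate relative risk seen when R replaces T: RR_R = RR_T ** lambda."""
    return rr_true ** lam


lam = attenuation_factor(var_t=1.0, var_u=1.0)  # illustrative variances
print(lam)                               # 0.5
print(observed_slope(2.0, lam))          # 1.0
print(round(observed_rr(2.0, lam), 4))   # 1.4142
```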

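The attenuation factor is also related to the correlation $\rho_{TR}$ between truth T and the reported value R through:

$$\lambda = \rho_{TR} \sqrt{\frac{\operatorname{var}(T)}{\operatorname{var}(R)}}$$

which, under the model R = T + U, reduces to $\lambda = \rho_{TR}^2$.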

When attenuation occurs, there is a loss of power for testing whether the slope differs significantly from 0 (i.e., no relationship). The loss is governed by the square of the correlation between T and R. The sample size required to detect an effect using R is $1/\rho_{TR}^2$ times the sample size required if T were available.
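As a worked example of the sample-size rule, suppose a study would need 500 respondents if true usual intake were observed, and the correlation between T and R is 0.5. Then 500 / 0.5² = 2,000 respondents are needed with R. The numbers and the function name in this Python sketch are illustrative only.

```python
import math


def sample_size_with_error(n_true, rho_tr):
    """Sample size needed with R when n_true would suffice with T: n_true / rho_TR**2."""
    return math.ceil(n_true / rho_tr ** 2)


print(sample_size_with_error(500, 0.5))  # 2000
```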

To get an (almost) unbiased estimate of α1, we can use an approach called regression calibration. In regression calibration, instead of using R for T in the disease model, we use the expected value of T given R:

$$E(T \mid R)$$

Intuitively, this estimator is our best estimate of T, given what we do know from the 24-hour recall data. This value is the
Empirical Bayes estimator (or best linear unbiased predictor) for the linear mixed model, when no transformation of the
data is necessary. Unfortunately, transformation is usually required and numerical integration needs to be used. This may
be done using the NCI method.
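The simplest case described above has no transformation, normal T and U, and the mean R̄ of k recalls. There, E(T | R̄) is the familiar shrinkage predictor μ_T + λ_k(R̄ − μ_T), with λ_k = var(T)/(var(T) + var(U)/k). The Python simulation below is a sketch under those assumptions. All values are made up, and the variance components are treated as known, whereas the NCI method estimates them. It shows the slope of O on R̄ attenuated to about half the true slope, and the slope of O on E(T | R̄) approximately recovering it.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(2024)
n, k = 20000, 2                        # respondents, recalls per respondent
mu_t, var_t, var_u = 20.0, 4.0, 8.0    # illustrative values, treated as known
alpha0, alpha1 = 1.0, 0.3              # illustrative true health-model coefficients

lam_k = var_t / (var_t + var_u / k)    # shrinkage factor for the mean of k recalls

t = [rng.gauss(mu_t, var_t ** 0.5) for _ in range(n)]
r_bar = [ti + sum(rng.gauss(0.0, var_u ** 0.5) for _ in range(k)) / k for ti in t]
o = [alpha0 + alpha1 * ti + rng.gauss(0.0, 1.0) for ti in t]

t_hat = [mu_t + lam_k * (rb - mu_t) for rb in r_bar]  # E(T | R-bar)

naive = slope(r_bar, o)        # attenuated: about lam_k * alpha1
calibrated = slope(t_hat, o)   # about alpha1
print(f"lambda_k={lam_k:.2f} naive={naive:.3f} calibrated={calibrated:.3f}")
```

Because E(T | R̄) is a linear function of R̄ here, the calibrated slope is exactly the naive slope divided by λ_k. With a Box-Cox transformation this shortcut no longer holds, which is why the NCI method uses numerical integration.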

Covariates X usually appear in the health parameter model as well, i.e.,

$$O = \alpha_0 + \alpha_1 T + \alpha_2 X + \varepsilon$$

In that case, X must also be included in the regression calibration predictor, $E(T \mid R, X)$. Even if X is not in the health model, we can still use it to obtain $E(T \mid R, X)$. This gives us a better estimate of T and, consequently, a better estimate of α1. This is called extended regression calibration. However, these covariates should not be related to the health parameter given truth.

In the health model, a transformation of T, rather than T on the original scale, may provide a better model fit. Therefore,
this method uses a Box-Cox transformation of T in the health model.

It is important to note that regression calibration does not restore power; it is used to obtain an estimate of the true
parameter relating diet to the health outcome (e.g., relative risk).
