NHANES Dietary Web Tutorial_397 pages

Published by smlneyman, 2019-01-16 01:35:47

UNIT 2

PREPARING A DIETARY
ANALYTIC DATASET

NOTES

NHANES Dietary Web Tutorial - Survey Orientation https://www.cdc.gov/nchs/tutorials/dietary/Preparing/intro.htm

Preparing a Dietary Analytic Dataset

Dietary data are among the most complex of all the data in NHANES. For this reason, preparing a dataset for dietary analysis
is an especially critical set of steps and often may be more time-consuming than the analysis itself.
Analysts working with NHANES dietary data frequently want to be able to answer the following types of questions:

What is the mean intake of a given food?
What is the mean intake of a given nutrient from all foods and beverages?
What is the mean intake of a given nutrient from supplements?
Which foods are the major sources of a given nutrient?
What is the distribution of intake of a given food or nutrient across a selected population?
How does dietary intake relate to some health parameter?

To conduct these analyses, you will first need to know how to successfully complete the tasks described in the following
modules of the Preparing an Analytic Dataset course:

Module 6. Locate Variables (/nchs/tutorials/Dietary/Preparing/Locate/intro.htm)
Module 7. Download Data Files (/nchs/tutorials/Dietary/Preparing/Download/intro.htm)
Module 8. Merge & Append Datasets (/nchs/tutorials/Dietary/Preparing/MergeAppend/intro.htm)
Module 9. Review Data & Create New Variables (/nchs/tutorials/Dietary/Preparing/ReviewCreate/intro.htm)
Module 10. Format & Label Variables (/nchs/tutorials/Dietary/Preparing/FormatLabel/intro.htm)
Module 11. Save a Dataset (/nchs/tutorials/Dietary/Preparing/Save/intro.htm)

As you work your way through these modules, and eventually prepare your own analytic dataset, it is useful to keep in mind
three issues that add to the challenge of dietary data analysis—the unit of analysis, variable definitions, and the
inferred population. All of these issues require that you think very specifically about your research question.

IMPORTANT NOTE
This tutorial uses the SAS convention of using the term "variable" to refer to a field in a dataset.

One reason dietary data are so complex is that the unit of analysis may vary. The basic unit of analysis in
NHANES is the individual participant, identified by the variable SEQN. However, because individuals have multiple
food and dietary supplement records, each with its own accompanying set of variables, the unit of analysis for
some types of analyses is the food or supplement rather than the individual.
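
To make the unit-of-analysis point concrete, here is a minimal Python sketch (the tutorial itself uses SAS and SUDAAN). The records, the field name calcium_mg, and the values are invented for illustration, and sample weights are deliberately ignored.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical food-level records: one row per food reported.
# SEQN identifies the participant; the calcium values are made up.
food_records = [
    {"SEQN": 1, "calcium_mg": 300.0},
    {"SEQN": 1, "calcium_mg": 100.0},
    {"SEQN": 1, "calcium_mg": 200.0},
    {"SEQN": 2, "calcium_mg": 400.0},
]

def mean_per_food(records):
    """Unit of analysis = the food record."""
    return mean(r["calcium_mg"] for r in records)

def mean_per_person(records):
    """Unit of analysis = the participant: sum foods within SEQN first."""
    totals = defaultdict(float)
    for r in records:
        totals[r["SEQN"]] += r["calcium_mg"]
    return mean(totals.values())

food_level = mean_per_food(food_records)      # averages 4 food records
person_level = mean_per_person(food_records)  # averages 2 participants
```

The same four records give a mean of 250 mg per food but 500 mg per person, which is why the research question has to fix the unit of analysis before any estimate is computed.
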
Dietary data also are challenging to work with because many analyses require the creation of new variables from variables
that are found in the survey data files. For example, if you are interested in finding the answer to the question “What is the
mean intake of milk among survey participants?,” the way you define “milk” (e.g., all types of fluid milk consumed as a
beverage, or milk also consumed as an ingredient in other foods, or servings of milk as defined by the guidance in
MyPyramid) may require you to create several new variables based on your analytic needs.
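
As a rough illustration of how the definition of “milk” drives the variables you create, the Python sketch below derives two new variables from the same records. The food codes, the ingredient look-up, and the gram amounts are placeholders, not actual USDA food codes or NHANES values, and the fixed ingredient amount is a simplification.

```python
# Placeholder definitions; real analyses would use USDA food codes and
# recipe information, which are not reproduced here.
FLUID_MILK_CODES = {"MILK_WHOLE", "MILK_SKIM"}        # milk drunk as a beverage
MILK_INGREDIENT_GRAMS = {"CEREAL_WITH_MILK": 120.0}   # milk inside a mixed food

records = [
    {"SEQN": 1, "food_code": "MILK_WHOLE", "grams": 244.0},
    {"SEQN": 1, "food_code": "CEREAL_WITH_MILK", "grams": 150.0},
    {"SEQN": 2, "food_code": "MILK_SKIM", "grams": 122.0},
    {"SEQN": 2, "food_code": "APPLE", "grams": 180.0},
]

def milk_grams(record, include_ingredients):
    """Grams of milk in one food record under the chosen definition."""
    code = record["food_code"]
    if code in FLUID_MILK_CODES:
        return record["grams"]
    if include_ingredients:
        return MILK_INGREDIENT_GRAMS.get(code, 0.0)
    return 0.0

def add_milk_variables(rows):
    """Return copies of the rows with one new variable per definition."""
    out = []
    for r in rows:
        new = dict(r)
        new["milk_bev_g"] = milk_grams(r, include_ingredients=False)
        new["milk_all_g"] = milk_grams(r, include_ingredients=True)
        out.append(new)
    return out

prepared = add_milk_variables(records)
total_beverage = sum(r["milk_bev_g"] for r in prepared)
total_all = sum(r["milk_all_g"] for r in prepared)
```

Here the beverage-only definition yields 366 g and the broader definition 486 g from identical input, so the definition should be chosen and documented before the analysis starts.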

IMPORTANT NOTE
The modules in this course require some basic knowledge of statistics as well as statistical software (e.g. SAS and SUDAAN)
and programming.

Before you get started

Review the Dietary Data Survey Orientation course (/nchs/tutorials/dietary/SurveyOrientation/intro.htm).

If you have questions about this tutorial as a whole:

Check out the Dietary Data Tutorial Roadmap (/nchs/tutorials/dietary/roadmap.htm) to orient yourself to the tutorial’s
content.

Read the Introduction (/nchs/tutorials/dietary/introduction.htm) to find answers to frequently asked questions about
NHANES dietary data and this tutorial.

Browse through the Logistics (/nchs/tutorials/dietary/logistics/logistics.htm) section to learn about the web layouts and
templates used in the tutorial and find out the basic knowledge and skills you’ll need to use the tutorial.

Go to Technical & Software Requirements (/nchs/tutorials/dietary/logistics/techsoftwarereqs.htm) for information about
what’s required to view the tutorials correctly and run the sample programs properly. This section also is the place to go if
you need help with technical problems.

Sample Code

Abbreviated SAS and SUDAAN code is presented throughout the tutorial for the sole purpose of demonstrating and explaining
specific steps in an analysis. The abbreviated code does not comprise a complete SAS or SUDAAN program that can be readily
submitted for a computer run. If you need the complete SAS or SUDAAN program, please consult the Additional Resources
section of this tutorial.

Page last updated: May 3, 2013
Page last reviewed: May 3, 2013
Content source: CDC/National Center for Health Statistics
Page maintained by: NCHS/NHANES

Centers for Disease Control and Prevention 1600 Clifton Road Atlanta, GA 30329-4027, USA
800-CDC-INFO (800-232-4636) TTY: (888) 232-6348 - Contact CDC–INFO

NHANES Dietary Web Tutorial - Survey Orientation https://www.cdc.gov/nchs/tutorials/dietary/Preparing/Locate/intro.htm

Locate Variables

Purpose

Dietary variables, and information about the variables, are stored in different data and documentation files. You will use these
files to identify dietary recall variables and their file locations, food frequency variables and their file locations, dietary
supplement variables and their file locations, and the correct sample weights and their file locations. Reviewing the data file
documents also is an essential part of learning about dietary variables.

Task 1: Identify Dietary Recall Variables and Their File Locations

Dietary recall variables and information about the variables are stored within the Dietary data files. To identify dietary recall
variable names and file locations, and to understand their naming conventions, you will need to consult the dietary interview
files on the 2003-2004 Dietary Files page.

Key Concepts about Identifying Dietary Recall Variables and File Locations (/nchs/tutorials/Dietary/Preparing/Locate/Info1.htm)
How to Identify Dietary Recall Variables and File Locations (frame1.htm)

Task 2: Identify Food Frequency Variables and Their File Locations

Food frequency variables and information about the variables are stored within the Dietary data files. To identify food
frequency variable names and file locations, and to understand their naming conventions, you will need to consult the files on
the 2003-2004 Dietary Files page.

Key Concepts about Identifying Food Frequency Variables and File Locations (/nchs/tutorials/Dietary/Preparing/Locate/Info2.htm)
How to Identify Food Frequency Variables and File Locations (frame2.htm)

Task 3: Identify Dietary Supplement Variables and Their File Locations

Dietary supplement variables and information about the variables are stored within the Dietary data files. To identify dietary
supplement variable names and file locations, and to understand their naming conventions, you will need to consult the files on
the 2003-2004 Dietary Files page.

Key Concepts about Identifying Dietary Supplement Variables and File Locations (/nchs/tutorials/Dietary/Preparing/Locate/Info3.htm)
How to Identify Dietary Supplement Variables and File Locations (frame3.htm)

Task 4: Identify Correct Sample Weights and Their File Locations

Sample weights are constructed for each 2-year survey cycle to account for over-sampling, survey non-response, and post-
stratification. Because not all persons completed all portions of the survey, each individual in NHANES may be assigned
multiple weight variables (see the Survey Orientation course for more information on how different sample weights are
created). It is important to select the correct weight variable for your particular analysis and take extra care to include it in
your analytic dataset.

Key Concepts about Identifying Correct Sampling Weights and File Locations (/nchs/tutorials/Dietary/Preparing/Locate/Info4.htm)
How to Identify Correct Sampling Weights and File Locations (/nchs/tutorials/Dietary/Preparing/Locate/task4.htm)


12/19/2018 NHANES Dietary Web Tutorial: Locate Variables: Identify Dietary Recall Variables and File Locations

Task 1: Key Concepts about Identifying Dietary Recall Variables and File
Locations

NHANES dietary recall variables include data items that are related to food and nutrient intakes as reported on the 24-hour
dietary recalls. Some variables relate to the individual foods reported on the 24-hour recalls, including nutrient content and
specifics about when and where the foods were eaten. Other variables, such as the variables in the total nutrient intake
file, represent the total or sum of the nutrients in all of the foods consumed by a participant on a particular day. Both types
of variables are available for the Day 1 and Day 2 dietary recall data.

These dietary recall variables, and information about the dietary recall variables, are stored within the Dietary component
page of the “Questionnaires, Datasets and Related Documentation” section of the NHANES website for each two-year
survey cycle. The dietary recall data are contained in four separate data files:

Individual Foods – Day 1
Individual Foods – Day 2
Total Nutrient Intakes – Day 1
Total Nutrient Intakes – Day 2

The Individual Foods files provide a data record for each food reported by survey participants during their Day 1 and Day
2 dietary recalls. The variables contained in these files include food energy, nutrient values, gram weights, eating
occasions, and related information, such as timing, source of foods, and location where eaten, for each food reported.
Because most survey participants eat more than one food during a day, the Individual Foods Files contain multiple records
per person for each recall day. Consequently, these

The Total Nutrient Intakes files provide a summary of the dietary intake data for each 1-day recall period. Variables
representing total food energy and total intakes of dozens of nutrients are included. Variable totals were derived by
summing the nutrient amounts from all foods listed on a particular day in the Individual Foods File for a survey participant.
Because the daily nutrient intakes from all foods are summarized, the Total Nutrient Intake files contain only one record per
person for each recall day.
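
The relationship between the two file types can be sketched in Python as follows. The rows are invented and only imitate the shape of the files; DR1ICALC and DR1TCALC are the Day 1 calcium variables discussed later in this module, and missing values are not handled in this sketch.

```python
from collections import defaultdict

# Hypothetical rows shaped like the Day 1 Individual Foods file:
# several records per SEQN, each with that food's calcium (DR1ICALC).
individual_foods = [
    {"SEQN": 101, "DR1ICALC": 276.0},
    {"SEQN": 101, "DR1ICALC": 52.0},
    {"SEQN": 101, "DR1ICALC": 10.0},
    {"SEQN": 102, "DR1ICALC": 300.0},
    {"SEQN": 102, "DR1ICALC": 125.0},
]

def summarize_by_person(rows, food_var, total_var):
    """Collapse food-level rows to one row per SEQN by summing food_var."""
    sums = defaultdict(float)
    for row in rows:
        sums[row["SEQN"]] += row[food_var]
    return [{"SEQN": seqn, total_var: total} for seqn, total in sorted(sums.items())]

# Five food records collapse to two person-level records.
totals = summarize_by_person(individual_foods, "DR1ICALC", "DR1TCALC")
```

Comparing sums computed this way against the released Total Nutrient Intakes values is one plausible consistency check after merging, though the released totals remain the authoritative figures.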

IMPORTANT NOTE

For illustrations of this concept of multiple and single records per person, see the second task in Module 3: “NHANES
Dietary Data Structure and Contents” of the Dietary Data Survey Orientation course.

As shown in the following lists, some variables are unique to particular dietary recall data files and other variables are
common across the files. For more detail on the variables, go to Information about Dietary Variables. Please note,
however, that the most complete descriptions of these variables can be found in the Analytic Notes section of the
Documentation (“Docs”) files.

Variables in Individual Foods File Only
Food/individual component number
USDA food code
Modification code
Amount of food in grams
Food energy and nutrients contained in each amount of food consumed
Combination foods
Eating occasion
Food source

Variables in Total Nutrient Intakes File Only
Total food energy and nutrient intakes for the day
Self-reported comparison of recall day with typical diet
Water intake variables
Whether a person is on a special diet

Variables in both Individual Foods and Total Nutrient Intakes Files
Participant sequence number (SEQN)
Dietary recall status code
Dietary sample weights
Number of intake days
Breast-fed infant (either day)
Intake day of week

IMPORTANT NOTE

Dietary recall data are part of the Dietary component of NHANES 2003-2004. Other variables necessary for your analysis
may be located in other parts of the dataset.

https://www.cdc.gov/nchs/tutorials/Dietary/Preparing/Locate/Info1.htm

12/19/2018 NHANES Dietary Web Tutorial: Locate Variables: Task 1

Task 1: How to Identify Dietary Recall Variables and File Locations

To identify variables of interest for your analysis and their file locations, you will need to successfully navigate the Dietary
Files component of the NHANES website. You also will need to understand the naming conventions for NHANES dietary
recall variables.

Step 1: Identify Dietary Recall Variables and their File Locations

Using the NHANES website to the right, follow the directions below to identify dietary recall variables. The tutorial will use
calcium as an example.

From the NHANES homepage, click Questionnaires, Datasets and Related Documentation. From this page, click
NHANES 2003-2004. On the NHANES 2003-2004 page, click the Dietary link, under “Data, Documentation, Codebooks,
SAS Code.”

On the 2003-2004 Dietary Files page, click the Variable List link, which opens the Dietary Variable List as an HTML file in
your browser. Right-clicking the link allows you to save the file to your computer. Search the HTML file for the word “calcium.”
The results will show you the four calcium variables. Clicking on the listings will take you to each location in the Variable
List in which calcium is mentioned.

Make a note of the variable label (e.g., Calcium (mg)), the Item ID (e.g., DR1ICALC), the component (e.g., Dietary-Individual
Foods (First Day)), and the Data File name where the variable is stored (e.g., DR1IFF_C).

IMPORTANT NOTE

Note that the Variable List and the Documentation use different terms for the same letter-number combination that
designates variables. The Variable List uses “Item ID,” whereas the Documentation uses “variable name.” In other
words, “Item ID” is a different term for “variable name.”

IMPORTANT NOTE

The “Docs,” or Documentation, link for each dietary recall interview file contains the documentation, codebook, and
frequencies for all of the variables associated with that data file. Read the data file documentation to verify that a particular
variable is appropriate to use in your analysis. Many variable names have similar labels and survey questions are
modified between survey cycles.

Review the “Food codes” and “Modification codes” links provided with each dietary interview file as these will provide
additional information about the variables.

The “Procedure Manual” links provided at the top of each Dietary Data File page contain excerpts from the Dietary
Interviewer Procedures Manual. The manual describes the dietary recall interview (also called What We Eat in America)
and USDA’s Automated Multiple-Pass Method (AMPM) procedures in detail. No printed dietary recall forms or
questionnaires are used in NHANES. For more information about the dietary recall interview, see Task 2 of the Dietary
Data Overview module.

Read the documentation.

Step 2: Understand Dietary Variable Naming Conventions

Some variables with different names and data file locations may have the same labels, as shown in the table below. As a
result, it is impossible to differentiate these variables by their labels alone. In order to determine which variables to use in
your analysis, you will need to understand NHANES’ dietary variable naming conventions.

If you are still at the Variable List, close the file and return to the Dietary Files page.

Open the Docs file for the 2003-2004 Dietary Interview (Individual Foods – First Day) file. The documentation will open.
Use the web browser search feature and search for the term "variable name." The first ‘hit’ will explain the variable
naming conventions that are used to distinguish between the Day 1 and Day 2 data.

In the Table of Contents on the right-hand side of the window, click on the "Appendix 1. DR1IFF_C and DR2IFF_C
Variables by Position" link. As you scroll through the list of variable names, notice that in the Individual Foods files, the
fourth position of the variable name for nutrients is always the letter "I." Additionally, the number in the third position of
the variable name identifies the collection day.

Click your browser's Back button once to return to the 2003-2004 Dietary Files page. Click on the "Docs" link listed for the
Dietary Interview (Total Nutrient Intakes – First Day) file. The documentation opens. In the Table of Contents on the
right-hand side of the window, find "Appendix 1. DR1TOT_C and DR2TOT_C Variables by Position." As you scroll
through the list of variable names, notice that in the Total Nutrient Intakes files, the fourth position of the variable name for
nutrients is always the letter "T."

IMPORTANT NOTE

Going to Appendix 1 in the documentation for either of the Individual Foods files or the Total Nutrient Intake files is another
way to find the dietary recall variables.

After locating your variables, you should have found the following information. Again the tutorial will use calcium as an
example:

Table of Information about Calcium

Dietary Interview File                Variable (SAS) label   Item ID/Variable name   Item ID Decoded
Individual Foods – First Day          Calcium (mg)           DR1ICALC                DR = Dietary Recall; 1 = Day 1; I = Individual Foods; CALC = Calcium
Individual Foods – Second Day         Calcium (mg)           DR2ICALC                DR = Dietary Recall; 2 = Day 2; I = Individual Foods; CALC = Calcium
Total Nutrient Intakes – First Day    Calcium (mg)           DR1TCALC                DR = Dietary Recall; 1 = Day 1; T = Total Nutrient Intakes; CALC = Calcium
Total Nutrient Intakes – Second Day   Calcium (mg)           DR2TCALC                DR = Dietary Recall; 2 = Day 2; T = Total Nutrient Intakes; CALC = Calcium

Although all four of these variables share the same label—calcium (mg)—they each represent something different because
of their file location. For example, each value of DR1ICALC represents the amount of calcium contained in an individual
food eaten on Day 1. Each value of DR2TCALC represents the total amount of calcium consumed from all foods and
beverages (other than plain drinking water) on Day 2.
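
The positional convention for nutrient variables can be expressed as a small Python helper. This is only a sketch: the function name and the returned keys are hypothetical, and it covers just the pattern described above (DR, collection day, I or T, nutrient abbreviation), so other dietary variables are rejected rather than decoded.

```python
FILE_BY_LETTER = {"I": "Individual Foods", "T": "Total Nutrient Intakes"}

def decode_recall_variable(name):
    """Split a nutrient variable name such as 'DR1ICALC' by position."""
    if len(name) < 5 or name[:2] != "DR":
        raise ValueError(f"{name!r} does not start with the DR prefix")
    day, letter = name[2], name[3]
    if day not in {"1", "2"} or letter not in FILE_BY_LETTER:
        raise ValueError(f"{name!r} does not follow the nutrient naming pattern")
    return {
        "prefix": "Dietary Recall",       # positions 1-2
        "day": int(day),                  # position 3
        "file": FILE_BY_LETTER[letter],   # position 4
        "item": name[4:],                 # remaining characters, e.g. CALC
    }

decoded = decode_recall_variable("DR2TCALC")
```

For DR2TCALC this returns day 2, the Total Nutrient Intakes file, and the item CALC, matching the last row of the table.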

https://www.cdc.gov/nchs/tutorials/Dietary/Preparing/Locate/frame1.htm

12/19/2018 NHANES Dietary Web Tutorial:Locate Variables: Identify Food Frequency Variables and File Locations

Task 2: Key Concepts about Identifying Food Frequency Variables and File
Locations

NHANES food frequency variables cover a range of items related to the consumption of foods and food groups during
the previous 12 months. These variables, and information about the variables, are stored within the Dietary component
and are contained in two separate data files that reflect responses to the NHANES Food Frequency Questionnaire
(FFQ):

FFQ Questionnaire File (FFQRAW_C)

FFQ Average Daily Frequency Covariates File (FFQDC_C)

The variables in the FFQ Questionnaire File represent the actual questionnaire responses. The raw responses were
processed using special software (DietCalc) to produce the second file, the FFQ Average Daily Frequency Covariates
File. DietCalc has food grouping algorithms which derive average daily frequencies.

Each survey participant has one record in the FFQ Questionnaire File, whereas each participant has multiple records in
the FFQ Average Daily Frequency Covariates File.

The NHANES food frequency variables are divided between their respective files within the Dietary component of the
survey, as follows:

Variables in the FFQ Questionnaire File:

The file has a large number of variables, so it is especially important to review the codebook and documentation. The
variables in this file include:

SEQN: Sequence number
FFQ0001 to FFQ0139d: FFQ line item responses
WTS_FFQ: FFQ sample weight
DRDINT: The number of dietary recalls completed by the survey participant
FFQ_MISS: The total number of missing responses for each survey participant

Variables in the FFQ Average Daily Frequency Covariates File:

The file contains five variables as listed below. Two of these variables are linked to look-up files that provide descriptive
information.

SEQN: Sequence number

FFQ_VAR: The Variable ID is a numeric code (range 1-100) that links to a brief variable description that is based
on the FFQ items. The VARLOOK look-up file is used to obtain the variable ID description. For example FFQ_VAR
code 7 is “Soda in summer” and code 8 is “Soda, rest of year.”

FFQFOOD: The Food ID is a numeric code (range 1-100) that links to the FOODLOOK, a look-up file that contains
more detailed information about certain types of foods that were queried as follow-up to stem questions. For
example, code 36 is “Cream cheese, regular” and code 37 is “Cream cheese, lowfat.”

FFQ_FREQ: Daily intake frequency computed by NCI’s DietCalc software. For more detail about NCI's DietCalc
software, please see the NCI DietCalc page at http://riskfactor.cancer.gov/DHQ/dietcalc/.

FFQ_CODE: Average daily intake frequency imputation code. DietCalc imputed frequencies for some foods when
responses to FFQ questions were missing or when a scanning error was encountered. This code identifies whether
the average daily frequency value is a reported or an imputed value.
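
The look-up relationship for FFQ_VAR can be sketched in Python as below. Codes 7 and 8 and their descriptions come from the text above; the SEQN values, the frequencies, and the output field FFQ_VAR_DESC are invented for illustration, and the real VARLOOK file is not reproduced.

```python
# Two entries standing in for the VARLOOK look-up file.
VARLOOK = {7: "Soda in summer", 8: "Soda, rest of year"}

# Hypothetical rows shaped like the Average Daily Frequency Covariates File:
# several records per participant, one per variable ID.
ffq_daily = [
    {"SEQN": 201, "FFQ_VAR": 7, "FFQ_FREQ": 1.0},
    {"SEQN": 201, "FFQ_VAR": 8, "FFQ_FREQ": 0.5},
    {"SEQN": 202, "FFQ_VAR": 7, "FFQ_FREQ": 0.0},
]

def attach_descriptions(rows, lookup):
    """Copy each row and add the description linked to its FFQ_VAR code."""
    return [
        dict(row, FFQ_VAR_DESC=lookup.get(row["FFQ_VAR"], "unknown code"))
        for row in rows
    ]

labeled = attach_descriptions(ffq_daily, VARLOOK)

# Count records per participant to confirm the multiple-record structure.
records_per_person = {}
for row in labeled:
    records_per_person[row["SEQN"]] = records_per_person.get(row["SEQN"], 0) + 1
```

Because this file holds multiple records per participant, it usually has to be reshaped or summarized before it can be merged one-to-one with person-level files.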

IMPORTANT NOTE

For more detail on the food frequency variables, go to Information about Dietary Variables. Please note, however, that
the most complete descriptions of these variables can be found in the Analytic Notes section of the “Docs” files.

IMPORTANT NOTE

Food frequency data are part of the Dietary component of NHANES 2003-2004. Other variables necessary for your
analysis may be located in other parts of the dataset.

https://www.cdc.gov/nchs/tutorials/Dietary/Preparing/Locate/Info2.htm

12/19/2018 NHANES Dietary Web Tutorial: Locate Variables: Task 2

Task 2: How to Identify Food Frequency Variables and File Locations

The food frequency files are included in the list of Dietary data files. To identify variables of interest for your analysis and
their file locations, please refer to the Dietary Files variable list on the NHANES website. Most of the food frequency
variables have an “FFQ” prefix.

Step 1: Identify Food Frequency Variables and their File Locations

Using the NHANES website to the right, follow the directions below to identify food frequency variables and their file
locations:

From the NHANES homepage, click Questionnaires, Datasets and Related Documentation. From the Questionnaires,
Datasets and Related Documentation page, click NHANES 2003-2004.

On the NHANES 2003-2004 page, click the Dietary link, under “Data, Documentation, Codebooks, SAS Code.”

Click the Variable List link. An HTML file of the dietary variables opens. Use the Find feature (or Control + F) to find
"FFQ."

IMPORTANT NOTE
The “Docs,” or Documentation, link for each food frequency file contains the documentation, codebook, and frequencies
for all of the variables associated with that data file. You cannot determine whether a variable is appropriate to use in your
analysis without consulting the data file documentation because variable names may not convey enough information,
similar concepts may be captured by more than one variable, or questions may have changed between survey cycles.
You will use this information when downloading data files and documentation and when performing your analyses.

Read the documentation.

https://www.cdc.gov/nchs/tutorials/Dietary/Preparing/Locate/frame2.htm

12/19/2018 NHANES Dietary Web Tutorial:Locate Variables: Identify Dietary Supplement Variables and File Locations

Task 3: Key Concepts about Identifying Dietary Supplement Variables and
File Locations

NHANES dietary supplement variables cover items related to the use of vitamin, mineral, herbal, and other dietary
supplements. Some variables relate to the participants and the supplements they may have taken; other variables relate
to the supplements and their formulations and ingredients. These variables, and information about them, are stored within
the Dietary component, contained in five data files:

30-Day Dietary Supplement Use
File 1: Supplement Counts
File 2: Participant's Use of Supplement

Dietary Supplement Database
File 3: Dietary Supplement Product Information
File 4: Dietary Supplement Ingredient Information
File 5: Dietary Supplement Blend Information

Variables in File 1
Participant sequence number (SEQN)
Any dietary supplements taken in the past 30 days?
Total number of supplements taken
Any non-prescription antacids taken?
Total number of antacids taken

Variables in File 2
Participant sequence number (SEQN)
Supplement ID number
Supplement name
Was container seen?
Matching code
How long supplement taken (days)?
How often taken past month? (quantity)
How often taken past month? (unit)
How much taken each time? (quantity)
How much taken each time? (unit)
Antacid reported as a dietary supplement

Variables in File 3
Supplement ID number
Supplement name
Supplement information source
Formulation type
Serving size quantity
Serving size unit
Alternative serving size
Count of vitamins in supplement
Count of minerals in supplement
Count of amino acids in supplement
Count of botanicals in supplement
Count of other ingredients in supplement

Variables in File 4
Supplement ID number
Supplement name
Ingredient ID
Ingredient name
Ingredient operator (<, =, >)
Ingredient quantity
Ingredient unit
Ingredient category
Blend flag

Variables in File 5
Ingredient ID number
Ingredient name
Blend component ID
Blend component name
Blend component category

IMPORTANT NOTE

For more detail on the dietary supplement variables, go to Information about Dietary Variables. Please note, however,
that the most complete descriptions of these variables can be found in the Analytic Notes section of the “Docs” file. You
can find this link on the Dietary Supplement Use line of the 2003-2004 Dietary Files page.

You cannot determine whether a dietary supplement variable is appropriate to use in your analysis without consulting the
data file documentation. You will use this information when downloading data files and documentation and when
performing your analyses.

Read the documentation.

IMPORTANT NOTE

Dietary supplement data are part of the Dietary component of NHANES 2003-2004. Other variables necessary for your
analysis may be located in other parts of the dataset.

https://www.cdc.gov/nchs/tutorials/Dietary/Preparing/Locate/Info3.htm

12/19/2018 NHANES Dietary Web Tutorial: Locate Variables: Task 3

Task 3: How to Identify Dietary Supplement Variables and File Locations

Dietary supplement data were collected in the household interview, and these data are listed with the Dietary data. To
identify the variable names and file locations for the variables in your analysis, you will need to successfully navigate the
NHANES website.

Step 1: Identify Dietary Supplement Variables and their File Locations

Using the NHANES website to the right, follow the directions below to identify variables related to dietary supplement use:

From the NHANES homepage, click Questionnaires, Datasets and Related Documentation. From the Questionnaires,
Datasets and Related Documentation page, click NHANES 2003-2004. On the NHANES 2003-2004 page, click the
Dietary link, under “Data, Documentation, Codebooks, SAS Code.”

On the 2003-2004 Dietary page, click the Variable List link. Click on the Find feature (or Control + F). Type DSD (for
Dietary Supplement Data) in the box at the top of the browser window, and then click the Next button. The results will
show you all the variables associated with the dietary supplement data. Note that all of the dietary supplement variables
begin with DSD.

IMPORTANT NOTE

You can also find a listing of the dietary supplement variables in the documentation for the Dietary Supplement Use data.
On the NHANES 2003-2004 page, click the Dietary link, under “Data, Documentation, Codebooks, SAS Code.” Scroll
down to the 30-Day Dietary Supplement Use line. Click the “Docs” link to open the Dietary Supplement documentation.
In the Table of Contents, click the Data Files and Structure section.

https://www.cdc.gov/nchs/tutorials/Dietary/Preparing/Locate/frame3.htm

12/19/2018 NHANES Dietary Web Tutorial: Locate Variables: Task 4

Task 4: How to Identify Correct Sample Weights and File Locations

NHANES participants are asked to participate in a variety of survey components, some of which are completed with
subsamples of examinees. Several laboratory assays, nutrition, environmental health assessments, and mental health
interviews are completed on subsamples of examinees. (Please see the respective survey protocol/documentation for more
specific information.) Each component subsample has its own designated sample weight, which accounts for the additional
probability of selection into the subsample component, as well as the additional nonresponse.

Recall from the “Overview of NHANES Survey Design and Weights” module in the Survey Orientation course that the set
of all individuals who have nonzero values for a particular version of the sample weights constitutes a nationally
representative sample, so long as those sample weights are incorporated into statistical analyses. However, the construction
of sample weights does not take into account item non-response. Item non-response is described in Module 9, Task 1: Key
Concepts about Missing Data in NHANES.

The table below shows the versions of sample weights that have been constructed for individuals within different groups of
dietary data files. To produce estimates appropriately adjusted for survey non-response, it is important to check all of the
variables in your analysis and select the weight variable that applies to all members of the smallest analysis
subpopulation. The following table lists the types of sample weights by decreasing sample size.

Sample Weights and Data Files

Sample Size   Component                            Data File(s) Containing Weight Variable     Sample Weight Variable
Largest       Questionnaire, supplement use        Demographic                                 WTINT2YR
▼             MEC                                  Demographic                                 WTMEC2YR
▼             24-hour recall (Day 1)               Individual Foods; Total Nutrient Intakes    WTDRD1
▼             24-hour recall (Day 1 and 2)         Individual Foods; Total Nutrient Intakes    WTDR2D
Smallest      Special subsamples (e.g., fasting)   Varies                                      Varies, depending on the subsample

For example, if you wish to combine data from the Day 1 24-hour recall data and the supplement use section (collected
during the household interview), you must restrict your analysis to individuals with dietary recall data. To do this, you
designate that persons with a non-zero dietary recall Day 1 sample weight value (variable WTDRD1 in the Individual
Foods and Total Nutrient Intakes files) be included in the analysis. In doing so, you will be analyzing the supplement use
data for individuals who completed the first 24-hour recall.
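
The restriction described above can be sketched in SAS as follows. This is a minimal illustration on made-up records, not NHANES data: the dataset name RECALL_SUPP and the flag INDIET are hypothetical, and the weight values are invented. Only the variable names SEQN and WTDRD1 come from the tutorial.

```sas
/* Toy records: SEQN and WTDRD1 mimic NHANES variables, values are invented */
data recall_supp;
  input SEQN WTDRD1;
  datalines;
1 15432.7
2 0
3 .
4 28811.2
;
run;

/* Flag persons with a non-zero Day 1 dietary weight.          */
/* INDIET is a hypothetical name. Missing weights evaluate to 0 */
/* because a missing value is never greater than zero in SAS.   */
data recall_supp;
  set recall_supp;
  INDIET = (WTDRD1 > 0);
run;

/* Count how many records fall inside the analysis subpopulation */
proc freq data=recall_supp;
  tables INDIET;
run;
```

Creating a flag, rather than deleting records, keeps the full sample in the dataset, which is usually what survey procedures need when they estimate variances for a subpopulation.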

Examples

Other examples for selecting the correct sampling weights for your analysis are included below:

All of the variables in the analysis were collected in the in-home interview

Some of the variables in the analysis were collected in the MEC

Some of the variables in this analysis were collected in the MEC, but have special circumstances

Some of the variables in the analysis were collected as part of a subsample

Task 4: Key Concepts about Identifying Correct Sample Weights and File
Locations

A sample weight value is a measure of the number of people in the population represented by a specific sample person in
NHANES. Sample weights are constructed for each two-year survey cycle to account for over-sampling, survey non-
response, and post-stratification. Because not all sampled persons completed all portions of the survey, each individual
represented in a public release file may have different sample weights assigned, depending on the nature of the non-
response adjustments necessary. When using weights, it is important to select the correct weight variable (i.e. either from
the Demographic file or from one of the other data files), namely the variable that applies to all members of the smallest
analysis subpopulation.

IMPORTANT NOTE

Depending on the individuals with data for the particular variables you want to use in your dietary analysis, you will need to
choose from several different types of sample weights. Sample weights and information about the type to use for a
particular dataset are located within the files and documentation of that dataset.

NHANES Dietary Web Tutorial - Survey Orientation https://www.cdc.gov/nchs/tutorials/dietary/Preparing/Download/intro.htm

Download Data Files

Purpose

Throughout, this tutorial uses examples related to calcium and milk to illustrate the principles and instructions presented in
the modules. We have created three full programs for these examples, entitled “Milk,” “Food Sources,” and “Supplement.”
These programs encompass all the steps involved in preparing an analytic dataset and conducting the various analyses covered
in this tutorial. There is an additional fourth program, entitled “Outliers,” that is used as an example in Module 9. These
full programs can be found in the Additional Resources section of the tutorial. However, starting with this module and
continuing through the end of the course, shorter snippets of code can be found in most of the “How To” sections to illustrate
the topic being discussed.

To create and use these programs, you first need to download data files. NHANES dietary data files are available for
download from the website as SAS transport files (.XPT). The SAS transport format allows data files to be extracted on
Windows-, UNIX-, or Macintosh-based systems (this tutorial focuses only on Windows-based systems). To use these transport
files, you need to create folders in which to save them (Task 1), download the data files and documentation (Task 2), and then extract
and save NHANES data files in a SAS-accessible library (Task 3).

IMPORTANT NOTE
Although SAS is not required for mastering the content of this tutorial, it does need to be installed if you plan to download and
use the data files on your own computer to run the example code provided.

Task 1: Create Folders

You will create folders to save your data files, documentation, and extracted SAS datasets. This folder structure will help keep
your transport files and datasets organized.

Key Concepts about Creating Folders (/nchs/tutorials/Dietary/Preparing/Download/Info1.htm)
How to Create Folders (/nchs/tutorials/Dietary/Preparing/Download/task1.htm)

Task 2: Download Data Files and Documentation

You will need to download the NHANES data transport files to create your analytic dataset, and you will need the
documentation for reference as you conduct your analysis.

Key Concepts about Downloading Data Files and Documentation (/nchs/tutorials/Dietary/Preparing/Download/Info2.htm)
How to Download Data Files and Documentation (frame2.htm)

Task 3: Extract and Save NHANES Data Files in a SAS-Accessible Library

After downloading the SAS transport files, you will need to extract the data and save them as SAS datasets in a SAS-accessible
library. A library is a folder that you designate on your computer to store your SAS files. Transport files are not usable without
completing this task.

Key Concepts about Extracting and Saving NHANES Data Files in Permanent Libraries (/nchs/tutorials/Dietary/Preparing/Download/Info3.htm)
How to Extract and Save NHANES Data Files in a SAS-Accessible Library (/nchs/tutorials/Dietary/Preparing/Download/Task3.htm)

Page last updated: May 3, 2013
Page last reviewed: May 3, 2013
Content source: CDC/National Center for Health Statistics
Page maintained by: NCHS/NHANES

Task 1: Key Concepts about Creating Folders

In this task, you will create folders that will hold your SAS transport files and documentation, and your permanent SAS
dataset library. The folder structure recommended in this task will be used throughout the tutorial. If you decide to use a
different arrangement of folders, then you will need to update the sample programs by changing the path to the dataset.

You will need to include the path to the dataset you wish to use in every program. By creating a folder directly on the C:
drive, you won't have a long file path to type or remember.

Task 1: How to Create Folders

Here are the steps for creating folders:

Create an NHANES subfolder
Create a DOWNLOAD subfolder inside the NHANES folder for the documentation and transport files
Create a DATA subfolder inside the NHANES folder for the extracted SAS datasets

Screenshot of Suggested Folder Structure

Step 1: Create an NHANES subfolder

Go to your C: drive by opening Windows Explorer or the My Computer icon on your desktop. Create a new folder in your
C: drive and name it NHANES.

Step 2: Create a DOWNLOAD Folder

Double-click the NHANES folder and create the DOWNLOAD folder inside it. This folder will store your SAS transport files
and documentation. If you follow this structure, the path to your transport files and documentation will be
C:\NHANES\DOWNLOAD.

Step 3: Create a DATA folder

Return to the NHANES folder and create another folder inside it called DATA. This folder will store your extracted SAS
datasets and act as your permanent SAS library. If you follow this structure, the path to your library will be
C:\NHANES\DATA.
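
If you prefer to create the same folders from within SAS instead of Windows Explorer, a sketch along the following lines may work. It relies on the DCREATE function and assumes you have permission to write to the C: drive. The variable names ROOT, DL, and DT are hypothetical.

```sas
/* Create C:\NHANES with DOWNLOAD and DATA inside it.              */
/* DCREATE returns the new path, or a blank value if creation      */
/* fails (for example, because the folder already exists).         */
data _null_;
  length root dl dt $ 260;
  root = dcreate('NHANES', 'C:\');
  dl   = dcreate('DOWNLOAD', 'C:\NHANES');
  dt   = dcreate('DATA', 'C:\NHANES');
  put root= dl= dt=;   /* write the results to the SAS log */
run;
```

A blank value in the log for any of the three variables suggests that the folder was not created and should be checked by hand.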

Task 2: Key Concepts about Downloading Data Files and Documentation

You will need NHANES data files to create your analytic dataset, and you will need the documentation for reference as you
conduct your analysis. This is the case regardless of the types of dietary data you wish to use in your analysis.

NHANES data are saved in SAS transport files (.XPT). The SAS transport format allows the data files to be extracted on
Windows-, UNIX-, or Macintosh-based systems.

IMPORTANT NOTE

The steps for downloading data files and documentation are described in the “How To” section of this task. For the
purposes of this tutorial, it is only necessary to download the data files and documentation if you plan to run the sample
programs on your own computer.

Task 2: How to Download Data Files and Documentation

Downloading data files and documentation from the NHANES website requires the same steps no matter which files you
decide to use for your analysis. Generally, you will need to download:

SAS transport file of data,
Documentation, Codebook, and Frequency Tables.

IMPORTANT NOTE

The Documentation for the dietary recall files also contains the Codebook and Frequency tables. In earlier survey cycles, these items
were provided in separate files.

Step 1: Download SAS Transport File

Here, you are interested in the dietary recall data, so you will use the Total Nutrient Intakes – First Day file (DR1TOT_C) to
illustrate this step. However, you can apply these steps to any dietary data files. Remember that when doing Step 1 and
Step 2 on your own, you will need to download only the data files and documentation relevant to your analysis.

Begin at the NHANES homepage and click the Questionnaires, Data Sets and Related
Documentation link. Click the NHANES 2003-2004 link. Scroll down and click the appropriate link in the Data,
Documentation, Codebooks, SAS Code section. In this case, click the Dietary link. From the 2003-2004 Dietary page,
you can download the data file, codebook, documentation, and frequency tables for any publicly released item in the
Dietary data files. Scroll down to the Dietary Interview items and right-click the Data link to save (not open)
the SAS transport files to the designated folder on your computer.

To recreate the three example datasets, entitled “Milk,” “Food Sources,” and “Supplement,” that we will use throughout this
tutorial, you will need to download the NHANES data and documentation from the files listed in the table below. Note that
files that end in “_B” are from NHANES 2001-2002. Files that end in “_C” are from 2003-2004. Use the steps outlined
above to download these files.

Additional Data and Files Needed to Create Dataset for Tutorial

To create the Milk dataset, you will need the:
demographic file, DEMO_C; and the
dietary files, DR1TOT_C and DR1IFF_C.

To create the Food Sources dataset, you will need the:
demographic files, DEMO_B and DEMO_C; and the
dietary files, DRXIFF_B and DR1IFF_C.

To create the Supplement dataset, you will need the:
demographic file, DEMO_C;
questionnaire file, OSQ_C;
dietary files, DSQ1_C and DSQ2_C; and the
dietary supplement database files, DSBI, DSII, and DSPI.

IMPORTANT NOTE
For more information about USDA Food Files, please see the Resources for Food Intake Analysis module.

Step 2: Download Documentation, Codebook, and Frequency Tables

In NHANES 2003-2004, the survey documentation, codebook, and frequency tables are in an HTML file. Remaining on the
Dietary Interview (Total Nutrient Intakes – First Day) line:
Right-click the Docs link to download (not open) the documentation, codebook, and frequency tables to the DOWNLOAD
folder.
If you are downloading these documents from earlier survey cycles, repeat this process with the individual documentation,
codebook, and frequency tables files.

Task 3: Key Concepts about Extracting and Saving NHANES Data Files in a
SAS-Accessible Library

In the preceding task, you saved SAS transport files (.XPT) of NHANES data in your DOWNLOAD folder. The transport
files were created with the SAS XPORT engine. You will use the XPORT engine and the PROC COPY procedure to convert the
datasets from the SAS transport format to the standard SAS dataset format so that they are usable for analysis.

To extract the file, assign one library name (LIBNAME) to the transport file and another to the location where you want the
SAS-accessible data file to be saved. Include the XPORT engine keyword in the LIBNAME statement that points to the
transport file. Then use the PROC COPY procedure, naming the transport file's libname in the IN= option and the
SAS-accessible library's libname in the OUT= option.

After you've run the code, check your library (if you followed Task 1, this is C:\NHANES\DATA) to see that the dataset
is now there.

Task 3: How to Extract and Save NHANES Data Files in a SAS-Accessible
Library

Extracting a SAS transport file and saving it to a SAS-accessible library involves three steps:

Assign LIBNAME
Save to SAS-accessible library
Check results

Step 1: Assign LIBNAME

Assign a LIBNAME to each SAS transport file that you downloaded. In this example, the transport files, which will be used in
sample programs throughout the tutorial, are stored in the DOWNLOAD folder, created in Task 1 of this module. The
suffix “_b” denotes files from the 2001-2002 survey cycle and the suffix “_c” denotes files from
the 2003-2004 survey cycle.

The XPORT keyword in each LIBNAME statement tells SAS to read the transport file with the XPORT engine so that its data
can be extracted into a SAS-accessible format. Remember to surround the pathnames with quotation marks.

Finally, assign a LIBNAME (NH) to the C:\NHANES\DATA folder. This is where the permanent datasets will be saved on
your computer.

Program to Assign LIBNAMES

Sample Code

/* Transport files in the DOWNLOAD folder, read with the XPORT engine */
libname XPDB xport "c:\nhanes\download\demo_b.xpt";
libname XPDC xport "c:\nhanes\download\demo_c.xpt";
libname XPIB xport "c:\nhanes\download\drxiff_b.xpt";
libname XPTB xport "c:\nhanes\download\dr1tot_b.xpt";
libname XPTC xport "c:\nhanes\download\dr1tot_c.xpt";
libname XPIC xport "c:\nhanes\download\dr1iff_c.xpt";
libname XPOC xport "c:\nhanes\download\osq_c.xpt";
libname XPSA xport "c:\nhanes\download\dsq1_c.xpt";
libname XPSB xport "c:\nhanes\download\dsq2_c.xpt";
libname XPSC xport "c:\nhanes\download\DSPI.xpt";
libname XPSD xport "c:\nhanes\download\DSII.xpt";

/* Permanent library where the extracted SAS datasets will be saved */
libname NH "c:\nhanes\data";

Step 2: Save to a SAS-Accessible Library

Use the PROC COPY procedure to save the extracted datasets from the transport file to the C:\NHANES\DATA folder.
Use IN= to name the libref assigned to the transport file and OUT= to name the library where the
dataset will be saved on your computer in a SAS-accessible format.

This procedure must be run for each of the downloaded data files. The output folder is always the same but the input file
varies for each run.

Program to Save an Extracted Data File

Sample Code

/* Copy each transport file (IN=) into the permanent NH library (OUT=) */
proc copy in=XPDB out=NH;
run;

proc copy in=XPDC out=NH;
run;

proc copy in=XPIB out=NH;
run;

proc copy in=XPTB out=NH;
run;

proc copy in=XPTC out=NH;
run;

proc copy in=XPIC out=NH;
run;

proc copy in=XPOC out=NH;
run;

proc copy in=XPSA out=NH;
run;

proc copy in=XPSB out=NH;
run;

proc copy in=XPSC out=NH;
run;

proc copy in=XPSD out=NH;
run;
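
Because the same PROC COPY step is repeated for every transport file, the repetition could also be written as a short macro loop. The sketch below is one possible way to do this, not part of the tutorial's sample programs. It assumes the librefs from Step 1 (including NH) have already been assigned, and the macro name COPY_XPT is hypothetical.

```sas
/* Hypothetical helper: run PROC COPY once for each libref in a   */
/* space-separated list, always writing to the NH library.        */
%macro copy_xpt(librefs);
  %local i ref;
  %do i = 1 %to %sysfunc(countw(&librefs, %str( )));
    %let ref = %scan(&librefs, &i, %str( ));
    proc copy in=&ref out=NH;
    run;
  %end;
%mend copy_xpt;

/* Librefs assigned in Step 1 */
%copy_xpt(XPDB XPDC XPIB XPTB XPTC XPIC XPOC XPSA XPSB XPSC XPSD);
```

The result should be the same as running the individual steps. The loop simply makes it easier to add or remove files later.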

Step 3: Check Results

To check the results of your program, open Windows Explorer and go to your C:\NHANES\DATA folder. You should see the
extracted SAS datasets in that folder. These files are now saved to your computer.

You can also run the PROC CONTENTS procedure in SAS to check that each dataset contains the correct number of observations
and variables, based on the documentation.

Program to Check Datasets Contents

Sample Code

proc contents data="C:\NHANES\DATA\demo_b";
run;

proc contents data="C:\NHANES\DATA\demo_c";
run;

proc contents data="C:\NHANES\DATA\drxiff_b";
run;

proc contents data="C:\NHANES\DATA\dr1tot_b";
run;

proc contents data="C:\NHANES\DATA\dr1tot_c";
run;

proc contents data="C:\NHANES\DATA\dr1iff_c";
run;

proc contents data="C:\NHANES\DATA\osq_c";
run;

proc contents data="C:\NHANES\DATA\dsq1_c";
run;

proc contents data="C:\NHANES\DATA\dsq2_c";
run;

proc contents data="C:\NHANES\DATA\DSPI";
run;

proc contents data="C:\NHANES\DATA\DSII";
run;
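
For a quick side-by-side check of observation and variable counts, one option is to query the SAS dictionary tables instead of reading each PROC CONTENTS listing. This is a minimal sketch that assumes the NH libref from Step 1 is assigned in the current session.

```sas
/* List every dataset in the NH library with its number of         */
/* observations (NOBS) and variables (NVAR) for comparison with    */
/* the counts reported in the NHANES documentation.                */
proc sql;
  select memname, nobs, nvar
    from dictionary.tables
    where libname = 'NH'
    order by memname;
quit;
```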

Note that datasets saved in this SAS-accessible library remain available between SAS sessions. This is in contrast to a
temporary dataset created within a SAS session, which is deleted when the session ends.

NHANES Dietary Web Tutorial - Survey Orientation https://www.cdc.gov/nchs/tutorials/dietary/Preparing/MergeAppend/intro.htm

Merge & Append Datasets

Purpose

Once you have downloaded data files into a SAS-accessible library, you can begin to manipulate them for your particular
analysis. Typically, an NHANES dietary dataset used for analysis will include data from 2 or more years and variables from
more than one type of data file. You will need to merge the data to include variables from different dietary files (Task 1) and append
the data to combine the years of data (Task 2).

Task 1: Merge NHANES Data

Data files are generally organized by the following categories: Demographic, Dietary, Examination, Laboratory, and
Questionnaire. To allow for timelier releases, different data files are released at different times as they are completed.
Combining variables from these different data files into one dataset is called merging.

Key Concepts about Merging Data in NHANES (/nchs/tutorials/Dietary/Preparing/MergeAppend/Info1.htm)
How to Merge Data in NHANES (/nchs/tutorials/Dietary/Preparing/MergeAppend/Task1.htm)

Task 2: Append NHANES Data

NHANES data files are released for public use in 2-year cycles. You may wish to combine multiple years, add additional
observations, or combine different years of data files on the same variables. The process of combining years is called
appending. This is similar to adding rows to a table.

Key Concepts about Appending Data in NHANES (/nchs/tutorials/Dietary/Preparing/MergeAppend/Info2.htm)
How to Append Data in NHANES (/nchs/tutorials/Dietary/Preparing/MergeAppend/task2.htm)

Task 1: Key Concepts about Merging Data in NHANES

For each 2-year cycle, NHANES data files are organized into five types of files: Demographic, Dietary, Examination,
Laboratory, and Questionnaire. To allow for timelier releases, different data files are released at different times as they are
completed. Combining variables from these different data files in a dataset is called merging. This is similar to adding
columns to a table.

To merge data, the variables must be linked in terms of a unique identifier. Because almost all analyses are conducted with
individuals as the unit of analysis, the most frequently used unique identifier in NHANES is SEQN, the sequence number
that identifies each participant in the sample. Whenever you conduct an analysis with individuals, SEQN is the variable
you must use to merge data files.

In contrast, because the dietary supplement data files contain data about the supplements themselves as well as about
individuals, the unique identifier can be the SEQN, the supplement ID number, or the ingredient ID number, depending on
which files you wish to merge.

Before merging data, you need to sort each data file by the SEQN variable or other unique identifier. This will ensure that
all records are ordered in the same way in each data file. Use the PROC SORT procedure in SAS to sort the data. After
sorting the data files, you can continue merging.

An important point to remember is that dietary supplement data and dietary recall data in the Individual Foods Files contain
multiple records per person. This will become apparent when you check the results of your merge statements because
you may notice many more records than the number of people in your sample.

After you have merged the data files, it is advisable that you check the contents again to make sure that the files merged
correctly. Use the PROC CONTENTS statement to list all variable names and labels and use the PROC MEANS
statement to check the number of observations, as well as missing, minimum, and maximum values, for each variable.

Task 1: How to Merge NHANES Dietary Data

Here are the steps for merging NHANES dietary data:

Sort data files by unique identifier
Merge data by unique identifier
Check the results

Step 1: Sort Data Files by Unique Identifier

The first step in merging data is to sort each of the data files by a unique identifier. For example, if you wish to conduct an
analysis with dietary recall data, sort the data by SEQN. If you wanted to conduct an analysis of supplement data, sort
your data files by SEQN, supplement ID number, or ingredient ID depending on which supplement files you wish to merge
to construct your database.

Use the PROC SORT procedure in SAS to sort the data.

Step 2: Merge Data by Unique Identifier

After sorting the data files, you can continue merging the data using the MERGE statement. Remember that merging, as
well as sorting, is done using a unique identifier (the SEQN or other identifier).

A second important thing to remember when merging is to consider the number of records per person in each of the data
files you will be using. If you merge individual food or supplement files, remember that you will have multiple records per
person or supplement because of the nature of the data files. If you merge total nutrient intake or other individual-level
data files, you will have only one record per person. These different situations are demonstrated in the four examples
below.
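
The sort-then-merge pattern from Steps 1 and 2 can be sketched on small made-up datasets. Everything below is illustrative: the dataset names (DEMO, TOT, IFF, DEMO_TOT, DEMO_IFF) and the values are invented, and the variable names RIAGENDR, DR1TCALC, and DR1ICALC are assumed stand-ins that should be checked against the codebooks.

```sas
/* Toy person-level file: one record per SEQN (values invented) */
data demo;
  input SEQN RIAGENDR;
  datalines;
1 1
2 2
3 1
;
run;

/* Toy total-intake file: one record per SEQN */
data tot;
  input SEQN DR1TCALC;
  datalines;
1 850
2 1200
;
run;

/* Toy individual-foods file: several records per SEQN */
data iff;
  input SEQN DR1ICALC;
  datalines;
1 300
1 550
2 1200
;
run;

/* Step 1: sort every file by the unique identifier */
proc sort data=demo; by SEQN; run;
proc sort data=tot;  by SEQN; run;
proc sort data=iff;  by SEQN; run;

/* Step 2a: one-to-one merge, keeping persons with intake data.   */
/* The result has one record per person.                          */
data demo_tot;
  merge demo tot(in=in_tot);
  by SEQN;
  if in_tot;
run;

/* Step 2b: one-to-many merge. Demographic values repeat on each  */
/* food record, so the result has more records than persons.      */
data demo_iff;
  merge demo iff(in=in_iff);
  by SEQN;
  if in_iff;
run;
```

The IN= flags are one way to control which records are kept. Dropping the subsetting IF statement would instead keep every person from DEMO, with missing intake values where no dietary record exists.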

Step 3: Check the Results

After you have merged the data files, it is advisable that you check the contents again to make sure that the files merged
correctly. Use the PROC CONTENTS procedure to list all variable names and labels; use the PROC MEANS procedure to
check the number of observations for each variable as well as missing, minimum, and maximum values.

IMPORTANT NOTE

When you check your results, in situations when you are merging datasets with one record per person with datasets with
multiple records per person, you will find that the resulting number of records will be greater than the number of people in
your sample. This is to be expected.
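
A check along the lines of Step 3 might look like the following sketch. It uses a small invented dataset named MERGED (a hypothetical name) so that it can be run on its own. With real data you would point the procedures at your merged dataset instead.

```sas
/* Toy merged file with multiple records per person (values invented) */
data merged;
  input SEQN DR1ICALC;
  datalines;
1 300
1 550
2 1200
3 .
;
run;

/* List variable names and labels in dataset order */
proc contents data=merged varnum;
run;

/* Number of non-missing and missing values, minimum, and maximum */
proc means data=merged n nmiss min max;
run;

/* Compare the number of records with the number of distinct persons */
proc sql;
  select count(*) as n_records,
         count(distinct SEQN) as n_persons
    from merged;
quit;
```

When a one-to-many merge has worked as intended, the record count is expected to exceed the person count, as described in the note above.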

Examples

The examples provided in the links below demonstrate different scenarios that you may encounter when merging dietary
data files. These scenarios use two of the tutorial’s three sample programs – Milk and Supplement – and show how to
merge data depending on the number of records per person in the data file, as follows:

One-to-one merge using SEQN as the unique identifier (Example 1)
One-to-many merge using SEQN as the unique identifier (Example 2)
One-to-many merge using supplement ID number as the unique identifier (Example 3)
Many-to-many merge using supplement ID number as the unique identifier (Example 4)

Task 2: Key Concepts about Appending Data in NHANES

Each 2-year cycle of NHANES and any combination of 2-year cycles is a nationally representative sample. However, in
some situations, such as when estimating the average serving size of a rarely consumed food, the sample for a single 2-
year cycle is too small to produce statistically reliable estimates. The NHANES sample design makes it possible to
combine data from multiple survey cycles to increase the sample size for an analysis. Increased sample size improves the
statistical power, reliability, and stability of estimates for population sub-domains including racial and ethnic groups and
results for rare events.

The process of combining data for multiple survey cycles or years is called appending. This is similar to adding rows to a
table.

Always check the contents of each data file before appending the data files because some components or questions are
not collected in every survey cycle. For example, food frequency data are collected only in the 2003-04 and 2005-06
cycles. In addition, variable names may be different from cycle to cycle, and recoded or derived variables may be added in
different cycles.

If the names or labels of the variables of interest are identical in the selected cycles, you can append the data files
directly.

If the variables of interest have changed, you will need to evaluate the differences in the wording of the question,
definitions, and response choices that were used during data collection. You may need to recode the variables
before the files can be appended.

NHANES adds or deletes survey items from time to time. If the added or deleted variables are not relevant to your
analysis, you can simply append the data files as described and use only the variables of interest for your analysis. The
extra variables will not affect your analysis if you do not include them in your dataset.
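
As a rough illustration of renaming before appending, the sketch below stacks two made-up files in which the same measure is stored under different names. The dataset names (TOT_B, TOT_C, TOT_BC), the common name CALC, and all values are hypothetical. The variable names DRXTCALC and DR1TCALC are assumed examples and should be verified in the documentation for each cycle.

```sas
/* Toy 2001-2002 file: calcium stored under an older name (values invented) */
data tot_b;
  input SEQN DRXTCALC;
  datalines;
101 900
102 640
;
run;

/* Toy 2003-2004 file: same measure under a newer name */
data tot_c;
  input SEQN DR1TCALC;
  datalines;
201 1010
202 770
;
run;

/* Keep only the variables of interest (always including SEQN),    */
/* rename both calcium variables to a common name, then append.    */
/* KEEP= is applied before RENAME=, so it uses the original names. */
data tot_bc;
  set tot_b(keep=SEQN DRXTCALC rename=(DRXTCALC=CALC))
      tot_c(keep=SEQN DR1TCALC rename=(DR1TCALC=CALC));
run;
```

Renaming is only appropriate once you have confirmed that the two variables measure the same thing in the same units.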

IMPORTANT NOTE

When extracting variables from an NHANES data file or appending NHANES data you should always include the SEQN
variable, which is the unique identifier for each participant in NHANES. Failing to include this variable in your dataset will
lead to problems when you sort or merge your data files at a later time.

When combining two or more 2-year cycles of the continuous NHANES for NHANES 2001-2002 and beyond, you must
create a new weight variable, by summing rescaled versions of the existing weight variables, before beginning any
analyses. When survey cycles are combined, the estimates weighted with the new variable will be representative of the
population at the midpoint of the combined survey period. The new weight variable simply rescales the values of the
weight variables from the separate cycles so that the sum of the new weights matches the survey population size at the
midpoint of that period.
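
The rescaling can be sketched as follows for two combined cycles. The halving of each 2-year weight and the SDDSRVYR cycle codes (2 for 2001-2002, 3 for 2003-2004) are assumptions here that should be confirmed in the NHANES analytic guidelines. The dataset name DEMO_BC, the new variable MEC4YR, and all values are invented for illustration.

```sas
/* Toy appended demographic file covering two cycles (values invented) */
data demo_bc;
  input SEQN SDDSRVYR WTMEC2YR;
  datalines;
101 2 20000
102 2 35000
201 3 18000
202 3 41000
;
run;

/* Assumption: each cycle covers half of the 4-year period, so    */
/* each 2-year weight is halved to form the combined weight.      */
data demo_bc;
  set demo_bc;
  if SDDSRVYR in (2, 3) then MEC4YR = WTMEC2YR / 2;
run;

/* The combined weights should sum to roughly the population size */
/* at the midpoint of the period, not to twice that size.         */
proc means data=demo_bc sum;
  var WTMEC2YR MEC4YR;
run;
```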

When combining data cycles, it is extremely important to:

1. verify that data items collected in all combined years are comparable in wording and methods, and
2. select the same type of sample weight from each cycle when constructing the new weight variable in the combined
data set.

For more information about determining the compatibility of datasets, please see the Locate Variables and Structure &
Contents modules.

IMPORTANT NOTE

Because the data collection protocol changed significantly in 2002, it is recommended that you not combine dietary recall
data from survey cycles before 2001-2002 with data from subsequent cycles.

After appending the data, you will need to check the results. You should check that all your variables of interest were
included and that any variables you renamed or recoded are correct and include all the years of data.

Task 2: How to Append NHANES Dietary Data

Here are the steps for appending NHANES data:

Compare variable names and labels
Append directly, if variables are identical
Rename and/or recode variables before appending, if variables are different
Construct weights for NHANES analyses across multiple survey cycles
Check the results

Step 1: Compare Variable Names and Labels

The first step before appending data is to examine the contents of the data files. Using the PROC CONTENTS procedure,
you can get a list of variable names and variable labels for each data file selected. While reviewing the output of the PROC
CONTENTS procedure, you should compare variable names and labels to see whether any changes or differences
occurred from cycle to cycle.

The example below uses the sample "Food Sources" program. Notice that the variable labels for “Calcium (mg)” are the
same in 2001-2002 and 2003-2004, but the variable names are different. Additionally, a comparison of the
documentation for vitamins A and E between 2001-2002 and 2003-2004 shows that although the variable names remain
the same, the units of measure are different. Only careful examination of the documentation would allow you to detect
this change. Before appending, check that variable names, labels, and units of measure are consistent between
datasets.

Program to Check Datasets' Contents and Compare Variable Names and Labels

Sample Code

*---------------------------------------------------------;
* Use the LIBNAME statement to refer to the folder where ;
* the data files are stored. ;
*;
* Use the PROC CONTENTS procedure to list the contents ;
* of each dataset: ;
*   2001-2002 Dietary Interview (Individual Foods File), ;
*     Examination File ;
*   2003-2004 Dietary Interview (Individual Foods File), ;
*     Examination File ;
*   2001-2002 Demographic File ;
*   2003-2004 Demographic File ;
*;
* Use the VARNUM option to list the variables according ;
* to their position in the dataset. ;
*---------------------------------------------------------;

libname NH "C:\NHANES\DATA" ;
proc contents data=NH.DRXIFF_B varnum ;
proc contents data=NH.DR1IFF_C varnum ;
proc contents data=NH.DEMO_B varnum ;
proc contents data=NH.DEMO_C varnum ;
run ;
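
When the variable lists are long, comparing two PROC CONTENTS listings by eye is error-prone. The sketch below is one possible way to automate the name comparison; it is not part of the "Food Sources" program. It builds two small stand-in datasets (FOODS_B and FOODS_C, invented for illustration, not NHANES records), writes each variable list to a dataset with the OUT= option, and flags names that occur in only one file. The dataset names VARS_B, VARS_C, and NAME_DIFF and the variable SOURCE are assumptions made for this example.

```sas
* Toy stand-ins for two survey cycles with invented values ;
data FOODS_B;
   input SEQN DRXICALC;
   label DRXICALC = "Calcium (mg)";
   datalines;
1 300
2 250
;
run;

data FOODS_C;
   input SEQN DR1ICALC;
   label DR1ICALC = "Calcium (mg)";
   datalines;
3 410
4 120
;
run;

* Write each variable list to a dataset instead of the listing ;
proc contents data=FOODS_B out=VARS_B (keep=NAME LABEL) noprint;
run;
proc contents data=FOODS_C out=VARS_C (keep=NAME LABEL) noprint;
run;

proc sort data=VARS_B; by NAME; run;
proc sort data=VARS_C; by NAME; run;

* Flag variables that appear in only one of the two files ;
data NAME_DIFF;
   merge VARS_B (in=inB rename=(LABEL=LABEL_B))
         VARS_C (in=inC rename=(LABEL=LABEL_C));
   by NAME;
   length SOURCE $ 12;
   if inB and inC then SOURCE = "both";
   else if inB then SOURCE = "2001-2002";
   else SOURCE = "2003-2004";
run;

proc print data=NAME_DIFF noobs;
run;
```

A listing like this only reveals name and label differences. A change in units of measure under an unchanged name still has to be found in the documentation.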


IMPORTANT NOTE

Most dietary variables from 2001-2002 begin with the prefix DRXT, and most dietary variables from 2003-2004 begin
with the prefix DR1T (for Day 1 data) or DR2T (for Day 2 data). Because these variables are continuous (as opposed to
categorical), you can simply rename them to make them identical.

Step 2: Append Directly, If Variables are Identical

After carefully reviewing the demographic files, you will find that the variables of interest in the two cycles remain the same.
Therefore, you can directly append without any further changes.

Because you are interested only in a subset of the variables, you can use the KEEP= data set option to select the
relevant variables.

IMPORTANT NOTE

When appending NHANES data you should always include the sequence number (SEQN). Failing to do so will lead to
problems if you want to sort or merge your data files at a later time.

As a reminder, the sample code below is taken from the "Food Sources" program. No output is associated with this
procedure, so you will need to check the SAS log file to make sure that the procedure was completed successfully.
Additionally, you can use SAS Explorer to see that the new 4-year dataset (DEMO_4YR) is in your WORK library, which is
the default temporary library created for each SAS session. This library is deleted when the SAS session is complete. (To
find out how to save the dataset to a SAS-accessible library, see the Save a Dataset module.)

Program to Directly Append Datasets

Sample Code

*-------------------------------------------------------------------------;
* The DATA step creates a dataset for your 4 years of demographic data ;
* (DEMO_4YR). ;
*;
* The SET statement appends the 2003-2004 demographic data file ;
* (NH.DEMO_C) to the 2001-2002 demographic data file (NH.DEMO_B). ;
*;
* The KEEP statement selects the variables of interest. Notice that ;
* in the keep statement, the variable, sequence number (SEQN) is ;
* included. This variable should be included when datasets are appended. ;
*;
* The SDMVPSU and SDMVSTRA variables are included in the dataset in order ;
* to incorporate survey design information in later analyses. ;
*;
* Note that WTMEC2YR is the weight variable for all persons examined in ;
* the MEC and is appropriate for use with dietary recall data. Weights ;
* must be used in order for your analysis to be generalizable to the ;
* total population. For more information on weighting, see the Overview ;
* of NHANES Survey Design and Weights module in the NHANES Dietary Data ;
* Survey Orientation Course. ;
*-------------------------------------------------------------------------;

data DEMO_4YR;
    set NH.DEMO_B (keep=SEQN RIDAGEYR SDMVPSU SDMVSTRA)
        NH.DEMO_C (keep=SEQN RIDAGEYR SDMVPSU SDMVSTRA);
run ;


Step 3: Rename Variables and/or Recode Variables Before Appending, If Variables are Different

If the variables in your datasets differ, you will need to rename and/or recode them before you append them. For example,
the 2001-2002 total nutrient intake files contain variables that were renamed in 2003-2004. Therefore, if you append files
from these survey cycles, you will need to rename the variables first and then append the data. If the response categories
of the variables are different, you will also need to recode them.

You will see in the sample code from the "Food Sources" program that the variables DRDDRSTZ, DRXICALC, and
DRDIFDCD in the 2001-2002 individual food file were renamed to DR1DRSTZ, DR1ICALC, and DR1IFDCD, respectively,
the same as the variable names in the 2003-2004 data file. After renaming the 2001-2002 variables, you will be ready to
append the data files with selected variables of interest.

Program to Rename Variables and Append

Sample Code

*-------------------------------------------------------------------------;
* The DATA step creates the dataset for your 4 years of dietary data ;
* (IFF_4YR). ;
*;
* The KEEP= option includes only variables of interest in your dataset. ;
*;
* The SET statement appends the 2003-2004 dietary nutrient data file ;
* (NH.DR1IFF_C) to the 2001-2002 dietary nutrient data file (NH.DRXIFF_B).;
*;
* The RENAME= option renames the variables DRDDRSTZ, DRXICALC, and ;
* DRDIFDCD in the 2001-2002 dietary nutrient data file to DR1DRSTZ, ;
* DR1ICALC, and DR1IFDCD, which are the names given to the same variables ;
* in the 2003-2004 dietary nutrient data file. ;
*-------------------------------------------------------------------------;

data IFF_4YR (keep=DR1IFDCD WTDRD1 DR1ICALC SEQN DR1DRSTZ);
    set NH.DRXIFF_B (rename=(DRDDRSTZ=DR1DRSTZ DRXICALC=DR1ICALC
                             DRDIFDCD=DR1IFDCD))
        NH.DR1IFF_C;
run ;

No output is associated with this procedure, so you will need to check the SAS log file to make sure that the procedure
completed successfully. Additionally, you can use SAS Explorer to see that the new 4-year dataset (IFF_4YR) is in your
WORK library.
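
Renaming is enough for continuous variables, but Step 3 also calls for recoding when response categories differ. The following is a minimal sketch of that case on invented data. The variable names USE_OLD and USE and their codes are hypothetical and do not correspond to any actual NHANES item. The older file is assumed to code the item as 1 = never, 2 = sometimes, 3 = often, and the newer file as 0 = never, 1 = ever.

```sas
* Hypothetical item with different response codes in each cycle ;
data OLD_CYCLE;
   input SEQN USE_OLD;
   datalines;
1 1
2 2
3 3
4 .
;
run;

data NEW_CYCLE;
   input SEQN USE;
   datalines;
5 0
6 1
;
run;

* Append the files and map the old codes onto the new scheme ;
data USE_4YR (drop=USE_OLD);
   set OLD_CYCLE (in=inOld)
       NEW_CYCLE;
   if inOld then do;
      if USE_OLD = 1 then USE = 0;
      else if USE_OLD in (2, 3) then USE = 1;
      else USE = .;
   end;
run;

* Cross-check the recoded values ;
proc freq data=USE_4YR;
   tables USE / missing;
run;
```

Every branch assigns USE explicitly, including the missing case, because variables read by a SET statement keep their value from one observation to the next unless they are reassigned. Whether two sets of response categories can legitimately be collapsed this way depends on the question wording in each cycle.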

Step 4: Construct Weights for NHANES Analyses across Multiple Survey Cycles

In general, when combining multiple survey cycles, the basic sample weight variable for each cycle should be divided by
the number of cycles in the combined data set. Then, these rescaled weights can be summed to form a new weight for the
combined survey cycles. The following examples show how to construct weights for multiple survey cycles for NHANES
2001-2002 and beyond.
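
As a minimal sketch of this rule, the program below applies it to four invented records. The dataset names DEMO_TOY and DEMO_WT are assumptions for illustration. The weights are made-up values, and SDDSRVYR is taken to be 2 for 2001-2002 and 3 for 2003-2004, consistent with the sample code in this step.

```sas
* Invented demographic records from two cycles ;
data DEMO_TOY;
   input SEQN SDDSRVYR WTMEC2YR;
   datalines;
1 2 1000
2 2 3000
3 3 2000
4 3 4000
;
run;

* Two cycles are combined so each 2-year weight is divided by 2 ;
data DEMO_WT;
   set DEMO_TOY;
   if SDDSRVYR in (2, 3) then MEC4YR = WTMEC2YR / 2;
run;

* Compare the weight totals by cycle before and after rescaling ;
proc means data=DEMO_WT sum maxdec=0;
   class SDDSRVYR;
   var WTMEC2YR MEC4YR;
run;
```

In this toy file the 2-year weights sum to 4,000 and 6,000 in the two cycles, and the rescaled weights sum to 5,000 overall, which is the average of the two cycle totals. To run the same statements on real data, the appended demographic dataset would need to keep SDDSRVYR and WTMEC2YR.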

Combining 2001-2002 and 2003-2004 to Produce a 4-Year Dataset

For 4 years of data from 2001-2004, construct a weight variable as follows:

Sample Code

if SDDSRVYR=2 or SDDSRVYR=3 then MEC4YR = WTMEC2YR/2;

Combining 2001-2002, 2003-2004, and 2005-2006 to Obtain 6 Years of Data

For 6 years of data from 2001-2006, construct a weight variable as follows:

Sample Code

if SDDSRVYR in (2,3,4) then MEC6YR = WTMEC2YR/3;

IMPORTANT NOTE

Certain survey components were completed on subsamples, which have subsample sample weights. Subsample weights
are not designed to be combined. In fact, many subsamples are mutually exclusive. If it is necessary to combine two or
more subsamples for your analyses, then appropriate weights would need to be recalculated. However, details on how to
recalculate weights when combining subsamples are beyond the scope of this tutorial. Therefore, it is strongly advised
that you do not attempt to combine subsamples in any analysis.

Step 5: Check Results

After appending the data files, it is a good idea to check the contents again to make sure that the files were appended
correctly. Use the PROC CONTENTS procedure, as demonstrated in Step 1, to check the combined files. Consult the
Program to Check Datasets' Contents and Compare Variable Names and Labels, above, for further instruction, if
necessary.

Double check variable names and labels, and make sure that variables are renamed correctly. Pay special attention to the
number of observations in the combined dataset, which should be the sum of the observations in the two data files.
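
One way to carry out these checks is sketched below on two invented files. The names FOODS_B, FOODS_C, and BOTH and the helper variable CYCLE are assumptions for illustration and are not part of the sample program. The IN= option records which input file each row came from, so the row count can be checked per cycle as well as overall.

```sas
* Invented individual food records, already renamed to match ;
data FOODS_B;
   input SEQN DR1ICALC;
   datalines;
1 300
1 120
2 250
;
run;

data FOODS_C;
   input SEQN DR1ICALC;
   datalines;
3 410
4 .
;
run;

* CYCLE is a helper variable that labels the source file ;
data BOTH;
   set FOODS_B (in=inB)
       FOODS_C;
   length CYCLE $ 9;
   if inB then CYCLE = "2001-2002";
   else CYCLE = "2003-2004";
run;

* Variable names and labels of the combined file ;
proc contents data=BOTH varnum;
run;

* N plus NMISS per cycle should equal the rows of each input file ;
proc means data=BOTH n nmiss min max;
   class CYCLE;
   var DR1ICALC;
run;
```

Here the combined file should contain 5 observations, 3 from the first file and 2 from the second. A variable that is entirely missing for one cycle in the PROC MEANS output usually points to a rename that was overlooked.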


https://www.cdc.gov/nchs/tutorials/Dietary/Preparing/MergeAppend/task2.htm 4/4

NHANES Dietary Web Tutorial - Survey Orientation https://www.cdc.gov/nchs/tutorials/dietary/Preparing/MergeAppend/intro.htm

Merge & Append Datasets

Purpose

Once you have downloaded data files into a SAS-accessible library, you can begin to manipulate them for your particular
analysis. Typically, an NHANES dietary dataset used for analysis will include data from 2 or more years and variables from
more than one type of data file. You will need to merge the data to include variables from different dietary files and append
the data to combine the years of data.

Task 1: Merge NHANES Data

Data files are generally organized by the following categories: Demographic, Dietary, Examination, Laboratory, and
Questionnaire. To allow for timelier releases, different data files are released at different times as they are completed.
Combining variables from these different data files into one dataset is called merging.

Key Concepts about Merging Data in NHANES (/nchs/tutorials/Dietary/Preparing/MergeAppend/Info1.htm)
How to Merge Data in NHANES (/nchs/tutorials/Dietary/Preparing/MergeAppend/Task1.htm)

Task 2: Append NHANES Data

NHANES data files are released for public use in 2-year cycles. You may wish to combine multiple years, add additional
observations, or combine different years of data files on the same variables. The process of combining years is called
appending. This is similar to adding rows to a table.

Key Concepts about Appending Data in NHANES (/nchs/tutorials/Dietary/Preparing/MergeAppend/Info2.htm)
How to Append Data in NHANES (/nchs/tutorials/Dietary/Preparing/MergeAppend/task2.htm)

Page last updated: May 3, 2013
Page last reviewed: May 3, 2013
Content source: CDC/National Center for Health Statistics
Page maintained by: NCHS/NHANES


12/19/2018 NHANES Dietary Web Tutorial: Merge & Append Datasets: Merge Data


Task 1: Key Concepts about Merging Data in NHANES

For each 2-year cycle, NHANES data files are organized into five types of files: Demographic, Dietary, Examination,
Laboratory, and Questionnaire. To allow for timelier releases, different data files are released at different times as they are
completed. Combining variables from these different data files in a dataset is called merging. This is similar to adding
columns to a table.

To merge data, the files must be linked by a unique identifier. Because almost all analyses are conducted with
individuals as the unit of analysis, the most frequently used unique identifier in NHANES is SEQN, the sequence number
that identifies each participant in the sample. Whenever you conduct an analysis with individuals, SEQN is the variable
you must use to merge data files.

In contrast, because the dietary supplement data files contain data about the supplements themselves as well as about
individuals, the unique identifier can be the SEQN, the supplement ID number, or the ingredient ID number, depending on
which files you wish to merge.

Before merging data, you need to sort each data file by the SEQN variable or other unique identifier. This will ensure that
all records are ordered in the same way in each data file. Use the PROC SORT procedure in SAS to sort the data. After
sorting the data files, you can continue merging.
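
The sort-then-merge sequence can be sketched as follows. The datasets DEMO_TOY and TOTALS_TOY hold invented values with one record per person and are stand-ins for a demographic file and a total nutrient intake file. They are assumptions for illustration, not NHANES data.

```sas
* Invented files with one record per person in unsorted order ;
data DEMO_TOY;
   input SEQN RIDAGEYR;
   datalines;
3 52
1 34
2 8
;
run;

data TOTALS_TOY;
   input SEQN DR1TCALC;
   datalines;
2 900
3 640
1 1100
;
run;

* Sort both files by the unique identifier ;
proc sort data=DEMO_TOY;
   by SEQN;
run;

proc sort data=TOTALS_TOY;
   by SEQN;
run;

* One-to-one merge that adds columns for each participant ;
data MERGED;
   merge DEMO_TOY TOTALS_TOY;
   by SEQN;
run;

proc print data=MERGED noobs;
run;
```

If either file is not sorted by SEQN, the MERGE step with a BY statement stops with an error in the SAS log rather than silently mismatching records.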

An important point to remember is that dietary supplement data and dietary recall data in the Individual Foods Files contain
multiple records per person. This will become apparent when you check the results of your merge statements because
you may notice many more records than the number of people in your sample.

After you have merged the data files, it is advisable that you check the contents again to make sure that the files merged
correctly. Use the PROC CONTENTS procedure to list all variable names and labels and use the PROC MEANS
procedure to check the number of observations, as well as missing, minimum, and maximum values, for each variable.


https://www.cdc.gov/nchs/tutorials/Dietary/Preparing/MergeAppend/Info1.htm 1/1

12/19/2018 NHANES Dietary Web Tutorial: Merge & Append Dataset: Task 1


Task 1: How to Merge NHANES Dietary Data

Here are the steps for merging NHANES dietary data:

Sort data files by unique identifier
Merge data by unique identifier
Check the results

Step 1: Sort Data Files by Unique Identifier

The first step in merging data is to sort each of the data files by a unique identifier. For example, if you wish to conduct an
analysis with dietary recall data, sort the data by SEQN. If you want to conduct an analysis of supplement data, sort
your data files by SEQN, supplement ID number, or ingredient ID number, depending on which supplement files you wish
to merge to construct your dataset.

Use the PROC SORT procedure in SAS to sort the data.

Step 2: Merge Data by Unique Identifier

After sorting the data files, you can continue merging the data using the MERGE statement. Remember that merging, as
well as sorting, is done using a unique identifier (the SEQN or other identifier).

A second important thing to remember when merging is to consider the number of records per person in each of the data
files you will be using. If you merge individual food or supplement files, remember that you will have multiple records per
person or supplement because of the nature of the data files. If you merge total nutrient intake or other individual-level
data files, you will have only one record per person. These different situations are demonstrated in the four examples
below.
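
Before turning to the tutorial's own examples, here is a minimal sketch of a one-to-many merge on invented data. DEMO_TOY has one record per person and FOODS_TOY has one record per reported food. Both datasets, the food codes, and the intake values are placeholders for illustration. The IN= flags are one possible way to keep only persons who appear in both files.

```sas
* One record per person ;
data DEMO_TOY;
   input SEQN RIDAGEYR;
   datalines;
1 34
2 8
3 52
;
run;

* One record per reported food with placeholder food codes ;
data FOODS_TOY;
   input SEQN DR1IFDCD DR1ICALC;
   datalines;
1 11111000 290
1 51101000 30
2 11111000 145
;
run;

proc sort data=DEMO_TOY;
   by SEQN;
run;

proc sort data=FOODS_TOY;
   by SEQN;
run;

* Age is repeated on every food record of the same person ;
data FOODS_DEMO;
   merge DEMO_TOY (in=inDemo)
         FOODS_TOY (in=inFood);
   by SEQN;
   if inDemo and inFood;
run;

proc print data=FOODS_DEMO noobs;
run;
```

The result has three records for two persons. The participant with SEQN 3 has no food records and is dropped by the subsetting IF statement. Without that statement, the participant would be kept with missing food variables.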

Step 3: Check the Results

After you have merged the data files, it is advisable that you check the contents again to make sure that the files merged
correctly. Use the PROC CONTENTS procedure to list all variable names and labels; use the PROC MEANS procedure to
check the number of observations for each variable as well as missing, minimum, and maximum values.

IMPORTANT NOTE

When you check your results, in situations when you are merging datasets with one record per person with datasets with
multiple records per person, you will find that the resulting number of records will be greater than the number of people in
your sample. This is to be expected.
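
To see the difference between records and persons directly, you can count both. The sketch below uses an invented merged file, FOODS_DEMO_TOY, which is an assumption for illustration. PROC MEANS reports the number of observations, missing values, minimum, and maximum, and PROC SQL counts records and distinct SEQN values.

```sas
* Invented result of a one-to-many merge ;
data FOODS_DEMO_TOY;
   input SEQN RIDAGEYR DR1ICALC;
   datalines;
1 34 290
1 34 30
2 8 145
2 8 .
3 52 60
;
run;

* Observations and missing values for each variable ;
proc means data=FOODS_DEMO_TOY n nmiss min max;
   var RIDAGEYR DR1ICALC;
run;

* Number of records compared with number of persons ;
proc sql;
   select count(*) as N_RECORDS,
          count(distinct SEQN) as N_PERSONS
   from FOODS_DEMO_TOY;
quit;
```

For this toy file the query reports 5 records for 3 persons. Person-level summaries such as mean age should therefore be computed from a file with one record per person, not from the merged food-level file.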

Examples

The examples provided in the links below demonstrate different scenarios that you may encounter when merging dietary
data files. These scenarios use two of the tutorial’s three sample programs – Milk and Supplement – and show how to
merge data depending on the number of records per person in the data file, as follows:

One-to-one merge using SEQN as the unique identifier (Example 1)
One-to-many merge using SEQN as the unique identifier (Example 2)
One-to-many merge using supplement ID number as the unique identifier (Example 3)
Many-to-many merge using supplement ID number as the unique identifier (Example 4)


https://www.cdc.gov/nchs/tutorials/Dietary/Preparing/MergeAppend/Task1.htm 2/2

12/19/2018 NHANES Dietary Web Tutorial: Merge & Append Datasets:Append Data


Task 2: Key Concepts about Appending Data in NHANES

Each 2-year cycle of NHANES and any combination of 2-year cycles is a nationally representative sample. However, in
some situations, such as when estimating the average serving size of a rarely consumed food, the sample for a single 2-
year cycle is too small to produce statistically reliable estimates. The NHANES sample design makes it possible to
combine data from multiple survey cycles to increase the sample size for an analysis. Increased sample size improves the
statistical power, reliability, and stability of estimates for population sub-domains including racial and ethnic groups and
results for rare events.

The process of combining data for multiple survey cycles or years is called appending. This is similar to adding rows to a
table.

Always check the contents of each data file before appending the data files because some components or questions are
not collected in every survey cycle. For example, food frequency data are collected only in the 2003-04 and 2005-06
cycles. In addition, variable names may be different from cycle to cycle, and recoded or derived variables may be added in
different cycles.

If the names and labels of the variables of interest are identical in the selected cycles, you can append the data files
directly.

If the variables of interest have changed, you will need to evaluate the differences in the wording of the question,
definitions, and response choices that were used during data collection. You may need to recode the variables
before the files can be appended.

NHANES adds or deletes survey items from time to time. If the added or deleted variables are not relevant to your
analysis, you can simply append the data files as described and use only the variables of interest for your analysis. The
extra variables will not affect your analysis if you do not include them in your dataset.
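
The sketch below illustrates what happens to an item that exists in only one cycle. The datasets CYCLE_OLD and CYCLE_NEW and the variable EXTRA_ITEM are hypothetical and contain invented values. After a direct append, the extra variable is simply missing for every record from the cycle that did not collect it. The KEEP= option leaves it out altogether.

```sas
* The older file lacks the item that was added later ;
data CYCLE_OLD;
   input SEQN RIDAGEYR;
   datalines;
1 34
2 8
;
run;

data CYCLE_NEW;
   input SEQN RIDAGEYR EXTRA_ITEM;
   datalines;
3 52 1
4 19 2
;
run;

* Direct append where EXTRA_ITEM is missing for the older records ;
data APPENDED;
   set CYCLE_OLD CYCLE_NEW;
run;

proc means data=APPENDED n nmiss;
   var RIDAGEYR EXTRA_ITEM;
run;

* Keep only the variables of interest so the extra item is dropped ;
data APPENDED_SLIM;
   set CYCLE_OLD (keep=SEQN RIDAGEYR)
       CYCLE_NEW (keep=SEQN RIDAGEYR);
run;
```

In the PROC MEANS output, EXTRA_ITEM should show 2 nonmissing and 2 missing values, while RIDAGEYR is complete for all 4 records.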

IMPORTANT NOTE

When extracting variables from an NHANES data file or appending NHANES data you should always include the SEQN
variable, which is the unique identifier for each participant in NHANES. Failing to include this variable in your dataset will
lead to problems when you sort or merge your data files at a later time.

When combining two or more 2-year cycles of the continuous NHANES for NHANES 2001-2002 and beyond, you must
create a new weight variable, by summing rescaled versions of the existing weight variables, before beginning any
analyses. When survey cycles are combined, the estimates weighted with the new variable will be representative of the
population at the midpoint of the combined survey period. The new weight variable simply rescales the values of the
weight variables from the separate cycles so that the sum of the new weights matches the survey population size at the
midpoint of that period.

When combining data cycles, it is extremely important to:

1. verify that data items collected in all combined years are comparable in wording and methods, and
2. select the same type of sample weight from each cycle when constructing the new weight variable in the combined
data set.

For more information about determining the compatibility of datasets, please see the Locate Variables and Structure &
Contents modules.

IMPORTANT NOTE

Because the data collection protocol changed significantly in 2002, it is recommended that you not combine dietary recall
data from survey cycles before 2001-2002 with data from subsequent cycles.


After appending the data, you will need to check the results. You should check that all your variables of interest were
included and that any variables you renamed or recoded are correct and include all the years of data.


https://www.cdc.gov/nchs/tutorials/Dietary/Preparing/MergeAppend/Info2.htm 2/2

