•CHAPTER 19 Editing and Coding: Transforming Raw Data into Information 473
EXHIBIT 19.7 Coding Open-Ended Questions about Equipment Rental Pricing

Customer Verbatim Comment | Code
The customer stated that he would like an outside sales representative to visit periodically. | Sales rep
The customer stated she would like any “specials” the store is running to be faxed to her office. |
The customer stated that the store could lower prices to improve. | Pricing
The customer just opened an account with the store and will be using it soon. | Other
The customer stated that the store could improve by giving the customer preferred rates on all rentals. | Pricing
The customer stated that the store could improve by having a more convenient location. | Location
The customer stated that the store could send an outside sales representative to his business. | Sales rep
The customer stated that the store could improve by having a closer location. | Location
The customer stated that the store could improve by having a more convenient location. | Location
The customer stated that he contacted the store but the sales rep did not call back. | Sales rep
The customer stated that he has never talked to a sales representative. | Sales rep
The customer stated that she would like to see more store locations near her business. | Location
The customer has not heard from a sales representative from the store. | Sales rep
The customer stated that he disliked the store requiring him to have insurance to rent equipment. | Other
The customer said the store does not carry the equipment he needs. | Equipment availability
The customer stated that he would like a sales rep to provide information about the store’s products. | Sales rep
The customer stated that he would like to be contacted by a sales rep. | Sales rep
The customer stated that she would like to see lower prices at the store. | Pricing
The customer stated that she felt the entire process of renting equipment at the store was far too time-consuming. | Other
The customer stated that the store needed broader variety of equipment. | Equipment availability
The customer stated that the store should lower their prices. | Pricing
The customer stated that the store could send a sales representative with information to their store location. | Sales rep
The customer stated that the store could improve by having a closer location. | Location
The customer stated the nearest location is 50 miles away from him. | Location
The customer stated that the store could improve by having a closer location. | Location
The customer stated that she would like to be contacted by a sales representative from the store. | Sales rep
The customer stated that the store did not always have the equipment he needed. | Equipment availability
© Cengage Learning 2013
Copyright 2012 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
•474 PART SIX Data Analysis and Presentation
EXHIBIT 19.8 Open-Ended Responses to a Survey about the Honolulu Airport

Response | Number
Prices high: restaurant/coffee shop/snack bar | 90
Dirty—filthy—smelly restrooms/airport | 65
Very good/good/excellent/great | 59
Need air-conditioning | 52
Nice/beautiful | 45
Gift shops expensive | 32
Too warm/too hot | 31
Friendly staff/people | 25
Airport is awful/bad | 23
Long walk between terminal/gates | 21
Clean airport | 17
Employees rude/unfriendly/poor attitude | 16
More signs/maps in lobby/streets | 16
Like it | 15
Love gardens | 11
Need video games/arcade | 10
More change machines/different locations | 8
More padded benches/comfortable waiting area | 8
More security personnel including HPD | 8
Replace shuttle with moving walkways | 8
Complaint: flight delay | 7
Cool place | 7
Crowded | 7
Provide free carts for carry-on bags | 7
Baggage storage inconvenient/need in different locations | 6
Floor plan confusing | 6
Mailbox locations not clear/more needed | 6
More restaurants and coffee shops/more variety | 6
Need a place to nap | 6
Polite VIP/friendly/helpful | 6
Poor help in gift shops/rude/unfriendly | 6
Slow baggage delivery/service | 6
© Cengage Learning 2013
EXHIBIT 19.8 (Continued) Open-Ended Responses to a Survey about the Honolulu Airport

Response | Number
Very efficient/organized | 6
Excellent food | 5
Install chilled water drinking fountains | 5
Love Hawaii | 5
More TV sets | 5
Noisy | 5
People at sundries/camera rude | 5
Shuttle drivers rude | 5
Something to do for passengers with long waits | 5
Airport too spread out | 4
Better information for departing/arriving flights | 4
Better parking for employees | 4
Better shuttle service needed | 4
Cute VIP | 4
Coding open-ended questions is a complex task, and it certainly cannot be mastered simply by
reading this chapter. However, the reader should now have a feel for the art of coding responses
into similar categories. With practice, and by using multiple coders so that their consistency can be
examined, one can become skilled at this task.
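When multiple coders are used, their consistency can be put into numbers. The Python sketch below computes simple percent agreement and Cohen's kappa, a widely used statistic that corrects agreement for chance. Kappa is our choice of measure rather than one this chapter prescribes, and the two coders' category assignments are hypothetical, borrowing the category names from Exhibit 19.7.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of responses to which both coders assigned the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of responses")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance alone."""
    observed = percent_agreement(coder_a, coder_b)  # also validates the inputs
    n = len(coder_a)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability that two independent coders pick the same category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both coders use one identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two coders to the same six verbatim comments.
coder_a = ["Sales rep", "Pricing", "Pricing", "Other", "Location", "Sales rep"]
coder_b = ["Sales rep", "Other", "Pricing", "Other", "Location", "Sales rep"]
print(round(percent_agreement(coder_a, coder_b), 3))  # 0.833
print(round(cohens_kappa(coder_a, coder_b), 3))       # 0.778
```

Kappa is lower than raw agreement because some matches would occur by chance. A low value is generally a signal to tighten the category definitions before coding continues.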
Code Book
code book
A book that identifies each variable in a study and gives the variable’s description, code name, and position in the data matrix.
A code book identifies each variable in the study and its location in the data matrix. In essence, the
code book provides a quick summary that is particularly useful when a data file becomes very
large. Exhibit 19.9 illustrates a portion of a code book from the telephone interview illustrated in
Exhibit 19.6. Notice that the first few fields record the study number, city, and other information
used for identification purposes. Researchers commonly identify individual respondents by giving
each an identification number or questionnaire number. When each interview is identified with
a number entered into each computer record, errors discovered in the tabulation process can be
checked on the questionnaire to verify the answer.
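A code book can also be held in software so that raw records are decoded the same way every time. The Python sketch below assumes fixed-width records laid out like the first three rows of the travel-study code book in Exhibit 19.9 (study number in columns 1–5, city in column 6, interview number in columns 7–9). The dictionary layout and the `decode_record` name are illustrative, not part of any particular package.

```python
# Illustrative code book: variable -> (first column, last column, value labels or None).
# Column numbers are 1-indexed and inclusive, as in a printed code book.
CODE_BOOK = {
    "study_number": (1, 5, None),
    "city": (6, 6, {1: "Chicago", 2: "Gary", 3: "Ft. Wayne", 4: "Bloomington"}),
    "interview_number": (7, 9, None),
}


def decode_record(record, code_book=CODE_BOOK):
    """Turn one fixed-width data record into labeled values using the code book."""
    decoded = {}
    for variable, (first, last, labels) in code_book.items():
        raw = int(record[first - 1:last])  # convert 1-indexed columns to a Python slice
        # Use the value label when one exists; otherwise keep the number itself.
        decoded[variable] = labels.get(raw, raw) if labels else raw
    return decoded


print(decode_record("456413027"))
# {'study_number': 45641, 'city': 'Ft. Wayne', 'interview_number': 27}
```

Because every program reads positions and labels from the same code book, a change to the layout needs to be made in only one place.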
Editing and Coding Combined
Frequently the person coding the questionnaire performs certain editing functions, such as
translating an occupational title provided by the respondent into a code for socioeconomic status.
A question that asks for a description of the job or business often is used to ensure that there will
be no problem in classifying the responses. For example, respondents who indicate “salesperson” as
their occupation might write their job description as “selling shoes in a shoe store” or “selling IBM
supercomputers to the defense department.” Generally, coders are instructed to perform this type
of editing function, seeking the help of a tabulation supervisor if questions arise.
RESEARCH SNAPSHOT
Coding Data “On-the-Go”
Collecting business-critical data takes time. Often data collection specialists and researchers must stop their other organizational responsibilities to code a data value into a spreadsheet or business database. This can take valuable and productive time away from their other responsibilities. For example, a warehouse worker operating a forklift might need to code where a particular pallet of product is in a warehouse, after they have moved it from one section to another. This would require that they stop and code into a computer the new pallet location. Vangard Voice Systems aims to change this in a big way.
Vangard’s AccuSpeech and Mobile Voice Platform (MVP) is a mobile enterprise system that uses cellular phone technology and proprietary voice recognition software to execute voice commands to store, code, or recode data hands-free. For forklift drivers, data can be entered through voice commands, thus allowing them to continue to work within their warehouse without any productivity downtime. In many ways, taking advantage of the portability of cellular technology is a natural fit for supply chain companies. Using Vangard’s AccuSpeech, data can truly be collected and coded “on-the-go”!
© Baloncici/Shutterstock
Source: http://www.vangardvoice.com, Vangard Voice Systems, accessed August 18, 2011.
Computerized Survey Data Processing
data entry
The activity of transferring data from a research project to computers.
optical scanning system
A data processing input device that reads material directly from mark-sensed questionnaires.
While a very simple study may use hand tabulation, virtually all business research studies with large
sample sizes use a computer for data processing. The process of transferring data from a research
project, such as answers to a survey questionnaire, to computers is referred to as data entry. Several
alternative means exist for entering data into a computer. In studies involving highly structured
paper-and-pencil questionnaires, an optical scanning system may be used to read material directly
into the computer’s memory from mark-sensed questionnaires, similar to the form used for
multiple-guess exams. As seen in the Research Snapshot “Coding Data ‘On-the-Go’,” even mobile
phone technology is now being used to aid data processing.
In a research study using computer-assisted telephone interviewing or a self-administered
Internet questionnaire, responses are automatically stored and tabulated as they are collected. Direct
data capture substantially reduces clerical errors that occur during the editing and coding process.
If researchers have security concerns, the data collected in an Internet survey should be encrypted
and protected behind a firewall.
As the opening vignette shows, collecting data using computer technology is an ever-growing
phenomenon in business research. When data are not optically scanned or directly entered into the
computer the moment they are collected, data processing begins with keyboarding. A data entry process
transfers coded data from the questionnaires or coding sheets onto a hard drive. As in every stage of the
research process, there is some concern about whether the data entry job has been done correctly. Data
entry workers, like anyone else, may make errors. To come as close as possible to 100 percent accuracy in
transferring the codes, the job should be verified by a second data entry worker. If an error has been made,
the verifier corrects the data entry. This verification should never be performed by the same person who
entered the original data. A person who misread the coded questionnaire during the keyboarding operation
might make the same mistake during the verifying process, and the mistake might go undetected.
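The two-person verification described above is often implemented as double-keying: two operators enter the same questionnaires independently, and a program lists every disagreement so that someone can resolve it against the original questionnaire. A minimal Python sketch, assuming each pass is a list of records with fields in the same order; the records and the `compare_entries` name are hypothetical.

```python
def compare_entries(first_pass, second_pass):
    """List (record, field, first value, second value) for every keying disagreement."""
    if len(first_pass) != len(second_pass):
        raise ValueError("both passes must contain the same number of records")
    mismatches = []
    for r, (rec1, rec2) in enumerate(zip(first_pass, second_pass)):
        if len(rec1) != len(rec2):
            raise ValueError(f"record {r} has a different number of fields in each pass")
        for f, (v1, v2) in enumerate(zip(rec1, rec2)):
            if v1 != v2:
                mismatches.append((r, f, v1, v2))
    return mismatches


# Hypothetical coded records keyed by two different operators (indexes start at 0).
first_pass = [[1, 2, 7], [2, 1, 4], [3, 2, 9]]
second_pass = [[1, 2, 7], [2, 1, 5], [3, 2, 9]]
print(compare_entries(first_pass, second_pass))  # [(1, 2, 4, 5)]
```

The program only flags the disagreement; deciding which value is correct still requires checking the questionnaire itself.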
Error Checking
The final stage in the coding process is error checking and verification, or data cleaning, to check
for wild codes. Computer software can examine the entered data and identify coded values that
lie outside the range of acceptable answers. For example, if “sex” is coded 1 for “male” and 2 for
“female” and a 3 code is found, a mistake obviously has occurred and an adjustment must be made.
Similarly, if a 10-point scale is used for respondent answers, the researcher must check to ensure that
all coded data fall within the 1 to 10 range.
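Wild-code checking amounts to comparing each coded value with the set of values the code book allows. The Python sketch below encodes the two examples from the text (sex coded 1 or 2, a 1-to-10 rating scale); the variable names and records are hypothetical, and missing values (None) are deliberately left to be handled as missing data rather than flagged as wild codes.

```python
# Allowed codes per variable, taken from the examples in the text.
VALID_CODES = {
    "sex": {1, 2},                       # 1 = male, 2 = female
    "satisfaction": set(range(1, 11)),   # 10-point scale: 1 through 10
}


def find_wild_codes(records, valid_codes=VALID_CODES):
    """Return (row number, variable, value) for every code outside its allowed set."""
    problems = []
    for row_number, record in enumerate(records, start=1):
        for variable, allowed in valid_codes.items():
            value = record.get(variable)
            if value is not None and value not in allowed:
                problems.append((row_number, variable, value))
    return problems


records = [
    {"sex": 1, "satisfaction": 7},
    {"sex": 3, "satisfaction": 10},
    {"sex": 2, "satisfaction": 11},
]
print(find_wild_codes(records))  # [(2, 'sex', 3), (3, 'satisfaction', 11)]
```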
EXHIBIT 19.9 Portion of a Code Book from a Travel Study
Study #45641
January 20__
N = 743

Question Number | Field or Column Number | Description and Meaning of Code Values
— | 1–5 | Study number (45641)
— | 6 | City: 1. Chicago; 2. Gary; 3. Ft. Wayne; 4. Bloomington
— | 7–9 | Interview number (3 digits on upper left-hand corner of questionnaire)
A | Not entered | Family, work for: 1. Travel agency; 2. Advertising agency; 3. Marketing research company
B | Not entered | Interviewed past month: 1. Yes; 2. No
1 | 10 | Traveled in past 3 months: 1. Yes; 2. No
2 | 11 | Purpose last trip: 1. Business; 2. Vacation; 3. Personal
  | 12 | Purpose second last trip: 1. Business; 2. Vacation; 3. Personal
  | 13 | Purpose other trips: 1. Business; 2. Vacation; 3. Personal
© Cengage Learning 2013
SUMMARY
1. Know when a response is really an error and should be edited. Data editing is necessary
before coding and storing the data file. The data editor must sometimes alter a respondent’s answer.
Often, this situation arises because of inconsistent responses; that is, responses to different
questions that contradict each other. The editor should be cautious in altering a respondent’s answer.
Only when a certain response is obviously wrong and the true response is easily determined should
the editor substitute a new value for the original response. Ideally, multiple pieces of evidence
would indicate that the original response is inaccurate and also suggest the accurate response before
the editor takes such a step. Missing data should generally be left as missing, although imputation
methods exist to provide an educated guess for missing values. These imputation methods can be
used when the sample size is small and the researcher needs to retain as many responses as possible.
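Mean substitution is one of the simplest imputation methods: each missing value is replaced with a plug value equal to the mean of the observed answers for that variable. The Python sketch below illustrates only that method; it is our choice for illustration, other imputation methods exist and may suit a given study better, and the ratings are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    plug_value = mean(observed)
    return [plug_value if v is None else v for v in values]


ratings = [4, None, 5, 3, None]
print(impute_mean(ratings))  # [4, 4, 5, 3, 4]
```

Mean substitution keeps every respondent in the file, but it also shrinks the variable's spread, which is one reason to leave missing data as missing when the sample is large enough.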
2. Appreciate coding of pure qualitative research. Qualitative research, such as that typified by
depth interviews, conversations, or other open-ended responses, is coded by identifying the themes
underlying the interview text. The codes become a key component of a hermeneutic unit that ultimately
can be linked with others to form a grounded theory. The frequency with which a thought is
expressed helps to identify appropriate coding for unstructured qualitative data.
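Counting how often words recur is a simple way to see which thoughts are expressed most frequently before themes are assigned. The Python sketch below is a basic word counter; the stop-word list and the sample responses are hypothetical, and a word count only suggests candidate themes, which a coder still has to interpret.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real study would tailor this to its own responses.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "to", "of", "i", "was", "my", "on"}


def word_frequencies(responses, stop_words=STOP_WORDS):
    """Count how often each content word appears across open-ended responses."""
    counts = Counter()
    for text in responses:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in stop_words)
    return counts


responses = [
    "The shampoo was gentle and mild.",
    "Gentle on my hair, and it smells nice.",
    "It smells clean.",
]
for word, count in word_frequencies(responses).most_common(3):
    print(word, count)
```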
3. Understand the way data are represented in a data file. A survey provides an overview of
respondents based on their answers to questions. These answers are edited, coded, and then stored
in a data file. The data file is structured as a data matrix in which the rows represent respondents
and the columns represent variables. Thus, a survey in which 200 respondents are asked 50 structured
questions would result in a data matrix consisting of 200 rows and 50 columns.
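The row-and-column structure of a data file can be sketched in a few lines. The Python example below assumes each respondent's coded answers arrive as a dictionary keyed by variable name (a hypothetical format); it produces one row per respondent and one column per variable, keeping a missing answer as None so that every row has the same length.

```python
def build_data_matrix(respondents, variables):
    """Arrange coded answers so each row is a respondent and each column a variable."""
    # A missing answer becomes None so every row keeps the same number of columns.
    return [[answers.get(variable) for variable in variables] for answers in respondents]


variables = ["satisfaction", "would_recommend", "visits_per_month"]
respondents = [
    {"satisfaction": 5, "would_recommend": 1, "visits_per_month": 3},
    {"satisfaction": 4, "would_recommend": 0, "visits_per_month": 1},
    {"satisfaction": 2, "visits_per_month": 0},  # skipped would_recommend
]
matrix = build_data_matrix(respondents, variables)
print(len(matrix), "rows x", len(variables), "columns")  # 3 rows x 3 columns
for row in matrix:
    print(row)
```

With 200 respondents and 50 variables, the same function would return 200 rows of 50 values each.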
4. Understand the coding of structured responses including a dummy variable
approach. Structured quantitative responses are generally coded simply by marking the number
corresponding to the choice selected by the respondent. Structured qualitative responses must also
be coded. Dichotomous variables lend themselves well to dummy coding. With dummy coding,
the two possible choices to a question are coded with a 0 for one response and a 1 for the other.
Short-answer or list questions are coded by assigning a number to all responses that seem to suggest
the same theme even if different words are used.
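Dummy coding a dichotomous question takes only a comparison. The Python sketch below assumes answers arrive as text labels such as "Yes" and "No" (a hypothetical format); it codes the chosen answer as 1 and the other as 0, refuses input with more than two distinct answers, and leaves missing answers as None rather than silently coding them 0.

```python
def dummy_code(answers, coded_as_one="Yes"):
    """Dummy-code a dichotomous variable: 1 for coded_as_one, 0 for the other choice."""
    choices = {a for a in answers if a is not None}
    if len(choices) > 2:
        raise ValueError(f"expected at most two choices, found {sorted(choices)}")
    # Missing answers stay missing (None) instead of being coded 0.
    return [None if a is None else int(a == coded_as_one) for a in answers]


print(dummy_code(["Yes", "No", None, "Yes"]))  # [1, 0, None, 1]
```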
5. Appreciate the ways that technological advances have simplified the coding process.
Throughout the chapter, technological advances in data collection were mentioned. These
advances have automated a great deal of data coding and reduced the chances of respondent error.
For instance, some inconsistent responses can be automatically screened and the respondent can be
prompted to go back and correct a response that seems inconsistent. Also, if a respondent fails to
answer a question, a pop-up window can take that respondent back to the question and force him
or her to respond in order to continue through the rest of the questionnaire.
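The screening described above can be pictured as a check that runs before the next screen appears. The Python sketch below is a hypothetical illustration, not the behavior of any particular survey package: it prompts for required questions left blank and applies one made-up consistency rule (a respondent who owns no boat cannot store one at a marina).

```python
def check_response(answers, required):
    """Return the prompts an online questionnaire might show before letting the respondent continue."""
    prompts = []
    # Item nonresponse: send the respondent back to any required question left blank.
    for question in required:
        if answers.get(question) is None:
            prompts.append(f"Please answer '{question}' before continuing.")
    # Hypothetical consistency rule: someone who owns no boat cannot store one at a marina.
    if answers.get("owns_boat") == "No" and answers.get("stores_at_marina") == "Yes":
        prompts.append("You said you do not own a boat; please review your marina answer.")
    return prompts


answers = {"owns_boat": "No", "stores_at_marina": "Yes", "age": None}
for prompt in check_response(answers, required=["owns_boat", "age"]):
    print(prompt)
```

An empty list means the response passes both checks and the respondent can move on.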
KEY TERMS AND CONCEPTS
code book, 475
codes, 465
coding, 465
data entry, 476
data file, 467
data integrity, 460
dummy coding, 466
editing, 460
field, 467
field editing, 461
impute, 463
in-house editing, 461
item nonresponse, 463
nonrespondent error, 459
optical scanning system, 476
plug value, 463
raw data, 459
record, 467
string characters, 467
test tabulation, 472
value labels, 467
QUESTIONS FOR REVIEW AND CRITICAL THINKING
1. What is the purpose of editing? Provide some examples of questions that might need editing.
2. When should the raw data from a respondent be altered by a data editor?
3. How is data coding different from data editing?
4. A 25-year-old respondent indicates that she owns her own house in Springfield, Illinois, and it is valued at $990 million. Later in the interview, she indicates that she didn’t finish high school and that she drives a 1993 Buick Century. Should the editor consider altering any of these responses? If so, how?
5. What role might a word counter play in coding qualitative research results?
6. A survey respondent from Florida has been asked to respond as to whether or not he or she owns a boat, and if so, whether he or she stores the boat at a marina. Over two hundred respondents are included in this sample. What suggestions do you have for coding the information provided?
7. How would a dummy variable be used to represent whether or not a respondent in a restaurant ordered dessert after their meal?
8. List at least three ways in which recent technological advances (within the last 15 years) have changed the way data are coded.
9. ETHICS A large retail company implements an employee survey that ostensibly is aimed at customer satisfaction. The survey includes a yes or no question that asks whether or not the employee has ever stolen something from the workplace. How could these data be coded? What steps could be attempted to try and ensure that the employee’s response is honest? Do you believe it is fair to ask this question? Should the employer take action against employees who have indicated that they have stolen something?
10. A researcher asks, “What do you remember about advertising for iPad 2?” A box with enough room for 100 words is provided in which the respondent can answer the question. The survey involves responses from 250 consumers. How should the code book for this question be structured? What problems might it present?
11. ’NET Use http://www.naicscode.com to help with this response. What is the NAICS code for golf (country) clubs? What is the NAICS code for health clubs? How can these codes be useful in creating data files?
12. ’NET Explore the advantages of computerized software such as ATLAS.ti. The website is at http://www.atlasti.com. How do you think it might assist in coding something like a depth interview or a collage created by a respondent?
RESEARCH ACTIVITIES
1. Design a short questionnaire with fewer than five fixed-alternative questions to measure student satisfaction with your college bookstore. Interview five classmates and then arrange the data into a data matrix.
2. ’NET The web page of the Research Triangle Institute (http://www.rti.org) describes its research tools and methods in some detail. Click on Survey Research and Services and explore the surveys and survey tools described there. How might these methods assist in coding?
CASE 19.1 U.S. Department of the Interior Heritage Conservation and Recreation Service
Some years ago the U.S. Department of the Interior conducted a telephone survey to help plan for future outdoor recreation. A nine-page questionnaire concerning participation in outdoor recreational activities and satisfaction with local facilities was administered by the Opinion Research Corporation of Princeton, New Jersey, to 4,029 respondents. The last two pages of the questionnaire appear in Case Exhibit 19.1-1.
Assume the data will be entered into a data file in which each data entry should include the following information:
• Respondent number
• State code (all 50 states)
Question
Design the coding for this portion of the questionnaire. Assume that the data from previous pages of the questionnaire will follow these data.
CASE EXHIBIT 19.1-1
Sample Page from Questionnaire
The following questions are for background purposes.
32. Do you live in an …
   Urban location
   Suburban location
   Rural location
33. Counting yourself, how many members of your family live here? (If 1 on Q.33, go to Q.35)
34. How many family members are …
   Under 5 years _______
   5 to 11 years _______
   12 to 20 years _______
   21 to 39 years _______
   40 to 65 years _______
   Over 65 years _______
35. What is your age? (Years)
36. In school, what is the highest grade (or year) you have completed? (Circle response)
   Elementary school 01 02 03 04 05 06
   Junior high school 07 08
   High school 09 10 11 12
   College 13 14 15 16
   Graduate school 17 18 19 20 21
37. What is your occupation? What kind of work is that?
   Professional, technical, and kindred workers
   Farmers
   Managers, officials, and proprietors
   Clerical and kindred workers
   Sales workers
   Craftspersons, forepersons, and kindred workers
   Operatives and kindred workers
   Service workers
   Laborers, except farm and mine
   Retired, widow, widower
   Student
   Unemployed, on relief, laid off → Go to Q.43
   Housewife
   Other (specify)
38. How many hours a week do you work at your place of employment? ___ (hours)
39. How many days of vacation do you get in a year? ___ (days)
40. Please tell me which of the following income categories most closely describes the total family income for the year before taxes, including wages and all other income. Is it …
   Under $12,000
   $12,000–$20,000
   $20,001–$30,000
   $30,001–$50,000
   $50,001–$100,000
   Over $100,001
41. Sex of respondent …
   Male
   Female
42. What is the zip code at your place of employment?
This concludes the interview; thank you very much for your cooperation and time.
CASE 19.2 Shampoo 9–10
A shampoo, code-named 9–10, was given to women for trial use.4 The respondents were asked what they liked and disliked about the product. Some sample codes are given in Case Exhibits 19.2-1 and 19.2-2.
There were two separate sets of codes: the codes in Case Exhibit 19.2-1 were for coding the respondents’ likes and the codes in Case Exhibit 19.2-2 were for coding their dislikes. The headings identify fields in the data matrix and the different attributes of shampoo. The specific codes are listed under each attribute. The coding instructions were first to look for the correct heading, and then to locate the correct comment under that heading and use that number as the code. For example, if, in response to a “like” question, a respondent had said, “The shampoo was gentle and mild,” a coder would look in field 10, the “gentleness” field, and find the comment “Gentle/mild/not harsh”; then the coder would write 11 next to the comment. If, under “dislikes,” someone had said, “I would rather have a shampoo with a crême rinse,” the coder would look in field 16 for comparison to other shampoos and write 74 (“Prefer one with a crême rinse”) beside that response.
The sample questionnaires appear in Case Exhibit 19.2-3.
Questions
1. Code each of the three questionnaires.
2. Evaluate this coding scheme.
CASE EXHIBIT 19.2-1
Sample Codes for “Like” Questions
Test No.   Shampoo
Question: Likes

Field 10 Gentleness
11 Gentle/mild/not harsh
12 Wouldn’t strip hair of natural oils
13 Doesn’t cause/helps flyaway hair
14 Wouldn’t dry out hair
15 Wouldn’t make skin/scalp break out
16 Organic/natural
17–20 (blank)
1− (blank)
1+ Other gentleness

Field 11 Result on Hair
21 Good for hair/helps hair
22 Leaves hair manageable/no tangles/no need for crême rinse
23 Gives hair body
24 Mends split ends
25 Leaves hair not flyaway
26 Leaves hair silky/smooth
27 Leaves hair soft
28 Leaves hair shiny
29 Hair looks/feels/good/clean
30 (blank)
2− (blank)
2+ Other results on hair

Field 12 Cleaning
31 Leaves no oil/keeps hair dry
32 It cleans well
33 Lifts out oil/dirt/artificial conditioners
34 Don’t have to scrub as much
35 No need to wash as often/keeps hair cleaner longer
36 Doesn’t leave a residue on scalp
37 Good lather
38 Good for oily hair
39 (blank)

Field 13 Miscellaneous
41 Cheaper/economical/good price
42 Smells good/nice/clean
43 Hairdresser recommended
44 Comes in different formulas
45 Concentrated/use only a small amount
46 Good for whole family (unspecified)
47–49 (blank)
(Continued)
CASE EXHIBIT 19.2-1 (Continued)
Sample Codes for “Like” Questions

Field 12 Cleaning (continued)
40 (blank)
3− (blank)
3+ Other cleaning

Field 13 Miscellaneous (continued)
50 (blank)
4− Other miscellaneous
4+ Don’t know/nothing
CASE EXHIBIT 19.2-2
Sample Codes for “Dislike” Questions
Test No.   Shampoo
Question: Dislikes

Field 14 Harshness
51 Too strong
52 Strips hair/takes too much oil out
53 Dries hair out
54 Skin reacts badly to it
55–60 (blank)
5− (blank)
5+ Other harshness

Field 15 Cleaning
61 Doesn’t clean well
62 Leaves a residue on scalp
63 Poor lather
64 Not good for oily hair
65–70 (blank)
6− (blank)
6+ Other cleaning

Field 16 Comparison to Others
71 Prefer herbal/organic shampoo
72 Prefer medicated/dandruff shampoo
73 Same as other shampoos—doesn’t work any differently
74 Prefer one with a crême rinse
75 Prefer another brand (unspecified)
76–80 (blank)
7− (blank)
7+ Other comparison to others

Field 17 Miscellaneous
81 Don’t like the name
82 Too expensive
83 Not economical for long hair
84 Use what hairdresser recommends
85–90 (blank)
8− Other miscellaneous
8+ Don’t know what/disliked/nothing
(Continued)
Case 19.2 (Continued )
CASE EXHIBIT 19.2-3
Sample Questionnaires for Shampoo 9–10 Survey
CHAPTER 20 Basic Data Analysis: Descriptive Statistics
LEARNING OUTCOMES
After studying this chapter, you should be able to
1. Know what descriptive statistics are and why they are used
2. Create and interpret simple tabulation tables
3. Understand how cross-tabulations can reveal relationships
4. Perform basic data transformations
5. List different computer software products designed for descriptive statistical analysis
6. Understand a researcher’s role in interpreting the data
Chapter Vignette:
Choose Your “Poison”
Most Americans enjoy an adult beverage occasionally. But not all Americans like the same drink. Many decision makers are interested in what Americans like to drink. Retailers need to have the correct product mix for their particular customers to increase profits and customer satisfaction. Restaurants need to know what their customers like to have with the types of food they serve. Policy makers need to know what types of restrictions should be placed on what types of products to prevent underage drinking and alcohol abuse. Researchers could apply sophisticated statistics to address questions related to Americans’ drinking preferences, but a lot can be learned from just counting what people are buying.
A grocery store built in 1975 in Chicago allocates 15 percent of its floor space to adult beverage products. Out of this 15 percent, 60 percent is allocated to beer, 25 percent to spirits, and 15 percent to wine. Since the products are not merchandised the same way (different types of shelving, aisles, and racking are needed), adjusting the floor space to change these percentages is not an easy task. Over the three-decade history of the store, the customer base has changed. Originally, stay-at-home moms buying groceries for the family best characterized the customer base. During the 1990s, empty-nesters, including retirees with high disposable incomes, characterized the customer base. More recently, younger singles just starting careers have moved into the nearby neighborhoods. Should the store reconsider its adult beverage merchandising?
In 1992, American consumers showed a heavy preference toward beer. Among American adults who drank adult beverages,1
■ 47 percent drank beer
■ 21 percent drank spirits
■ 27 percent drank wine
© Stephen Oliver/Dorling Kindersley/Getty Images
By 2005, Americans had changed their drinking preferences. At this point,
■ 36 percent drank beer
■ 21 percent drank spirits
■ 39 percent drank wine
A couple of other facts also became clear. A count of the preferred beverages among American adult consumers 29 and younger showed the following preferences in 2005:
■ 48 percent drank beer
■ 32 percent drank liquor
■ 17 percent drank wine
Perhaps due to the emergence of this younger group, a 2008 study shows beer has regained the position as America’s favorite adult beverage:2
■ 42 percent drink beer
■ 23 percent drink spirits
■ 31 percent drink wine
Across America, grocers account for 35 percent of all beer sales, but convenience stores, where younger consumers tend to shop, account for 45 percent.3 If the grocery store is converting more to a convenience store, maybe a continued emphasis on beer is wise. However, wine consumers are more attractive from several perspectives. Wine now ranks among the top 10 food categories in America, based on grocery store dollar sales volume. Forty-five percent of all wine is sold in grocery stores. What we find is that the consumer who buys wine is also more likely to buy products like prime or choice beef and imported cheeses, instead of lower quality and lower priced meat and cheese products. As a result, the average $13.44 spent on wine in a grocery store (as opposed to $11.94 on beer) is only part of the story in explaining why wine customers may be grape customers!4
What should the grocer emphasize in marketing adult beverages? Perhaps the research based on counting can address this decision.
Introduction
Perhaps the most basic statistical analysis is descriptive analysis. Descriptive statistics can summarize responses from large numbers of respondents in a few simple statistics. When a sample is obtained, the sample descriptive statistics are used to make inferences about characteristics of the entire population of interest. This chapter introduces basic descriptive statistics, which are simple but powerful. This chapter also provides the foundation for Chapter 21, which will extend basic statistics into the area of univariate statistical analysis.
The Nature of Descriptive Analysis
descriptive analysis
The elementary transformation of raw data in a way that describes the basic characteristics such as central tendency, distribution, and variability.
Descriptive analysis is the elementary transformation of data in a way that describes the basic characteristics such as central tendency, distribution, and variability. For example, consider the business researcher who takes responses from 1,000 American consumers and tabulates their favorite soft drink brand and the price they expect to pay for a six-pack of that product. The mode for favorite soft drink brand, along with the mean, median, and mode of the expected price across all 1,000 consumers, would be descriptive statistics that describe central tendency; for price, they describe it in three different ways. Because brand is a nominal variable, only its mode is meaningful. Means, medians, modes, variance, range, and standard deviation typify widely applied descriptive statistics.
Chapter 13 indicated that the level of scale measurement helps the researcher choose the most appropriate form of statistical analysis. Exhibit 20.1 shows how the level of scale measurement
influences the choice of descriptive statistics. Remember that all statistics appropriate for lower-
order scales (nominal and ordinal) are suitable for higher-order scales (interval and ratio), but the
reverse is not true.
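To make the scale-level rule concrete, here is a minimal Python sketch using the standard-library `statistics` module. The brand labels and prices are hypothetical stand-ins for the 1,000-consumer soft drink example, not data from the text. The nominal brand variable gets only a mode, while the ratio-scaled price gets a mean, median, and mode.

```python
import statistics

# Hypothetical responses (illustrative only): favorite brand is nominal,
# expected six-pack price is ratio-scaled.
favorite_brand = ["Brand A", "Brand B", "Brand A", "Brand C", "Brand A", "Brand B"]
six_pack_price = [4.00, 4.50, 4.00, 5.50, 4.25, 4.00]

# Nominal data: the mode is the only meaningful measure of central tendency.
brand_mode = statistics.mode(favorite_brand)

# Ratio data: mean, median, and mode all apply.
price_mean = statistics.mean(six_pack_price)
price_median = statistics.median(six_pack_price)
price_mode = statistics.mode(six_pack_price)

print(f"Most popular brand (mode): {brand_mode}")
print(f"Price: mean={price_mean:.3f}, median={price_median:.3f}, mode={price_mode:.2f}")
```

Calling `statistics.mean(favorite_brand)` would fail because brand labels have no arithmetic meaning. This mirrors the point that higher-order statistics do not apply to lower-order scales.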
Consider the following data. Sample consumers were asked where they most often purchased beer. The result is a nominal variable that can be described with a frequency distribution (see the bar chart in Exhibit 20.1). Ten percent indicated they most often purchased beer in a drug store, 45 percent indicated a convenience store, 35 percent indicated a grocery store, and 7 percent indicated a specialty store. Three percent listed some “other” outlet (not shown in the bar chart).
SURVEY THIS!
Courtesy of Qualtrics.com
One item in the questionnaire asks respondents to report the career they view as most attractive. A simple way to understand this population’s career aspirations is to count the number of respondents who rank each profession as their preferred career. Try to draw some conclusions about which job is most attractive: (1) Calculate the number of respondents who rank each profession as the most attractive (assign it a 1). Report this tabulation. (2) Do you think female and male respondents respond similarly to this item? Try to create the appropriate cross-tabulation table to show which jobs are preferred by men and women, respectively.
The mode is convenience store since more respondents chose this than any other category. A similar distribution may have been obtained if the chart plotted the number of respondents ranking each store as their favorite type of place to purchase beer.
histogram
A graphical way of showing a frequency distribution in which the height of a bar corresponds to the observed frequency of the category.
The bottom part of Exhibit 20.1 displays example descriptive statistics for interval and ratio variables. In this case, the chart displays results of a question asking respondents how much they typically spend on a bottle of wine purchased in a store. The mean and standard deviation are displayed beside the chart as 11.7 and 4.5, respectively. Additionally, a frequency distribution is shown with a histogram. A histogram is a graphical way of showing a frequency distribution in which the height of a bar corresponds to the frequency of a category. Histograms are useful for any type of data, but with continuous variables (interval or ratio) the histogram is useful for providing a quick assessment of the distribution of the data. A normal distribution line is superimposed over the histogram, providing an easy comparison to see if the data are skewed or multimodal.
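The interval/ratio summaries in Exhibit 20.1 (a mean, a standard deviation, and a histogram) can be sketched in a few lines of Python. The wine prices below are hypothetical, not the 27 responses behind the exhibit, and the $5 bin width is an arbitrary illustrative choice. `statistics.stdev` uses the n - 1 (sample) denominator.

```python
import statistics
from collections import Counter

# Hypothetical answers to "How much do you typically spend on a bottle of wine?"
wine_prices = [8, 10, 12, 12, 15, 9, 20, 11, 7, 14]
BIN_WIDTH = 5  # assumed bin width for the histogram

mean_price = statistics.mean(wine_prices)
sd_price = statistics.stdev(wine_prices)  # sample standard deviation (n - 1)


def histogram_counts(values, bin_width):
    """Count values falling in each half-open bin [lower, lower + bin_width)."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


bins = histogram_counts(wine_prices, BIN_WIDTH)
for lower, n in bins.items():
    # A text histogram: bar length equals the frequency of the bin.
    print(f"[{lower:>2}, {lower + BIN_WIDTH:>2})  {'#' * n}")
print(f"Mean = {mean_price:.2f}, Std. Dev. = {sd_price:.2f}, N = {len(wine_prices)}")
```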
EXHIBIT 20.1 Levels of Scale Measurement and Suggested Descriptive Statistics
Measurement Level    Statistic                                   Example
Nominal, Ordinal     Frequency table; Proportion (percentages);  Bar chart, “Beer Sales”: percent of respondents by purchase
                     Mode                                        location (Drug Store, Convenience Store, Grocery Store, Specialty)
Interval, Ratio      Means; Standard deviations                  Histogram: frequency by price (0.00 to 25.00);
                                                                 Mean = 11.6667, Std. Dev. = 4.54888, N = 27
© Cengage Learning 2013
Tabulation
tabulation
The orderly arrangement of data in a table or other summary format showing the number of responses to each response category; tallying.
frequency table
A table showing the different ways respondents answered a question.
Tabulation refers to the orderly arrangement of data in a table or other summary format. When this tabulation process is done by hand, the term tallying is used. Counting the different ways respondents answered a question and arranging them in a simple tabular form yields a frequency table. The actual number of responses to each category is a variable’s frequency distribution. A simple tabulation of this type is sometimes called a marginal tabulation.
Simple tabulation tells the researcher how frequently each response occurs. This starting point for analysis requires the researcher to count responses or observations for each category or code assigned to a variable. A frequency table showing where consumers generally purchase beer can be computed easily. The tabular results that correspond to the chart would appear as follows:
Response             Frequency   Percent   Cumulative Percentage
Drugstore                   50        10                      10
Convenience store          225        45                      55
Grocery store              175        35                      90
Specialty                   35         7                      97
Other                       15         3                     100
The frequency column shows the tally result, or the number of respondents listing each store. The percent column shows the percentage of all respondents in each category. From this table, we can see the most common outlet (the mode) is convenience store, since more people indicated this as their top response than any other. The cumulative percentage keeps a running total, showing the percentage of respondents indicating this particular category and all preceding categories as their preferred place to purchase beer. The cumulative percentage column is not so important for nominal data, whose categories have no natural order, but is quite useful for interval and ratio data, particularly when there are a large number of response categories.
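The tally behind a frequency table is easy to automate. This Python sketch rebuilds 500 raw responses from the frequencies in the table above, because the individual responses themselves are not given in the text. It then recomputes the percent and cumulative-percent columns and the mode. The function name `frequency_table` is illustrative.

```python
from collections import Counter


def frequency_table(responses, categories):
    """Rows of (category, frequency, percent, cumulative percent), in the given order."""
    counts = Counter(responses)
    total = sum(counts[c] for c in categories)
    rows, running = [], 0
    for c in categories:
        running += counts[c]  # running count gives the cumulative column
        rows.append((c, counts[c], 100 * counts[c] / total, 100 * running / total))
    return rows


# Rebuild raw responses from the reported frequencies (500 respondents).
reported = {"Drugstore": 50, "Convenience store": 225, "Grocery store": 175,
            "Specialty": 35, "Other": 15}
responses = [store for store, n in reported.items() for _ in range(n)]

table = frequency_table(responses, list(reported))
for category, freq, pct, cum in table:
    print(f"{category:<18} {freq:>4} {pct:>6.1f} {cum:>6.1f}")

mode = Counter(responses).most_common(1)[0][0]
print("Mode:", mode)
```

Because the category order is passed in explicitly, the cumulative column follows whatever order the researcher chooses. That is why it carries little meaning for nominal categories like these.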
Similarly, in the Research Snapshot you see some recent results of a study of Americans’
responses to the simple question, “Do you consider your pet to be a member of the family?”5
The idea that gender influences the types of things we do for our pets brings us to cross-tabulation.
Cross-Tabulation
cross-tabulation
The appropriate technique for addressing research questions involving relationships among multiple less-than interval variables; results in a combined frequency table displaying one variable in rows and another in columns.
A frequency distribution or tabulation can address many research questions. As long as a question deals with only one categorical variable, tabulation is probably the best approach. Although frequency counts, percentage distributions, and averages summarize considerable information, simple tabulation may not yield the full value of the research. Cross-tabulation is the appropriate technique for addressing research questions involving relationships among multiple less-than interval variables. We can think of a cross-tabulation as a combined frequency table. Cross-tabs allow the inspection and comparison of differences among groups based on nominal or ordinal categories. One key to interpreting a cross-tabulation table is comparing the observed table values with hypothetical values that would result from pure chance. A statistical test for this comparison is discussed in Chapter 21. Here, we focus on constructing and interpreting cross-tabs.
Exhibit 20.2 summarizes cross-tabulations from responses to a questionnaire on how families would respond to financial hardship or instability associated with their children. Panel A presents results on whether it is a good idea for older people to share a home with their grown children. The cross-tab suggests this may vary with basic demographic variables. From the results, we can see that slightly more men (24 percent) than women (22 percent) reported they consider it a “good idea.” Further, it appears that there is a slight increase when comparisons are made between very young (18–29)
•CHAPTER 20 Basic Data Analysis: Descriptive Statistics 487
EXHIBIT 20.2
Cross-Tabulation Tables from a Survey of Families Regarding Family Responses to Financial Hardships
(A) As you know, many older people share a home with their grown children. Do you think this is
generally a good idea or a bad idea?
Total Gender Age
Adults Male Female 18–29 30–44 45–59 60+
Good idea 22.8% 23.9% 21.8% 21.2% 24.4% 27.4% 18.4%
Bad idea 14.2% 14.4% 14.0% 12.2% 13.1% 11.8% 18.3%
It depends 58.1% 56.5% 59.8% 61.1% 57.3% 56.4% 58.8%
Don’t know 4.8% 5.3% 4.4% 5.5% 5.2% 4.3% 4.6%
(B) Parents ought to provide financial help to their adult children when the children are having financial difficulty.
Total Gender Age
Adults Male Female 18–29 30–44 45–59 60+
Strongly Agree 5.0% 5.3% 4.6% 6.1% 4.0% 4.2% 5.7%
Agree 31.2% 33.3% 29.1% 31.6% 27.6% 29.0% 36.0%
Neither 49.9% 48.6% 51.2% 48.7% 53.6% 50.8% 46.6%
Disagree 12.1% 10.9% 13.3% 12.8% 12.6% 13.6% 9.9%
Strongly Disagree 1.9% 1.9% 1.9% .8% 2.1% 2.5% 1.8%
Source: Judith A. Seltzer & Suzanne M. Bianchi, National Center for Family and Marriage Research, “Familial Responses to Financial Instability, Doubling Up When Times Are Tough: Obligations to Share a Home in Response to Economic Hardship,” [United States] [Computer file] ICPSR26543-v1 (2009). Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], accessed May 20, 2010. doi:10.3886/ICPSR26543.v1.
and middle-aged (30–59) respondents, although the share drops substantially among respondents 60 and older. Panel B provides another example of a cross-tabulation table. The question asks whether parents should provide financial help to their adult children when the children are having financial difficulty. In this case, we see some differences between men (38.6 percent strongly agree or agree) and women (33.7 percent strongly agree or agree). However, before reaching any conclusions based on this survey, one must carefully scrutinize this finding for possible extraneous variables.
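The men-versus-women comparison above collapses the two agreement categories of Panel B into a single figure for each column, sometimes called a top-two-box percentage. Here is a short Python sketch of that arithmetic, using the percentages printed in Exhibit 20.2, Panel B. The dictionary layout and function name are illustrative.

```python
# ("Strongly Agree", "Agree") percentages from Exhibit 20.2, Panel B.
panel_b = {
    "Total adults": (5.0, 31.2),
    "Male": (5.3, 33.3),
    "Female": (4.6, 29.1),
    "18-29": (6.1, 31.6),
    "30-44": (4.0, 27.6),
    "45-59": (4.2, 29.0),
    "60+": (5.7, 36.0),
}


def combined_agreement(strongly_agree, agree):
    """Collapse the two agreement categories into one percentage (rounded to 0.1)."""
    return round(strongly_agree + agree, 1)


agreement = {group: combined_agreement(*cells) for group, cells in panel_b.items()}
for group, pct in agreement.items():
    print(f"{group:<13} {pct:>5.1f}% strongly agree or agree")
```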
Contingency Tables
contingency table
A data matrix that displays the frequency of some combination of possible responses to multiple variables; cross-tabulation results.
Exhibit 20.3 shows example cross-tabulation results using contingency tables. A contingency table is a data matrix that displays the frequency of some combination of possible responses to multiple variables. Two-way contingency tables, which involve two less-than interval variables, are used most often. A three-way contingency table involves three less-than interval variables. Beyond three variables, contingency tables become difficult to analyze and explain. For all practical purposes, a contingency table is the same as a cross-tabulation.
marginals
Row and column totals in a contingency table, which are shown in its margins.
Two variables are depicted in the contingency table shown in panel A:
■■ Row Variable: Biological Sex _____M _____F
■■ Column Variable: “Do you shop at Target? YES or NO”
Several conclusions can be drawn initially by examining the row and column totals:
1. 225 men and 225 women responded, as can be seen in the row totals column.
2. Out of 450 total consumers responding, 330 consumers indicated that “yes” they do shop at Target and 120 indicated “no,” they do not shop at Target. This can be observed in the column totals at the bottom of the table. These row and column totals often are called marginals because they appear in the table’s margins.
EXHIBIT 20.3 Possible Cross-Tabulations of One Question
(A) Cross-Tabulation of Question “Do you shop at Target?” by Sex of Respondent
          Yes     No    Total
Men       150     75      225
Women     180     45      225
Total     330    120      450
(B) Percentage Cross-Tabulation of Question “Do you shop at Target?” by Sex of Respondent, Row Percentage
          Yes      No       Total (Base)
Men       66.7%    33.3%    100% (225)
Women     80.0%    20.0%    100% (225)
(C) Percentage Cross-Tabulation of Question “Do you shop at Target?” by Sex of Respondent, Column Percentage
          Yes      No
Men       45.5%    62.5%
Women     54.5%    37.5%
Total     100%     100%
(Base)    (330)    (120)
© Cengage Learning 2013
Researchers usually are more interested in the inner cells of a contingency table. The inner
cells display conditional frequencies (combinations). Using these values, we can draw some more
specific conclusions:
3. Out of 330 consumers who shop at Target, 150 are male and 180 are female.
4. Alternatively, out of the 120 respondents not shopping at Target, 75 are male and 45 are
female.
This finding helps us know whether the two variables are related. If men and women patronized Target equally, we would expect that, hypothetically, 165 of the 330 shoppers would be male and 165 would be female, because the sample contains equal numbers of men and women. These hypothetical expectations (165 male/165 female) are not observed. What is the implication? Target shoppers are more likely to be female than male. Notice that the same meaning could be drawn by analyzing non-Target shoppers. The Research Snapshot provides an example of the information provided by cross-tabs.
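The hypothetical 165/165 split comes from multiplying a cell's row total by its column total and dividing by the grand total. Here is a Python sketch of the marginals and these chance-expected counts for Exhibit 20.3, Panel A. The formal test comparing observed and expected counts belongs to Chapter 21, as the text notes. The function names are illustrative.

```python
# Observed counts from Exhibit 20.3, Panel A (rows: sex; columns: shops at Target?).
observed = {
    "Men": {"Yes": 150, "No": 75},
    "Women": {"Yes": 180, "No": 45},
}


def marginals(table):
    """Return row totals, column totals, and the grand total of a two-way table."""
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for c, n in cells.items():
            col_totals[c] = col_totals.get(c, 0) + n
    return row_totals, col_totals, sum(row_totals.values())


def expected_counts(table):
    """Counts expected if the row and column variables were unrelated:
    row total * column total / grand total."""
    rows, cols, n = marginals(table)
    return {r: {c: rows[r] * cols[c] / n for c in cols} for r in rows}


row_totals, col_totals, grand_total = marginals(observed)
expected = expected_counts(observed)
for r in observed:
    for c in observed[r]:
        print(f"{r:<6} {c:<4} observed={observed[r][c]:>4} expected={expected[r][c]:>6.1f}")
```

Observed Men/Yes is 150 against an expected 165, and Women/Yes is 180 against 165. This is the gap the text describes.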
A two-way contingency table like the one shown in part A is referred to as a 2 × 2 table
because it has two rows and two columns. Each variable has two levels. A two-way contingency
table displaying two variables, one (the row variable) with three levels and the other with four
levels, would be referred to as a 3 × 4 table. Any cross-tabulation table may be classified according
to the number of rows by the number of columns (R by C).
Percentage Cross-Tabulations
statistical base
The number of respondents or observations (in a row or column) used as a basis for computing percentages.
When data from a survey are cross-tabulated, percentages help the researcher understand the nature of the relationship by making relative comparisons simpler. The total number of respondents or observations may be used as a statistical base for computing the percentage in each cell. When the objective of the research is to identify a relationship between answers to two questions
RESEARCH SNAPSHOT
Our Four-Legged Family Members
We all have different-sized families, but what is often the case in American households is that at least one member of the family is a pet. And, overwhelmingly, that pet is a dog, a cat, or both. The interesting question is: What does it mean to be a “member of the family,” and do men and women treat their four-legged friends as family members in different ways? This would suggest a set of contingent arguments; that is, does treating a pet as a member of the family depend upon whether you are male or female? And does the type of treatment of that pet depend on whether you are male or female as well? A Harris Interactive Poll conducted in May 2011 sought to answer this interesting question.
The Harris Interactive Poll conducted a stratified survey of over 2,000 adults across the United States. The sample included regional information, age, race, and gender, as well as income and political party identification. Consider the following 2-by-2 contingency table, excluding those who responded “Not Sure.” Harris asked an initial question, “Do you consider your pet to be a member of your family?”
       % Male   % Female
Yes      85        95
No       12         3
It would appear that women are much more likely to see their pet as a part of the family. But the second contingent question is also interesting: If you see your pet as a family member, are there different things pet owners do for their pets by gender? When examining the contingency tables for this question, males and females who responded “Frequently” or “Occasionally” indicated the following:
                                            % Male   % Female
Allowed the pet to sleep in the bed           67        72
Bought my pet a holiday present               56        63
Bought my pet a birthday present              41        33
Cooked especially for my pet                  27        22
Dressed my pet in some type of clothing       17        15
Took my pet to work                           17         6
The results are fairly similar, but some interesting results do appear. First, while females are more likely to have their pet sleep in the bed and buy them a holiday present, males appear more likely to buy a birthday present, cook something special for them, and take them to work. The results suggest that the ways we treat our pets as family members depend upon (i.e., are contingent upon) whether we are males or females.
Source: “Pets Really are Members of the Family,” Harris Interactive, http://www.harrisinteractive.com/vault/HI-Harris-Poll-Pet-Ownership-2011-06-10.pdf, accessed August 4, 2011.
(or two variables), one of the questions is commonly chosen to be the source of the base for determining percentages. For example, look at the data in parts A, B, and C of Exhibit 20.3. Compare part B with part C. In part B, we are considering gender as the base: what percentage of men and of women shop at Target? In part C, we are considering Target shoppers as the base: what percentage of Target shoppers are men? Selecting either the row percentages or the column percentages will emphasize a particular comparison or distribution. The nature of the problem the researcher wishes to answer will determine which marginal total will serve as a base for computing percentages.
“The more we study, the more we discover our ignorance.”
Percy Bysshe Shelley
Fortunately, a conventional rule determines the direction of percentages. The rule depends on which variable is identified as an independent variable and which is a dependent variable. Simply put, independent variables should form the rows in a contingency table. The marginal total of the independent variable should be used as the base for computing the percentages. Although survey research does not establish cause-and-effect evidence, one might argue that it would be logical to assume that a variable such as biological sex might predict beverage preference. This makes more sense than thinking that beverage preference would determine biological sex!
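Panels B and C of Exhibit 20.3 differ only in which marginal total serves as the statistical base. This Python sketch computes both from the Panel A counts. Because sex is the independent variable and forms the rows, the row percentages (Panel B) follow the conventional rule. The function names are illustrative.

```python
# Observed counts from Exhibit 20.3, Panel A.
observed = {
    "Men": {"Yes": 150, "No": 75},
    "Women": {"Yes": 180, "No": 45},
}


def row_percentages(table):
    """Each cell as a percentage of its row total (row variable is the base)."""
    return {r: {c: 100 * n / sum(cells.values()) for c, n in cells.items()}
            for r, cells in table.items()}


def column_percentages(table):
    """Each cell as a percentage of its column total (column variable is the base)."""
    col_totals = {}
    for cells in table.values():
        for c, n in cells.items():
            col_totals[c] = col_totals.get(c, 0) + n
    return {r: {c: 100 * n / col_totals[c] for c, n in cells.items()}
            for r, cells in table.items()}


by_row = row_percentages(observed)     # Panel B: what share of each sex shops at Target?
by_col = column_percentages(observed)  # Panel C: what share of Target shoppers are men?
print({r: {c: round(p, 1) for c, p in cells.items()} for r, cells in by_row.items()})
print({r: {c: round(p, 1) for c, p in cells.items()} for r, cells in by_col.items()})
```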
Elaboration and Refinement
elaboration analysis
An analysis of the basic cross-tabulation for each level of a variable not previously considered, such as subgroups of the sample.
The Oxford Universal Dictionary defines analysis as “the resolution of anything complex into its simplest elements.” Once a researcher has examined the basic relationship between two variables, he or she may wish to investigate this relationship under a variety of different conditions. Typically, a third variable is introduced into the analysis to elaborate and refine the researcher’s understanding by specifying the conditions under which the relationship between the first two variables is strongest and weakest. In other words, a more elaborate analysis asks, “Will interpretation of the relationship be modified if other variables are simultaneously considered?”
Elaboration analysis involves the basic cross-tabulation within various subgroups of the sample. The researcher breaks down the analysis for each level of another variable. If the researcher has cross-tabulated shopping preference by sex (see Exhibit 20.3) and wishes to investigate another variable (say, marital status), a more elaborate analysis may be conducted. Exhibit 20.4 breaks down the responses to the question, “Do you shop at Target?” by sex and marital status. The data show women display the same preference whether married or single. However, married men are much more likely to shop at Target than are single men. The analysis suggests that the original conclusion about the relationship between sex and shopping behavior for women be retained. However, a relationship that was not discernible in the two-variable case is evident. Married men more frequently shop at Target than do single men.
EXHIBIT 20.4 Cross-Tabulation of Marital Status, Sex, and Responses to the Question “Do You Shop at Target?”
“Do you shop at Target?”     Single            Married
                             Men     Women     Men     Women
Yes                          55%     80%       86%     80%
No                           45%     20%       14%     20%
© Cengage Learning 2013
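Elaboration amounts to rerunning the same cross-tabulation inside each level of the third variable. The Python sketch below uses a small hypothetical data set with invented counts, not the data behind Exhibit 20.4. As in the exhibit, women's answers do not change with marital status, while married men shop at Target more often than single men. The helper names `percent_yes_by` and `make_records` are illustrative.

```python
from collections import defaultdict


def percent_yes_by(records, group_keys, answer_key="shop", target="Yes"):
    """Percentage of records answering `target` within each combination of group_keys."""
    tallies = defaultdict(lambda: [0, 0])  # group -> [target answers, group size]
    for rec in records:
        group = tuple(rec[k] for k in group_keys)
        tallies[group][1] += 1
        if rec[answer_key] == target:
            tallies[group][0] += 1
    return {g: round(100 * hits / n, 1) for g, (hits, n) in sorted(tallies.items())}


def make_records(marital, sex, yes, no):
    """Hypothetical respondents: `yes` who shop at Target and `no` who do not."""
    return ([{"marital": marital, "sex": sex, "shop": "Yes"} for _ in range(yes)]
            + [{"marital": marital, "sex": sex, "shop": "No"} for _ in range(no)])


records = (make_records("Single", "Man", 5, 5) + make_records("Single", "Woman", 8, 2)
           + make_records("Married", "Man", 9, 1) + make_records("Married", "Woman", 4, 1))

two_way = percent_yes_by(records, ["sex"])                # original two-variable cross-tab
elaborated = percent_yes_by(records, ["marital", "sex"])  # same cross-tab within each marital status
print(two_way)
print(elaborated)
```

In the two-way result, men (70 percent) trail women (80 percent). Only the elaborated table shows that the gap comes from single men (50 percent) rather than married men (90 percent).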
moderator variable
A third variable that changes the nature of a relationship between the original independent and dependent variables.
The finding is consistent with an interaction effect. The combination of the two variables, sex and marital status, is associated with differences in the dependent variable. An interaction arises when one variable acts as a moderating variable. A moderator variable is a third variable that changes the nature of a relationship between the original independent and dependent variables. Marital status is a moderator variable in this case. The interaction effect suggests that marriage changes the relationship between sex and shopping preference.
In other situations the addition of a third variable to the analysis may lead us to reject the original conclusion about the relationship. When this occurs, the elaboration analysis suggests the relationship between the original variables is spurious (see Chapter 3).
The chapter vignette described data suggesting a relationship between the type of store in
which a consumer shops and beverage preference. Convenience store shoppers seem to choose
beer over wine, while grocery store shoppers choose wine over beer. Does store type drive drink-
ing preference? Perhaps a third variable, age, determines both the type of store consumers choose
to buy in and their preference for adult beverages. Younger consumers both disproportionately
shop in convenience stores and drink beer.
How Many Cross-Tabulations?
Surveys may ask dozens of questions, and hundreds of categorical variables can be stored in a data warehouse. Using computer programs, business researchers could “fish” for relationships by cross-tabulating every categorical variable with every other categorical variable. Thus, every possible response becomes a possible explanatory variable. A researcher addressing an exploratory research question may find some benefit in such a fishing expedition. Software exists that can automatically search through volumes of cross-tabulations. These searches may even provide some insight into the business questions under investigation. Alternatively, the program may flag the cross-tabulations suggesting
the strongest relationship. CHAID (chi-square automatic interaction detection) exemplifies software that makes searches through large numbers of variables possible.7 Data mining can be conducted in a similar fashion and may suggest relationships that are worth considering further.
However, outside of exploratory research, researchers should conduct cross-tabulations that address specific research questions or hypotheses. When hypotheses involve relationships between two categorical variables, cross-tabulations are the right tool for the job.
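A quick calculation shows why an unfocused fishing expedition is risky. With k categorical variables there are k(k - 1)/2 distinct two-way cross-tabulations. The Python sketch below also gives a rough estimate of how many would look "significant" at a chosen level purely by chance if no pair were truly related. That estimate assumes independent tests and is only an illustration. The function name and the 0.05 default are assumptions.

```python
from math import comb


def fishing_expedition(n_variables, alpha=0.05):
    """Number of distinct two-way cross-tabs among n categorical variables, and the
    number expected to look significant at level alpha by chance alone if every
    pair were truly unrelated (a rough figure that assumes independent tests)."""
    n_tables = comb(n_variables, 2)
    return n_tables, n_tables * alpha


for k in (10, 50, 100):
    tables, false_alarms = fishing_expedition(k)
    print(f"{k:>3} variables -> {tables:>5} cross-tabs, about {false_alarms:.0f} chance 'findings'")
```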
Quadrant Analysis
quadrant analysis
An extension of cross-tabulation in which responses to two rating-scale questions are plotted in four quadrants of a two-dimensional table.
importance-performance analysis
Another name for quadrant analysis.
Quadrant analysis is a variation of cross-tabulation in which responses to two rating scale questions are plotted in four quadrants of a two-dimensional table. A common quadrant analysis in business research portrays or plots relationships between average responses about a product attribute’s importance and average ratings of a company’s (or brand’s) performance on that product feature. The term importance-performance analysis is sometimes used because consumers rate perceived importance of several attributes and rate how well the company’s brand performs on that attribute. Generally speaking, the business would like to end up in the quadrant indicating high performance on an important attribute.
Exhibit 20.5 illustrates a quadrant analysis for an international, mid-priced hotel chain.8 The chart shows the importance and performance ratings provided by business travelers. After plotting the scores for each of eight attributes, the analysis suggests areas for improvement. The arrows indicate attributes that the hotel firm should concentrate on to move from quadrant three, where performance on those attributes is low but business consumers rate the attributes as important, to quadrant four, where attributes are both important and rated highly for performance.
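Here is a minimal Python sketch of the quadrant classification. It compares each attribute's mean importance and mean performance rating with a cutoff. The cutoffs default to the average of the attribute means, which is one common choice but an assumption here; the scale midpoint is another option. The attributes and scores are hypothetical and are not read from Exhibit 20.5.

```python
from statistics import mean


def quadrant_analysis(ratings, importance_cut=None, performance_cut=None):
    """Classify attributes by mean importance and mean performance.

    ratings maps attribute -> (mean importance, mean performance).
    Cutoffs default to the average of the attribute means (an assumption).
    """
    if importance_cut is None:
        importance_cut = mean(imp for imp, _ in ratings.values())
    if performance_cut is None:
        performance_cut = mean(perf for _, perf in ratings.values())
    result = {}
    for attribute, (imp, perf) in ratings.items():
        imp_level = "high importance" if imp >= importance_cut else "low importance"
        perf_level = "high performance" if perf >= performance_cut else "low performance"
        result[attribute] = f"{imp_level}, {perf_level}"
    return result


# Hypothetical 7-point mean ratings for four attributes.
ratings = {
    "Attribute A": (6.5, 3.0),
    "Attribute B": (6.0, 6.0),
    "Attribute C": (2.5, 5.5),
    "Attribute D": (3.0, 2.5),
}
quadrants = quadrant_analysis(ratings)
# Important attributes with weak performance are the improvement priorities.
priorities = [a for a, q in quadrants.items() if q == "high importance, low performance"]
print(quadrants)
print("Concentrate on:", priorities)
```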
EXHIBIT 20.5 An Importance-Performance or Quadrant Analysis of Hotels
(Figure: eight hotel attributes plotted by importance, low to high, against performance, low to high: prompt service, room prices, room cleanliness, quietness, attractive interior, breakfast availability, entertainment, and 24-hour room service.)
© Cengage Learning 2013
Data Transformation
Simple Transformations
data transformation
Process of changing the data from their original form to a format suitable for performing a data analysis addressing research objectives.
Data transformation (also called data conversion) is the process of changing the data from their original form to a format suitable for performing a data analysis that will achieve research objectives. Researchers often modify the values of scalar data or create new variables. For example, many
researchers believe that less response bias will result if interviewers ask respondents for their year of birth rather than their age. This presents no problem for the research analyst, because a simple data transformation is possible. The raw data coded as birth year can easily be transformed to age by subtracting the birth year from the current year.
“All that we do is done with an eye to something else.”
Aristotle
In earlier chapters, we discussed recoding and creating summated scales. These also are common data transformations.
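The birth-year-to-age conversion described above takes one line of Python. The sketch uses a fixed survey year (2012, chosen only for illustration) so the result is reproducible, and hypothetical birth years. Without a birth month and day, the result can overstate age by one year for respondents whose birthday has not yet occurred that year.

```python
from datetime import date


def age_from_birth_year(birth_year, current_year=None):
    """Approximate age as current year minus birth year.

    Without a birth month and day this can overstate age by one year for
    respondents whose birthday has not yet occurred this year.
    """
    if current_year is None:
        current_year = date.today().year
    return current_year - birth_year


birth_years = [1950, 1968, 1985, 1992]  # hypothetical raw responses
ages = [age_from_birth_year(y, current_year=2012) for y in birth_years]
print(ages)
```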
Collapsing or combining adjacent categories of a variable is a common form of data transfor-
mation used to reduce the number of categories. A Likert scale may sometimes be collapsed into
a smaller number of categories. For instance, consider the following Likert item administered to a
sample of state university seniors:
I am satisfied with my college experience at this university.
Strongly Disagree □    Disagree □    Neutral □    Agree □    Strongly Agree □
The following frequency table describes results for this survey item:
Response              Frequency
Strongly Disagree        110
Disagree                  30
Neutral                   15
Agree                     35
Strongly Agree           210
Total                    400
The distribution of responses suggests that the responses are bimodal. That is, two “peaks” exist in the distribution, one at either end of the scale. Exhibit 20.6 shows an example of a bimodal distribution. Since the vast majority of respondents [80 percent = (110 + 210)/400] indicated either strongly disagree or strongly agree, the variable closely resembles a categorical variable. In general, students either strongly disagreed or strongly agreed with the statement. So, the researcher may wish to collapse the responses into two categories. While multiple ways exist to accomplish this, the researcher may assign the value of one to all respondents who either strongly disagreed or disagreed and the value of two to all respondents who either agreed or strongly agreed. Respondents marking neutral would be deleted from analysis. In this case, we would end up with 140 (110 + 30) respondents who disagreed with this statement and 245 (210 + 35) who agreed.
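The collapsing step can be sketched in Python using the frequencies from the satisfaction item above. The dictionary layout and the numeric codes (1 = Strongly Disagree through 5 = Strongly Agree) are assumptions made for illustration; the recode maps 1 and 2 to 1, maps 4 and 5 to 2, and drops the neutral code 3.

```python
# Frequencies for the satisfaction item (1 = Strongly Disagree ... 5 = Strongly Agree)
counts = {1: 110, 2: 30, 3: 15, 4: 35, 5: 210}
n = sum(counts.values())  # 400 respondents

# Share of respondents at the two extremes of the scale
extreme_share = (counts[1] + counts[5]) / n
print(f"{extreme_share:.0%}")  # 80%

# Recode: 1-2 -> 1 (disagree), 4-5 -> 2 (agree); neutral (3) has no new value
recode = {1: 1, 2: 1, 4: 2, 5: 2}
collapsed = {}
for old_value, freq in counts.items():
    new_value = recode.get(old_value)  # None for the neutral category
    if new_value is not None:
        collapsed[new_value] = collapsed.get(new_value, 0) + freq

print(collapsed)  # {1: 140, 2: 245}
```

Keeping the recode in a single mapping makes the transformation easy to audit and to reverse if the researcher later decides to retain the neutral respondents.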
[EXHIBIT 20.6 Bimodal Distributions Are Consistent with Transformations into Categorical Values. Histogram of exam scores (x-axis: Exams, 40.00 to 100.00; y-axis: Frequency, 0 to 125). Mean = 68.1429, Std. Dev. = 21.82851, N = 350.]
•CHAPTER 20 Basic Data Analysis: Descriptive Statistics 493
Problems with Data Transformations
median split
Dividing a data set into two categories by placing respondents below the median in one category and respondents above the median in another.
Researchers often perform a median split to collapse a scale with multiple response points into two categories. The median split means respondents below the observed median go into one category and respondents above the median go into another. Although this is common, the approach is best applied only when the data do indeed exhibit bimodal characteristics. When the data are unimodal, such as would be the case with normally distributed data, a median split will throw away valuable information and lead to error.
Exhibit 20.7 illustrates this problem. Clearly, most respondents either slightly agree or slightly disagree with this statement. The central tendency could be represented by the median of 3.5, a mean of 3.5, and modes of 3 and 4 (3 and 4 each have the same number of responses). The “outliers,” if any, appear to be those indicating something other than slight agreement or disagreement. A case can be made that the respondents indicating slight disagreement are more similar to those indicating slight agreement than they are to those respondents indicating strong disagreement. Yet the recode places values 1 and 3 in the same new category but places values 3 and 4 in different categories (see the recoding scheme in Exhibit 20.7). The data distribution does not support a median split into two categories, and so a transformation collapsing these values into agreement and disagreement is inappropriate.
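As a rough check of the median split problem, the sketch below expands the Exhibit 20.7 counts into 350 individual responses, computes the median with Python's standard `statistics` module, and applies a below/above-median recode. The variable names are illustrative. It shows that codes 3 and 4, which are adjacent, end up in different groups, while codes 1 and 3 share a group.

```python
import statistics

# Exhibit 20.7 counts: response code -> number of respondents (N = 350)
counts = {1: 10, 2: 40, 3: 125, 4: 125, 5: 40, 6: 10}
responses = [code for code, freq in counts.items() for _ in range(freq)]

median = statistics.median(responses)  # 3.5
mean = statistics.mean(responses)      # 3.5
sd = statistics.stdev(responses)       # sample SD, about 1.026

# Median split: codes below the median -> group 1, above -> group 2
split = {code: (1 if code < median else 2) for code in counts}
print(median, split)  # 3.5 {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}

# Group sizes after the split
group_sizes = {g: sum(f for c, f in counts.items() if split[c] == g) for g in (1, 2)}
print(group_sizes)  # {1: 175, 2: 175}
```

The split produces two equal groups, but only by separating the two largest, most similar response categories.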
When a sufficient number of responses exist and a variable is ratio, the researcher may choose to delete one-fourth to one-third of the responses around the median to effectively ensure a bimodal distribution. However, median splits should be performed only with great care, as the inappropriate collapsing of continuous variables into categorical variables ignores the information contained within the untransformed values. Rather than splitting a continuous variable into two categories to conduct a frequency distribution or cross-tabulation, researchers can use more appropriate analytical techniques, which are discussed in the chapters that follow.
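One way to delete responses around the median can be sketched as follows, under stated assumptions: the helper `trim_around_median` is hypothetical, the data are invented ratio-scaled values, and the fraction dropped (here one-third) is a parameter the researcher would choose within the one-fourth to one-third range mentioned above. Ties at the cut points are not treated specially.

```python
def trim_around_median(values, fraction=1/3):
    """Drop roughly `fraction` of the observations from the middle of the
    sorted data (around the median) and return the low and high groups.
    Hypothetical helper for illustration, not a standard library routine."""
    ordered = sorted(values)
    n = len(ordered)
    n_drop = round(n * fraction)       # number of middle observations to delete
    start = (n - n_drop) // 2          # first index of the deleted middle block
    low = ordered[:start]
    high = ordered[start + n_drop:]
    return low, high


# Hypothetical ratio-scaled data (e.g., monthly purchases in dollars)
spend = [12, 85, 40, 33, 150, 47, 51, 9, 60, 120, 44, 70]
low, high = trim_around_median(spend, fraction=1/3)
print(low, high)  # [9, 12, 33, 40] [70, 85, 120, 150]
```

Here the four middle values (44, 47, 51, 60) surrounding the median of 49 are removed, leaving two clearly separated groups.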
EXHIBIT 20.7 The Problem with Median Splits with Unimodal Data
Frequency Distribution: X1 = I Do Most of My Shopping at Convenience Stores.
Response Category (Code)    Counts    Cumulative Percentage
Strongly Disagree (1)          10          2.86%
Disagree (2)                   40         14.29%
Slightly Disagree (3)         125         50.00%
Slightly Agree (4)            125         85.71%
Agree (5)                      40         97.14%
Strongly Agree (6)             10        100.00%
Median = 3.5
Recode to Complete Data Transformation:
Old Values    1  2  3  4  5  6
New Values    1  1  1  2  2  2
[Histogram titled “Shop at Convenience Store” (x-axis: Shop at Convenience Stores, 1.00 to 6.00; y-axis: Frequency, 0 to 140). Mean = 3.50, Std. Dev. = 1.02616, N = 350. Annotations at the median (3.5): codes 1 to 3 are treated the same, codes 4 to 6 are treated the same, and codes 3 and 4 are treated differently.]
© Cengage Learning 2013
Index Numbers
index numbers
Scores or observations recalibrated to indicate how they relate to a base number.
The consumer price index and wholesale price index are secondary data sources that are frequently used by business researchers. Price indexes, like other index numbers, represent simple data transformations that allow researchers to track a variable’s value over time and compare a variable with other variables. Recalibration allows scores or observations to be related to a certain base period or base number.
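Consider the information in Exhibit 20.8. Weekly television viewing statistics are shown grouped by household size. Index numbers can be computed for these observations in the following manner:
1. A base number is selected. The U.S. household average of 52 hours and 36 minutes represents the central tendency and will be used.
2. Index numbers are computed by dividing the score for each category by the base number and multiplying by 100. Because the scores are recorded in hours:minutes, each is first converted to minutes (52:36 = 3,156 minutes). The index expresses each score as a percentage of the base:
$$\text{1-person hh: } \frac{41{:}01}{52{:}36} = \frac{2461}{3156} = 0.7798; \quad 0.7798 \times 100 = 77.98$$
$$\text{2-person hh: } \frac{47{:}58}{52{:}36} = \frac{2878}{3156} = 0.9119; \quad 0.9119 \times 100 = 91.19$$
$$\text{3+-person hh: } \frac{60{:}49}{52{:}36} = \frac{3649}{3156} = 1.1562; \quad 1.1562 \times 100 = 115.62$$
$$\text{Total U.S. average: } \frac{52{:}36}{52{:}36} = \frac{3156}{3156} = 1.0000; \quad 1.0000 \times 100 = 100.00$$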
EXHIBIT 20.8 Hours of Television Usage per Week
Household Size          Hours:Minutes
1                       41:01
2                       47:58
3+                      60:49
Total U.S. average      52:36
Adapted from 1987 Nielsen Television Report.
© Cengage Learning 2013
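The index computation for Exhibit 20.8 can be sketched in Python as below. The helper names `to_minutes` and `index_number` are hypothetical. The sketch converts each hours:minutes value to minutes before dividing, because treating a value such as 41:01 as the decimal 41.01 would distort the ratio (an hour has 60 minutes, not 100).

```python
def to_minutes(hh_mm):
    """Convert an 'hours:minutes' string such as '52:36' to total minutes."""
    hours, minutes = hh_mm.split(":")
    return int(hours) * 60 + int(minutes)


def index_number(value, base):
    """Index = (value / base) * 100; an index of 100 equals the base."""
    return value / base * 100


viewing = {
    "1 person": "41:01",
    "2 persons": "47:58",
    "3+ persons": "60:49",
    "Total U.S. average": "52:36",
}
base = to_minutes(viewing["Total U.S. average"])  # 3,156 minutes

for household, hh_mm in viewing.items():
    print(f"{household:>20}: {index_number(to_minutes(hh_mm), base):6.2f}")
```

The same `index_number` function applies to time-related data: pass each year's value and the base-year value.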
If the data are time-related, a base year is chosen. The index numbers are then computed by dividing each year’s activity by the base-year activity and multiplying by 100. Index numbers require ratio measurement scales. Managers often chart consumption in some category over time. Relating back to the chapter vignette, grocers may wish to chart the U.S. wine consumption index. Using 1968 as a base year, the current U.S. wine consumption index is just over 2.0 (just over 200 when multiplied by 100), meaning that the typical American consumer drinks about twice as much wine today as in 1968, which is just over 8.7 liters of wine per year.9 The Research Snapshot “Twitter and the ReTweetability Index” shows another application of data transformation and index creation.
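The ReTweetability Index described in the Research Snapshot is another ratio-based transformation: (Retweets per Day / Tweets per Day) / Followers. The sketch below simply applies that formula as printed in the snapshot; the function name and the two accounts are hypothetical, and whether the original website computed the score in exactly this way is not verified here.

```python
def retweetability(retweets_per_day, tweets_per_day, followers):
    """(Retweets per Day / Tweets per Day) / Followers, as printed in the snapshot."""
    if tweets_per_day <= 0 or followers <= 0:
        raise ValueError("tweets_per_day and followers must be positive")
    return (retweets_per_day / tweets_per_day) / followers


# Hypothetical accounts: (retweets per day, tweets per day, followers)
accounts = {
    "account_a": (40, 10, 2000),  # 4 retweets per tweet, 2,000 followers
    "account_b": (30, 5, 1000),   # 6 retweets per tweet, 1,000 followers
}

# Rank accounts from highest to lowest index score
ranked = sorted(accounts, key=lambda a: retweetability(*accounts[a]), reverse=True)
print(ranked)
```

Because the ratio is divided by followers, an account with fewer followers can outrank a larger one if its tweets are retweeted more often per tweet.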
Calculating Rank Order
Survey respondents are often asked to rank order their preference for some item, issue, or char-
acteristic. For instance, consumers may be asked to rank their three favorite brands or employee
respondents may provide rankings of several different employee benefit plans. Ranking data can
be summarized by performing a data transformation. The transformation involves multiplying the frequency of each rank by the rank value and summing the products to produce a single score for each choice.
For example, suppose a CEO had 10 executives rank their preferences for locations in which
to hold the company’s annual conference. Exhibit 20.9 shows how executives ranked each of four
RESEARCH SNAPSHOT
Twitter and the ReTweetability Index
Twitter is one of the fastest growing social networks. A privately funded organization in San Francisco, Twitter’s first prototype was developed in March of 2006 and launched publicly five months later. Since then, Twitter has evolved into a real-time messaging service compatible with several different networks and multiple devices:
Simplicity has played an important role in Twitter’s success. People are eager to connect with other people and Twitter makes that simple. Twitter asks one question, “What are you doing?” Answers must be under 140 characters in length and can be sent via mobile texting, instant message, or the web.
Twitter’s core technology is a device-agnostic message routing system with rudimentary social networking features. By accepting messages from SMS, web, mobile web, instant message, or from third party API projects, Twitter makes it easy for folks to stay connected.
If you are not familiar with Twitter, a basic understanding of the terminology is necessary. After signing up for a Twitter account, you can tweet your 140-character message. Followers are people who have signed up to receive someone’s Twitter messages. A retweet (or RT) occurs when a follower takes a tweet and then tweets that message to everyone in their own Twitter network.
[Photo caption: Encouraging other Twitter users to retweet your messages is the key in spreading your message across the Twittersphere. Wow! © Waynehowes/Shutterstock]
Dan Zarrella, a self-proclaimed viral marketing scientist, has developed an index to assess the most influential Twitter users. While several sites rank users by their number of followers and others report the number of RTs, Zarrella has combined these figures with the daily number of tweets to calculate the ReTweetability Index:
(Retweets per Day/Tweets per Day)/Followers
The index is intended to provide a score and ranking of Twitter users based on the power of their tweets. The higher the number, the more influential your Twitter account is!
Sources: “About Twitter,” Twitter, http://twitter.com/about; Dan Zarrella’s ReTweetability Index, http://www.retweetability.com; Saric, Marko, “Make Your Blog Go Viral with Twitter ReTweets,” HowToMakeMyBlog.com (January 13, 2009), http://www.howtomakemyblog.com/twitter/make-your-blog-go-viral-with-twitter-retweets.
EXHIBIT 20.9 Executive Rankings of Potential Conference Destinations
Executive    Hawaii    Paris    Greece    Hong Kong
1              1         2        4          3
2              1         3        4          2
3              2         1        3          4
4              2         4        3          1
5              2         1        3          4
6              3         4        1          2
7              2         3        1          4
8              1         4        2          3
9              4         3        2          1
10             2         1        3          4
© Cengage Learning 2013
locations: Hawaii, Paris, Greece, and Hong Kong. Exhibit 20.10 tabulates frequencies for these rankings. A ranking summary can be computed by assigning the destination with the highest preference the lowest number (1) and the least preferred destination the highest consecutive number (4). The summarized rank orderings were obtained with the following calculations:
$$\text{Hawaii: } (3 \times 1) + (5 \times 2) + (1 \times 3) + (1 \times 4) = 20$$
$$\text{Paris: } (3 \times 1) + (1 \times 2) + (3 \times 3) + (3 \times 4) = 26$$
$$\text{Greece: } (2 \times 1) + (2 \times 2) + (4 \times 3) + (2 \times 4) = 26$$
$$\text{Hong Kong: } (2 \times 1) + (2 \times 2) + (2 \times 3) + (4 \times 4) = 28$$
EXHIBIT 20.10 Frequencies of Conference Destination Rankings
                     Preference Rankings
Destination     1st     2nd     3rd     4th
Hawaii           3       5       1       1
Paris            3       1       3       3
Greece           2       2       4       2
Hong Kong        2       2       2       4
© Cengage Learning 2013
Three executives chose Hawaii as the best destination (ranked 1), five executives selected Hawaii as the second best destination, and so forth. The lowest total score indicates the first (highest) preference ranking. The results show the following rank ordering: (1) Hawaii, (2) Paris and Greece, which tie with a score of 26, and (4) Hong Kong. Company employees may be glad to hear their conference will be in Hawaii!
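The ranking summary can be reproduced from the raw rankings in Exhibit 20.9. This Python sketch uses `collections.Counter` to tabulate how often each destination received each rank (the frequencies in Exhibit 20.10) and then sums frequency × rank; the list layout is an assumption for illustration. The summary score equals the column total of each destination's ranks.

```python
from collections import Counter

destinations = ["Hawaii", "Paris", "Greece", "Hong Kong"]
# Exhibit 20.9: one row per executive, ranks listed in the order of `destinations`
rankings = [
    [1, 2, 4, 3], [1, 3, 4, 2], [2, 1, 3, 4], [2, 4, 3, 1], [2, 1, 3, 4],
    [3, 4, 1, 2], [2, 3, 1, 4], [1, 4, 2, 3], [4, 3, 2, 1], [2, 1, 3, 4],
]

# Frequency of each rank for each destination (as in Exhibit 20.10)
freq = {d: Counter(row[i] for row in rankings) for i, d in enumerate(destinations)}

# Summary score = sum of (frequency x rank); lower score = stronger preference
scores = {d: sum(rank * count for rank, count in freq[d].items()) for d in destinations}
print(scores)  # {'Hawaii': 20, 'Paris': 26, 'Greece': 26, 'Hong Kong': 28}

# sorted() is stable, so tied destinations keep their original order
print(sorted(destinations, key=scores.get))
```

Printing the scores, not just the sorted order, makes ties such as Paris and Greece visible.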
Tabular and Graphic Methods of Displaying Data
Tables, graphs, and charts may simplify and clarify data. Graphical representations of data may take a number of forms, ranging from a computer printout to an elaborate pictograph. Tables, graphs, and charts, however, all facilitate summarization and communication. For example, see how the simple frequency table and histogram shown in Exhibit 20.7 provide a summary that quickly and easily communicates meaning that would be more difficult to see if all 350 responses were viewed separately.
Today’s researcher has many convenient tools to quickly produce charts, graphs, or tables. Even common programs such as Excel and Word include chart functions that can construct the chart within the text document. Bar charts (histograms), pie charts, curve/line diagrams, and scatter plots are among the most widely used tools. Some chart types suit certain types of data and analyses better than others.
Bar charts and pie charts are very effective in communicating frequency tabulations and simple
cross-tabulations. Exhibit 20.11 displays frequency data from the chapter vignette with pie charts.
Each pie summarizes preference in the respective year. The size of each pie slice corresponds to
a frequency value associated with that choice.When the three pie charts are compared, the result
EXHIBIT 20.11 Pie Charts Work Well with Tabulations and Cross-Tabulations
[Three pie charts of beverage preference, one per year; slice percentages:]
Beverage      1992     2005     2008
Beer           47%      36%      42%
Wine           27%      39%      31%
Spirits        21%      21%      23%
Other           5%       4%       4%
© Cengage Learning 2013
communicates a cross-tabulation. Here, the comparison clearly communicates that wine preference
increased at the expense of beer preference from 1992 to 2005, but has yielded some ground in
2008. In other words, the relative slice of pie for wine became larger, then slightly smaller.
Chapter 25 discusses how these and other graphic aids may improve the communication value
of a written report or oral presentation.
Computer Programs for Analysis
Statistical Packages
Just 50 years ago, the thought of a typical U.S. company performing even basic statistical analyses, like cross-tabulations, on a thousand or more observations was unrealistic. The personal computer brought this capability not just to average companies, but to small companies and individuals with limited resources. Today, computing power is very rarely a barrier to completing a research project.
In the 1980s and early 1990s, when the PC was still a relatively novel innovation, specialized statistical software formerly used on mainframe computers made its way into the personal computing market. Today, most spreadsheet packages can perform a wide variety of basic statistical analyses. Excel’s basic data analysis tool allows descriptive statistics, including frequencies and measures of central tendency, to be easily computed.10 Most of the basic statistical features are now menu driven, reducing the need to memorize function labels. Spreadsheet packages like Excel continue to evolve and become more viable for performing many basic statistical analyses.
Despite the advances in spreadsheet applications, commercialized statistical software packages
remain extremely popular among researchers. They continue to become easier to use and more
compatible with other data interface tools including spreadsheets and word processors. Like any
specialized tool, statistical packages are more tailored to the types of analyses performed by statistical analysts, including business researchers. Thus, any serious business or social science researcher should still become familiar with at least one general statistical software package.
Two of the most popular general statistical packages are SAS (http://www.sas.com) and SPSS (http://www.spss.com). SAS revenues exceeded $2.15 billion in 2008, and its software can be found on computers worldwide. SAS was founded in 1976, and its statistical software historically has been
widely used in engineering and other technical fields. SPSS stands for Statistical Package for the
Social Sciences, and was founded in 1968. SPSS is commonly used by university business and social
science students. Business researchers have traditionally used SPSS more than any other statistical
software tool. SPSS has been viewed as more “user-friendly” in the past. However, today’s versions
of both SPSS and SAS are very user-friendly and give the user the option of using drop-down
menus to conduct analysis rather than writing computer code.
Excel, SAS, and SPSS account for most of the statistical analysis conducted in business research.
University students may also be exposed to MINITAB, which is sometimes preferred by economists.
However, MINITAB has traditionally been viewed as being less user-friendly than other choices.
In the past, data entry was an issue as specific software required different types of data input.
Today, however, all the major software packages, including SAS and SPSS, can work from data
entered into a spreadsheet. The spreadsheets can be imported into the data windows or simply
read by the program. Most conventional online survey tools will return data to the user in the form
of either an SPSS data file, an Excel spreadsheet, or a plain text document.
Exhibit 20.12 shows a printout of descriptive statistics generated by SAS for two variables: EMP (number of employees working in an MSA, or Metropolitan Statistical Area) and SALES (sales volume in dollars in an MSA) for 10 MSAs. The number of data elements (N), mean, standard deviation, and other descriptive statistics are displayed. SAS output is generally simple and easy to read.
As an example of SPSS output, the histograms shown in Exhibits 20.6 and 20.7 were created by SPSS. By clicking on “Charts” in the SPSS Tools menu, one can see the variety of charts that can be created. The key place to click to generate statistical results in tabular form is “Analyze.” Here, one can see the many types of analysis that can be conducted. In this chapter, the choices found by clicking on “Analyze” and then “Descriptive Statistics” are particularly relevant.
EXHIBIT 20.12 SAS Computer Output of Descriptive Statistics
State = NY
Variable   N   Mean       Standard    Minimum   Maximum     Std. Error   Sum         Variance      C.V.
                          Deviation   Value     Value       of Mean
EMP        10  142.930    232.665     12.800    788.800     73.575       1429.300    54133.0       162.782
SALES      10  5807.800   11905.127   307.000   39401.000   3764.732     58078.000   141732049.1   204.985
Key: EMP = number of employees (000); SALES = sales (000)
© Cengage Learning 2013
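N, the mean, and the standard deviation determine several other columns in a descriptive-statistics printout such as Exhibit 20.12. The sketch below recomputes the sum, variance, standard error of the mean, and coefficient of variation (C.V., in percent) from those three values. It assumes the printed standard deviation is the sample standard deviation, and it cannot reproduce the minimum and maximum, which require the raw MSA data. The function name is hypothetical.

```python
import math


def derived_stats(n, mean, sd):
    """Recompute derived printout columns from N, mean, and standard deviation."""
    return {
        "sum": n * mean,
        "variance": sd ** 2,
        "std_error_of_mean": sd / math.sqrt(n),
        "cv_percent": sd / mean * 100,  # coefficient of variation, in percent
    }


emp = derived_stats(10, 142.930, 232.665)
sales = derived_stats(10, 5807.800, 11905.127)
for name, stats in (("EMP", emp), ("SALES", sales)):
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

Checking derived columns this way is a quick sanity test when transcribing output from one package into a report.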
EXHIBIT 20.13 Examples of SPSS Output for Smoking Cross-Tabulation
CLASS * SMOKING Cross-Tabulation
Count
Class              Smoker    Nonsmoker    Total
High School           7          9          16
Undergraduate         9         22          31
Graduate             15         10          25
Career                6          6          12
Total                37         47          84
© Cengage Learning 2013
Exhibit 20.13 shows an SPSS cross-tabulation of two variables, class status and smoking behavior. The data come from a sample intercepted on an urban university campus, and the analysis addresses the research question, “Does smoking on campus vary across groups?” More nonsmokers than smokers are found. However, the results show that graduate students (15 of 25 smoke, compared with 37 of 84 respondents overall), and to a lesser extent instructors, smoke more than the norm. The SPSS user can ask for any number of statistics and percentages to be included with this output by clicking on the corresponding options.
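Row percentages make the comparison in Exhibit 20.13 explicit. This Python sketch computes the share of smokers in each class and compares it with the overall share of 37 out of 84; the dictionary layout is an assumption made for illustration.

```python
# Exhibit 20.13 counts: class -> (smokers, nonsmokers)
table = {
    "High School": (7, 9),
    "Undergraduate": (9, 22),
    "Graduate": (15, 10),
    "Career": (6, 6),
}

total_smokers = sum(s for s, _ in table.values())
total_n = sum(s + ns for s, ns in table.values())
overall_rate = total_smokers / total_n  # 37 / 84, about 44%

for cls, (smokers, nonsmokers) in table.items():
    rate = smokers / (smokers + nonsmokers)
    flag = "above" if rate > overall_rate else "at or below"
    print(f"{cls:<14} {rate:6.1%}  ({flag} the overall {overall_rate:.1%})")
```

Comparing each row percentage with the overall (marginal) percentage is the same logic a researcher applies when reading the SPSS table by eye.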
Computer Graphics and Computer Mapping
“The thing to do is to supply light.” (WOODROW WILSON)
box and whisker plots
Graphic representations of central tendencies, percentiles, variabilities, and the shapes of frequency distributions.
interquartile range
A measure of variability.
Graphic aids prepared by computers have replaced graphic presentation aids drawn by artists. Computer graphics are extremely useful for descriptive analysis. As mentioned in Chapter 2, decision support systems can generate two- or three-dimensional computer maps to portray data about sales, demographics, lifestyles, retail stores, and other features. Exhibit 20.14 shows a computer graphic depicting how fast-food consumption varies from state to state. The chart shows the relative frequencies of eating fast-food burgers, chicken, tacos, or other types of fast food across several states. Computer graphics like these have become more common as widely used applications have introduced easy ways of generating 3D graphics and maps. Many computer maps are used by business executives to show locations of high-quality customer segments. Competitors’ locations are often overlaid for additional quick and easy visual reference. Scales that show miles, population densities, and other characteristics can be highlighted in color, with shading, and with symbols.
Many computer programs can draw box and whisker plots, which provide graphic representations of central tendencies, percentiles, variabilities, and the shapes of frequency distributions. Exhibit 20.15 shows a computer-drawn box and whisker plot for 100 responses to a question measured on a 10-point scale. The response categories are shown on the vertical axis. The box inside the plot spans the 25th to the 75th percentile, so it represents the middle half of all respondents. Thus, half of respondents marked 4, 5, or 6. The distance from the 25th to the 75th percentile is a measure of variability called the interquartile range, but the term midspread is less complex and more descriptive. The location of the line within the box indicates the median.
[EXHIBIT 20.14 A 3D Graph Showing Fast-Food Consumption Patterns around the United States. Chart title: Fast Food Consideration; vertical axis: 0 to 90; series: tacos, chicken, and burger; states: Ohio, Kansas, Arizona, Louisiana, and Colorado. © Cengage Learning 2013]
[EXHIBIT 20.15 Computer-Drawn Box and Whisker Plot. Vertical axis: response scale, 0.0 to 10.0; points beyond the whiskers are labeled “Potential Outliers” (one is marked with an asterisk). Summary statistics: Mean 5.40; Median 5.00; 75th percentile 6.00; 25th percentile 4.00; Standard deviation 1.62.]
Source: From “Graphic Displays of Data: Box and Whisker Plots,” Reports No. 17, Market Facts, Inc.
outlier
A value that lies outside the normal range of the data.
The dashed lines that extend from the top and bottom of the box are the whiskers. Each whisker extends either the length of the box (the midspread in our example is two scale points) or to the most extreme observation in that direction, whichever is shorter. An outlier is a value that lies outside the normal range of the data. In Exhibit 20.15, outliers are indicated by either a 0 or an asterisk. Box and whisker plots are particularly useful for spotting outliers or comparing group categories (e.g., men versus women).
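The statistics behind a box and whisker plot can be computed directly. This Python sketch uses `statistics.quantiles` (Python 3.8 or later, default "exclusive" method) for the quartiles and applies the whisker rule described above: each whisker reaches one box length beyond the box or the most extreme observation, whichever is shorter, and values beyond the whiskers are flagged. The function name and the 14 responses are hypothetical, and other software may compute quartiles with a different method.

```python
import statistics


def box_whisker_summary(data, whisker_span=1.0):
    """Quartiles, midspread, whisker ends, and flagged outliers (hypothetical helper).

    Each whisker reaches `whisker_span` box lengths beyond the box or the most
    extreme observation, whichever is shorter; values beyond it are flagged.
    """
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1  # interquartile range (midspread)
    lower_end = max(q1 - whisker_span * iqr, min(data))
    upper_end = min(q3 + whisker_span * iqr, max(data))
    outliers = sorted(x for x in data if x < lower_end or x > upper_end)
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "whiskers": (lower_end, upper_end), "outliers": outliers}


# Hypothetical responses on a 10-point scale
responses = [1, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 10]
print(box_whisker_summary(responses))
```

Setting `whisker_span=1.5` gives the 1.5 × IQR fences used by many statistical packages, although those packages usually stop each whisker at the most extreme observation inside the fence rather than at the fence itself.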
Interpretation
interpretation
The process of drawing inferences from the analysis results.
An interpreter at the United Nations translates a foreign language into another language to explain the meaning of a foreign diplomat’s speech. In business research, the interpretation process explains the meaning of the data. After the statistical analysis of the data, inferences and conclusions about their meaning are developed.
A distinction can be made between analysis and interpretation. Interpretation is drawing inferences from the analysis results. Inferences drawn from interpretations lead to managerial implications. In other words, each statistical analysis produces results that are interpreted for the insight they provide into a particular decision. The logical interpretation of the data and statistical analysis are closely intertwined. When a researcher calculates a cross-tabulation of employees’ number of dependents with choice of health plan, an interpretation is drawn suggesting that employees with a different number of dependents may be more or less likely to choose a given health plan. This interpretation of the statistical analysis may lead to a realization that certain health plans are better suited for different family situations.
From a management perspective, however, the qualitative meaning of the data and their mana-
gerial implications are an important aspect of the interpretation. Consider the crucial role played
by interpretation of research results in investigating one new product, a lip stain that could color
the lips a desired shade semipermanently and last for about a month at a time:
The lip stain idea, among lipstick wearers, received very high scores on a rating scale ranging from “excel-
lent” to “poor,” presumably because it would not wear off. However, it appeared that even among routine
wearers of lipstick the idea was being rated highly more for its interesting, even ingenious, nature than for
its practical appeal to the consumer’s personality. They liked the idea, but for someone else, not them-
selves. . . . [Careful interpretation of the data] revealed that not being able to remove the stain for that
length of time caused most women to consider the idea irrelevant in relation to their own personal needs
and desires. Use of the product seems to represent more of a “permanent commitment” than is usually
associated with the use of a particular cosmetic. In fact, women attached overtly negative meaning to the
product concept, often comparing it with hair dyes instead of a long-lasting lipstick.11
This example shows that interpretation is crucial. However, the process is difficult to explain
in a textbook because there is no one best way to interpret data. Many possible interpretations of
data may be derived from a number of thought processes. Experience with selected cases will help
you develop your own interpretative ability.
Data are sometimes merely reported and not interpreted. Research firms may provide reams
of computer output that do not state what the data mean. At the other extreme, some researchers
tend to analyze every possible relationship between each and every variable in the study. Such an
approach is a sign that the research problem was not adequately defined prior to beginning the
research and the researcher really doesn’t know what business decision the research is addressing.
Researchers who have a clear sense of the purpose of the research do not request statistical analyses
of data that have little or nothing to do with the primary purpose of the research.
SUMMARY
1. Know what descriptive statistics are and why they are used. Descriptive analyses provide descriptive statistics including measures of central tendency and variation. Statistics such as the mean, mode, median, range, variance, and standard deviation are all descriptive statistics. These statistics provide a summary describing the basic properties of a variable.
2. Create and interpret simple tabulation tables. Statistical tabulation is another way of saying that we count the number of observations in each possible response category. In other words, tabulation is the same as tallying. Tabulation is an appropriate descriptive analysis for less-than-interval variables. Frequency tables and histograms are used to display tabulation results.
3. Understand how cross-tabulations can reveal relationships. Cross-tabulation combines two or more less-than-interval variables to display the relationship between them. For example, a
cross-tabulation of respondent gender with adult beverage preference (i.e., beer, spirits, wine)
would give us two rows (male and female) and three columns (beer, spirits, wine), which would
show the preferred beverage for each gender. The key to interpreting a cross-tabulation result is
to compare actual observed values with hypothetical values that would result from pure chance.
When observed results vary from these values, a relationship is indicated.
4. Perform basic data transformations. Data transformations are often needed to assist in data
analysis and involve changing the mathematical form of data in some systematic way. Basic data
transformations include reverse coding, summating scales, creating index numbers, and collapsing
a variable based on a median split.
5. List different computer software products designed for descriptive statistical
analysis. While spreadsheets have improved with respect to their ability to conduct basic statisti-
cal analyses, business researchers still rely heavily on specialized statistical software. SAS and SPSS
are two of the best known statistical packages. Each is available for even the most basic modern
PC and can be used with a drop-down window interface, practically eliminating the need for
writing computer code.
6. Understand a researcher’s role in interpreting the data. The interpretation process explains
the meaning of the data. Interpretation is drawing inferences from the analysis results, providing
meaning for the figures that are observed. Inferences drawn from interpretations lead to manage-
rial implications.
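The summary's points about cross-tabulation and basic data transformations can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SPSS or SAS output. The beverage counts, the ratings, and every function name below are hypothetical. Expected counts use the common independence rule (row total × column total ÷ grand total), which is one standard way to define the values "that would result from pure chance."

```python
from statistics import median

def expected_counts(table):
    """Cell counts expected by pure chance (rows and columns independent):
    row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def reverse_code(score, scale_min=1, scale_max=5):
    """Reverse-code a rating so that, on a 1-5 scale, 1 <-> 5 and 2 <-> 4."""
    return scale_min + scale_max - score

def summated_scale(item_scores):
    """Combine several item scores for one respondent into a single total."""
    return sum(item_scores)

def index_number(value, base_value):
    """Express a value relative to a base value, with the base set to 100."""
    return value / base_value * 100

def median_split(values):
    """Collapse a variable into two groups: 1 = above the median, 0 = at or below it."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]

# Hypothetical counts: rows = male, female; columns = beer, spirits, wine.
observed = [[40, 25, 15],
            [20, 20, 40]]
expected = expected_counts(observed)
for gender, obs_row, exp_row in zip(("male", "female"), observed, expected):
    gaps = [o - e for o, e in zip(obs_row, exp_row)]
    print(gender, "observed minus expected:", gaps)

# Hypothetical transformations for one respondent and one small variable.
print(reverse_code(2))                          # negatively worded item recoded
print(summated_scale([4, 5, reverse_code(2)]))  # three-item summated scale
print(index_number(150, 120))                   # 150 relative to a base of 120
print(median_split([3, 7, 5, 9, 1]))            # two-group version of the variable
```

With these counts, every expected row is [30.0, 22.5, 27.5], and the male row prints gaps of [10.0, 2.5, -12.5]. Large gaps in both directions, like those in the wine column, suggest a relationship. A chi-square test formalizes how large a gap must be before it counts.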
KEY TERMS AND CONCEPTS
box and whisker plots, 498
contingency table, 487
cross-tabulation, 486
data transformation, 491
descriptive analysis, 484
elaboration analysis, 490
frequency table, 486
histogram, 485
importance-performance analysis, 491
index numbers, 494
interpretation, 500
interquartile range, 498
marginals, 487
median split, 493
moderator variable, 490
outlier, 499
quadrant analysis, 491
statistical base, 488
tabulation, 486
QUESTIONS FOR REVIEW AND CRITICAL THINKING
1. What are five descriptive statistics used to describe the basic properties of variables?
2. What is a histogram? What is the advantage of overlaying a normal distribution over a histogram?
3. A survey asks respondents to respond to the statement, “My work is interesting.” Interpret the frequency distribution shown here (taken
from an SPSS output):
My work is interesting:
Category Label     Code   Abs. Freq.   Rel. Freq. (Pct.)   Adj. Freq. (Pct.)   Cum. Freq. (Pct.)
Very true          1          650           23.9                62.4                62.4
Somewhat true      2          303           11.2                29.1                91.5
Not very true      3           61            2.2                 5.9                97.3
Not at all true    4           28            1.0                 2.7               100.0
                   •        1,673           61.6             Missing
Total                       2,715          100.0               100.0
Valid cases     1,042
Missing cases   1,673
502 Part 6: Data Analysis and Presentation
4. Use the data in the following table to
a. Prepare a frequency distribution of the respondents’ ages
b. Cross-tabulate the respondents’ genders with cola preference
c. Identify any outliers
Individual   Gender   Age   Cola Preference   Weekly Unit Purchases
James        M        19    Coke               2
Parker       M        17    Pepsi              5
Bill         M        20    Pepsi              7
Laurie       F        20    Coke               2
Jim          M        18    Coke               4
Jill         F        16    Coke               4
Tom          M        17    Pepsi             12
Julia        F        22    Pepsi              6
Amie         F        20    Pepsi              2
Dawn         F        19    Pepsi              3
5. Data on the average size of a soda (in ounces) at all 30 major league baseball parks are as follows: 14, 18, 20, 16, 16, 12, 14, 16, 14, 16,
16, 16, 14, 32, 16, 20, 12, 16, 20, 12, 16, 16, 24, 16, 16, 14, 14, 12, 14, 20. Compute descriptive statistics for this variable including a
box and whisker plot. Comment on the results.
6. The following computer output shows a cross-tabulation of frequencies and provides frequency number (N) and row (R) percentages.
a. Interpret this output including a conclusion about whether or not the row and column variables are related.
b. Critique the way the analysis is presented.
c. Draw a pie chart indicating percentages for having read a book in the past three months for those with and those without high
school diplomas.
Have You Read a Book             Have High School Diploma?
in Past 3 Months?                Yes        No      Total
Yes                    N         489       174        663
                       R        73.8      26.2
No                     N         473       378        851
                       R        55.6      44.4     ......
Total                  N         962       552      1,514
                       R      ......    ......
7. List and describe at least three basic data transformations.
8. What conditions suggest that a ratio variable should be transformed (recoded) into a dichotomous (two group) variable?
9. A data processing analyst for a research supplier finds that preliminary computer runs of survey results show that consumers love a client’s
new product. The employee buys a large block of the client’s stock. Is this ethical?
RESEARCH ACTIVITIES
1. ’NET Go to the website for the Chicago Cubs baseball team (http://chicago.cubs.mlb.com). Use either the schedule listing or the stats information to find their record in the most recent season. Create a data file with a variable indicating whether each game was won or lost and a variable indicating whether the game was played at home in Wrigley Field or away from home. Using computerized software like SPSS or SAS,
a. Compute a frequency table and histogram for each variable.
b. Use cross-tabulations to examine whether a relationship exists between where the game is played (home or away) and winning.
c. Extra analysis: Repeat the analyses for the Houston Astros baseball team (http://houston.astros.mlb.com). What does this suggest for the relationship between playing at home and winning?
2. ’NET Go to http://www.spss.com and click on Solutions, and then select an industry. What services does the company provide for that particular industry?
Body on Tap
CASE 20.1
A few years ago Vidal Sassoon, Inc., took legal action against Bristol-Myers over a series of TV commercials and print ads for a shampoo that had been named Body on Tap because of its beer content.12 The prototype commercial featured a well-known high fashion model saying, “In shampoo tests with over 900 women like me, Body on Tap got higher ratings than Prell for body. Higher than Flex for conditioning. Higher than Sassoon for strong, healthy-looking hair.”
The evidence showed that several groups of approximately 200 women each tested just one shampoo. They rated it on a six-step qualitative scale, from “outstanding” to “poor,” for 27 separate attributes, such as body and conditioning. It became clear that 900 women did not, after trying both shampoos, make product-to-product comparisons between Body on Tap and Sassoon or between Body on Tap and any of the other brands mentioned. In fact, no woman in the tests tried more than one shampoo.
The claim that the women preferred Body on Tap to Sassoon for “strong, healthy-looking hair” was based on combining the data for the “outstanding” and “excellent” ratings and discarding the lower four ratings on the scale. The figures then were 36 percent for Body on Tap and 24 percent (of a separate group of women) for Sassoon. When the “very good” and “good” ratings were combined with the “outstanding” and “excellent” ratings, however, there was only a difference of one percent between the two products in the category of “strong, healthy-looking hair.”
The research was conducted for Bristol-Myers by Marketing Information Systems, Inc. (MISI), using a technique known as blind monadic testing. The president of MISI testified that this method typically is employed when what is wanted is an absolute response to a product “without reference to another specific product.” Although he testified that blind monadic testing was used in connection with comparative advertising, that was not the purpose for which Bristol-Myers retained MISI. Rather, Bristol-Myers wished to determine consumer reaction to the introduction of Body on Tap. Sassoon’s in-house research expert stated flatly that blind monadic testing cannot support comparative advertising claims.
Question
Comment on the professionalism of the procedures used to make the advertising claim. Why do you believe the researchers performed the data transformations described?
Downy-Q Quilt
CASE 20.2
The research for Downy-Q is an example of a commercial test that was conducted when an advertising campaign for an established brand had run its course.13 The revised campaign, “Fighting the Cold,” emphasized that Downy-Q was an “extra-warm quilt”; previous research had demonstrated that extra warmth was an important and deliverable product quality. The commercial test was requested to measure the campaign’s ability to generate purchase interest.
The marketing department had recommended this revised advertising campaign and was now anxious to know how effectively this commercial would perform. The test concluded that “Fighting the Cold” was a persuasive commercial. It also demonstrated that the new campaign would have greater appeal to specific market segments.
Method
Brand choices for the same individuals were obtained before and after viewing the commercial. The commercial was tested in 30-second, color-moving, storyboard form in a theater test. Invited viewers were shown programming with commercial inserts. Qualified respondents were women who had bought quilts in outlets that carried Downy-Q. The results are shown in Case Exhibits 20.2-1 through 20.2-4.
Question
Interpret the data in these tables. What recommendations and conclusions would you offer to Downy-Q management?
CASE EXHIBIT 20.2-1
Shifts in Brand Choice before and after Showing of Downy-Q Quilt Commercial
Question: We are going to give away a sample of fabric softener. You can select the brand you most prefer. Which brand would you choose?
                                     Brand Choice before Commercial
Brand Choice after Commercial (%)    Downy-Q (n = 23)    Other Brand (n = 237)
Downy-Q                              78                  19
Other brand                          22                  81
CASE EXHIBIT 20.2-2
Pre/Post Increment in Choice of Downy-Q
Improvement in score based on exposure to commercial.
                         “Fighting the Cold”    Norm: All Quilt Commercials
Demographic Group        Base      Score        Average      Range
Total audience           (260)     +15          +10          6–19
By marital status
  Married                (130)     +17
  Not married            (130)     +12
By age
  Under 35               (130)     +14
  35 and over            (130)     +15
By employment status
  Not employed           (90)      +13
  Employed               (170)     +18
CASE EXHIBIT 20.2-3
Adjective Checklist for Downy-Q Quilt Commercial
Question: Which of these words do you feel come closest to describing the commercial
you’ve just seen? (Check all that apply.)
Adjective “Fighting the Cold” (%) Norm: All Quilt Commercials (%)
Positive 19 23
Appealing 5 24
Clever 24
Convincing 18 40
Effective 11 14
Entertaining 20 21
Fast moving 12
Genuine 4
Imaginative 7 21
Informative 7 18
Interesting 24 17
Original 13 20
Realistic 7
Unusual 8 3
3 8
(Continued )
CASE EXHIBIT 20.2-3 (Continued)
Adjective Checklist for Downy-Q Quilt Commercial
Negative 9 11
Amateurish 4 4
Bad Taste 33
Dull 17 20
Repetitious 8 16
Silly 8 19
Slow 3
Unbelievable 3 7
Unclear 14 5
Unimportant 32 2
Uninteresting 14
19
CASE EXHIBIT 20.2-4
Product Attribute Checklist for Downy-Q
Question: Which of the following statements do you feel apply to Downy-Q? (Mark as
many or as few as you feel apply.)
Attributes “Fighting the Cold” (%)
Extra warm 56
Lightweight 48
Pretty designs 45
Durable fabrics 28
Nice fabrics 27
Good construction 27
CHAPTER 21
Univariate Statistical Analysis
LEARNING OUTCOMES
After studying this chapter, you should be able to
1. Implement the hypothesis-testing procedure
2. Use p-values to test for statistical significance
3. Test a hypothesis about an observed mean compared to some standard
4. Know the difference between Type I and Type II errors
5. Know when a univariate χ² test is appropriate and how to conduct one
Chapter Vignette:
Well, Are They Satisfied or Not?
Ed Bond had worked for PrecisionMetals for six years, but had really only served as an analyst for the production facility. This was the first corporate-level opportunity to showcase his research skills. His corporate contacts are Rob Baer, who currently serves as Chief Operations Officer, and Kathy Hahn, the Chief Executive Officer for PrecisionMetals. Rob and Kathy specifically asked to meet with Ed about the employee satisfaction survey conducted a month ago.
“Ed, we continue to worry about losing metalwork employees at our Madison plant, but our Richmond plant seems to be improving in terms of turnover,” Rob stated. “What is your take on our employee satisfaction?” Ed replied, “We put together an index of three questions that asked about job satisfaction. We have analyzed the data from the Richmond plant, and our average satisfaction is 3.9.”
Kathy asked, “What does 3.9 mean? What am I supposed to take away from that?” Ed responded, “I’m sorry, I should have explained this better. We asked the employees on a scale with five categories, with 1 meaning ‘Strongly Disagree,’ and 5 meaning ‘Strongly Agree.’ When the scores are averaged for Richmond, our overall satisfaction is 3.9 on this 5.0 point scale.” Rob continued, “Is that good or bad? It sounds OK…I guess. What about Madison?” Ed, realizing that he was not communicating the information well, responded, “The satisfaction score from Madison is 3.5. Historically, both plants have had a satisfaction score of 3.5.”
Kathy realized that Ed was getting flustered. It was time to reassure him. “Ed, we appreciate what you are doing. I’m sorry but I don’t know exactly what the scores mean. Is 3.9 good? Is 3.5 good? Is the difference between Richmond this year and the scores we have seen in the past significant? Is the difference between those scores enough to explain the difference in our turnover? I just want to know if the survey shows if our employees are satisfied or not.”
Ed went back to the research section, with two things on his mind. “I’ve got to actually compare the Richmond satisfaction score with the old scores,” he thought. And as he walked into his office and shut the door quietly he said to himself, “I can’t just speak about scores of 3.9 and 3.5. I’m here to help them understand what the scores really mean.” It was time to get to work!
© Blend Images/Jupiter Images
Introduction
Empirical testing of business research data typically involves inferential statistics. This means that an inference or conclusion will be drawn about the population based on observations of a sample representing that population. Statistical analysis can be divided into several groups:
■■ Univariate statistical analysis tests hypotheses involving only one variable.
■■ Bivariate statistical analysis tests hypotheses involving two variables.
■■ Multivariate statistical analysis tests hypotheses and models involving multiple (three or more) variables or sets of variables.
The focus in this chapter is on univariate statistics. Thus, we examine statistical tests appropriate for drawing inferences about a single variable. In the chapter vignette, PrecisionMetals executives are interested in the employee satisfaction at their plants and how it compares to the historical score. This represents an opportunity to test hypotheses about a single variable, in this case employee job satisfaction. The survey data regarding job satisfaction will be analyzed and tested against the historical benchmark of 3.5.

univariate statistical analysis
Tests of hypotheses involving only one variable.

bivariate statistical analysis
Tests of hypotheses involving two variables.

multivariate statistical analysis
Statistical analysis involving three or more variables or sets of variables.
Hypothesis Testing
Descriptive research and causal research designs often conclude with hypothesis tests. Hypotheses
are defined as formal statements of explanations stated in a testable form. Generally, hypotheses
should be stated in concrete fashion so that the method of empirical testing seems almost obvious.
Types of hypotheses tested commonly in business research include the following:
1. Relational hypotheses—examine how changes in one variable vary with changes in another.
This is usually tested by assessing covariance in some way, very often with correlation and
regression analysis.
2. Hypotheses about differences between groups—examine how some variable varies from one
group to another. These types of hypotheses are very common in causal designs, but are also
used to examine group differences in survey research. The tests are often t-tests or ANOVA.
3. Hypotheses about differences from some standard—examine how some variable differs from
some preconceived standard. The preconceived standard sometimes represents the true value
of the variable in a population. These tests can involve either a test of a mean for interval or
ratio variables or a test of frequencies if the variable is ordinal or nominal. These tests typify
univariate statistical tests.
The Hypothesis-Testing Procedure
Process
Hypotheses are tested by comparing the researcher’s educated guess with empirical reality. In other
words, how does what we expected compare to the data we have gathered? The process can be
described as follows:
1. First, the hypothesis is derived from the research objectives. The hypothesis should be stated
as specifically as possible.
2. Next, a sample is obtained and the relevant variable is measured.
SURVEY THIS!
Hypothesis testing is a critical part of what business researchers do for the organization. It is particularly important to understand how data you gather compare to benchmarks set by your work group, your firm, or even your industry. Here is a short exercise that will help you understand the importance of this analysis.
1. Select two variables from the Qualtrics survey (job performance characteristics, customer satisfaction, etc.) that could serve as a possible benchmark for a firm.
2. Using frequency distributions of both variables, identify the mean and standard deviation of both variables.
3. Develop a hypothesis statement for both variables.
4. Conduct a hypothesis test for both variables, setting your benchmark value to the scale midpoint (4.0).
5. Notice and comment on the significance of these tests for both variables. What do the results tell you?
Courtesy of Qualtrics.com
3. The measured value obtained in the sample is compared to the value either stated explicitly
or implied in the hypothesis. If the value is consistent with the hypothesis, the hypothesis is
supported. If the value is not consistent with the hypothesis, the hypothesis is not supported.
A univariate hypothesis consistent with the chapter vignette would be
H1: The average satisfaction at the Richmond plant is greater than 3.5.
If the average job satisfaction is 3.4, the hypothesis is obviously not supported. However, if the aver-
age job satisfaction is 3.9, is the hypothesis supported? While we all know 3.9 is higher than 3.5,
we are trying to determine if the results of our sample can be inferred to the population. In other
words, is the observed value an accurate reflection of the population or possibly an artifact of our
sample? Examining this issue is the role of hypothesis testing.
Univariate hypotheses are typified by tests comparing some observed sample mean against
a benchmark value. The test addresses the question, “Is the sample mean truly different from the
benchmark?” But how different is really different? If the observed sample mean is 3.55 and the
benchmark is 3.50, would the hypothesis still be supported? Probably not! How about 3.60? When
the observed mean is so close to the benchmark, we do not have sufficient confidence that a second
set of data, using a new sample taken from the same population, would show the same results. In
contrast, when the mean turns out well above 3.5, perhaps 3.9, then we could more easily trust that
another sample would also not produce a mean equal to or less than 3.5.
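To see why 3.55 is not convincingly different from 3.50 while 3.9 may be, consider the one-sample t statistic. It divides the gap between the sample mean and the benchmark by the standard error, s/√n. The sketch below assumes a standard deviation of 1.0 and a sample of 100 employees. Neither figure comes from the vignette, so treat the verdicts as illustrative only. The critical value of 1.660 is the one-tailed t value for α = .05 with 99 degrees of freedom, read from a standard t table.

```python
import math

def one_sample_t(sample_mean, benchmark, sd, n):
    """t = (sample mean - benchmark) / (sd / sqrt(n))."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - benchmark) / standard_error

BENCHMARK = 3.5      # historical satisfaction score from the vignette
SD, N = 1.0, 100     # hypothetical: not reported in the vignette
T_CRITICAL = 1.660   # one-tailed critical t, alpha = .05, df = 99 (t table)

for mean in (3.55, 3.60, 3.90):
    t = one_sample_t(mean, BENCHMARK, SD, N)
    verdict = "supported" if t > T_CRITICAL else "not supported"
    print(f"mean = {mean:.2f}  t = {t:.2f}  H1 {verdict}")
```

Under these assumptions, only 3.90 clears the critical value. Now suppose instead the standard deviation is 2.0 and only 25 employees respond (also hypothetical). The same 3.90 then gives t = 1.00. Sample size and variability, and not the raw gap alone, therefore decide how different is really different.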
In statistics classes, students are exposed to hypothesis testing as a contrast between a null and an alternative hypothesis. Perhaps most simply, the “null” hypothesis can be thought of as the opposite of the actual hypothesis. It is a statement of no difference, constructed such that rejecting it allows us to conclude what we assume to be the truth. The null hypothesis for the test in the Research Snapshot regarding the “Freshman 7.8” is a statement of equality:
H0: The average number of pounds gained in the freshman year of college is equal to 7.8.
The alternative hypothesis states the opposite of the null and represents the situation the test is designed to detect. In this case, the alternative hypothesis is:
H1: The average number of pounds gained in the freshman year of college is not equal to 7.8.
So, the researcher’s real hypothesis is generally stated in the form of an “alternative” hypothesis. While this terminology is common in statistical theory, we certainly understand it is confusing. Therefore, the use of the term null hypothesis will be avoided when at all possible. The reader should instead focus on what the findings should look like if a proposed hypothesis is true. If our students gain the same amount as the Purdue students, an observed sample’s mean should be approximately 7.8; a mean far from 7.8 favors the alternative. We then test to see which idea is supported by the empirical evidence.
RESEARCH
SNAPSHOT
The “Freshman 7.8”
There is a common belief that when college freshmen start their first semester away from their families, they gain 15 pounds in the first year. Although this gain is commonly referred to as the “Freshman 15,” few research studies have examined whether this extra 15 pounds actually appears. In fact, this belief is so prevalent that at an annual meeting of the Obesity Society, the current generation of college students was referred to as “Generation XL.”
Researchers at Purdue University conducted a study of freshman-year weight gain, using 907 freshman students. Their results were consistent with another study at Brown University. For both universities, freshman students gained between 6 and 8 pounds, with the Purdue average being 7.8 pounds. Male students were more likely to gain weight than female students. Clearly, students were gaining weight. Many of them placed the blame on their newfound freedom. It was just too easy to eat whatever and whenever they wanted. However, it appears that the weight gain implied by the “Freshman 15” is quite a bit higher than reality.
Is the weight gain experienced by the Purdue students typical? As a test, we asked our own students about their weight gain during their freshman year. Granted, it was certainly a subjective assessment, as students were not weighed before or after they started their university education. But it does allow us to conduct a hypothesis test: Given the results of the Purdue University study, do our students report the same 7.8 pounds of weight gain in their first year of school?
The test was conducted with 46 male and female students with the results shown below:
             N     Mean    Std. Deviation    Std. Error Mean
Students     46    5.63    2.51              0.369

Test Value = 7.8
                                                                95% Confidence Interval of the Difference
          t        df    p-value (two-tailed)    Mean Difference    Lower      Upper
Pounds    −5.86    45    0.000                   −2.166             −2.911     −1.421

Was the self-reported weight gain of these students supportive of the hypothesis? The univariate statistic testing this result suggests the answer to this question is no. The p-value for this test is less than 0.001, which supports the position that the mean number of pounds gained by our students is significantly less than 7.8 pounds. It certainly suggests that the “Freshman 15” should lose a few pounds!
Source: Hellmich, Nancy, “Freshman 15 Drops Some Pounds,” USA Today (October 23, 2006), http://www.usatoday.com/news/health/2006-10-22-freshman-weight_x.htm, accessed August 18, 2011.
© Workbook Stock/Getty Images
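The snapshot's test can be reproduced, approximately, from its summary statistics alone. The sketch below computes the standard error, t, degrees of freedom, a two-tailed p-value, and a 95 percent confidence interval for the mean difference. Python's standard library has no t distribution, so the p-value uses the standard identity P(|T| ≥ |t|) = I_{df/(df+t²)}(df/2, 1/2). The regularized incomplete beta function I is evaluated with the usual continued-fraction method. The helper names are our own (hypothetical), and in practice a statistics package would normally do this work. Because the inputs (5.63 and 2.51) are rounded, the results differ slightly from the printed output in the last digit.

```python
import math

def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd coefficients of the fraction for step m.
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:   # last factor is ~1: converged
            break
    return h

def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), using the symmetry relation where the fraction converges faster."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b

def t_two_tailed_p(t, df):
    """Two-tailed p-value for Student's t: P(|T| >= |t|)."""
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))

def t_critical_two_tailed(alpha, df):
    """Critical |t| for a two-tailed test, found by bisection on the p-value."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if t_two_tailed_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Summary statistics from the Research Snapshot.
n, mean, sd, test_value = 46, 5.63, 2.51, 7.8

se = sd / math.sqrt(n)              # standard error of the mean
mean_diff = mean - test_value       # observed mean minus test value
t_stat = mean_diff / se
df = n - 1
p_value = t_two_tailed_p(t_stat, df)
margin = t_critical_two_tailed(0.05, df) * se
ci = (mean_diff - margin, mean_diff + margin)

print(f"SE = {se:.3f}, t = {t_stat:.2f}, df = {df}, p = {p_value:.1e}")
print(f"95% CI of the difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The confidence interval lies entirely below zero. This is the same conclusion as the p-value falling below .05: a two-tailed test at α = .05 rejects equality exactly when the 95 percent interval excludes the test value.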
Empirical evidence is provided by test results comparing the observed mean against some sampling distribution. The variance in observations also plays a role: with greater variance, there is more of a chance that the confidence interval includes 7.8. A statistical test’s p-value, compared against a prespecified significance level (typically .05), is the key indicator of whether or not a hypothesis can be supported.
Significance Levels and p-Values
A significance level is a critical probability associated with a statistical hypothesis test that indicates how likely it is that an inference supporting a difference between an observed value and some statistical expectation is true. The term p-value stands for probability-value and is essentially another name for an observed or computed significance level. Exhibit 21.1 discusses interpretations of p-values in different kinds of statistical tests. Strictly speaking, a p-value is the probability of observing a result at least as extreme as the one in the sample if the statistical expectation (null) for a given test were true. So, low p-values mean the observed data would be unlikely if the statistical expectation were true. This means the researcher’s hypothesis positing (suggesting) a difference between an observed mean and a population mean, or between an observed frequency and a population frequency, or a relationship between two variables, is likely supported. Are you confused now? Hopefully the remainder of this chapter and those that follow will help clarify these relationships.
Traditionally, researchers have specified an acceptable significance level for a test prior to the analysis. Later, we will discuss this as an acceptable amount of Type I error. For most applications, the significance level is 0.05, but sometimes the acceptable amount of error is specified as 0.1 or 0.01.

significance level
A critical probability associated with a statistical hypothesis test that indicates how likely an inference supporting a difference between an observed value and some statistical expectation is true. The acceptable level of Type I error.

p-value
Probability value, or the observed or computed significance level; p-values are compared to significance levels to test hypotheses.
EXHIBIT 21.1
p-Values and Statistical Tests
Test Description | Test Statistic
Compare an Observed Mean with Some Predetermined Value | Z or t-test: Low p-values Indicate the Observed Mean Is Different Than Some Predetermined Value (Often 0)
Compare an Observed Frequency with a Predetermined Value | χ²: Low p-values Indicate That Observed Frequency Is Different Than Predetermined Value
Compare an Observed Proportion with Some Predetermined Value | Z or t-test for Proportions: Low p-values Indicate That the Observed Proportion Is Different Than the Predetermined Value
Bivariate Tests: Compare Whether Two Observed Means Are Different from One Another | Z or t-test: Low p-values Indicate the Means Are Different
Compare Whether Two Less-Than Interval Variables Are Related Using Cross-tabs | χ²: Low p-values Indicate the Variables Are Related to One Another
Compare Whether Two Interval or Ratio Variables Are Correlated to One Another | t-test for Correlation: Low p-values Indicate the Variables Are Related to One Another
© Cengage Learning 2013
If the p-value resulting from a statistical test is less than the prespecified significance level, then a
hypothesis about differences is supported.
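The decision rule in the preceding sentence is simple enough to state directly in code. In the sketch below, the p-value of 0.03 and the function name are hypothetical. The example shows that the same result can support a hypothesis at the 0.10 and 0.05 levels yet fail at the stricter 0.01 level. This is why the significance level is fixed before the analysis rather than chosen after seeing the p-value.

```python
def hypothesis_supported(p_value, alpha=0.05):
    """Return True when the p-value falls below the prespecified significance level."""
    return p_value < alpha

p_value = 0.03  # hypothetical p-value from some statistical test
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: supported = {hypothesis_supported(p_value, alpha)}")
# prints True for 0.10 and 0.05, False for 0.01
```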
Consider an example where researchers have identified that a successful fast food franchise
relies on households with an average of at least 1.4 children within a 10-minute drive of their
location.The researchers then collected data regarding family size in an area being considered for
a new store, and the sample shows the average family has 3.1 children. Exhibit 21.2 depicts this
data and illustrates an important property of p-values. In this case, the comparison standard of 1.4 is shown as a light blue line, while the sample result is shown as a dark blue line (3.1). The normal curve illustrates what other sample results would likely be. What is most important to realize is that as the observed value gets further from 1.4, the p-value gets smaller, meaning that a sample mean this far from 1.4 would be increasingly unlikely if the true mean were 1.4. With the observed mean of 3.1 and the observed standard deviation of 1.02, there is very little chance that the researcher would be wrong in concluding the actual number of children per family in this area is greater than 1.4.
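The property shown in Exhibit 21.2 can be checked numerically. The sketch below uses the sample standard deviation (1.01793) and sample size (68) from the exhibit, with a one-tailed Z approximation, which Exhibit 21.1 lists alongside the t-test. A t-test would give somewhat larger p-values but the same pattern. The sample means other than 3.0684 are hypothetical values, chosen only to show the trend as the observed mean moves away from 1.4.

```python
import math

STANDARD = 1.4        # required average number of children per household
SD, N = 1.01793, 68   # sample standard deviation and size from Exhibit 21.2

def upper_tail_p(sample_mean, standard, sd, n):
    """One-tailed p-value for H1: mean > standard, using the normal (Z) approximation."""
    z = (sample_mean - standard) / (sd / math.sqrt(n))
    return 0.5 * math.erfc(z / math.sqrt(2))   # equals 1 - Phi(z), accurate far into the tail

# 1.5, 1.6, and 1.8 are hypothetical; 3.0684 is the observed sample mean.
for sample_mean in (1.5, 1.6, 1.8, 3.0684):
    p = upper_tail_p(sample_mean, STANDARD, SD, N)
    print(f"mean = {sample_mean:<6}  p = {p:.1e}")
```

The computation uses `math.erfc` rather than `1 - NormalDist().cdf(z)`. For a z value near 13.5, the subtraction would round to exactly zero, whereas `erfc` keeps the tiny tail probability.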
Consider the test in the Research Snapshot “The ‘Freshman 7.8’.” The statistical test is whether the mean computed from the 46 observations is equal to 7.8. Given the risk associated with being wrong, the researcher uses an acceptable significance level of 0.05. After computing the appropriate test, the researcher observes a computed significance level, or p-value, that is less than 0.05 (in fact, less than 0.001). Therefore, the hypothesis that the mean equals 7.8 is rejected. The researcher can conclude that our students gain less than 7.8 pounds in their freshman year.
In discussing confidence intervals, statisticians use the term confidence level, or confidence coefficient, to refer to the level of probability associated with an interval estimate. However, when discussing hypothesis testing, statisticians change their terminology and refer to the complementary probability as the significance level, α (the Greek letter alpha). For example, a 95 percent confidence level corresponds to a significance level of α = .05.
Copyright 2012 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
•CHAPTER 21 Univariate Statistical Analysis 511
EXHIBIT 21.2
As the observed mean gets further from the standard (proposed population mean), the p-value
decreases. The lower the p-value, the more confidence you have that the sample mean is different.
[Figure: frequency histogram of the number of children per family, with the standard and the sample
mean marked and a p-value scale that decreases as the distance from the standard grows.
Mean = 3.0684; Std. Dev. = 1.01793; N = 68.] © Cengage Learning 2013
An Example of Hypothesis Testing
The example described here illustrates the conventional statistical approach to testing a univariate
hypothesis with an interval or ratio variable. Suppose the Pizza-In restaurant is concerned about
store image before deciding whether to expand. Pizza-In managers are most interested in how
friendly customers perceive the service to be. A sample of 225 customers was obtained and asked
to indicate their perceptions of service on a five-point scale, where 1 indicates “very unfriendly”
service and 5 indicates “very friendly” service. The scale is assumed to be an interval scale, and
experience has shown that the previous distribution of this attitudinal measurement assessing the
service dimension was approximately normal.
Now, suppose Pizza-In believes the service has to be different from 3.0 before a decision about
expansion can be made. In conventional statistical terminology, the null hypothesis for this test is
that the mean is equal to 3.0:
$H_0: \mu = 3.0$
The alternative hypothesis is that the mean does not equal 3.0:
$H_1: \mu \neq 3.0$
More practically, the researcher is likely to write the substantive hypothesis (as it would be stated
in a research report or proposal) something like this:
H1: Customer perceptions of friendly service are not equal to three.
Note that the substantive hypothesis matches the “alternative” phrasing. In practical terms, research-
ers often do not state null and alternative hypotheses. Only the substantive hypothesis implying
what is expected to be observed in the sample is formally stated.
Next, the researcher must decide on a significance level. This level corresponds to a region of
rejection on a normal sampling distribution, as shown in Exhibit 21.1. The peak of the distribution
is the theoretical expected value for the population mean; in this case it would be 3. If the
acceptable significance level is 0.05, then the 0.025 of the distribution in each tail, farthest away
from the mean, forms the rejection zone (shaded dark blue in Exhibit 21.1). The values within the
unshaded area are called acceptable at the 95 percent confidence level (or 5 percent significance
level, or 0.05 alpha level), and if our sample mean lies within this region, we conclude that it does
not differ from the expected value, 3 in this case. More precisely, we fail to reject the null
hypothesis. In other words, the range of acceptance (1) identifies those values that are consistent
with the hypothesized mean in the null hypothesis and (2) shows the range within which any
difference is so small that we would conclude that the observed difference was actually due to
random sampling error rather than to a false null hypothesis. H1 would not be supported.

critical values: The values that lie exactly on the boundary of the region of rejection.
In our example, the Pizza-In restaurant hired research consultants who collected a sample of
225 interviews. The mean friendliness score on a five-point scale equaled 3.78. The sample standard
deviation was S = 1.5. (If the population standard deviation σ is known, it is used in the analysis;
however, this is rarely true and was not true in this case.1) Now we have enough information to test
the hypothesis.
The researcher has decided that the acceptable significance level will be set at 0.05. This means
that the researcher accepts a 5 in 100 (0.05) chance of rejecting the null hypothesis when it is
actually true. From the table of the standardized normal distribution, the researcher finds that the
Z score of 1.96 represents a probability of 0.025 that a sample mean will lie more than 1.96 standard
errors above μ. Likewise, the table shows that 0.025 of all sample means will fall more than 1.96
standard errors below μ (that is, below Z = −1.96). Adding these two “tails” together, we get 0.05.
The values that lie exactly on the boundary of the region of rejection are called critical values of
μ. Theoretically, the critical values are Z = −1.96 and +1.96. Now we must transform these critical
Z-values to the sampling distribution of the mean for this image study. The critical values are

$$\begin{aligned}
\text{Critical value (lower limit)} &= \mu - Z S_{\bar{X}} \quad\text{or}\quad \mu - Z\frac{S}{\sqrt{n}} \\
&= 3.0 - 1.96\left(\frac{1.5}{\sqrt{225}}\right) \\
&= 3.0 - 1.96(0.1) \\
&= 3.0 - 0.196 \\
&= 2.804
\end{aligned}$$

$$\begin{aligned}
\text{Critical value (upper limit)} &= \mu + Z S_{\bar{X}} \quad\text{or}\quad \mu + Z\frac{S}{\sqrt{n}} \\
&= 3.0 + 1.96\left(\frac{1.5}{\sqrt{225}}\right) \\
&= 3.0 + 1.96(0.1) \\
&= 3.0 + 0.196 \\
&= 3.196
\end{aligned}$$
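The critical-value calculation above can be reproduced in a few lines of Python. This is a sketch using only the standard library; `statistics.NormalDist().inv_cdf` supplies the 1.96 that the text reads from a Z-table, and the variable names are illustrative rather than taken from any statistical package.

```python
import math
from statistics import NormalDist

mu0 = 3.0      # hypothesized population mean under H0
s = 1.5        # sample standard deviation (sigma unknown)
n = 225        # sample size
alpha = 0.05   # acceptable significance level, split across two tails

z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
se = s / math.sqrt(n)                           # standard error = 0.1
lower = mu0 - z_crit * se                       # lower critical value
upper = mu0 + z_crit * se                       # upper critical value

sample_mean = 3.78
reject_h0 = not (lower <= sample_mean <= upper)  # True if the mean falls in a rejection region
print(f"z = {z_crit:.2f}, critical values = ({lower:.3f}, {upper:.3f}), reject H0: {reject_h0}")
```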
Based on survey results, the sample mean ($\bar{X}$) is 3.78. The sample mean is contained in the region
of rejection (see the dark shaded areas of Exhibit 21.3). Since the sample mean is greater than the
critical value of 3.196, falling in one of the tails (regions of rejection), the researcher concludes that
the sample result is statistically significant beyond the 0.05 level. The region of rejection is the range
of values within which an observed sample mean leads us to reject the hypothesis that the population
mean equals the predetermined value of 3.0. Here is another way to express this result: if we took 100
samples from this population and the mean were actually 3.0, fewer than five would show results that
deviate this much.
What does this mean to the management of the Pizza-In? The results indicate that customers
believe the service is pretty friendly. If the true mean were 3.0, the probability that this result
($\bar{X} = 3.78$) would occur because of random sampling error is less than 5 in 100. This suggests that
friendliness of the service personnel may not be a problem. However, a more useful comparison might
be between the friendliness rating of Pizza-In and that of a key competitor. That analysis will have to
wait until we cover bivariate tests in Chapter 22.

EXHIBIT 21.3
A Hypothesis Test Using the Sampling Distribution of $\bar{X}$ under the Hypothesis $\mu = 3.0$
[Figure: sampling distribution centered on the hypothesized mean of 3.0, with the lower critical value
(2.804) and the upper critical value (3.196) bounding the regions of rejection; the sample mean
$\bar{X} = 3.78$ lies in the upper region of rejection.] © Cengage Learning 2013
An alternative way to test the hypothesis is to formulate the decision rule in terms of the
Z-statistic. Using the following formula, we can calculate the observed value of the Z-statistic
given a certain sample mean, $\bar{X}$:

$$\begin{aligned}
Z_{\text{obs}} &= \frac{\bar{X} - \mu}{S_{\bar{X}}} = \frac{3.78 - \mu}{S_{\bar{X}}} \\
&= \frac{3.78 - 3.0}{0.1} = \frac{0.78}{0.1} \\
&= 7.8
\end{aligned}$$

In this case, the Z-value is 7.8 and we find that we have met the criterion of statistical significance
at the 0.05 level. This result produces a p-value well below 0.000001. Once again, since the p-value is
less than the acceptable significance level, the null hypothesis is rejected and the substantive
hypothesis is supported: the service rating is significantly higher than 3.0. This example used the
conventional statistical terminology involving critical values and a statistical null hypothesis. In
practice, researchers rarely have to look up tabled critical values, since statistical packages provide
an exact p-value for a given test. Thus, the p-value, or a confidence interval associated with the
p-value, is the key to interpretation.
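A short sketch of the Z-statistic decision rule, assuming (as the text does) that the sample standard deviation stands in for σ. The two-tailed p-value is computed from the standard normal distribution with `math.erfc`; the function name `z_test` is hypothetical.

```python
import math

def z_test(sample_mean, mu0, s, n):
    """Observed Z and two-tailed p-value for H0: mu = mu0 (large-sample approximation)."""
    se = s / math.sqrt(n)                              # standard error of the mean
    z = (sample_mean - mu0) / se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))    # P(|Z| >= |z|) when H0 is true
    return z, p_two_tailed

z_obs, p_value = z_test(sample_mean=3.78, mu0=3.0, s=1.5, n=225)
print(f"Z = {z_obs:.1f}, two-tailed p = {p_value:.1e}")
```

The exact p-value is many orders of magnitude below 0.000001, which is why statistical packages typically report it only as "0.000" or "< 0.001".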
Type I and Type II Errors
Hypothesis testing using sample observations is based on probability theory. We make an observation
of a sample and use it to infer the probability that some observation is true within the population
the sample represents. Because we cannot make any statement about a population from a sample with
complete certainty, there is always the chance that an error will be made. When a researcher makes the
observation using a census, meaning that every unit (person or object) in a population is measured,
a population parameter results and the conclusions are certain. Business researchers very rarely use
a census, having to rely on samples and sample statistics.

RESEARCH SNAPSHOT
The Law and Type I and Type II Errors
Although most attorneys and judges do not concern themselves with the statistical terminology of
Type I and Type II errors, they do follow this logic. For example, our legal system is based on the
concept that a person is innocent until proven guilty. Assume that the null hypothesis is that the
individual is innocent. If we make a Type I error, we will send an innocent person to prison. Our legal
system takes many precautions to avoid Type I errors. A Type II error would occur if a guilty party
were set free (the null hypothesis would have been accepted). Our society places such a high value on
avoiding Type I errors that Type II errors are more likely to occur.
The researcher using sampling runs the risk of committing two types of errors. Exhibit 21.4
summarizes the state of affairs in the population and the nature of Type I and Type II errors. The
four possible situations in the exhibit result because the null hypothesis (using the example above,
$\mu = 3.0$) is actually either true or false and the observed statistics ($\bar{X} = 3.78$) will result in
acceptance or rejection of this null hypothesis.
EXHIBIT 21.4 Type I and Type II Errors in Hypothesis Testing

                                   Decision
Actual State in the Population    Accept H0             Reject H0
H0 is true                        Correct (no error)    Type I error
H0 is false                       Type II error         Correct (no error)
© Cengage Learning 2013

“It is terrible to speak well and be wrong.” (SOPHOCLES)

Type I Error
With our example above, the null hypothesis is that the mean is equal to 3.0. Suppose the true
population mean is indeed equal to 3.0, but the observed sample mean leads to the conclusion
that the mean is greater (or less) than 3.0. A Type I error has occurred. A Type I error occurs when
a condition that is true in the population is rejected based on statistical observations. When a
researcher sets the acceptable significance level, α, he or she is determining the tolerance for a Type
I error. Simply put, a Type I error occurs when the researcher concludes that there is a statistical
difference when in reality one does not exist.

Type I error: An error caused by rejecting the null hypothesis when it is true; has a probability of
alpha. Practically, a Type I error occurs when the researcher concludes that a relationship or
difference exists in the population when in reality it does not exist.

Type II Error
Conversely, if our null hypothesis is indeed false, but we conclude that we should not reject the
null hypothesis, we make what is called a Type II error. Suppose, in this example, that the null
hypothesis that the mean is equal to 3.0 is not true, yet our sample data indicate that the mean does
not differ from 3.0. That is a Type II error: failing to reject a false null hypothesis. The probability of
making this incorrect decision is called beta (β). In practical terms, a Type II error means that our
sample does not show that a difference between an observed mean and a benchmark exists when in
fact the difference does exist in the population. The Research Snapshot “The Law and Type I and
Type II Errors” provides further clarification of the Type I and Type II conditions.

Type II error: An error caused by failing to reject the null hypothesis when the alternative hypothesis
is true; has a probability of beta. Practically, a Type II error occurs when a researcher concludes that
no relationship or difference exists when in fact one does exist.
Unfortunately, without increasing sample size the researcher cannot simultaneously reduce
Type I and Type II errors. They are inversely related. Thus, reducing the probability of a Type II
error increases the probability of a Type I error. In business problems, Type I errors generally are
considered more serious than Type II errors. Thus more emphasis is placed on determining the
significance level, α, than in determining β.2
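The inverse relationship between Type I and Type II errors can be seen in a small Monte Carlo simulation. This is a rough sketch under stated assumptions: responses are drawn from a normal population with standard deviation 1.5 and n = 225, as in the Pizza-In example, and the "true" mean of 3.2 used for the Type II case is hypothetical. Because the same random seed is reused, the α = 0.05 and α = 0.01 runs test identical samples, so tightening α cannot reduce the number of Type II errors.

```python
import math
import random
from statistics import NormalDist

def rejection_rate(true_mean, mu0=3.0, sd=1.5, n=225, alpha=0.05, trials=2000, seed=7):
    """Share of simulated samples for which a two-tailed Z-test rejects H0: mu = mu0."""
    rng = random.Random(seed)                      # fixed seed: reproducible draws
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = sum(sample) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))  # sample SD
        z = (mean - mu0) / (s / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

# H0 true (population mean really is 3.0): every rejection is a Type I error.
type_i_rate = rejection_rate(true_mean=3.0)
# H0 false (hypothetical true mean of 3.2): every non-rejection is a Type II error.
type_ii_rate = 1 - rejection_rate(true_mean=3.2)
# Same samples, stricter alpha: fewer Type I errors are tolerated, more Type II errors occur.
type_ii_rate_strict = 1 - rejection_rate(true_mean=3.2, alpha=0.01)

print(f"Type I rate  (alpha = 0.05): {type_i_rate:.3f}")
print(f"Type II rate (alpha = 0.05): {type_ii_rate:.3f}")
print(f"Type II rate (alpha = 0.01): {type_ii_rate_strict:.3f}")
```

The Type I rate should land near the chosen α, while moving α from 0.05 to 0.01 raises the Type II rate noticeably; only a larger n would lower both at once.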
Choosing the Appropriate Statistical
Technique
Numerous statistical techniques are available to assist the researcher in interpreting data. While
gaining competence in these statistical techniques is challenging, perhaps the more difficult
task is determining when to use each method. Choosing the right tool for the job is just as
important to the researcher as to the mechanic. The correct choice can be made by considering
1. The type of question to be answered
2. The number of variables involved
3. The level of scale measurement
Today, the researcher would only perform a paper-and-pencil calculation on the simplest of data
sets. Virtually all business research hypotheses are tested by using a correct click-through sequence
in a statistical software package. The mathematics of these packages is highly reliable. Therefore, if
the researcher can choose the right statistic, know the right click-through sequence, and read the
resulting output, the right statistical conclusion should be easy to reach.
Type of Question to Be Answered
The type of question the researcher is attempting to answer is a consideration in the choice of
statistical technique. For example, a researcher may be concerned simply with the central tendency
or the distribution of a variable. Comparing different business divisions’ sales results with some
target level will require a one-sample t-test. Comparing two salespeople’s average monthly sales
will require a t-test of two means, but comparing quarterly sales distributions will require a
chi-square test.
The researcher should consider the method of statistical analysis before choosing the research
design and before determining the type of data to collect. Once the data have been collected, the
initial orientation toward analysis of the problem will be reflected in the research design.
Number of Variables
The number of variables that will be simultaneously investigated is a primary consideration in
the choice of statistical technique. A researcher who is interested only in the average number of
times a prospective home buyer visits financial institutions to shop for interest rates can concen-
trate on investigating only that single variable at a time. However, a researcher trying to measure
multiple complex organizational variables cannot do the same. Simply put, univariate, bivariate,
and multivariate statistical procedures are distinguished based on the number of variables involved
in an analysis.
Level of Scale of Measurement
The scale measurement level helps determine the most appropriate statistical techniques and
empirical operations. Testing a hypothesis about a mean, as we have just illustrated, is appropriate
for interval-scaled or ratio-scaled data. Suppose a researcher is working with a nominal scale that
only identifies users versus nonusers of bank credit cards. Because of the type of scale, the researcher
may use only the mode as a measure of central tendency. In other situations, where data are measured
on an ordinal scale, the median may be used as the average or a percentile may be used as a measure
of dispersion. For example, ranking brand preferences generally employs an ordinal scale. Nominal
and ordinal data are often analyzed using frequencies or cross-tabulation.

RESEARCH SNAPSHOT
Living in a Statistical Web
Having trouble learning statistical concepts? Do a little surfing and the concepts may become clear.
Many sources exist that illustrate statistical problems and provide data for practice. Here are just a few:
STATLIB
http://lib.stat.cmu.edu/
StatLib is a system for distributing statistical software, data sets, and information by electronic mail,
FTP, and the World Wide Web.
STAT-HELP
http://www.stat-help.com/
Stat-Help.com provides help with statistics via the Internet and contains spreadsheets for performing
many basic calculations.
SURFSTAT.AUSTRALIA
http://surfstat.anu.edu.au/surfstat-home/surfstat-main.html
SurfStat.australia is an online text in introductory statistics from the University of Newcastle and the
Australian government.
THE RICE VIRTUAL LAB IN STATISTICS
http://onlinestatbook.com/rvls.html
http://davidmlane.com/hyperstat/
The Rice Virtual Lab in Statistics provides hypertext materials such as HyperStat Online.
STATCRUNCH
http://www.statcrunch.com/
StatCrunch is a statistical software package available via the World Wide Web.
GRAPHPAD
http://www.graphpad.com/quickcalcs/Statratio1.cfm
GraphPad software is a p-value calculator.
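The scale-dependent choice of average described in this section can be sketched with Python's `statistics` module. The data below are hypothetical and serve only to show which measure of central tendency fits which scale.

```python
import statistics

# Hypothetical data, for illustration only
card_use = ["user", "nonuser", "user", "user", "nonuser"]   # nominal: user vs. nonuser
brand_rank = [1, 3, 2, 1, 4, 2, 1]                          # ordinal: 1 = most preferred
satisfaction = [3.5, 4.0, 2.5, 5.0, 4.5]                    # interval: rating scale

nominal_mode = statistics.mode(card_use)          # only the mode is meaningful for categories
ordinal_median = statistics.median(brand_rank)    # median respects order, not distance
interval_mean = statistics.mean(satisfaction)     # mean requires equal intervals

print(nominal_mode, ordinal_median, interval_mean)
```

Computing a mean of the category labels would not even be possible, and a mean of the rank codes would assume equal distances between ranks that an ordinal scale does not guarantee.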
Parametric versus Nonparametric Hypothesis Tests
The terms parametric statistics and nonparametric statistics refer to the two major groupings of
statistical procedures. The major distinction between them lies in the underlying assumptions about
the data to be analyzed. Parametric statistics involve numbers with known, continuous distributions.
When the data are interval or ratio scaled and the sample size is large, parametric statistical
procedures are appropriate. Nonparametric statistics are appropriate when the numbers do not
conform to a known distribution.
Parametric statistics are based on the assumption that the data in the study are drawn from a
population with a normal (bell-shaped) distribution and/or normal sampling distribution. For example,
if an investigator has two interval-scaled measures, such as gross national product (GNP) and industry
sales volume, parametric tests are appropriate. Possible statistical tests might include product-moment
correlation analysis, analysis of variance, regression, or a t-test for a hypothesis about a mean.
Nonparametric methods are used when the researcher does not know how the data are distributed.
Making the assumption that the population distribution or sampling distribution is normal
generally is inappropriate when data are either ordinal or nominal. Thus, nonparametric statistics
are referred to as distribution free.3 Data analysis of both nominal and ordinal scales typically uses
nonparametric statistical tests.

parametric statistics: Involve numbers with known, continuous distributions; when the data are
interval or ratio scaled and the sample size is large, parametric statistical procedures are appropriate.
nonparametric statistics: Appropriate when the variables being analyzed do not conform to any known
or continuous distribution.
Exhibit 21.5 illustrates the process of selecting an appropriate univariate statistical method. The
exhibit shows how statistical techniques vary according to scale properties and the type of question
being asked. While more univariate statistical tests exist than are shown in Exhibit 21.5, these
basic options address the majority of univariate analyses in business research contexts.
EXHIBIT 21.5 Univariate Statistical Choice Made Easy
Type of variable?
Interval or ratio: Is the sample mean different from the hypothesized value? Use a Z-test or t-test.
Ordinal: Are rankings evenly distributed? Use a Kolmogorov-Smirnov test.
Nominal: Is the number in each classification equal? Use a χ² test.
Nominal (proportions): Is the observed proportion different from a hypothesized value? Use a t-test
of proportion.
© Cengage Learning 2013
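The decision logic of Exhibit 21.5 amounts to a lookup from scale type to test. The sketch below is illustrative only; the function name and the scale keys are hypothetical, and the mapping follows the exhibit as laid out here.

```python
def choose_univariate_test(scale: str) -> str:
    """Map a variable's measurement scale to the univariate test in Exhibit 21.5."""
    tests = {
        "interval": "Z-test or t-test",                # is the mean different from a hypothesized value?
        "ratio": "Z-test or t-test",
        "ordinal": "Kolmogorov-Smirnov test",          # are rankings evenly distributed?
        "nominal": "chi-square test",                  # is the number in each class equal?
        "nominal proportion": "t-test of proportion",  # is the proportion different from a value?
    }
    key = scale.strip().lower()
    if key not in tests:
        raise ValueError(f"unknown scale: {scale!r}")
    return tests[key]

print(choose_univariate_test("Ordinal"))   # Kolmogorov-Smirnov test
```

Raising an error for an unrecognized scale, rather than guessing, mirrors the point of the exhibit: the scale must be identified before a test can be chosen.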
The t-Distribution
A univariate t-test is appropriate for testing hypotheses involving some observed mean against
some specified value. The t-distribution, like the standardized normal curve, is a symmetrical,
bell-shaped distribution with a mean of 0; with small samples it is more spread out than the normal
curve, and its standard deviation approaches 1.0 as sample size grows. When sample size (n) is larger
than 30, the t-distribution and Z-distribution are almost identical. Therefore, while the t-test is
strictly appropriate for tests involving small sample sizes with unknown standard deviations,
researchers commonly apply the t-test for comparisons involving the mean of an interval or ratio
measure. The precise height and shape of the t-distribution vary with sample size. More specifically,
the shape of the t-distribution is influenced by its degrees of freedom (df). The degrees of freedom are
determined by the number of distinct calculations that are possible given a set of information. In the
case of a univariate t-test, the degrees of freedom are equal to the sample size (n) minus one. In the
Research Snapshot regarding the “Freshman 7.8,” one can see that the sample size is 46 (n = 46) and
the degrees of freedom for the univariate t-test are 45 (df = 45).
Exhibit 21.6 illustrates t-distributions for 1, 2, 5, and an infinite number of degrees of freedom.
Notice that the t-distribution approaches a normal distribution rapidly with increasing sample size.
This is why, in practice, marketing researchers usually apply a t-test even with large samples. The
practical effect is that the conclusion will be the same, since the distributions are so similar with large
samples and the correspondingly larger numbers of degrees of freedom.

t-test: A hypothesis test that uses the t-distribution. A univariate t-test is appropriate when the
variable being analyzed is interval or ratio.
t-distribution: A symmetrical, bell-shaped distribution that is contingent on sample size; has a mean
of 0 and a standard deviation that approaches 1 as sample size increases.
degrees of freedom (df): The number of observations minus the number of constraints or assumptions
needed to calculate a statistical term.
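The shape differences described above can be checked directly from the density of the t-distribution. This sketch uses `math.gamma` and the standard formula for Student's t density; it compares the curves' peak heights with the normal curve's and shows that the standard deviation of t, √(df/(df − 2)) for df > 2, shrinks toward 1 as df grows. The df values chosen are illustrative.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Peak heights (density at t = 0) rise toward the normal curve's peak as df grows.
peaks = {df: t_pdf(0.0, df) for df in (1, 2, 5, 30)}
normal_peak = normal_pdf(0.0)

# Standard deviation of t is sqrt(df / (df - 2)), defined only for df > 2.
t_sd = {df: math.sqrt(df / (df - 2)) for df in (5, 30, 1000)}

for df, height in peaks.items():
    print(f"df = {df:>2}: peak = {height:.4f}")
print(f"normal:  peak = {normal_peak:.4f}")
print(t_sd)
```

The lower peaks for small df are matched by heavier tails: for example, `t_pdf(3.0, 5)` is several times larger than `normal_pdf(3.0)`, which is why small-sample critical values exceed 1.96.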
EXHIBIT 21.6 The t-Distribution for Various Degrees of Freedom
[Figure: relative frequency plotted against values of t from −4 to 4 for t-distributions with 1, 2, and 5
degrees of freedom and for the normal distribution; curves with fewer degrees of freedom are flatter,
with heavier tails, and approach the normal curve as degrees of freedom increase.] © Cengage Learning 2013
Another way to look at degrees of freedom is to think of adding four numbers together when
you know their sum, for example,

4 + 2 + 1 + X = 12

The value of the fourth number has to be 5. The values of the first three numbers could change to any
value (freely vary), but the fourth value would have to be determined for the sum to still equal 12.
In this example there are three degrees of freedom. Degrees of freedom can be a difficult concept
to understand fully. For most basic statistical analyses, the user only needs to remember the rule
for determining the number of degrees of freedom for a given test. Today, with computerized
software packages, even that number is provided automatically for most tests.

Ultra-luxury car makers have sales goals that may involve selling 1,000 cars per year or fewer
worldwide. What questions are asked in marketing a car like this that might involve a univariate
analysis?4
The calculation of t closely resembles the calculation of the Z-value. To calculate t, use the
formula

$$t = \frac{\bar{X} - \mu}{S_{\bar{X}}}$$

with n − 1 degrees of freedom.
The Z-distribution and the t-distribution are very similar, and thus the Z-test and t-test will provide
much the same result in most situations. However, when the population standard deviation (σ) is
known, the Z-test is most appropriate. When σ is unknown (the situation in most business research
studies) and the sample size is greater than 30, the Z-test also can be used. When σ is unknown and
the sample size is small, the t-test is more appropriate.
Calculating a Confidence Interval Estimate
Using the t-Distribution
Suppose a business organization is interested in finding out how long newly hired MBA graduates
remain on their first jobs. On the basis of a small sample of 18 employees with MBAs, the
researcher wishes to estimate the population mean with 95 percent confidence. The data from the
sample are presented below.
Number of years on first job: 3, 5, 7, 1, 12, 1, 2, 2, 5, 4, 2, 3, 1, 3, 4, 2, 6, 7
To find the confidence interval estimate of the population mean for this small sample, we use
the formula

$$\mu = \bar{X} \pm t_{c.l.} S_{\bar{X}}$$

or

$$\text{Upper limit} = \bar{X} + t_{c.l.}\left(\frac{S}{\sqrt{n}}\right), \qquad \text{Lower limit} = \bar{X} - t_{c.l.}\left(\frac{S}{\sqrt{n}}\right)$$

where
$\mu$ = population mean
$\bar{X}$ = sample mean
$t_{c.l.}$ = critical value of t at a specified confidence level
$S_{\bar{X}}$ = standard error of the mean
$S$ = sample standard deviation
$n$ = sample size
More specifically, the step-by-step procedure for calculating the confidence interval is as follows:
1. We calculate $\bar{X}$ from the sample. Summing our data values yields $\sum X = 70$, and
$\bar{X} = \sum X / n = 70/18 = 3.89$.
2. Since σ is unknown, we estimate the population standard deviation by finding S, the sample
standard deviation. For our example, S = 2.81.
3. We estimate the standard error of the mean using the formula $S_{\bar{X}} = S/\sqrt{n}$. Thus,
$S_{\bar{X}} = 2.81/\sqrt{18} = 0.66$.
4. We determine the t-values associated with the desired confidence level. To do this, we go
to Table A.3 in the appendix. Although the t-table provides information similar to that in
the Z-table, it is somewhat different. The t-table format emphasizes the chance of error, or
significance level (α), rather than the 95 percent chance of including the population mean in
the estimate. Our example is a two-tailed test. Since a 95 percent confidence level has been
selected, the significance level equals 0.05 (1.00 − 0.95 = 0.05). Once this has been determined,
all we have to do to find the t-value is look under the 0.05 column for two-tailed tests at the row
in which degrees of freedom (df) equal the appropriate value (n − 1). For 17 degrees of freedom
(n − 1 = 18 − 1 = 17), the t-value at the 95 percent confidence level (0.05 level of significance)
is t = 2.11.
5. We calculate the confidence interval (using unrounded intermediate values):

$$\text{Lower limit} = 3.89 - 2.11\left(\frac{2.81}{\sqrt{18}}\right) \approx 2.49$$

$$\text{Upper limit} = 3.89 + 2.11\left(\frac{2.81}{\sqrt{18}}\right) \approx 5.28$$
In our hypothetical example it may be concluded with 95 percent confidence that the popula-
tion mean for the number of years spent on the first job by MBAs is somewhere between 2.49
and 5.28.
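The five steps can be reproduced from the 18 observations. This is a minimal sketch; the critical value 2.110 is hard-coded from a standard t-table (two-tailed, 0.05, df = 17), because Python's standard library has no t-distribution quantile function.

```python
import math
import statistics

# Years on first job for the 18 MBA graduates in the example
years = [3, 5, 7, 1, 12, 1, 2, 2, 5, 4, 2, 3, 1, 3, 4, 2, 6, 7]

n = len(years)
mean = statistics.mean(years)            # step 1: sample mean
s = statistics.stdev(years)              # step 2: sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)                    # step 3: standard error of the mean
t_crit = 2.110                           # step 4: t-table, two-tailed 0.05, df = n - 1 = 17
lower = mean - t_crit * se               # step 5: confidence limits
upper = mean + t_crit * se

print(f"mean = {mean:.2f}, S = {s:.2f}, SE = {se:.2f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Carrying full precision through each step, rather than rounding to two decimals at each stage, is what keeps the limits at 2.49 and 5.28.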
One- and Two-Tailed t-Tests
Univariate Z-tests and t-tests can be one- or two-tailed. A one-tailed univariate test is appropriate
when the researcher has a directional hypothesis implying that an observed mean can only
be greater than or less than a hypothesized value. Thus, only one of the “tails” of the bell-shaped
normal curve is relevant. For instance, the following hypothesis could be appropriately examined
with a one-tailed test:
H1: The number of pizza restaurants within postal code 49100 in Angers, France is greater than five.
A two-tailed test is one that tests for differences from the population mean that are either greater
or less. Thus, the extreme values of the normal curve (or tails) on both the right and the left are
considered. In practical terms, when a research question does not specify whether a difference
should be greater than or less than, a two-tailed test is most appropriate. For instance, the following
research question could be examined using a two-tailed test:
H2: The number of pizza restaurants within postal code 49100 in Angers, France is not equal to 5.
In the case of H1, where the hypothesis proposes “greater than five,” if the observed value is
significantly less than five, the hypothesis is still not supported. Practically, a one-tailed p-value can
be determined from a two-tailed test result by taking half of the observed p-value, provided that the
observed difference lies in the hypothesized direction. When the researcher has any doubt about
whether a one- or two-tailed test is appropriate, he or she should opt for the two-tailed test. Most
computer software will assume a two-tailed test unless otherwise specified.
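The halving rule, and its caveat, can be illustrated with a Z-statistic. This sketch assumes a large-sample normal approximation, and the observed values ±1.80 are hypothetical, chosen so that the one-tailed and two-tailed results fall on opposite sides of 0.05.

```python
import math

def one_and_two_tailed_p(z):
    """Return (one-tailed p for H1: 'greater than', two-tailed p) for an observed Z."""
    one_tailed = 0.5 * math.erfc(z / math.sqrt(2))    # P(Z >= z)
    two_tailed = math.erfc(abs(z) / math.sqrt(2))     # P(|Z| >= |z|)
    return one_tailed, two_tailed

one, two = one_and_two_tailed_p(1.80)        # difference in the hypothesized direction
one_wrong, _ = one_and_two_tailed_p(-1.80)   # difference in the opposite direction
print(f"z = 1.80: one-tailed p = {one:.4f}, two-tailed p = {two:.4f}")
print(f"z = -1.80: one-tailed p = {one_wrong:.4f}")
```

With z = 1.80 the one-tailed p (about 0.036) is exactly half the two-tailed p (about 0.072). With z = −1.80 the two-tailed p is the same, but the one-tailed p for a "greater than" hypothesis is about 0.96, so halving would be misleading.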
Univariate Hypothesis Test Using the t-Distribution
The step-by-step procedure for a t-test is conceptually similar to that for hypothesis testing with the Z-distribution. Suppose a Pizza-In store manager believes that the average number of customers who order take-out is 20 per day. The store gathers a sample of data by recording the number of take-out orders for each of the 25 days it was open during a given month. Does the number of take-out orders differ from 20 per day? The substantive hypothesis is
$H_1: \mu \neq 20$
1. The researcher calculates a sample mean and standard deviation. In this case, $\bar{X} = 22$ and the sample standard deviation is $S = 5$.
2. The standard error ($S_{\bar{X}}$) is computed:
$$S_{\bar{X}} = \frac{S}{\sqrt{n}} = \frac{5}{\sqrt{25}} = 1$$
3. The researcher then finds the t-value associated with the desired confidence level or level of statistical significance. If a 95 percent confidence level is desired, the significance level is 0.05.
4. The critical values for the t-test are found by locating the upper and lower limits of the confidence interval. The result defines the regions of rejection. This requires determining the value of t. For 24 degrees of freedom ($n = 25$, $df = n - 1$), the t-value is 2.064. The critical values are
$$\text{Lower limit} = \mu - t_{\text{c.l.}} S_{\bar{X}} = 20 - 2.064\left(\frac{5}{\sqrt{25}}\right) = 20 - 2.064(1) = 17.936$$
$$\text{Upper limit} = \mu + t_{\text{c.l.}} S_{\bar{X}} = 20 + 2.064\left(\frac{5}{\sqrt{25}}\right) = 20 + 2.064(1) = 22.064$$
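Steps 2 through 4, together with the decision that follows, can be sketched in a few lines of Python. This is an illustration under stated assumptions: it takes the tabled critical value t = 2.064 (df = 24, 0.05 level) from the text as an input instead of looking it up, and the function names `critical_limits` and `reject_null` are hypothetical.

```python
import math


def critical_limits(mu0: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """Return the lower and upper critical limits mu0 -/+ t_crit * S / sqrt(n)."""
    std_error = s / math.sqrt(n)  # S_Xbar, the standard error of the mean
    return mu0 - t_crit * std_error, mu0 + t_crit * std_error


def reject_null(x_bar: float, limits: tuple[float, float]) -> bool:
    """Reject H0 only when the sample mean lies outside the critical limits."""
    lower, upper = limits
    return x_bar < lower or x_bar > upper


# Pizza-In example: mu = 20, S = 5, n = 25; 2.064 is the tabled t for df = 24.
limits = critical_limits(mu0=20, s=5, n=25, t_crit=2.064)
print(tuple(round(v, 3) for v in limits))  # (17.936, 22.064)
print(reject_null(22, limits))             # False
```

Rounding is applied only for display, since floating-point arithmetic may leave tiny residues in the raw limits.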
Finally, the researcher makes the statistical decision by determining whether the sample mean falls between the critical limits. For the pizza store sample, $\bar{X} = 22$. The sample mean is not included in the region of rejection. Even though the sample result is only slightly less than the critical value at the upper limit, the null hypothesis cannot be rejected. In other words, the data are consistent with the pizza store manager’s assumption of 20 take-out orders per day.
As with the Z-test, there is an alternative way to test a hypothesis with the t-statistic, which uses the formula
$$t_{\text{obs}} = \frac{\bar{X} - \mu}{S_{\bar{X}}}$$
$$t_{\text{obs}} = \frac{22 - 20}{1} = \frac{2}{1} = 2$$
We can see that the observed t-value is less than the critical t-value of 2.064 at the 0.05 level when there are $25 - 1 = 24$ degrees of freedom. As a result, the p-value is greater than 0.05 and the hypothesis is not supported. Again, we cannot conclude with 95 percent confidence that the mean is not 20.
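To see how far above 0.05 the p-value actually is, the observed t can be converted to a two-tailed p-value. The sketch below is one possible approach, assuming only the Python standard library: instead of a t table or a statistics package, it numerically integrates the Student’s t density with Simpson’s rule. The names `t_pdf` and `two_tailed_p` and the choice of 2,000 integration steps are illustrative assumptions.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # Normalizing constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), on the log scale.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_obs: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t_obs|), computed as 1 - 2 * (area under the density from 0 to |t_obs|).

    The area is approximated with composite Simpson's rule; steps must be even.
    """
    b = abs(t_obs)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2  # Simpson weights: 4 for odd nodes, 2 for even
        total += weight * t_pdf(i * h, df)
    area = total * h / 3
    return 1 - 2 * area


# Pizza-In take-out example from the text.
x_bar, mu, s, n = 22, 20, 5, 25
std_error = s / math.sqrt(n)        # S_Xbar = 5 / sqrt(25) = 1
t_obs = (x_bar - mu) / std_error    # (22 - 20) / 1 = 2
df = n - 1                          # 24 degrees of freedom
p = two_tailed_p(t_obs, df)
print(t_obs, df, round(p, 3))
```

The resulting p-value lands a little above 0.05, which matches the observation that $t_{\text{obs}} = 2$ falls just short of the critical value 2.064.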
The Chi-Square Test for Goodness of Fit
A chi-square ($\chi^2$) test is one of the most basic tests for statistical significance and is particularly appropriate for testing hypotheses about frequencies arranged in a frequency or contingency table. Univariate tests involving nominal or ordinal variables are examined with a $\chi^2$ test. More generally, the $\chi^2$ test is associated with goodness-of-fit (GOF). GOF can be thought of as how well some matrix (table) of numbers matches or fits another matrix of the same size. Most often, the test compares a table of frequency counts observed in the sample data with another table of expected values (central tendency) for those counts.
chi-square ($\chi^2$) test
One of the basic tests for statistical significance that is particularly appropriate for testing hypotheses about frequencies arranged in a frequency or contingency table.
goodness-of-fit (GOF)
A general term representing how well some computed table or matrix of values matches some population or predetermined table or matrix of the same size.
Consider the following hypothesis:
H1: Papa John’s Pizza stores are more likely to be located in a standalone location than in a shopping center.
A competitor may be interested in this hypothesis as part of the competitor analysis in a marketing plan. A researcher for the competitor gathers a random sample of 100 Papa John’s locations in California (where the competitor is located). The sample is selected from phone directories, and the locations are checked by having an assistant drive to each location. The following observations are recorded in a frequency table.
One-Way Frequency Table
Location           Number of Stores
Standalone         60
Shopping Center    40
Total              100
These observed values ($O_i$) can be compared to the expected values for this distribution ($E_i$) to complete a $\chi^2$ test. The $\chi^2$ value reflects the likelihood that the observed values come from the distribution described by the expected values. The higher the $\chi^2$ value, the larger the gap between observed and expected frequencies, and the less likely it is that the observed values come from that expected distribution.
In statistical terms, a $\chi^2$ test determines whether the difference between an observed frequency distribution and the corresponding expected frequency distribution can be attributed to sampling variation alone. Computing a $\chi^2$ test is fairly straightforward. Students who master this calculation should
have little trouble understanding later significance tests, since the basic logic of the $\chi^2$ test underlies these tests as well.
The steps in computing a $\chi^2$ test are as follows:
1. Gather data and tally the observed frequencies for the categorical variable.
2. Compute the expected values for each value of the categorical variable.
3. Calculate the $\chi^2$ value, using the observed frequencies from the sample and the expected frequencies.
4. Find the degrees of freedom for the test.
5. Make the statistical decision by comparing the p-value associated with the calculated $\chi^2$ against the predetermined significance level (acceptable Type I error rate).
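Steps 1 through 4 can be sketched in Python as follows. The sketch assumes that every category is equally likely under the null hypothesis (as in the pizza store example) and uses the usual rule of k - 1 degrees of freedom for a one-way table with k categories. Step 5 is left out because it needs the tail probability of the chi-square distribution. The helper name `chi_square_gof` is hypothetical.

```python
from collections import Counter


def chi_square_gof(observations, categories=None):
    """Steps 1-4 of a one-way chi-square goodness-of-fit test (hypothetical helper).

    Assumes every category is equally likely under the null hypothesis.
    Returns (observed, expected, chi2, df).
    """
    # Step 1: tally the observed frequencies for the categorical variable.
    counts = Counter(observations)
    cats = list(categories) if categories is not None else sorted(counts)
    observed = {c: counts[c] for c in cats}  # Counter returns 0 for unseen categories
    n = sum(observed.values())
    k = len(cats)

    # Step 2: expected frequency for each category (n / k under equal probability).
    expected = {c: n / k for c in cats}

    # Step 3: chi2 = sum over cells of (O_i - E_i)^2 / E_i.
    chi2 = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in cats)

    # Step 4: degrees of freedom for a one-way table with k categories.
    df = k - 1
    return observed, expected, chi2, df


# The 100 sampled Papa John's locations, rebuilt from the frequency table.
sample = ["Standalone"] * 60 + ["Shopping Center"] * 40
observed, expected, chi2, df = chi_square_gof(sample)
print(observed)   # {'Shopping Center': 40, 'Standalone': 60}
print(expected)   # {'Shopping Center': 50.0, 'Standalone': 50.0}
print(chi2, df)   # 4.0 1
```

Passing `categories` explicitly keeps a category with zero observations in the table; otherwise only categories that appear in the data are counted.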
These steps can be illustrated with the pizza store location example.
• The data for the location variable (standalone or shopping center) are provided in the frequency table above.
• The next step asks, “What are the expected frequencies for the location variable?” This is another way of asking for the central tendency of each category, or how the observations would be distributed if there were no relationship between the categories and the number of stores. Since the sample size is 100 and we are dealing with two categories, finding the expected values is easy. If no pattern exists in the locations, the stores should be spread evenly across the two categories, apart from random variation. We would expect half (50) of the locations to be standalone and half (50) to be in a shopping center. This is another way of saying that the expected probability of being either type of location is 50 percent. The expected values also can be placed in a frequency table:
Expected Frequencies
Location           Number of Stores
Standalone         100/2 = 50
Shopping Center    100/2 = 50
Total              100
• The actual $\chi^2$ value is computed using the following formula:
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where
$\chi^2$ = chi-square statistic
$O_i$ = observed frequency in the $i$th cell
$E_i$ = expected frequency in the $i$th cell
Sum the squared differences, each divided by its expected frequency:
$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2}$$
Thus, we determine that the chi-square value equals 4:
$$\chi^2 = \frac{(60 - 50)^2}{50} + \frac{(40 - 50)^2}{50} = 2 + 2 = 4$$
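Step 5 in the list above calls for the p-value associated with the calculated $\chi^2$. A minimal Python sketch for this example follows. It assumes df = k - 1 = 1 for the two location categories and a 0.05 significance level, and it relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its upper tail can be written with `math.erfc`. The helper name `chi2_upper_tail_df1` is hypothetical and valid only for df = 1.

```python
import math


def chi2_upper_tail_df1(x: float) -> float:
    """P(chi-square >= x) for exactly 1 degree of freedom.

    A chi-square variable with 1 df is Z**2 for a standard normal Z, so
    P(chi2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


observed = [60, 40]   # standalone, shopping center
expected = [50, 50]   # 100 / 2 per category
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1   # assumed k - 1 for a one-way table
p = chi2_upper_tail_df1(chi2)
alpha = 0.05
print(chi2, df, round(p, 4), p < alpha)  # 4.0 1 0.0455 True
```

Under these assumptions the p-value is about 0.046, slightly below the 0.05 significance level.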