CHI SQUARE

FOR CATEGORICAL DATA

NONPARAMETRIC TEST

Test of

Independence

WHY USE CHI SQUARE ($\chi^2$)?

• Use with categorical data – when all you have is the frequency with which certain events have occurred.

• We measure the “goodness of fit” between our observed outcome and the expected outcome for some variable (CHI SQUARE GOODNESS OF FIT TEST).

• With two variables, we test in particular whether they are independent of one another using the same basic approach (CHI SQUARE TEST OF INDEPENDENCE).

THE CHI SQUARE DISTRIBUTION

• Positively skewed but becomes symmetrical with increasing degrees of freedom.

• The values are non-negative. That is, the values of $\chi^2$ are greater than or equal to 0.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
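The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square_statistic` and the die-roll counts are hypothetical and do not come from the slides.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, so 10 rolls are expected per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))  # 1.0
```

Every term is a squared difference divided by a positive expected count, which is why the statistic can never be negative.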

CHI SQUARE GOODNESS OF FIT TEST

• Used when we have distributions of frequencies across two or more categories on one variable.

• Test determines how well a hypothesized distribution fits an obtained distribution.

• Compares observed frequencies with theoretically expected/predicted frequencies.

• Hypotheses:

  H0: The observed data do fit the expected frequencies of the population.

  Ha: The observed data do not fit the expected frequencies of the population.

CHI SQUARE GOODNESS OF FIT TEST

• Assumption #1: One categorical variable (i.e., the variable can be dichotomous, nominal or ordinal).

• Assumption #2: You should have independence of observations, which means that there is no relationship between any of the cases (e.g., participants).

• Assumption #3: The expected frequency in each category of your categorical variable must be at least 5.

• When the expected frequency in any cell is very small (expected value < 5), Fisher’s exact test is used as an alternative to the chi-square test.
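As a rough sketch of how Assumption #3 might be checked before running the test, the snippet below computes expected frequencies from hypothesized proportions and flags any that fall below 5. The helper names, sample sizes and proportions are hypothetical, chosen only for illustration.

```python
def expected_counts(total, proportions):
    """Expected frequency per category: E_i = n * p_i."""
    return [total * p for p in proportions]


def smallest_expected_ok(expected, minimum=5):
    """True when every expected frequency is at least `minimum`."""
    return min(expected) >= minimum


# Hypothetical hypothesized proportions for three categories.
proportions = [0.25, 0.25, 0.5]

small_sample = expected_counts(12, proportions)   # [3.0, 3.0, 6.0]
larger_sample = expected_counts(40, proportions)  # [10.0, 10.0, 20.0]

print(smallest_expected_ok(small_sample))   # False
print(smallest_expected_ok(larger_sample))  # True
```

Note that the check applies to the expected frequencies, not to the observed ones.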

CHI SQUARE GOODNESS OF FIT TEST

A university believes that the distribution of male and female students enrolled in the education program is equal every year. The program coordinator is interested in determining whether the enrollment has changed from the previous year.

              Male   Female
# enrolled    123    185

H0: The distribution of male and female students is equal (pM = 0.5, pF = 0.5).

Ha: The distribution of male and female students is different.

CHI SQUARE GOODNESS OF FIT TEST

                            Male   Female   Total
Observed frequencies (O)    123    185      308
Expected frequencies (E)

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Critical value?

$\alpha = .05$

df = k − 1 = 1 (k is the number of response categories)

From Table A.4

CHI SQUARE GOODNESS OF FIT TEST

                            Male   Female   Total
Observed frequencies (O)    123    185      308
Expected frequencies (E)    154    154      308

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(123 - 154)^2}{154} + \frac{(185 - 154)^2}{154} = \frac{(-31)^2}{154} + \frac{31^2}{154} = 12.48 \text{ (obtained value)}$$

Critical value?

$\alpha = .05$

df = k − 1 = 1 (k is the number of response categories)

From Table A.4, critical value = 3.841

Reject H0 or do not reject H0?
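The worked calculation can be checked with a short Python sketch. The counts and the critical value 3.841 are taken from the slide; the variable names are illustrative only.

```python
observed = [123, 185]                    # male, female
total = sum(observed)                    # 308
expected = [total * 0.5, total * 0.5]    # H0: pM = pF = 0.5, so 154 each

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 3.841                   # alpha = .05, df = 1 (from the slide)
reject_h0 = chi_sq > critical_value

print(round(chi_sq, 2), reject_h0)       # 12.48 True
```

Because the obtained value exceeds the critical value, the decision rule points to rejecting H0.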

CHI SQUARE GOODNESS OF FIT TEST

SPSS output

A chi-square goodness-of-fit test was conducted to check whether the enrollment has changed from the previous year. There was a statistically significant difference in enrollment between male and female students, $\chi^2(1) = 12.481$, $p < .001$.
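The reported p-value can be approximated without statistical software when df = 1, because a chi-square variable with one degree of freedom is the square of a standard normal variable. The sketch below relies on that identity; the function name is hypothetical, and it is not valid for other degrees of freedom.

```python
from math import erfc, sqrt


def chi_square_p_value_df1(statistic):
    """Upper-tail probability P(X > statistic) for chi-square with df = 1.

    Uses P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return erfc(sqrt(statistic / 2))


p_value = chi_square_p_value_df1(12.481)
print(p_value < 0.001)  # True
```

The same function evaluated at the critical value 3.841 returns a probability very close to .05, which is how the critical value and the significance level relate.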

CHI SQUARE TEST OF INDEPENDENCE

• Used when we compare the distribution of frequencies across categories in two or more independent samples.

• Used in a single sample when we want to know whether two categorical variables are related (to discover if there is an association between two categorical variables).

• Hypotheses:

  H0: There is no association between the variables.

  Ha: There is an association between the variables.

CHI SQUARE TEST OF INDEPENDENCE

• Assumption #1: The two variables should be measured at an ordinal or nominal level (i.e., categorical data).

• Assumption #2: The two variables should consist of two or more categorical, independent groups.

• Assumption #3: The expected frequency in each cell must be at least 5.

  – The expected frequency for a given cell is obtained by multiplying the totals for the row and column in which the cell is located (marginal totals) and dividing by the grand total:

$$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$
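The expected-frequency rule can be sketched as a small Python helper that works for a table of any size. The function name and the 2×3 table below are hypothetical and serve only to show the mechanics.

```python
def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 3 table of observed counts.
table = [
    [10, 20, 30],
    [20, 10, 10],
]

for row in expected_frequencies(table):
    print(row)
# [18.0, 18.0, 24.0]
# [12.0, 12.0, 16.0]
```

The expected counts keep the same row and column totals as the observed table, which is a quick sanity check on a hand calculation.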

CHI SQUARE TEST OF INDEPENDENCE

Educators are always looking for novel ways in which to teach statistics to undergraduates as part of a non-statistics degree course (e.g., psychology, education, law). With current technology, it is possible to present how-to guides for statistical programs online instead of in a book. However, different people learn in different ways.

An educator would like to know whether gender (male/female) is associated with the preferred type of learning medium (online vs. books).

CHI SQUARE TEST OF INDEPENDENCE

H0: There is no association between teaching method and gender.

Ha: There is an association between teaching method and gender.

                            Books   Online   Total
Male     observed value     16      24       40
         expected value
Female   observed value     13      27       40
         expected value
Total                       29      51       80

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

CHI SQUARE TEST OF INDEPENDENCE

                            Books               Online              Total
Male     observed value     16                  24                  40
         expected value     (29×40)/80 = 14.5   (51×40)/80 = 25.5   40
Female   observed value     13                  27                  40
         expected value     (29×40)/80 = 14.5   (51×40)/80 = 25.5   40
Total                       29                  51                  80

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(16 - 14.5)^2}{14.5} + \frac{(24 - 25.5)^2}{25.5} + \frac{(13 - 14.5)^2}{14.5} + \frac{(27 - 25.5)^2}{25.5} = 0.487$$

df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1

Critical value ($\alpha = .05$) = 3.841
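The calculation for the gender by learning-medium table can be reproduced with the following Python sketch. The observed counts are those on the slide; the variable names are illustrative, and no continuity correction is applied.

```python
# Observed counts: rows = male, female; columns = books, online.
observed = [
    [16, 24],
    [13, 27],
]

row_totals = [sum(row) for row in observed]        # [40, 40]
col_totals = [sum(col) for col in zip(*observed)]  # [29, 51]
grand_total = sum(row_totals)                      # 80

# Expected count per cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)

print(expected)              # [[14.5, 25.5], [14.5, 25.5]]
print(round(chi_sq, 3), df)  # 0.487 1
```

With df = 1 the obtained value is compared against the same 3.841 critical value used in the goodness-of-fit example.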

CHI SQUARE TEST OF INDEPENDENCE

SPSS Output

The chi-square value of 0.487 is smaller than the critical value of 3.841, so we fail to reject H0. There is insufficient evidence to conclude that there is an association between teaching method (online vs. book) and gender (male vs. female).

EXERCISE

In 2010, a study was conducted to obtain information on how often Form 5 students at a school in the Hulu Langat district, Selangor, check status updates on Facebook.

Status           Male   Female
Every day        138    164
3-5 days/week    83     97
1-2 days/week    64     84

Carry out hypothesis testing using an appropriate inferential test and report the findings.
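One possible way to check a hand calculation for this exercise is sketched below in Python. It assumes the chi-square test of independence is the chosen test, which is a judgment the exercise leaves to the reader. The p-value line uses the fact that, for df = 2 only, the upper-tail probability of the chi-square distribution is exp(−x/2).

```python
from math import exp

# Observed counts: columns = male, female.
observed = [
    [138, 164],  # every day
    [83, 97],    # 3-5 days per week
    [64, 84],    # 1-2 days per week
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for row, r in zip(observed, row_totals):
    for o, c in zip(row, col_totals):
        e = r * c / grand_total          # expected count for this cell
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(col_totals) - 1)

# Closed-form upper-tail probability; valid only because df == 2 here.
p_value = exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.3f}, df = {df}, p = {p_value:.3f}")
```

The printed values should then be compared with the critical value for the chosen significance level before writing up the findings.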