Published by missberryberry542, 2022-02-08 01:57:31

Data Analysis

Hamidah Ishak

Definition

Data analysis is the process of
systematically applying statistical and/or
logical techniques to describe and illustrate,
condense and recap, and evaluate data.
An essential component of
ensuring data integrity is the accurate and
appropriate analysis of research findings.

Importance of Data Analysis

To explore the characteristics / features
of the data
To screen for errors and correct them
To look for distribution patterns – normal
distribution or not
To look for association or correlation
between variables
To determine whether differences
between variables are significant

Process Of Data Management
Procedures

The data analysis process consists of
several important stages:

i. Identifying the data structures
ii. Editing the data
iii. Classifying the data
iv. Transcription of data
v. Coding
vi. Tabulation of data (data entry)

i. Identifying the data structures

§ The data are prepared in a format that
allows the analyst to use modern analysis
software, such as SPSS

§ The major criterion at this stage is to define
the data structure

§ A data structure is a dynamic collection of
related variables and can be conveniently
represented as a graph

ii. Editing the data

- The completed questionnaires must be
edited.

- Potentially invalid or inaccurate
questionnaires should be eliminated.

- Editing is a process of checking to
detect and correct errors and omissions.

- The researcher checks the data for:

a. Completeness
b. Accuracy
c. Uniformity

a. Completeness:

§ The first step is to check whether there is an
answer to every question / variable set out
in the data set

§ If there is an omission, the researcher can
sometimes deduce the correct answer from
other related data on the same instrument

§ For example, sources of income: if the information
is vital and is found to be incomplete, the
researcher can contact the respondent
personally again and solicit the requisite data

§ If none of these steps is possible, the data
must be marked as 'missing'
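Marking omissions as 'missing' can be done programmatically. A minimal sketch in Python with pandas; the column names and responses below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical survey data; one respondent left 'income_source' blank.
df = pd.DataFrame({
    "respondent": [1, 2, 3],
    "income_source": ["salary", "", "business"],
})

# Treat empty strings as missing so omissions are explicit in the data set.
df["income_source"] = df["income_source"].replace("", np.nan)
n_missing = df["income_source"].isna().sum()
print(n_missing)  # count of answers marked as 'missing'
```

Analysis software can then exclude or impute these values explicitly rather than silently treating blanks as data.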

b. Accuracy

§ Apart from checking for omissions, the
accuracy of each recorded answer should be
checked

§ A random check process can be applied to
trace the errors at this step

§ The reliability of the data set would heavily
depend on this step of error correction

§ Fake responses should be dropped from the
data sets
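Part of the accuracy check can likewise be automated with simple range rules. A minimal sketch, assuming a plausible age range of 0-120; the data are hypothetical:

```python
import pandas as pd

# Hypothetical data set with an out-of-range value (age 230 is a likely typo).
df = pd.DataFrame({"age": [25, 230, 41, 36]})

# Flag implausible values so they can be checked against the original
# questionnaires and corrected or dropped.
suspect = df[(df["age"] < 0) | (df["age"] > 120)]
print(suspect)
```

Flagged rows are candidates for correction, not automatic deletion; the original instrument decides.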

c. Uniformity

§ In editing data sets, the researcher should
also look out for any lack of uniformity in the
interpretation of questions and instructions by
the data recorders

§ Uniformity checks ensure consistency in coding
throughout the questionnaire / interview
schedule responses / data set

§ Finally, maintaining a log of all corrections
helps the researcher to retain the
original data set

iii. Classifying the data

- The classification should be linked to the
theory and the aim of the particular
study. The categorization should meet
the information required to test the
hypotheses.

- The recording of the data is done on the
basis of this coding scheme such as
Numeric coding, Alphabetic coding, and
Zero coding.

- Classification can be one of the
following two types, depending upon the
nature of the phenomenon involved:

a) Classification according to attributes:
Data are classified on the basis of common
characteristics, which can be either
descriptive, such as literacy, sex, or honesty,
or numerical, such as weight, height, and
income.

b) Classification according to class-intervals:

Numerical data refer to quantitative
phenomena that can be measured in
statistical units, such as data relating to
income, production, age, marks, and weight.
Such data are known as statistics of variables and
are classified on the basis of class-intervals.

Example:

Income           Code
RM 2001-3000     1
RM 3001-4000     2
RM 4001-5000     3

Age              Code
Below 25 years   1
26-30 years      2
31-35 years      3
36-40 years      4
Above 40 years   5
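Class-interval coding of this kind can be sketched with pandas' `pd.cut`; the ages below and the exact bin edges are illustrative assumptions:

```python
import pandas as pd

# Hypothetical ages, binned into five age classes (codes 1-5).
ages = pd.Series([22, 27, 33, 38, 45])
bins = [0, 25, 30, 35, 40, 200]   # class boundaries (upper edge inclusive)
labels = [1, 2, 3, 4, 5]          # 1 = below 25 years, ..., 5 = above 40 years
age_class = pd.cut(ages, bins=bins, labels=labels)
print(list(age_class))
```

Each raw value is replaced by the code of the class-interval it falls into, ready for tabulation.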

iv. Transcription of data

- It is the process of transferring information to a
data sheet, helping the researcher to arrive at
preliminary conclusions as to the nature of the
sample collected. It is an intermediary
process between data coding and data
tabulation.

Methods of transcription:

1. Manual transcription: if the sample size is
manageable

2. Preliminaries for computerized data
processing: if the sample size is large and/or
the variables studied are numerous and
inter-related

v. Coding

- Coding is the process of assigning numerals
or other symbols (character codes) to classify
answers of all responses so that responses
can be put into a limited number of
categories or classes. This enables us to
enter data for analysis.

- Coding is necessary for efficient analysis and
through it the several replies may be reduced
to a small number of classes which contain the
critical information required for analysis.

- The recording of the data is done on the basis
of this coding scheme such as Numeric coding,
Alphabetic coding, and Zero coding.

i. Numeric coding:
Used when the variable is to be subjected to
further parametric analysis

ii. Alphabetic coding:
Used when only a tabulation, frequency count,
or graphical representation of the variable
is needed

iii. Zero coding:

• A coding of zero has to be assigned
carefully to a variable.

• When manual analysis is done, a code of
zero would imply ‘no response’ from the
respondents

• Hence, if a value of zero is to be given to a
specific response in the data sheet, it
should not lead to the same interpretation
of ‘no response’

Example of coding process of demographic variables

Variable / Observation   Response Categories   Code

Owner of house           Yes                   2
                         No                    1

Student performance      Excellent             5
                         Good                  4
                         Adequate              3
                         Bad                   2
                         Worst                 1

Age                      Up to 20 years        1
                         21-40 years           2
                         41-60 years           3
                         Above 61 years        4
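A coding scheme like the student-performance one above can be applied programmatically. A minimal sketch with pandas; the responses are hypothetical:

```python
import pandas as pd

# Coding scheme: each response category maps to a numeric code.
codes = {"Excellent": 5, "Good": 4, "Adequate": 3, "Bad": 2, "Worst": 1}

# Hypothetical recorded responses.
responses = pd.Series(["Good", "Excellent", "Bad"])
coded = responses.map(codes)
print(list(coded))  # [4, 5, 2]
```

Any response outside the scheme maps to a missing value, which also surfaces coding inconsistencies during the uniformity check.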

vi. Tabulation of data

- Tabulation is the process of summarizing and
arranging raw data and displaying them in
compact statistical tables for further analysis.
It involves counting the number of cases
falling into each of the categories identified
by the researcher.

- The value of tabulation as a statistical
method lies in the clarity and precision with
which numerical data are presented to the reader.

- Tabulation can be done manually or by
computer.

- The choice depends upon the size and type of
study, cost considerations, time pressures, and
the availability of software packages.

- Manual tabulation is suitable for small and
simple studies.

- Data may be organized in various ways:
a. Constructing an array (a common way)
b. Frequency distribution
c. Designing a database (decide on the software)
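Computerized tabulation of coded data can be sketched as follows; the survey data below are hypothetical:

```python
import pandas as pd

# Hypothetical coded survey data.
df = pd.DataFrame({
    "gender": ["male", "female", "female", "male", "female"],
    "owner":  ["yes", "no", "yes", "yes", "no"],
})

freq = df["gender"].value_counts()              # frequency distribution
cross = pd.crosstab(df["gender"], df["owner"])  # cross tabulation
print(freq)
print(cross)
```

`value_counts` gives a one-way frequency table; `crosstab` counts cases in each combination of categories, i.e. a contingency table.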

Plan For Data Interpretation

Type of data:
• Categorical (Qualitative): Nominal, Ordinal
• Numerical (Quantitative): Interval, Ratio

Categorical Variable (Qualitative)
- Data that describe the characteristics of the sample
- Can be categorized
- Non-numeric values

Nominal
- Has no rank or specific order
- Examples: Gender (male/female); Blood group (O, AB, A, B);
Race (Malay, Chinese, Indian, Others); Yes/No

Ordinal
- Has rank and order
- Examples: Size (S, M, L, XL, XXL); Stage I, II, III, IV;
Low, Middle, High; Strongly agree, Agree, Neutral,
Disagree, Strongly disagree

Numerical Variable (Quantitative)
- Characteristics that are measurable
- Numeric data

Interval
- Counting data such as number of children, body temperature
- Zero point (0) is arbitrary

Ratio
- Any value within a range, such as blood pressure, height,
weight, glucose level, blood pH
- Zero point (0) is not arbitrary

Ø Nominal

- The lowest level of measurement;
involves using numbers simply to classify
characteristics into categories

- Analyze using frequency distributions,
proportions, mode, cross tabulation.

- A measurement scale where numerals
(numbers) are assigned to represent
variables with the purpose of categorizing
them.

Examples of nominal scale:

Variable        Meaning of the numeral

Gender
Male : 1        1 indicates a male respondent
Female : 2      2 indicates a female respondent

Ethnicity
Malay : 1       1 indicates the respondent is a Malay
Chinese : 2     2 indicates the respondent is a Chinese
Indian : 3      3 indicates the respondent is an Indian

Ø Ordinal

- Typical analyses include frequency
distributions, proportions, medians.

- Ranks the different categories in the variables
measured.

- Usually the numerals are assigned in ascending
order

Examples of ordinal scale:

Variable                     Meaning of the numeral

Breast cancer                1 indicates low grade
histology grade : 1, 2, 3    2 indicates moderate grade
                             3 indicates high grade

Examples of ordinal scale:

Variable             Meaning of the numeral

Smoker category
Light smoker : 1     1 indicates a light smoker
Heavy smoker : 2     2 indicates a heavy smoker

Those given a score of 2 smoke more cigarettes than
those with a score of 1.

Ø Interval and ratio

– researcher can specify the ranking of
objects on an attribute and the distance
between those objects.

– Indicating not only the ranking of objects but
also the distance between them.

– Researcher can do frequency distributions,
analysis of proportions, means and standard
deviations.

- Interval and ratio scales have equal intervals
between subsequent numbers

- The difference between interval and ratio data
is that the latter has an absolute zero point on the scale
(point zero on a ratio scale indicates the absolute
absence of the variable measured)

Examples of interval and ratio scale:

Interval scale: Temperature (0° C, 20° C, 40° C, 60° C)

The difference between 40° C and 20° C is the same
as the difference between 60° C and 40° C.

40° C is not two times hotter than 20° C.

0° C does not mean that there is an absolute
absence of heat.
Examples of interval and ratio scale

Ratio Mass The difference between 40 kg
scale 20 kg and 60 kg is the same as the
40 kg difference between 60 kg and 40
60 kg kg
0 kg
40 kg is two times hotter than 20
kg

0 kg does not mean that there is
absolute absence of mass

2. Statistical Analysis Methods

There are two approaches to the statistical
analysis of data:

Descriptive
Inferential

Descriptive statistics:

• Frequency distribution – is a systematic
arrangement of numeric values from the
lowest to the highest, together with a count
(or percentage) of the number of times each
value was obtained.

• Central Tendency – Mode, Median, Mean

• Variability – Range, Standard Deviation,
Variance, contingency tables.

• Used to describe the basic features of the data
in a study.

• Provide simple summaries about the sample
and the measures.

• Describe what is or what the data show.

• Help us to simplify large amounts of data in a
sensible way.

• Include graphs, charts and tables and the
calculation of various descriptive measures
such as averages (means) and measures of
variation (standard deviations)

• Descriptive statistics are used to summarize a
collection of data and present it in a way that can
be easily and clearly understood.

• Used to describe what is or what the data show
based on the sample.

• Descriptive statistics are used to present
quantitative descriptions in a manageable form.
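The common descriptive measures can be computed with Python's standard library alone; the weights below are a hypothetical sample:

```python
import statistics

# Hypothetical sample of body weights (kg).
weights = [52, 60, 60, 65, 70, 74, 81]

mean = statistics.mean(weights)       # central tendency: mean
median = statistics.median(weights)   # central tendency: median
mode = statistics.mode(weights)       # central tendency: mode
stdev = statistics.stdev(weights)     # variability: sample standard deviation
value_range = max(weights) - min(weights)  # variability: range
print(mean, median, mode, round(stdev, 2), value_range)
```

Together these numbers give the simple summary of the sample that descriptive statistics aim for.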

Organizing & Displaying Data for Numerical
(Quantitative) Variables

Measures of Central Tendency (describe the
characteristics of variables):
• Mean
• Median
• Mode

Measures of Dispersion (summarize the spread
of variables):
• Variance
• Standard deviation
• Max, min, range
• Inter-quartile range

Inferential statistical analysis:

• Involves using information from a sample to
make inferences, or estimates, about the
population.

• The inferential approach helps to decide whether
the outcome of the study is a result of factors
planned within the design of the study or determined
by chance.

• Inferential statistics investigate questions,
models, and hypotheses; infer population
characteristics based on a sample; and make
judgments about what researchers observe.

• Data are described with descriptive
statistics, and then additional statistical
manipulations are done to make inferences
about the likelihood that the outcome was due
to chance through inferential statistics.

• Used to make inferences concerning
some unknown aspect of a population.

• Used to make conclusions about a
population based on information obtained
from a sample of the population.

• Provide conclusions that go beyond what is
already known from the sample alone.

Types of inferential testing

Inferential tests include:
• Hypothesis testing
• Tests of group differences
• Tests of relationships

Hypothesis Testing
• Researchers use a variety of statistical tests to
make inferences about the validity of their
hypotheses.

• Statistical hypothesis testing provides objective
criteria for deciding whether research
hypotheses should be accepted as true or
rejected as false.

Tests of group differences:
• t-Test
• Analysis of Variance (ANOVA)
• Chi-Square Test

Tests of group differences are used when the
researcher wants to know whether two populations
differ with respect to their mean scores on some
response variable.

The frequently used bivariate tests are:

i) t-Test: to test the statistical significance of a
difference between the means of two groups.

ii) Analysis of Variance (ANOVA): a
parametric procedure used to test
mean group differences of three or
more groups. The statistic computed
in an ANOVA is the F ratio.

iii) Chi-Square Test: a nonparametric
procedure used to test hypotheses
about the proportion of cases that fall
into various categories, as in a
contingency table.
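The three tests can be sketched with SciPy; all group scores and the contingency table below are hypothetical illustrations, not real study data:

```python
from scipy import stats

# Hypothetical scores for independent groups.
group_a = [12, 15, 14, 10, 13]
group_b = [22, 25, 24, 20, 23]
group_c = [18, 17, 19, 16, 20]

# i) t-Test: difference between the means of two groups.
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# ii) One-way ANOVA: mean differences across three groups (F ratio).
f_stat, f_p = stats.f_oneway(group_a, group_b, group_c)

# iii) Chi-square test on a hypothetical 2x2 contingency table of counts.
observed = [[30, 10], [20, 40]]
chi2, chi_p, dof, expected = stats.chi2_contingency(observed)

print(round(t_p, 6), round(f_p, 6), round(chi_p, 6))
```

In each case the p-value is compared against the chosen significance level to decide whether the observed difference or association is likely due to chance.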

