
## LMCW Analitik Data 2nd Mtg


# Regression - An Overview


INTRODUCTION

• Econometrics

– Literal interpretation: economic measurement; a combination of economic theory, mathematics, and statistics

• Mathematical economics – expresses economic theory in mathematical form (equations)

• Economic statistics – collecting, processing, and presenting economic data in the form of charts and tables. Not concerned with using the collected data to test economic theories.

– gives empirical content to most economic theory;

– use of statistical methods to analyze economic data

• Typical goals of econometric analysis

– Estimating relationships between economic variables
– Testing economic theories and hypotheses
– Forecasting economic variables
– Evaluating and implementing government policies


Methodology of Econometrics

1. Statement of theory or hypothesis
Marginal propensity to consume (MPC):
Rate of change of consumption for a unit change in income; it is greater than 0 but less than 1.


2. Specification of the Mathematical Model of Consumption
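Theory postulates a positive relationship between consumption and income but gives no precise functional form. A simple mathematical model is

$$Y = \beta_1 + \beta_2 X, \qquad 0 < \beta_2 < 1$$

where Y is consumption expenditure, X is income, and $\beta_2$ is the MPC. This is an exact, or deterministic, relationship.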


3. Specification of the Econometric Model of Consumption

• Relationships between economic variables are generally inexact.
• Other variables may influence consumption expenditure:
  – size of family
  – ages of members in the family
• To allow for the inexact relationships between economic variables:

$$Y = \beta_1 + \beta_2 X + u$$

u: disturbance term, or error term
– a random (stochastic) variable with well-defined probabilistic properties
– represents other factors that affect consumption but are not taken into account explicitly

This is an example of an econometric model: the linear regression model.


4. Obtaining data
Data are needed to obtain the values of $\beta_1$ and $\beta_2$.


5. Estimation of Econometric Model
Obtaining parameter estimates by regression analysis

$$\hat{Y}_t = -299.5913 + 0.7218 X_t$$

The estimated consumption function:
– fits the data quite well; the data points lie very close to the regression line.
– For the period 1960-2005, an increase in real income of one dollar led, on average, to an increase of about 72 cents in real consumption expenditure.


6. Hypothesis testing
Is 0.72 statistically less than 1?
Confirmation of economic theories on basis of sample evidence
– statistical inference / hypothesis testing


7. Forecasting or Prediction
Predict the future value(s) of the dependent, or forecast, variable Y
on the basis of the known or expected future value(s) of the explanatory, or
predictor, variable X

$$\hat{Y}_{2006} = -299.5913 + 0.7218(11319.4) = 7870.7516$$
Given the value of GDP for 2006 is 11319.4 billion dollars, the mean, or
average, forecast consumption expenditure is about 7870 billion dollars.
Actual figure : 8044 billion dollars
Forecast error: 174 billion dollars
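
The forecasting step can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the coefficients and the 2006 figures are the ones quoted on the slide, while the function and variable names are made up for this example.

```python
# Estimated consumption function from the slides: Y_hat = B1 + B2 * X
B1 = -299.5913   # estimated intercept
B2 = 0.7218      # estimated slope (the MPC)


def forecast_consumption(income: float) -> float:
    """Mean forecast of consumption expenditure for a given income level."""
    return B1 + B2 * income


gdp_2006 = 11319.4      # billions of dollars, as quoted on the slide
actual_2006 = 8044.0    # actual consumption expenditure, billions of dollars

forecast_2006 = forecast_consumption(gdp_2006)
forecast_error = actual_2006 - forecast_2006

print(f"forecast: {forecast_2006:.4f}")
print(f"forecast error: {forecast_error:.2f}")
```

The unrounded error comes out slightly above 173 billion dollars; the slide's figure of 174 follows from first rounding the forecast down to 7870.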


8. Use of the Model for control or Policy Purposes

Suppose the government targets 8750 billion dollars of consumption expenditure. What level of income is needed?

$$8750 = -299.5913 + 0.7218 X_{2006}$$

Solving for X gives an income level of about 12537 (billion) dollars; given an MPC of about 0.72, this income level will produce an expenditure of about 8750 billion dollars.

An estimated model may be used for control, or policy, purposes.

Control variable: X
Target variable: Y
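
Solving for the control variable is just the fitted line inverted. A minimal sketch, assuming the same estimated coefficients as in the slides; the function name `required_income` is hypothetical.

```python
def required_income(target_consumption: float,
                    b1: float = -299.5913,
                    b2: float = 0.7218) -> float:
    """Invert Y = b1 + b2 * X to find the X that yields the target Y."""
    if b2 == 0:
        raise ValueError("slope must be non-zero to solve for X")
    return (target_consumption - b1) / b2


income_needed = required_income(8750.0)
print(f"income needed: {income_needed:.2f} billion dollars")

# Plugging the answer back into the fitted line recovers the target.
implied_consumption = -299.5913 + 0.7218 * income_needed
```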


Log-log
$$\ln(y) = \beta_1 + \beta_2 \ln(x) + u$$
$\beta_2$ is the elasticity of y with respect to x
$\beta_2$ is approximately the percentage change in y, given a one percent change in x

Log-linear
$$\ln(y) = \beta_1 + \beta_2 x + u$$
A 1 unit change in x leads to approximately a $100\beta_2$% change in y

Linear-log
$$y = \beta_1 + \beta_2 \ln(x) + u$$
A 1% change in x leads to approximately a $\beta_2/100$ unit change in y
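
The word "approximately" in these interpretations can be checked numerically. The sketch below compares each rule of thumb with the exact change implied by the model; the slope values (0.8, 0.05, 50) are arbitrary illustrative numbers, not estimates from the slides.

```python
import math


def loglog_exact_pct_change(beta2: float, pct_x: float = 1.0) -> float:
    """Exact % change in y for a pct_x % change in x when ln(y) = b1 + b2*ln(x)."""
    return 100.0 * ((1.0 + pct_x / 100.0) ** beta2 - 1.0)


def loglinear_exact_pct_change(beta2: float, dx: float = 1.0) -> float:
    """Exact % change in y for a dx-unit change in x when ln(y) = b1 + b2*x."""
    return 100.0 * (math.exp(beta2 * dx) - 1.0)


def linlog_exact_unit_change(beta2: float, pct_x: float = 1.0) -> float:
    """Exact unit change in y for a pct_x % change in x when y = b1 + b2*ln(x)."""
    return beta2 * math.log(1.0 + pct_x / 100.0)


# Hypothetical slope values, chosen only for illustration.
print("log-log    approx:", 0.8, "% exact:", loglog_exact_pct_change(0.8))
print("log-linear approx:", 100 * 0.05, "% exact:", loglinear_exact_pct_change(0.05))
print("linear-log approx:", 50 / 100, "units exact:", linlog_exact_unit_change(50.0))
```

The approximations are close for small changes and small slopes, and drift apart as either grows.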

THE NATURE OF REGRESSION ANALYSIS

Regression
• Main tool of econometrics
• Concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.

TERMINOLOGY AND NOTATION

| Dependent variable | Explanatory variable |
| --- | --- |
| Explained variable | Independent variable |
| Predictand | Predictor |
| Regressand | Regressor |
| Response | Stimulus |
| Endogenous | Exogenous |
| Outcome | Covariate |
| Controlled variable | Control variable |

THE NATURE AND SOURCES OF DATA FOR ECONOMIC ANALYSIS

Types of data:

(1) Time series data
• Stationary / not stationary

(2) Cross-section data

(3) Panel, longitudinal or micropanel data
• Balanced panel
• Unbalanced panel


TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS

Population Regression Function (PRF)

Regression coefficients
Intercept and slope coefficients

Stochastic specification of PRF

The average consumption expenditure of families with weekly income of \$100 is
greater than the average consumption expenditure of families with weekly income
of \$80.

However, an individual family’s consumption expenditure does not necessarily increase as the income level increases.

• Given the level of income $X_i$, an individual family’s consumption expenditure is clustered around the average consumption of all families at that income level: $Y_i = E(Y \mid X_i) + u_i$, where $u_i$ is the stochastic disturbance, or stochastic error term.
• The expenditure of an individual family, given its income level, is the sum of two components: a systematic (deterministic) component, $E(Y \mid X_i)$, and a random (nonsystematic) component, $u_i$.

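Sample Regression Function (SRF)

• SRF in stochastic form: $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$, where $\hat{u}_i$ is the sample residual term

– Primary objective in regression analysis: estimate the PRF $Y_i = \beta_1 + \beta_2 X_i + u_i$ on the basis of the SRF $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$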

Recap:

SRF is an approximation of PRF. How is SRF determined?
Method of estimation: ordinary least squares (OLS)

ESTIMATION: THE METHOD OF ORDINARY LEAST SQUARES

Minimize the residual sum of squares:

$$\sum \hat{u}_i^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$$

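Exercise:

| i | $Y_i$ | $X_i$ |
| --- | --- | --- |
| 1 | 52.25 | 258.30 |
| 2 | 58.32 | 343.10 |
| 3 | 81.79 | 425.00 |
| 4 | 119.90 | 467.50 |
| 5 | 125.80 | 482.90 |
| 6 | 100.46 | 487.70 |
| 7 | 121.51 | 496.50 |
| 8 | 100.08 | 519.40 |
| 9 | 127.75 | 543.30 |
| 10 | 104.94 | 548.70 |
| 11 | 107.48 | 564.60 |
| 12 | 98.48 | 588.30 |
| 13 | 181.21 | 591.30 |
| 14 | 122.23 | 607.30 |
| 15 | 129.57 | 611.20 |
| 16 | 92.84 | 631.00 |
| 17 | 117.92 | 659.60 |
| 18 | 82.13 | 664.00 |
| 19 | 182.28 | 704.20 |
| 20 | 139.13 | 704.80 |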

| i | $Y_i$ | $X_i$ | $Y_i - \bar{Y}$ | $X_i - \bar{X}$ | $(Y_i - \bar{Y})(X_i - \bar{X})$ | $(X_i - \bar{X})^2$ |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 52.25 | 258.30 | -60.0535 | -286.635 | 17213.43 | 82159.62 |
| 2 | 58.32 | 343.10 | -53.9835 | -201.835 | 10895.76 | 40737.37 |
| 3 | 81.79 | 425.00 | -30.5135 | -119.935 | 3659.64 | 14384.40 |
| 4 | 119.90 | 467.50 | 7.5965 | -77.435 | -588.23 | 5996.18 |
| 5 | 125.80 | 482.90 | 13.4965 | -62.035 | -837.26 | 3848.34 |
| 6 | 100.46 | 487.70 | -11.8435 | -57.235 | 677.86 | 3275.85 |
| 7 | 121.51 | 496.50 | 9.2065 | -48.435 | -445.92 | 2345.95 |
| 8 | 100.08 | 519.40 | -12.2235 | -25.535 | 312.13 | 652.04 |
| 9 | 127.75 | 543.30 | 15.4465 | -1.635 | -25.26 | 2.67 |
| 10 | 104.94 | 548.70 | -7.3635 | 3.765 | -27.72 | 14.18 |
| 11 | 107.48 | 564.60 | -4.8235 | 19.665 | -94.85 | 386.71 |
| 12 | 98.48 | 588.30 | -13.8235 | 43.365 | -599.46 | 1880.52 |
| 13 | 181.21 | 591.30 | 68.9065 | 46.365 | 3194.85 | 2149.71 |
| 14 | 122.23 | 607.30 | 9.9265 | 62.365 | 619.07 | 3889.39 |
| 15 | 129.57 | 611.20 | 17.2665 | 66.265 | 1144.16 | 4391.05 |
| 16 | 92.84 | 631.00 | -19.4635 | 86.065 | -1675.13 | 7407.18 |
| 17 | 117.92 | 659.60 | 5.6165 | 114.665 | 644.02 | 13148.06 |
| 18 | 82.13 | 664.00 | -30.1735 | 119.065 | -3592.61 | 14176.47 |
| 19 | 182.28 | 704.20 | 69.9765 | 159.265 | 11144.81 | 25365.34 |
| 20 | 139.13 | 704.80 | 26.8265 | 159.865 | 4288.62 | 25556.82 |
| Sum | 2246.07 | 10898.70 | | | 45907.91 | 251767.87 |

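$$\bar{Y} = \frac{\sum Y_i}{n} = \frac{2246.07}{20} = 112.3035, \qquad \bar{X} = \frac{\sum X_i}{n} = \frac{10898.70}{20} = 544.935$$

$$\hat{\beta}_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{45907.91}{251767.87} = 0.1823$$

$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} = 12.9388$$

SRF: $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i = 12.9388 + 0.1823 X_i$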
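
The hand calculation can be checked in a few lines. This is a minimal sketch using only the Python standard library and the 20 observations from the exercise; the variable names are illustrative.

```python
# Exercise data from the slides: Y is the dependent variable, X the regressor.
Y = [52.25, 58.32, 81.79, 119.90, 125.80, 100.46, 121.51, 100.08, 127.75, 104.94,
     107.48, 98.48, 181.21, 122.23, 129.57, 92.84, 117.92, 82.13, 182.28, 139.13]
X = [258.30, 343.10, 425.00, 467.50, 482.90, 487.70, 496.50, 519.40, 543.30, 548.70,
     564.60, 588.30, 591.30, 607.30, 611.20, 631.00, 659.60, 664.00, 704.20, 704.80]

n = len(Y)
y_bar = sum(Y) / n
x_bar = sum(X) / n

# Sums of cross products and squares in deviation form.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)

# OLS estimators for the two-variable model.
b2 = s_xy / s_xx            # slope
b1 = y_bar - b2 * x_bar     # intercept

print(f"Y_bar = {y_bar:.4f}, X_bar = {x_bar:.3f}")
print(f"b2 = {b2:.4f}, b1 = {b1:.4f}")
```

The slope and intercept should agree with the worked values (0.1823 and 12.9388) up to rounding.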

SIMPLE REGRESSION IN EVIEWS:

• Step 1: Open Eviews

• Step 2: Click on File/New/Workfile in order to create a new file

• Step 3: Choose the frequency of the data in the case of time series data or Undated or Irregular in the
case of cross-sectional data, and specify the start and end of your data set. Eviews will open a new
window which automatically contains a constant (c) and a residual (resid) series.

• Step 4: On the command line type:

genr x=0 (press enter)

genr y=0 (press enter)

which creates two new series named x and y that contain zeros for every observation.
Open x and y as a group by selecting them and double clicking with the mouse.

• Step 5: Either type the data or copy/paste from Excel. To be able to type (edit) the data of your series or to paste anything into the Eviews cells, the edit +/- button must be pressed. After editing the series, press the edit +/- button again to lock or secure the data.

• Step 6: Once the data have been entered into Eviews, the regression line may be estimated either by
typing

ls y c x (press enter)

on the command line, or by clicking on Quick/Estimate equation and then writing your equation (y c x) in
the new window.


Dependent Variable: Y
Method: Least Squares
Date: 09/28/16 Time: 15:40
Sample: 1 20
Included observations: 20

Variable Coefficient Std. Error t-Statistic Prob.

C 12.93884 28.96658 0.446682 0.6604
X 0.182342 0.052064 3.502275 0.0025

R-squared 0.405272 Mean dependent var 112.3035
Adjusted R-squared 0.372231 S.D. dependent var 32.97140
S.E. of regression 26.12385 Akaike info criterion 9.458214
Sum squared resid 12284.20 Schwarz criterion 9.557787
Log likelihood -92.58214 Hannan-Quinn criter. 9.477651
F-statistic 12.26593 Durbin-Watson stat 2.326386
Prob(F-statistic) 0.002544
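
Several entries in this output can be recomputed from the exercise data with the textbook formulas for the two-variable model. The sketch below is an independent check in plain Python, not a description of how EViews computes its results; the variable names are illustrative.

```python
import math

# Exercise data from the slides: Y is the dependent variable, X the regressor.
Y = [52.25, 58.32, 81.79, 119.90, 125.80, 100.46, 121.51, 100.08, 127.75, 104.94,
     107.48, 98.48, 181.21, 122.23, 129.57, 92.84, 117.92, 82.13, 182.28, 139.13]
X = [258.30, 343.10, 425.00, 467.50, 482.90, 487.70, 496.50, 519.40, 543.30, 548.70,
     564.60, 588.30, 591.30, 607.30, 611.20, 631.00, 659.60, 664.00, 704.20, 704.80]

n = len(Y)
k = 2  # parameters estimated: intercept and slope
x_bar = sum(X) / n
y_bar = sum(Y) / n
s_xx = sum((x - x_bar) ** 2 for x in X)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

# Residual and total sums of squares, then R-squared.
residuals = [y - (b1 + b2 * x) for x, y in zip(X, Y)]
rss = sum(e ** 2 for e in residuals)
tss = sum((y - y_bar) ** 2 for y in Y)
r_squared = 1.0 - rss / tss

# Estimated error variance and standard errors of the coefficients.
sigma2_hat = rss / (n - k)
se_b2 = math.sqrt(sigma2_hat / s_xx)
se_b1 = math.sqrt(sigma2_hat * sum(x ** 2 for x in X) / (n * s_xx))

# t statistics for the null hypothesis that each coefficient is zero.
t_b1 = b1 / se_b1
t_b2 = b2 / se_b2

print(f"R-squared = {r_squared:.6f}, sum squared resid = {rss:.2f}")
print(f"se(b1) = {se_b1:.5f}, t(b1) = {t_b1:.4f}")
print(f"se(b2) = {se_b2:.6f}, t(b2) = {t_b2:.4f}")
```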


THE CLASSICAL LINEAR REGRESSION MODEL

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

Assumptions (pertain to PRF):
1. Linear in the parameters.
2. Fixed X values in repeated samples (fixed regressor).
3. Zero mean value of the disturbance $u_i$: $E(u_i \mid X_i) = 0$.
4. Constant variance of $u_i$ (homoscedasticity): $\operatorname{var}(u_i) = \sigma^2$. If instead $\operatorname{var}(u_i) = \sigma_i^2$, the disturbances are heteroscedastic.
5. No autocorrelation between the disturbances: $\operatorname{cov}(u_i, u_j) = 0$ for $i \neq j$.
6. The number of observations n must be greater than the number of parameters to be estimated.
7. There must be variation in the values of the X variables.
8. No exact collinearity between the X variables.
9. There is no specification bias.


Properties of Least Squares Estimators

The Gauss-Markov Theorem
Given the assumptions of the classical linear regression model, the least-squares estimators, in the class of unbiased linear estimators, have minimum variance; that is, they are BLUE (best linear unbiased estimators).
1. Linear: a linear function of a random variable, such as the dependent variable Y in the regression model.
2. Unbiased: its average or expected value is equal to the true value.
3. Minimum variance in the class of all such linear unbiased estimators; an unbiased estimator with the least variance is an efficient estimator.


MULTIPLE REGRESSION ANALYSIS

• $\beta_1$: intercept

• $\beta_2$, $\beta_3$: partial regression (slope) coefficients

• $\beta_2$: measures the change in the mean value of Y per unit change in $X_2$, holding the value of $X_3$ constant.


CM: child mortality (the number of deaths of children under five per 1000 live births)
PGNP: per capita GNP in 1980
FLR: female literacy rate (in percent)

• As PGNP increases by a dollar, on average, child mortality decreases by 0.0056 units, holding FLR constant.

• As PGNP increases by a thousand dollars, on average, the number of deaths of children under age 5 decreases by about 5.6 per thousand live births, holding female literacy rate constant.

• As female literacy rate increases by one percentage point, on average, the number of deaths of children under age 5 decreases by about 2.23 per thousand live births, holding PGNP constant.

• 71 percent of variation in child mortality is explained by PGNP and FLR.

REGRESSION ON STANDARDIZED VARIABLES

Standardized variables with mean 0 and unit variance

Standardized regressors can be compared directly.
Beta coefficients measure the relative strength of the regressors.
A regressor with a larger beta coefficient (in absolute value) contributes relatively more to the explanation of Y.
-> Regression through the origin, since $\hat{\beta}_1^* = \bar{Y}^* - \hat{\beta}_2^* \bar{X}^* = 0$
$\hat{\beta}_1^*$, $\hat{\beta}_2^*$: beta coefficients
Interpretation: If the standardized independent variable increases by one standard deviation, on average, the dependent variable changes by $\hat{\beta}_2^*$ standard deviation units.
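
For reference, the standardization and its link to the ordinary slope can be written out as follows. The notation $S_Y$ and $S_X$ for the sample standard deviations of Y and X is an assumption made here for illustration; it does not appear on the slides.

$$Y_i^* = \frac{Y_i - \bar{Y}}{S_Y}, \qquad X_i^* = \frac{X_i - \bar{X}}{S_X}, \qquad \hat{\beta}_2^* = \hat{\beta}_2 \, \frac{S_X}{S_Y}$$

Because both standardized variables have mean zero, the fitted line passes through the origin, which is why the intercept drops out.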

*Standardized variables


• A one s.d. increase in PGNP leads, on average, to a 0.2026 s.d. decrease in CM, holding FLR constant.
• A one s.d. increase in FLR leads, on average, to a 0.7639 s.d. decrease in CM, holding PGNP constant.

• Female literacy has more impact on child mortality than per capita GNP.

Hypothesis Testing

Assume a random variable X with a known PDF $f(x; \theta)$, where $\theta$ is the parameter of the distribution.

Having a random sample of size n, obtain the point estimator $\hat{\theta}$.

Since the true value of $\theta$ is rarely known, hypothesize a specific numerical value of $\theta$, which is denoted as $\theta^*$.

The question that arises:
Is the estimator $\hat{\theta}$ “compatible” with the hypothesized value of $\theta$? Is $\theta = \theta^*$?

Null hypothesis vs Alternative hypothesis
To test the null hypothesis:

• Use the sample information to obtain the test statistic (point estimator of the unknown parameter)

• Find out the sampling distribution of the test statistic

• Use confidence interval or test of significance approach to test the null hypothesis

Example:

Labor economics: quantitative impact of education (X: number of years schooling) on wages (Y)

n = 13

Estimated regression: $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i = -0.0144 + 0.724 X_i$

Postulate $\beta_2 = 0$:

$H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$

Dependent Variable: WAGE
Method: Least Squares
Sample: 1 13
Included observations: 13

Variable Coefficient Std. Error t-Statistic Prob.

C -0.014453 0.874624 -0.016525 0.9871
EDUC 0.724097 0.069581 10.40648 0.0000

R-squared 0.907791 Mean dependent var 8.674708
Adjusted R-squared 0.899409 S.D. dependent var 2.959706
S.E. of regression 0.938704 Akaike info criterion 2.852004
Sum squared resid 9.692810 Schwarz criterion 2.938920
Log likelihood -16.53803 Hannan-Quinn criter. 2.834139
F-statistic 108.2948 Durbin-Watson stat 1.737984
Prob(F-statistic) 0.000000

The theory of hypothesis testing is concerned with developing rules or procedures for deciding whether to reject or not reject the null hypothesis.

Example:

In a trial a jury must decide between two hypotheses.

H0:The defendant is innocent
H1:The defendant is guilty.

• The jury does not know which hypothesis is true. They must make a decision on the basis of the evidence presented.

• In the language of statistics, convicting the defendant is called rejecting the null hypothesis in favor of the
alternative hypothesis.

• That is, the jury is saying that there is enough evidence to conclude that the defendant is guilty (i.e.,
there is enough evidence to support the alternative hypothesis).

• If the jury acquits, it is stating that there is not enough evidence to support the alternative hypothesis.
• The jury is not saying that the defendant is innocent, only that there is not enough evidence to support the alternative hypothesis. That is why it is preferable to state ‘do not reject’ rather than ‘accept’.

There are two possible errors:

A Type I error occurs when we reject a true null hypothesis.
That is, a Type I error occurs when the jury convicts an innocent person.
-> probability of rejecting the true hypothesis
-> α : probability of committing Type I error

A Type II error occurs when we do not reject a false null hypothesis.
That occurs when a guilty defendant is acquitted.
-> probability of accepting the false hypothesis
-> β : probability of committing Type II error

For a given sample size, the two probabilities are inversely related: decreasing one increases the other.
Classical approach: a Type I error is likely to be more serious in practice than a Type II error.
It generally follows the practice of setting the value of α at 1 or 5 or at most 10 percent.

The dilemma of choosing the appropriate value of α can be avoided by using the p value of the test statistic.

Example – test of significance approach:

Dependent Variable: CM
Method: Least Squares
Sample: 1 64
Included observations: 64

Variable Coefficient Std. Error t-Statistic Prob.

C 263.6416 11.59318 22.74109 0.0000
PGNP -0.005647 0.002003 -2.818703 0.0065
FLR -2.231586 0.209947 -10.62927 0.0000

R-squared 0.707665 Mean dependent var 141.5000
Adjusted R-squared 0.698081 S.D. dependent var 75.97807
S.E. of regression 41.74780 Akaike info criterion 10.34691
Sum squared resid 106315.6 Schwarz criterion 10.44811
Log likelihood -328.1012 Hannan-Quinn criter. 10.38678
F-statistic 73.83254 Durbin-Watson stat 2.186159
Prob(F-statistic) 0.000000

The calculated t-value for PGNP in absolute terms (about 2.82) exceeds the critical value of about 2.0, so the null hypothesis is rejected. PGNP is statistically significant at the 5 percent level of significance.
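
The test-of-significance rule is simple enough to write down directly. The sketch below takes the coefficients and standard errors from the output above and uses the slide's rule-of-thumb critical value of 2.0; the function names are hypothetical.

```python
def t_statistic(coefficient: float, std_error: float, hypothesized: float = 0.0) -> float:
    """t = (estimate - hypothesized value) / standard error."""
    return (coefficient - hypothesized) / std_error


def is_significant(t_value: float, critical_value: float = 2.0) -> bool:
    """Two-sided test: reject H0 when |t| exceeds the critical value."""
    return abs(t_value) > critical_value


# Coefficient and standard error pairs copied from the regression output.
t_pgnp = t_statistic(-0.005647, 0.002003)
t_flr = t_statistic(-2.231586, 0.209947)

print(f"t(PGNP) = {t_pgnp:.3f}, reject H0: {is_significant(t_pgnp)}")
print(f"t(FLR)  = {t_flr:.3f}, reject H0: {is_significant(t_flr)}")
```

Small differences from the printed t-statistics are expected, because the coefficients and standard errors in the output are themselves rounded.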

The exact level of significance: the p-value

p value (probability value)
The observed or exact level of significance
The exact probability of committing a Type I error
The lowest significance level at which a null hypothesis can be rejected.

Options:
– The reader decides whether to reject $H_0$ at the given p value, or
– Fix α at some level and reject the null hypothesis if the p value is less than α
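
The second option amounts to a one-line comparison. A minimal sketch, assuming a fixed α of 5 percent and using the p values reported for the exercise regression (0.6604 for the constant, 0.0025 for X); the function name is illustrative.

```python
def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Reject H0 when the p value falls below the chosen significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p value must lie between 0 and 1")
    return p_value < alpha


p_values = {"C": 0.6604, "X": 0.0025}
decisions = {name: reject_null(p) for name, p in p_values.items()}

for name, rejected in decisions.items():
    print(f"{name}: p = {p_values[name]}, reject H0 at 5%: {rejected}")
```

With these numbers the slope on X is significant at the 5 percent level (and also at 1 percent), while the constant is not.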

Example:
The probability of committing a Type I error is 0.25% (the p value of 0.0025 for X).

Variable Coefficient Std. Error t-Statistic Prob.

C 12.93884 28.96658 0.446682 0.6604
X 0.182342 0.052064 3.502275 0.0025

R-squared 0.405272 Mean dependent var 112.3035
Adjusted R-squared 0.372231 S.D. dependent var 32.97140
S.E. of regression 26.12385 Akaike info criterion 9.458214
Sum squared resid 12284.20 Schwarz criterion 9.557787
Log likelihood -92.58214 Hannan-Quinn criter. 9.477651
F-statistic 12.26593 Durbin-Watson stat 2.326386
Prob(F-statistic) 0.002544

TESTING THE OVERALL SIGNIFICANCE

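Is Y linearly related to both $X_2$ and $X_3$?
This amounts to finding out whether all the partial slope coefficients are simultaneously equal to zero.
F-test: a measure of the overall significance of the estimated regression.
The null hypothesis is a joint hypothesis:
$H_0: \beta_2 = \beta_3 = 0$
$H_1$: not all of the $\beta$'s are equal to 0
Compute

$$F = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$$

where n is the number of observations and k is the number of parameters estimated, including the intercept.

Decision: If the computed F exceeds the critical value of F, that is, if $F > F_\alpha(k-1, n-k)$
-> Reject $H_0$.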
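
The F statistic can be recovered from R-squared alone. The sketch below applies the standard relation between F and R-squared to the child mortality regression (R-squared 0.707665, n = 64, k = 3). The critical value of 3.15 is an approximate 5 percent table value for F(2, 60), used here as an assumption since the slides do not state one.

```python
def f_from_r_squared(r_squared: float, n: int, k: int) -> float:
    """Overall-significance F statistic: (R^2/(k-1)) / ((1-R^2)/(n-k))."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R-squared must lie in [0, 1)")
    if k < 2 or n <= k:
        raise ValueError("need at least one slope and n > k")
    return (r_squared / (k - 1)) / ((1.0 - r_squared) / (n - k))


# Child mortality regression: CM on PGNP and FLR.
f_cm = f_from_r_squared(0.707665, n=64, k=3)

# Approximate 5% critical value for F(2, 60); an assumed table value.
F_CRITICAL = 3.15
reject_joint_null = f_cm > F_CRITICAL

print(f"F = {f_cm:.4f}, reject H0 (all slopes zero): {reject_joint_null}")

# The same relation applied to the two-variable exercise regression.
f_exercise = f_from_r_squared(0.405272, n=20, k=2)
```

Both values line up with the F-statistics printed in the corresponding outputs, up to rounding of R-squared.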