INTRODUCTION

• Econometrics

– Literal interpretation: economic measurement; draws on economic theory, mathematics and statistics

• Mathematical economics – expresses economic theory in mathematical form (equations)

• Economic statistics – collecting, processing and presenting economic data in the form of charts and tables; not concerned with using the collected data to test economic theories

• Econometrics, by contrast:

– gives empirical content to most economic theory

– uses statistical methods to analyze economic data

• Typical goals of econometric analysis

– Estimating relationships between economic variables

– Testing economic theories and hypotheses

– Forecasting economic variables

– Evaluating and implementing government policies


Methodology of Econometrics

1. Statement of theory or hypothesis

Marginal propensity to consume (MPC): the rate of change of consumption for a unit change in income. The theory postulates that the MPC lies between 0 and 1.


2. Specification of the Mathematical Model of Consumption

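The theory postulates a positive relationship between consumption and income but does not specify the precise functional form. Assuming a linear form:

$Y = \beta_1 + \beta_2 X, \qquad 0 < \beta_2 < 1$

where Y is consumption expenditure, X is income, $\beta_1$ is the intercept and $\beta_2$ is the slope coefficient (the MPC).

This is an exact or deterministic relationship.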


3. Specification of the Econometric Model of Consumption

Relationships between economic variables are generally inexact.

Other variables may influence consumption expenditure:

• size of family

• ages of members in the family

• To allow for the inexact relationships between economic variables:

$Y = \beta_1 + \beta_2 X + u$

u: disturbance term, or error term

– a random (stochastic) variable with well-defined probabilistic properties

– represents other factors that affect consumption but are not taken into account explicitly

This is an example of an econometric model, specifically a linear regression model.


4. Obtaining data

Data are needed to obtain numerical values of $\beta_1$ and $\beta_2$.


5. Estimation of Econometric Model

Obtaining parameter estimates by regression analysis:

$\hat{Y} = -299.5913 + 0.7218X$

The estimated consumption function:

- fits the data quite well; the data points are very close to the regression line

- has a slope coefficient (MPC) of about 0.72

For the period 1960-2005, an increase in real income of one dollar led, on average, to an increase of about 72 cents in real consumption expenditure.


6. Hypothesis testing

Is 0.72 statistically less than 1?

Confirmation or refutation of economic theories on the basis of sample evidence

– statistical inference (hypothesis testing)


7. Forecasting or Prediction

Predict the future value(s) of the dependent, or forecast, variable Y on the basis of the known or expected future value(s) of the explanatory, or predictor, variable X.

$\hat{Y}_{2006} = -299.5913 + 0.7218(11319.4) = 7870.7516$

Given that the value of GDP for 2006 is 11319.4 billion dollars, the mean, or average, forecast consumption expenditure is about 7870 billion dollars.

Actual figure: 8044 billion dollars

Forecast error: about 174 billion dollars (the model underpredicts actual consumption)
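The forecast and its error can be reproduced by plugging the figures above into the estimated equation. This is a minimal sketch using only the rounded coefficients and data quoted on the slide; the function name `forecast_consumption` is hypothetical. With these rounded inputs the error comes out at about 173 billion dollars; the small gap relative to the quoted 174 is presumably due to rounding of the underlying data.

```python
# Estimated consumption function from the slides: Y_hat = -299.5913 + 0.7218 X
B1_HAT = -299.5913
B2_HAT = 0.7218


def forecast_consumption(income: float) -> float:
    """Hypothetical helper: mean (average) forecast of consumption for a given income."""
    return B1_HAT + B2_HAT * income


income_2006 = 11319.4    # 2006 GDP, billions of dollars
actual_2006 = 8044.0     # actual 2006 consumption, billions of dollars (rounded)

y_hat_2006 = forecast_consumption(income_2006)
forecast_error = actual_2006 - y_hat_2006   # positive value: the model underpredicts

print(f"forecast = {y_hat_2006:.4f}, forecast error = {forecast_error:.2f}")
```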


8. Use of the Model for control or Policy Purposes

Suppose the government targets consumption expenditure of 8750 billion dollars. What level of income is needed?

$8750 = -299.5913 + 0.7218X \quad\Rightarrow\quad X = \frac{8750 + 299.5913}{0.7218} \approx 12537$

An income level of about 12537 billion dollars, given an MPC of about 0.72, will produce an expenditure of about 8750 billion dollars.

An estimated model may thus be used for control, or policy, purposes:

Control variable: X

Target variable: Y
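For the policy question the estimated equation is simply solved for X. A minimal sketch, assuming the rounded coefficients above; `required_income` is a hypothetical helper name.

```python
# Estimated consumption function from the slides: Y_hat = -299.5913 + 0.7218 X
B1_HAT = -299.5913
B2_HAT = 0.7218


def required_income(target_consumption: float) -> float:
    """Hypothetical helper: invert Y = b1 + b2 * X to get the income X that yields a target Y."""
    return (target_consumption - B1_HAT) / B2_HAT


x_needed = required_income(8750.0)   # target consumption, billions of dollars
print(f"required income = {x_needed:.1f} billion dollars")
```

Plugging `x_needed` back into the equation returns the 8750 target, which is a quick check that the inversion is right.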


Log-log

$\ln(Y_i) = \beta_1 + \beta_2 \ln(X_i) + u_i$

$\beta_2$ is the elasticity of Y with respect to X: approximately the percentage change in Y given a 1 percent change in X.

Log-linear

$\ln(Y_i) = \beta_1 + \beta_2 X_i + u_i$

A 1-unit change in X leads to approximately a $100\beta_2$ percent change in Y.

Linear-log

$Y_i = \beta_1 + \beta_2 \ln(X_i) + u_i$

A 1 percent change in X leads to approximately a $\beta_2/100$ unit change in Y.
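The three interpretations can be checked numerically. This sketch uses hypothetical data generated exactly from $Y = 3X^{0.5}$ (so the log-log slope should be 0.5) and hypothetical slope values for the other two forms; `ols_slope` is an illustrative helper, not part of any library.

```python
import math


def ols_slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical constant-elasticity data: Y = 3 * X**0.5, i.e. ln Y = ln 3 + 0.5 ln X
x = [float(v) for v in range(1, 11)]
y = [3.0 * xi ** 0.5 for xi in x]

# Log-log: the slope on ln(X) is the elasticity of Y with respect to X
elasticity = ols_slope([math.log(v) for v in x], [math.log(v) for v in y])
# A 1% rise in X changes Y by 100 * (1.01**beta2 - 1) percent, close to beta2 percent
pct_change_y = 100 * (1.01 ** elasticity - 1)

# Log-linear: 100 * beta2 approximates the exact percentage change 100 * (exp(beta2) - 1)
beta2_loglin = 0.05                      # hypothetical slope coefficient
approx_pct = 100 * beta2_loglin
exact_pct = 100 * (math.exp(beta2_loglin) - 1)

# Linear-log: a 1% rise in X changes Y by beta2 * ln(1.01), close to beta2 / 100 units
beta2_linlog = 20.0                      # hypothetical slope coefficient
exact_unit_change = beta2_linlog * math.log(1.01)
approx_unit_change = beta2_linlog / 100

print(f"elasticity = {elasticity:.4f}; 1% rise in X -> {pct_change_y:.4f}% change in Y")
print(f"log-linear: approx {approx_pct:.2f}% vs exact {exact_pct:.2f}%")
print(f"linear-log: approx {approx_unit_change:.4f} vs exact {exact_unit_change:.4f} units")
```

The gaps between approximate and exact values grow with the size of $\beta_2$, which is why these readings are described as approximations.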

THE NATURE OF REGRESSION ANALYSIS

Regression

main tool of econometrics

Concerned with the study of the dependence of one

variable, the dependent variable, on one or more other

variables, the explanatory variables, with a view to estimating

and/or predicting the (population) mean or average value of the

former in terms of the known or fixed (in repeated sampling)

values of the latter.

TERMINOLOGY AND NOTATION

Dependent variable      Explanatory variable
Explained variable      Independent variable
Predictand              Predictor
Regressand              Regressor
Response                Stimulus
Endogenous              Exogenous
Outcome                 Covariate
Controlled variable     Control variable

THE NATURE AND SOURCES OF DATA FOR ECONOMIC ANALYSIS

Types of data:

(1) Time series data

• Stationary / not stationary

(2) Cross-section data

(3) Panel, longitudinal or micropanel data

• Balanced panel

• Unbalanced panel
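The balanced/unbalanced distinction can be checked mechanically. A minimal sketch, assuming the panel is stored as (unit, period) pairs and treating it as balanced when every unit is observed in the same set of periods; `is_balanced` and the sample data are hypothetical.

```python
from collections import defaultdict


def is_balanced(panel):
    """Hypothetical helper: True if every cross-sectional unit is observed in the same periods."""
    periods = defaultdict(set)
    for unit, period in panel:
        periods[unit].add(period)
    period_sets = list(periods.values())
    return all(s == period_sets[0] for s in period_sets)


# Hypothetical (unit, year) index pairs
balanced = [("A", 2001), ("A", 2002), ("B", 2001), ("B", 2002)]
unbalanced = [("A", 2001), ("A", 2002), ("B", 2001)]   # unit B is missing 2002

print(is_balanced(balanced), is_balanced(unbalanced))  # True False
```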


TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS

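Population Regression Function (PRF)

$E(Y \mid X_i) = \beta_1 + \beta_2 X_i$

where $\beta_1$ and $\beta_2$ are the regression coefficients: the intercept and slope coefficients, respectively.

Stochastic specification of the PRF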

The average consumption expenditure of families with weekly income of $100 is

greater than the average consumption expenditure of families with weekly income

of $80.

However, an individual family's consumption expenditure does not necessarily increase as the income level increases.

• Given the level of income $X_i$, an individual family's consumption expenditure is clustered around the average consumption of all families at that income level:

$Y_i = E(Y \mid X_i) + u_i$

where $u_i$ is the stochastic disturbance, or stochastic error term.

• The expenditure of an individual family, given its income level, is the sum of two components:

systematic (deterministic), $E(Y \mid X_i)$ + random (nonsystematic), $u_i$
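The decomposition into a systematic and a random component can be illustrated by simulation. This sketch assumes a hypothetical PRF $E(Y \mid X) = 17 + 0.6X$ and normally distributed disturbances with mean zero and standard deviation 5; none of these numbers come from the slides. Individual draws scatter widely, but their average at each income level settles near $E(Y \mid X_i)$.

```python
import random

random.seed(42)
BETA1, BETA2 = 17.0, 0.6   # hypothetical PRF: E(Y | X) = 17 + 0.6 X
SIGMA_U = 5.0              # hypothetical standard deviation of the disturbance


def family_expenditure(income: float) -> float:
    """Y_i = E(Y | X_i) + u_i: systematic component plus a random disturbance."""
    systematic = BETA1 + BETA2 * income
    u = random.gauss(0.0, SIGMA_U)   # nonsystematic component with mean zero
    return systematic + u


averages = {}
for income in (80.0, 100.0):
    draws = [family_expenditure(income) for _ in range(10_000)]
    averages[income] = sum(draws) / len(draws)
    print(f"income {income:.0f}: E(Y|X) = {BETA1 + BETA2 * income:.1f}, "
          f"simulated mean = {averages[income]:.2f}")
```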

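Sample Regression Function (SRF)

$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$

• SRF in stochastic form:

$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$

where $\hat{u}_i$ is the sample residual term.

– Primary objective in regression analysis: estimate the PRF

$Y_i = \beta_1 + \beta_2 X_i + u_i$

on the basis of the SRF $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$.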

Recap:

The SRF is an approximation of the PRF. How is the SRF determined?

Method of estimation: ordinary least squares (OLS)

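ESTIMATION: THE METHOD OF ORDINARY LEAST SQUARES

$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$

Choose $\hat{\beta}_1$ and $\hat{\beta}_2$ to minimize the residual sum of squares

$\sum \hat{u}_i^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$

which yields

$\hat{\beta}_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}, \qquad \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$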

Exercise: estimate the SRF by OLS using the following data.

i     Y_i       X_i
1     52.25     258.30
2     58.32     343.10
3     81.79     425.00
4     119.90    467.50
5     125.80    482.90
6     100.46    487.70
7     121.51    496.50
8     100.08    519.40
9     127.75    543.30
10    104.94    548.70
11    107.48    564.60
12    98.48     588.30
13    181.21    591.30
14    122.23    607.30
15    129.57    611.20
16    92.84     631.00
17    117.92    659.60
18    82.13     664.00
19    182.28    704.20
20    139.13    704.80

i     Y_i      X_i       Y_i − Ȳ     X_i − X̄     (X_i − X̄)(Y_i − Ȳ)   (X_i − X̄)²
1     52.25    258.30    -60.0535    -286.635    17213.43             82159.62
2     58.32    343.10    -53.9835    -201.835    10895.76             40737.37
3     81.79    425.00    -30.5135    -119.935    3659.64              14384.40
4     119.90   467.50    7.5965      -77.435     -588.23              5996.18
5     125.80   482.90    13.4965     -62.035     -837.26              3848.34
6     100.46   487.70    -11.8435    -57.235     677.86               3275.85
7     121.51   496.50    9.2065      -48.435     -445.92              2345.95
8     100.08   519.40    -12.2235    -25.535     312.13               652.04
9     127.75   543.30    15.4465     -1.635      -25.26               2.67
10    104.94   548.70    -7.3635     3.765       -27.72               14.18
11    107.48   564.60    -4.8235     19.665      -94.85               386.71
12    98.48    588.30    -13.8235    43.365      -599.46              1880.52
13    181.21   591.30    68.9065     46.365      3194.85              2149.71
14    122.23   607.30    9.9265      62.365      619.07               3889.39
15    129.57   611.20    17.2665     66.265      1144.16              4391.05
16    92.84    631.00    -19.4635    86.065      -1675.13             7407.18
17    117.92   659.60    5.6165      114.665     644.02               13148.06
18    82.13    664.00    -30.1735    119.065     -3592.61             14176.47
19    182.28   704.20    69.9765     159.265     11144.81             25365.34
20    139.13   704.80    26.8265     159.865     4288.62              25556.82
Sum   2246.07  10898.70                          45907.91             251767.87

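$\bar{Y} = \frac{\sum Y_i}{n} = \frac{2246.07}{20} = 112.3035$

$\bar{X} = \frac{\sum X_i}{n} = \frac{10898.70}{20} = 544.935$

$\hat{\beta}_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{45907.91}{251767.87} = 0.1823$

$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} = 12.9388$ (using the unrounded $\hat{\beta}_2 = 0.182342$)

SRF: $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i = 12.9388 + 0.1823 X_i$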
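The hand calculation above can be reproduced directly from the OLS formulas. A minimal sketch in plain Python (no statistics package assumed), using the exercise data with Y as the dependent variable; `ols_two_variable` is an illustrative helper name.

```python
# Exercise data from the table above (Y is the dependent variable)
Y = [52.25, 58.32, 81.79, 119.90, 125.80, 100.46, 121.51, 100.08, 127.75, 104.94,
     107.48, 98.48, 181.21, 122.23, 129.57, 92.84, 117.92, 82.13, 182.28, 139.13]
X = [258.30, 343.10, 425.00, 467.50, 482.90, 487.70, 496.50, 519.40, 543.30, 548.70,
     564.60, 588.30, 591.30, 607.30, 611.20, 631.00, 659.60, 664.00, 704.20, 704.80]


def ols_two_variable(x, y):
    """OLS estimates (b1, b2) for y = b1 + b2 * x using the deviation formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # sum of cross-deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)                         # sum of squared deviations
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


b1, b2 = ols_two_variable(X, Y)
print(f"Y_hat = {b1:.4f} + {b2:.4f} X")   # prints: Y_hat = 12.9388 + 0.1823 X
```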

SIMPLE REGRESSION IN EVIEWS:

• Step 1: Open EViews.

• Step 2: Click on File/New/Workfile to create a new file.

• Step 3: Choose the frequency of the data in the case of time series data, or Undated or Irregular in the case of cross-sectional data, and specify the start and end of your data set. EViews will open a new window that automatically contains a constant (c) and a residual (resid) series.

• Step 4: On the command line type:

genr x=0 (press enter)

genr y=0 (press enter)

which creates two new series named x and y that contain zeros for every observation. Open x and y as a group by selecting them and double-clicking.

• Step 5: Either type the data or copy and paste them from Excel. To type (edit) the data of your series or to paste anything into the EViews cells, the edit +/- button must be pressed. After editing the series, press the edit +/- button again to lock or secure the data.

• Step 6: Once the data have been entered into EViews, the regression line may be estimated either by typing

ls y c x (press enter)

on the command line, or by clicking on Quick/Estimate Equation and then writing your equation (y c x) in the new window.


Dependent Variable: Y

Method: Least Squares

Date: 09/28/16 Time: 15:40

Sample: 1 20

Included observations: 20

Variable Coefficient Std. Error t-Statistic Prob.

C 12.93884 28.96658 0.446682 0.6604

X 0.182342 0.052064 3.502275 0.0025

R-squared 0.405272 Mean dependent var 112.3035

Adjusted R-squared 0.372231 S.D. dependent var 32.97140

S.E. of regression 26.12385 Akaike info criterion 9.458214

Sum squared resid 12284.20 Schwarz criterion 9.557787

Log likelihood -92.58214 Hannan-Quinn criter. 9.477651

F-statistic 12.26593 Durbin-Watson stat 2.326386

Prob(F-statistic) 0.002544
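Several entries of the EViews output follow from textbook formulas and can be recomputed as a cross-check. This sketch assumes the usual two-variable results: $\hat\sigma^2 = RSS/(n-2)$, $se(\hat\beta_2) = \sqrt{\hat\sigma^2/\sum x_i^2}$, $se(\hat\beta_1) = \sqrt{\hat\sigma^2 \sum X_i^2 / (n \sum x_i^2)}$, $R^2 = 1 - RSS/TSS$ and the overall F statistic. It is not a reimplementation of EViews itself.

```python
import math

Y = [52.25, 58.32, 81.79, 119.90, 125.80, 100.46, 121.51, 100.08, 127.75, 104.94,
     107.48, 98.48, 181.21, 122.23, 129.57, 92.84, 117.92, 82.13, 182.28, 139.13]
X = [258.30, 343.10, 425.00, 467.50, 482.90, 487.70, 496.50, 519.40, 543.30, 548.70,
     564.60, 588.30, 591.30, 607.30, 611.20, 631.00, 659.60, 664.00, 704.20, 704.80]
n, k = len(Y), 2   # observations and number of estimated parameters

# OLS coefficients
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((x - x_bar) ** 2 for x in X)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
b2 = sxy / sxx
b1 = y_bar - b2 * x_bar

# Residual-based statistics
residuals = [y - (b1 + b2 * x) for x, y in zip(X, Y)]
rss = sum(e ** 2 for e in residuals)           # "Sum squared resid"
tss = sum((y - y_bar) ** 2 for y in Y)
sigma2 = rss / (n - k)                         # unbiased estimator of the error variance
se_reg = math.sqrt(sigma2)                     # "S.E. of regression"
se_b2 = math.sqrt(sigma2 / sxx)
se_b1 = math.sqrt(sigma2 * sum(x ** 2 for x in X) / (n * sxx))
t_b1, t_b2 = b1 / se_b1, b2 / se_b2            # t statistics for H0: coefficient = 0
r2 = 1 - rss / tss
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))

print(f"se(b1) = {se_b1:.5f}, se(b2) = {se_b2:.6f}, t(b2) = {t_b2:.4f}")
print(f"R^2 = {r2:.6f}, RSS = {rss:.2f}, S.E. of regression = {se_reg:.5f}, F = {f_stat:.4f}")
```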


THE CLASSICAL LINEAR REGRESSION MODEL

$Y_i = \beta_1 + \beta_2 X_i + u_i$

Assumptions (pertaining to the PRF):

1. Linear in the parameters.

2. Fixed X values in repeated samples (fixed regressor).

3. Zero mean value of the disturbance $u_i$: $E(u_i \mid X_i) = 0$


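4. Constant variance of $u_i$ (homoscedasticity): $\operatorname{var}(u_i \mid X_i) = \sigma^2$

If instead $\operatorname{var}(u_i \mid X_i) = \sigma_i^2$ varies across observations: heteroscedasticity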


5. No autocorrelation between the disturbances: $\operatorname{cov}(u_i, u_j \mid X_i, X_j) = 0$ for $i \neq j$

6. The number of observations n must be greater than the number of parameters to be estimated

7. There must be variation in the values of the X variables

8. No exact collinearity between the X variables

9. There is no specification bias


Properties of Least Square Estimators

The Gauss-Markov Theorem

Given the assumptions of the classical linear regression model, the least-squares estimators, in the class of unbiased linear estimators, have minimum variance; that is, they are BLUE (best linear unbiased estimators). An estimator is BLUE if it is:

1. Linear: a linear function of a random variable, such as the dependent variable Y in the regression model.

2. Unbiased: its average or expected value is equal to the true value.

3. Of minimum variance in the class of all such linear unbiased estimators; an unbiased estimator with the least variance is an efficient estimator.
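Unbiasedness can be seen in a Monte Carlo experiment: draw many samples from a known PRF with X held fixed, estimate $\hat\beta_2$ by OLS each time, and average the estimates. This sketch assumes hypothetical parameters ($\beta_1 = 10$, $\beta_2 = 0.5$, $\sigma = 2$, X = 1, ..., 20) and normal disturbances. It also compares the sampling variance with the theoretical $\sigma^2/\sum x_i^2$; it illustrates unbiasedness and the variance formula, not the full "best" part of the theorem.

```python
import random
import statistics

random.seed(1)
TRUE_B1, TRUE_B2 = 10.0, 0.5                 # hypothetical population parameters
SIGMA = 2.0                                  # hypothetical s.d. of the disturbance
X_FIXED = [float(v) for v in range(1, 21)]   # X held fixed in repeated samples


def ols(x, y):
    """OLS estimates (b1, b2) for y = b1 + b2 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b2 * x_bar, b2


slopes = []
for _ in range(5_000):
    # Independent disturbances with zero mean and constant variance
    y = [TRUE_B1 + TRUE_B2 * x + random.gauss(0.0, SIGMA) for x in X_FIXED]
    slopes.append(ols(X_FIXED, y)[1])

x_bar = statistics.fmean(X_FIXED)
theoretical_var = SIGMA ** 2 / sum((x - x_bar) ** 2 for x in X_FIXED)
mean_b2 = statistics.fmean(slopes)
var_b2 = statistics.variance(slopes)
print(f"mean of b2 = {mean_b2:.4f} (true 0.5); var = {var_b2:.6f} vs {theoretical_var:.6f}")
```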


MULTIPLE REGRESSION ANALYSIS

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$

• $\beta_1$: intercept term

• $\beta_2, \beta_3$: partial regression (partial slope) coefficients

• $\beta_2$ measures the change in the mean value of Y per unit change in $X_2$, holding the value of $X_3$ constant.

Example:

$\widehat{CM}_i = 263.6416 - 0.0056\,PGNP_i - 2.2316\,FLR_i, \qquad R^2 = 0.7077$

CM: child mortality (the number of deaths of children under five per 1000 live births)

PGNP: per capita GNP in 1980

FLR: female literacy rate (in percent)

• As PGNP increases by a dollar, on average, child mortality decreases by 0.0056 units, holding FLR constant.

• As PGNP increases by a thousand dollars, on average, the number of deaths of children under age 5 decreases by about 5.6 per thousand live births, holding the female literacy rate constant.

• As the female literacy rate increases by one percentage point, on average, the number of deaths of children under age 5 decreases by about 2.23 per thousand live births, holding PGNP constant.

• About 71 percent of the variation in child mortality is explained by PGNP and FLR.
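With two regressors the OLS estimates can be written in closed form using deviations from the means, by solving the two normal equations for $\hat\beta_2$ and $\hat\beta_3$. This sketch uses hypothetical, noise-free data generated from $Y = 1 + 2X_2 - 3X_3$, so the estimates should recover those values exactly; the data and the helper `ols_two_regressors` are illustrative only. The last lines rescale the PGNP coefficient from the CM output to a change of one thousand dollars.

```python
def ols_two_regressors(y, x2, x3):
    """OLS for Y = b1 + b2*X2 + b3*X3 via the deviation-form normal equations."""
    n = len(y)
    y_bar, x2_bar, x3_bar = sum(y) / n, sum(x2) / n, sum(x3) / n
    yd = [v - y_bar for v in y]
    d2 = [v - x2_bar for v in x2]
    d3 = [v - x3_bar for v in x3]
    s22 = sum(a * a for a in d2)
    s33 = sum(a * a for a in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    sy2 = sum(a * b for a, b in zip(yd, d2))
    sy3 = sum(a * b for a, b in zip(yd, d3))
    det = s22 * s33 - s23 ** 2   # zero under exact collinearity between X2 and X3
    b2 = (sy2 * s33 - sy3 * s23) / det
    b3 = (sy3 * s22 - sy2 * s23) / det
    b1 = y_bar - b2 * x2_bar - b3 * x3_bar
    return b1, b2, b3


# Hypothetical noise-free data: Y = 1 + 2*X2 - 3*X3
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1 + 2 * a - 3 * b for a, b in zip(x2, x3)]
b1, b2, b3 = ols_two_regressors(y, x2, x3)
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}, b3 = {b3:.4f}")

# PGNP coefficient from the CM output, rescaled to a change of 1000 dollars in PGNP
pgnp_effect_per_1000 = -0.005647 * 1000   # about -5.6 deaths per 1000 live births
```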

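REGRESSION ON STANDARDIZED VARIABLES

Standardized variables have mean 0 and unit variance:

$Y_i^* = \frac{Y_i - \bar{Y}}{S_Y}, \qquad X_i^* = \frac{X_i - \bar{X}}{S_X}$

where $S_Y$ and $S_X$ are the sample standard deviations.

• All standardized regressors can be compared directly.

• The coefficients measure the relative strength of the regressors: a regressor with a larger beta coefficient (in absolute value) contributes relatively more to the explanation of Y.

$Y_i^* = \hat{\beta}_1^* + \hat{\beta}_2^* X_i^* + \hat{u}_i^*$

-> regression through the origin, since $\hat{\beta}_1^* = \bar{Y}^* - \hat{\beta}_2^* \bar{X}^* = 0$

$\hat{\beta}_1^*, \hat{\beta}_2^*$: beta (standardized) coefficients

Interpretation: if the standardized regressor increases by one standard deviation, on average, the standardized dependent variable increases by $\hat{\beta}_2^*$ standard deviation units.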
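Standardizing can be checked on the exercise data. This sketch standardizes Y and X with the sample mean and standard deviation, re-runs OLS, and confirms that the intercept is zero and that $\hat\beta_2^* = \hat\beta_2 S_X / S_Y$. In the two-variable case $\hat\beta_2^*$ also equals the correlation coefficient, so its square matches the $R^2$ of 0.405272 reported earlier. The helpers `standardize` and `ols` are illustrative.

```python
import statistics

Y = [52.25, 58.32, 81.79, 119.90, 125.80, 100.46, 121.51, 100.08, 127.75, 104.94,
     107.48, 98.48, 181.21, 122.23, 129.57, 92.84, 117.92, 82.13, 182.28, 139.13]
X = [258.30, 343.10, 425.00, 467.50, 482.90, 487.70, 496.50, 519.40, 543.30, 548.70,
     564.60, 588.30, 591.30, 607.30, 611.20, 631.00, 659.60, 664.00, 704.20, 704.80]


def standardize(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


def ols(x, y):
    """OLS estimates (b1, b2) for y = b1 + b2 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b2 * x_bar, b2


b1_star, b2_star = ols(standardize(X), standardize(Y))      # regression on standardized variables
_, b2 = ols(X, Y)                                             # original-units slope
b2_from_raw = b2 * statistics.stdev(X) / statistics.stdev(Y)  # beta2* = beta2 * S_X / S_Y

print(f"intercept* = {b1_star:.2e}, beta2* = {b2_star:.4f}, from raw slope = {b2_from_raw:.4f}")
```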

*Standardized variables: results for the child mortality regression (CM, PGNP and FLR as defined above)

• A one-standard-deviation increase in PGNP leads, on average, to a 0.2026 standard deviation decrease in CM, holding FLR constant.

• A one-standard-deviation increase in FLR leads, on average, to a 0.7639 standard deviation decrease in CM, holding PGNP constant.

• Female literacy has a larger relative impact on child mortality than per capita GNP.

Hypothesis Testing

Assume a random variable X with a known PDF $f(X; \theta)$, where $\theta$ is the parameter of the distribution.

Having drawn a random sample of size n, obtain the point estimator $\hat{\theta}$.

Since the true value of $\theta$ is rarely known, hypothesize a specific numerical value of $\theta$, denoted $\theta^*$.

The question that arises:

Is the estimator $\hat{\theta}$ "compatible" with the hypothesized value of $\theta$? Is $\theta = \theta^*$?

Null hypothesis vs. alternative hypothesis: $H_0: \theta = \theta^*$ vs. $H_1: \theta \neq \theta^*$

To test the null hypothesis:

• Use the sample information to obtain the test statistic (based on the point estimator of the unknown parameter)

• Find the sampling distribution of the test statistic

• Use the confidence-interval or test-of-significance approach to test the null hypothesis

Example:

Labor economics: the quantitative impact of education (X: number of years of schooling) on wages (Y), n = 13.

$Y_i = \beta_1 + \beta_2 X_i + u_i$

$\hat{Y}_i = -0.0144 + 0.724 X_i$

Postulate that $\beta_2 = 0$ (education has no effect on wages):

$H_0: \beta_2 = 0$

$H_1: \beta_2 \neq 0$

Dependent Variable: WAGE

Method: Least Squares

Sample: 1 13

Included observations: 13

Variable Coefficient Std. Error t-Statistic Prob.

C -0.014453 0.874624 -0.016525 0.9871

EDUC 0.724097 0.069581 10.40648 0.0000

R-squared 0.907791 Mean dependent var 8.674708

Adjusted R-squared 0.899409 S.D. dependent var 2.959706

S.E. of regression 0.938704 Akaike info criterion 2.852004

Sum squared resid 9.692810 Schwarz criterion 2.938920

Log likelihood -16.53803 Hannan-Quinn criter. 2.834139

F-statistic 108.2948 Durbin-Watson stat 1.737984

Prob(F-statistic) 0.000000
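Both approaches to testing $H_0: \beta_2 = 0$ can be applied to the EDUC coefficient. This sketch takes the coefficient and standard error from the output above and assumes the two-tailed 5 percent critical t value for $n - k = 11$ degrees of freedom, 2.201, read from a t table.

```python
B2_HAT, SE_B2 = 0.724097, 0.069581   # EDUC coefficient and standard error from the output
BETA2_NULL = 0.0                     # hypothesized value under H0
T_CRIT = 2.201                       # two-tailed 5% critical t value, 11 df (from a t table)

# Confidence-interval approach: reject H0 if the hypothesized value lies outside the 95% CI
lower = B2_HAT - T_CRIT * SE_B2
upper = B2_HAT + T_CRIT * SE_B2
reject_ci = not (lower <= BETA2_NULL <= upper)

# Test-of-significance approach: reject H0 if |t| exceeds the critical value
t_stat = (B2_HAT - BETA2_NULL) / SE_B2
reject_t = abs(t_stat) > T_CRIT

print(f"95% CI for beta2: [{lower:.4f}, {upper:.4f}], reject H0: {reject_ci}")
print(f"t = {t_stat:.4f}, reject H0: {reject_t}")
```

The two approaches always agree for a two-sided test at the same level, because the interval excludes the hypothesized value exactly when $|t|$ exceeds the critical value.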

The theory of hypothesis testing is concerned with developing rules or procedures for deciding whether to reject or not reject the null hypothesis.

Example:

In a trial, a jury must decide between two hypotheses.

H0: The defendant is innocent.

H1: The defendant is guilty.

• The jury does not know which hypothesis is true. It must make a decision on the basis of the evidence presented.

• In the language of statistics, convicting the defendant is called rejecting the null hypothesis in favor of the alternative hypothesis.

• That is, the jury is saying that there is enough evidence to conclude that the defendant is guilty (i.e., there is enough evidence to support the alternative hypothesis).

• If the jury acquits, it is stating that there is not enough evidence to support the alternative hypothesis.

• The jury is not saying that the defendant is innocent, only that there is not enough evidence to support the alternative hypothesis. That is why it is preferable to say 'do not reject' rather than 'accept'.

There are two possible errors:

A Type I error occurs when we reject a true null hypothesis; that is, the jury convicts an innocent person.

-> α: the probability of committing a Type I error (rejecting a true hypothesis)

A Type II error occurs when we do not reject a false null hypothesis; that is, a guilty defendant is acquitted.

-> β: the probability of committing a Type II error (not rejecting a false hypothesis)

For a given sample size, the two probabilities are inversely related: decreasing one increases the other.

Classical approach: a Type I error is likely to be more serious in practice than a Type II error, so the practice is to set α at 1 or 5 or at most 10 percent.

The dilemma of choosing the appropriate value of α can be avoided by using the p value of the test statistic.
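The meaning of α can be checked by simulation: when the null hypothesis is true by construction, a 5 percent test should reject it, wrongly, in about 5 percent of samples. This sketch uses a z-test of a population mean with known variance (a simpler test than the regression t-test, chosen only to illustrate the idea), with hypothetical settings $\mu_0 = 0$, $\sigma = 1$ and n = 25.

```python
import random
import statistics

random.seed(7)
Z_CRIT = 1.96                 # two-tailed 5% critical value of the standard normal
MU0, SIGMA, N = 0.0, 1.0, 25  # hypothetical settings; H0: mu = MU0 is true by construction
TRIALS = 20_000

rejections = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    z = (statistics.fmean(sample) - MU0) / (SIGMA / N ** 0.5)
    if abs(z) > Z_CRIT:
        rejections += 1       # rejecting a true H0: a Type I error

type1_rate = rejections / TRIALS
print(f"simulated Type I error rate = {type1_rate:.4f} (nominal alpha = 0.05)")
```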

TESTING ABOUT INDIVIDUAL REGRESSION COEFFICIENTS

Example (test of significance approach):

Dependent Variable: CM

Method: Least Squares

Sample: 1 64

Included observations: 64

Variable Coefficient Std. Error t-Statistic Prob.

C 263.6416 11.59318 22.74109 0.0000

PGNP -0.005647 0.002003 -2.818703 0.0065

FLR -2.231586 0.209947 -10.62927 0.0000

R-squared 0.707665 Mean dependent var 141.5000

Adjusted R-squared 0.698081 S.D. dependent var 75.97807

S.E. of regression 41.74780 Akaike info criterion 10.34691

Sum squared resid 106315.6 Schwarz criterion 10.44811

Log likelihood -328.1012 Hannan-Quinn criter. 10.38678

F-statistic 73.83254 Durbin-Watson stat 2.186159

Prob(F-statistic) 0.000000

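For PGNP, the calculated t value in absolute terms (2.8187) exceeds the critical value of about 2.0 (5 percent level, two-tailed, n − k = 61 df). Reject the null hypothesis $H_0: \beta_2 = 0$: PGNP is statistically significant at the 5 percent level of significance.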
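The same rule can be applied to every coefficient in the CM output at once. This sketch takes the coefficients and standard errors from the output above and uses the approximate critical value of 2.0 quoted in the text; small differences from the printed t-Statistic column come from the rounding of the displayed coefficients.

```python
# (coefficient, standard error) pairs from the CM output above
coefficients = {
    "C": (263.6416, 11.59318),
    "PGNP": (-0.005647, 0.002003),
    "FLR": (-2.231586, 0.209947),
}
T_CRIT = 2.0   # approximate two-tailed 5% critical value with 61 df, as used in the text

results = {}
for name, (coef, se) in coefficients.items():
    t = coef / se                          # t statistic for H0: coefficient = 0
    results[name] = (t, abs(t) > T_CRIT)   # (t value, reject H0?)
    print(f"{name:5s} t = {t:9.4f}  reject H0: {abs(t) > T_CRIT}")
```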

The exact level of significance: the p-value

p value (probability value)

The observed or exact level of significance

The exact probability of committing a Type I error

The lowest significance level at which a null hypothesis can be rejected.

Options:

Let the reader decide whether to reject H0 at the given p value, or

Fix α at some level and reject the null hypothesis if the p value is less than α.

Example (the exercise regression): the p value of the X coefficient is 0.0025, so the probability of committing a Type I error when rejecting $H_0: \beta_2 = 0$ is 0.25%.

Variable Coefficient Std. Error t-Statistic Prob.

C 12.93884 28.96658 0.446682 0.6604

X 0.182342 0.052064 3.502275 0.0025
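The Prob. column is a two-sided p value from the t distribution with $n - k$ degrees of freedom. Since the Python standard library has no t-distribution CDF, this sketch integrates the t density numerically with Simpson's rule; the helper names are illustrative, and in practice a statistics library would be used instead. With 18 degrees of freedom it reproduces the p values shown above.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| > |t|) = 1 - 2 * (integral of the density from 0 to |t|), via Simpson's rule."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps   # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = total * h / 3
    return 1 - 2 * area


df = 20 - 2   # n - k for the exercise regression
p_x = two_sided_p_value(3.502275, df)
p_c = two_sided_p_value(0.446682, df)
print(f"p(X) = {p_x:.4f}, p(C) = {p_c:.4f}")
```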

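TESTING THE OVERALL SIGNIFICANCE

Is Y linearly related to both $X_2$ and $X_3$?

Find out whether all the partial slope coefficients are simultaneously equal to zero.

F-test: a measure of the overall significance of the estimated regression.

The null hypothesis is a joint hypothesis:

$H_0: \beta_2 = \beta_3 = 0$

$H_1$: not all slope coefficients are simultaneously zero

Compute

$F = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$

where k is the number of parameters, including the intercept.

Decision: if the computed F exceeds the critical value, $F > F_\alpha(k-1, n-k)$,

-> reject $H_0$.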
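The F statistic can be computed from $R^2$ alone. This sketch checks the formula against the F-statistic values in the outputs above (CM regression: n = 64, k = 3; exercise regression: n = 20, k = 2) and assumes an approximate 5 percent critical value of 3.15 for F(2, 61), read from an F table.

```python
def overall_f(r2: float, n: int, k: int) -> float:
    """F = [R^2 / (k - 1)] / [(1 - R^2) / (n - k)]; k counts the intercept."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


f_cm = overall_f(0.707665, n=64, k=3)         # child mortality regression
f_exercise = overall_f(0.405272, n=20, k=2)   # exercise regression

F_CRIT_2_61 = 3.15   # approximate 5% critical value of F(2, 61), from an F table
reject_h0 = f_cm > F_CRIT_2_61

print(f"F (CM) = {f_cm:.4f}, reject H0: {reject_h0}; F (exercise) = {f_exercise:.4f}")
```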