Advanced Regression
AGENDA
• Poisson Regression
• Negative Binomial
• Multinomial Regression
• Zero Inflated
© 2013 ExcelR Solutions. All Rights Reserved
Multinomial Regression
• Logistic regression (Binomial distribution) is used when output has ‘2’ categories
• Multinomial regression (classification model) is used when output has > ‘2’ categories
• Extension to logistic regression
• No natural ordering of categories
Mode of transport   Car    Carpool   Bus    Rail   All modes
Count               218    32        81     122    453
Probability         0.48   0.07      0.18   0.27   1

• Response variable has > ‘2’ categories & hence we apply multilogit
• Understand the impact of cost & time on the various modes of transport
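The probabilities in the table are the observed shares of each mode (count divided by the total). A minimal Python sketch of that arithmetic, using only the counts shown above (the variable names are illustrative):

```python
# Counts per transport mode, taken from the table above.
counts = {"car": 218, "carpool": 32, "bus": 81, "rail": 122}

total = sum(counts.values())  # the "All modes" column

# Observed share of each mode = count / total.
shares = {mode: n / total for mode, n in counts.items()}

for mode, share in shares.items():
    print(f"{mode:8s} {share:.2f}")
```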
Multinomial Regression
• For a categorical ‘Y’ (response) or ‘X’ (predictor) with ‘s’ categories:
✓ Lowest in numerical / lexicographical value is chosen as baseline / reference
✓ The level missing from the output is the baseline level
✓ We can choose the baseline level of our choice using the ‘relevel’ function in R
✓ Model formulates the relationship between transformed (logit) Y & numerical X linearly
✓ Modeling quantitative variables linearly might not always be correct
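The baseline convention can be sketched in Python. This is an illustration of the idea only, not R's ‘relevel’ itself; `order_levels` and `dummy_code` are hypothetical helper names introduced here.

```python
def order_levels(values, baseline=None):
    """Return the distinct levels with the baseline (reference) level first.

    By default the lowest level in sorted (lexicographical) order is the
    baseline; passing `baseline` moves a chosen level to the front instead.
    """
    levels = sorted(set(values))
    if baseline is not None:
        if baseline not in levels:
            raise ValueError(f"unknown level: {baseline!r}")
        levels.remove(baseline)
        levels.insert(0, baseline)
    return levels


def dummy_code(value, levels):
    """Encode one value as s - 1 indicators; the baseline gets all zeros."""
    return [1 if value == level else 0 for level in levels[1:]]


modes = ["car", "rail", "bus", "carpool", "car"]

default_levels = order_levels(modes)               # "bus" sorts first
car_levels = order_levels(modes, baseline="car")   # "car" chosen explicitly

print(default_levels)
print(car_levels)
print(dummy_code("rail", car_levels))
```

With ‘s’ = 4 levels there are only 3 indicator columns, which is why the baseline level never appears in the model output.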
Multinomial Regression - Output
Iteration History:
• Iterative procedure is used to compute maximum likelihood estimates
• # iterations & convergence status is provided
• -2logL = 2 * negative log likelihood
• For nested models, the difference in -2logL approximately follows a χ2 distribution, which is used for hypothesis testing of goodness of fit
# parameters = 27
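To make -2logL concrete, here is a small Python sketch. It assumes an intercept-only model whose fitted probabilities equal the observed mode shares (counts 218, 32, 81, 122), and compares it with equal probabilities for all modes. It is not the 27-parameter model reported on this slide.

```python
import math

# Mode counts from the earlier transport table.
counts = {"car": 218, "carpool": 32, "bus": 81, "rail": 122}
total = sum(counts.values())


def neg2_log_lik(counts, probs):
    """-2 * log-likelihood of the individual choices under `probs`."""
    log_lik = sum(n * math.log(probs[mode]) for mode, n in counts.items())
    return -2.0 * log_lik


# Assumption: intercept-only model, fitted probability = observed share.
observed = {mode: n / total for mode, n in counts.items()}
# Comparison point: every mode equally likely.
uniform = {mode: 1.0 / len(counts) for mode in counts}

neg2ll_observed = neg2_log_lik(counts, observed)
neg2ll_uniform = neg2_log_lik(counts, uniform)

print(f"-2logL, observed shares: {neg2ll_observed:.2f}")
print(f"-2logL, equal shares:    {neg2ll_uniform:.2f}")
```

A smaller -2logL means the probabilities explain the observed choices better; the iterative procedure searches for the parameters that minimize it.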
Multinomial Regression - Output
• ‘car’ has been chosen as baseline
• x = vector representing the values of all inputs
log( P(choice = carpool | x) / P(choice = car | x) ) = β20 + β21 * cost.car + β22 * cost.carpool + …………….
This equation models the log of the ratio of the probability of carpool to the probability of car
• The regression coefficient 0.636 indicates that for a ‘1’ unit increase in ‘cost.car’, the log odds of ‘carpool’ to ‘car’
increases by 0.636
• Intercept value does not mean anything in this context
• If we have a categorical X also, say Gender (female = 0, male = 1), then regression coefficient (say 0.22) indicates
that relative to females, males increase the log odds of ‘carpool’ to ‘car’ by 0.22
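The step from log odds against the baseline to choice probabilities can be sketched in Python as follows. Only the 0.636 coefficient on cost.car comes from the slide; every other coefficient, the two-predictor setup, and the input values are made-up placeholders for illustration.

```python
import math

# Coefficients of each non-baseline mode relative to "car".
# Only 0.636 (cost.car, carpool equation) is from the slide; the rest are
# hypothetical placeholders.
coefs = {
    "carpool": {"intercept": -2.0, "cost.car": 0.636, "cost.carpool": -0.5},
    "bus": {"intercept": -1.0, "cost.car": 0.3, "cost.carpool": 0.1},
}


def log_odds_vs_baseline(x, beta):
    """Linear predictor: log( P(mode | x) / P(baseline | x) )."""
    return beta["intercept"] + sum(beta[name] * value for name, value in x.items())


def choice_probabilities(x, coefs, baseline="car"):
    """Turn the log odds into probabilities; the baseline has log odds 0."""
    eta = {baseline: 0.0}
    for mode, beta in coefs.items():
        eta[mode] = log_odds_vs_baseline(x, beta)
    denom = sum(math.exp(v) for v in eta.values())
    return {mode: math.exp(v) / denom for mode, v in eta.items()}


x = {"cost.car": 4.0, "cost.carpool": 2.0}        # hypothetical inputs
x_plus = {"cost.car": 5.0, "cost.carpool": 2.0}   # cost.car raised by 1 unit

p = choice_probabilities(x, coefs)
p_plus = choice_probabilities(x_plus, coefs)

# Change in the log odds of carpool vs car for a 1-unit rise in cost.car.
shift = math.log(p_plus["carpool"] / p_plus["car"]) - math.log(p["carpool"] / p["car"])
print(f"shift in log odds: {shift:.3f}")
```

Whatever placeholder values are used for the other coefficients, the shift in the carpool-to-car log odds equals the cost.car coefficient of the carpool equation.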
Probability
• Let p = p(x | A) be the probability of any event (say attrition) under condition A (say
gender = female)
Odds
• Then p(x | A) ÷ (1 - p(x | A)) is called the odds associated with the event
Odds Ratio
• If there are two conditions A (gender = female) & B (gender = male) then the ratio
[p(x | A) ÷ (1 - p(x | A))] / [p(x | B) ÷ (1 - p(x | B))] is called the odds ratio of A with respect to B
Relative Risk
• p(x | A) ÷ p(x | B) is called the relative risk
https://en.wikipedia.org/wiki/Relative_risk
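The three definitions above can be written as a short Python sketch. The probabilities 0.2 (condition A) and 0.1 (condition B) are hypothetical values chosen only to show the arithmetic.

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """Odds ratio of condition A with respect to condition B."""
    return odds(p_a) / odds(p_b)


def relative_risk(p_a, p_b):
    """Relative risk of condition A with respect to condition B."""
    return p_a / p_b


# Hypothetical attrition probabilities, not taken from any data set.
p_female = 0.2   # condition A
p_male = 0.1     # condition B

print(odds(p_female))                   # odds under A
print(odds_ratio(p_female, p_male))     # odds ratio of A with respect to B
print(relative_risk(p_female, p_male))  # relative risk of A with respect to B
```

The odds ratio and the relative risk generally differ; they are close only when the event is rare under both conditions.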
Odds Ratio
• Odds ratio is computed from the coefficients in the linear model equation by simply
exponentiating
• Exponentiated regression coefficients are odds ratios for a unit change in a predictor
variable
• The odds ratio for a unit increase in cost.car is 1.88 for choosing carpool vs car
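A minimal Python sketch of the exponentiation, using the two coefficients quoted in the slides (0.636 for cost.car and the illustrative 0.22 for Gender). The slide's rounding may differ slightly from the value computed from the rounded coefficient.

```python
import math

coef_cost_car = 0.636   # log odds of carpool vs car per unit of cost.car
coef_gender = 0.22      # illustrative log odds for males relative to females

# Exponentiating a log-odds coefficient gives the odds ratio.
or_cost_car = math.exp(coef_cost_car)
or_gender = math.exp(coef_gender)

# For a change of k units the odds ratio is exp(k * coefficient).
or_cost_car_two_units = math.exp(2 * coef_cost_car)

print(f"odds ratio, +1 unit cost.car: {or_cost_car:.3f}")
print(f"odds ratio, male vs female:   {or_gender:.3f}")
print(f"odds ratio, +2 units cost.car: {or_cost_car_two_units:.3f}")
```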
Goodness of fit
Linear                     GLM
Analysis of Variance       Analysis of Deviance
Residual Sum of Squares    Residual Deviance
OLS                        Maximum Likelihood
• Residual Deviance is -2 log L
• Adding more parameters to the model will reduce Residual Deviance even if it is not
going to be useful for prediction
• In order to control this, a penalty of “2 * number of parameters” is added to the
Residual Deviance
• This penalized value of -2 log L is called the AIC (Akaike Information Criterion)
• AIC = -2 log L + 2 * number of parameters
Note: “Multilogit Model with Interaction”
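The penalty can be illustrated with a short Python sketch. The parameter count of 27 is the one reported in the model output; both -2 log L values and the smaller 15-parameter model are hypothetical numbers used only to show how the comparison works.

```python
def aic(neg2_log_lik, n_params):
    """AIC = -2 log L + 2 * number of parameters."""
    return neg2_log_lik + 2 * n_params


# Hypothetical -2 log L values; only the 27-parameter count is from the slides.
full_model = {"neg2_log_lik": 900.0, "n_params": 27}
small_model = {"neg2_log_lik": 910.0, "n_params": 15}

aic_full = aic(**full_model)
aic_small = aic(**small_model)

print(f"full model:  AIC = {aic_full:.1f}")
print(f"small model: AIC = {aic_small:.1f}")
```

In this made-up comparison the larger model has the lower residual deviance, yet the smaller model has the lower AIC, because the extra parameters do not reduce -2 log L by enough to pay for their penalty.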