ExcelR now offers a data science course in Chennai. Data science is a global, comprehensive field. ExcelR is considered one of the best data science institutes, with 400+ placements in multinational companies. The course covers the total life cycle.

Published by sanghavipatlori45, 2019-05-03 06:45:37

data scientist course in chennai

Keywords: data science course, data science training, data science course chennai, certification

Advanced Regression

AGENDA

•  Multinomial Regression
•  Poisson Regression
•  Negative Binomial
•  Zero Inflated

© 2013 ExcelR Solutions. All Rights Reserved

Multinomial Regression

•  Logistic regression (Binomial distribution) is used when the output has ‘2’ categories

•  Multinomial regression (classification model) is used when the output has > ‘2’ categories

•  Extension to logistic regression

•  No natural ordering of categories

Mode of transport   Car    Carpool   Bus    Rail   All modes
Count               218    32        81     122    453
Probability         0.48   0.07      0.18   0.27   1

•  Response variable has > ‘2’ categories & hence we apply multilogit

•  Understand the impact of cost & time on the various modes of transport
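
The probability row for the transport modes is simply each count divided by the total of 453. A minimal Python sketch of that arithmetic, using only the counts listed on this slide (the variable names are illustrative):

```python
# Counts per mode of transport, as listed on the slide.
counts = {"car": 218, "carpool": 32, "bus": 81, "rail": 122}

total = sum(counts.values())  # all modes together

# Empirical probability of each mode = count / total.
probabilities = {mode: n / total for mode, n in counts.items()}

for mode, p in probabilities.items():
    print(f"{mode:8s} {p:.2f}")
```

Because the response takes four unordered values rather than two, these four probabilities are what a multilogit model tries to explain from cost and time.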

Multinomial Regression

•  The following applies whether it is ‘Y’ (response) or ‘X’ (predictor) that is categorical with ‘s’ categories
✓  The level lowest in numerical / lexicographical value is chosen as the baseline / reference
✓  The level missing from the output is the baseline level
✓  We can set a baseline level of our choice with the ‘relevel’ function in R
✓  The model formulates the relationship between the transformed (logit) Y & numerical X linearly
✓  Modeling quantitative variables linearly might not always be correct
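
The baseline rule can be illustrated with a short Python sketch. The slides refer to R's ‘relevel’; the `relevel` helper here is a hypothetical stand-in written for illustration, not a port of the R function.

```python
def default_baseline(levels):
    """Return the lexicographically lowest level, used as the default reference."""
    return sorted(levels)[0]


def relevel(levels, ref):
    """Hypothetical helper: move `ref` to the front so it becomes the baseline."""
    if ref not in levels:
        raise ValueError(f"unknown level: {ref!r}")
    return [ref] + [lvl for lvl in levels if lvl != ref]


# Levels of the transport-mode variable, sorted as a factor would be by default.
levels = sorted({"car", "carpool", "bus", "rail"})

print(default_baseline(levels))  # bus
print(relevel(levels, "car"))    # ['car', 'bus', 'carpool', 'rail']
```

With the default ordering ‘bus’ would be the reference, so a model that reports ‘car’ as the baseline implies the levels were reordered first.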

Multinomial Regression - Output

Iteration History:
•  An iterative procedure is used to compute the maximum likelihood estimates
•  The number of iterations & the convergence status are provided
•  -2logL = 2 * negative log likelihood
•  The difference in -2logL between nested models approximately follows a χ² distribution, which is used for hypothesis testing of goodness of fit
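
To make -2logL concrete, the sketch below computes it for the simplest possible model, one with intercepts only, using the mode counts from the earlier slide. It then forms a likelihood-ratio statistic against a larger model. The larger model's -2logL is not given in the slides, so the value used here is a labeled placeholder. The degrees of freedom assume 3 parameters for the intercept-only model (one per non-baseline mode) against the 27 parameters reported on this slide.

```python
import math

# Mode counts from the earlier slide.
counts = {"car": 218, "carpool": 32, "bus": 81, "rail": 122}


def minus_two_log_l(counts):
    """-2 log L of an intercept-only multinomial model.

    Its maximum likelihood fit assigns every observation the observed
    proportions, so log L = sum over categories of count * log(count / n).
    """
    n = sum(counts.values())
    log_l = sum(c * math.log(c / n) for c in counts.values())
    return -2.0 * log_l


def lr_statistic(m2ll_reduced, m2ll_full):
    """Likelihood-ratio statistic: the drop in -2logL from reduced to full model."""
    return m2ll_reduced - m2ll_full


null_m2ll = minus_two_log_l(counts)

# PLACEHOLDER: hypothetical -2logL for the 27-parameter model (not from the slides).
full_m2ll = 1000.0

stat = lr_statistic(null_m2ll, full_m2ll)
df = 27 - 3  # difference in number of parameters (assumption stated above)

print(f"-2logL (intercept only) = {null_m2ll:.1f}")
print(f"LR statistic = {stat:.1f} on {df} degrees of freedom")
```

The statistic would then be compared with a χ² distribution on `df` degrees of freedom.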

# parameters = 27

Multinomial Regression - Output

•  ‘car’ has been chosen as baseline
•  x = vector representing the values of all inputs

Log(P(choice = carpool | x) / P(choice = car | x)) = β20 + β21 * cost.car + β22 * cost.carpool + …

This equation gives the log of the ratio of the probability of carpool to the probability of car

•  The regression coefficient 0.636 indicates that for a ‘1’ unit increase in ‘cost.car’, the log odds of ‘carpool’ to ‘car’ increase by 0.636

•  The intercept value does not mean anything in this context

•  If we also have a categorical X, say Gender (female = 0, male = 1), then its regression coefficient (say 0.22) indicates that, relative to females, being male increases the log odds of ‘carpool’ to ‘car’ by 0.22
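
Each non-baseline mode has its own log-odds equation against ‘car’, and together these determine the probabilities of all modes. The Python sketch below shows that conversion. The log-odds values are hypothetical; only the 0.636 coefficient for cost.car comes from the slide.

```python
import math


def multinomial_probabilities(log_odds, baseline="car"):
    """Convert log odds of each non-baseline category (vs the baseline)
    into probabilities for every category, baseline included."""
    exp_eta = {k: math.exp(v) for k, v in log_odds.items()}
    denom = 1.0 + sum(exp_eta.values())  # the baseline contributes exp(0) = 1
    probs = {baseline: 1.0 / denom}
    probs.update({k: v / denom for k, v in exp_eta.items()})
    return probs


# HYPOTHETICAL log odds versus 'car' for one traveller.
eta = {"carpool": -1.9, "bus": -1.0, "rail": -0.6}
probs = multinomial_probabilities(eta)

# A 1-unit increase in cost.car adds 0.636 to the carpool-vs-car log odds.
eta_after = dict(eta, carpool=eta["carpool"] + 0.636)
probs_after = multinomial_probabilities(eta_after)

print({k: round(v, 3) for k, v in probs.items()})
print({k: round(v, 3) for k, v in probs_after.items()})
```

Taking log(P(carpool) / P(car)) of the result recovers the carpool log odds, which is exactly the quantity on the left-hand side of the equation.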

Probability

•  Let p = p(x | A) be the probability of any event (say attrition) under condition A (say gender = female)

Odds

•  Then p(x | A) ÷ (1 - p(x | A)) is called the odds associated with the event

Odds Ratio

•  If there are two conditions A (gender = female) & B (gender = male), then the ratio
[p(x | A) ÷ (1 - p(x | A))] / [p(x | B) ÷ (1 - p(x | B))] is called the odds ratio of A with respect to B

Relative Risk

•  p(x | A) ÷ p(x | B) is called the relative risk

https://en.wikipedia.org/wiki/Relative_risk
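
The three definitions translate directly into code. This is a minimal Python sketch; the two attrition probabilities are made-up numbers for illustration and do not come from the slides.

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """Odds ratio of condition A with respect to condition B."""
    return odds(p_a) / odds(p_b)


def relative_risk(p_a, p_b):
    """Relative risk of condition A with respect to condition B."""
    return p_a / p_b


# HYPOTHETICAL attrition probabilities under A (female) and B (male).
p_female = 0.20
p_male = 0.10

print(odds(p_female))                     # odds under A
print(odds_ratio(p_female, p_male))       # odds ratio of A with respect to B
print(relative_risk(p_female, p_male))    # relative risk of A with respect to B
```

With these numbers the odds ratio (about 2.25) differs from the relative risk (about 2.0). The two only come close when the event is rare under both conditions.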

Odds Ratio

•  The odds ratio is computed from the coefficients in the linear model equation by simply exponentiating them

•  Exponentiated regression coefficients are odds ratios for a unit change in a predictor variable

•  The odds ratio for a unit increase in cost.car is exp(0.636) ≈ 1.89 for choosing carpool vs car
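
The exponentiation step can be checked in a few lines of Python. Only the coefficient 0.636 comes from the slides; the function name is illustrative.

```python
import math

beta_cost_car = 0.636  # carpool-vs-car coefficient for cost.car, from the slides


def odds_ratio_for_change(beta, units=1.0):
    """Multiplicative change in the odds for a `units` change in the predictor."""
    return math.exp(beta * units)


one_unit = odds_ratio_for_change(beta_cost_car)
two_units = odds_ratio_for_change(beta_cost_car, units=2)

print(round(one_unit, 3))  # 1.889
print(f"odds change per unit: {(one_unit - 1) * 100:.1f}%")
print(round(two_units, 3))
```

Odds ratios multiply across units, so a two-unit increase corresponds to the one-unit odds ratio squared, not doubled.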

Goodness of fit

Linear                     GLM
Analysis of Variance       Analysis of Deviance
Residual Sum of Squares    Residual Deviance
OLS                        Maximum Likelihood

•  Residual Deviance is -2 log L
•  Adding more parameters to the model will reduce the Residual Deviance even if they are not going to be useful for prediction
•  In order to control this, a penalty of “2 * number of parameters” is added to the Residual Deviance
•  This penalized value of -2 log L is called the AIC (Akaike Information Criterion)
•  AIC = -2 log L + 2 * number of parameters
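
The effect of the penalty is easiest to see by comparing two models. In the Python sketch below both -2 log L values and the smaller model's parameter count are hypothetical, chosen only to illustrate the trade-off.

```python
def aic(minus_two_log_l, n_params):
    """AIC = -2 log L + 2 * number of parameters."""
    return minus_two_log_l + 2 * n_params


# HYPOTHETICAL fits: the larger model has the lower residual deviance.
small_model = aic(minus_two_log_l=1010.0, n_params=15)
large_model = aic(minus_two_log_l=1000.0, n_params=27)

print(small_model)  # 1040.0
print(large_model)  # 1054.0
print("prefer small model" if small_model < large_model else "prefer large model")
```

Here the extra 12 parameters reduce the deviance by only 10 while costing a penalty of 24, so the smaller model has the lower AIC and would be preferred.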

Note: “Multilogit Model with Interaction”
