Interpreting Logistic Regression Output from R

I, Jyotika Varmani, tutor students of Psychology at all levels. I reside in Mumbai and tutor students online. You can contact me personally on my e-mail id jyotikapsychology@gmail.com or call/message me on 9892507784 for enquiries.


---

R is now widely taught in statistics courses, and Psychology curricula across universities are quickly catching up with this trend. Though very versatile, R is not as user-friendly as some other commonly available software. Therefore, students ought to take out the time to understand the input and output mechanisms of this software. In this post, I share with you the interpretation of a sample output generated by the software - that of logistic regression, for which R is commonly employed.


-----


Call:
glm(formula = pass_fail ~ memory, family = binomial())

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.68396  -0.38195  -0.16492  -0.05671   2.48238

Coefficients:
              Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  -1.5375297   3.2927943   -0.467    0.6405
memory        0.0027742   0.0041153    0.674    0.5002
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1195.8  on 11731  degrees of freedom
Residual deviance: 8765.1  on 11724  degrees of freedom
AIC: 8781.1

Number of Fisher Scoring iterations: 5


-----


Let's understand this output, one line at a time:


Right at the top, we see the 'Call' statement. Here, R is simply telling us what it has been called on to do. This is very useful to a reader who is directly reading the output and does not know what the input has been.


On the next line, we see the letters 'glm', which stand for generalized linear model. Logistic regression is a type of generalized linear model in which the dependent (outcome) variable is measured on a dichotomous scale, while the independent (predictor) variables may be measured on dichotomous or continuous scales. In the given example, R has been instructed to regress the outcome variable 'pass_fail' on the predictor variable 'memory'. (It appears that participants in the study from which the data for the present example were obtained were first measured on memory, on a dichotomous or continuous scale, and then given some test on which they either passed or failed.) 'formula' in the line specifies which variable is to be regressed on which - the variable preceding the tilde (~) is the outcome variable and the one following it is the predictor variable. Finally, since R uses the common 'glm' command for all generalized linear models, we need to instruct it specifically to carry out logistic regression in a given case. This is done with the 'family' argument of the 'glm' command. In the present case, the specification 'family = binomial()' tells R that the outcome variable comes from a binomial distribution, that is, it can take on only two values, '0' or '1'.
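To see where such a call comes from, here is a minimal sketch of fitting the same kind of model. The data frame 'dat' and its values are invented purely for illustration; they are not the data behind the output shown above.

```r
# Hypothetical data, assumed only for illustration
set.seed(1)
n <- 200
memory <- rnorm(n, mean = 50, sd = 10)                  # continuous predictor
pass_fail <- rbinom(n, 1, plogis(-4 + 0.08 * memory))   # dichotomous outcome

dat <- data.frame(pass_fail, memory)

# family = binomial() requests logistic regression (logit link by default)
model <- glm(pass_fail ~ memory, family = binomial(), data = dat)
summary(model)   # produces output of the kind discussed in this post
```

Running `summary()` on the fitted object is what prints the Call, Deviance Residuals, Coefficients, deviance and AIC sections walked through below.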


On the next line, we see 'Deviance Residuals', which is where R starts delivering its calculated output to us. Each deviance residual measures how far an observation lies from the value the model predicts for it, so together the residuals indicate how well the model fits the data. R presents us with the median of the distribution of residuals, as well as its range and inter-quartile range. A reasonably well-fitting model should produce residuals that are roughly symmetrical about a median close to 0. In the present example, the median (-0.16492) is indeed close to 0. However, the distances from the median to the first quartile on the left and to the third quartile on the right are unequal, as are the distances from the median to the minimum and to the maximum scores of the distribution. The distribution is therefore not symmetrical. Whether this asymmetry is acceptable is a matter of further investigation and cannot be concluded from the present output. What the output does suggest, however, is that further examination of the residuals is warranted for this model.
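The five summary figures in that section can be reproduced directly from a fitted model. The toy model below is assumed only for illustration.

```r
# Toy model, assumed for illustration
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(x))
m <- glm(y ~ x, family = binomial())

# Min, 1Q, Median, 3Q and Max of the deviance residuals,
# matching the 'Deviance Residuals' line of summary(m)
quantile(residuals(m, type = "deviance"), c(0, 0.25, 0.5, 0.75, 1))
```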


On the next line begins the output concerning coefficients. Beneath this, we see the estimate of the intercept (-1.5375297). It is called an estimate because, as we know, this statistic estimates the intercept for the population, i.e. the parameter. The intercept tells us the value of the dependent variable when the value of the independent variable is 0. In this case, the given estimate is the log odds of participants passing the test when they score a zero on memory. The standard error for this estimate (3.2927943) is large, which is reflected in its non-significant z value. These log odds can be converted to an odds ratio, manually or using R, for clearer interpretation. That would tell us how much more or less likely participants are to pass the test rather than fail it when they score a zero on the memory measure. On the line beneath this, we find the coefficient for memory (0.0027742): the change in the log odds of passing for a rise of one unit in memory scores, assuming that the predictor variable is continuous. As seen, the figure is minuscule, with a tiny standard error (0.0041153), and the resulting z value is non-significant. If the predictor were dichotomous instead, this coefficient would give the difference in the log odds of passing between the two memory groups, which again can be converted into an odds ratio to determine the chances of passing over failing for a person with a memory score of 1; in this case it too is non-significant. Though the significance of the z values is clear from the probabilities supplied in the last column, R also provides significance codes separately on the next line. Since none of our coefficients are significant, we see no asterisks placed next to their probability values as suggested on this line.
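The conversion from log odds to odds ratios mentioned above is a single exponentiation. Here it is applied to the two coefficients copied from the output; only the interpretation, not the data, is assumed.

```r
# Coefficients copied from the output above
b0 <- -1.5375297   # intercept: log odds of passing at memory = 0
b1 <-  0.0027742   # slope: change in log odds per unit of memory

exp(b0)   # odds of passing vs failing at memory = 0
exp(b1)   # multiplicative change in the odds per one-unit rise in memory
```

An odds ratio of exactly 1 would mean no change in the odds; here exp(b1) is barely above 1, in line with the non-significant coefficient.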


On the next line, we see a comment on the dispersion parameter assumed for the binomial family. For binomial and Poisson distributions the variance is completely determined by the mean, so R fixes the dispersion parameter at 1 rather than estimating it from the data.


On the next two lines, R gives us the null deviance and the residual deviance, each with its degrees of freedom. We need to compare these to derive meaning from them, and so we discuss them together. The null deviance shows us how much difference there is between the predicted values of passing and failing and their actual values when the intercept is taken alone. Similarly, the residual deviance shows us how much difference there is between the predicted and actual values once the predictor variable - memory scores - is added to the intercept. A substantial drop from the null deviance to the residual deviance would indicate that the predictor improves the model's predictions. Taken together with the non-significance of the predictor's coefficient, the output suggests that memory is ineffective in predicting passing or failing the exam. The degrees of freedom presented in each case equal the number of observations minus the number of parameters estimated in the intercept-only and intercept-plus-predictor models respectively, and as we know, these degrees reduce as variables are added to the base, intercept-only model.
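The comparison between the two deviances can be made formal with a likelihood-ratio (chi-square) test on the drop in deviance. The toy model below is assumed only for illustration.

```r
# Toy model, assumed for illustration
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(0.5 * x))
m <- glm(y ~ x, family = binomial())

m$null.deviance - m$deviance   # drop in deviance due to the predictor
anova(m, test = "Chisq")       # likelihood-ratio test of that drop
```

For a fitted model the residual deviance can never exceed the null deviance; the question is only whether the drop is large enough to be significant.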


On the next line we see the AIC, or Akaike Information Criterion (8781.1). This figure is not useful in and of itself. It is useful for comparing a given model with other candidate models, with lower values indicating a better trade-off between fit and complexity. Since there is only one model presented in the present example, this information is not of use to us here.
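Had a second model been available, the comparison would look like this. Both models and their data are assumed only for illustration.

```r
# Toy models, assumed for illustration
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100)
y <- rbinom(100, 1, plogis(x1))

m1 <- glm(y ~ x1,      family = binomial())
m2 <- glm(y ~ x1 + x2, family = binomial())

AIC(m1, m2)   # the model with the lower AIC is preferred
```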


Finally, we see the number of Fisher Scoring iterations. This is the number of iterations, or repeated rounds of calculation, that the maximum-likelihood algorithm needed to fit the model. R iterates only until the estimates converge, so a small number such as 5 indicates quick convergence, whereas a very large number can signal problems with the model or the data.
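This count is stored on the fitted object itself and can be inspected directly. The toy model below is assumed for illustration.

```r
# Toy model, assumed for illustration
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(x))
m <- glm(y ~ x, family = binomial())

m$iter   # number of Fisher Scoring iterations until convergence
```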


-----



