2 years ago

#48745


erikuser23

Logistic regression using statsmodels formula

I am new to Python and have a simple question about using statsmodels. Any help would be greatly appreciated! I want to look at the association between the variable Y and disease_A. Y is a dummy variable (0, 1), age is continuous, race is categorical with 3 levels (1 = white, 2 = black, 3 = other), and disease_A is a dummy variable (0, 1). My questions are:

  1. For a logistic regression model, do I have to convert the dummy variables into the "C(variable, Treatment)" format, as I did for the categorical variable (race)?
  2. Does the code below look correct for a simple logistic regression model in statsmodels if I want to control for age, race, and sex while looking at the association between Y and the presence of disease_A?
  3. Is it necessary to add a constant to the regression if all of my categorical variables have a reference level?
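One way to probe questions 1 and 3 empirically is to fit the model twice on made-up data: once with the 0/1 dummies entered as plain numbers and once with them wrapped in C(). This is a minimal sketch under stated assumptions: the data are random and purely hypothetical, and sex is assumed to be coded 0/1 because its coding isn't given above. With 0/1 coding and 0 as the reference level, the two fits should give the same coefficient for disease_A, and the formula interface should list an `Intercept` term without any explicit constant being added.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical random data with the coding described in the question (sex ASSUMED to be 0/1)
rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "age": rng.uniform(20, 80, size=n),       # continuous
    "sex": rng.integers(0, 2, size=n),        # assumed 0/1 dummy
    "race": rng.integers(1, 4, size=n),       # 1=white, 2=black, 3=other
    "disease_A": rng.integers(0, 2, size=n),  # 0/1 dummy
})
# Made-up logistic model used only to simulate a 0/1 outcome
log_odds = -2 + 0.02 * df["age"] + 0.7 * df["disease_A"]
df["Y"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# 0/1 dummies entered as plain numeric columns
fit_numeric = smf.logit(
    "Y ~ age + sex + C(race, Treatment) + disease_A", data=df
).fit(disp=0)

# Same model with the dummies wrapped in C(); level 0 is the reference,
# so the disease term is named C(disease_A)[T.1]
fit_categorical = smf.logit(
    "Y ~ age + C(sex) + C(race, Treatment) + C(disease_A)", data=df
).fit(disp=0)

# 'Intercept' appears in the parameter names: the formula adds a constant by itself
print(fit_numeric.params.index.tolist())

# The disease_A coefficient is the same under both codings
print(fit_numeric.params["disease_A"], fit_categorical.params["C(disease_A)[T.1]"])
```

If a model without the constant were ever wanted, the formula syntax removes it with `- 1` (for example `"Y ~ age + disease_A - 1"`); the reference levels of the categorical terms do not take the intercept's place automatically.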

Code:

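import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# df is assumed to be an existing pandas DataFrame with columns Y, age, sex, race and disease_A
# C(race, Treatment) treats race as categorical, with level 1 (white) as the reference
logit_model = smf.logit("Y ~ age + sex + C(race, Treatment) + disease_A", data=df)
result = logit_model.fit()
print(result.summary())

# Coefficients are on the log-odds scale; exponentiate them to get odds ratios
print(np.exp(result.params))

# conf_int() defaults to alpha=0.05, so its two columns are the 2.5% and 97.5% bounds
params = result.params
conf = result.conf_int()
conf['Odds Ratio'] = params
conf.columns = ['2.5%', '97.5%', 'Odds Ratio']
print(np.exp(conf))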
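For a fully self-contained version that runs without the real data, the sketch below fits the same formula on random, purely hypothetical data (sex is assumed to be coded 0/1, and the data-generating coefficients are made up) and builds the odds-ratio table in one step.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical random data standing in for the real df (sex ASSUMED to be 0/1)
rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({
    "age": rng.uniform(20, 80, size=n),
    "sex": rng.integers(0, 2, size=n),
    "race": rng.integers(1, 4, size=n),       # 1=white, 2=black, 3=other
    "disease_A": rng.integers(0, 2, size=n),
})
log_odds = -2 + 0.02 * df["age"] + 0.7 * df["disease_A"]
df["Y"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

result = smf.logit("Y ~ age + sex + C(race, Treatment) + disease_A", data=df).fit(disp=0)

# conf_int(alpha=0.05) returns a DataFrame with columns 0 (lower) and 1 (upper),
# i.e. the 2.5% and 97.5% bounds on the log-odds scale
ci = result.conf_int(alpha=0.05)
odds_ratios = pd.DataFrame({
    "Odds Ratio": result.params,
    "2.5%": ci[0],
    "97.5%": ci[1],
})

# Exponentiate every column to move from log-odds to odds
odds_ratios = np.exp(odds_ratios)
print(odds_ratios.round(3))
```

Since `conf_int()` uses `alpha=0.05` by default, the bounds form a 95% interval whose limits are the 2.5% and 97.5% points, which is why they are labeled that way rather than 5% and 95%. Exponentiating is monotonic, so the interval still brackets the odds ratio after the transformation.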

python

logistic-regression

statsmodels
