2 years ago
#48745
erikuser23
Logistic regression using a statsmodels formula
I am new to Python and have a simple question about statsmodels. Any help would be greatly appreciated! I have an example where I want to look at the association between the variable Y and disease_A. Y is a dummy variable (0, 1), age is continuous, race is categorical with 3 levels (1 = white, 2 = black, 3 = other), and disease_A is a dummy variable (0, 1). My questions are:
- For a logistic regression model, do I have to wrap the dummy variables as "C(variable, Treatment)", as I did for the categorical variable?
- Does the code below look correct for a simple logistic regression model in statsmodels if I want to control for age, race, and sex while looking at the association between Y and the presence of disease_A?
- Is it necessary to add a constant to the regression if all of my categorical variables have a reference value?
Code:
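import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# df is assumed to already hold the columns Y, age, sex, race, and disease_A
logit_model = smf.logit("Y ~ age + sex + C(race, Treatment) + disease_A", data=df)
result = logit_model.fit()
print(result.summary())

# Exponentiate the coefficients to get odds ratios
print(np.exp(result.params))

# Odds ratios with 95% confidence intervals;
# conf_int() defaults to alpha=0.05, so the bounds are the 2.5% and 97.5% limits
params = result.params
conf = result.conf_int()
conf['Odds Ratio'] = params
conf.columns = ['2.5%', '97.5%', 'Odds Ratio']
print(np.exp(conf))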
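One way to check the first and third questions empirically is to fit the same formula on synthetic data. The sketch below is only an illustration under stated assumptions: the data frame is randomly generated (the column names mirror the ones above, but the values and the coefficients used to simulate Y are made up), and it relies on the formula interface adding an intercept term on its own. It compares a fit where the 0/1 dummies are entered as-is with one where they are wrapped in C(), and then builds the odds-ratio table.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: column names follow the question, values are made up.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.normal(50, 10, n),
    "sex": rng.integers(0, 2, n),        # 0/1 dummy
    "race": rng.integers(1, 4, n),       # levels 1, 2, 3
    "disease_A": rng.integers(0, 2, n),  # 0/1 dummy
})

# Assumed "true" model, used only to simulate a binary outcome Y.
lin = -3.0 + 0.04 * df["age"] + 0.5 * df["disease_A"] + 0.3 * (df["race"] == 2)
df["Y"] = rng.binomial(1, 1 / (1 + np.exp(-lin)))

# Fit 1: 0/1 dummies entered as plain numeric columns.
plain = smf.logit(
    "Y ~ age + sex + C(race, Treatment) + disease_A", data=df
).fit(disp=0)

# Fit 2: the same dummies wrapped in C().
wrapped = smf.logit(
    "Y ~ age + C(sex) + C(race, Treatment) + C(disease_A)", data=df
).fit(disp=0)

# The formula interface includes an "Intercept" term without being asked.
print(plain.params.index.tolist())

# Both codings describe the same model, so the log-likelihoods should agree.
print(np.isclose(plain.llf, wrapped.llf))

# Odds ratios with 95% confidence limits (conf_int() defaults to alpha=0.05).
ci = plain.conf_int()
ci.columns = ["2.5%", "97.5%"]
ci["Odds Ratio"] = plain.params
print(np.exp(ci))
```

If the two log-likelihoods match, wrapping a 0/1 dummy in C() only changes the coefficient label (for example "C(sex)[T.1]" instead of "sex"), not the fit. A variable with more than two numeric codes, such as race here, does need C() so that it is not treated as continuous.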
python
logistic-regression
statsmodels