2 years ago
#70689
RasM10
Effect of SMOTE on Random Forest and Logistic Regression on a Cell2Cell churn dataset
I am analyzing the effect of SMOTE on the performance of Random Forest and Logistic Regression. I have the following data from Kaggle. The data consists of around 50,000 observations and 58 variables. I trained four models on it:
- Random Forest
- Random Forest with SMOTE
- Logistic Regression
- Logistic Regression with SMOTE
I got the following results:
$G\text{-}Mean = \sqrt{Sensitivity \times Specificity}$
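To make the metric concrete, here is a small sketch of how the G-Mean can be computed from true and predicted labels using only the standard library. The function name `g_mean`, the `positive` parameter, and the example labels are made up for illustration and are not taken from the results in this question.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Sensitivity: share of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Illustrative labels only: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
score = g_mean(y_true, y_pred)  # sensitivity 3/4, specificity 3/6

# A model that always predicts the negative class has sensitivity 0.
majority_only = g_mean([1, 1, 0, 0], [0, 0, 0, 0])
```

Because the two rates are multiplied, a model that ignores one class entirely scores 0 regardless of its accuracy on the other class.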
Question: Why does Logistic Regression improve a lot with SMOTE, while Random Forest does not improve nearly as much?
My thought was that it may be due to the high dimensionality, but I would expect Random Forest to do better than Logistic Regression.
python
logistic-regression
random-forest
smote