2 years ago

#70689


RasM10

Effect of SMOTE on Random Forest and Logistic Regression on a Cell2Cell churn dataset

I am analyzing the effect of SMOTE on the performance of Random Forest and Logistic Regression. The data comes from Kaggle and consists of around 50,000 observations and 58 variables. I trained four models on it:

  1. Random Forest
  2. Random Forest with SMOTE
  3. Logistic Regression
  4. Logistic Regression with SMOTE

I got the following results:

[Image: results for the four models]

$$\text{G-Mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}}$$
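As a small worked example, the G-Mean can be computed directly from confusion-matrix counts. The function name `g_mean_from_counts` and the counts below are made up for illustration and are not taken from the results in this post.

```python
import math


def g_mean_from_counts(tp, fn, tn, fp):
    """G-Mean from confusion-matrix counts (hypothetical helper).

    sensitivity = TP / (TP + FN)   (recall on the positive class)
    specificity = TN / (TN + FP)   (recall on the negative class)
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Illustrative counts: sensitivity = 0.3, specificity = 0.9.
moderate = g_mean_from_counts(tp=30, fn=70, tn=900, fp=100)

# A model that never predicts the positive class has sensitivity 0,
# so its G-Mean is 0 regardless of how high its accuracy is.
majority_only = g_mean_from_counts(tp=0, fn=100, tn=1000, fp=0)

print(moderate, majority_only)
```

Because the two rates are multiplied, a model that ignores the minority class scores poorly even when its overall accuracy is high.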

Question: Why does Logistic Regression improve so much with SMOTE, while Random Forest improves only slightly?

My thought was that high dimensionality may be the cause, but I would still expect the Random Forest to do better than the Logistic Regression.

python

logistic-regression

random-forest

smote

0 Answers
