2 years ago

#34046

S44

Impute values with MissForest using missingpy for categorical/object variables

I have a truncated df with the following data types:

 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       8061 non-null   object
 1   B       4906 non-null   object
 2   C       1092 non-null   object
 3   D       11008 non-null  float64

I want to use the MissForest imputation algorithm on my data to fill in the nulls. As I understand it, this method works with both numeric and categorical data. However, when I run the following snippet:

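import sys

import sklearn.neighbors._base
# Alias the private module under its old name so that missingpy's import resolves
sys.modules['sklearn.neighbors.base'] = sklearn.neighbors._base

from missingpy import MissForest

imputer = MissForest()

# Drop column 'C' before imputing
impute_df = df.drop('C', axis=1)
imputed_df = imputer.fit_transform(impute_df)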

I'm getting the following error:

'could not convert string to float'

Here is a truncated dictionary that builds a small DataFrame of object columns and reproduces the same error:

import pandas as pd

my_dict = {'Age_Group__c': {0: None, 1: '65+', 2: '65+', 3: '45-54', 4: '35-44'},
 'Difficulty': {0: None,
  1: None,
  2: 'Difficult',
  3: None,
  4: 'Easy'},
 'Consulted_Doctor__c': {0: None, 1: None, 2: None, 3: None, 4: None},
 'Income__c': {0: None, 1: '15k-35k', 2: None, 3: '75k-100k', 4: '75k-100k'},
 'Relationship_Status__c': {0: None,
  1: 'Single/Dating',
  2: 'In a relationship/Married',
  3: None,
  4: None},
 'Email_Domain__c': {0: 'yahoo.com',
  1: 'gmail.com',
  2: 'gmail.com',
  3: 'ymail.com',
  4: 'hotmail.com'},
 'Gender__c': {0: 'Male', 1: 'Female', 2: 'Female', 3: 'Male', 4: 'Female'},
 'Marital_Status__c': {0: None,
  1: 'Married',
  2: None,
  3: 'Single',
  4: 'Married'}}

df = pd.DataFrame.from_dict(my_dict)
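
The 'could not convert string to float' message suggests the imputer is receiving raw strings where it expects numbers. One possible workaround, sketched here with pandas only, is to encode each object column numerically while keeping the nulls as NaN. The helper name encode_objects and the three-column frame are hypothetical and only mirror the dtypes in the question. Integer category codes are just one encoding choice; check the missingpy documentation for the encoding it expects.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question: object columns with nulls plus a float column.
df = pd.DataFrame({
    "Age_Group__c": [None, "65+", "65+", "45-54", "35-44"],
    "Gender__c": ["Male", "Female", "Female", "Male", "Female"],
    "D": [1.0, np.nan, 3.0, 4.0, 5.0],
})


def encode_objects(frame):
    """Replace each non-numeric column with float category codes, keeping nulls as NaN.

    Returns the encoded frame, a per-column {code: label} mapping for decoding,
    and the positional indices of the encoded (categorical) columns.
    """
    encoded = frame.copy()
    mappings = {}
    for col in frame.select_dtypes(exclude="number").columns:
        cat = frame[col].astype("category")
        mappings[col] = dict(enumerate(cat.cat.categories))
        # cat.codes marks missing values as -1; convert those back to NaN.
        encoded[col] = cat.cat.codes.astype("float64").replace(-1.0, np.nan)
    cat_idx = [frame.columns.get_loc(c) for c in mappings]
    return encoded, mappings, cat_idx


encoded_df, mappings, cat_idx = encode_objects(df)

# Assumption (not verified here): MissForest takes the categorical column
# indices through a cat_vars argument, e.g.
#   imputed = MissForest().fit_transform(encoded_df.to_numpy(), cat_vars=cat_idx)
```

After imputation, the codes in the categorical columns can be mapped back to labels with the mappings dictionary. Passing cat_idx matters because otherwise the codes would presumably be treated as continuous values and imputed with non-integer numbers.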

python

pandas

loops

machine-learning

imputation

0 Answers
