2 years ago
#34046
S44
Impute values with MissForest using missingpy for categorical/object variables
I have a DataFrame (truncated here) with the following columns and data types:
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       8061 non-null   object
 1   B       4906 non-null   object
 2   C       1092 non-null   object
 3   D       11008 non-null  float64
I want to use the MissForest imputation algorithm to fill in the nulls in my data. As I understand it, this method handles both numeric and categorical data. However, when I run the following snippet:
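import sys
import sklearn.neighbors._base
# missingpy imports the old module path, so alias it to the current private module
sys.modules['sklearn.neighbors.base'] = sklearn.neighbors._base
from missingpy import MissForest

imputer = MissForest()
impute_df = df.drop('C', axis=1)  # exclude column C before imputing
imputed_df = imputer.fit_transform(impute_df)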
I'm getting the following error:
'could not convert string to float'
Here is a truncated dictionary of object columns that, once built into a DataFrame, gives me the same error:
import pandas as pd

my_dict = {
    'Age_Group__c': {0: None, 1: '65+', 2: '65+', 3: '45-54', 4: '35-44'},
    'Difficulty': {0: None, 1: None, 2: 'Difficult', 3: None, 4: 'Easy'},
    'Consulted_Doctor__c': {0: None, 1: None, 2: None, 3: None, 4: None},
    'Income__c': {0: None, 1: '15k-35k', 2: None, 3: '75k-100k', 4: '75k-100k'},
    'Relationship_Status__c': {0: None, 1: 'Single/Dating', 2: 'In a relationship/Married', 3: None, 4: None},
    'Email_Domain__c': {0: 'yahoo.com', 1: 'gmail.com', 2: 'gmail.com', 3: 'ymail.com', 4: 'hotmail.com'},
    'Gender__c': {0: 'Male', 1: 'Female', 2: 'Female', 3: 'Male', 4: 'Female'},
    'Marital_Status__c': {0: None, 1: 'Married', 2: None, 3: 'Single', 4: 'Married'},
}
df = pd.DataFrame.from_dict(my_dict)
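For context, the error message suggests the imputer expects a purely numeric array, so string categories would probably need to be turned into numeric codes (with the nulls kept as NaN) before fitting. Below is a minimal sketch of that encoding step using pandas only. The small frame, the `categories` mapping and the `decode` helper are hypothetical names made up for illustration, and the commented-out `cat_vars` argument is what I understand the missingpy README to describe, so treat it as an assumption to check against the installed version.

```python
import numpy as np
import pandas as pd

# Small stand-in frame (hypothetical values modeled on the sample above).
df = pd.DataFrame({
    "Age_Group__c": [None, "65+", "65+", "45-54", "35-44"],
    "Gender__c": ["Male", "Female", "Female", "Male", "Female"],
    "Consulted_Doctor__c": [None, None, None, None, None],
    "D": [1.0, np.nan, 3.0, 4.0, 5.0],
})

# A column with no observed values gives the imputer nothing to learn from.
df = df.dropna(axis=1, how="all")

# Treat every non-numeric column as categorical.
cat_cols = df.select_dtypes(exclude="number").columns.tolist()
encoded = df.copy()
categories = {}  # column name -> labels, kept for decoding after imputation

for col in cat_cols:
    codes, uniques = pd.factorize(encoded[col])  # missing values get code -1
    encoded[col] = np.where(codes == -1, np.nan, codes)  # keep nulls as NaN
    categories[col] = uniques

# Positional indices of the categorical columns in the encoded frame.
cat_idx = [encoded.columns.get_loc(c) for c in cat_cols]

# Assumption (not run here): missingpy is said to accept these indices, e.g.
# imputed = MissForest().fit_transform(encoded.to_numpy(), cat_vars=cat_idx)

def decode(col, values):
    """Map imputed numeric codes back to the original string labels."""
    return categories[col].take(np.asarray(values).round().astype(int))
```

After imputation, something like `decode("Gender__c", imputed[:, cat_idx[1]])` would turn the filled-in codes back into labels, assuming the imputer returns a NumPy array with the same column order as `encoded`.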
python
pandas
loops
machine-learning
imputation