1 year ago

#72770


Berkay Sunal

Fuzzy matching Countries in R

For an assignment, I have to use fuzzy matching in R to merge two datasets that both have a "Country" column. The first dataset is from Kaggle (the Countries dataset), while the other is from the ISO 3166 standard. I have already run the fuzzy matching and it worked well. To both datasets I added a new column that numbers the observations from 1 up to each dataset's length (as far as I understand, this is required for fuzzy matching). I named it "Observation number". The first dataset has 227 observations and the ISO dataset has 249.

I want to create a new dataset that includes the columns from my first dataset (I have to use this dataset specifically because it has columns like migration, literacy, etc.) and the country codes from the ISO dataset. I couldn't manage to do it. The fuzzy matching output tells me where each observation of the first dataset sits in the ISO dataset. For example, the first dataset orders countries as Afghanistan, Albania, Algeria, ..., whilst the ISO dataset orders them as Albania, Algeria, Afghanistan, so the fuzzy match output gives me 3, 1, 2, ... I understand this to mean that the 3rd observation in the ISO dataset is the 1st in the Countries dataset.

I want to create a new dataset that has all the information from the Countries dataset, ordered with respect to the order of the ISO dataset's Country column.

However, I cannot do it using:

a <- Result1$matches$observationnumber
# a[i] is the position in the ISO dataset of the i-th observation of the Countries dataset

countryorderedlikeISO <- countries.of.the.world[match(a, countries.of.the.world$observation), ]

It seems to ignore the countries that are present in the ISO dataset but not in the Countries dataset.

What can I do? I want this new dataset to have the ISO dataset's length, with NA values for observations that are present in the ISO dataset but not in the Countries dataset.
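To make the desired result concrete, here is a minimal base R sketch of one possible approach: drive the lookup from the ISO rows rather than from the match vector, so that unmatched ISO rows stay in the result with NA. Everything in it is an assumption for illustration: the toy data frames `iso` and `countries`, the made-up literacy values, the column names `iso_obs` and `country_obs`, and the two-column `matches` table standing in for whatever the fuzzy matching output actually looks like.

```r
# Toy stand-ins for the two datasets (hypothetical names and made-up values).
iso <- data.frame(
  iso_obs = 1:4,
  Country = c("Albania", "Algeria", "Afghanistan", "Aland Islands"),
  Code    = c("ALB", "DZA", "AFG", "ALA"),
  stringsAsFactors = FALSE
)
countries <- data.frame(
  country_obs = 1:3,
  Country     = c("Afghanistan", "Albania", "Algeria"),
  Literacy    = c(36.0, 86.5, 70.0),
  stringsAsFactors = FALSE
)

# Assumed shape of the fuzzy match result: one row per matched pair.
# Here Countries row 1 matches ISO row 3, row 2 matches ISO row 1, and so on.
matches <- data.frame(country_obs = c(1, 2, 3), iso_obs = c(3, 1, 2))

# For every ISO row, find the matched Countries observation (NA if none).
matched_obs <- matches$country_obs[match(iso$iso_obs, matches$iso_obs)]

# Turn that into row positions in the Countries dataset (NA stays NA).
rows <- match(matched_obs, countries$country_obs)

# Indexing a data frame with NA yields a row of NAs, so the result
# keeps one row per ISO entry, in ISO order.
result <- cbind(iso, countries[rows, "Literacy", drop = FALSE])
rownames(result) <- NULL
print(result)
```

With this layout the result has as many rows as the ISO table, and the row for the country that has no counterpart in the Countries dataset carries NA in the columns taken from it.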

r

sorting

dataset

data-cleaning

fuzzyjoin

0 Answers
