1 year ago
#67375
Nancy Gomez
Comparing 2 dataframes by ID
I am very new to Python. I want to compare two dataframes that have the same columns; the first column, ID, is the key. My goal is to print the differences between them.
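Matching rows by ID rather than by row position can be sketched with a plain pandas outer merge. This is a minimal example, assuming unique IDs and frames shaped like the ones in the question. Setting indicator=True adds a _merge column that flags IDs present in only one frame; with this data, ID 5 appears only in the first frame. The '.x'/'.y' suffixes are an illustrative choice.

```python
import pandas as pd

# Sample frames shaped like the ones in the question (ID is the key column).
df1 = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                    'Apple': ['C', 'B', 'C', 'A', 'E'],
                    'Pear': [2, 3, 5, 6, 7]})
df2 = pd.DataFrame({'ID': [4, 2, 1, 3],
                    'Apple': ['A', 'C', 'C', 'C'],
                    'Pear': [6, 'NA', 'NA', 5]})

# Outer merge on the key so rows are paired by ID, not by position.
# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
aligned = df1.merge(df2, on='ID', how='outer',
                    suffixes=('.x', '.y'), indicator=True)

only_in_df1 = aligned.loc[aligned['_merge'] == 'left_only', 'ID'].tolist()
only_in_df2 = aligned.loc[aligned['_merge'] == 'right_only', 'ID'].tolist()
print('IDs only in df1:', only_in_df1)
print('IDs only in df2:', only_in_df2)
```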
For example:
import pandas as pd
import numpy as np
import datacompy

dframe1 = {'ID': [1, 2, 3, 4, 5], 'Apple': ['C', 'B', 'C', 'A', 'E'], 'Pear': [2, 3, 5, 6, 7]}
dframe2 = {'ID': [4, 2, 1, 3], 'Apple': ['A', 'C', 'C', 'C'], 'Pear': [6, 'NA', 'NA', 5]}
df1 = pd.DataFrame(dframe1)
df2 = pd.DataFrame(dframe2)

# join_columns='ID' pairs rows by ID; on_index=True would pair them by row position instead
compare = datacompy.Compare(
    df1,
    df2,
    join_columns='ID',
    df1_name='Reference',
    df2_name='Test',
)
print(compare.report())
This produces a comparison report, but I want the output to be a table of the individual differences, with these columns:
# row.x / row.y are the 1-based row numbers of that ID in df1 / df2
out1 = {'var.x': ['Apple', 'Pear', 'Pear'], 'var.y': ['Apple', 'Pear', 'Pear'], 'ID': [2, 1, 2],
        'values.x': ['B', '2', '3'], 'values.y': ['C', 'NA', 'NA'],
        'row.x': [2, 1, 2], 'row.y': [2, 3, 2]}
outp = pd.DataFrame(out1)
print(outp)
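The desired table can be built with plain pandas instead of datacompy. What follows is a minimal sketch, not datacompy's API. diff_by_id and the _row helper column are hypothetical names. IDs are assumed unique, rows are numbered from 1, and only IDs present in both frames are compared. Values are compared as text so that a mixed column like Pear in df2 (numbers plus 'NA' strings) is handled uniformly. With the sample data it reports the three differences listed above: Apple for ID 2, and Pear for IDs 1 and 2. ID 2 sits on row 2 of both frames, so both of its rows get row.x = row.y = 2.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                    'Apple': ['C', 'B', 'C', 'A', 'E'],
                    'Pear': [2, 3, 5, 6, 7]})
df2 = pd.DataFrame({'ID': [4, 2, 1, 3],
                    'Apple': ['A', 'C', 'C', 'C'],
                    'Pear': [6, 'NA', 'NA', 5]})


def diff_by_id(left, right, key='ID'):
    """Hypothetical helper: one output row per cell that differs between
    left and right, with rows matched on `key` (assumed unique)."""
    # Record each row's 1-based position before the key becomes the index.
    lft = left.assign(_row=range(1, len(left) + 1)).set_index(key)
    rgt = right.assign(_row=range(1, len(right) + 1)).set_index(key)
    # Only IDs present in both frames can be compared cell by cell.
    common_ids = lft.index.intersection(rgt.index).sort_values()
    value_cols = [c for c in left.columns if c != key and c in right.columns]

    records = []
    for col in value_cols:
        for id_ in common_ids:
            vx, vy = lft.at[id_, col], rgt.at[id_, col]
            # Compare the text form so mixed types (e.g. int64 5 vs Python 5,
            # or 3 vs 'NA') are treated uniformly; 3 and '3' count as equal.
            if str(vx) != str(vy):
                records.append({
                    'var.x': col, 'var.y': col, 'ID': id_,
                    'values.x': str(vx), 'values.y': str(vy),
                    'row.x': lft.at[id_, '_row'], 'row.y': rgt.at[id_, '_row'],
                })
    columns = ['var.x', 'var.y', 'ID', 'values.x', 'values.y', 'row.x', 'row.y']
    return pd.DataFrame(records, columns=columns)


out = diff_by_id(df1, df2)
print(out)
```

IDs found in only one frame (ID 5 here) are skipped by the intersection. They can be listed separately, for example with an outer merge and indicator=True. If types should matter (3 vs '3'), drop the str() calls in the comparison.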
Thanks a lot for your support.
Tags: python, jupyter-notebook, compare