1 year ago

#67375


Nancy Gomez

Comparing 2 dataframes by ID

I am very new to Python and want to compare two dataframes. Both have the same columns, and the first column (ID) is the key. My goal is to print the differences between them.

For example:

import pandas as pd
import numpy as np
dframe1 = {'ID': [1, 2, 3, 4, 5], 'Apple': ['C', 'B', 'C', 'A', 'E'], 'Pear': [2, 3, 5, 6, 7]}
dframe2 = {'ID': [4, 2, 1, 3], 'Apple': ['A', 'C', 'C', 'C'], 'Pear': [6, 'NA', 'NA', 5]}
df1 = pd.DataFrame(dframe1)  
df2 = pd.DataFrame(dframe2)  

import datacompy
# Match rows on the ID column rather than on the positional index
compare = datacompy.Compare(
    df1,
    df2,
    join_columns='ID',
    df1_name='Reference',
    df2_name='Test'
)
print(compare.report())

This produces a comparison report, but I would like the output to look like the following instead. These are the columns of my desired output:

out1 = {'var.x': ['Apple', 'Pear', 'Pear'], 'var.Y': ['Apple', 'Pear', 'Pear'], 'ID': [2, 1, 2], 'values.x': ['B', '2', '3'], 'values.Y': ['C', 'NA', 'NA'], 'row.x': [2, 1, 4], 'row.y': [2, 3, 1]}

outp = pd.DataFrame(out1) 
print(outp)

Thanks a lot for your support.
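
For reference, one possible way to get a table of this shape with plain pandas is sketched below. It is a minimal sketch under several assumptions, not a definitive solution: the helper name diff_by_id is hypothetical, values are compared as strings (so 2 and '2' count as equal), row.x and row.y are taken to be 1-based row positions in each frame, a single var column stands in for var.x/var.Y, and IDs present in only one frame (such as ID 5) are skipped because the merge is an inner join.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                    'Apple': ['C', 'B', 'C', 'A', 'E'],
                    'Pear': [2, 3, 5, 6, 7]})
df2 = pd.DataFrame({'ID': [4, 2, 1, 3],
                    'Apple': ['A', 'C', 'C', 'C'],
                    'Pear': [6, 'NA', 'NA', 5]})


def diff_by_id(left, right, key='ID'):
    """Hypothetical helper: list cell-level differences for rows matched on `key`."""
    value_cols = [c for c in left.columns if c != key]

    # Record each row's 1-based position before merging (assumed meaning of row.x / row.y)
    left = left.assign(row=range(1, len(left) + 1))
    right = right.assign(row=range(1, len(right) + 1))

    # Inner join: only IDs present in both frames are compared
    merged = left.merge(right, on=key, suffixes=('.x', '.y'))

    pieces = []
    for col in value_cols:
        # Compare as strings so mixed int/str columns can be compared safely
        x = merged[col + '.x'].astype(str)
        y = merged[col + '.y'].astype(str)
        mask = x != y

        out = pd.DataFrame({
            key: merged.loc[mask, key],
            'values.x': x[mask],
            'values.y': y[mask],
            'row.x': merged.loc[mask, 'row.x'],
            'row.y': merged.loc[mask, 'row.y'],
        })
        out.insert(0, 'var', col)  # name of the column that differs
        pieces.append(out)

    return pd.concat(pieces, ignore_index=True)


result = diff_by_id(df1, df2)
print(result)
```

With the sample data this reports one difference for Apple (ID 2) and two for Pear (IDs 1 and 2). The row numbers come from the actual row positions, so they may not match every value in the hand-written desired output above.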

python

jupyter-notebook

compare

0 Answers
