Home > Back-end >  Generating a dataframe based off the diff between two dataframes
Generating a dataframe based off the diff between two dataframes

Time:10-27

I have 2 data frames that look like this

Df1 
City    Code    ColA    Col..Z
LA      LAA  
LA      LAB
LA      LAC


Df2 
Code    ColA    Col..Z
LA      LAA 
NY      NYA
CH      CH1

What I'm trying to do have the result of

df3 
Code ColA   Col..Z
NY  NYA
CH  CH1

Normally I would loop through each row in df2 and say:

Df3 = If df2.row['Code'] in df1 then drop it.

But I want to find a pythonic pandas way to do it instead of looping through the dataframe. I was looking at examples using joins or merging but I cant seem to work it out.

CodePudding user response:

To keep only the different items in df2 based on the code column, you can do something like this, using drop_duplicates :

df2[df2.code.isin(
    # the different values in df2's 'code' column
    pd.concat([df1.code, df2.code]).drop_duplicates(keep=False)
)]

CodePudding user response:

This Df3 = If df2.row['Code'] in df1 then drop it. translates to

df3 = df2[~df2['Code'].isin(df1['City'] ]

CodePudding user response:

There is a df1

df2 = pd.read_clipboard()
df2

df2

df1.compare(df2).drop('self', axis=1, level=1).droplevel(1, axis=1)

df compare diff

(And I'm making an assumption you had a typo in your dataframes with the City col missing from df2?)

  • Related