Comparing two df's of different lengths, receiving "Can only compare identically-labeled S-CodePudding

Hi I am trying to compare two df's. If the same ID appears in df2 I want to add it to df1 to a "count" column.

After this, I want to then add all rows from df2 with unique ID's to df1.

The first time I run the code it works (df1 count is increased and the new ID's are added as new rows), the second time however the "Can only compare identically-labeled Series objects" error is raised. I assume this is due to the number of rows of df1 differing to df2. Is there a way to resolve this or another approach I should take?

import pandas as pd

boxes1 = {'ID': ['1','2','3','4','5','6','7','8'],
         'Shape': ['Rectangle','Rectangle','Square','Rectangle','Square','Square','Square','Rectangle']
        }

boxes2 = {'ID': ['9','10','1','4','5','14','7','8'],
         'Shape': ['Rectangle','Rectangle','Square','Rectangle','Square','Square','Square','Rectangle']
        }

df1 = pd.DataFrame(boxes1, columns= ['ID','Shape'])
df2 = pd.DataFrame(boxes2, columns= ['ID','Shape'])


df1['count'] = 0
df2['count'] = 0

#Compares dfs and adds 1 to df1 if same ID appears
df1.loc[(df3["ID"] == df1["ID"]), 'count']  = 1

#Compares dfs and adds any unique IDs to df1
df1 = pd.concat([df1, df3]).drop_duplicates('ID').reset_index(drop=True)

CodePudding user response：

You can use .isin() which does not require same number of rows and index for df1 and df2, as follows:

replace your code df1.loc[(df3["ID"] == df1["ID"]), 'count'] = 1 to the following code:

df1.loc[df1['ID'].isin(df2['ID']), 'count']  = 1

Result:

print(df1)

  ID      Shape  count
0  1  Rectangle      1
1  2  Rectangle      0
2  3     Square      0
3  4  Rectangle      1
4  5     Square      1
5  6     Square      0
6  7     Square      1
7  8  Rectangle      1

Then, after your last line of code:

(modified df3 to df2):

df1 = pd.concat([df1, df2]).drop_duplicates('ID').reset_index(drop=True)

Result:

print(df1)

    ID      Shape  count
0    1  Rectangle      1
1    2  Rectangle      0
2    3     Square      0
3    4  Rectangle      1
4    5     Square      1
5    6     Square      0
6    7     Square      1
7    8  Rectangle      1
8    9  Rectangle      0
9   10  Rectangle      0
10  14     Square      0

CodePudding user response：

You can use groupby_agg:

df1 = pd.concat([df1, df2]) \
        .groupby('ID', sort=False) \
        .agg({'ID': 'first', 'Shape': 'first', 'count': lambda x: len(x)-1}) \
        .reset_index(drop=True)

>>> out
    ID      Shape  count
0    1  Rectangle      1
1    2  Rectangle      0
2    3     Square      0
3    4  Rectangle      1
4    5     Square      1
5    6     Square      0
6    7     Square      1
7    8  Rectangle      1
8    9  Rectangle      0
9   10  Rectangle      0
10  14     Square      0