Hi I am trying to compare two df's. If the same ID appears in df2 I want to add it to df1 to a "count" column.
After this, I want to then add all rows from df2 with unique ID's to df1.
The first time I run the code it works (df1 count is increased and the new ID's are added as new rows), the second time however the "Can only compare identically-labeled Series objects" error is raised. I assume this is due to the number of rows of df1 differing to df2. Is there a way to resolve this or another approach I should take?
import pandas as pd
boxes1 = {'ID': ['1','2','3','4','5','6','7','8'],
'Shape': ['Rectangle','Rectangle','Square','Rectangle','Square','Square','Square','Rectangle']
}
boxes2 = {'ID': ['9','10','1','4','5','14','7','8'],
'Shape': ['Rectangle','Rectangle','Square','Rectangle','Square','Square','Square','Rectangle']
}
df1 = pd.DataFrame(boxes1, columns= ['ID','Shape'])
df2 = pd.DataFrame(boxes2, columns= ['ID','Shape'])
df1['count'] = 0
df2['count'] = 0
#Compares dfs and adds 1 to df1 if same ID appears
df1.loc[(df3["ID"] == df1["ID"]), 'count'] = 1
#Compares dfs and adds any unique IDs to df1
df1 = pd.concat([df1, df3]).drop_duplicates('ID').reset_index(drop=True)
CodePudding user response:
You can use .isin()
which does not require same number of rows and index for df1
and df2
, as follows:
replace your code df1.loc[(df3["ID"] == df1["ID"]), 'count'] = 1
to the following code:
df1.loc[df1['ID'].isin(df2['ID']), 'count'] = 1
Result:
print(df1)
ID Shape count
0 1 Rectangle 1
1 2 Rectangle 0
2 3 Square 0
3 4 Rectangle 1
4 5 Square 1
5 6 Square 0
6 7 Square 1
7 8 Rectangle 1
Then, after your last line of code:
(modified df3
to df2
):
df1 = pd.concat([df1, df2]).drop_duplicates('ID').reset_index(drop=True)
Result:
print(df1)
ID Shape count
0 1 Rectangle 1
1 2 Rectangle 0
2 3 Square 0
3 4 Rectangle 1
4 5 Square 1
5 6 Square 0
6 7 Square 1
7 8 Rectangle 1
8 9 Rectangle 0
9 10 Rectangle 0
10 14 Square 0
CodePudding user response:
You can use groupby_agg
:
df1 = pd.concat([df1, df2]) \
.groupby('ID', sort=False) \
.agg({'ID': 'first', 'Shape': 'first', 'count': lambda x: len(x)-1}) \
.reset_index(drop=True)
>>> out
ID Shape count
0 1 Rectangle 1
1 2 Rectangle 0
2 3 Square 0
3 4 Rectangle 1
4 5 Square 1
5 6 Square 0
6 7 Square 1
7 8 Rectangle 1
8 9 Rectangle 0
9 10 Rectangle 0
10 14 Square 0