I have two dataframes I need to match by row. Where a match occurs I need to increment the value 1 in of a field in df1. df2 has mulitple matches to df1. I don't want to merge the dataframes, just update df1 based off a match to df2.
The basic logic in my head is read the first row of df1, then try to match TRANID to each row of df2. When a match occurs, add 1 to the NUMINSTS value. Then loop back and do the same for the next row on df1. I'm just not sure how to approach this in Python/Pandas.
I'm an old COBOL programmer and am just learning Python/Pandas so any help is greatly appreciated.
Input Data
df1:
TRANID NUMINSTS
60000022 22
60000333 6
70000001 15
70000233 60
df2:
TRANID
60000333
70000233
70000233
Output
df3:
TRANID NUMINSTS
60000022 22
60000333 7 #incremented by 1
70000001 15
70000233 62 #incremented by 2
CodePudding user response:
We can filter based on the values in df2 and keep adding or changing values in df1.
import pandas as pd
df1 = pd.DataFrame({"TRAINID":["60000022", "60000333", "70000001", "70000233"], "NUMINSTS":[22,6,15,60]})
df2 = pd.DataFrame({"TRAINID":[ "60000333", "70000233", "70000233"]})
def add_num(df1,df2):
for id in list(df2["TRAINID"]):
df1.loc[df1["TRAINID"] == id, "NUMINSTS"] = 1
return df1
df3 = add_num(df1,df2)
print(df3)
CodePudding user response:
You want two cases:
- Tranid exists in df1
- Tranid doesn't exist in df1.
Here is your code:
import pandas as pd
df1=pd.DataFrame({'tranid':[1,2,3],'numinst':[2,4,6]})
df2=pd.DataFrame({'tranid':[1,2,4]})
tranvalues=df1['tranid']
for i in range(len(df2)):
if df2['tranid'][i] in tranvalues:
df1['numinst'][df1['tranid']==df2['tranid'][i]]=df1['numinst'] 1
else:
df1.loc[len(df1.index)]=[df2['tranid'][i],1]
CodePudding user response:
You may try:
df1 = pd.DataFrame({'TRANID':[60000022, 60000333, 70000001, 70000233],
'NUMINSTS':[22,6,15,60]})
df1:
TRANID NUMINSTS
0 60000022 22
1 60000333 6
2 70000001 15
3 70000233 60
df2 = pd.DataFrame({'TRANID':[60000333, 70000233, 70000233]})
df2:
TRANID
0 60000333
1 70000233
2 70000233
Build a dictionary of counts of TRANID values from df2:
d = df2['TRANID'].value_counts().to_dict()
Copy df3 from df1 and update the NUMINSTS column like if the TRANID is in the above dictionary , increment by the value count otherwise keep it the same:
df3 = df1.copy()
df3['NUMINSTS'] = df3.apply(
lambda row:
row['NUMINSTS'] d[row['TRANID']] if row['TRANID'] in d else row['NUMINSTS'], axis=1)
If you don't want the rows that don't match, you could replace None
like below and then drop those with None
values:
df3['NUMINSTS'] = df3.apply(
lambda row:
row['NUMINSTS'] d[row['TRANID']] if row['TRANID'] in d else None, axis=1)
df3.dropna(subset=['NUMINSTS'], inplace=True)
df3['NUMINSTS'] = df3['NUMINSTS'].astype(int)
df3.reset_index(inplace=True,drop=True)
Output df3:
TRANID NUMINSTS
0 60000333 7
1 70000233 62