I'm trying to create a new category column from two different Pandas dataframes that contain the same columns using Pandas.
The new column looks at both Df1 and Df2's 'Loan Code/Number' column, determines if they're the same then determines if the 'Del_Cat' category changed from Df1 to Df2, and if it changed categories, it will return a new result.
DF1:
Loan Code/Number | Days Delinquent | Del_Cat |
---|---|---|
1147623994 | -25 | Current |
1501719058 | -5 | Current |
1501719696 | 77 | 61-90 |
1502624989 | 87 | 61-90 |
1502625152 | 16 | Current |
DF2:
Loan Code/Number | Days Delinquent | Del_Cat |
---|---|---|
1147623994 | -22 | Current |
1801719152 | 37 | 31-60 |
1501719696 | 84 | 61-90 |
1602624414 | -6 | Current |
1502625152 | 55 | 31-60 |
I've tried creating a function that loops over the values in the tables and determines if the loan number is the same, and if it is then to check the values in the 'Del_Cat' column, bringing back a new value that states if the value has changed:
def del_cat_migration(df1, df2):
if [df1['Loan Code/Number'] == df2['Loan Code/Number']]:
if (df1['Del_Cat']) == 'Current' & (df2['Del_Cat']) == 'Current':
return 'Same - Current'
elif (df1['Del_Cat']) == 'Current' & (df2['Del_Cat']) == '31-60':
return 'Current to 31-60'
elif (df1['Del_Cat']) == '31-60' & (df2['Del_Cat']) == '31-60':
return 'Same - 31-60'
elif (df1['Del_Cat']) == '31-60' & (df2['Del_Cat']) == '61-90':
return '31-60 to 61-90'
elif (df1['Del_Cat']) == '61-90' & (df2['Del_Cat']) == '61-90':
return 'Same - 61-90'
elif (df1['Del_Cat']) == '61-90' & (df2['Del_Cat']) == 'Current':
return '61-90 to Current'
else:
return 'Other'
I then try to apply this new data to a new column in df2:
df2['Del_Migration_Cat'] = df2.apply(del_cat_migration(df1, df2), axis = 1)
I then get the following error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
CodePudding user response:
There's ways to do such things in pandas:
def del_cat_migration(row):
if row[0] == row[1]:
return f"Same - {row[0]}"
return f"{row[0]} to {row[1]}"
df3 = pd.concat([df1["Del_Cat"], df2["Del_Cat"]], axis=1)
df3
looks like
Del_Cat Del_Cat
0 Current Current
1 Current 31-60
2 61-90 61-90
3 61-90 Current
4 Current 31-60
Now apply the function, row-wise (axis=1
):
>>> df3.apply(del_cat_migration, axis=1)
0 Same - Current
1 Current to 31-60
2 Same - 61-90
3 61-90 to Current
4 Current to 31-60
CodePudding user response:
The cause: you cannot do the comparison in if statement between two columns of a dataframe because the result of this comparison is a boolean series [True, False, ....] which is invalid condition, therefore, you should use a.empty, a.bool(), a.item(), a.any() or a.all()
as indicated by the error log. The suggested option is invalid because you need only to compare between each entry in df1
and the corresponding entry in df2
. I fixed this issue as in the solution below and also correct some syntax errors in the function.
But again, your function will have a problem with df2.apply() will iterate through each row and call the function in each row with the entire dataframes which is not valid operation, as well.
def del_cat_migration(df1, df2):
bool_index = (df1['Loan Code/Number'] == df2['Loan Code/Number'])
for i in df1.index[bool_index]:
if (df1.loc[i,'Del_Cat'] == 'Current') & (df2.loc[i,'Del_Cat'] == 'Current'):
df2.loc[i,'Del_Migration_Cat'] = 'Same - Current'
elif (df1.loc[i,'Del_Cat'] == 'Current') & (df2.loc[i,'Del_Cat'] == '31-60'):
df2.loc[i,'Del_Migration_Cat'] = 'Current to 31-60'
elif (df1.loc[i,'Del_Cat'] == '31-60') & (df2.loc[i,'Del_Cat'] == '31-60'):
df2.loc[i,'Del_Migration_Cat'] = 'Same - 31-60'
elif (df1.loc[i,'Del_Cat'] == '31-60') & (df2.loc[i,'Del_Cat'] == '61-90'):
df2.loc[i,'Del_Migration_Cat'] = '31-60 to 61-90'
elif (df1.loc[i,'Del_Cat'] == '61-90') & (df2.loc[i,'Del_Cat'] == '61-90'):
df2.loc[i,'Del_Migration_Cat'] = 'Same - 61-90'
elif (df1.loc[i,'Del_Cat'] == '61-90') & (df2.loc[i,'Del_Cat'] == 'Current'):
df2.loc[i,'Del_Migration_Cat'] = '61-90 to Current'
else:
df2.loc[i,'Del_Migration_Cat'] = 'Other'
del_cat_migration(df1, df2)
But I prefer to follow the answer of @fsimonjetz, however, you need for an extra dataframe df3.
CodePudding user response:
first, let's merge df's on 'Loan Code/Number' column, then we can apply function:
df3 = pd.merge(df1, df2[['Loan Code/Number','Del_Cat']],
on='Loan Code/Number',
how='left',
suffixes=(None, '_2'))
print(df3)
'''
Loan Code/Number Days Delinquent Del_Cat Del_Cat_2
0 1147623994 -25 Current Current
1 1501719058 -5 Current NaN
2 1501719696 77 61-90 61-90
3 1502624989 87 61-90 NaN
4 1502625152 16 Current 31-60
'''
df3 = (df3.assign(Del_Cat=df3.apply(lambda x: f"same {x['Del_Cat']}"
if x['Del_Cat']==x['Del_Cat_2']
else 'Other' if x['Del_Cat_2']!=x['Del_Cat_2']
else f"{x['Del_Cat']} to {x['Del_Cat_2']}", axis=1)).
drop(columns='Del_Cat_2'))
print(df3)
'''
Loan Code/Number Days Delinquent Del_Cat
0 1147623994 -25 same Current
1 1501719058 -5 Other
2 1501719696 77 same 61-90
3 1502624989 87 Other
4 1502625152 16 Current to 31-60