I am currently writing a function in pandas to try to check rows in a column to see if they are not null. If they are not null, I want something to be outputed to a new column and for this case it would be 'Financing'. Basically if a row has a value for loan funded date, I want the phrase Financing to be printed to a new column called Payment Type. On top of that, I have a column called Claim Approved Date that I want the same thing to be done but output 'Insurance'. Lastly, if both columns are null I would like Cash/Credit to be outputed to Payment Type. For some reason I cannot get this to work. I want it done as a function but I can't seem to perform my desired outcome using this method.
def typepayment(x):
m = x['Loan Funded Date'].notna()
b = x['Claim Approved Date'].notna()
for a in m:
if a == True:
x['Payment Type'] = 'Financing'
for d in b:
if d == True:
x['Payment Type'] = 'Insurance'
# If loan funded date is false and claim approved date is false
# Payment Type is Cash/Credit
return x
df2 = df.apply(typepayment, axis = 1)
df2
This is what I got so far but I have a feeling it is extremely off. Any pointers would be very appreciated! Thank you so much.
CodePudding user response:
Instead of iterating over the rows you can use:
Edit: Just read the last part about 'Cash/Credit' when both are null.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Loan Funded Date': [1, np.nan, np.nan], 'Claim Approved Date': [np.nan, 1, np.nan]})
df['Payment Type'] = (
df['Loan Funded Date']
.notnull()
.replace({True: 'Financing', False: np.nan})
.combine_first(
df['Claim Approved Date']
.notnull()
.replace({True: 'Insurance', False: 'Cash/Credit'})
)
)
Output:
Loan Funded Date Claim Approved Date Payment Type
0 1.0 NaN Financing
1 NaN 1.0 Insurance
2 NaN NaN Cash/Credit
CodePudding user response:
IIUC, there are two approaches
1. Using Apply
import numpy as np
import pandas as pd
def typepayment(x):
is_na = x.isna() # Mask to see which fields have NaN
if not is_na['Loan Funded Date']:
return "Financing"
elif not is_na['Claim Approved Date']:
return "Insurance"
else:
return 'Cash/Credit'
# Usage
df['Payment_Type'] = df.apply(typepayment, axis = 1)
Test
data = {'Loan Funded Date':["01-02-2022", np.nan, np.nan],
'Claim Approved Date':['03--17-2022', '04-15-2022', np.nan]}
df = pd.DataFrame(data)
df['Payment_Type'] = df.apply(typepayment, axis = 1)
print(df)
Output
Loan Funded Date Claim Approved Date Payment_Type
0 01-02-2022 03--17-2022 Financing
1 NaN 04-15-2022 Insurance
2 NaN NaN Cash/Credit
2. Using numpy.where
Note: This method is faster than apply on large datasets
df['Payment_Type'] = np.where(~df['Loan Funded Date'].isna(),
"Financing",
np.where(~df['Claim Approved Date'].isna(), "Insurance", "Cash/Credit"))
Test - Output same as Method 1