toto =
C1Date | C1Type | C2Date | C2Type | ..... | C10Date | CType10 | PolDate |
---|---|---|---|---|---|---|---|
dd-mm-yyyy | :Proposer | NaT | NaN | NaT | NaN | dd-mm-yyyy | |
dd-mm-yyyy | :Proposer | NaT | NaN | NaT | NaN | dd-mm-yyyy | |
dd-mm-yyyy | :Other | dd-mm-yyyy | Proposer | NaT | NaN | dd-mm-yyyy | |
dd-mm-yyyy | :Proposer | NaT | NaN | NaT | NaN | dd-mm-yyyy | |
dd-mm-yyyy | :Other | dd-mm-yyyy | Other | NaT | NaN | dd-mm-yyyy |
where C
refers to Claim
and so on. i.e there is upto 10 Claims
on a row.
I need to identify if any of the Claims
is from the Proposer
and for those claims did they occur within 3 years from the PolDate
(PolDate
is always greater than any Cdate
)
I was able to do the following but I cannot get the date subtraction to work within the loop:
CLM = {}
for i in range(1 , 11):
CLM[i] = toto.loc[toto[f'C{i}Type'] == 'Proposer']
#can't get this date subtraction to work within the loop. But can do the subtraction outside of the loop.
CLM[i]['diff'] = (CLM[i]['PolDate'].sub(CLM[i][f'C{i}Date'],
axis=0)).dt.days
use_cols = ['CustomerID', f'C{i}Type', f'C{1}Date', 'PolDate ']
CLM[i] = CLM[i][use_cols]
print("Claim:" f'{i}' " " str(CLM[i].shape))
ERROR:
A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
Also, cannot get the 3-year comparison to work:
if (CLM[1]['diff'] > 1095):
#1095 = (365 * 3):
CLM[1]['CLMLAST3'] = 0
else:
CLM[1]['diff'] = 1
ERROR:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
CodePudding user response:
In short, try this, it works for me: (Don't know much pandas so maybe efficiency is your domain, I just removed the errors from the posted code)
CLM = {}
for i in range(1 , 11):
CLM[i] = toto.loc[toto[f'C{i}Type'] == 'Proposer']
#can't get this date subtraction to work within the loop. But can do the subtraction outside of the loop.
**CLM.get(i).loc[:, 'diff'] = (pd.to_datetime(CLM[i]['PolDate'],format='%d-%m-%Y').sub(pd.to_datetime(CLM[i][f'C{i}Date'],format='%d-%m-%Y'))).dt.days**
use_cols = ['CustomerID', f'C{i}Type', f'C{1}Date', 'PolDate ']
CLM[i] = CLM[i][use_cols]
print("Claim:" f'{i}' " " str(CLM[i].shape))
Points to note:
The warning "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead" is also there in this code too. Because CLM[i]['diff'] is different from CLM[i].loc['diff']. See here: https://stackoverflow.com/a/53954986/5772008
CLM[i]['PolDate'] is a "list" of strings, so you don't subtract a string from a string, but you can subtract a pandas datetime object from another. So, converted them first to a datetime object, then subtracted.
Same is with your extra question where you compare a list with a value, see this https://stackoverflow.com/a/53830333/5772008 In short, you most probably want this: "if (CLM[1]['diff'].all() > 1095)", so it compares every value in the series, not the whole series with one value.