Home > database >  Doing a loop calculation
Doing a loop calculation

Time:10-30

toto =

C1Date C1Type C2Date C2Type ..... C10Date CType10 PolDate
dd-mm-yyyy :Proposer NaT NaN NaT NaN dd-mm-yyyy
dd-mm-yyyy :Proposer NaT NaN NaT NaN dd-mm-yyyy
dd-mm-yyyy :Other dd-mm-yyyy Proposer NaT NaN dd-mm-yyyy
dd-mm-yyyy :Proposer NaT NaN NaT NaN dd-mm-yyyy
dd-mm-yyyy :Other dd-mm-yyyy Other NaT NaN dd-mm-yyyy

where C refers to Claim and so on. i.e there is upto 10 Claims on a row.

I need to identify if any of the Claims is from the Proposer and for those claims did they occur within 3 years from the PolDate (PolDate is always greater than any Cdate)

I was able to do the following but I cannot get the date subtraction to work within the loop:

CLM = {}

for i in range(1 , 11):
    

    CLM[i] = toto.loc[toto[f'C{i}Type'] == 'Proposer']
    
    #can't get this date subtraction to work within the loop. But can do the subtraction outside of the loop.

    CLM[i]['diff'] = (CLM[i]['PolDate'].sub(CLM[i][f'C{i}Date'], 
    axis=0)).dt.days
   
    use_cols = ['CustomerID',  f'C{i}Type', f'C{1}Date', 'PolDate  ']
    CLM[i] = CLM[i][use_cols]
    
    print("Claim:"   f'{i}'  " "  str(CLM[i].shape))

ERROR:

A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

Also, cannot get the 3-year comparison to work:

if (CLM[1]['diff'] > 1095): 
    #1095 = (365 * 3):
    CLM[1]['CLMLAST3'] = 0
else:
    CLM[1]['diff'] = 1

ERROR:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

CodePudding user response:

In short, try this, it works for me: (Don't know much pandas so maybe efficiency is your domain, I just removed the errors from the posted code)

CLM = {}

for i in range(1 , 11):
    

    CLM[i] = toto.loc[toto[f'C{i}Type'] == 'Proposer']
    
    #can't get this date subtraction to work within the loop. But can do the subtraction outside of the loop.

    **CLM.get(i).loc[:, 'diff'] = (pd.to_datetime(CLM[i]['PolDate'],format='%d-%m-%Y').sub(pd.to_datetime(CLM[i][f'C{i}Date'],format='%d-%m-%Y'))).dt.days**
   
    use_cols = ['CustomerID',  f'C{i}Type', f'C{1}Date', 'PolDate  ']
    CLM[i] = CLM[i][use_cols]
    
    print("Claim:"   f'{i}'  " "  str(CLM[i].shape))

Points to note:

  1. The warning "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead" is also there in this code too. Because CLM[i]['diff'] is different from CLM[i].loc['diff']. See here: https://stackoverflow.com/a/53954986/5772008

  2. CLM[i]['PolDate'] is a "list" of strings, so you don't subtract a string from a string, but you can subtract a pandas datetime object from another. So, converted them first to a datetime object, then subtracted.

Same is with your extra question where you compare a list with a value, see this https://stackoverflow.com/a/53830333/5772008 In short, you most probably want this: "if (CLM[1]['diff'].all() > 1095)", so it compares every value in the series, not the whole series with one value.

  • Related