I have a dataframe with 3 columns (id, date1,date2)
data = [['C', '05/06/2021','07/09/2021'],['A', '15/04/2021','08/09/2021'],['A','15/10/2021','09/12/2021'],['C', '03/07/2021','10/09/2021'],['C', '13/07/2021','11/09/2021'],['C', '25/10/2021','12/12/2021'],['C', '26/09/2021','07/12/2021'],['C', '10/08/2021','07/12/2021'],['C', '28/07/2021','13/12/2021'],['A', '15/05/2021','13/12/2021'], ['C', '13/06/2021','13/12/2021'],['A', '17/05/2021','13/12/2021'],['C', '27/06/2021','13/12/2021'], ['B', '18/06/2021','13/12/2021']]
df_test = pd.DataFrame(data, columns = ['id', 'date1', 'date2'])
df_test['date1'] = pd.to_datetime(df_test['date1'],dayfirst=True)
df_test['date2'] = pd.to_datetime(df_test['date2'],dayfirst=True)
I want to calculate difference between date2-date1 and calculate means based on days filter >=100 I have 2 methods the first one works but not the second doesn't... How could i fix it ?
The first one works
df_final=(df_test
.sort_values(by='id')
.assign(diffe=df_test['date2']- df_test['date1']
)
)
and
test=df_final.loc[df_final['diffe']>=pd.Timedelta(100, 'D')]
test['diffe'].mean()
Second method
df_final=(df_test
.sort_values(by='id') # Classe les numeros sont dans l'ordre
.assign(diffe=df_test['date2']- df_test['date1']
)
.loc[df_reservation_delay['diffe']>=pd.Timedelta(100, 'D')]
.mean()
)
I have an error (KeyError: 'diffe') ? Do you have an idea ?
Have a nice day
CodePudding user response:
You need to use a callable as your column does not exist yet.
Also, better explicitly provide the column name in loc
to avoid a FutureWarning
FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
df_final=(df_test
.sort_values(by='id') # Classe les numeros sont dans l'ordre
.assign(diffe=df_test['date2']- df_test['date1']
)
.loc[lambda d: d['diffe']>=pd.Timedelta(100, 'D'), 'diffe']
.mean()
)
output: Timedelta('169 days 09:00:00')