I have a data frame that has one column which might have more than one date values. I only want to keep max value among the list. FYI: row 5 has 2 dates in it and I only want one date(no time)
Data:
date | calls | pres | pat
2021-08-17 13:20:23 | 1 | 1 | 1
2021-08-13 13:47:16 | 0 | 1 | 1
2021-08-16 11:14:08 | 1 | 1 | 1
2021-08-10 20:35:33 | 1 | 1 | 1
2021-08-31 14:32:13 | 2021-08-31 14:30:40 | 0 | 2 | 1
I am trying this:
y = df_pvt['date']
for i in range(len(y)):
try:
pick_date_i = y[i] = pd.to_datetime(y[i])
pick_date_i = pick_date_i.max()
except KeyError as e:
pass
pick_date_i = pick_date_i.max()
But its not working, how can I correct it? I am expecting only one max date in one row.
CodePudding user response:
Idea is split values by |
, remove possible spaces and convert to datetetimes, last use max
:
print (df_pvt)
date calls pres pat
0 2021-08-17 13:20:23 1 1 1
1 2021-08-13 13:47:16 0 1 1
2 2021-08-16 11:14:08 1 1 1
3 2021-08-10 20:35:33 1 1 1
4 2021-08-31 14:32:13 | 2021-08-31 14:30:40 0 2 1
f = lambda x: pd.to_datetime([y.strip() for y in x.split('|')]).max()
df_pvt['date'] = df_pvt['date'].apply(f)
print (df_pvt)
date calls pres pat
0 2021-08-17 13:20:23 1 1 1
1 2021-08-13 13:47:16 0 1 1
2 2021-08-16 11:14:08 1 1 1
3 2021-08-10 20:35:33 1 1 1
4 2021-08-31 14:32:13 0 2 1