Lets say i have a dataframe like this:
col1 col2
0 data1 2021-03-21 19:58:09
1 data1 2021-03-22 19:58:09
2 data1 2021-03-23 19:58:09
3 data2 2021-03-20 19:58:09
4 data2 2021-03-21 19:58:09
5 data2 2021-03-22 19:58:09
6 data3 2021-03-19 19:58:09
7 data3 2021-03-20 19:58:09
8 data1 2021-03-24 19:58:09
How can i compare for example all the timestamps in col2 from data1 in col1 and create another column where i put 1 for the latest timestamp and 0 for the others:
col1 col2 latest
0 data1 2021-03-21 19:58:09 0
1 data1 2021-03-22 19:58:09 0
2 data1 2021-03-23 19:58:09 0
3 data2 2021-03-20 19:58:09 0
4 data2 2021-03-21 19:58:09 0
5 data2 2021-03-22 19:58:09 1
6 data3 2021-03-19 19:58:09 0
7 data3 2021-03-20 19:58:09 1
8 data1 2021-03-24 19:58:09 1
CodePudding user response:
Do a groupby
transform
, and compare the values with the max:
df['col2'] = pd.to_datetime(df['col2'])
latest = df.col2 == df.groupby('col1', sort = False).col2.transform('max')
df.assign(latest = latest.astype(int))
col1 col2 latest
0 data1 2021-03-21 19:58:09 0
1 data1 2021-03-22 19:58:09 0
2 data1 2021-03-23 19:58:09 0
3 data2 2021-03-20 19:58:09 0
4 data2 2021-03-21 19:58:09 0
5 data2 2021-03-22 19:58:09 1
6 data3 2021-03-19 19:58:09 0
7 data3 2021-03-20 19:58:09 1
8 data1 2021-03-24 19:58:09 1
CodePudding user response:
This should work
data.loc[data.groupby("col1")["col2"].tail(1).index, "latest"] = 1
data.fillna(0, inplace = True)
Or in one line, inspired by the other answer.
df.assign(latest=lambda d: (d["col2"] == d.groupby('col1')["col2"].transform('max')).astype(int))