Home > Back-end >  Compare timestamps from a column and put 1 to another column for the latest timestamp in pandas
Compare timestamps from a column and put 1 to another column for the latest timestamp in pandas

Time:12-11

Lets say i have a dataframe like this:

     col1              col2
0    data1       2021-03-21 19:58:09
1    data1       2021-03-22 19:58:09
2    data1       2021-03-23 19:58:09
3    data2       2021-03-20 19:58:09
4    data2       2021-03-21 19:58:09
5    data2       2021-03-22 19:58:09
6    data3       2021-03-19 19:58:09
7    data3       2021-03-20 19:58:09
8    data1       2021-03-24 19:58:09

How can i compare for example all the timestamps in col2 from data1 in col1 and create another column where i put 1 for the latest timestamp and 0 for the others:

     col1              col2                latest
0    data1       2021-03-21 19:58:09          0
1    data1       2021-03-22 19:58:09          0
2    data1       2021-03-23 19:58:09          0
3    data2       2021-03-20 19:58:09          0
4    data2       2021-03-21 19:58:09          0
5    data2       2021-03-22 19:58:09          1
6    data3       2021-03-19 19:58:09          0
7    data3       2021-03-20 19:58:09          1
8    data1       2021-03-24 19:58:09          1

CodePudding user response:

Do a groupby transform, and compare the values with the max:

df['col2'] = pd.to_datetime(df['col2'])
latest = df.col2 == df.groupby('col1', sort = False).col2.transform('max')
df.assign(latest = latest.astype(int))
 
    col1                col2  latest
0  data1 2021-03-21 19:58:09       0
1  data1 2021-03-22 19:58:09       0
2  data1 2021-03-23 19:58:09       0
3  data2 2021-03-20 19:58:09       0
4  data2 2021-03-21 19:58:09       0
5  data2 2021-03-22 19:58:09       1
6  data3 2021-03-19 19:58:09       0
7  data3 2021-03-20 19:58:09       1
8  data1 2021-03-24 19:58:09       1

CodePudding user response:

This should work

data.loc[data.groupby("col1")["col2"].tail(1).index, "latest"] = 1
data.fillna(0, inplace = True)

Or in one line, inspired by the other answer.

df.assign(latest=lambda d: (d["col2"] == d.groupby('col1')["col2"].transform('max')).astype(int))
  • Related