I have a dataframe with time series data, as follows:
time user location reference
2/15/2022 22:09 u1 1 no
2/15/2022 23:09 u1 2 yes
2/16/2022 06:09 u1 2 yes
2/16/2022 07:09 u1 1 no
2/16/2022 22:08 u2 1 no
2/16/2022 23:08 u2 4 yes
2/16/2022 23:58 u2 2 no
And what i want to do is create 2 new columns to extract the location visited before and after visiting the reference location, for each user, for all days Output:
time user location reference before_reference after_reference
2/15/2022 22:09 u1 1 no yes no
2/15/2022 23:09 u1 2 yes no no
2/16/2022 06:09 u1 2 yes no no
2/16/2022 07:09 u1 1 no no yes
2/16/2022 22:08 u2 1 no yes no
2/16/2022 23:08 u2 4 yes no no
2/16/2022 23:58 u2 2 no no yes
Note that the users may be mixed and not followed!
CodePudding user response:
IIUC, you could perform a groupby
shift
:
# first ensure values are sorted by time
df['time'] = pd.to_datetime(df['time'])
df = df.sort_values(by=['user', 'time'])
# use booleans
s = df['reference'].eq('yes')
# group by user
g = s.groupby(df['user'])
# compute the shifts and exclude reference == yes, then remap to yes/no
d = {True: 'yes', False: 'no'}
df['before_reference'] = (~s & g.shift(-1, fill_value=False)).map(d)
df['after_reference'] = (~s & g.shift(fill_value=False)).map(d)
output:
time user location reference before_reference after_reference
0 2/15/2022 22:09 u1 1 no yes no
1 2/15/2022 23:09 u1 2 yes no no
2 2/16/2022 06:09 u1 2 yes no no
3 2/16/2022 07:09 u1 1 no no yes
4 2/16/2022 22:08 u2 1 no yes no
5 2/16/2022 23:08 u2 4 yes no no
6 2/16/2022 23:58 u2 2 no no yes