I have a pandas dataframe that looks like this:
import pandas as pd
d = {'date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10']}
df = pd.DataFrame(data=d)
df
date
0 2021-01-01
1 2021-01-02
2 2021-01-03
3 2021-01-04
4 2021-01-05
5 2021-01-06
6 2021-01-07
7 2021-01-08
8 2021-01-09
9 2021-01-10
I would like to add a new column time to this df that marks when an event starts, with time = 0 on the start date. Rows after the start date should increment and rows before it should decrement. For example, if an event starts on '2021-01-06', my desired output is this:
date time
0 2021-01-01 -5
1 2021-01-02 -4
2 2021-01-03 -3
3 2021-01-04 -2
4 2021-01-05 -1
5 2021-01-06 0
6 2021-01-07 1
7 2021-01-08 2
8 2021-01-09 3
9 2021-01-10 4
I know I can set time = 0 for 2021-01-06 with the code below, but I can't work out how to increment and decrement the numbers from there. Any ideas?
df.loc[df['date'] == '2021-01-06', 'time'] = 0
CodePudding user response:
Parse the dates, subtract the start date, and take the number of days:
df['out'] = (pd.to_datetime(df.date) - pd.to_datetime('2021-01-06')).dt.days
Out[20]:
0 -5
1 -4
2 -3
3 -2
4 -1
5 0
6 1
7 2
8 3
9 4
Name: date, dtype: int64
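One thing worth knowing about this approach: it counts calendar days, not rows. A minimal sketch of the difference, assuming the same ten dates as the question (the `gappy` frame with one date dropped is made up for illustration):

```python
import pandas as pd

# Same ten consecutive dates as in the question, stored as strings.
df = pd.DataFrame({"date": [f"2021-01-{day:02d}" for day in range(1, 11)]})

start = pd.Timestamp("2021-01-06")  # event start date from the question

# Parse the strings, subtract the start date, and convert the resulting
# Timedelta values into whole-day integers with .dt.days.
df["time"] = (pd.to_datetime(df["date"]) - start).dt.days
print(df["time"].tolist())     # [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]

# Hypothetical variant with 2021-01-08 missing: the counter jumps from 1 to 3
# because the difference is measured in days, not in row positions.
gappy = df[df["date"] != "2021-01-08"].copy()
gappy["time"] = (pd.to_datetime(gappy["date"]) - start).dt.days
print(gappy["time"].tolist())  # [-5, -4, -3, -2, -1, 0, 1, 3, 4]
```

If you want consecutive integers even when dates are missing, a position-based approach is probably a better fit.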
CodePudding user response:
You can use cumcount() to get the expected output:
import pandas as pd
d = {'date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10']}
df = pd.DataFrame(data=d)
df['Control'] = 1
date_lookback_location = df.loc[df['date'] == '2021-01-06'].index.tolist()[0]
df['time'] = df.sort_values(['date'], ascending=True).groupby(['Control']).cumcount() - date_lookback_location
df[['date', 'time']]
CodePudding user response:
If your date column has no duplicates, you can try:
df['time'] = range(len(df))
df['time'] = df['time'] - df.set_index('date').loc['2021-01-06', 'time']
print(df)
date time
0 2021-01-01 -5
1 2021-01-02 -4
2 2021-01-03 -3
3 2021-01-04 -2
4 2021-01-05 -1
5 2021-01-06 0
6 2021-01-07 1
7 2021-01-08 2
8 2021-01-09 3
9 2021-01-10 4
Or
df['time'] = df.index.values - df['date'].tolist().index('2021-01-06')
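The one-liner above subtracts from df.index.values, so it relies on the default 0..n-1 index. A minimal sketch that works from row positions instead, assuming the rows are already sorted by date (the letter index here is made up to show that the index labels don't matter):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"date": [f"2021-01-{day:02d}" for day in range(1, 11)]},
    index=list("abcdefghij"),  # hypothetical non-default index
)

# Position (not label) of the first row matching the start date.
# This raises IndexError if the date is not present in the column.
matches = (df["date"] == "2021-01-06").to_numpy()
start_pos = int(np.flatnonzero(matches)[0])

# Row positions 0..n-1 shifted so that the start row becomes 0.
df["time"] = np.arange(len(df)) - start_pos
print(df["time"].tolist())  # [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
```

If the date column can contain duplicates, this takes the first matching row as the start.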