Set incrementing value of one column based on value in another column in pandas dataframe

Time:05-26

I have a pandas dataframe that looks like this:

import pandas as pd
d = {'date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10']}
df = pd.DataFrame(data=d)
df

    date
0   2021-01-01
1   2021-01-02
2   2021-01-03
3   2021-01-04
4   2021-01-05
5   2021-01-06
6   2021-01-07
7   2021-01-08
8   2021-01-09
9   2021-01-10

I would like to add a new column, time, to this df that marks the start of an event with time = 0. Rows after the start date should increment and rows before it should decrement. For example, if an event starts on '2021-01-06', my desired output is this:

    date        time
0   2021-01-01  -5
1   2021-01-02  -4
2   2021-01-03  -3
3   2021-01-04  -2
4   2021-01-05  -1
5   2021-01-06   0
6   2021-01-07   1
7   2021-01-08   2
8   2021-01-09   3
9   2021-01-10   4

I know I can set time = 0 for 2021-01-06 with the code below, but I am struggling with how to increment and decrement the numbers from there. Any ideas?

df.loc[df['date'] == '2021-01-06', 'time'] = 0

CodePudding user response:

Convert the date column to datetime, subtract the start date, and take the difference in days:

df['out'] = (pd.to_datetime(df.date) - pd.to_datetime('2021-01-06')).dt.days
df['out']
Out[20]: 
0   -5
1   -4
2   -3
3   -2
4   -1
5    0
6    1
7    2
8    3
9    4
Name: out, dtype: int64
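
One thing worth knowing about the date-difference approach: it counts calendar days, not rows. The sketch below wraps the same expression in a helper (`days_from_start` is a hypothetical name, not part of pandas) and runs it on dates with a gap, which is an assumption made only for illustration.

```python
import pandas as pd

def days_from_start(dates, start):
    """Signed number of calendar days between each date and `start`."""
    return (pd.to_datetime(dates) - pd.to_datetime(start)).dt.days

# Illustrative data with a gap between 2021-01-06 and 2021-01-09
df = pd.DataFrame({'date': ['2021-01-04', '2021-01-05', '2021-01-06', '2021-01-09']})
df['time'] = days_from_start(df['date'], '2021-01-06')
print(df)
```

Here the last row gets time = 3 rather than 1, because it is three days after the start date. If you want a pure row counter instead, one of the position-based answers is a better fit.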

CodePudding user response:

You can use groupby() with cumcount() to get the expected output:

import pandas as pd
d = {'date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10']}
df = pd.DataFrame(data=d)
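# Constant helper column so that all rows fall into a single group
df['Control'] = 1
# Index label of the start date (equal to its row position with the default index)
date_lookback_location = df.loc[df['date'] == '2021-01-06'].index.tolist()[0]
# Number the rows 0..n-1 in date order, then shift so the start date gets 0
df['time'] = df.sort_values(['date'], ascending=True).groupby(['Control']).cumcount() - date_lookback_location
df[['date', 'time']]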

CodePudding user response:

If your date column has no duplicates, you can try

df['time'] = range(len(df))
df['time'] = df['time'] - df.set_index('date').loc['2021-01-06', 'time']
print(df)

         date  time
0  2021-01-01    -5
1  2021-01-02    -4
2  2021-01-03    -3
3  2021-01-04    -2
4  2021-01-05    -1
5  2021-01-06     0
6  2021-01-07     1
7  2021-01-08     2
8  2021-01-09     3
9  2021-01-10     4

Or, if df has the default integer index:

df['time'] = df.index.values - df['date'].tolist().index('2021-01-06')
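
The one-liner above relies on `df.index.values` being 0, 1, 2, and so on. As a minimal sketch for the case where it is not, the version below works from row positions instead of index labels. The letter index is an assumption made only for illustration, and the dates are assumed to be sorted and unique.

```python
import numpy as np
import pandas as pd

# Same dates as in the question, but with a non-default index (illustrative assumption)
dates = pd.date_range('2021-01-01', periods=10).strftime('%Y-%m-%d').tolist()
df = pd.DataFrame({'date': dates}, index=list('abcdefghij'))

# Row position of the start date; get_loc returns a plain int only when the dates are unique
start_pos = pd.Index(df['date']).get_loc('2021-01-06')

# Row positions 0..n-1 minus the start position, independent of the index labels
df['time'] = np.arange(len(df)) - start_pos
print(df)
```

`get_loc` raises a `KeyError` if the start date is not present, so a missing date fails loudly instead of producing wrong offsets.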