Home > OS >  set t=0 if the values from two columns are equal, and numerate -t, t
set t=0 if the values from two columns are equal, and numerate -t, t

Time:08-20

I have panel data and an event date, i want to create an additional column "t" where if the date and the event date are equal, t=0, and any periods before is -1, -2, ... to the very first period and 1, 2, ... to the past period per unit

my df

id   date   eventdate
A    2020Q1 2020Q3
A    2020Q2 2020Q3
A    2020Q3 2020Q3
A    2020Q4 2020Q3
B    2019Q1 2019Q2
B    2019Q2 2019Q2
B    2019Q3 2019Q2
B    2019Q4 2019Q2
...

desired df

id   date   eventdate   t
A    2020Q1 2020Q3      -2
A    2020Q2 2020Q3      -1
A    2020Q3 2020Q3       0
A    2020Q4 2020Q3       1
B    2019Q1 2019Q2      -1
B    2019Q2 2019Q2       0
B    2019Q3 2019Q2       1
B    2019Q4 2019Q2       2
...

thanks.

CodePudding user response:

I think the correct way would be to parse them as datetime objects and calculate their difference. It feels like the more natural and intended way when working with DateTime objects:

df['date'] = pd.to_datetime(df['date'],infer_datetime_format=True).dt.to_period('Q')
df['eventdate'] = pd.to_datetime(df['eventdate'],infer_datetime_format=True).dt.to_period('Q')

Calculating the difference would be simply getting the difference between one date and the other.

df['diff'] = (df['date'] - df['eventdate'])

However given the format is QuarterEnd, then you should do one final fix to extract the correct format:

df['t'] = (df['date'] - df['eventdate']).map(lambda x: x.n)

Returning:

  id    date eventdate  t
0  A  2020Q1    2020Q3 -2
1  A  2020Q2    2020Q3 -1
2  A  2020Q3    2020Q3  0
3  A  2020Q4    2020Q3  1
4  B  2019Q1    2019Q2 -1
5  B  2019Q2    2019Q2  0
6  B  2019Q3    2019Q2  1
7  B  2019Q4    2019Q2  2

CodePudding user response:

Convert values to quarter periods by using PeriodIndex and then subtract them. Finally, you will need to convert MonthEnd to integer:

df['diff'] = (pd.PeriodIndex(df['date'], freq='Q') - 
              pd.PeriodIndex(df['eventdate'], freq='Q')).map(lambda x: x.n)
print (df)
  id    date eventdate  diff
0  A  2020Q1    2020Q3    -2
1  A  2020Q2    2020Q3    -1
2  A  2020Q3    2020Q3     0
3  A  2020Q4    2020Q3     1
4  B  2019Q1    2019Q2    -1
5  B  2019Q2    2019Q2     0
6  B  2019Q3    2019Q2     1
7  B  2019Q4    2019Q2     2

Similar solution can be achieved by previously converting the quarter periods to integers:

df['diff'] = (pd.PeriodIndex(df['date'], freq='Q').astype('int') - 
              pd.PeriodIndex(df['eventdate'], freq='Q').astype('int'))
print (df)
  id    date eventdate  diff
0  A  2020Q1    2020Q3    -2
1  A  2020Q2    2020Q3    -1
2  A  2020Q3    2020Q3     0
3  A  2020Q4    2020Q3     1
4  B  2019Q1    2019Q2    -1
5  B  2019Q2    2019Q2     0
6  B  2019Q3    2019Q2     1
7  B  2019Q4    2019Q2     2
  • Related