I have panel data and an event date, i want to create an additional column "t" where if the date and the event date are equal, t=0, and any periods before is -1, -2, ... to the very first period and 1, 2, ... to the past period per unit
my df
id date eventdate
A 2020Q1 2020Q3
A 2020Q2 2020Q3
A 2020Q3 2020Q3
A 2020Q4 2020Q3
B 2019Q1 2019Q2
B 2019Q2 2019Q2
B 2019Q3 2019Q2
B 2019Q4 2019Q2
...
desired df
id date eventdate t
A 2020Q1 2020Q3 -2
A 2020Q2 2020Q3 -1
A 2020Q3 2020Q3 0
A 2020Q4 2020Q3 1
B 2019Q1 2019Q2 -1
B 2019Q2 2019Q2 0
B 2019Q3 2019Q2 1
B 2019Q4 2019Q2 2
...
thanks.
CodePudding user response:
I think the correct way would be to parse them as datetime objects and calculate their difference. It feels like the more natural and intended way when working with DateTime objects:
df['date'] = pd.to_datetime(df['date'],infer_datetime_format=True).dt.to_period('Q')
df['eventdate'] = pd.to_datetime(df['eventdate'],infer_datetime_format=True).dt.to_period('Q')
Calculating the difference would be simply getting the difference between one date and the other.
df['diff'] = (df['date'] - df['eventdate'])
However given the format is QuarterEnd
, then you should do one final fix to extract the correct format:
df['t'] = (df['date'] - df['eventdate']).map(lambda x: x.n)
Returning:
id date eventdate t
0 A 2020Q1 2020Q3 -2
1 A 2020Q2 2020Q3 -1
2 A 2020Q3 2020Q3 0
3 A 2020Q4 2020Q3 1
4 B 2019Q1 2019Q2 -1
5 B 2019Q2 2019Q2 0
6 B 2019Q3 2019Q2 1
7 B 2019Q4 2019Q2 2
CodePudding user response:
Convert values to quarter periods by using PeriodIndex
and then subtract them. Finally, you will need to convert MonthEnd
to integer:
df['diff'] = (pd.PeriodIndex(df['date'], freq='Q') -
pd.PeriodIndex(df['eventdate'], freq='Q')).map(lambda x: x.n)
print (df)
id date eventdate diff
0 A 2020Q1 2020Q3 -2
1 A 2020Q2 2020Q3 -1
2 A 2020Q3 2020Q3 0
3 A 2020Q4 2020Q3 1
4 B 2019Q1 2019Q2 -1
5 B 2019Q2 2019Q2 0
6 B 2019Q3 2019Q2 1
7 B 2019Q4 2019Q2 2
Similar solution can be achieved by previously converting the quarter periods to integers:
df['diff'] = (pd.PeriodIndex(df['date'], freq='Q').astype('int') -
pd.PeriodIndex(df['eventdate'], freq='Q').astype('int'))
print (df)
id date eventdate diff
0 A 2020Q1 2020Q3 -2
1 A 2020Q2 2020Q3 -1
2 A 2020Q3 2020Q3 0
3 A 2020Q4 2020Q3 1
4 B 2019Q1 2019Q2 -1
5 B 2019Q2 2019Q2 0
6 B 2019Q3 2019Q2 1
7 B 2019Q4 2019Q2 2