I have the following panel dataset. "winner" =1 if in period (date), someone is a winner, zero if loser.
ID date winner
A 2017Q4 NaN
A 2018Q4 1
A 2019Q4 0
A 2020Q4 0
A 2021Q4 1
B 2017Q4 NaN
B 2018Q4 1
B 2019Q4 1
B 2020Q4 0
B 2021Q4 0
C 2017Q4 NaN
C 2018Q4 0
C 2019Q4 0
C 2020Q4 0
C 2021Q4 0
D 2017Q4 NaN
D 2018Q4 0
D 2019Q4 1
D 2020Q4 1
D 2021Q4 1
I want to create four dummy variables, WW =1 if someone is winner in two consecutive periods. LL=1 if loser in two consecutive periods. WL if winner in period 1 and loser the next period, and LW vice versa.
UPDATE
when i apply the answers below i get the following
ID date winner WW LL WL LW
A 2017Q4 NaN
A 2018Q4 1 0 0 0 0
A 2019Q4 0 0 0 1 0
A 2020Q4 0 0 1 0 0
A 2021Q4 1 0 0 0 1
B 2017Q4 NaN
B 2018Q4 1 0 0 0 0
B 2019Q4 1 1 0 0 0
B 2020Q4 0 0 0 1 0
B 2021Q4 0 0 1 0 0
C 2017Q4 NaN
C 2018Q4 0 0 0 0 0
C 2019Q4 0 0 1 0 0
C 2020Q4 0 0 1 0 0
C 2021Q4 0 0 1 0 0
D 2017Q4 NaN
D 2018Q4 0 0 0 0 0
D 2019Q4 1 0 0 0 1
D 2020Q4 1 1 0 0 0
D 2021Q4 1 1 0 0 0
How do i make sure I get the NaN when the previous value is NaN? desired output
ID date winner WW LL WL LW
A 2017Q4 NaN
A 2018Q4 1 NaN NaN NaN NaN
A 2019Q4 0 0 0 1 0
A 2020Q4 0 0 1 0 0
A 2021Q4 1 0 0 0 1
B 2017Q4 NaN
B 2018Q4 1 NaN NaN NaN NaN
B 2019Q4 1 1 0 0 0
B 2020Q4 0 0 0 1 0
B 2021Q4 0 0 1 0 0
C 2017Q4 NaN
C 2018Q4 0 NaN NaN NaN NaN
C 2019Q4 0 0 1 0 0
C 2020Q4 0 0 1 0 0
C 2021Q4 0 0 1 0 0
D 2017Q4 NaN
D 2018Q4 0 NaN NaN NaN NaN
D 2019Q4 1 0 0 0 1
D 2020Q4 1 1 0 0 0
D 2021Q4 1 1 0 0 0
How to do this in the most simple way?
CodePudding user response:
Here's one way: Use groupby.shift
to get the previous record; then use numpy.select
to assign values, which you use get_dummies
to convert to dummy variables:
import numpy as np
df['previous'] = df.groupby('ID')['winner'].shift()
tmp = df[['previous','winner']]
dummy_vars = ['WW','LL','WL', 'LW']
out = (df.join(pd.get_dummies(np.select([tmp.eq(1).all(1),
tmp.eq(0).all(1),
tmp.eq([1,0]).all(1),
tmp.eq([0,1]).all(1)],
dummy_vars, ''))[dummy_vars ['']]
.mask(df['previous'].isna(), ''))
.drop(columns=['previous','']))
Output:
ID date winner WW LL WL LW
0 A 2018Q4 1
1 A 2019Q4 0 0 0 1 0
2 A 2020Q4 0 0 1 0 0
3 A 2021Q4 1 0 0 0 1
4 B 2018Q4 1
5 B 2019Q4 1 1 0 0 0
6 B 2020Q4 0 0 0 1 0
7 B 2021Q4 0 0 1 0 0
8 C 2018Q4 0
9 C 2019Q4 0 0 1 0 0
10 C 2020Q4 0 0 1 0 0
11 C 2021Q4 0 0 1 0 0
12 D 2018Q4 0
13 D 2019Q4 1 0 0 0 1
14 D 2020Q4 1 1 0 0 0
15 D 2021Q4 1 1 0 0 0
CodePudding user response:
map
1 and 0 to "W" and "L"- get the 2-period streak
get_dummies
for the "streak"join
to original DataFrame ignoring the first row of each ID
wins = df["winner"].fillna(0).map({1:"W",0:"L"})
streaks = wins.shift() wins
other = pd.get_dummies(streaks.where(df["ID"].eq(df["ID"].shift())))
output = df.join(other.where(df["ID"].duplicated()&df["winner"].shift().notna()))
>>> output
ID date winner LL LW WL WW
0 A 2017Q4 NaN NaN NaN NaN NaN
1 A 2018Q4 1.0 NaN NaN NaN NaN
2 A 2019Q4 0.0 0.0 0.0 1.0 0.0
3 A 2020Q4 0.0 1.0 0.0 0.0 0.0
4 A 2021Q4 1.0 0.0 1.0 0.0 0.0
5 B 2017Q4 NaN NaN NaN NaN NaN
6 B 2018Q4 1.0 NaN NaN NaN NaN
7 B 2019Q4 1.0 0.0 0.0 0.0 1.0
8 B 2020Q4 0.0 0.0 0.0 1.0 0.0
9 B 2021Q4 0.0 1.0 0.0 0.0 0.0
10 C 2017Q4 NaN NaN NaN NaN NaN
11 C 2018Q4 0.0 NaN NaN NaN NaN
12 C 2019Q4 0.0 1.0 0.0 0.0 0.0
13 C 2020Q4 0.0 1.0 0.0 0.0 0.0
14 C 2021Q4 0.0 1.0 0.0 0.0 0.0
15 D 2017Q4 NaN NaN NaN NaN NaN
16 D 2018Q4 0.0 NaN NaN NaN NaN
17 D 2019Q4 1.0 0.0 1.0 0.0 0.0
18 D 2020Q4 1.0 0.0 0.0 0.0 1.0
19 D 2021Q4 1.0 0.0 0.0 0.0 1.0