Let's say, I want to identify traders who only traded during bull runs but did not trade (zero values) during downturns or stable periods. Let's say we have two bull runs, 2018Q4
, 2021Q4
. Below, D
starts trading only from 2021Q4
(the second bull run period) but I want to include this as =1 well.
This is my df.
id date value other variables..
A 2019Q4 2
A 2020Q4 2
A 2021Q4 3
B 2018Q4 2
B 2019Q4 0
B 2020Q4 0
B 2021Q4 4
C 2020Q4 3
C 2021Q4 4
D 2021Q4 4
E 2018Q4 3
E 2019Q4 0
E 2020Q4 0
E 2021Q4 2
. .
desired output would be
id dummy
A 0
B 1
C 0
D 1
E 1
. .
CodePudding user response:
You can test if both values not equal 0
and test both quarters, compare (thanks mozway for improvement) and last aggregate GroupBy.all
for test if all True
s per groups:
m1 = df['value'].ne(0)
m2 = df['date'].isin(['2018Q4','2021Q4'])
df1 = (m1 == m2).groupby(df['id']).all().astype(int).reset_index(name='dummy')
print (df1)
id dummy
0 A 0
1 B 1
2 C 0
3 D 1
4 E 1