Home > Software design >  Add a new column having a dummy variable for complete group based on a condition
Add a new column having a dummy variable for complete group based on a condition

Time:05-22

I have the following dataframe:

df = pd.DataFrame({"id": ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C','C','C','C'], 
            "date": [2015, 2016,  2017, 2018,
                     2015, 2016,  2017, 2018, 
                     2015, 2016,  2017, 2018],
            "col_1": [1,1,1,0,1,0,1,1,0,1,1,1]})

I want to add a new col that should be a dummy 1 for a group if "col_1" has three consecutive 1s. The resulting data-frame should be:

df_new = pd.DataFrame({"id": ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C','C','C','C'  ], 
            "date": [2015, 2016,  2017, 2018,
                     2015, 2016,  2017, 2018, 
                     2015, 2016,  2017, 2018],                
            "col_1": [1,1,1,0,1,0,1,1,0,1,1,1],
            "col_2": [1,1,1,1,0,0,0,0,1,1,1,1]})

CodePudding user response:

Try this

df = pd.DataFrame({"id": ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C','C','C','C'], 
                   "date": [2015, 2016,  2017, 2018, 2015, 2016,  2017, 2018, 2015, 2016,  2017, 2018],
                   "col_1": [1,1,1,0,1,0,1,1,0,1,1,1]})

# identify 3 consecutive 1s
v = df.groupby('id')['col_1'].rolling(3).sum().eq(3)
# identify which ids contain consecutive 1s
ids = next(zip(*v[v].index))
# assign new column
df['col_2'] = df['id'].isin(ids).astype(int)
print(df)
   id  date  col_1  col_2
0   A  2015      1      1
1   A  2016      1      1
2   A  2017      1      1
3   A  2018      0      1
4   B  2015      1      0
5   B  2016      0      0
6   B  2017      1      0
7   B  2018      1      0
8   C  2015      0      1
9   C  2016      1      1
10  C  2017      1      1
11  C  2018      1      1

CodePudding user response:

You can do it like this also.

df['col_2'] = (df.groupby('id')['col_1']
                 .transform(lambda x: x.rolling(3).sum().eq(3).any())
                 .astype(int))
df

Output:

   id  date  col_1  col_2
0   A  2015      1      1
1   A  2016      1      1
2   A  2017      1      1
3   A  2018      0      1
4   B  2015      1      0
5   B  2016      0      0
6   B  2017      1      0
7   B  2018      1      0
8   C  2015      0      1
9   C  2016      1      1
10  C  2017      1      1
11  C  2018      1      1
  • Related