How to substitute with 0 the first of 2 consecutive 1s in a pandas column, with groupby

I have the following pandas dataframe

import pandas as pd
foo = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2],
                    'col_a': [0,1,1,0,1,1,1,0,1,1]})

I would like to create a new column (col_a_new) that is the same as col_a, except that the first of every 2 consecutive 1s in col_a is replaced with 0. This should be done separately for each id.

The resulting dataframe should look like this:

foo = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2],
                    'col_a': [0,1,1,0,1,1,1,0,1,1],
                    'col_a_new': [0,0,1,0,1,0,1,0,0,1]})

Any ideas?

CodePudding user response：

import numpy as np

# group by id and by runs of consecutive equal values in col_a
group = foo.groupby(["id", foo["col_a"].ne(foo["col_a"].shift()).cumsum()])

# position of each row within its run, and the length of that run
foo_cumcount = group.cumcount()
foo_count = group.col_a.transform(len)

# set the first 1 of every run of two or more 1s to zero, otherwise keep the original value
foo["col_a_new"] = np.where(foo_cumcount.eq(0)
                            & foo_count.gt(1)
                            & foo.col_a.eq(1),
                            0, foo.col_a)
# result
   id  col_a  col_a_new
0   1      0          0
1   1      1          0
2   1      1          1
3   1      0          0
4   1      1          1
5   2      1          0
6   2      1          1
7   2      0          0
8   2      1          0
9   2      1          1
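
To see what the grouping key does, it can help to look at the run labels on their own. This is only a sketch on the foo from the question. The names run_id and run_sizes are made up for illustration and are not part of the answer above.

```python
import pandas as pd

foo = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2],
                    'col_a': [0,1,1,0,1,1,1,0,1,1]})

# a new label starts whenever col_a differs from the previous row
run_id = foo["col_a"].ne(foo["col_a"].shift()).cumsum()

# length of each run, counted separately per id
run_sizes = foo.groupby(["id", run_id])["col_a"].transform("size")

print(pd.DataFrame({"id": foo["id"], "col_a": foo["col_a"],
                    "run_id": run_id, "run_size": run_sizes}))
```

The last row of id 1 and the first two rows of id 2 get the same run label, because shift() does not know about id. Including "id" in the groupby key is what keeps those rows in separate groups.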

CodePudding user response：

Another approach: group by id only and set a value to 0 whenever the next value in the same group is 1.

(foo.groupby("id").col_a
    .transform(lambda series: [0 if i < len(series) - 1
                               and series.tolist()[i + 1] == 1
                               else x for i, x in enumerate(series.tolist())]))
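
The same "look at the next value within the id" rule can probably also be written without a Python-level loop, using a grouped shift. This is a sketch, not the answerer's code, and next_val is a made-up name. It assumes the rule is "zero out any value that is followed by a 1 in the same id".

```python
import pandas as pd

foo = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2],
                    'col_a': [0,1,1,0,1,1,1,0,1,1]})

# next value of col_a within the same id (NaN on the last row of each id)
next_val = foo.groupby("id")["col_a"].shift(-1)

# replace with 0 every value whose successor in the same id is 1
foo["col_a_new"] = foo["col_a"].mask(next_val.eq(1), 0)

print(foo)
```

On the sample data this gives the requested col_a_new. Like the lambda version, it would zero the first two values of a run of three 1s, whereas the run-based answer zeroes only the first, so the two approaches agree only when runs of 1s are at most two long.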