I have the following pandas dataframe:
import pandas as pd
foo = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2],
'col_a': [0,1,1,0,1,1,1,0,1,1]})
I would like to create a new column (col_a_new) which is the same as col_a, except that the first of every 2 consecutive 1s in col_a is replaced with 0, within each id.
The resulting dataframe looks like this:
foo = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2],
'col_a': [0,1,1,0,1,1,1,0,1,1],
'col_a_new': [0,0,1,0,1,0,1,0,0,1]})
Any ideas?
CodePudding user response:
import numpy as np

# group by id and by runs of consecutive equal values (0s or 1s) in col_a
group = foo.groupby(["id", foo["col_a"].ne(foo["col_a"].shift()).cumsum()])
# get cumcount and count of groups
foo_cumcount = group.cumcount()
foo_count = group.col_a.transform(len)
# set the first 1 of every run of two or more 1s to zero, otherwise keep the original value
foo["col_a_new"] = np.where(foo_cumcount.eq(0)
                            & foo_count.gt(1)
                            & foo.col_a.eq(1),
                            0, foo.col_a)
# result
   id  col_a  col_a_new
0   1      0          0
1   1      1          0
2   1      1          1
3   1      0          0
4   1      1          1
5   2      1          0
6   2      1          1
7   2      0          0
8   2      1          0
9   2      1          1
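To see why the grouping key above works, it may help to look at the intermediate values. This is only a sketch on the question's data; the names run_id and run_len are made up for illustration and are not part of the answer.

```python
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                    'col_a': [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]})

# A new run starts wherever col_a differs from the previous row;
# the cumulative sum of those change flags gives each run its own label.
# (run_id is an illustrative name, not from the answer.)
run_id = foo["col_a"].ne(foo["col_a"].shift()).cumsum()

# Rows 4 and 5 are both 1, so they get the same run label even though
# they belong to different ids. Adding "id" to the key splits them apart.
run_len = foo.groupby(["id", run_id])["col_a"].transform("size")

print(pd.DataFrame({"id": foo["id"], "col_a": foo["col_a"],
                    "run_id": run_id, "run_len": run_len}))
```

Only rows that are the first element of their run, have col_a equal to 1, and sit in a run longer than 1 end up being set to 0.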
CodePudding user response:
Another approach: group by id and build the new values with a conditional list comprehension that looks at the next element.
(foo.groupby("id").col_a
 .transform(lambda series: [0 if i < len(series) - 1
                            and series.tolist()[i + 1] == 1
                            else x for i, x in enumerate(series.tolist())]))
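The same "look at the next row" idea can probably be written without a Python-level loop by shifting col_a inside each id. This is a sketch, not the answerer's code; next_a is a name introduced here for illustration.

```python
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                    'col_a': [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]})

# Next value of col_a within the same id (NaN on the last row of each id).
# next_a is an illustrative name, not from the original answer.
next_a = foo.groupby("id")["col_a"].shift(-1)

# Replace a 1 with 0 when the following row of the same id is also 1;
# every other value is kept unchanged.
foo["col_a_new"] = foo["col_a"].mask(foo["col_a"].eq(1) & next_a.eq(1), 0)

print(foo)
```

Note that, like the lambda version, this zeroes every 1 that is followed by another 1, so in a run of three or more 1s only the last one survives. The first answer zeroes only the first 1 of such a run. On the question's data, where runs have at most two 1s, both give the same result.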