I am trying to convert the first occurrence of 0 to 1 in a column of a pandas DataFrame, separately within each group of `categorical_col`. The column in question, `target_col`, contains 1, 0 and null values. The sample data is as follows:
| mask_col | categorical_col | target_col |
|---|---|---|
| TRUE | A | 1 |
| TRUE | A | 1 |
| FALSE | A | |
| TRUE | A | 0 |
| FALSE | A | |
| TRUE | A | 0 |
| TRUE | B | 1 |
| FALSE | B | |
| FALSE | B | |
| FALSE | B | |
| TRUE | B | 0 |
| FALSE | B | |
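I want rows 4 and 11 (the first 0 in groups A and B) to change to 1, and row 6 (the second 0 in group A) to stay 0. Counting from 1 here; these are index labels 3, 10 and 5 in the outputs below.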
How do I do this?
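To make the answers reproducible, the sample table can be rebuilt as a DataFrame roughly like this. It is a sketch that assumes TRUE/FALSE are Python booleans and that the empty cells ("null") are `NaN`; because of the `NaN`s, `target_col` becomes a float column, which is why the outputs show `1.0` and `0.0`.

```python
import numpy as np
import pandas as pd

# Sample data from the question; empty target_col cells are assumed to be NaN,
# which turns target_col into a float64 column.
df = pd.DataFrame({
    'mask_col': [True, True, False, True, False, True,
                 True, False, False, False, True, False],
    'categorical_col': ['A'] * 6 + ['B'] * 6,
    'target_col': [1, 1, np.nan, 0, np.nan, 0,
                   1, np.nan, np.nan, np.nan, 0, np.nan],
})
print(df)
```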
Answer 1:
To set the first 0 per group of `categorical_col` to 1, compare `target_col` with 0 and use `SeriesGroupBy.idxmax` to get the index label of the first `True` in each group:
```python
# boolean mask of the 0s; idxmax returns the label of the first True per group
df.loc[df['target_col'].eq(0).groupby(df['categorical_col']).idxmax(), 'target_col'] = 1
print(df)
```
```
    mask_col categorical_col  target_col
0       True               A         1.0
1       True               A         1.0
2      False               A         NaN
3       True               A         1.0
4      False               A         NaN
5       True               A         0.0
6       True               B         1.0
7      False               B         NaN
8      False               B         NaN
9      False               B         NaN
10      True               B         1.0
11     False               B         NaN
```
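One caveat with the `idxmax` one-liner: when a group contains no 0 at all, its comparison values are all `False`, and `idxmax` then returns the group's first label, so that row would be set to 1 as well. A minimal sketch that only touches groups which actually contain a 0 filters the 0 rows first and keeps the first row of each group with `GroupBy.head(1)`. The helper name `set_first_zero_per_group` and the extra group `C` are hypothetical, added only to show the edge case.

```python
import numpy as np
import pandas as pd


def set_first_zero_per_group(df, group_col, target_col):
    """Hypothetical helper: set the first 0 of target_col to 1 within each group.

    Groups that contain no 0 are left untouched; returns a modified copy.
    """
    zeros = df[df[target_col].eq(0)]  # NaN.eq(0) is False, so nulls are skipped
    # head(1) keeps the first 0 row of each group, with its original index label
    first_zero_idx = zeros.groupby(group_col).head(1).index
    out = df.copy()
    out.loc[first_zero_idx, target_col] = 1
    return out


# The question's sample data plus a hypothetical group 'C' that has no 0 at all
df = pd.DataFrame({
    'mask_col': [True, True, False, True, False, True,
                 True, False, False, False, True, False,
                 True, False],
    'categorical_col': ['A'] * 6 + ['B'] * 6 + ['C'] * 2,
    'target_col': [1, 1, np.nan, 0, np.nan, 0,
                   1, np.nan, np.nan, np.nan, 0, np.nan,
                   np.nan, 1],
})

result = set_first_zero_per_group(df, 'categorical_col', 'target_col')
print(result)
```

On the question's own rows this gives the same result as the one-liner (labels 3 and 10 become 1, label 5 stays 0); the difference only shows for group `C`, which is left unchanged instead of having its first row overwritten.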
Answer 2:
The logic is not fully clear, so here are two options:
Option 1
Considering the stretches of consecutive `True` values in `mask_col` per group of `categorical_col`, and assuming you want the rows in the first N stretches (here N=2) set to 1, you can use a custom `groupby.apply`:
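```python
# number the stretches of True in mask_col within each categorical_col group
vals = (df.groupby('categorical_col', group_keys=False)['mask_col']
          .apply(lambda s: s.ne(s.shift())[s].cumsum())
       )
# set target_col to 1 on the rows that belong to the first 2 stretches
df.loc[vals[vals.le(2)].index, 'target_col'] = 1
```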
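To see what the `apply` in option 1 computes, here is the same expression run by hand on group A's `mask_col` values from the sample data (index labels 0 to 5). `s.ne(s.shift())` is `True` wherever a new stretch starts (the first row always counts, because comparing with the `NaN` produced by `shift` gives `True`); filtering with `[s]` keeps only the `True` rows, and `cumsum` then numbers the `True` stretches.

```python
import pandas as pd

# mask_col values of group A in the sample data (index labels 0-5)
s = pd.Series([True, True, False, True, False, True])

# True where the value differs from the previous row; row 0 compares
# against the NaN introduced by shift(), so it always starts a stretch
starts = s.ne(s.shift())

# keep only the rows where mask_col is True, then number the stretches
stretch_id = starts[s].cumsum()
print(stretch_id)
```

With N=2, `vals.le(2)` selects labels 0, 1 and 3 in group A, while label 5 (the third stretch) keeps its 0; in group B the two stretches at labels 6 and 10 are both selected.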
Option 2
If you literally want to match only the first 0 per group and replace it with 1, you can compare `target_col` with 0 and get the index label of the first `True` per group with `groupby.idxmax`:
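```python
df.loc[df['target_col'].eq(0).groupby(df['categorical_col']).idxmax(), 'target_col'] = 1

# variant with idxmin (use it instead of the line above, not after it):
# keep only the 0 rows; mask_col is True on every 0 row here, so idxmin
# returns the first label of each group, i.e. the group's first 0
idx = df[df['target_col'].eq(0)].groupby(df['categorical_col'])['mask_col'].idxmin()
df.loc[idx, 'target_col'] = 1
```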
Output:
```
    mask_col categorical_col  target_col
0       True               A         1.0
1       True               A         1.0
2      False               A         NaN
3       True               A         1.0
4      False               A         NaN
5       True               A         0.0
6       True               B         1.0
7      False               B         NaN
8      False               B         NaN
9      False               B         NaN
10      True               B         1.0
11     False               B         NaN
```
Answer 3:
You can update the first zero occurrence for each category with the following loop:
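```python
for category in df['categorical_col'].unique():
    # index labels of the 0s in this category
    zero_idx = df[(df['categorical_col'] == category) &
                  (df['target_col'] == 0)].index
    if len(zero_idx):  # skip categories without a 0 (index[0] would raise IndexError)
        df.loc[zero_idx[0], 'target_col'] = 1
```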