Convert first value from 0 to 1 based on group and mask columns in a Pandas DataFrame

I am trying to convert the first occurrence of 0 to 1 in a column of a Pandas DataFrame, separately for each group of categorical_col. The column in question contains 1, 0 and null values. The sample data is as follows:

mask_col	categorical_col	target_col
TRUE	A	1
TRUE	A	1
FALSE	A
TRUE	A	0
FALSE	A
TRUE	A	0
TRUE	B	1
FALSE	B
FALSE	B
FALSE	B
TRUE	B	0
FALSE	B
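
To make the example reproducible, the table above can be built roughly as follows. This is a sketch that assumes the blank target_col cells are NaN and that mask_col holds real booleans.

```python
import numpy as np
import pandas as pd

# Sample data from the table; blank target_col cells are assumed to be NaN.
df = pd.DataFrame({
    "mask_col": [True, True, False, True, False, True,
                 True, False, False, False, True, False],
    "categorical_col": list("AAAAAA") + list("BBBBBB"),
    "target_col": [1, 1, np.nan, 0, np.nan, 0,
                   1, np.nan, np.nan, np.nan, 0, np.nan],
})
print(df)
```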

I want rows 4 and 11 (counting from 1) to change to 1 and row 6 to stay 0.

How do I do this?

CodePudding user response：

To set the first 0 in each group of categorical_col to 1, compare target_col with 0 and use SeriesGroupBy.idxmax to get the index label of the first match per group:

df.loc[df['target_col'].eq(0).groupby(df['categorical_col']).idxmax(), 'target_col'] = 1
print(df)
    mask_col categorical_col  target_col
0       True               A         1.0
1       True               A         1.0
2      False               A         NaN
3       True               A         1.0
4      False               A         NaN
5       True               A         0.0
6       True               B         1.0
7      False               B         NaN
8      False               B         NaN
9      False               B         NaN
10      True               B         1.0
11     False               B         NaN
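
One thing to watch with idxmax: on a group that contains no 0 at all, the comparison is all False and idxmax returns that group's first label, so a row that is not 0 would be overwritten. The sketch below uses a hypothetical frame with such a group ("C", not part of the question's data) and shows one possible way around it, by filtering to the zero rows before taking the first row per group.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: group "C" contains no 0 at all.
df = pd.DataFrame({
    "categorical_col": ["A", "A", "A", "C", "C"],
    "target_col": [1, 0, 0, 1, np.nan],
})

is_zero = df["target_col"].eq(0)

# idxmax on an all-False group returns that group's first label,
# so the plain idxmax approach would also select row 3 in group "C".
naive_idx = is_zero.groupby(df["categorical_col"]).idxmax()

# Keeping only the rows that really are 0 before taking the first
# row per group leaves groups without a 0 untouched.
safe_idx = df[is_zero].groupby("categorical_col").head(1).index

df.loc[safe_idx, "target_col"] = 1
print(df)
```

If every group is known to contain at least one 0, as in the sample data, the shorter idxmax version gives the same result.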

CodePudding user response：

The logic is not fully clear, so here are two options:

option 1

Consider the consecutive stretches of True in mask_col within each group of categorical_col. Assuming you want the rows of the first N stretches (here N=2) set to 1, you can use a custom groupby.apply:

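# number the stretches of True within each group (rows where mask_col is False are dropped)
vals = (df.groupby('categorical_col', group_keys=False)['mask_col']
          .apply(lambda s: s.ne(s.shift())[s].cumsum())
       )

# set the rows of the first two stretches per group to 1
df.loc[vals[vals.le(2)].index, 'target_col'] = 1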
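
To see what the lambda computes, here is a small sketch that applies the same steps to the mask_col values of group A only. The names starts and stretch_id are just illustrative.

```python
import pandas as pd

# mask_col values of group "A" from the sample data.
s = pd.Series([True, True, False, True, False, True])

# True wherever the value differs from the previous row, i.e. at the
# start of every stretch (the first row always counts as a start).
starts = s.ne(s.shift())

# Keep only the rows where the mask is True, then count the starts:
# each True row gets the number of the True stretch it belongs to.
stretch_id = starts[s].cumsum()
print(stretch_id)
```

Rows 0 and 1 end up in stretch 1, row 3 in stretch 2 and row 5 in stretch 3, so with N=2 only rows 0, 1 and 3 of group A are set to 1.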

option 2

If you literally want to match only the first 0 per group and replace it with 1, you can flag the 0s and get the index of the first one per group with groupby.idxmax:

df.loc[df['target_col'].eq(0).groupby(df['categorical_col']).idxmax(), 'target_col'] = 1

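# variant with idxmin: keep only the rows equal to 0, then take the first label per group
# (relies on mask_col being the same for all of those rows, as in the sample data)
idx = df[df['target_col'].eq(0)].groupby('categorical_col')['mask_col'].idxmin()
df.loc[idx, 'target_col'] = 1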

Output:

    mask_col categorical_col  target_col
0       True               A         1.0
1       True               A         1.0
2      False               A         NaN
3       True               A         1.0
4      False               A         NaN
5       True               A         0.0
6       True               B         1.0
7      False               B         NaN
8      False               B         NaN
9      False               B         NaN
10      True               B         1.0
11     False               B         NaN

CodePudding user response：

You can update the first zero occurrence for each category with the following loop:

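for category in df['categorical_col'].unique():
    # labels of the rows in this category whose target is 0
    zero_idx = df[(df['categorical_col'] == category) &
                  (df['target_col'] == 0)].index
    # skip categories without any 0 instead of raising IndexError
    if len(zero_idx) > 0:
        df.loc[zero_idx[0], 'target_col'] = 1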
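
If there are many categories, a loop-free version may be preferable. The following is a sketch of one way to build a boolean mask for the first 0 per category using a grouped running count. The names is_zero and first_zero are illustrative, and the frame is rebuilt from the question's data with the blanks assumed to be NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "categorical_col": list("AAAAAA") + list("BBBBBB"),
    "target_col": [1, 1, np.nan, 0, np.nan, 0,
                   1, np.nan, np.nan, np.nan, 0, np.nan],
})

is_zero = df["target_col"].eq(0)

# Running count of zeros within each category; the first zero is the
# row that is itself a zero and where that count equals 1.
zero_count = is_zero.astype(int).groupby(df["categorical_col"]).cumsum()
first_zero = is_zero & zero_count.eq(1)

df.loc[first_zero, "target_col"] = 1
print(df)
```

Categories without any 0 simply produce no True in the mask, so nothing is changed for them.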