Find the last index with specific value for each group and complete all the above values in group wi-CodePudding

I have the following data frame:

Group  value_1  value_2
     1      1         1
     1      1         1
     1      1         1
     1      0         1
     1      0         1
     1      1         1
     2      0         0
     2      0         0
     2      0         0
     2      0         0
     2      1         0
     3      1         1
     3      1         1
     3      0         1
     3      0         1
     3      0         1
     3      1         1
     3      1         1
     3      1         1
     3      1         1
     3      1         1
     4      1         1
     4      1         1
     4      1         1
     4      1         1
     5      1         1
     6      0         0

Currently, the value in the value_2 column is the most frequent value of each group in column value_1.

My desired output is:

Group  value_1  value_2
     1      1         0
     1      1         0
     1      1         0
     1      0         0
     1      0         0
     1      1         1
     2      0         0
     2      0         0
     2      0         0
     2      0         0
     2      1         0
     3      1         0
     3      1         0
     3      0         0
     3      0         0
     3      0         0
     3      1         1
     3      1         1
     3      1         1
     3      1         1
     3      1         1
     4      1         1
     4      1         1
     4      1         1
     4      1         1
     5      1         1
     6      0         0

If a group includes a value of 0 in the value_1 column, change the values in the indices between the last value of 0 to the first value in the group in the value_2 column to be 0.

CodePudding user response：

zeros_propagated = df["value_1"].replace(1, np.nan).groupby(df["Group"]).bfill()
mode_per_group   = df.groupby("Group")["value_1"].transform(lambda gr: 
                                                            gr.mode().iat[0])

df["value_2"] = zeros_propagated.fillna(mode_per_group, downcast="int")

backpropagate the 0 values per group; replacing 1 with NaN so that they will become 0
- values after last 0, if any, will be NaN...
...which we fill by combining it with the modes per group

to get

>>> df

    Group  value_1  value_2
0       1        1        0
1       1        1        0
2       1        1        0
3       1        0        0
4       1        0        0
5       1        1        1
6       2        0        0
7       2        0        0
8       2        0        0
9       2        0        0
10      2        1        0
11      3        1        0
12      3        1        0
13      3        0        0
14      3        0        0
15      3        0        0
16      3        1        1
17      3        1        1
18      3        1        1
19      3        1        1
20      3        1        1
21      4        1        1
22      4        1        1
23      4        1        1
24      4        1        1
25      5        1        1
26      6        0        0

CodePudding user response：

I guess the answer of @Mustafa looks more clean but here is my attempt. I wrote a little helper function which checks for the most frequent values, if there is a 0, search the last index, and return a list of values for each group conditionally.

def func(grp):
    most_common_val = grp.value_counts().idxmax()
    
    if 0 in grp.values and not most_common_val==0:
        last_zero = grp[grp==0].last_valid_index()
        return [most_common_val if idx>last_zero else 0 for idx, val in zip(grp.index, grp)]
    else:
        return [most_common_val for _ in range(len(grp))]

df['desired_value_2'] = df[['Group', 'value_1']].groupby('Group')['value_1'].transform(func)

Output:

    Group  value_1  desired_value_2
0       1        1                0
1       1        1                0
2       1        1                0
3       1        0                0
4       1        0                0
5       1        1                1
6       2        0                0
7       2        0                0
8       2        0                0
9       2        0                0
10      2        1                0
11      3        1                0
12      3        1                0
13      3        0                0
14      3        0                0
15      3        0                0
16      3        1                1
17      3        1                1
18      3        1                1
19      3        1                1
20      3        1                1
21      4        1                1
22      4        1                1
23      4        1                1
24      4        1                1
25      5        1                1
26      6        0                0