I have the following data frame:
Group value_1 value_2
1 1 1
1 1 1
1 1 1
1 0 1
1 0 1
1 1 1
2 0 0
2 0 0
2 0 0
2 0 0
2 1 0
3 1 1
3 1 1
3 0 1
3 0 1
3 0 1
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
4 1 1
4 1 1
4 1 1
4 1 1
5 1 1
6 0 0
Currently, the value in the value_2 column is the most frequent value of each group in column value_1.
My desired output is:
Group value_1 value_2
1 1 0
1 1 0
1 1 0
1 0 0
1 0 0
1 1 1
2 0 0
2 0 0
2 0 0
2 0 0
2 1 0
3 1 0
3 1 0
3 0 0
3 0 0
3 0 0
3 1 1
3 1 1
3 1 1
3 1 1
3 1 1
4 1 1
4 1 1
4 1 1
4 1 1
5 1 1
6 0 0
If a group includes a value of 0 in the value_1 column, change the values in the indices between the last value of 0 to the first value in the group in the value_2 column to be 0.
CodePudding user response:
zeros_propagated = df["value_1"].replace(1, np.nan).groupby(df["Group"]).bfill()
mode_per_group = df.groupby("Group")["value_1"].transform(lambda gr:
gr.mode().iat[0])
df["value_2"] = zeros_propagated.fillna(mode_per_group, downcast="int")
- backpropagate the 0 values per group; replacing 1 with NaN so that they will become 0
- values after last 0, if any, will be NaN...
- ...which we fill by combining it with the modes per group
to get
>>> df
Group value_1 value_2
0 1 1 0
1 1 1 0
2 1 1 0
3 1 0 0
4 1 0 0
5 1 1 1
6 2 0 0
7 2 0 0
8 2 0 0
9 2 0 0
10 2 1 0
11 3 1 0
12 3 1 0
13 3 0 0
14 3 0 0
15 3 0 0
16 3 1 1
17 3 1 1
18 3 1 1
19 3 1 1
20 3 1 1
21 4 1 1
22 4 1 1
23 4 1 1
24 4 1 1
25 5 1 1
26 6 0 0
CodePudding user response:
I guess the answer of @Mustafa looks more clean but here is my attempt. I wrote a little helper function which checks for the most frequent values, if there is a 0, search the last index, and return a list of values for each group conditionally.
def func(grp):
most_common_val = grp.value_counts().idxmax()
if 0 in grp.values and not most_common_val==0:
last_zero = grp[grp==0].last_valid_index()
return [most_common_val if idx>last_zero else 0 for idx, val in zip(grp.index, grp)]
else:
return [most_common_val for _ in range(len(grp))]
df['desired_value_2'] = df[['Group', 'value_1']].groupby('Group')['value_1'].transform(func)
Output:
Group value_1 desired_value_2
0 1 1 0
1 1 1 0
2 1 1 0
3 1 0 0
4 1 0 0
5 1 1 1
6 2 0 0
7 2 0 0
8 2 0 0
9 2 0 0
10 2 1 0
11 3 1 0
12 3 1 0
13 3 0 0
14 3 0 0
15 3 0 0
16 3 1 1
17 3 1 1
18 3 1 1
19 3 1 1
20 3 1 1
21 4 1 1
22 4 1 1
23 4 1 1
24 4 1 1
25 5 1 1
26 6 0 0