I have a DataFrame in which I want to fill in values in a column based on their grouping. I only want to fill in the values (by propagating non-NaN values using ffill and bfill) if the group has only one unique value in the column to be filled; otherwise, the group should be left as is. My code below uses a sample dataset to attempt this, but I get an error.
Code:
import numpy as np
import pandas as pd
df = pd.DataFrame({"A": [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6],
"B": ['a', 'a', np.nan, 'b', 'b', 'c', np.nan, 'd', np.nan, 'e', 'e', np.nan, 'h', np.nan],
"C": [5.0, np.nan, 4.0, 4.0, np.nan, 9.0, np.nan, np.nan, 9.0, 8.0, np.nan, 2.0, np.nan, np.nan]})
col_to_groupby = "A"
col_to_modify = "B"
group = df.groupby(col_to_groupby)
modified = group[group[col_to_modify].nunique() == 1].transform(lambda x: x.ffill().bfill())
df.update(modified)
Error:
KeyError: 'Columns not found: False, True'
Original dataset:
A B C
0 1 a 5.0
1 1 a NaN
2 2 NaN 4.0
3 2 b 4.0
4 2 b NaN
5 3 c 9.0
6 3 NaN NaN
7 3 d NaN
8 3 NaN 9.0
9 4 e 8.0
10 4 e NaN
11 5 NaN 2.0
12 6 h NaN
13 6 NaN NaN
Desired result:
A B C
0 1 a 5.0
1 1 a NaN
2 2 b 4.0
3 2 b 4.0
4 2 b NaN
5 3 c 9.0
6 3 NaN NaN
7 3 d NaN
8 3 NaN 9.0
9 4 e 8.0
10 4 e NaN
11 5 NaN 2.0
12 6 h NaN
13 6 h NaN
The above is the desired result because
- row index 2 is in group 2, which only has 1 unique value in column B ("b"), so it is changed.
- row indices 6 and 8 are in group 3, but there are 2 unique values in column B ("c" and "d"), so they are unaltered.
- row index 11 is in group 5, which has no data in column B to propagate, so it is unaltered.
- row index 13 is in group 6, which only has 1 unique value in column B ("h"), so it is changed.
Answer:
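The KeyError comes from the indexing step: `group[col_to_modify].nunique() == 1` yields one boolean per group (indexed by the values of A), and passing that Series to `group[...]` makes pandas look for columns named True and False. A minimal sketch of the difference between the per-group result and a per-row mask, using columns A and B from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6],
    "B": ['a', 'a', np.nan, 'b', 'b', 'c', np.nan, 'd', np.nan,
          'e', 'e', np.nan, 'h', np.nan],
})

g = df.groupby("A")["B"]

# One boolean per group, indexed by the group keys 1..6 (not by row).
per_group = g.nunique() == 1

# transform broadcasts the group-level count back to every row,
# so this mask lines up with df's index and can be used to select rows.
per_row = g.transform("nunique") == 1

print(per_group)
print(per_row)
```

Both options below rely on that per-row view of the group count, either implicitly (inside the function applied to each group) or explicitly (as a mask).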
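One option is to add a condition in groupby.apply (group_keys=False keeps the original row index on the result, which recent pandas versions need for the assignment to align):
df[col_to_modify] = df.groupby(col_to_groupby, group_keys=False)[col_to_modify].apply(lambda x: x.ffill().bfill() if x.nunique() == 1 else x)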
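The same conditional fill can also be sketched with transform in place of apply. That swap is my assumption rather than part of the original answer; transform returns a result aligned to the original index, so the assignment does not depend on how group keys are handled. The helper name `fill_if_single` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6],
    "B": ['a', 'a', np.nan, 'b', 'b', 'c', np.nan, 'd', np.nan,
          'e', 'e', np.nan, 'h', np.nan],
    "C": [5.0, np.nan, 4.0, 4.0, np.nan, 9.0, np.nan, np.nan, 9.0,
          8.0, np.nan, 2.0, np.nan, np.nan],
})
col_to_groupby = "A"
col_to_modify = "B"

def fill_if_single(s):
    # Hypothetical helper: fill only when the group holds exactly one
    # distinct non-NaN value; otherwise return the group unchanged.
    return s.ffill().bfill() if s.nunique() == 1 else s

df[col_to_modify] = df.groupby(col_to_groupby)[col_to_modify].transform(fill_if_single)
print(df)
```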
Another option is to use groupby + transform('nunique') + eq to create a boolean mask for the groups with exactly one unique value, then update those rows with groupby + first (first skips NaN) using where:
g = df.groupby(col_to_groupby)[col_to_modify]
df[col_to_modify] = g.transform('first').where(g.transform('nunique').eq(1), df[col_to_modify])
Output:
A B C
0 1 a 5.0
1 1 a NaN
2 2 b 4.0
3 2 b 4.0
4 2 b NaN
5 3 c 9.0
6 3 NaN NaN
7 3 d NaN
8 3 NaN 9.0
9 4 e 8.0
10 4 e NaN
11 5 NaN 2.0
12 6 h NaN
13 6 h NaN
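To check the where-based approach end to end, here is a sketch that rebuilds the sample frame and applies the fill with each intermediate step named. The function name `fill_single_valued_groups` is hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

def fill_single_valued_groups(df, by, col):
    """Return `col` with NaNs filled in groups that hold exactly one distinct value."""
    g = df.groupby(by)[col]
    single = g.transform("nunique").eq(1)   # per-row mask: group has one distinct value
    first_valid = g.transform("first")      # first non-NaN value of each group, per row
    # Take the group's value where the mask is True, the original value elsewhere.
    return first_valid.where(single, df[col])

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6],
    "B": ['a', 'a', np.nan, 'b', 'b', 'c', np.nan, 'd', np.nan,
          'e', 'e', np.nan, 'h', np.nan],
    "C": [5.0, np.nan, 4.0, 4.0, np.nan, 9.0, np.nan, np.nan, 9.0,
          8.0, np.nan, 2.0, np.nan, np.nan],
})

df["B"] = fill_single_valued_groups(df, "A", "B")
print(df)
```

Because the mask is False for groups 3 and 5, their rows keep the original values, including the NaNs.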