Pandas fill in group if condition is met

Time:03-16

I have a DataFrame where I am looking to fill in values in a column based on their grouping. I only want to fill in the values (by propagating non-NaN values using ffill and bfill) if there is only one unique value in the column to be filled; otherwise, it should be left as is. My code below has a sample dataset where I try to do this, but I get an error.

Code:

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6],
                    "B": ['a', 'a', np.nan, 'b', 'b', 'c', np.nan, 'd', np.nan, 'e', 'e', np.nan, 'h', np.nan],
                    "C": [5.0, np.nan, 4.0, 4.0, np.nan, 9.0, np.nan, np.nan, 9.0, 8.0, np.nan, 2.0, np.nan, np.nan]})

col_to_groupby = "A"
col_to_modify = "B"

group = df.groupby(col_to_groupby)
modified = group[group[col_to_modify].nunique() == 1].transform(lambda x: x.ffill().bfill())
df.update(modified)

Error:

KeyError: 'Columns not found: False, True'

Original dataset:

    A    B    C
0   1    a  5.0
1   1    a  NaN
2   2  NaN  4.0
3   2    b  4.0
4   2    b  NaN
5   3    c  9.0
6   3  NaN  NaN
7   3    d  NaN
8   3  NaN  9.0
9   4    e  8.0
10  4    e  NaN
11  5  NaN  2.0
12  6    h  NaN
13  6  NaN  NaN

Desired result:

    A    B    C
0   1    a  5.0
1   1    a  NaN
2   2    b  4.0
3   2    b  4.0
4   2    b  NaN
5   3    c  9.0
6   3  NaN  NaN
7   3    d  NaN
8   3  NaN  9.0
9   4    e  8.0
10  4    e  NaN
11  5  NaN  2.0
12  6    h  NaN
13  6    h  NaN

The above is the desired result because

  • row index 2 is in group 2, which only has 1 unique value in column B ("b"), so it is changed.
  • row indices 6 and 8 are in group 3, but there are 2 unique values in column B ("c" and "d"), so they are unaltered.
  • row index 11 is in group 5, which has no data in column B to propagate, so it is unaltered.
  • row index 13 is in group 6, which only has 1 unique value in column B ("h"), so it is changed.
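
The per-group counts behind this reasoning can be checked directly. A minimal sketch, assuming the sample data from the "Original dataset" table (only columns A and B are rebuilt here, and the name counts is illustrative):

```python
import numpy as np
import pandas as pd

# Columns A and B of the sample data, matching the "Original dataset" table.
df = pd.DataFrame({"A": [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6],
                   "B": ['a', 'a', np.nan, 'b', 'b', 'c', np.nan, 'd',
                         np.nan, 'e', 'e', np.nan, 'h', np.nan]})

# Number of distinct non-NaN values of B within each group of A.
counts = df.groupby("A")["B"].nunique()
print(counts)
```

Groups 2 and 6 have a count of 1 and contain a NaN, so those are the rows that get filled. Group 3 has a count of 2 and group 5 has a count of 0, so both are left alone.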

Answer:
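
The KeyError most likely comes from the square brackets: indexing a GroupBy object selects columns, so the boolean Series is treated as a list of column labels and pandas looks for columns named False and True. A minimal sketch reproducing this on a cut-down frame (the frame and the name message are illustrative, and the exact wording of the error may differ between pandas versions):

```python
import numpy as np
import pandas as pd

# Cut-down frame: groups 1 and 2 have one unique value in B, group 3 has two.
df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3],
                   "B": ["a", "a", np.nan, "b", "c", "d"]})
group = df.groupby("A")

try:
    # The boolean Series is interpreted as column labels, not as a row filter.
    group[group["B"].nunique() == 1]
except KeyError as err:
    message = str(err)

print(message)
```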

One option is to add a condition in groupby.apply:

df[col_to_modify] = df.groupby(col_to_groupby, group_keys=False)[col_to_modify].apply(lambda x: x.ffill().bfill() if x.nunique() == 1 else x)

Another option is to build a boolean mask with groupby.transform('nunique').eq(1), which marks the rows whose group has exactly one unique value, and then use where to fill those rows with groupby.transform('first') (first skips NaN), leaving all other rows unchanged:

g = df.groupby(col_to_groupby)[col_to_modify]
df[col_to_modify] = g.transform('first').where(g.transform('nunique').eq(1), df[col_to_modify])

Output:

    A    B    C
0   1    a  5.0
1   1    a  NaN
2   2    b  4.0
3   2    b  4.0
4   2    b  NaN
5   3    c  9.0
6   3  NaN  NaN
7   3    d  NaN
8   3  NaN  9.0
9   4    e  8.0
10  4    e  NaN
11  5  NaN  2.0
12  6    h  NaN
13  6    h  NaN
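
To make the second approach easier to follow, the sketch below rebuilds the sample data from the "Original dataset" table and names the intermediate steps. The names mask and first_vals are illustrative and do not appear in the answer above.

```python
import numpy as np
import pandas as pd

# Sample data matching the "Original dataset" table.
df = pd.DataFrame({"A": [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6],
                   "B": ['a', 'a', np.nan, 'b', 'b', 'c', np.nan, 'd',
                         np.nan, 'e', 'e', np.nan, 'h', np.nan],
                   "C": [5.0, np.nan, 4.0, 4.0, np.nan, 9.0, np.nan, np.nan,
                         9.0, 8.0, np.nan, 2.0, np.nan, np.nan]})

g = df.groupby("A")["B"]

# True for every row whose group has exactly one distinct non-NaN value in B.
mask = g.transform("nunique").eq(1)

# First non-NaN value of B in each group, broadcast back to every row.
first_vals = g.transform("first")

# Use the group value where the mask holds; otherwise keep the original B.
df["B"] = first_vals.where(mask, df["B"])
print(df)
```

Because the mask only selects groups with a single distinct value, taking the first non-NaN value there gives the same result as ffill followed by bfill.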