Home > Software design >  repeat non-null value in Pandas DataFrame with Groupby
repeat non-null value in Pandas DataFrame with Groupby

Time:07-12

I'd like to fill all NaN values with a non-NaN value using groupby. I know I can do this using fillna(method='ffill') and then fillna(method='bfill') since my non-NaN value is not always a the beginning or end, or I can create a new DataFrame with the non-NaN value and then merge them, but I need to do this a bunch of times and I was wondering if there is a more efficient method to reduce the process time.

I was thinking something like a .transform(fillna) function.

This is my input:

col1 col2
A Nan
A Nan
A 'apples'
B Nan
B 'bananas'
B Nan

and this is what I'd like to get

col1 col2
A 'apples'
A 'apples
A 'apples'
B 'bananas
B 'bananas'
B 'bananas'

CodePudding user response:

As envisioned, you can do:

df.groupby('col1')['col2'].ffill().bfill()

You can also do:

df.groupby('col1')['col2'].transform('first')

which should be faster on large dataframes.

Note that it is doing the same thing in this case but it would behave differently if you have non unique non-NaN values in each group.

CodePudding user response:

df1.col2 = df1.col2.mask((df1.col2.isnull()) & (df1.col1 == 'A'), 'apples')
df1.col2 = df1.col2.mask((df1.col2.isnull()) & (df1.col1 == 'B'), 'bananas')
  • Related