I'd like to fill all NaN values within each group with that group's non-NaN value, using groupby. I know I can do this with fillna(method='ffill') followed by fillna(method='bfill'), since the non-NaN value is not always at the beginning or end of the group, or I can build a new DataFrame holding the non-NaN values and merge the two, but I need to do this many times and I was wondering whether there is a more efficient method to reduce processing time.
I was thinking something like a .transform(fillna) function.
This is my input:
col1 | col2 |
---|---|
A | NaN |
A | NaN |
A | 'apples' |
B | NaN |
B | 'bananas' |
B | NaN |
and this is what I'd like to get:
col1 | col2 |
---|---|
A | 'apples' |
A | 'apples' |
A | 'apples' |
B | 'bananas' |
B | 'bananas' |
B | 'bananas' |
CodePudding user response:
As envisioned, you can do:

```python
df.groupby('col1')['col2'].ffill().bfill()
```

Note that the trailing `.bfill()` runs on the plain Series, not per group; if an entire group could be all-NaN, do the backfill group-wise too (assign the ffilled column back, then `df.groupby('col1')['col2'].bfill()`) so values don't leak across group boundaries.
You can also do:

```python
df.groupby('col1')['col2'].transform('first')
```

which should be faster on large dataframes.
Note that both give the same result in this case, but they behave differently if a group contains more than one distinct non-NaN value: ffill/bfill propagates the nearest non-NaN value to each row, while `transform('first')` broadcasts the group's first non-NaN value to every row.
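To illustrate both approaches, here is a self-contained sketch using the sample data from the question (the frame is named `df` here by assumption):

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'col2': [np.nan, np.nan, 'apples', np.nan, 'bananas', np.nan],
})

# Approach 1: forward-fill within each group, then backfill the remainder
filled = df.groupby('col1')['col2'].ffill().bfill()

# Approach 2: broadcast each group's first non-NaN value to every row
first = df.groupby('col1')['col2'].transform('first')

print(filled.tolist())  # ['apples', 'apples', 'apples', 'bananas', 'bananas', 'bananas']
print(first.tolist())   # same result on this data
```

On data like this, where each group has exactly one distinct non-NaN value, the two outputs are identical.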
CodePudding user response:
```python
df1.col2 = df1.col2.mask(df1.col2.isnull() & (df1.col1 == 'A'), 'apples')
df1.col2 = df1.col2.mask(df1.col2.isnull() & (df1.col1 == 'B'), 'bananas')
```