I'd like to fill all NaN values within each group with that group's non-NaN value, using groupby. I know I can do this with fillna(method='ffill') followed by fillna(method='bfill'), since the non-NaN value is not always at the beginning or end of the group, or I can build a new DataFrame holding the non-NaN values and merge the two, but I need to do this many times and I was wondering whether there is a more efficient method to reduce processing time.
I was thinking something like a .transform(fillna) function.
This is my input:
col1 | col2 |
---|---|
A | NaN |
A | NaN |
A | 'apples' |
B | NaN |
B | 'bananas' |
B | NaN |
and this is what I'd like to get:
col1 | col2 |
---|---|
A | 'apples' |
A | 'apples' |
A | 'apples' |
B | 'bananas' |
B | 'bananas' |
B | 'bananas' |
CodePudding user response:
As envisioned, you can do:

```python
df.groupby('col1')['col2'].ffill().bfill()
```

Note that the trailing `.bfill()` runs on the plain Series, not per group; if an entire group could be all-NaN, do the backfill group-wise too (assign the ffilled column back, then `df.groupby('col1')['col2'].bfill()`) so values don't leak across group boundaries.
You can also do:

```python
df.groupby('col1')['col2'].transform('first')
```

which should be faster on large dataframes.
Note that both give the same result in this case, but they behave differently if a group contains more than one distinct non-NaN value: ffill/bfill propagates the nearest non-NaN value to each row, while `transform('first')` broadcasts the group's first non-NaN value to every row.
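To illustrate both approaches, here is a self-contained sketch using the sample data from the question (the frame is named `df` here by assumption):

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'col2': [np.nan, np.nan, 'apples', np.nan, 'bananas', np.nan],
})

# Approach 1: forward-fill within each group, then backfill the remainder
filled = df.groupby('col1')['col2'].ffill().bfill()

# Approach 2: broadcast each group's first non-NaN value to every row
first = df.groupby('col1')['col2'].transform('first')

print(filled.tolist())  # ['apples', 'apples', 'apples', 'bananas', 'bananas', 'bananas']
print(first.tolist())   # same result on this data
```

On data like this, where each group has exactly one distinct non-NaN value, the two outputs are identical.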
CodePudding user response:
```python
df1.col2 = df1.col2.mask(df1.col2.isnull() & (df1.col1 == 'A'), 'apples')
df1.col2 = df1.col2.mask(df1.col2.isnull() & (df1.col1 == 'B'), 'bananas')
```