Home > Net >  How do you forward fill in pandas within a group?
How do you forward fill in pandas within a group?

Time:02-22

I have a pandas DataFrame of the form

quarter user_id # Sessions
2022 Q1 1 9
2021 Q4 1
2021 Q3 1
2022 Q1 2 8
2021 Q4 2
2021 Q3 2

And I'd like to forward fill the # Sessions column within each user_id to get a table like:

quarter user_id # Sessions
2022 Q1 1 9
2021 Q4 1 9
2021 Q3 1 9
2022 Q1 2 8
2021 Q4 2 8
2021 Q3 2 8

I can do this with

x.groupby('user_id').apply(lambda x: x.fillna({'# Sessions': x['# Sessions'].ffill()}))

But I have reasonably big data (~10k users), and this is a very common operation in the codebase.

Is .groupby necessary or is there a more performant way to achieve the same?

CodePudding user response:

You could also try transforming first (since first skips NaNs):

df['# Sessions'] = df.groupby('user_id')['# Sessions'].transform('first')

Output:

   quarter  user_id  # Sessions
0  2022 Q1        1         9.0
1  2021 Q4        1         9.0
2  2021 Q3        1         9.0
3  2022 Q1        2         8.0
4  2021 Q4        2         8.0
5  2021 Q3        2         8.0

CodePudding user response:

We can avoid use groupby, First we use DataFrame.sort_values, Then we just use ffill except for the rows where user_id change occurs.

df2 = df.sort_values('user_id')
df['# Sessions'] = df2['# Sessions'].ffill()\
    .where(df2['user_id'].eq(df2['user_id'].shift()),
           df2['# Sessions'])

CodePudding user response:

This should be much faster than what you're doing:

x['# Sessions'] = x.groupby('user_id')['# Sessions'].ffill()
  • Related