Why does fillna have no effect after several operations on a dataframe series?

Time:08-07

I have a dataframe which looks like this:

import pandas as pd

df = pd.DataFrame({'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'],
                   'Direction': ['UP', 'DOWN', 'UP', 'UP', 'DOWN', 'DOWN', 'DOWN', 'UP', 'DOWN', 'UP'],
                   'group':[1,2,3,3,3,4,4,4,5,5]})

Everything works fine when I do:

df['prev'] = df[(df.Event == 'A') & (df.Direction == 'UP')].groupby('group').cumcount().add(1)
df['prev'].fillna(0, inplace=True)

But if I do it in one line, the fillna() function does not work:

df['prev'] = df[(df.Event == 'A') & (df.Direction == 'UP')].groupby('group').cumcount().add(1).fillna(0)

My question is: Why is that? And is there a way to do it in one line?

CodePudding user response:

Look at the output at this step:

print(df[(df.Event == 'A') & (df.Direction == 'UP')].groupby('group').cumcount().add(1))

# Output:
0    1
2    1
3    2
dtype: int64

Do you see any nan values to fill? Is adding .fillna(0) here going to do anything?
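To make that concrete, here is a small sketch of where the NaN values actually come from. The filtered result has only three rows and no missing values; the NaN only appear when it is assigned back to df, because pandas aligns on the index. The names mask, partial and filled are just mine, for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'],
                   'Direction': ['UP', 'DOWN', 'UP', 'UP', 'DOWN', 'DOWN', 'DOWN', 'UP', 'DOWN', 'UP'],
                   'group': [1, 2, 3, 3, 3, 4, 4, 4, 5, 5]})

# Hypothetical helper names, not from the question
mask = (df.Event == 'A') & (df.Direction == 'UP')
partial = df[mask].groupby('group').cumcount().add(1)

# The filtered result only has index labels 0, 2 and 3, and no NaN,
# so fillna(0) has nothing to replace at this point
filled = partial.fillna(0)
print(filled.equals(partial))

# The NaN values are created here: rows whose index label is missing
# from `filled` get NaN when the Series is aligned to df's index
df['prev'] = filled
print(df['prev'].isna().sum())
```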


A one liner that would work:

df['prev'] = df.assign(prev = df[(df.Event == 'A') & (df.Direction == 'UP')].groupby('group').cumcount().add(1))['prev'].fillna(0)
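Another option, as a sketch: since the missing rows come from index alignment, you could align the filtered result to df's index yourself with reindex and give it a fill value. This is not the code from the question, just an alternative I'd expect to behave the same way on this data:

```python
import pandas as pd

df = pd.DataFrame({'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'],
                   'Direction': ['UP', 'DOWN', 'UP', 'UP', 'DOWN', 'DOWN', 'DOWN', 'UP', 'DOWN', 'UP'],
                   'group': [1, 2, 3, 3, 3, 4, 4, 4, 5, 5]})

# reindex expands the 3-row result to every row of df,
# filling the rows that were filtered out with 0 instead of NaN
df['prev'] = (df[(df.Event == 'A') & (df.Direction == 'UP')]
              .groupby('group').cumcount().add(1)
              .reindex(df.index, fill_value=0))

print(df)
```

Because no NaN is ever introduced, the column stays an integer column here, whereas the fillna(0) variants end up as floats.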

CodePudding user response:

Because this part, df[(df.Event == 'A') & (df.Direction == 'UP')], keeps only the rows with Event A and Direction UP. So when you put fillna(0) at the end, you are only replacing NaN in the filtered subset of rows. On assignment, pandas aligns the result to df by index, and the rest of the rows are filled with NaN because the column prev didn't exist previously.

Also, because the column prev didn't exist previously, I think you cannot do this in a single line. What you are doing is creating a whole column and modifying only a subset of that column, which you would have to break into 2 steps.
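That said, a single expression does seem possible if the count is computed over the full index instead of on the filtered subset. A rough sketch, assuming a running count of matching rows per group is what you want (mask is a name I made up; it is not in the question):

```python
import pandas as pd

df = pd.DataFrame({'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'],
                   'Direction': ['UP', 'DOWN', 'UP', 'UP', 'DOWN', 'DOWN', 'DOWN', 'UP', 'DOWN', 'UP'],
                   'group': [1, 2, 3, 3, 3, 4, 4, 4, 5, 5]})

# Hypothetical name for the filter condition
mask = (df.Event == 'A') & (df.Direction == 'UP')

# Running count of matching rows within each group, computed for every row,
# then set to 0 on the rows that do not match the condition
df['prev'] = mask.astype(int).groupby(df['group']).cumsum().where(mask, 0)

print(df)
```

Nothing is filtered out before the assignment, so there is no index alignment step that could introduce NaN.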

CodePudding user response:

I'm not exactly sure why it's not working, but I have a rough idea. In your first idea, this is what is happening:

df['prev'] = df[...]...
df['prev'] = df['prev'].fillna(0)

Your second idea:

df['prev'] = df[...]....fillna(0)

This probably has something to do with fillna(0) being applied to the filtered result rather than to the whole dataframe: when that result is transferred over to the new column prev, the rows that were filtered out have no value, so they come out as NaN.
