How to replace successive null values of a particular column in a pandas DataFrame with the preceding value

Time:04-19

Suppose I have a DataFrame like this:

import pandas as pd

data = {'first_column':  ['A', 'null', 'null', 'B', 'null', 'null', 'null' ],
        'second_column': [1, 3, 5, 32, 32, 12, 51]}
df = pd.DataFrame(data)
print(df)

I want to produce this:

data = {'first_column':  ['A', 'A', 'A', 'B', 'B', 'B', 'B' ],
        'second_column': [1, 3, 5, 32, 32, 12, 51]}
df = pd.DataFrame(data)
print(df)

How do I do it? I am a newbie. I know replace.na, but it is not straightforward to apply here.

CodePudding user response:

Mask the 'null' strings as null/NaN values, then forward fill with ffill:

df['first_column'] = df['first_column'].mask(df['first_column'] == 'null').ffill()
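As a self-contained sketch of the mask-then-forward-fill approach, the snippet below rebuilds the question's sample data and splits the one-liner into two steps. The intermediate name `masked` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'first_column': ['A', 'null', 'null', 'B', 'null', 'null', 'null'],
    'second_column': [1, 3, 5, 32, 32, 12, 51],
})

# mask() replaces entries where the condition is True with NaN,
# so the literal string 'null' becomes a real missing value.
masked = df['first_column'].mask(df['first_column'] == 'null')

# ffill() copies the last non-missing value down into each gap.
df['first_column'] = masked.ffill()

print(df['first_column'].tolist())
# ['A', 'A', 'A', 'B', 'B', 'B', 'B']
```

Only 'first_column' is touched; 'second_column' keeps its original values.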

CodePudding user response:

If your values are actually NA, as opposed to the string 'null', then pandas can forward fill them directly with .ffill(). The older spelling .fillna(method='ffill') does the same thing but is deprecated in recent pandas releases (2.1 and later).

df['first_column'] = df['first_column'].ffill()
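To check which case applies, a rough sketch like the following may help. The sample data is made up to mirror the question, with real missing values (NaN and None) in place of the string 'null'. The name `n_missing` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first_column': ['A', np.nan, None, 'B', np.nan, np.nan, np.nan],
    'second_column': [1, 3, 5, 32, 32, 12, 51],
})

# isna() flags entries pandas treats as missing; NaN and None both count,
# but the string 'null' would not.
n_missing = int(df['first_column'].isna().sum())
print(n_missing)
# 5

# Forward fill the real missing values with the last value seen above them.
df['first_column'] = df['first_column'].ffill()

print(df['first_column'].tolist())
# ['A', 'A', 'A', 'B', 'B', 'B', 'B']
```

If isna() reports zero missing entries on your own data, the gaps are probably strings such as 'null' and need to be converted first. Note also that a missing value in the very first row stays missing after a forward fill, since there is no earlier value to copy.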

CodePudding user response:

You can replace the 'null' string with pd.NA and then forward fill with ffill():

df['first_column'] = df['first_column'].replace('null', pd.NA).ffill()
# But if there are actual null values instead of the string 'null', use:
# df['first_column'] = df['first_column'].ffill()

Output:

  first_column  second_column
0            A              1
1            A              3
2            A              5
3            B             32
4            B             32
5            B             12
6            B             51