Suppose I have a data frame like this
import pandas as pd
data = {'first_column': ['A', 'null', 'null', 'B', 'null', 'null', 'null' ],
'second_column': [1, 3, 5, 32, 32, 12, 51]}
df = pd.DataFrame(data)
print (df)
I want to produce this
data = {'first_column': ['A', 'A', 'A', 'B', 'B', 'B', 'B' ],
'second_column': [1, 3, 5, 32, 32, 12, 51]}
df = pd.DataFrame(data)
print (df)
How do I do it? I am a newbie. I know about replace.na, but it isn't straightforward to apply here.
CodePudding user response:
Mask the 'null' strings as null/NaN values, then forward fill with ffill:
df['first_column'] = df['first_column'].mask(df['first_column'] == 'null').ffill()
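For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same mask-then-ffill idea, using the sample data from the question. The intermediate name is_placeholder is only for illustration.

```python
import pandas as pd

# Sample data from the question: 'null' is a literal string, not a missing value
df = pd.DataFrame({
    'first_column': ['A', 'null', 'null', 'B', 'null', 'null', 'null'],
    'second_column': [1, 3, 5, 32, 32, 12, 51],
})

# Boolean Series that is True wherever the placeholder string appears
is_placeholder = df['first_column'] == 'null'

# mask() turns the flagged entries into NaN; ffill() then copies the
# last non-missing value downward into each gap
df['first_column'] = df['first_column'].mask(is_placeholder).ffill()

print(df)
```

Note that a forward fill only looks backward: if the column started with 'null', those leading rows would stay missing because there is no earlier value to copy.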
CodePudding user response:
If your values are actually NA, as opposed to the string 'null', then pandas can forward fill them directly with .ffill(). Older code often writes this as .fillna(method='ffill'), but the method argument is deprecated in recent pandas versions.
df['first_column'] = df['first_column'].ffill()
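As a rough sketch of that case, assuming the column holds real missing values (np.nan here) rather than the string 'null', you can check with isna() and then forward fill. This uses Series.ffill(), which recent pandas versions prefer over passing method='ffill' to fillna().

```python
import numpy as np
import pandas as pd

# Same shape as the question's data, but with genuine missing values
df = pd.DataFrame({
    'first_column': ['A', np.nan, np.nan, 'B', np.nan, np.nan, np.nan],
    'second_column': [1, 3, 5, 32, 32, 12, 51],
})

# isna() confirms these are real missing values, not placeholder strings
missing_before = int(df['first_column'].isna().sum())

# Forward fill: each gap takes the most recent non-missing value above it
df['first_column'] = df['first_column'].ffill()

print(missing_before)
print(df)
```

If isna() reports zero missing values on your real data, the gaps are probably stored as strings and need to be converted first.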
CodePudding user response:
You can replace the 'null' string with a missing value (pd.NA) and then forward fill with ffill():
df['first_column'] = df['first_column'].replace('null', pd.NA).ffill()
# But if there are actual null values instead of the string 'null', then use:
# df['first_column'] = df['first_column'].ffill()
Output:
  first_column  second_column
0            A              1
1            A              3
2            A              5
3            B             32
4            B             32
5            B             12
6            B             51