I have the following dataframe:
import pandas as pd
df = pd.DataFrame({
'a': [1, 1, 2, 2],
'b': [None, 'w', None, 'z']
})
a | b |
---|---|
1 | None |
1 | 'w' |
2 | None |
2 | 'z' |
I want to fill the None values in column 'b' with the non-None value
from the rows that share the same value in column 'a'.
In the end I would like to have this dataframe:
a | b |
---|---|
1 | 'w' |
1 | 'w' |
2 | 'z' |
2 | 'z' |
CodePudding user response:
The logic is not fully clear on how you would like to generalize, but you could bfill/ffill per group:
# transform keeps the original index, so the result aligns with df on assignment
df['b'] = df.groupby('a')['b'].transform(lambda x: x.bfill().ffill())
output:
a b
0 1 w
1 1 w
2 2 z
3 2 z
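To show how the per-group fill generalizes, here is a minimal sketch on a made-up frame where one group has two non-null values. The extra rows and the name `filled` are my own assumptions, not part of the question. Each gap takes the next non-null value in its group (bfill), and any trailing gap would then take the previous one (ffill):

```python
import pandas as pd

# Hypothetical data: group 1 has two non-null values ('w' and 'x').
df = pd.DataFrame({
    'a': [1, 1, 1, 1, 2, 2],
    'b': [None, 'w', None, 'x', None, 'z']
})

# Back-fill first, then forward-fill whatever is still missing,
# separately within each group of 'a'.
filled = df.groupby('a')['b'].transform(lambda s: s.bfill().ffill())
df['b'] = filled
print(df)
```

If you would rather have earlier values take priority, swapping the order to `s.ffill().bfill()` should do it.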
CodePudding user response:
It's a bit tricky, but it works. For each distinct value of 'a', we take the matching rows and fill the missing values in 'b' with that group's non-null value. I'm assuming that each value of 'a' has exactly one non-null value of 'b' (if a group has none, .iloc[0] raises an IndexError).
df = pd.DataFrame({
'a': [1, 1, 2, 2],
'b': [None, 'w', None, 'z']})
df
a b
0 1 None
1 1 w
2 2 None
3 2 z
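for i in df['a'].unique():
    mask = df['a'] == i  # rows belonging to the current group
    # fill the group's missing 'b' values with its first non-null 'b'
    df.loc[mask, 'b'] = df.loc[mask, 'b'].fillna(df.loc[mask, 'b'].dropna().iloc[0])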
df
a b
0 1 w
1 1 w
2 2 z
3 2 z
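Under the same assumption (one non-null 'b' per value of 'a'), the loop can probably be avoided. This is only a sketch, and the name `first_per_group` is mine. It relies on the groupby `first` aggregation skipping missing values by default:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 2, 2],
    'b': [None, 'w', None, 'z']
})

# 'first' gives the first non-null value of each group;
# transform broadcasts it back to every row of that group.
first_per_group = df.groupby('a')['b'].transform('first')

# Fill only the missing entries, leaving existing values untouched.
df['b'] = df['b'].fillna(first_per_group)
print(df)
```

Unlike the loop, a group with no non-null value would simply stay missing here instead of raising an error.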