I'm interested in forward filling both single and multiple values in a column in pandas. With the following dataframe:
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df
0 1 2
0 1 2 3
1 4 NaN NaN
2 NaN NaN 9
Forward fill will produce:
df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df.fillna(method='ffill')
df
0 1 2
0 1 2 3
1 4 2 3
2 4 2 9
However, I need an ffill-like method that will do this, or alternatively copy all above values if the values above follow one another:
df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, 5, 9], [None,None,None])
df
0 1 2
0 1 2 3
1 4 NaN NaN
2 NaN 5 9
3 NaN NaN NaN
Resulting in:
df
0 1 2
0 1 2 3
1 4 2 3
2 1 5 9
3 4 5 9
Major edit: In my data the values will always be followed by NaNs in an unknown multiple of the length of values. Take df[0], for instance 1,4 would repeat for as long as there are NaNs. The only rule is that they will be a multiple of the length of the values (2)
CodePudding user response:
You can create consecutive values for missing and not missing values, then create counter per columns and forward filling missing values per groups:
df = pd.DataFrame([[1, 2, 3], [4, None, 8], [None, 5, 9], [None,None,10],
[0, 2, None], [5, None, None], [None, 5, None], [None,None,None]])
print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 NaN 8.0
2 NaN 5.0 9.0
3 NaN NaN 10.0
4 0.0 2.0 NaN
5 5.0 NaN NaN
6 NaN 5.0 NaN
7 NaN NaN NaN
m = df.isna()
g = m.ne(m.shift()).cumsum()
for c in df.columns:
df[c] = df.groupby(g.groupby(c).cumcount())[c].ffill()
print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 2.0 8.0
2 1.0 5.0 9.0
3 4.0 5.0 10.0
4 0.0 2.0 3.0
5 5.0 2.0 8.0
6 0.0 5.0 9.0
7 5.0 5.0 10.0
EDIT: New solution repeat non missing values by newxt missing values per groups creted by first non missing value, here is used numpy.tile
for generate sequences:
df = pd.DataFrame([[1, 2, 3], [4, None, 8], [None, 5, 9], [7,None,10],
[0, 2, None], [5, None, None], [None, 6, None], [None,8,None]
, [None,None,None], [None,None,None]])
print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 NaN 8.0
2 NaN 5.0 9.0
3 7.0 NaN 10.0
4 0.0 2.0 NaN
5 5.0 NaN NaN
6 NaN 6.0 NaN
7 NaN 8.0 NaN
8 NaN NaN NaN
9 NaN NaN NaN
g = (df.notna() & df.shift().isna()).cumsum()
def f(x):
non_miss = x.dropna()
return np.tile(non_miss, int(len(x) // len(non_miss) 2))[:len(x)]
df = df.apply(lambda x: x.groupby(g[x.name]).transform(f))
print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 2.0 8.0
2 1.0 5.0 9.0
3 7.0 5.0 10.0
4 0.0 2.0 3.0
5 5.0 2.0 8.0
6 7.0 6.0 9.0
7 0.0 8.0 10.0
8 5.0 6.0 3.0
9 7.0 8.0 8.0