Forward fill blocks of above values pandas-CodePudding

I'm interested in forward filling both single and multiple values in a column in pandas. With the following dataframe:

import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df
    0   1   2
0   1   2   3
1   4 NaN NaN
2 NaN NaN   9

Forward fill will produce:

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df.fillna(method='ffill')
df
   0  1  2
0  1  2  3
1  4  2  3
2  4  2  9

However, I need an ffill-like method that will do this, or alternatively copy all above values if the values above follow one another:

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, 5, 9], [None,None,None])
df
    0   1   2
0   1   2   3
1   4 NaN NaN
2 NaN   5   9 
3 NaN NaN NaN

Resulting in:

df
    0   1   2
0   1   2   3
1   4   2   3
2   1   5   9 
3   4   5   9

Major edit: In my data the values will always be followed by NaNs in an unknown multiple of the length of values. Take df[0], for instance 1,4 would repeat for as long as there are NaNs. The only rule is that they will be a multiple of the length of the values (2)

CodePudding user response：

You can create consecutive values for missing and not missing values, then create counter per columns and forward filling missing values per groups:

df = pd.DataFrame([[1, 2, 3], [4, None, 8], [None, 5, 9], [None,None,10],
                   [0, 2, None], [5, None, None], [None, 5, None], [None,None,None]])


print (df)
     0    1     2
0  1.0  2.0   3.0
1  4.0  NaN   8.0
2  NaN  5.0   9.0
3  NaN  NaN  10.0
4  0.0  2.0   NaN
5  5.0  NaN   NaN
6  NaN  5.0   NaN
7  NaN  NaN   NaN

m = df.isna()
g = m.ne(m.shift()).cumsum()
for c in df.columns:
    df[c] = df.groupby(g.groupby(c).cumcount())[c].ffill()

print (df)
     0    1     2
0  1.0  2.0   3.0
1  4.0  2.0   8.0
2  1.0  5.0   9.0
3  4.0  5.0  10.0
4  0.0  2.0   3.0
5  5.0  2.0   8.0
6  0.0  5.0   9.0
7  5.0  5.0  10.0

EDIT: New solution repeat non missing values by newxt missing values per groups creted by first non missing value, here is used numpy.tile for generate sequences:

df = pd.DataFrame([[1, 2, 3], [4, None, 8], [None, 5, 9], [7,None,10],
                   [0, 2, None], [5, None, None], [None, 6, None], [None,8,None]
                   , [None,None,None], [None,None,None]])
print (df)
     0    1     2
0  1.0  2.0   3.0
1  4.0  NaN   8.0
2  NaN  5.0   9.0
3  7.0  NaN  10.0
4  0.0  2.0   NaN
5  5.0  NaN   NaN
6  NaN  6.0   NaN
7  NaN  8.0   NaN
8  NaN  NaN   NaN
9  NaN  NaN   NaN

g = (df.notna() & df.shift().isna()).cumsum()

def f(x):
    non_miss = x.dropna()
    return np.tile(non_miss, int(len(x) // len(non_miss)   2))[:len(x)]

df = df.apply(lambda x: x.groupby(g[x.name]).transform(f))
print (df)
     0    1     2
0  1.0  2.0   3.0
1  4.0  2.0   8.0
2  1.0  5.0   9.0
3  7.0  5.0  10.0
4  0.0  2.0   3.0
5  5.0  2.0   8.0
6  7.0  6.0   9.0
7  0.0  8.0  10.0
8  5.0  6.0   3.0
9  7.0  8.0   8.0