Home > other >  Forward fill blocks of above values pandas
Forward fill blocks of above values pandas

Time:05-11

I'm interested in forward filling both single and multiple values in a column in pandas. With the following dataframe:

import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df
    0   1   2
0   1   2   3
1   4 NaN NaN
2 NaN NaN   9 

Forward fill will produce:

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df.fillna(method='ffill')
df
   0  1  2
0  1  2  3
1  4  2  3
2  4  2  9

However, I need an ffill-like method that will do this, or alternatively copy all above values if the values above follow one another:

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, 5, 9], [None,None,None])
df
    0   1   2
0   1   2   3
1   4 NaN NaN
2 NaN   5   9 
3 NaN NaN NaN

Resulting in:

df
    0   1   2
0   1   2   3
1   4   2   3
2   1   5   9 
3   4   5   9

Major edit: In my data the values will always be followed by NaNs in an unknown multiple of the length of values. Take df[0], for instance 1,4 would repeat for as long as there are NaNs. The only rule is that they will be a multiple of the length of the values (2)

CodePudding user response:

You can create consecutive values for missing and not missing values, then create counter per columns and forward filling missing values per groups:

df = pd.DataFrame([[1, 2, 3], [4, None, 8], [None, 5, 9], [None,None,10],
                   [0, 2, None], [5, None, None], [None, 5, None], [None,None,None]])


print (df)
     0    1     2
0  1.0  2.0   3.0
1  4.0  NaN   8.0
2  NaN  5.0   9.0
3  NaN  NaN  10.0
4  0.0  2.0   NaN
5  5.0  NaN   NaN
6  NaN  5.0   NaN
7  NaN  NaN   NaN

m = df.isna()
g = m.ne(m.shift()).cumsum()
for c in df.columns:
    df[c] = df.groupby(g.groupby(c).cumcount())[c].ffill()

print (df)
     0    1     2
0  1.0  2.0   3.0
1  4.0  2.0   8.0
2  1.0  5.0   9.0
3  4.0  5.0  10.0
4  0.0  2.0   3.0
5  5.0  2.0   8.0
6  0.0  5.0   9.0
7  5.0  5.0  10.0

EDIT: New solution repeat non missing values by newxt missing values per groups creted by first non missing value, here is used numpy.tile for generate sequences:

df = pd.DataFrame([[1, 2, 3], [4, None, 8], [None, 5, 9], [7,None,10],
                   [0, 2, None], [5, None, None], [None, 6, None], [None,8,None]
                   , [None,None,None], [None,None,None]])
print (df)
     0    1     2
0  1.0  2.0   3.0
1  4.0  NaN   8.0
2  NaN  5.0   9.0
3  7.0  NaN  10.0
4  0.0  2.0   NaN
5  5.0  NaN   NaN
6  NaN  6.0   NaN
7  NaN  8.0   NaN
8  NaN  NaN   NaN
9  NaN  NaN   NaN

g = (df.notna() & df.shift().isna()).cumsum()

def f(x):
    non_miss = x.dropna()
    return np.tile(non_miss, int(len(x) // len(non_miss)   2))[:len(x)]

df = df.apply(lambda x: x.groupby(g[x.name]).transform(f))
print (df)
     0    1     2
0  1.0  2.0   3.0
1  4.0  2.0   8.0
2  1.0  5.0   9.0
3  7.0  5.0  10.0
4  0.0  2.0   3.0
5  5.0  2.0   8.0
6  7.0  6.0   9.0
7  0.0  8.0  10.0
8  5.0  6.0   3.0
9  7.0  8.0   8.0
  • Related