How to fillna when based on other column's percentage change when there are more than one NaN v

Time:11-09

I have a DataFrame with one NaN value in column B.

import numpy as np
import pandas as pd

df_stack = pd.DataFrame({'A': [2, 4, 8, 0, 6],
                         'B': [2, 4, np.nan, 5, 7],
                         'C': [10, 2, 1, 8, 9]},
                        index=['1', '2', '3', '4', '5'])


    A   B   C
1   2   2.0 10
2   4   4.0 2
3   8   NaN 1
4   0   5.0 8
5   6   7.0 9

The NaN should be replaced so that the new value in B reflects the same percentage change from the previous row as column C does. I used the following technique, and it works:

df_stack['B'] = df_stack['B'].fillna(
    (
        df_stack['C'] /
        df_stack['C'].shift(1)
    ) * df_stack['B'].shift(1)
)

    A   B   C
1   2   2.0 10
2   4   4.0 2
3   8   2.0 1
4   0   5.0 8
5   6   7.0 9

But the issue arises when there are two or more consecutive NaN values: only the first NaN is replaced and the rest are left as NaN.

df_stack = pd.DataFrame({'A': [2, 4, 8, 0,6],
                         'B': [2, 4, np.nan, np.nan, 7],
                         'C': [10, 2, 1, 8,9]},
                        index=['1', '2', '3', '4', '5'])

df_stack['B'] = df_stack['B'].fillna(
    (
        df_stack['C'] /
        df_stack['C'].shift(1)
    ) * df_stack['B'].shift(1)
)

    A   B   C
1   2   2.0 10
2   4   4.0 2
3   8   2.0 1
4   0   NaN 8
5   6   7.0 9
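
The reason only the first NaN is filled can be seen by inspecting the intermediate values. The sketch below assumes the same data as above; the names `prev_b`, `ratio`, and `candidate` are illustrative and not part of the original code. `shift(1)` is evaluated once against the original column B, so a NaN whose predecessor is also NaN gets a NaN candidate.

```python
import numpy as np
import pandas as pd

df_stack = pd.DataFrame({'A': [2, 4, 8, 0, 6],
                         'B': [2, 4, np.nan, np.nan, 7],
                         'C': [10, 2, 1, 8, 9]},
                        index=['1', '2', '3', '4', '5'])

# shift(1) reads the ORIGINAL column B, not the partially filled result.
prev_b = df_stack['B'].shift(1)
ratio = df_stack['C'] / df_stack['C'].shift(1)
candidate = ratio * prev_b

# Row '3' has a valid predecessor (4.0), so its candidate is 0.5 * 4.0.
# Row '4' has a NaN predecessor (row '3'), so its candidate is NaN as well.
print(pd.DataFrame({'prev_b': prev_b, 'ratio': ratio, 'candidate': candidate}))
```

A single `fillna` pass therefore can only fill a NaN that sits directly below a known value.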

Is there any way to do this without a for loop, so that the values are filled row by row (from top to bottom)? My expected output is this:

    A   B   C
    1   2   2.0  10
    2   4   4.0  2
    3   8   2.0  1
    4   0   16.0 8
    5   6   7.0  9
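
For reference, the row-by-row behaviour described above can be written as an explicit loop. This is only a baseline sketch for checking vectorized answers against, not the loop-free solution being asked for; `fill_by_ratio_loop` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def fill_by_ratio_loop(b, c):
    """Hypothetical helper: fill NaNs in b from top to bottom so that each
    filled value changes from the previous row by the same ratio as c."""
    out = b.to_numpy(dtype=float).copy()
    c_vals = c.to_numpy(dtype=float)
    for i in range(1, len(out)):
        if np.isnan(out[i]):
            # Uses out[i - 1], which may itself have just been filled.
            out[i] = out[i - 1] * c_vals[i] / c_vals[i - 1]
    return pd.Series(out, index=b.index, name=b.name)

df_stack = pd.DataFrame({'A': [2, 4, 8, 0, 6],
                         'B': [2, 4, np.nan, np.nan, 7],
                         'C': [10, 2, 1, 8, 9]},
                        index=['1', '2', '3', '4', '5'])

expected_b = fill_by_ratio_loop(df_stack['B'], df_stack['C'])
print(expected_b)
```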

CodePudding user response:

For a vectorized approach, you can use a cumulative product (cumprod) per group:

group = df_stack['B'].shift().notna().cumsum()
factor = df_stack['C'].pct_change().add(1)
fill = (df_stack['B'].shift()
        .mul(factor, fill_value=1)
        .groupby(group).cumprod()
       )

df_stack['B'] = df_stack['B'].fillna(fill)

output:

   A     B   C
1  2   2.0  10
2  4   4.0   2
3  8   2.0   1
4  0  16.0   8
5  6   7.0   9
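
To see how the grouped cumprod produces these values, it can help to print the intermediate series. The sketch below repeats the steps above on the same data and adds a cross-check that is my own assumption rather than part of the answer: because the product of consecutive ratios telescopes, each fill should equal the last known B times C divided by C at that last known row (this assumes C is nonzero there). The names `anchor_b`, `anchor_c`, and `telescoped` are illustrative.

```python
import numpy as np
import pandas as pd

df_stack = pd.DataFrame({'A': [2, 4, 8, 0, 6],
                         'B': [2, 4, np.nan, np.nan, 7],
                         'C': [10, 2, 1, 8, 9]},
                        index=['1', '2', '3', '4', '5'])
b, c = df_stack['B'], df_stack['C']

# A new group starts at every row whose predecessor in B is known,
# so a run of NaNs shares a group with the first NaN of that run.
group = b.shift().notna().cumsum()
# Row-to-row ratio of C (equivalent to c / c.shift()).
factor = c.pct_change().add(1)
# Previous B times the ratio; where previous B is NaN it is treated as 1,
# so the cumprod keeps multiplying the ratios through the NaN run.
fill = b.shift().mul(factor, fill_value=1).groupby(group).cumprod()
print(pd.DataFrame({'group': group, 'factor': factor, 'fill': fill}))

filled = b.fillna(fill)

# Cross-check (assumption): last known B scaled by C relative to that row's C.
anchor_b = b.ffill()
anchor_c = c.where(b.notna()).ffill()
telescoped = anchor_b * c / anchor_c
print(pd.DataFrame({'filled': filled, 'telescoped': b.fillna(telescoped)}))
```

Only the `fill` values at rows where B is NaN are used; the values at other rows are discarded by `fillna`.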

CodePudding user response:

I could not find a direct approach, but there is a workaround. First, find the length of the longest run of consecutive NaN values:

for i in range(len(df_stack['B'])):
    # Is any NaN in B paired with another NaN i rows above it?
    if len(df_stack[(df_stack['B'].isnull()) & (df_stack['B'].shift(i).isnull())]) > 0:
        continue
    else:
        break

Then run the fillna code once for each NaN in the longest run, so every pass fills one more row of each run:

for count in range(i):
    df_stack['B'] = df_stack['B'].fillna(
        (
            df_stack['C'] /
            df_stack['C'].shift(1)
        ) * df_stack['B'].shift(1)
    )
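
The two steps above can also be sketched as a single runnable script. Here the longest NaN run is counted with a groupby instead of the search loop, which is a substitution of mine rather than the method shown above; `is_na`, `run_id`, `longest_run`, and `ratio` are illustrative names.

```python
import numpy as np
import pandas as pd

df_stack = pd.DataFrame({'A': [2, 4, 8, 0, 6],
                         'B': [2, 4, np.nan, np.nan, 7],
                         'C': [10, 2, 1, 8, 9]},
                        index=['1', '2', '3', '4', '5'])

# Label each run of NaNs: the counter only advances on non-NaN rows,
# so consecutive NaNs share the same run_id.
is_na = df_stack['B'].isna()
run_id = (~is_na).cumsum()
longest_run = int(is_na.groupby(run_id).sum().max())

# C does not change between passes, so its ratio is computed once.
ratio = df_stack['C'] / df_stack['C'].shift(1)

# Each pass fills the topmost remaining NaN of every run.
for _ in range(longest_run):
    df_stack['B'] = df_stack['B'].fillna(ratio * df_stack['B'].shift(1))

print(df_stack)
```

The number of passes grows with the longest NaN run, so for long gaps the grouped cumprod approach is likely to be cheaper.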