I have a dataframe with one NaN value in column B.
import numpy as np
import pandas as pd

df_stack = pd.DataFrame({'A': [2, 4, 8, 0, 6],
                         'B': [2, 4, np.nan, 5, 7],
                         'C': [10, 2, 1, 8, 9]},
                        index=['1', '2', '3', '4', '5'])
A B C
1 2 2.0 10
2 4 4.0 2
3 8 NaN 1
4 0 5.0 8
5 6 7.0 9
This NaN should be replaced so that the new value represents the same percentage change from the previous row as column C. I used the following technique, and it works:
df_stack['B'] = df_stack['B'].fillna(
(
df_stack['C'] /
df_stack['C'].shift(1)
) * df_stack['B'].shift(1)
)
A B C
1 2 2.0 10
2 4 4.0 2
3 8 2.0 1
4 0 5.0 8
5 6 7.0 9
But the issue arises when there are two or more consecutive NaN values: only the first NaN is replaced and the rest are kept.
df_stack = pd.DataFrame({'A': [2, 4, 8, 0, 6],
                         'B': [2, 4, np.nan, np.nan, 7],
                         'C': [10, 2, 1, 8, 9]},
                        index=['1', '2', '3', '4', '5'])
df_stack['B'] = df_stack['B'].fillna(
(
df_stack['C'] /
df_stack['C'].shift(1)
) * df_stack['B'].shift(1)
)
A B C
1 2 2.0 10
2 4 4.0 2
3 8 2.0 1
4 0 NaN 8
5 6 7.0 9
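One way to see why only the first NaN is filled is to look at the fill series on its own. The sketch below rebuilds the same frame; the variable name `candidate` is only illustrative. The whole series is computed once from the original column B, so the row after the first NaN still sees a NaN in `shift(1)`.

```python
import numpy as np
import pandas as pd

df_stack = pd.DataFrame({'A': [2, 4, 8, 0, 6],
                         'B': [2, 4, np.nan, np.nan, 7],
                         'C': [10, 2, 1, 8, 9]},
                        index=['1', '2', '3', '4', '5'])

# Candidate fill values, computed in one shot from the ORIGINAL column B.
candidate = (df_stack['C'] / df_stack['C'].shift(1)) * df_stack['B'].shift(1)

# Row '3' can use B from row '2' (a real value), but row '4' needs B from
# row '3', which is still NaN here, so its candidate is NaN as well.
print(candidate)
```

So `fillna` has nothing to put into row '4': the value it would need depends on a fill that has not been applied yet.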
Is there any way to do this without a for loop, so that the values are filled row by row (from top to bottom)? My expected output is this:
A B C
1 2 2.0 10
2 4 4.0 2
3 8 2.0 1
4 0 16.0 8
5 6 7.0 9
CodePudding user response:
For a vectorized approach, you can use a cumprod per group:
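# A new group starts at every row whose previous B is known,
# so a run of consecutive NaNs stays in one group.
group = df_stack['B'].shift().notna().cumsum()
# Row-over-row growth factor of C (C / previous C).
factor = df_stack['C'].pct_change().add(1)
# Previous B times the factor; where the previous B is NaN, fill_value=1 leaves
# the bare factor, so the cumulative product chains the factors within a group.
fill = (df_stack['B'].shift()
        .mul(factor, fill_value=1)
        .groupby(group).cumprod()
       )
df_stack['B'] = df_stack['B'].fillna(fill)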
output:
A B C
1 2 2.0 10
2 4 4.0 2
3 8 2.0 1
4 0 16.0 8
5 6 7.0 9
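As a cross-check, the chained ratios telescope: filling row by row gives the last known B multiplied by C at the current row divided by C at that last known row. The sketch below is an alternative formulation, not the code from this answer, and the names `b`, `c`, `c_anchor` and `fill` are illustrative. It assumes C is non-zero and not NaN at the rows where B is known.

```python
import numpy as np
import pandas as pd

df_stack = pd.DataFrame({'A': [2, 4, 8, 0, 6],
                         'B': [2, 4, np.nan, np.nan, 7],
                         'C': [10, 2, 1, 8, 9]},
                        index=['1', '2', '3', '4', '5'])

b = df_stack['B']
c = df_stack['C']

# C at the most recent row where B was known (the "anchor" row).
c_anchor = c.where(b.notna()).ffill()

# Last known B, scaled by how much C has changed since the anchor row.
fill = b.ffill() * c / c_anchor

# Only the NaN positions of B take values from `fill`.
df_stack['B'] = b.fillna(fill)
print(df_stack)
```

On this sample it should give the same column B as the grouped cumprod. Any NaN at the very start of B stays NaN, since there is no earlier value to scale.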
CodePudding user response:
I could not find a direct approach, but there is a workaround. First, get the highest number of consecutive NaN values.
for i in range(len(df_stack["B"])):
    if len(df_stack[(df_stack['B'].isnull()) & (df_stack['B'].shift(i).isnull())]) > 0:
        continue
    else:
        break
Then run the fillna code as many times as the maximum number of consecutive NaN values.
for count in range(i):
    df_stack['B'] = df_stack['B'].fillna(
        (
            df_stack['C'] /
            df_stack['C'].shift(1)
        ) * df_stack['B'].shift(1)
    )
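If it helps, the longest NaN run can also be counted directly with a groupby instead of the shift-based search. This is only a sketch of that variation; `is_nan`, `run_id` and `max_run` are illustrative names. It still loops in Python, but once per position in the longest run rather than once per row.

```python
import numpy as np
import pandas as pd

df_stack = pd.DataFrame({'A': [2, 4, 8, 0, 6],
                         'B': [2, 4, np.nan, np.nan, 7],
                         'C': [10, 2, 1, 8, 9]},
                        index=['1', '2', '3', '4', '5'])

is_nan = df_stack['B'].isna()
# Every known value bumps the counter, so the NaNs that follow it share its id.
run_id = df_stack['B'].notna().cumsum()
# Number of NaNs per id; the largest count is the longest consecutive run.
max_run = int(is_nan.groupby(run_id).sum().max())

# Each pass fills the first remaining NaN of every run.
for _ in range(max_run):
    df_stack['B'] = df_stack['B'].fillna(
        (df_stack['C'] / df_stack['C'].shift(1)) * df_stack['B'].shift(1)
    )

print(max_run)
print(df_stack)
```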