When a value in the "Returns" column falls below a threshold m (say m = -0.5), I want the subsequent rows to be filled with zeros, but only until the end of that month. How can I accomplish this? Thanks in advance.
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(365), freq='D')
np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'test': days, 'Price': data})
df = df.set_index('test')
df['Returns']=df['Price'].pct_change()
CodePudding user response:
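Use numpy.where with a boolean mask. Build the mask by comparing Returns against m (less than), then, within each month (grouped with DatetimeIndex.to_period), apply a shift followed by Series.cummax so that every row after the first value below m is flagged until the month ends: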
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(365), freq='D')
np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'test': days, 'Price': data})
df = df.set_index('test')
df['Returns'] = df['Price'].pct_change()

m = -0.5
# True for every row that follows a return below m within the same month;
# the shift keeps the triggering row itself unchanged
mask = (df['Returns'].lt(m)
        .groupby(df.index.to_period('M'))
        .transform(lambda x: x.shift(fill_value=False).cummax()))
df['Returns'] = np.where(mask, 0, df['Returns'])
print(df)
                            Price   Returns
test
2021-11-22 14:32:35.550767     29       NaN
2021-11-23 14:32:35.550767     56  0.931034
2021-11-24 14:32:35.550767     82  0.464286
2021-11-25 14:32:35.550767     13 -0.841463
2021-11-26 14:32:35.550767     35  0.000000
...                           ...       ...
2022-11-18 14:32:35.550767     47  0.000000
2022-11-19 14:32:35.550767     90  0.000000
2022-11-20 14:32:35.550767     20  0.000000
2022-11-21 14:32:35.550767     27  0.000000
2022-11-22 14:32:35.550767     51  0.000000

[366 rows x 2 columns]
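
Because the random data makes the result hard to check by eye, here is a minimal sketch of the same shift-plus-cummax idea on a small hand-written frame. The names toy, threshold, breach and after_breach and the sample values are made up for illustration, and the result is written to a separate Zeroed column so it can be compared with the input. It is one way to write the mask under those assumptions, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two months with fixed returns, easy to verify by hand.
idx = pd.to_datetime([
    "2021-01-28", "2021-01-29", "2021-01-30", "2021-01-31",
    "2021-02-01", "2021-02-02", "2021-02-03",
])
toy = pd.DataFrame(
    {"Returns": [0.10, -0.60, 0.20, 0.30, 0.05, -0.70, 0.40]}, index=idx
)
threshold = -0.5  # plays the role of m in the question

# 1 where the return is below the threshold, 0 elsewhere.
breach = toy["Returns"].lt(threshold).astype(int)

# Within each month, shift by one row so the breaching row itself is kept,
# then take a running maximum so the flag stays on until the month ends.
after_breach = (
    breach.groupby(toy.index.to_period("M"))
    .transform(lambda s: s.shift(fill_value=0).cummax())
    .astype(bool)
)

# Zero out only the flagged rows; everything else keeps its original value.
toy["Zeroed"] = np.where(after_breach, 0.0, toy["Returns"])
print(toy)
```

In this toy frame the Zeroed column comes out as 0.10, -0.60, 0.0, 0.0 for January and 0.05, -0.70, 0.0 for February: the row that crosses the threshold keeps its value, the rest of that month is zeroed, and the flag resets at the start of the next month because the running maximum is computed per group.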