Question: how can I replace specific row value within a pandas method chain.
Here is my code:
days = np.arange(0,11)
rets = np.array([ 0.00, 0.02, 0.03, 0.04, -0.01, -0.02, 0.01, 0.02, -0.03, -0.05,0.10 ])
start = 100
df = pd.DataFrame({"time": days, "return":rets})
new_df = (df
.assign(**{f"lag_{i}":df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in np.arange(6)})
)
new_df.iloc[0] = new_df.iloc[0].replace(np.nan,1) # add to method chain above
How can I do the operation in the last line within the method chain. With method chain I mean
new_df = (df
.assign(...)
.replace(...)
)
CodePudding user response:
You actually can use .replace
for this (with arbitrary values, not necessarily na
-s):
new_df = (df
.assign(**{f"lag_{i}":df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in np.arange(6)})
.T.replace({0: np.nan}, 1).T
)
It doesn't have an option to specify which row to fill, but does - for the column! So we can simply transpose our dataframe before and after the operation.
.fillna
can work in a similar way (but it only replaces the na
values):
new_df = (df
.assign(**{f"lag_{i}":df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in np.arange(6)})
.T.fillna({0:1}).T
)
I had to transpose the dataframe before and after filling because currently it "can only fill with dict/Series column by column".
CodePudding user response:
Vladimir has posted correct and useful answers to my initial question.
I have meanwhile encountered a third option. This option is more verbose as it requires writing a separate function. It has the benefit to be more readable in the method chain itself. At least for me.
# New function doing the replacement
def replace_first_row(_df, to_replace=np.nan, value=1):
cols = _df.columns
_df.iloc[0] = _df.iloc[0].replace(to_replace=to_replace, value=value)
return _df
days = np.arange(0,11)
rets = np.array([ 0.00, 0.02, 0.03, 0.04, -0.01, -0.02, 0.01, 0.02, -0.03, -0.05,0.10 ])
start = 100
df = pd.DataFrame({"time": days, "return":rets})
new_df = (df
.assign(**{f"lag_{i}":df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in np.arange(6)})
.pipe(replace_first_row, np.nan, 1)
)
new_df