Pandas: replace specific row values for certain columns using method chaining

Question: How can I replace the values of a specific row within a pandas method chain?

Here is my code:

import numpy as np
import pandas as pd

days = np.arange(0, 11)
rets = np.array([0.00, 0.02, 0.03, 0.04, -0.01, -0.02, 0.01, 0.02, -0.03, -0.05, 0.10])
start = 100

df = pd.DataFrame({"time": days, "return": rets})

new_df = (df
          .assign(**{f"lag_{i}":df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in np.arange(6)})
)
new_df.iloc[0] = new_df.iloc[0].replace(np.nan,1) # add to method chain above

How can I do the operation in the last line within the method chain? By method chain I mean something like:

new_df = (df
          .assign(...) 
          .replace(...)
          )

CodePudding user response：

You can actually use .replace for this (with arbitrary values, not only NaN):

new_df = (df
          .assign(**{f"lag_{i}":df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in np.arange(6)})
          .T.replace({0: np.nan}, 1).T
)

.replace has no option to select which row to operate on, but it does have one for columns: a dict passed as to_replace maps a column label to the value to replace in that column. So we can simply transpose the dataframe before and after the operation.
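As a minimal sketch of the trick on a smaller scale, the example below uses a toy frame (the name `toy` and its columns are made up for illustration, not taken from the question). It first replaces NaN in one column only, then uses the double transpose to replace NaN in one row only:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not part of the question's data
toy = pd.DataFrame({"a": [np.nan, 2.0], "b": [np.nan, np.nan]})

# Column-keyed replace: only NaN in column "a" becomes 1
by_col = toy.replace({"a": np.nan}, 1)

# Transpose, replace in the column labelled 0 (the original row 0), transpose back
by_row = toy.T.replace({0: np.nan}, 1).T

print(by_col)
print(by_row)
```

In `by_row`, both cells of row 0 are filled, while the NaN in row 1 is left untouched.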

.fillna works in a similar way, but it only replaces NaN values:

new_df = (df
          .assign(**{f"lag_{i}":df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in np.arange(6)})
          .T.fillna({0:1}).T
)

I had to transpose the dataframe before and after filling because .fillna currently "can only fill with dict/Series column by column".
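One caveat with the double transpose: transposing forces all values into a common dtype, so the integer `time` column comes back as float after the round trip. If that matters, one possible alternative that stays inside the chain without transposing is to build a boolean mask and hand it to `DataFrame.mask` through `.pipe`. This is only a sketch under that assumption, not part of the answer above, and the helper name `fill_first_row` is made up for illustration:

```python
import numpy as np
import pandas as pd

days = np.arange(0, 11)
rets = np.array([0.00, 0.02, 0.03, 0.04, -0.01, -0.02, 0.01, 0.02, -0.03, -0.05, 0.10])
df = pd.DataFrame({"time": days, "return": rets})


def fill_first_row(frame, value=1):
    """Hypothetical helper: fill NaN in the first row only, leaving other rows alone."""
    # True only where a cell is NaN AND sits in the first row (position 0)
    in_first_row = (np.arange(len(frame)) == 0)[:, None]
    cond = frame.isna().to_numpy() & in_first_row
    # mask() replaces the cells where cond is True and returns a new frame
    return frame.mask(cond, value)


new_df = (df
          .assign(**{f"lag_{i}": df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in np.arange(6)})
          .pipe(fill_first_row, value=1)
)
print(new_df.head())
```

Because no transpose is involved, columns without NaN in the first row (such as `time`) should keep their original dtype, and the NaN values that `shift(-i)` leaves in the last rows stay as they are.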

CodePudding user response：

Vladimir has posted correct and useful answers to my initial question.

I have since come across a third option. It is more verbose because it requires writing a separate function, but it makes the method chain itself more readable, at least for me.

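import numpy as np
import pandas as pd

# Helper that replaces values in the first row only
def replace_first_row(_df, to_replace=np.nan, value=1):
    _df = _df.copy()  # work on a copy so the frame passed in is not mutated
    _df.iloc[0] = _df.iloc[0].replace(to_replace=to_replace, value=value)
    return _df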

days = np.arange(0, 11)
rets = np.array([0.00, 0.02, 0.03, 0.04, -0.01, -0.02, 0.01, 0.02, -0.03, -0.05, 0.10])
start = 100

df = pd.DataFrame({"time": days, "return": rets})

new_df = (df
          .assign(**{f"lag_{i}":df["return"].add(1).iloc[1:].shift(-i).cumprod() for i in np.arange(6)})
          .pipe(replace_first_row, np.nan, 1)
          )

new_df