Performing a function on dataframe rows

Time:07-06

I am trying to winsorize a data set that contains a few hundred columns of data. I'd like to add a new column to the dataframe that holds the winsorized result of each row's data. How can I do this with a pandas dataframe without having to specify each column (I'd like to use all columns)?

Edit: I want to use the function 'winsorize(list, limits = [0.1,0.1])', but I'm not sure how to pass the dataframe rows to it as a list.

CodePudding user response:

Some tips:

  • You can use the pandas apply method with axis=1 to apply a function to every row.
  • The function you pass to apply receives each row as a pandas Series, which you can convert to a list with its tolist method.

For example:

from scipy.stats.mstats import winsorize
df.apply(lambda x: winsorize(x.tolist(), limits=[0.1,0.1]), axis=1)
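To get the result into a new column, as the question asks, something along these lines should work. This is only a sketch: the sample data, the helper name winsorize_row and the column name "winsorized" are made up for illustration, and it assumes every column is numeric. Each cell of the new column holds the whole winsorized row as a list.

```python
import pandas as pd
from scipy.stats.mstats import winsorize

# Hypothetical sample data: 3 rows, 10 numeric columns.
df = pd.DataFrame(
    [
        [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
        [-50, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    ],
    columns=[f"c{i}" for i in range(10)],
)

def winsorize_row(row):
    # row is a pandas Series holding one row of df.
    # winsorize returns a masked array, so convert it back to a plain list
    # before storing it in a single cell.
    return winsorize(row.to_numpy(), limits=[0.1, 0.1]).tolist()

# Compute from the original columns first, then attach the result as one new column.
winsorized = df.apply(winsorize_row, axis=1)
df["winsorized"] = winsorized
```

With 10 values per row and limits of 0.1 on each side, the single lowest and single highest value in each row are replaced by their nearest neighbours. Computing the Series before assigning it keeps the new column out of the calculation if you rerun the cell.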

CodePudding user response:

You can work on the NumPy version of your dataframe, obtained with to_numpy(), and winsorize all rows at once with axis=1:

import pandas as pd
from scipy.stats.mstats import winsorize

ma = winsorize(df.to_numpy(), axis=1, limits=[0.1, 0.1])
out = pd.DataFrame(ma.data, index=df.index, columns=df.columns)
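If what you actually want is a single summary value per row rather than the full winsorized row, the vectorized result can be reduced afterwards. The sketch below assumes that goal (the question does not say so explicitly) and uses made-up sample data and a hypothetical column name "winsorized_mean".

```python
import pandas as pd
from scipy.stats.mstats import winsorize

# Hypothetical sample data: 2 rows, 10 numeric columns.
df = pd.DataFrame(
    [
        [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
        [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    ],
    columns=[f"c{i}" for i in range(10)],
)

# Winsorize every row in one call; axis=1 means "along each row".
clipped = winsorize(df.to_numpy(), axis=1, limits=[0.1, 0.1])

# Rebuild a dataframe with the original labels from the underlying array.
out = pd.DataFrame(clipped.data, index=df.index, columns=df.columns)

# Assumed goal: one new column holding the mean of each winsorized row.
df["winsorized_mean"] = out.mean(axis=1)
```

This avoids calling a Python function once per row, which may matter with a few hundred columns and many rows.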