I am trying to winsorize a data set that contains a few hundred columns. I'd like to add a new column to the dataframe that holds the winsorized result of each row's data. How can I do this with a pandas dataframe without having to specify each column (I'd like to use all columns)?
Edit: I want to use the function 'winsorize(list, limits=[0.1, 0.1])', but I'm not sure how to format the dataframe rows so that they work as a list.
CodePudding user response:
Some tips:
- You can use the pandas function apply with axis=1 to apply a function to every row.
- The function passed to apply receives a pandas Series object, which you can easily convert to a list using its tolist method.
For example:
df.apply(lambda x: winsorize(x.tolist(), limits=[0.1,0.1]), axis=1)
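To get the result into a single new column, as the question asks, one option is to store each row's winsorized values as a list. This is only a sketch: the sample dataframe, the helper name winsorize_row, and the column name "winsorized" are made up for illustration.

```python
import pandas as pd
from scipy.stats.mstats import winsorize

# Hypothetical sample data: 2 rows, 10 columns, one outlier per row.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
     [-50, 11, 12, 13, 14, 15, 16, 17, 18, 19]],
    columns=[f"c{i}" for i in range(10)],
)

def winsorize_row(row):
    # winsorize returns a masked array; convert it to a plain Python list.
    return winsorize(row.to_numpy(), limits=[0.1, 0.1]).tolist()

# Compute the result first, so the new column is not fed back into winsorize.
winsorized = df.apply(winsorize_row, axis=1)
df["winsorized"] = winsorized

print(df["winsorized"])
```

With 10 values per row and limits of 0.1 on each side, the single smallest and single largest value in every row are replaced by their nearest neighbours in sorted order.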
CodePudding user response:
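You can winsorize the whole dataframe at once by passing its NumPy array (from to_numpy()) to winsorize with axis=1:
import pandas as pd
from scipy.stats.mstats import winsorize
# winsorize each row (axis=1); the result is a masked array
ma = winsorize(df.to_numpy(), axis=1, limits=[0.1, 0.1])
# rebuild a dataframe with the original index and column labels
out = pd.DataFrame(ma.data, index=df.index, columns=df.columns)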
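Here is a small self-contained run of the same idea, in case it helps to see what comes out. The sample dataframe and its labels are invented for the example; only winsorize, to_numpy and the DataFrame constructor come from the answer above.

```python
import pandas as pd
from scipy.stats.mstats import winsorize

# Hypothetical sample data: 2 rows, 10 columns, one outlier per row.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
     [-50, 11, 12, 13, 14, 15, 16, 17, 18, 19]],
    index=["a", "b"],
    columns=[f"c{i}" for i in range(10)],
)

# Winsorize every row in one call; inplace defaults to False,
# so the values in df are left untouched.
ma = winsorize(df.to_numpy(), axis=1, limits=[0.1, 0.1])

# ma is a masked array; ma.data is the underlying plain ndarray.
out = pd.DataFrame(ma.data, index=df.index, columns=df.columns)

print(out)
```

Note that out has the same shape as df, so this gives a winsorized copy of the dataframe rather than a single extra column. For a few hundred columns this is likely faster than a row-wise apply, since it avoids building a Series for every row.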