So have got a pandas data frame looking like this:
<STD> <IQR> <EE> <IF>
Inliers 1 2 3 4
Outliers 4 3 2 1
and I want to add another row calculating the percentage (of outliers in data i.e. contamination) so it would look like this:
<STD> <IQR> <EE> <IF>
Inliers 1 2 3 4
Outliers 4 3 2 1
Percentage 80% 60% 40% 20%
What's the most efficient way to do this with not much code and small runtime? (sorry for pretty basic question, but I'm lost in indexing and can't figure it out by myself without transposing the data frame or writing redundant)
CodePudding user response:
You can select row by index in DataFrame.loc
, divide by sum
and if need custom format add map
:
df.loc['Percentage'] = df.loc['Outliers'].div(df.sum()).map("{:.0%}".format)
print (df)
<STD> <IQR> <EE> <IF>
Inliers 1 2 3 4
Outliers 4 3 2 1
Percentage 80% 60% 40% 20%
Another idea inspire another answer, also simplify without Series
and added map
for custom format:
df = df.append(df.loc['Outliers'].div(df.sum()).map("{:.0%}".format).rename('Percentage'))
print (df)
<STD> <IQR> <EE> <IF>
Inliers 1 2 3 4
Outliers 4 3 2 1
Percentage 80% 60% 40% 20%
CodePudding user response:
simple case of using vectorised calc
import io
import pandas as pd
df = pd.read_csv(io.StringIO(""" <STD> <IQR> <EE> <IF>
Inliers 1 2 3 4
Outliers 4 3 2 1"""), sep="\s ")
df.append(pd.Series(df.loc["Outliers"].div(df.sum(axis=0)), name="Percentage"))
CodePudding user response:
Alternative here:
sum_column = df.loc['Inliers'] df.loc['Outliers']
percentage = df.loc['Outliers'] / sum_column
df.loc['Percentage'] = percentage.map("{:.0%}".format)