Home > database >  Efficient way to get a percentage over values columns
Efficient way to get a percentage over values columns

Time:10-06

So have got a pandas data frame looking like this:

          <STD>  <IQR>  <EE>  <IF>  
Inliers      1      2     3     4
Outliers     4      3     2     1

and I want to add another row calculating the percentage (of outliers in data i.e. contamination) so it would look like this:

          <STD>  <IQR>  <EE>  <IF>  
Inliers      1      2     3     4
Outliers     4      3     2     1
Percentage  80%    60%   40%   20%  

What's the most efficient way to do this with not much code and small runtime? (sorry for pretty basic question, but I'm lost in indexing and can't figure it out by myself without transposing the data frame or writing redundant)

CodePudding user response:

You can select row by index in DataFrame.loc, divide by sum and if need custom format add map:

df.loc['Percentage'] = df.loc['Outliers'].div(df.sum()).map("{:.0%}".format)
print (df)
           <STD> <IQR> <EE> <IF>
Inliers        1     2    3    4
Outliers       4     3    2    1
Percentage   80%   60%  40%  20%

Another idea inspire another answer, also simplify without Series and added map for custom format:

df = df.append(df.loc['Outliers'].div(df.sum()).map("{:.0%}".format).rename('Percentage'))
print (df)
           <STD> <IQR> <EE> <IF>
Inliers        1     2    3    4
Outliers       4     3    2    1
Percentage   80%   60%  40%  20%

CodePudding user response:

simple case of using vectorised calc

import io
import pandas as pd

df = pd.read_csv(io.StringIO("""          <STD>  <IQR>  <EE>  <IF>  
Inliers      1      2     3     4
Outliers     4      3     2     1"""), sep="\s ")

df.append(pd.Series(df.loc["Outliers"].div(df.sum(axis=0)), name="Percentage"))

CodePudding user response:

Alternative here:

sum_column = df.loc['Inliers']   df.loc['Outliers']
percentage = df.loc['Outliers'] / sum_column
df.loc['Percentage'] = percentage.map("{:.0%}".format)
  • Related