I have a dataframe of the form:
               A         B         C
Cat-1     798.26    456.65    187.56
Cat-2  165165.53  45450.00   4897.57
Cat-3  488565.65  15198.56  15654.65
Cat-4       0.00  54256.35  49878.65
Cat-5    1156.61    789.05  89789.54
Cat-6       0.00   1644.78   6876.15
I am attempting to get a percentage by dividing B by A. To achieve this I used the following:
if_condition = df['A'] != 0
then = (1 - df['B'].div(df['A']))
else_= 0
df['New Col'] = np.where(if_condition, then, else_)
I expected the following result:
               A         B         C  New Col
Cat-1     798.26    456.65    187.56    .5720
Cat-2  165165.53  45450.00   4897.57    .2751
Cat-3  488565.65  15198.56  15654.65    .0311
Cat-4       0.00  54256.35  49878.65        0
Cat-5    1156.61    789.05  89789.54    .6822
Cat-6       0.00   1644.78   6876.15        0
However, I got the following result:
               A         B         C  New Col
Cat-1     798.26    456.65    187.56      NaN
Cat-2  165165.53  45450.00   4897.57   0.2751
Cat-3  488565.65  15198.56  15654.65   0.0311
Cat-4       0.00  54256.35  49878.65        0
Cat-5    1156.61    789.05  89789.54      NaN
Cat-6       0.00   1644.78   6876.15        0
I have tried some other solutions that involved aligning the two columns, but that did not change the result. What could be generating these NaN values?
CodePudding user response:
import pandas as pd
import numpy as np
import io
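# Columns are whitespace-separated; the header has one name fewer than the data rows,
# so pandas uses the first column (Cat-1 ... Cat-6) as the index.
df = pd.read_csv(io.StringIO("""A B C
Cat-1 798.26 456.65 187.56
Cat-2 165165.53 45450.00 4897.57
Cat-3 488565.65 15198.56 15654.65
Cat-4 0.00 54256.35 49878.65
Cat-5 1156.61 789.05 89789.54
Cat-6 0.00 1644.78 6876.15"""), sep=r"\s+", engine="python")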
df
# output
               A         B         C
Cat-1     798.26    456.65    187.56
Cat-2  165165.53  45450.00   4897.57
Cat-3  488565.65  15198.56  15654.65
Cat-4       0.00  54256.35  49878.65
Cat-5    1156.61    789.05  89789.54
Cat-6       0.00   1644.78   6876.15
if_condition = df['A'] != 0
then = (1 - df['B'].div(df['A']))
else_= 0
df['New Col'] = np.where(if_condition, then, else_)
# output
               A         B         C   New Col
Cat-1     798.26    456.65    187.56  0.427943
Cat-2  165165.53  45450.00   4897.57  0.724822
Cat-3  488565.65  15198.56  15654.65  0.968891
Cat-4       0.00  54256.35  49878.65  0.000000
Cat-5    1156.61    789.05  89789.54  0.317791
Cat-6       0.00   1644.78   6876.15  0.000000
This looks correct: with pandas version 1.2.5 I cannot reproduce the NaN values from the posted data.
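Since the posted data does not reproduce the NaN values, the cause is probably in the real dataframe rather than in the np.where call. A few quick checks may help narrow it down. This is only a diagnostic sketch: the small `df` below is a stand-in built from three of the posted rows, and the checks listed are common suspects, not a confirmed diagnosis.

```python
import pandas as pd

# Stand-in for the real dataframe (assumption); replace with your own data.
df = pd.DataFrame(
    {"A": [798.26, 0.0, 1156.61], "B": [456.65, 54256.35, 789.05]},
    index=["Cat-1", "Cat-4", "Cat-5"],
)

# Both columns should be float64; an object dtype hints at strings in the data.
print(df[["A", "B"]].dtypes)

# Missing values in either column propagate into 1 - B / A.
print(df[["A", "B"]].isna().sum())

# Duplicate index labels can cause surprises when results are aligned back.
print(df.index.is_unique)
```

If any of these checks looks different on the real data, that is a good place to start digging.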
You could also write the if/else condition more compactly, at the cost of a slower row-by-row apply:
df["New Col"] = df.apply(lambda x: 1 - x["B"] / x["A"] if x["A"] != 0 else 0, axis=1)
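If you want to keep things vectorized and also avoid dividing by zero in the first place, one option is `np.divide` with its `out` and `where` arguments. This is a minimal sketch on three of the posted rows, not a drop-in replacement; the variable names `a`, `b` and `ratio` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [798.26, 0.0, 1156.61], "B": [456.65, 54256.35, 789.05]},
    index=["Cat-1", "Cat-4", "Cat-5"],
)

a = df["A"].to_numpy(dtype=float)
b = df["B"].to_numpy(dtype=float)

# Divide only where A is non-zero; the other positions keep the 0.0 from `out`.
ratio = np.divide(b, a, out=np.zeros_like(a), where=a != 0)

# Store 1 - B/A on rows with a non-zero A, and 0 everywhere else.
df["New Col"] = np.where(a != 0, 1 - ratio, 0.0)
print(df)
```

Because the division is skipped on the zero rows, no inf or NaN intermediate values are created.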
CodePudding user response:
You don't need a condition. Dividing a positive B by a zero A yields inf, so 1 - B/A becomes -np.inf on those rows; replace it with 0:
# df['New Col'] = (1 - df['B'] / df['A']).replace(-np.inf, 0)
df['New Col'] = ((1 - df['B'] / df['A']) * 100).round(2).replace(-np.inf, 0)
print(df)
# Output:
               A         B         C  New Col
Cat-1     798.26    456.65    187.56    42.79
Cat-2  165165.53  45450.00   4897.57    72.48
Cat-3  488565.65  15198.56  15654.65    96.89
Cat-4       0.00  54256.35  49878.65     0.00
Cat-5    1156.61    789.05  89789.54    31.78
Cat-6       0.00   1644.78   6876.15     0.00
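One caveat with replacing only -np.inf: if A and B are both 0 on some row, 0 / 0 gives NaN rather than an infinity, and that row would slip through. A hedged variant that covers both cases might look like the sketch below. The `Cat-7` row is made up for illustration and is not part of the posted data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [798.26, 0.0, 0.0], "B": [456.65, 54256.35, 0.0]},
    index=["Cat-1", "Cat-4", "Cat-7"],  # Cat-7 is a hypothetical row with A == B == 0
)

# Cat-4 becomes -inf (B / 0), Cat-7 becomes NaN (0 / 0).
ratio = 1 - df["B"] / df["A"]

# Turn both infinities into NaN, then fill every NaN with 0.
df["New Col"] = ratio.replace([np.inf, -np.inf], np.nan).fillna(0)
print(df)
```

Note that fillna(0) also hides NaN values that come from genuinely missing data, so this is only appropriate if 0 is the right answer for those rows as well.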