I'm trying to add a new column that holds the result of dividing one column by another, while avoiding division by zero by applying the division only to rows where the denominator is greater than zero.
df['division'] = 0
df.loc[(df['B'] > 0), 'division'] = (df['A'] / df['B'])
It works just fine when you set df = df.head(X), where X is small enough to exclude the rows that contain zeros in 'B', so I know it's a failure of the conditional selection, but I don't understand why.
Is there a non-obvious reason this conditional selection doesn't work? This selection does work fine:
df.loc[df['B'] > 0]
And returns the df that you would expect.
CodePudding user response:
I would keep all the rows and place an np.nan where the denominator is 0:
df['division'] = np.where(df['B']!=0, df['A'] / df['B'], np.nan)
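For reference, a minimal self-contained sketch of this approach, assuming numeric columns; the sample data is made up for illustration and is not from the question. Note that np.where receives the already-computed quotient for every row and only then picks values, so it keeps inf out of the result but does not skip the division itself.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the question)
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [0, 1, 2, 3, 4]})

# The division runs on every row; np.where then keeps the quotient
# where B is non-zero and substitutes NaN everywhere else.
df['division'] = np.where(df['B'] != 0, df['A'] / df['B'], np.nan)

print(df)
```

Because the result contains NaN, the new column has a float dtype.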
CodePudding user response:
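Note that pandas supports division by zero for columns with a numeric dtype (such as float64 and int64): a non-zero value divided by zero gives inf (or -inf for a negative numerator), and 0 / 0 gives NaN. For columns of object dtype, however, each element is divided as a plain Python object, so a zero denominator raises a ZeroDivisionError exception.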
Example:
import pandas as pd
df = pd.DataFrame({'A':[1,2,3,4,5], 'B':[0,1,2,3,4]})
print(df)
print('', 'result for:', f'{df.dtypes}:', sep='\n')
print(df['A'] / df['B'])
df = df.astype('float')
print('', 'result for:', f'{df.dtypes}:', sep='\n')
print(df['A'] / df['B'])
df = df.astype('object')
try:
    print('', 'result for:', f'{df.dtypes}:', sep='\n')
    print(df['A'] / df['B'])
except ZeroDivisionError:
    print('raised ZeroDivisionError exception')
Output:
   A  B
0  1  0
1  2  1
2  3  2
3  4  3
4  5  4

result for:
A    int64
B    int64
dtype: object:
0         inf
1    2.000000
2    1.500000
3    1.333333
4    1.250000
dtype: float64

result for:
A    float64
B    float64
dtype: object:
0         inf
1    2.000000
2    1.500000
3    1.333333
4    1.250000
dtype: float64

result for:
A    object
B    object
dtype: object:
raised ZeroDivisionError exception
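As a small aside on the numeric case, here is a sketch with made-up values (not part of the example above) covering the three division-by-zero outcomes: a positive numerator, a negative numerator, and zero divided by zero.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen to cover the three division-by-zero cases
num = pd.Series([1, -1, 0])
den = pd.Series([0, 0, 0])

# Numeric dtypes do not raise: the results are inf, -inf and NaN
result = num / den
print(result)
```

So with numeric columns the zero-denominator rows may need to be handled explicitly afterwards, since they can be inf, -inf, or NaN rather than a single sentinel value.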
One possible solution is to set the dtype of the columns you plan to divide to a numeric type such as float:
try:
    print('', 'result for:', f'{df.dtypes}:', sep='\n')
    print('first change column types to float')
    df.A = df.A.astype('float')
    df.B = df.B.astype('float')
    print(df['A'] / df['B'])
except ZeroDivisionError:
    print('raised ZeroDivisionError exception')
Output:
result for:
A    object
B    object
dtype: object:
first change column types to float
0         inf
1    2.000000
2    1.500000
3    1.333333
4    1.250000
dtype: float64
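Tying this back to the question, here is a hedged sketch that combines the dtype conversion with the masked assignment. It assumes the original columns were of object dtype, which the question does not confirm, and it uses pd.to_numeric for the conversion instead of astype('float'); that is my substitution, not part of the example above. The expression df['A'] / df['B'] is evaluated for every row before .loc applies the mask, so the sketch also masks the right-hand side so that only rows with B > 0 are divided.

```python
import pandas as pd

# Hypothetical data stored as object dtype (an assumption about the question)
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [0, 1, 2, 3, 4]}, dtype='object')

# Convert the columns to a numeric dtype before doing arithmetic
df['A'] = pd.to_numeric(df['A'])
df['B'] = pd.to_numeric(df['B'])

# Apply the same mask on both sides so only rows with B > 0 are divided
mask = df['B'] > 0
df['division'] = 0.0  # float default for rows that are skipped
df.loc[mask, 'division'] = df.loc[mask, 'A'] / df.loc[mask, 'B']

print(df)
```

Initialising the column with 0.0 rather than 0 keeps it float from the start, so the assigned quotients match the column's dtype.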