I am trying to create some new columns in a dataframe which are ratios of existing columns:
df[e] = df[a]/df[b]
df[f] = df[c]/df[d]
df[g] = df[a]/df[d]
df[h] = df[b]/df[c]
...
Since some values in the columns are zeros, the code above raises a ZeroDivisionError. I tried to fix it manually with:
try:
    df[e] = df[a]/df[b]
except ZeroDivisionError:
    df[e] = np.nan
try:
    df[f] = df[c]/df[d]
except ZeroDivisionError:
    df[f] = np.nan
try:
    df[g] = df[a]/df[d]
except ZeroDivisionError:
    df[g] = np.nan
...
But with this code all the rows in the new columns are then np.nan instead of only those which would raise the ZeroDivisionError.
So, how could I do this correctly? Possibly while also using a for loop over the new columns without having to do it manually for each new column like I tried in the second code block.
Thank you very much!
CodePudding user response:
With numeric columns, pandas should not raise a ZeroDivisionError on division by zero; it sets the result to inf (or NaN for 0/0) instead:
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.choice(range(3), size=(5,4)), columns=list('abcd'))
df['e'] = df['a']/df['b']
output:
   a  b  c  d    e
0  2  0  2  2  inf
1  0  0  2  1  NaN
2  2  2  2  2  1.0
3  0  2  1  0  0.0
4  1  1  1  1  1.0
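If the division really does raise ZeroDivisionError, one possible explanation (an assumption, the question does not show the dtypes) is that the columns have object dtype, so pandas falls back to plain Python division element by element. A minimal sketch with made-up data, converting to a numeric dtype first:

```python
import pandas as pd

# Hypothetical data: numbers stored as Python objects rather than int64/float64
df = pd.DataFrame({'a': [2, 0, 2], 'b': [0, 0, 2]}, dtype=object)

try:
    df['a'] / df['b']          # object dtype: Python's own division is used
    raised = False
except ZeroDivisionError:
    raised = True

# After converting to a numeric dtype, division by zero yields inf/NaN instead
num = df.apply(pd.to_numeric)
ratio = num['a'] / num['b']
print(raised)
print(ratio)
```

Checking `df.dtypes` on the real data would show whether this applies.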
Note that you can also perform all computations in one shot:
np.random.seed(42)
df = pd.DataFrame(np.random.choice(range(3), size=(5,4)), columns=list('abcd'))
df.loc[:, ['e', 'f', 'g', 'h']] = df[['a', 'c', 'a', 'b']].div(df[['b', 'd', 'd', 'c']].values, axis=1).values
output:
   a  b  c  d    e    f    g    h
0  2  0  2  2  inf  1.0  1.0  0.0
1  0  0  2  1  NaN  2.0  0.0  0.0
2  2  2  2  2  1.0  1.0  1.0  1.0
3  0  2  1  0  0.0  inf  NaN  2.0
4  1  1  1  1  1.0  1.0  1.0  1.0
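Since the question asks for NaN wherever the denominator is zero, and for a loop over the new columns, here is a minimal sketch of one way to do that. The `ratios` mapping is a hypothetical name introduced for illustration, and the data is typed in by hand from the table above:

```python
import numpy as np
import pandas as pd

# Same values as the example table above, written out explicitly
df = pd.DataFrame({'a': [2, 0, 2, 0, 1],
                   'b': [0, 0, 2, 2, 1],
                   'c': [2, 2, 2, 1, 1],
                   'd': [2, 1, 2, 0, 1]})

# Hypothetical mapping: new column -> (numerator column, denominator column)
ratios = {'e': ('a', 'b'), 'f': ('c', 'd'), 'g': ('a', 'd'), 'h': ('b', 'c')}

for new_col, (num, den) in ratios.items():
    # mask() turns zero denominators into NaN, so both x/0 and 0/0 end up as NaN
    df[new_col] = df[num] / df[den].mask(df[den] == 0)

print(df)
```

Masking the denominator before dividing means no inf values are produced in the first place, so there is nothing to clean up afterwards.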
CodePudding user response:
You can also iterate over the individual elements and substitute a fallback value (0 here) whenever the denominator is zero:
df[e] = [x / y if y else 0 for x, y in zip(df[a], df[b])]
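The same fallback idea can probably be written without a Python-level loop. This is a sketch with made-up data and literal column names, using `Series.where` to keep the quotient where the denominator is non-zero and the fallback elsewhere:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'a': [2, 0, 2, 0, 1], 'b': [0, 0, 2, 2, 1]})

nonzero = df['b'] != 0
# Keep a/b where the denominator is non-zero, otherwise use the fallback 0.0
df['e'] = (df['a'] / df['b']).where(nonzero, 0.0)
print(df)
```

Passing `np.nan` instead of `0.0` as the second argument to `where` would give NaN for those rows, which is what the question asked for.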