I have a dataframe and I'm running a lot (20) of calculations on it, creating new columns etc. All the calculations work well, including the calculation in question, except for 2 rows out of roughly 1,000. The rows are not adjacent to one another, and I can't find anything remarkable about the two specific rows that the calculation seems to be skipping. The data is read from a csv file and an xlsx file. The trouble rows come from the part of the data that was read from the csv file.
The calculation is:
df['c'] = df['b'] - df['a']
The data for the two trouble rows looks like this:
['a']    ['b']               ['c']
0        30.6427984591421    0
0        9584.28792256921    0
The data for the rest of the df, where the calculation works fine, looks similar but is processed correctly:
['a']           ['b']             ['c']
102411.4521     37008.6603        -65402.7918
202244.75895    211200.2304295    8955.4714795
Example code:
import pandas as pd

a = [0, 0, 102411.4521, 202244.75895]
b = [30.6427984591421, 9584.28792256921, 37008.6603, 211200.2304295]
df = pd.DataFrame(zip(a, b), columns=['a', 'b'])
df['c'] = df['b'] - df['a']
Why would the calculation seemingly skip these rows?
CodePudding user response:
You could try resetting the index before doing the operation.
df = df.reset_index(drop=True)
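Here is a minimal sketch of that suggestion in context. It assumes the frame was built by concatenating the csv and xlsx data, which would leave repeated index labels; the question does not confirm this, and the names `csv_part` and `xlsx_part` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the csv and xlsx data; each part has its own 0..n-1 index.
csv_part = pd.DataFrame({"a": [0.0, 0.0], "b": [30.6427984591421, 9584.28792256921]})
xlsx_part = pd.DataFrame({"a": [102411.4521, 202244.75895], "b": [37008.6603, 211200.2304295]})

df = pd.concat([csv_part, xlsx_part])
print(df.index.is_unique)  # False: labels 0 and 1 each appear twice

# Replace the repeated labels with a fresh 0..n-1 index, then run the calculation.
df = df.reset_index(drop=True)
df["c"] = df["b"] - df["a"]
print(df)
```

If `df.index.is_unique` is already True on your real data, the index is probably not the cause and the reset will not change anything.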
CodePudding user response:
Based on the information you supplied, I cannot reproduce the error on CPython 3.10.8.
import pandas as pd

df = pd.DataFrame(
    [
        dict(a=0, b=30.6427984591421),
        dict(a=0, b=9584.28792256921),
        dict(a=102411.4521, b=37008.6603),
        dict(a=202244.75895, b=211200.2304295),
    ]
)
df["c"] = df.b - df.a
print(pd.__version__)
print(df)
Output:
1.5.2
              a              b             c
0       0.00000      30.642798     30.642798
1       0.00000    9584.287923   9584.287923
2  102411.45210   37008.660300 -65402.791800
3  202244.75895  211200.230429   8955.471480
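Since the sample data does not reproduce the problem, one way to narrow it down on the real frame is to check the column dtypes and list the rows where the stored `c` disagrees with a fresh subtraction. This is only a sketch: the frame below is a stand-in that copies the `c` values reported in the question, and `expected` and `bad_rows` are names introduced here.

```python
import numpy as np
import pandas as pd

# Stand-in for the real frame, with 'c' as reported in the question.
df = pd.DataFrame({
    "a": [0, 0, 102411.4521, 202244.75895],
    "b": [30.6427984591421, 9584.28792256921, 37008.6603, 211200.2304295],
    "c": [0, 0, -65402.7918, 8955.4714795],
})

# A dtype of object instead of float64 would point to text values in a column.
print(df.dtypes)

# Recompute the difference and collect the index labels where 'c' does not match.
expected = df["b"] - df["a"]
bad_rows = df.index[~np.isclose(df["c"], expected)]
print(list(bad_rows))  # [0, 1]
```

Running the same check on the real data right after each of the other calculations may show which step leaves those two rows at 0.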