Why is a pandas broadcast formula skipping rows?

Time:12-16

I have a dataframe and I'm doing tons (about 20) of calculations, creating new columns, etc. All the calculations work well, including the calculation in question, except for 2 rows out of roughly 1,000. The rows are not adjacent to one another, and I can't find anything remarkable about the two specific rows the calculation seems to be skipping. The data is read from a csv file and an xlsx file. The trouble rows are part of the data from the csv file.

The calculation is:

df['c'] = df['b'] - df['a']

The data for the two trouble rows looks like this:

['a']       ['b']                 ['c']
  0      30.6427984591421           0
  0      9584.28792256921           0

The data for the rest of the df, where the calculation works fine, looks similar but is processed correctly:

['a']                ['b']                 ['c']
  102411.4521      37008.6603          -65402.7918
  202244.75895    211200.2304295         8955.4714795

Example code:

import pandas as pd

a = [0, 0, 102411.4521, 202244.75895]
b = [30.6427984591421, 9584.28792256921, 37008.6603, 211200.2304295]
df = pd.DataFrame(zip(a, b), columns=['a', 'b'])
# Element-wise subtraction; each row of 'c' should equal b - a
df['c'] = df['b'] - df['a']

Why would the calculation seemingly skip these rows?

CodePudding user response:

You could try resetting the index before doing the operation.

df = df.reset_index(drop=True)

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html#pandas-dataframe-reset-index
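
For context, here is a minimal sketch of what resetting the index changes. It assumes the csv and xlsx data were combined with pd.concat, which the question does not actually state; csv_part and xlsx_part are hypothetical stand-ins for the two sources. Whether duplicate index labels are really the cause of the skipped rows is not confirmed, so treat this as one thing to rule out.

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from the csv and xlsx files.
csv_part = pd.DataFrame({"a": [0.0, 0.0],
                         "b": [30.6427984591421, 9584.28792256921]})
xlsx_part = pd.DataFrame({"a": [102411.4521, 202244.75895],
                          "b": [37008.6603, 211200.2304295]})

# Concatenating keeps each frame's own labels, so 0 and 1 appear twice.
df = pd.concat([csv_part, xlsx_part])
labels_before = df.index.tolist()
print(labels_before)        # [0, 1, 0, 1]
print(df.index.is_unique)   # False

# reset_index(drop=True) replaces the labels with a fresh 0..n-1 range.
df = df.reset_index(drop=True)
df["c"] = df["b"] - df["a"]
print(df.index.tolist())    # [0, 1, 2, 3]
```

Checking df.index.is_unique on the real dataframe is a quick way to see whether this applies at all.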

CodePudding user response:

Based on the information you supplied, I cannot reproduce the problem on CPython 3.10.8 with pandas 1.5.2.

import pandas as pd

df = pd.DataFrame(
    [
        dict(a=0,            b=    30.6427984591421),
        dict(a=0,            b=  9584.28792256921),
        dict(a=102411.4521,  b= 37008.6603),
        dict(a=202244.75895, b=211200.2304295),
    ]
)
df["c"] = df.b - df.a
print(pd.__version__)
print(df)

output

1.5.2
              a              b             c
0       0.00000      30.642798     30.642798
1       0.00000    9584.287923   9584.287923
2  102411.45210   37008.660300 -65402.791800
3  202244.75895  211200.230429   8955.471480
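
Since the sample data computes correctly, the difference probably lies in the real data rather than in the subtraction itself. The following is a rough sketch of checks one might run on the actual dataframe. It assumes, purely for illustration, that some values from the csv were read as text; the frame below and its "n/a" entry are invented and do not come from the question.

```python
import pandas as pd

# Invented example: column 'a' arrived as strings, one of them unparseable.
df = pd.DataFrame({"a": ["0", "0", "n/a"],
                   "b": [30.64, 9584.29, 37008.66]})

# A non-numeric dtype here would point to a parsing problem, not an arithmetic one.
print(df.dtypes)

# Coerce to numbers; anything unparseable becomes NaN so it can be located.
a_numeric = pd.to_numeric(df["a"], errors="coerce")
print(df[a_numeric.isna()])

# Redo the calculation on the cleaned column.
df["a"] = a_numeric
df["c"] = df["b"] - df["a"]
print(df)
```

If the dtypes are already numeric and the two rows still show 0 in c, it is worth checking whether a later step overwrites c for those rows.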