Trouble with cumsum and decimal numbers

Time:12-26

Consider the following code:

import numpy as np
import pandas as pd
   
df = pd.DataFrame({'A': np.random.randint(0, 10, 10)})
df['B'] = df['A'].diff()

x, x_diff = 1, df['B'].iloc[1:]
df['C'] = np.r_[x, x_diff].cumsum()

#    A    B    C
# 0  6  NaN  1.0
# 1  6  0.0  1.0
# 2  0 -6.0 -5.0
# 3  7  7.0  2.0
# 4  5 -2.0  0.0
# 5  3 -2.0 -2.0
# 6  3  0.0 -2.0
# 7  8  5.0  3.0
# 8  8  0.0  3.0
# 9  8  0.0  3.0

Column C changes as expected, so this works without trouble. However, when column A holds very small decimal numbers, the increments seem to get rounded to 0 and column C stays at the starting value. Any ideas how to prevent this? In theory I could scale the numbers back up by multiplying, but is there a better way to resolve this? The problem is demonstrated below:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': np.random.randint(0, 10, 10)})/100000000000000


df['B'] = df['A'].diff()

x, x_diff = 1, df['B'].iloc[1:]
df['C'] = np.r_[x, x_diff].cumsum()

#               A             B    C
# 0  9.000000e-14           NaN  1.0
# 1  7.000000e-14 -2.000000e-14  1.0
# 2  1.000000e-14 -6.000000e-14  1.0
# 3  9.000000e-14  8.000000e-14  1.0
# 4  9.000000e-14  0.000000e+00  1.0
# 5  4.000000e-14 -5.000000e-14  1.0
# 6  6.000000e-14  2.000000e-14  1.0
# 7  9.000000e-14  3.000000e-14  1.0
# 8  7.000000e-14 -2.000000e-14  1.0
# 9  0.000000e+00 -7.000000e-14  1.0

CodePudding user response:

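Thanks for the many suggestions. Using 0 as the starting value, as suggested by PUFF, worked. The approach with pd.set_option('precision', 16), as suggested by Chris, also worked, so the small increments were in column C all along and were only hidden by the default display precision.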
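
For anyone landing here later, both fixes can be sketched as follows. This is only a minimal illustration, not the exact code from the suggestions: the fixed values in column A are an assumption made so the run is reproducible, and pd.option_context is swapped in for pd.set_option so the setting does not persist after the block. 'display.precision' is the full name of the option, and the column name C0 is made up for this example.

```python
import numpy as np
import pandas as pd

# Fixed values instead of random ones (an assumption, for reproducibility).
df = pd.DataFrame({'A': [9, 7, 1, 9, 9, 4, 6, 9, 7, 0]}, dtype='float64') / 1e14
df['B'] = df['A'].diff()

# Same construction as in the question: start at 1, then add the differences.
df['C'] = np.r_[1, df['B'].iloc[1:]].cumsum()

# The tiny increments are still stored in C; they are only hidden when printed.
changed = bool((df['C'] != 1.0).any())
print(changed)

# Fix 1: show more digits while printing.
with pd.option_context('display.precision', 16):
    print(df)

# Fix 2: start at 0 so the running sum has the same magnitude as the increments.
# 'C0' is a hypothetical column name used only in this sketch.
df['C0'] = np.r_[0, df['B'].iloc[1:]].cumsum()
print(df['C0'])
```

Keep in mind that float64 carries only about 15 to 17 significant decimal digits, so a running sum that starts at 1 keeps just the first couple of digits of increments near 1e-14. Starting at 0 avoids that loss, which is probably why it is the more robust of the two options.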
