I tried to sum large integers in pandas and the result is not what I expected.
Input file: my_file_lg_int
my_int
111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
Python code:
import pandas as pd

file = 'my_file_lg_int'
data = pd.read_csv(file)
data['my_int'].sum()
The output is:
111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
Since the integers are too large for a 64-bit integer, pandas reads the column as strings (object dtype), so .sum() concatenates them instead of adding.
So I tried data = pd.read_csv(file, dtype={'my_int': int})
but I get an OverflowError. How can I solve this?
CodePudding user response:
Perhaps convert each value to a Python int (which has arbitrary precision) before summing:
df["my_int"].apply(int).sum()
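A minimal end-to-end sketch of this approach, assuming the same my_file_lg_int file from the question (df is the frame returned by read_csv):

import pandas as pd

df = pd.read_csv('my_file_lg_int')      # column comes back as object dtype (strings)
total = df['my_int'].apply(int).sum()   # each string becomes a Python int before summing
print(total)                            # 333...333 (102 threes)

This works because .apply(int) converts each string to a Python int, and summing Python ints cannot overflow.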
CodePudding user response:
Many tasks are easier without hauling in the enormous pandas and numpy modules.
filename = 'my_file_lg_int'
with open(filename) as f:
    next(f)  # skip the 'my_int' header line
    mysum = sum(int(line) for line in f)  # int() ignores surrounding whitespace
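Printing mysum then shows the exact result, since Python's built-in int has arbitrary precision:

print(mysum)  # 333...333 (102 threes)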
CodePudding user response:
We can use the decimal module to solve this. According to the documentation:
Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem:
Each number in this file has 102 digits, and the sum of two 102-digit numbers has at most 103 digits, so a precision of 103 is enough here. If the inputs or the running sum had more than 103 digits, the result would be rounded, so the precision would have to be raised accordingly.
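As a quick aside to illustrate what the precision limit does, here is a sketch with an artificially small precision (the values are made up for illustration):

from decimal import Decimal, Context, setcontext

setcontext(Context(prec=5))
print(Decimal("11111") + Decimal("22222"))  # 33333 -- exact, fits in 5 digits
print(Decimal("99999") + Decimal("99999"))  # 2.0000E+5 -- 6 digits, rounded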
import pandas as pd
import decimal
from decimal import Decimal

decimal.setcontext(decimal.Context(prec=103))

file = 'my_file_lg_int'
df = pd.read_csv(file, dtype={"my_int": str})  # keep the column as strings

x = Decimal("0")
for i in df['my_int']:
    x = x + Decimal(i)  # convert each string and accumulate exactly
print(x)
print(type(x))
This gives:
333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333
<class 'decimal.Decimal'>
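If you prefer to avoid the explicit loop, the same accumulation can be written with the built-in sum and a Decimal start value (a sketch using the same df and precision as above):

total = sum((Decimal(i) for i in df['my_int']), Decimal("0"))
print(total)  # same 102-digit result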