How to handle large integers in Python with pandas?

Problem

I have an issue while creating a DataFrame in pandas. I am creating a new, empty DataFrame df2 with the same columns as an existing DataFrame df1, as shown below:

import pandas as pd
import numpy as np

df2 = pd.DataFrame(columns=df1.columns)  # DataFrame is a constructor on the pandas module, not a method of df1

Then, inside a loop, I add another column that stores an 18-digit integer, using the following code:

df2.loc[i, 'new column'] = 123123123123123123

This, however, stores the result in the DataFrame in exponential form, as 1.231231231231e+17, and the last two digits are lost. I want the new column to hold the exact 18-digit integer.

I tried two approaches to solve this.

Approach 1: Modification at the point of definition

df2 = pd.DataFrame(columns=df1.columns)
df2['new column'] = 0
df2['new column'] = df2['new column'].astype(np.int64) # also tried .apply(np.int64)

Approach 2: Modification at the point of assignment

df2.loc[i, 'new column'] = np.int64(123123123123123123)

Unfortunately, neither approach worked for me.

Reproducible Code for More Clarity

df1 = pd.DataFrame(data={'A':[123123123123123123, 234234234234234234, 345345345345345345], 'B':[11,22,33]})
df1

Output:

                     A  B
0   123123123123123123  11
1   234234234234234234  22
2   345345345345345345  33

for i in range(df1.shape[0]):
    df1.loc[i, 'new column'] = 222222222222222222
df1

Output:

                     A  B   new column
0   123123123123123123  11  2.222222e+17
1   234234234234234234  22  2.222222e+17
2   345345345345345345  33  2.222222e+17

When I try to convert it back, I get a different number.

df1['new column'] = df1['new column'].astype(np.int64)
df1

Output:

                     A  B   new column
0   123123123123123123  11  222222222222222208
1   234234234234234234  22  222222222222222208
2   345345345345345345  33  222222222222222208
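
The changed trailing digits are consistent with a round trip through float64. The following is a minimal sketch of that effect, not a statement about pandas internals; the names `value`, `limit`, `round_trip`, and `df` are illustrative only. It assumes the column is created by row-by-row assignment, as in the loop above.

```python
import numpy as np
import pandas as pd

value = 222222222222222222            # the 18-digit integer from the example

# float64 has a 53-bit significand; above 2**53 not every integer is representable
limit = 2**53
round_trip = int(np.float64(value))   # convert to float64 and back to a Python int
print(limit)                          # 9007199254740992
print(round_trip)                     # 222222222222222208
print(round_trip == value)            # False

# Row-by-row assignment into a column that does not exist yet:
# the untouched rows are filled with NaN, so the column becomes float64
df = pd.DataFrame({'B': [11, 22, 33]})
df.loc[0, 'new column'] = value
print(df['new column'].dtype)         # float64
```

Once the value has been stored as float64, converting the column back to int64 cannot recover the original digits, which matches the 222222222222222208 shown above.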

CodePudding user response:

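The fix is to create 'new column' with object dtype before assigning any values, and to convert it to int64 only after all assignments are done. Assigning row by row into a column that does not exist yet makes pandas fill the remaining rows with NaN, which forces the column to float64, and float64 cannot represent every 18-digit integer exactly. I am not sure this is the most efficient method, and I am open to better solutions, but it works.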

df1 = pd.DataFrame(data={'A':[123123123123123123, 234234234234234234, 345345345345345345], 'B':[11,22,33]})

df1['new column'] = None # create the column with object dtype so Python ints are stored exactly

for i in range(df1.shape[0]):
    df1.loc[i, 'new column'] = 222222222222222222

df1['new column'] = df1['new column'].astype(np.int64) # convert to int64 once every row has a value

Output:

                     A  B   new column
0   123123123123123123  11  222222222222222222
1   234234234234234234  22  222222222222222222
2   345345345345345345  33  222222222222222222

df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   A           3 non-null      int64
 1   B           3 non-null      int64
 2   new column  3 non-null      int64
dtypes: int64(3)
memory usage: 200.0 bytes
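
If the values can be computed before the column is created, a different technique from the one above avoids the intermediate object column: collect the values in a list and assign the whole column in one step. This is only a sketch under that assumption; `values` is a hypothetical stand-in for whatever the loop would compute.

```python
import pandas as pd

df1 = pd.DataFrame(data={'A': [123123123123123123, 234234234234234234, 345345345345345345],
                         'B': [11, 22, 33]})

# Hypothetical placeholder: in real code each entry would come from the loop body
values = [222222222222222222 for _ in range(df1.shape[0])]

# Assigning a complete int64 Series never passes through float64,
# because no row is left empty (NaN) at any point
df1['new column'] = pd.Series(values, index=df1.index, dtype='int64')

print(df1['new column'].dtype)
print(df1['new column'].tolist())
```

This only applies when every row gets a value. If some rows must stay empty, the object-dtype approach above still works for the rows that are filled.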

CodePudding user response:

Try np.float64, keeping in mind that it stores a floating-point approximation rather than the exact integer:

np.float64(123123123123123123123123)

# Output
1.2312312312312312e+23
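
As a rough check of what this conversion keeps, the sketch below compares the float64 result with the original 24-digit integer. It also shows one possible way to keep the exact digits for a value this large, assuming exactness matters more than speed: store Python ints in an object-dtype Series. The names `big` and `s` are illustrative.

```python
import numpy as np
import pandas as pd

big = 123123123123123123123123             # 24 digits

# The value does not fit in int64 at all
print(big > np.iinfo(np.int64).max)        # True

# float64 keeps only about 15 to 17 significant decimal digits
print(int(np.float64(big)) == big)         # False

# Assumption: exact digits are required, so keep Python ints in an object column
s = pd.Series([big], dtype=object)
print(s.iloc[0] == big)                    # True
print(s.dtype)                             # object
```

Object columns hold arbitrary-precision Python integers, but arithmetic on them is generally slower than on int64 columns.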