How to handle large integers in python with Pandas?

Time:11-03

Problem

I have an issue while creating a DataFrame in pandas. I am creating a new empty DataFrame df2 with the same columns as an existing DataFrame df1, as shown below:

import pandas as pd
import numpy as np

df2 = pd.DataFrame(columns=df1.columns)

Then, inside a loop, I add another column that stores an 18-digit integer, using the following code:

df2.loc[i, 'new column'] = 123123123123123123

This, however, stores the result in the data frame in exponential form as 1.231231231231e+17, which loses the last two digits. I want to store the value in the new column as an 18-digit integer.

I tried two approaches to solve this.

Approach 1: Modification at the point of definition

df2 = pd.DataFrame(columns=df1.columns)
df2['new column'] = 0
df2['new column'] = df2['new column'].astype(np.int64) # also tried .apply(np.int64)

Approach 2: Modification at the point of assignment

df2.loc[i, 'new column'] = np.int64(123123123123123123)

Unfortunately, neither approach worked for me.

Reproducible Code for More Clarity

df1 = pd.DataFrame(data={'A':[123123123123123123, 234234234234234234, 345345345345345345], 'B':[11,22,33]})
df1

Output:

                     A  B
0   123123123123123123  11
1   234234234234234234  22
2   345345345345345345  33

for i in range(df1.shape[0]):
    df1.loc[i, 'new column'] = 222222222222222222
df1

Output:

                     A  B   new column
0   123123123123123123  11  2.222222e+17
1   234234234234234234  22  2.222222e+17
2   345345345345345345  33  2.222222e+17
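
A likely reason for the float column, as a minimal sketch: when .loc writes to a column that does not exist yet, pandas creates it and fills the rows that have not been assigned with NaN, and NaN forces the column to float64. The frame demo below is a hypothetical stand-in for df1, and the dtype comment assumes current pandas behaviour.

```python
import pandas as pd

# Hypothetical stand-in for df1, with a single existing column.
demo = pd.DataFrame({'B': [11, 22, 33]})

# Assigning to a column that does not exist yet creates it;
# the rows not yet assigned are filled with NaN.
demo.loc[0, 'new column'] = 222222222222222222

print(demo['new column'].dtype)         # expected: float64
print(demo['new column'].isna().sum())  # rows that are still missing
```

Once the column is float64, later assignments in the loop are stored as floats too, so the integer precision is already gone by the time every row is filled.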

When I try to convert it back, I get a different number.

df1['new column'] = df1['new column'].astype(np.int64)
df1

Output:

                     A  B   new column
0   123123123123123123  11  222222222222222208
1   234234234234234234  22  222222222222222208
2   345345345345345345  33  222222222222222208
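
The changed digits are consistent with float64 rounding, which can be checked without pandas. A float64 has a 53-bit significand, so above 2**53 not every integer is representable, and near 2.2e17 adjacent floats are 32 apart. The sketch below uses only the standard library (math.ulp needs Python 3.9 or later).

```python
import math

value = 222222222222222222

as_float = float(value)      # Python floats use the same 64-bit format as float64
round_trip = int(as_float)

print(round_trip)            # 222222222222222208
print(math.ulp(as_float))    # 32.0, the gap between adjacent floats at this magnitude
print(value > 2**53)         # True, so exact representation is not guaranteed
```

The original value is 14 above a multiple of 32, so it rounds down to 222222222222222208, matching the output above.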

CodePudding user response:

The solution is to create 'new column' with object dtype before assigning any values, and to convert it to a 64-bit integer after the assignment is done. I am not sure this is the most efficient method, and I am open to better solutions, but it works.

df1 = pd.DataFrame(data={'A':[123123123123123123, 234234234234234234, 345345345345345345], 'B':[11,22,33]})

df1['new column'] = pd.Series(index=df1.index, dtype=object) # create the column with object dtype

for i in range(df1.shape[0]):
    df1.loc[i, 'new column'] = 222222222222222222

df1['new column'] = df1['new column'].astype(np.int64) # convert to int64 once all values are assigned

Output:

                     A  B   new column
0   123123123123123123  11  222222222222222222
1   234234234234234234  22  222222222222222222
2   345345345345345345  33  222222222222222222

df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   A           3 non-null      int64
 1   B           3 non-null      int64
 2   new column  3 non-null      int64
dtypes: int64(3)
memory usage: 200.0 bytes
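
As one possible alternative, assuming all values are known before the column is needed, they can be collected in a plain Python list and assigned in a single step. The column then never holds NaN, so pandas should infer int64 directly. The list name values is illustrative and not part of the original code.

```python
import pandas as pd

df1 = pd.DataFrame(data={'A': [123123123123123123, 234234234234234234, 345345345345345345],
                         'B': [11, 22, 33]})

# Collect the values first instead of writing them row by row.
values = []
for i in range(df1.shape[0]):
    values.append(222222222222222222)

# Assigning the whole column at once avoids the intermediate NaN-filled float column.
df1['new column'] = values

print(df1['new column'].dtype)
print(df1['new column'].tolist())
```

This also avoids repeated .loc writes, which tend to be slow on larger frames.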

CodePudding user response:

Try np.float64:

np.float64(123123123123123123123123)

# Output
1.2312312312312312e+23
