Problem
I have an issue while creating a DataFrame in pandas. I am creating a new, empty DataFrame df2 from an existing DataFrame df1 with the same columns, as shown below:
import pandas as pd
import numpy as np
df2 = pd.DataFrame(columns=df1.columns)
Then, inside a loop, I add another column that stores an 18-digit integer, using the following code:
df2.loc[i, 'new column'] = 123123123123123123
However, the value is stored in exponential form, as 1.231231231231e+17, and the last couple of digits are no longer exact. I want the new column to store the value as an exact 18-digit integer.
I tried two approaches to solve this.
Approach 1: Modification at the point of definition
df2 = pd.DataFrame(columns=df1.columns)
df2['new column'] = 0
df2['new column'] = df2['new column'].astype(np.int64) # also tried .apply(np.int64)
Approach 2: Modification at the point of assignment
df2.loc[i, 'new column'] = np.int64(123123123123123123)
Unfortunately, neither approach worked for me.
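Some background that may explain the behaviour: once pandas holds the column as float64 (an IEEE 754 double), only integers up to 2**53 = 9007199254740992 (16 digits) are guaranteed to round-trip exactly. Above that, neighbouring doubles are more than 1 apart (16 apart near 1.2e17), so an 18-digit integer is rounded to the nearest representable value. The sketch below illustrates this with plain Python floats, which use the same format as numpy.float64; the helper name survives_float_roundtrip is hypothetical and only for illustration.

```python
# Python floats are IEEE 754 doubles, the same format as numpy.float64.
LIMIT = 2**53  # every integer n with abs(n) <= 2**53 is exactly representable


def survives_float_roundtrip(n: int) -> bool:
    """Hypothetical helper: True if converting n to float and back returns n unchanged."""
    return int(float(n)) == n


print(LIMIT)                                # 9007199254740992
print(survives_float_roundtrip(LIMIT))      # True
print(survives_float_roundtrip(LIMIT + 1))  # False: rounds to the nearest even double, 2**53

value = 123123123123123123                  # 18 digits
print(int(float(value)))                    # 123123123123123120 (doubles here are 16 apart)
print(survives_float_roundtrip(value))      # False
```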
Reproducible Code for More Clarity
df1 = pd.DataFrame(data={'A':[123123123123123123, 234234234234234234, 345345345345345345], 'B':[11,22,33]})
df1
Output:
                    A   B
0  123123123123123123  11
1  234234234234234234  22
2  345345345345345345  33
for i in range(df1.shape[0]):
    df1.loc[i, 'new column'] = 222222222222222222
df1
Output:
                    A   B    new column
0  123123123123123123  11  2.222222e+17
1  234234234234234234  22  2.222222e+17
2  345345345345345345  33  2.222222e+17
When I convert the column back to int64, I get a different number: 222222222222222208 instead of 222222222222222222.
df1['new column'] = df1['new column'].astype(np.int64)
df1
Output:
                    A   B          new column
0  123123123123123123  11  222222222222222208
1  234234234234234234  22  222222222222222208
2  345345345345345345  33  222222222222222208
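Why the column ends up as float64 in the first place: the first .loc assignment targets a column that does not exist yet, so pandas creates it with the untouched rows set to NaN, and NaN forces a float dtype. The later assignments write into that float64 column, and casting back with astype cannot recover digits that were already rounded. A minimal sketch that checks this on the same df1, assuming a recent pandas where setting with enlargement behaves as shown in the question:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(data={'A': [123123123123123123, 234234234234234234, 345345345345345345],
                         'B': [11, 22, 33]})

# First assignment to a missing column: rows 1 and 2 start out as NaN,
# so pandas gives the new column a float dtype.
df1.loc[0, 'new column'] = 222222222222222222
dtype_after_first = df1['new column'].dtype
missing_after_first = df1['new column'].isna().sum()
print(dtype_after_first)    # float64
print(missing_after_first)  # 2

# Filling the remaining rows does not turn the column back into an integer column.
for i in range(1, df1.shape[0]):
    df1.loc[i, 'new column'] = 222222222222222222
print(df1['new column'].dtype)  # float64

# Casting afterwards only converts the already-rounded doubles.
print(df1['new column'].astype(np.int64).tolist())
# [222222222222222208, 222222222222222208, 222222222222222208]
```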
Answer
The solution is to create 'new column' with object dtype before assigning the values, and to convert it to int64 once all the assignments are done. I am not sure this is the most efficient method, and I am open to better solutions, but it works.
import numpy as np
import pandas as pd

df1 = pd.DataFrame(data={'A':[123123123123123123, 234234234234234234, 345345345345345345], 'B':[11,22,33]})
df1['new column'] = None  # create the column with object dtype, so each assigned int is stored exactly
for i in range(df1.shape[0]):
    df1.loc[i, 'new column'] = 222222222222222222
df1['new column'] = df1['new column'].astype(np.int64)  # convert to int64 after all values are assigned
df1
Output:
                    A   B          new column
0  123123123123123123  11  222222222222222222
1  234234234234234234  22  222222222222222222
2  345345345345345345  33  222222222222222222
df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   A           3 non-null      int64
 1   B           3 non-null      int64
 2   new column  3 non-null      int64
dtypes: int64(3)
memory usage: 200.0 bytes
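For comparison, a different approach that avoids the object round-trip (not part of the original answer, and only applicable when the values can be computed before they are assigned): collect them in a plain Python list, which holds arbitrary-precision ints, and assign the whole column at once with an explicit int64 dtype. This sketch assumes every value fits in int64, i.e. is at most 9223372036854775807.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(data={'A': [123123123123123123, 234234234234234234, 345345345345345345],
                         'B': [11, 22, 33]})

# Build the values as ordinary Python ints; nothing is converted to float here.
values = []
for i in range(df1.shape[0]):
    values.append(222222222222222222)  # stand-in for whatever the loop computes

# Assign the whole column in one step with an explicit int64 dtype,
# aligned on df1's index so each value lands on the right row.
df1['new column'] = pd.Series(values, index=df1.index, dtype=np.int64)

print(df1['new column'].dtype)     # int64
print(df1['new column'].tolist())  # [222222222222222222, 222222222222222222, 222222222222222222]
```

Assigning a whole column at once is also generally faster than writing one cell at a time with .loc inside a loop.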
Answer
Try np.float64:
np.float64(123123123123123123123123)
# Output
1.2312312312312312e+23
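np.float64 does accept the 24-digit value used here, but, as with the 18-digit values in the question, it stores only the nearest representable double, so the trailing digits are not preserved. That value is also larger than the int64 maximum (9223372036854775807), so np.int64 cannot hold it at all. If the exact digits are required, one option is an object column holding Python ints, which are arbitrary precision, at the cost of slower, non-vectorised operations. A short sketch illustrating these three points:

```python
import numpy as np
import pandas as pd

big = 123123123123123123123123  # 24 digits

int64_max = np.iinfo(np.int64).max
print(int64_max)          # 9223372036854775807
print(big > int64_max)    # True, so np.int64(big) would raise OverflowError

# float64 accepts the value, but only as an approximation.
approx = np.float64(big)
print(int(approx) == big)  # False

# An object column keeps the original Python int exactly.
s = pd.Series([big, big], dtype=object)
print(s.iloc[0] == big)    # True
```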