Convert numbers to scientific notation

I have a data frame where one column (old_column) looks like this:

9.888E8
3.428E9
5.189E8
4.9E7
2.1E7
340.0
4100.0
1000.0
860.0
1000.0

Is there any way to convert this column into a new column (new_column) that looks something like this:

9.888E8
3.428E9
5.189E8
4.9E7
2.1E7
3.4E2
4.1E3
1E3
8.6E2
1E3

So I would like to have all numbers written in scientific notation (XXEX).

I was trying to use this method:

new_column = '{:.2e}'.format(old_column)

but it does not work, or I do not know how to use it :)

Any advice or suggestions?

Thanks.

CodePudding user response：

You can set the display.float_format option to a function that takes a float and returns a string representing that float.

pd.set_option('display.float_format', lambda x: f'{x:.2e}')

Output:

>>> old_column  # notice that you don't need to create a new column at all, since all the above code does is change the way the data is rendered.
0   9.89e+08
1   3.43e+09
2   5.19e+08
3   4.90e+07
4   2.10e+07
5   3.40e+02
6   4.10e+03
7   1.00e+03
8   8.60e+02
9   1.00e+03
Name: a, dtype: float64
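
If an actual new column of formatted strings is wanted rather than a display change, the format call from the question can be applied one value at a time. A minimal sketch, assuming a frame built from the question's values (the frame `df` and the column names are illustrative). Note that the new column holds strings, so it can no longer be used for arithmetic:

```python
import pandas as pd

# Sample data taken from the question; the frame itself is an assumption.
df = pd.DataFrame({"old_column": [9.888e8, 3.428e9, 5.189e8, 4.9e7, 2.1e7,
                                  340.0, 4100.0, 1000.0, 860.0, 1000.0]})

# str.format handles one value at a time, so map it over the column
# instead of passing the whole column to format().
df["new_column"] = df["old_column"].map("{:.2e}".format)

print(df["new_column"].tolist()[:2])  # ['9.89e+08', '3.43e+09']
```

The original attempt fails because '{:.2e}'.format(old_column) tries to format the whole column object at once rather than each number in it.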

CodePudding user response：

In [63]: """9.888E8
    ...: 3.428E9
    ...: 5.189E8
    ...: 4.9E7
    ...: 2.1E7
    ...: 340.0
    ...: 4100.0
    ...: 1000.0
    ...: 860.0
    ...: 1000.0""".splitlines()
...
In [64]: arr=np.array(_,float)
In [65]: arr
Out[65]: 
array([9.888e+08, 3.428e+09, 5.189e+08, 4.900e+07, 2.100e+07, 3.400e+02,
       4.100e+03, 1.000e+03, 8.600e+02, 1.000e+03])

numpy uses scientific notation for the whole array if the range of values is large enough.
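
To control the notation per value instead of leaving it to the array repr, numpy also has np.format_float_scientific. A small sketch, assuming the same ten values; trim='-' asks it to drop trailing zeros and a trailing decimal point, and exp_digits=1 sets the minimum number of exponent digits:

```python
import numpy as np

arr = np.array([9.888e8, 3.428e9, 5.189e8, 4.9e7, 2.1e7,
                340.0, 4100.0, 1000.0, 860.0, 1000.0])

# Format each element on its own; the result is a list of strings,
# while arr itself stays a float64 array.
formatted = [np.format_float_scientific(x, trim="-", exp_digits=1) for x in arr]
print(formatted)
```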

A list displays each value in its own format:

In [66]: arr.tolist()
Out[66]: 
[988800000.0,
 3428000000.0,
 518900000.0,
 49000000.0,
 21000000.0,
 340.0,
 4100.0,
 1000.0,
 860.0,
 1000.0]

Now put the array in a dataframe:

In [68]: import pandas as pd
In [69]: df = pd.DataFrame(arr)
In [70]: df
Out[70]: 
              0
0  9.888000e+08
1  3.428000e+09
2  5.189000e+08
3  4.900000e+07
4  2.100000e+07
5  3.400000e+02
6  4.100000e+03
7  1.000000e+03
8  8.600000e+02
9  1.000000e+03

In [72]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       10 non-null     float64
dtypes: float64(1)
memory usage: 208.0 bytes

I asked about dtype because I expected a float column to use the same format for all values, same as numpy.

The display option changes the display, but does not change the dtype:

In [75]: pd.set_option('display.float_format', lambda x: f'{x:.2e}')
In [76]: df
Out[76]: 
         0
0 9.89e+08
1 3.43e+09
2 5.19e+08
3 4.90e+07
4 2.10e+07
5 3.40e+02
6 4.10e+03
7 1.00e+03
8 8.60e+02
9 1.00e+03
In [77]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       10 non-null     float64
dtypes: float64(1)
memory usage: 208.0 bytes
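
Since pd.set_option changes the setting for the whole session, the same option can also be applied temporarily with pd.option_context. A sketch, assuming a small one-column frame like the one above:

```python
import pandas as pd

df = pd.DataFrame([9.888e8, 3.428e9, 340.0, 1000.0])

# The float format applies only inside the with-block.
with pd.option_context("display.float_format", "{:.2e}".format):
    shown = df.to_string()
    print(shown)

# The stored values are untouched: still float64, still usable in arithmetic.
print(df[0].dtype)
print(df[0].sum())
```

Outside the block the frame prints with the default float formatting again.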

I get the mix of formats if I make the frame from the original list of strings. But now the values are strings, not floats.

In [80]: pd.DataFrame(Out[63])
Out[80]: 
         0
0  9.888E8
1  3.428E9
2  5.189E8
3    4.9E7
4    2.1E7
5    340.0
6   4100.0
7   1000.0
8    860.0
9   1000.0
In [81]: df1=pd.DataFrame(Out[63])
In [82]: df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       10 non-null     object
dtypes: object(1)
memory usage: 208.0 bytes
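
To get exactly the layout asked for in the question (for example 3.4E2 and 1E3), one option is a small helper that trims the mantissa and the exponent. This is only a sketch: to_sci and its digits parameter are names made up here, it assumes digits is at least 1, and it does not handle NaN or infinity. String values like the ones in the last frame are converted back to floats first:

```python
import pandas as pd

def to_sci(x, digits=3):
    """Hypothetical helper: format x like 9.888E8, 3.4E2, 1E3."""
    mantissa, exponent = f"{x:.{digits}e}".split("e")
    # Drop trailing zeros, then a dangling decimal point.
    mantissa = mantissa.rstrip("0").rstrip(".")
    # int() removes the plus sign and the leading zero of the exponent.
    return f"{mantissa}E{int(exponent)}"

# Strings as in the original column.
raw = pd.Series(["9.888E8", "3.428E9", "5.189E8", "4.9E7", "2.1E7",
                 "340.0", "4100.0", "1000.0", "860.0", "1000.0"])

values = raw.astype(float)       # back to float64
new_column = values.map(to_sci)  # formatted strings again
print(new_column.tolist())
# ['9.888E8', '3.428E9', '5.189E8', '4.9E7', '2.1E7', '3.4E2', '4.1E3', '1E3', '8.6E2', '1E3']
```

As with the other string-based approaches, the result is text, so keep the float column around if the numbers are still needed for calculations.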