I have a data frame where one column (old_column) looks like this:
9.888E8
3.428E9
5.189E8
4.9E7
2.1E7
340.0
4100.0
1000.0
860.0
1000.0
Is there any way to convert this column into something like this (new_column):
9.888E8
3.428E9
5.189E8
4.9E7
2.1E7
3.4E2
4.1E3
1E3
8.6E2
1E3
So I would like to have all numbers written in scientific notation (XXEX).
I was trying to use this method:
new_column = '{:.2e}'.format(old_column)
but it does not work, or I do not know how to use it :)
Any advice or suggestions?
Thanks.
CodePudding user response:
You can set the display.float_format option to a function which takes a float and returns a string representing the float.
pd.set_option('display.float_format', lambda x: f'{x:.2e}')
Output:
>>> old_column # notice that you don't need to create a new column at all, since all the above code does is change the way the data is rendered.
0 9.89e+08
1 3.43e+09
2 5.19e+08
3 4.90e+07
4 2.10e+07
5 3.40e+02
6 4.10e+03
7 1.00e+03
8 8.60e+02
9 1.00e+03
Name: a, dtype: float64
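If an actual column of strings is wanted rather than a display setting, the format call from the question can be applied element by element. It fails on the whole column because a Series does not accept a float format spec such as .2e. A minimal sketch, assuming the frame is called df (a name not given in the question) and using Series.map:

```python
import pandas as pd

# Sample data copied from the question; the frame name `df` is an assumption.
df = pd.DataFrame({"old_column": [9.888e8, 3.428e9, 5.189e8, 4.9e7, 2.1e7,
                                  340.0, 4100.0, 1000.0, 860.0, 1000.0]})

# '{:.2e}'.format(df["old_column"]) raises a TypeError, because the format
# spec is handed to the Series as a whole. Mapping the bound method over the
# column formats one float at a time instead.
df["new_column"] = df["old_column"].map("{:.2e}".format)

print(df)
```

The new column holds strings, so it is suited to display or export rather than arithmetic; old_column keeps the numeric values.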
CodePudding user response:
In [63]: """9.888E8
...: 3.428E9
...: 5.189E8
...: 4.9E7
...: 2.1E7
...: 340.0
...: 4100.0
...: 1000.0
...: 860.0
...: 1000.0""".splitlines()
...
In [64]: arr=np.array(_,float)
In [65]: arr
Out[65]:
array([9.888e+08, 3.428e+09, 5.189e+08, 4.900e+07, 2.100e+07, 3.400e+02,
4.100e+03, 1.000e+03, 8.600e+02, 1.000e+03])
numpy uses scientific notation for the whole array if the range of values is large enough.
A list displays each value in its own format:
In [66]: arr.tolist()
Out[66]:
[988800000.0,
3428000000.0,
518900000.0,
49000000.0,
21000000.0,
340.0,
4100.0,
1000.0,
860.0,
1000.0]
Now put the array in a dataframe:
In [68]: import pandas as pd
In [69]: df = pd.DataFrame(arr)
In [70]: df
Out[70]:
0
0 9.888000e+08
1 3.428000e+09
2 5.189000e+08
3 4.900000e+07
4 2.100000e+07
5 3.400000e+02
6 4.100000e+03
7 1.000000e+03
8 8.600000e+02
9 1.000000e+03
In [72]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 10 non-null float64
dtypes: float64(1)
memory usage: 208.0 bytes
I asked about dtype because I expected a float column to use the same format for all values, same as numpy.
The display option changes the display, but does not change the dtype:
In [75]: pd.set_option('display.float_format', lambda x: f'{x:.2e}')
In [76]: df
Out[76]:
0
0 9.89e+08
1 3.43e+09
2 5.19e+08
3 4.90e+07
4 2.10e+07
5 3.40e+02
6 4.10e+03
7 1.00e+03
8 8.60e+02
9 1.00e+03
In [77]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 10 non-null float64
dtypes: float64(1)
memory usage: 208.0 bytes
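To check that only the rendering changes, and to avoid changing the option for the whole session, the formatter can also be applied temporarily. A small sketch using pd.option_context; the Series s is made up for illustration:

```python
import pandas as pd

# Illustrative values only; `s` is not part of the session above.
s = pd.Series([9.888e8, 340.0, 1000.0])

# Inside the block, floats are rendered with two decimals in scientific notation.
with pd.option_context("display.float_format", "{:.2e}".format):
    rendered = repr(s)

# Outside the block the previous display setting is back in effect.
plain = repr(s)

print(rendered)
print(plain)
print(s.dtype)  # still float64; the stored values are untouched
```

A setting made globally with pd.set_option can be undone with pd.reset_option('display.float_format').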
I get the mix of formats if I make the frame from the original list of strings. But now the values are strings, not floats.
In [80]: pd.DataFrame(Out[63])
Out[80]:
0
0 9.888E8
1 3.428E9
2 5.189E8
3 4.9E7
4 2.1E7
5 340.0
6 4100.0
7 1000.0
8 860.0
9 1000.0
In [81]: df1=pd.DataFrame(Out[63])
In [82]: df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 10 non-null object
dtypes: object(1)
memory usage: 208.0 bytes
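Since the mixed layout only exists when the values are strings, the exact layout asked for in the question (3.4E2, 1E3) also has to be built as strings. A rough sketch of one way to do that; compact_sci and its digits parameter are hypothetical names introduced here, not a pandas or numpy function:

```python
import pandas as pd


def compact_sci(x, digits=4):
    """Format x as e.g. '3.4E2': up to `digits` significant digits,
    no trailing zeros in the mantissa, no sign or padding on a positive exponent."""
    mantissa, exponent = f"{x:.{digits - 1}E}".split("E")
    if "." in mantissa:
        # '3.400' -> '3.4', '1.000' -> '1'
        mantissa = mantissa.rstrip("0").rstrip(".")
    # int() drops the '+' sign and the zero padding: '+02' -> 2
    return f"{mantissa}E{int(exponent)}"


# Values copied from the question.
old_column = pd.Series([9.888e8, 3.428e9, 5.189e8, 4.9e7, 2.1e7,
                        340.0, 4100.0, 1000.0, 860.0, 1000.0])
new_column = old_column.map(compact_sci)

print(new_column.tolist())
# ['9.888E8', '3.428E9', '5.189E8', '4.9E7', '2.1E7',
#  '3.4E2', '4.1E3', '1E3', '8.6E2', '1E3']
```

Values with more than digits significant digits are rounded, so digits should be raised if more precision has to survive in the text.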