Pandas Split Scientific Notation into two Columns - Significand and Exponent

Time:09-22

I have a column in a data frame called MARKET_VALUE that I need to pass to a downstream system in a defined format. MARKET_VALUE, a float, needs to be passed as two integer columns, SIGNIFICAND (with no trailing zeros) and EXP, as follows:

MARKET_VALUE   SIGNIFICAND   EXP
6.898806e+09       6898806     3
6.898806e+05       6898806    -1
6.898806e+03       6898806    -3
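Reading the table, each row appears to satisfy the relation below. The symbols e (the exponent in ordinary scientific notation) and d (the number of significand digits once trailing zeros are dropped) are introduced here for illustration only:

$$\mathrm{MARKET\_VALUE} = \mathrm{SIGNIFICAND} \times 10^{\mathrm{EXP}}, \qquad \mathrm{EXP} = e - (d - 1)$$

For the first row, e = 9 and d = 7, so EXP = 9 - 6 = 3, and 6898806 × 10^3 = 6.898806e+09.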

I contemplated using formatted strings but am convinced there must be a smarter solution. The data frame is large, containing millions of rows, so a solution that doesn't depend on apply would be preferable.

Answer:

Generate a random pandas dataframe

I use a DataFrame consisting of 100,000 (1e5) rows; you could try more rows to find the bottleneck.

import pandas as pd
import numpy as np

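# 100,000 rows of random floats in [0, 1); raising them to the 10th power spreads out the exponents
df = pd.DataFrame(np.random.random((100000, 2)) ** 10, columns=['random1', 'random2'])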

Use the .apply method

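In this case I use Python's standard format specification:

  • .8E formats the value in scientific notation with 8 digits after the decimal point (e.g. '6.89880600E+09').
  • [:-4] removes the exponent suffix ('E+09') so only the mantissa is kept; its decimal point is then removed.
  • [-3:] keeps only the signed exponent ('+09'), which is then converted to an int.
  • Both calls use the same .8E precision: a coarser format such as .0E can round up to the next power of ten (9.7 becomes '1E+01'), so the exponent would no longer match the significand. The slicing also assumes a two-digit exponent.

# get the significand
df.random1.apply(lambda x: f'{x:.8E}'[:-4].replace('.', ''))

# get the exp (same precision as the significand)
df.random1.apply(lambda x: int(f'{x:.8E}'[-3:]))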

On my laptop each of these calls took less than 100 ms on the 100,000-row column.
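Note that the calls above still return the scientific-notation exponent (9 for 6.898806e+09) and a significand with trailing zeros, not the format in the question. A minimal sketch of the extra step, assuming 8 digits after the decimal point are enough precision: strip the trailing zeros and lower the exponent by one for every digit left after the first. The helper name to_significand_exp and its precision parameter are hypothetical, not part of pandas.

```python
import pandas as pd


def to_significand_exp(x, precision=8):
    """Hypothetical helper: return (significand, exp) with x ~= significand * 10**exp."""
    # e.g. 6.898806e9 -> '6.89880600E+09' -> ('6.89880600', '+09')
    mantissa, exponent = f'{x:.{precision}E}'.split('E')
    sign = -1 if mantissa.startswith('-') else 1
    digits = mantissa.lstrip('-').replace('.', '')
    # drop trailing zeros; a zero value keeps a single '0'
    stripped = digits.rstrip('0') or '0'
    # every digit after the first shifts the decimal point one place to the right
    exp = int(exponent) - (len(stripped) - 1)
    return sign * int(stripped), exp


df = pd.DataFrame({'MARKET_VALUE': [6.898806e9, 6.898806e5, 6.898806e3]})
parts = pd.DataFrame(
    df['MARKET_VALUE'].apply(to_significand_exp).tolist(),
    index=df.index,
    columns=['SIGNIFICAND', 'EXP'],
)
df = df.join(parts)
print(df)
```

This still runs Python code once per row, so it has the same performance profile as the .apply calls above.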

I am still thinking about a faster (vectorized) solution, but for now I hope this helps.
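One possible vectorized direction, sketched with NumPy and not tested on the asker's data: take floor(log10(|x|)) as the leading-digit exponent, scale every value to a fixed number of significant digits, round to int64, then strip trailing zeros with a loop that runs at most once per digit over the whole array. The function name split_scientific and the sig_digits parameter are hypothetical; it assumes finite (non-NaN) inputs and sig_digits of at most about 15, the limit of float64 precision. Results near powers of ten can differ in the last digit from the string-formatting approach because log10 is computed in floating point.

```python
import numpy as np


def split_scientific(values, sig_digits=9):
    """Hypothetical sketch: vectorized (significand, exp) with values ~= significand * 10**exp."""
    x = np.asarray(values, dtype=np.float64)
    nonzero = x != 0
    # exponent of the leading digit; zeros get 0 to avoid log10(0)
    e = np.zeros(x.shape, dtype=np.int64)
    e[nonzero] = np.floor(np.log10(np.abs(x[nonzero]))).astype(np.int64)
    # scale so each value has sig_digits digits before the decimal point, then round
    shift = e - (sig_digits - 1)
    m = np.rint(x / 10.0 ** shift).astype(np.int64)
    shift[~nonzero] = 0
    # strip trailing zeros: each pass divides the affected rows by 10
    while True:
        mask = (m != 0) & (m % 10 == 0)
        if not mask.any():
            break
        m[mask] //= 10
        shift[mask] += 1
    return m, shift


sig, exp = split_scientific([6.898806e9, 6.898806e5, 6.898806e3, 0.0, -2.5e-3])
print(sig, exp)
```

With a DataFrame this could be used as `df['SIGNIFICAND'], df['EXP'] = split_scientific(df['MARKET_VALUE'])`. The loop count depends on the number of digits, not the number of rows, which is what keeps it vectorized.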
