I have a column in a data frame called MARKET_VALUE that I need to pass to a downstream system in a defined format. MARKET_VALUE, a float, needs to be passed as two integer columns (significand, with no trailing zeros and exp) as follows
MARKET VALUE SIGNIFICAND EXP
6.898806e 09 6898806 3
6.898806e 05 6898806 -1
6.898806e 03 6898806 -3
I contemplated using formatted strings but am convinced there must be a smarter solution. The data frame is large, containing millions of rows, so a solution that doesn't depend on apply would be preferable.
CodePudding user response:
Generate a random pandas dataframe
I use a DataFrame consiting in 1e5 rows (you could try with more to test the bottleneck)
import pandas as pd
import numpy as np
df=pd.DataFrame(np.random.random((100000,2))**10, columns=['random1', 'random2'])
Use .apply method
In this case I use the standard python formatting.
- 8E is the number of digits after point.
- [:-4] to remove the exponential notation and keep only the significand.
- [-3:] to get only the exponential with the sign, then convert it into a int value.
# get the signficand
df.random1.apply(lambda x: f'{x:.8E}'[:-4].replace('.', ''))
# get the exp
df.random1.apply(lambda x: int(f'{x:.0E}'[-3:]))
On my laptop it took less than 100ms.
I am thinking about faster solution (vectorized one), but for now I hope that this can help.