The below sample has some bid/ask prices. What is a good way to calculate the average (mid) in each cell for the entire df?
#---sample df
prices = pd.DataFrame({
'tenor':['5Y', '10Y', '15Y', '20Y', '30Y'],
'1M':['0.67/0.62', '1.10/1.05', '1.23/1.18', '1.38/1.33', '1.55/1.50'],
'3M':['0.79/0.74', '1.19/1.14', '1.32/1.27', '1.49/1.44', '1.65/1.60'],
'6M':['0.89/0.84', '1.29/1.24', '1.42/1.37', '1.60/1.55', '1.76/1.71'],
'12M':['1.14/1.07', '1.47/1.40', '1.61/1.54', '1.80/1.72', '1.95/1.87']
})
so that the below would return 0.645
, for example.
prices.iat[0,1]
Out[112]: '0.67/0.62'
CodePudding user response:
You can split all of them on the /
then take the mean. Setting the non-numeric column as index first allows you to use applymap
to do the entire rest of the df at once.
import numpy as np
import pandas as pd
prices = pd.DataFrame({
'tenor':['5Y', '10Y', '15Y', '20Y', '30Y'],
'1M':['0.67/0.62', '1.10/1.05', '1.23/1.18', '1.38/1.33', '1.55/1.50'],
'3M':['0.79/0.74', '1.19/1.14', '1.32/1.27', '1.49/1.44', '1.65/1.60'],
'6M':['0.89/0.84', '1.29/1.24', '1.42/1.37', '1.60/1.55', '1.76/1.71'],
'12M':['1.14/1.07', '1.47/1.40', '1.61/1.54', '1.80/1.72', '1.95/1.87']
})
prices = prices.set_index('tenor').applymap(lambda x: np.mean(list(map(float,x.split('/'))))).reset_index()
Output
tenor 1M 3M 6M 12M
0 5Y 0.645 0.765 0.865 1.105
1 10Y 1.075 1.165 1.265 1.435
2 15Y 1.205 1.295 1.395 1.575
3 20Y 1.355 1.465 1.575 1.760
4 30Y 1.525 1.625 1.735 1.910
CodePudding user response:
While applymap
is nice and simple, this is unfortunately slow.
A more efficient, vectorial solution, would be to split
and explode
before groupby
mean
:
(prices.set_index('tenor')
.apply(lambda c: c.str.split('/').explode())
.astype(float)
.groupby(level=0, sort=False).mean()
)
Output:
1M 3M 6M 12M
tenor
5Y 0.645 0.765 0.865 1.105
10Y 1.075 1.165 1.265 1.435
15Y 1.205 1.295 1.395 1.575
20Y 1.355 1.465 1.575 1.760
30Y 1.525 1.625 1.735 1.910
This is ~8 times faster on 50k rows
NB. If you have many more columns than rows, you can inverse the logic to work on the other axis
CodePudding user response:
For each column, you can split the string by /
and run a lambda operation to get the mean
prices["1M"].str.split('/').apply(lambda x : (float(x[0]) float(x[1]))/2)
0 0.645
1 1.075
2 1.205
3 1.355
4 1.525
Name: 1M, dtype: float64
CodePudding user response:
Here's another solution:
x = prices.iloc[:,1:].unstack().swaplevel(1,0).str.split('/').explode().astype(float)
temp1 = x.groupby(x.index).mean().reindex(pd.MultiIndex.from_tuples(x.index.drop_duplicates()))
prices.iloc[:,1:] = temp1.unstack()[prices.iloc[:,1:].columns]
Output:
tenor 1M 3M 6M 12M
0 5Y 0.645 0.765 0.865 1.105
1 10Y 1.075 1.165 1.265 1.435
2 15Y 1.205 1.295 1.395 1.575
3 20Y 1.355 1.465 1.575 1.76
4 30Y 1.525 1.625 1.735 1.91