pandas take an average of two string values in a cell

The sample df below contains bid/ask prices stored as strings. What is a good way to calculate the mid (the average of bid and ask) in each cell for the entire df?

#---sample df
import pandas as pd

prices = pd.DataFrame({
    'tenor':['5Y', '10Y', '15Y', '20Y', '30Y'],
    '1M':['0.67/0.62', '1.10/1.05', '1.23/1.18', '1.38/1.33', '1.55/1.50'],
    '3M':['0.79/0.74', '1.19/1.14', '1.32/1.27', '1.49/1.44', '1.65/1.60'],
    '6M':['0.89/0.84', '1.29/1.24', '1.42/1.37', '1.60/1.55', '1.76/1.71'],
    '12M':['1.14/1.07', '1.47/1.40', '1.61/1.54', '1.80/1.72', '1.95/1.87']
    })

so that, for example, the cell below (currently the string '0.67/0.62') would become 0.645:

prices.iat[0,1]
Out[112]: '0.67/0.62'

CodePudding user response:

You can split each value on the / and then take the mean of the two numbers. Setting the non-numeric tenor column as the index first allows you to use applymap on the entire rest of the df at once.

import numpy as np
import pandas as pd
prices = pd.DataFrame({
    'tenor':['5Y', '10Y', '15Y', '20Y', '30Y'],
    '1M':['0.67/0.62', '1.10/1.05', '1.23/1.18', '1.38/1.33', '1.55/1.50'],
    '3M':['0.79/0.74', '1.19/1.14', '1.32/1.27', '1.49/1.44', '1.65/1.60'],
    '6M':['0.89/0.84', '1.29/1.24', '1.42/1.37', '1.60/1.55', '1.76/1.71'],
    '12M':['1.14/1.07', '1.47/1.40', '1.61/1.54', '1.80/1.72', '1.95/1.87']
    })

prices = (prices.set_index('tenor')
                .applymap(lambda x: np.mean(list(map(float, x.split('/')))))
                .reset_index())
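
Note: on pandas 2.1 and newer, applymap is deprecated in favour of the equivalent DataFrame.map, so the same step can be written as a drop-in replacement for the line above:

# pandas >= 2.1: DataFrame.map replaces the deprecated applymap, same callable
prices = (prices.set_index('tenor')
                .map(lambda x: np.mean(list(map(float, x.split('/')))))
                .reset_index())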

Output

  tenor     1M     3M     6M    12M
0    5Y  0.645  0.765  0.865  1.105
1   10Y  1.075  1.165  1.265  1.435
2   15Y  1.205  1.295  1.395  1.575
3   20Y  1.355  1.465  1.575  1.760
4   30Y  1.525  1.625  1.735  1.910

CodePudding user response:

While applymap is nice and simple, it is unfortunately slow.

A more efficient, vectorized solution is to split and explode, then take the mean per group:

(prices.set_index('tenor')
       .apply(lambda c: c.str.split('/').explode())
       .astype(float)
       .groupby(level=0, sort=False).mean()
)

Output:

          1M     3M     6M    12M
tenor                            
5Y     0.645  0.765  0.865  1.105
10Y    1.075  1.165  1.265  1.435
15Y    1.205  1.295  1.395  1.575
20Y    1.355  1.465  1.575  1.760
30Y    1.525  1.625  1.735  1.910

This is ~8 times faster on 50k rows.
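
A rough sketch of how one might check a claim like this (not from the original answer; the 50k-row frame here is just the sample repeated with unique labels, and exact numbers depend on hardware and pandas version):

import timeit
import numpy as np
import pandas as pd

# assumes `prices` still holds the raw 'bid/ask' strings from the question
big = pd.concat([prices] * 10_000, ignore_index=True)
big['tenor'] = big.index.astype(str)   # unique labels so the groupby keeps one row per input row

def with_applymap(df):
    return (df.set_index('tenor')
              .applymap(lambda x: np.mean(list(map(float, x.split('/'))))))

def with_explode(df):
    return (df.set_index('tenor')
              .apply(lambda c: c.str.split('/').explode())
              .astype(float)
              .groupby(level=0, sort=False).mean())

print('applymap:', timeit.timeit(lambda: with_applymap(big), number=3))
print('explode :', timeit.timeit(lambda: with_explode(big), number=3))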

NB. If you have many more columns than rows, you can invert the logic and work along the other axis, as sketched below.
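
A sketch of that variant (not from the original answer): transpose first so the loop runs over the smaller axis, then transpose back.

(prices.set_index('tenor').T
       .apply(lambda c: c.str.split('/').explode())
       .astype(float)
       .groupby(level=0, sort=False).mean()
       .T)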

CodePudding user response:

For each column, you can split the string on / and apply a lambda to compute the mean:

prices["1M"].str.split('/').apply(lambda x : (float(x[0]) float(x[1]))/2)

0    0.645
1    1.075
2    1.205
3    1.355
4    1.525
Name: 1M, dtype: float64
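
To apply the same idea to every price column, a minimal sketch (assuming every column except tenor holds bid/ask strings):

mids = prices.copy()
for col in prices.columns.drop('tenor'):
    # split 'bid/ask' and average the two numbers for this column
    mids[col] = prices[col].str.split('/').apply(lambda x: (float(x[0]) + float(x[1])) / 2)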

CodePudding user response:

Here's another solution:

# Stack the price columns into a single Series with a (row, column) MultiIndex,
# then split each 'bid/ask' string and explode so every number gets its own entry
x = prices.iloc[:, 1:].unstack().swaplevel(1, 0).str.split('/').explode().astype(float)
# Average the two numbers of each (row, column) pair and restore the original order
temp1 = x.groupby(x.index).mean().reindex(pd.MultiIndex.from_tuples(x.index.drop_duplicates()))
# Pivot back to the original shape and write the mids over the string columns
prices.iloc[:, 1:] = temp1.unstack()[prices.iloc[:, 1:].columns]

Output:

  tenor     1M     3M     6M    12M
0    5Y  0.645  0.765  0.865  1.105
1   10Y  1.075  1.165  1.265  1.435
2   15Y  1.205  1.295  1.395  1.575
3   20Y  1.355  1.465  1.575   1.76
4   30Y  1.525  1.625  1.735   1.91