pandas take an average of two string values in a cell

The sample df below contains bid/ask prices stored as strings. What is a good way to calculate the mid (the average of bid and ask) in each cell for the entire df?

#---sample df
import pandas as pd

prices = pd.DataFrame({
    'tenor':['5Y', '10Y', '15Y', '20Y', '30Y'],
    '1M':['0.67/0.62', '1.10/1.05', '1.23/1.18', '1.38/1.33', '1.55/1.50'],
    '3M':['0.79/0.74', '1.19/1.14', '1.32/1.27', '1.49/1.44', '1.65/1.60'],
    '6M':['0.89/0.84', '1.29/1.24', '1.42/1.37', '1.60/1.55', '1.76/1.71'],
    '12M':['1.14/1.07', '1.47/1.40', '1.61/1.54', '1.80/1.72', '1.95/1.87']
    })

so that, for example, the cell below (currently the string '0.67/0.62') would become 0.645:

prices.iat[0,1]
Out[112]: '0.67/0.62'

CodePudding user response:

You can split each value on the / and then take the mean of the two numbers. Setting the non-numeric tenor column as the index first allows you to use applymap on the entire rest of the df at once.

import numpy as np
import pandas as pd
prices = pd.DataFrame({
    'tenor':['5Y', '10Y', '15Y', '20Y', '30Y'],
    '1M':['0.67/0.62', '1.10/1.05', '1.23/1.18', '1.38/1.33', '1.55/1.50'],
    '3M':['0.79/0.74', '1.19/1.14', '1.32/1.27', '1.49/1.44', '1.65/1.60'],
    '6M':['0.89/0.84', '1.29/1.24', '1.42/1.37', '1.60/1.55', '1.76/1.71'],
    '12M':['1.14/1.07', '1.47/1.40', '1.61/1.54', '1.80/1.72', '1.95/1.87']
    })

prices = (prices.set_index('tenor')
                .applymap(lambda x: np.mean(list(map(float, x.split('/')))))
                .reset_index())
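
Note: on pandas 2.1 and newer, applymap is deprecated in favour of the equivalent DataFrame.map, so the same step can be written as a drop-in replacement for the line above:

# pandas >= 2.1: DataFrame.map replaces the deprecated applymap, same callable
prices = (prices.set_index('tenor')
                .map(lambda x: np.mean(list(map(float, x.split('/')))))
                .reset_index())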

Output

  tenor     1M     3M     6M    12M
0    5Y  0.645  0.765  0.865  1.105
1   10Y  1.075  1.165  1.265  1.435
2   15Y  1.205  1.295  1.395  1.575
3   20Y  1.355  1.465  1.575  1.760
4   30Y  1.525  1.625  1.735  1.910

CodePudding user response:

While applymap is nice and simple, it is unfortunately slow.

A more efficient, vectorized solution is to split and explode, then take the mean per group:

(prices.set_index('tenor')
       .apply(lambda c: c.str.split('/').explode())
       .astype(float)
       .groupby(level=0, sort=False).mean()
)

Output:

          1M     3M     6M    12M
tenor                            
5Y     0.645  0.765  0.865  1.105
10Y    1.075  1.165  1.265  1.435
15Y    1.205  1.295  1.395  1.575
20Y    1.355  1.465  1.575  1.760
30Y    1.525  1.625  1.735  1.910

This is ~8 times faster on 50k rows.
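
A rough sketch of how one might check a claim like this (not from the original answer; the 50k-row frame here is just the sample repeated with unique labels, and exact numbers depend on hardware and pandas version):

import timeit
import numpy as np
import pandas as pd

# assumes `prices` still holds the raw 'bid/ask' strings from the question
big = pd.concat([prices] * 10_000, ignore_index=True)
big['tenor'] = big.index.astype(str)   # unique labels so the groupby keeps one row per input row

def with_applymap(df):
    return (df.set_index('tenor')
              .applymap(lambda x: np.mean(list(map(float, x.split('/'))))))

def with_explode(df):
    return (df.set_index('tenor')
              .apply(lambda c: c.str.split('/').explode())
              .astype(float)
              .groupby(level=0, sort=False).mean())

print('applymap:', timeit.timeit(lambda: with_applymap(big), number=3))
print('explode :', timeit.timeit(lambda: with_explode(big), number=3))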

NB. If you have many more columns than rows, you can invert the logic and work along the other axis, as sketched below.
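
A sketch of that variant (not from the original answer): transpose first so the loop runs over the smaller axis, then transpose back.

(prices.set_index('tenor').T
       .apply(lambda c: c.str.split('/').explode())
       .astype(float)
       .groupby(level=0, sort=False).mean()
       .T)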

CodePudding user response:

For each column, you can split the string on / and apply a lambda to compute the mean:

prices["1M"].str.split('/').apply(lambda x : (float(x[0]) float(x[1]))/2)

0    0.645
1    1.075
2    1.205
3    1.355
4    1.525
Name: 1M, dtype: float64
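
To apply the same idea to every price column, a minimal sketch (assuming every column except tenor holds bid/ask strings):

mids = prices.copy()
for col in prices.columns.drop('tenor'):
    # split 'bid/ask' and average the two numbers for this column
    mids[col] = prices[col].str.split('/').apply(lambda x: (float(x[0]) + float(x[1])) / 2)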

CodePudding user response:

Here's another solution:

# Stack the price columns into a single Series with a (row, column) MultiIndex,
# then split each 'bid/ask' string and explode so every number gets its own entry
x = prices.iloc[:, 1:].unstack().swaplevel(1, 0).str.split('/').explode().astype(float)
# Average the two numbers of each (row, column) pair and restore the original order
temp1 = x.groupby(x.index).mean().reindex(pd.MultiIndex.from_tuples(x.index.drop_duplicates()))
# Pivot back to the original shape and write the mids over the string columns
prices.iloc[:, 1:] = temp1.unstack()[prices.iloc[:, 1:].columns]

Output:

  tenor     1M     3M     6M    12M
0    5Y  0.645  0.765  0.865  1.105
1   10Y  1.075  1.165  1.265  1.435
2   15Y  1.205  1.295  1.395  1.575
3   20Y  1.355  1.465  1.575   1.76
4   30Y  1.525  1.625  1.735   1.91