Given a multiindex columns
a ...
E1 ... E3
g1 g2 g3 ... g1 g2 g3
0 0.548814 0.715189 0.602763 ... 0.437587 0.891773 0.963663
1 0.383442 0.791725 0.528895 ... 0.087129 0.020218 0.832620
2 0.778157 0.870012 0.978618 ... 0.118274 0.639921 0.143353
3 0.944669 0.521848 0.414662 ... 0.568434 0.018790 0.617635
4 0.612096 0.616934 0.943748 ... 0.697631 0.060225 0.666767
5 0.670638 0.210383 0.128926 ... 0.438602 0.988374 0.102045
6 0.208877 0.161310 0.653108 ... 0.158970 0.110375 0.656330
7 0.138183 0.196582 0.368725 ... 0.096098 0.976459 0.468651
8 0.976761 0.604846 0.739264 ... 0.296140 0.118728 0.317983
9 0.414263 0.064147 0.692472 ... 0.093941 0.575946 0.929296
[10 rows x 9 columns]
I would like to apply multiple column filtered by the second level (i.e., E1
, E2
, E3
) to a functions (e.g., ration_type1
,ration_type2
, or can be more in actual implementation).
For example. Assume we are to compute the second level of E1
under the function ration_type1
and ration_type2
. Then we only process the following df
a
E1
g1 g2 g3
0 0.548814 0.715189 0.602763
1 0.383442 0.791725 0.528895
.................
8 0.976761 0.604846 0.739264
9 0.414263 0.064147 0.692472
To generalise all second level, I than rely on the list comprehension as below
for each of the ration_type1
and ration_type2
.
all_df1 = [ration_type1(df.loc[:, (slice(None), dgroup, slice(None))]) for dgroup in [`E1`, `E2`, `E3`]]
all_df2 = [ration_type2(df.loc[:, (slice(None), dgroup, slice(None))]) for dgroup in [`E1`, `E2`, `E3`]]
before concat it back to the orignal df
.
However, I wonder whether there is more elegant and compact way than the list comprehension
approach. This is because, in real life implementation, there can be more ration function.
The full code is as below
import numpy as np
import pandas as pd
np.random.seed(0)
arr = np.random.rand(10,9)
tuples = [('a', 'E1', 'g1'), ('a', 'E1', 'g2'), ('a', 'E1', 'g3'), ('a', 'E2', 'g1'), ('a', 'E2', 'g2'),
('a', 'E2', 'g3'), ('a', 'E3', 'g1'), ('a', 'E3', 'g2'), ('a', 'E3', 'g3')]
df = pd.DataFrame(data=arr, columns=pd.MultiIndex.from_tuples(tuples))
print(df)
def ration_type1(df):
"""
(g3 g2)/g1
# Ugly way since have to convert to numpy 1st
"""
print(df)
dration = 'ration_type1'
l1, l2, _ = df.columns.tolist()[0]
total = df.loc[:, (slice(None), slice(None), 'g2')].to_numpy() \
df.loc[:, (slice(None), slice(None), 'g3')].to_numpy()
arr = total / df.loc[:, (slice(None), slice(None), 'g1')].to_numpy()
return pd.DataFrame(data=arr, columns=pd.MultiIndex.from_tuples([(l1, l2, dration)]))
def ration_type2(df):
"""
(g3 g2 g1)/g1
# Ugly way since have to convert to numpy 1st
"""
dration = 'ration_type2'
l1, l2, _ = df.columns.tolist()[0]
total = df.loc[:, (slice(None), slice(None), 'g1')].to_numpy() \
df.loc[:, (slice(None), slice(None), 'g2')].to_numpy() \
df.loc[:, (slice(None), slice(None), 'g3')].to_numpy()
arr = total / df.loc[:, (slice(None), slice(None), 'g1')].to_numpy()
return pd.DataFrame(data=arr, columns=pd.MultiIndex.from_tuples([(l1, l2, dration)]))
level1_name = list(set(df.columns.get_level_values(1)))
all_df1 = [ration_type1(df.loc[:, (slice(None), dgroup, slice(None))]) for dgroup in level1_name]
all_df2 = [ration_type2(df.loc[:, (slice(None), dgroup, slice(None))]) for dgroup in level1_name]
df1 = pd.concat(all_df1, axis=1)
df2 = pd.concat(all_df2, axis=1)
df=pd.concat([df,df1,df2],axis=1)
Expected output.
a ...
E1 ... E2 E3
g1 g2 g3 ... ration_type2 ration_type2 ration_type2
0 0.548814 0.715189 0.602763 ... 3.401458 2.962896 5.240151
1 0.383442 0.791725 0.528895 ... 4.444124 2.754497 10.788191
2 0.778157 0.870012 0.978618 ... 3.375653 2.554145 7.622516
3 0.944669 0.521848 0.414662 ... 1.991363 5.650758 2.119612
4 0.612096 0.616934 0.943748 ... 3.549735 2.168255 2.042087
5 0.670638 0.210383 0.128926 ... 1.505949 3.960760 3.486126
6 0.208877 0.161310 0.653108 ... 4.899035 3.806001 5.822965
7 0.138183 0.196582 0.368725 ... 5.091008 2.138921 16.037821
8 0.976761 0.604846 0.739264 ... 2.376088 11.283905 2.474676
9 0.414263 0.064147 0.692472 ... 2.826423 2.391873 17.023361
[10 rows x 15 columns]
I am thinking to do something like using apply
# function for prepending 'Geek'
def multiply_by_2(number):
return 2 * number
# executing the function
df[["Integers", "Float"]] = df[["Integers", "Float"]].apply(multiply_by_2)
But, I am having difficulties (due to my limited knowledge)to do it since my example involving multiindex columns
CodePudding user response:
Not so easy if working with MultiIndex
- solution filter levels with rename g
values to ration_type1, ration_type2
for possible divide MultiIndex
DataFrames:
idx = pd.IndexSlice
c = {'g1':'ration_type1','g2':'ration_type1','g3':'ration_type1'}
df1 = df.loc[:, idx[:,:,['g3','g2']]].rename(columns=c).groupby(level=[0,1,2], axis=1).sum()
df11 = df1.div(df.xs('g1', level=2, axis=1, drop_level=False).rename(columns=c))
c1 = {'g1':'ration_type2','g2':'ration_type2','g3':'ration_type2'}
df2 = df.rename(columns=c1).groupby(level=[0,1,2], axis=1).sum()
df22 = df2.div(df.xs('g1', level=2, axis=1, drop_level=False).rename(columns=c1))
df=pd.concat([df,df11,df22],axis=1)
Simplier is reshape first:
df1 = df.stack([0,1])
df1['ration_type1'] = df1[['g2','g3']].sum(axis=1).div(df1['g1'])
df1['ration_type2'] = df1.sum(axis=1).div(df1['g1'])
print(df1)
g1 g2 g3 ration_type1 ration_type2
0 a E1 0.548814 0.715189 0.602763 2.401458 7.777186
E2 0.544883 0.423655 0.645894 1.962896 6.565312
E3 0.437587 0.891773 0.963663 4.240151 14.929992
1 a E1 0.383442 0.791725 0.528895 3.444124 13.426259
E2 0.568045 0.925597 0.071036 1.754497 5.843159
E3 0.087129 0.020218 0.832620 9.788191 123.129174
2 a E1 0.778157 0.870012 0.978618 2.375653 6.428577
E2 0.799159 0.461479 0.780529 1.554145 4.498872
E3 0.118274 0.639921 0.143353 6.622516 63.615316
3 a E1 0.944669 0.521848 0.414662 0.991363 3.040793
E2 0.264556 0.774234 0.456150 4.650758 23.230266
E3 0.568434 0.018790 0.617635 1.119612 4.089254
4 a E1 0.612096 0.616934 0.943748 2.549735 7.715318
E2 0.681820 0.359508 0.437032 1.168255 3.881690
E3 0.697631 0.060225 0.666767 1.042087 3.535837
5 a E1 0.670638 0.210383 0.128926 0.505949 2.260380
E2 0.315428 0.363711 0.570197 2.960760 13.347233
E3 0.438602 0.988374 0.102045 2.486126 9.154429
6 a E1 0.208877 0.161310 0.653108 3.899035 23.565714
E2 0.253292 0.466311 0.244426 2.806001 14.884143
E3 0.158970 0.110375 0.656330 4.822965 36.161882
7 a E1 0.138183 0.196582 0.368725 4.091008 34.696743
E2 0.820993 0.097101 0.837945 1.138921 3.526168
E3 0.096098 0.976459 0.468651 15.037821 172.521382
8 a E1 0.976761 0.604846 0.739264 1.376088 3.784915
E2 0.039188 0.282807 0.120197 10.283905 273.710140
E3 0.296140 0.118728 0.317983 1.474676 7.454332
9 a E1 0.414263 0.064147 0.692472 1.826423 7.235273
E2 0.566601 0.265389 0.523248 1.391873 4.848404
E3 0.093941 0.575946 0.929296 16.023361 187.592593
Last reshape to original MultiIndex
:
df = df1.unstack([1,2]).reorder_levels([1,2,0], axis=1)