Given a two-element tuple such as ('a', 'E1_g1'), I would like to expand it into a three-element tuple ('a', 'E1', 'g1').
The following code is my attempt:
import numpy as np
import pandas as pd
np.random.seed(0)
arr = np.random.randint(5, size=(2, 9))
_names = ['a','a','a','a','a','a','a','a','a']
_idx = ['E1_g1','E1_g2','E1_g3',
'E2_g1','E2_g2','E2_g3',
'E3_g1','E3_g2','E3_g3']
columns = pd.MultiIndex.from_arrays([_names, _idx])
df = pd.DataFrame(data=arr, columns=columns)
ntuple = []
for dg in df.columns:
    A, B = dg
    f, r = B.split('_')
    ntuple.append((A, f, r))
# df.columns = pd.MultiIndex.from_arrays(ntuple)  # WIP since I still got an error here
But I wonder whether there is a better way, especially for the steps inside the for-loop.
CodePudding user response:
Not the cleanest, but this is what I was able to do:
idx = df.columns.to_flat_index()
pd.MultiIndex.from_tuples(map(tuple, idx.str.join("_").str.split("_")))
Output:
MultiIndex([('a', 'E1', 'g1'),
            ('a', 'E1', 'g2'),
            ('a', 'E1', 'g3'),
            ('a', 'E2', 'g1'),
            ('a', 'E2', 'g2'),
            ('a', 'E2', 'g3'),
            ('a', 'E3', 'g1'),
            ('a', 'E3', 'g2'),
            ('a', 'E3', 'g3')],
           )
However, since the dtype is object, you really can't get much faster. In fact, a plain comprehension is a bit faster:
n = len(df.columns)
lvl_0, lvl_1 = df.columns.levels
[(a, b, c) for a, (b, c) in zip(*lvl_0*n, lvl_1.str.split("_"))]
Performance:
In [4]: %timeit pd.MultiIndex.from_tuples(map(tuple, idx.str.join("_").str.split("_")))
914 µs ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [5]: %timeit pd.MultiIndex.from_tuples([(a, b, c) for a, (b, c) in zip(*lvl_0*n, lvl_1.str.split("_"))])
877 µs ± 53.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The only real benefit is the relative simplicity of the syntax in the first example.
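Note that `df.columns.levels` holds the unique values of each level rather than one entry per column, so the comprehension above fits this example but may not carry over to frames with repeated or unsorted labels. A minimal sketch that iterates the column tuples directly instead, assuming every second-level label contains exactly one underscore (the helper name `split_second_level` and the small `demo` frame are made up for illustration):

```python
import numpy as np
import pandas as pd


def split_second_level(columns, sep="_"):
    """Hypothetical helper: turn (a, 'E1_g1') column tuples into (a, 'E1', 'g1')."""
    # Iterating a MultiIndex yields one tuple per column, in column order.
    return pd.MultiIndex.from_tuples(
        [(top, *label.split(sep)) for top, label in columns]
    )


# Illustrative frame whose labels are deliberately not in sorted order.
cols = pd.MultiIndex.from_arrays([["a", "a", "b"], ["E2_g1", "E1_g2", "E1_g1"]])
demo = pd.DataFrame(np.zeros((2, 3), dtype=int), columns=cols)
demo.columns = split_second_level(demo.columns)
print(demo.columns.tolist())
```

Because it walks the columns themselves, the result keeps the original column order regardless of how the levels are stored.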
CodePudding user response:
You can try this:
new_list = [tuple([_names[i]] + _idx[i].split("_")) for i in range(len(_idx))]
Output:
[('a', 'E1', 'g1'),
 ('a', 'E1', 'g2'),
 ('a', 'E1', 'g3'),
 ('a', 'E2', 'g1'),
 ('a', 'E2', 'g2'),
 ('a', 'E2', 'g3'),
 ('a', 'E3', 'g1'),
 ('a', 'E3', 'g2'),
 ('a', 'E3', 'g3')]
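To put these tuples back on the frame, `pd.MultiIndex.from_tuples` is the constructor that takes one tuple per column, whereas `from_arrays` expects one array per level, which is likely why the commented-out line in the question fails. A short sketch, rebuilding the question's `_names` and `_idx` so it runs on its own (the zero-filled data is a placeholder, not the question's random array):

```python
import numpy as np
import pandas as pd

# Same labels as in the question, generated instead of typed out.
_names = ["a"] * 9
_idx = [f"E{e}_g{g}" for e in (1, 2, 3) for g in (1, 2, 3)]

# Placeholder data: zeros stand in for the question's random integers.
df = pd.DataFrame(
    np.zeros((2, 9), dtype=int),
    columns=pd.MultiIndex.from_arrays([_names, _idx]),
)

# One tuple per column, then one MultiIndex built from those tuples.
new_list = [(name, *label.split("_")) for name, label in zip(_names, _idx)]
df.columns = pd.MultiIndex.from_tuples(new_list)
print(df.columns[:3].tolist())
```

Pairing the two lists with `zip` avoids the index bookkeeping of `range(len(_idx))` while producing the same tuples.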