Home > Software design >  Excellent way of expanding tuple separated by underscore in Python
Excellent way of expanding tuple separated by underscore in Python

Time:04-25

Given a tuple of two variable ('a','E1_g1'), I would like to expand it into tuple of three variable ('a','E1', 'g1').

The following code should answer the objective

import numpy as np
import pandas as pd
np.random.seed(0)
arr = np.random.randint(5, size=(2, 9))

_names = ['a','a','a','a','a','a','a','a','a']
_idx = ['E1_g1','E1_g2','E1_g3',
        'E2_g1','E2_g2','E2_g3',
        'E3_g1','E3_g2','E3_g3']
columns = pd.MultiIndex.from_arrays([_names, _idx])

df= pd.DataFrame(data=arr, columns=columns)

ntuple=[]
for dg in df.columns:
    A,B=dg
    f,r=B.split('_')
    ntuple.append((A,f,r))

# df.colums=pd.MultiIndex.from_arrays(ntuple) # WIP since I still got an error here

But, I wonder whether there is another way, that perhaps can be improve especially the step within the for-loops.

CodePudding user response:

Not the cleanest, but this is what I was able to do:

idx = df.columns.to_flat_index()
pd.MultiIndex.from_tuples(map(tuple, idx.str.join("_").str.split("_")))

Output:

MultiIndex([('a', 'E1', 'g1'),
            ('a', 'E1', 'g2'),
            ('a', 'E1', 'g3'),
            ('a', 'E2', 'g1'),
            ('a', 'E2', 'g2'),
            ('a', 'E2', 'g3'),
            ('a', 'E3', 'g1'),
            ('a', 'E3', 'g2'),
            ('a', 'E3', 'g3')],
           )

However, since the dtype is object, you really can't get much faster. In fact, a plain comprehension is a bit faster:

n = len(df.columns)
lvl_0, lvl_1 = df.columns.levels

[(a, b, c) for a, (b, c) in zip(*lvl_0*n, lvl_1.str.split("_"))]

Performance:

In [4]: %timeit pd.MultiIndex.from_tuples(map(tuple, idx.str.join("_").str.split("_")))
914 µs ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit pd.MultiIndex.from_tuples([(a, b, c) for a, (b, c) in zip(*lvl_0*n, lvl_1.str.split("_"))])
877 µs ± 53.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The only real benefit is the relative simplicity of the syntax in the first example.

CodePudding user response:

You can try this,

new_list = [tuple([_names[i]]   _idx[i].split("_")) for i in range(len(_idx))]

Output -

[('a', 'E1', 'g1'),
 ('a', 'E1', 'g2'),
 ('a', 'E1', 'g3'),
 ('a', 'E2', 'g1'),
 ('a', 'E2', 'g2'),
 ('a', 'E2', 'g3'),
 ('a', 'E3', 'g1'),
 ('a', 'E3', 'g2'),
 ('a', 'E3', 'g3')]
  • Related