Home > Mobile >  split several columns with tuples into separate columns
split several columns with tuples into separate columns

Time:03-12

So I know there are a lot of posts on how to split a single column of tuples into several columns. But I haven't seen any on how to split several columns of tuples at the same time. I've tried doing a nested apply:

spl.apply(lambda col: spl[col].apply(pd.Series)) 

but it doesn't work.

spl = pd.DataFrame({('a', 'b') : [(i, i 1) for i in range(3)], 
                    ('a', 'c'): [(2*i, 2*i 1) for i in range(3)], 
                    ('e', 'b'): [(2*i, 2*i 1) for i in range(3)]})
>>> spl
                a        e
         b      c        b
0   (0, 1)  (0, 1)  (0, 1)
1   (1, 2)  (2, 3)  (2, 3)
2   (2, 3)  (4, 5)  (4, 5)

The target is:

                a          e
    b0  b1  c0  c1   b0   b1
0   0   1   0    1    0    1
1   1   2   2    3    2    3
2   2   3   4    5    4    5

CodePudding user response:

Here's a solution:

new_df = pd.concat([pd.DataFrame(spl[c].tolist()).add_prefix(c[-1]) for c in spl], axis=1)
new_df.columns = pd.MultiIndex.from_arrays([np.repeat(spl.columns.get_level_values(0), 2), new_df.columns])

Output:

>>> new_df
   a           e   
  b0 b1 c0 c1 b0 b1
0  0  1  0  1  0  1
1  1  2  2  3  2  3
2  2  3  4  5  4  5

One-big-liner :)

new_df = pd.concat([pd.DataFrame(spl[c].tolist()).add_prefix(c[-1]) for c in spl], axis=1).pipe(lambda x: x.set_axis(pd.MultiIndex.from_arrays([np.repeat(spl.columns.get_level_values(0), 2), x.columns]), axis=1))

CodePudding user response:

We could use unstack DataFrame constructor pivot. We can't pivot a Series object, so we reset_index first.

To fix duplicate column names, we could reformat column names by creating a MultiIndex object by combining original column name level 1 with the tuple index:

tmp = spl.unstack()
out = (pd.pivot(pd.DataFrame(tmp.tolist(), index=tmp.index).reset_index(), 
                ['level_2'], ['level_0','level_1'], [0,1])
       .rename_axis(columns=[None, None, None], index=[None]))
out.columns = pd.MultiIndex.from_arrays((out.columns.get_level_values(1), 
                                         out.columns.get_level_values(2)   
                                         out.columns.get_level_values(0).astype(str)))
out = out.sort_index(axis=1)

Output:

   a           e   
  b0 b1 c0 c1 b0 b1
0  0  1  0  1  0  1
1  1  2  2  3  2  3
2  2  3  4  5  4  5

As @sammywemmy suggested in the comments to this question, rebuilding the DataFrame from vanilla Python is indeed easier than the mess above.

from itertools import chain, product
out = pd.DataFrame([list(chain.from_iterable(lst)) for lst in spl.to_numpy().tolist()], 
                   columns=pd.MultiIndex.from_tuples((i, j str(k)) for (i,j), k in 
                                                     product(spl.columns, [0,1])))
  • Related