Home > OS >  Pandas - Unstack/pivot a dataframe with pandas
Pandas - Unstack/pivot a dataframe with pandas

Time:01-18

I have a dataframe that looks like this:

Column A Column B Category
1 7 A
2 8 A
3 9 B
4 10 B
5 11 C
6 12 C

I would like to write code to produce the following dataframe:

Category A Category B Category C
Column A Column B Column A Column B Column A Column B
1 7 3 9 5 11
2 8 4 10 6 12

I've tried pd.pivot_table, but am not able to figure it out. Can someone help me with this please? Thanks!

CodePudding user response:

You can create a dummy index to use pivot table with:

out = df.pivot_table(
    columns="Category",
    index=df.groupby("Category").cumcount()
)

which has output:

         Column A       Column B
Category        A  B  C        A   B   C
0               1  3  5        7   9  11
1               2  4  6        8  10  12

I don't know if there's any simple way to rearrange the columns to be in your format within pivot_table itself. Here is a way by doing some post processing:

final = out.swaplevel(axis=1).sort_index(axis=1, level=0)

final:

Category        A                 B                 C
         Column A Column B Column A Column B Column A Column B
0               1        7        3        9        5       11
1               2        8        4       10        6       12

CodePudding user response:

The issue is that you cannot identify each row uniquely to be able to apply pivot. To this end, create a "within-group" index as follows.

from io import StringIO
import pandas as pd

# setup sample data
data = StringIO("""
Column A;Column B;Category
1;7;A
2;8;A
3;9;B
4;10;B
5;11;C
6;12;C
"""
)
df = pd.read_csv(data, sep=";")

# assign a within-group index
df['id'] = df.groupby('Category').cumcount()

# now apply pivot
df = df.pivot(index='id', columns='Category', values=['Column A', 'Column B'])

Now, you can apply swaplevel and sort_index to match the desired result

df.swaplevel(axis=1).sort_index(axis=1)
  • Related