I have a dataframe that looks like this:
Column A | Column B | Category |
---|---|---|
1 | 7 | A |
2 | 8 | A |
3 | 9 | B |
4 | 10 | B |
5 | 11 | C |
6 | 12 | C |
I would like to write code to produce the following dataframe:
Category A | Category B | Category C | |||
---|---|---|---|---|---|
Column A | Column B | Column A | Column B | Column A | Column B |
1 | 7 | 3 | 9 | 5 | 11 |
2 | 8 | 4 | 10 | 6 | 12 |
I've tried pd.pivot_table, but am not able to figure it out. Can someone help me with this please? Thanks!
CodePudding user response:
You can create a dummy index to use pivot table with:
out = df.pivot_table(
columns="Category",
index=df.groupby("Category").cumcount()
)
which has output:
Column A Column B
Category A B C A B C
0 1 3 5 7 9 11
1 2 4 6 8 10 12
I don't know if there's any simple way to rearrange the columns to be in your format within pivot_table
itself. Here is a way by doing some post processing:
final = out.swaplevel(axis=1).sort_index(axis=1, level=0)
final:
Category A B C
Column A Column B Column A Column B Column A Column B
0 1 7 3 9 5 11
1 2 8 4 10 6 12
CodePudding user response:
The issue is that you cannot identify each row uniquely to be able to apply pivot
. To this end, create a "within-group" index as follows.
from io import StringIO
import pandas as pd
# setup sample data
data = StringIO("""
Column A;Column B;Category
1;7;A
2;8;A
3;9;B
4;10;B
5;11;C
6;12;C
"""
)
df = pd.read_csv(data, sep=";")
# assign a within-group index
df['id'] = df.groupby('Category').cumcount()
# now apply pivot
df = df.pivot(index='id', columns='Category', values=['Column A', 'Column B'])
Now, you can apply swaplevel
and sort_index
to match the desired result
df.swaplevel(axis=1).sort_index(axis=1)