I have a Pandas dataframe which looks as follows:
Age | USA | USA | USA | UK | UK | UK |
---|---|---|---|---|---|---|
Age | male | female | total | male | female | total |
2-year-old | 2 | 3 | 5 | 1 | 1 | 2 |
3-year-old | 8 | 8 | 16 | 7 | 9 | 16 |
In fact I have two row headers (USA male; USA female;..)
CSV-File (test.csv):
;USA;USA;USA;UK;UK;UK
Age;male;female;total;male;female;total
2-year-old;2;3;5;1;1;2
3-year-old;8;8;16;7;9;16
My python-code:
df = pd.read_csv('test.csv',
delimiter=";",
header=[0,1])
df = df.rename(columns={'Unnamed: 0_level_0': 'Age'})
How to pivot pandas dataframe to get the following result?
Age | Country | Gender | frequency |
---|---|---|---|
2-year-old | USA | male | 2 |
2-year-old | USA | female | 3 |
2-year-old | UK | male | 1 |
2-year-old | UK | female | 1 |
3-year-old | USA | male | 8 |
3-year-old | USA | female | 8 |
3-year-old | UK | male | 7 |
3-year-old | UK | female | 9 |
CodePudding user response:
Use DataFrame.set_index
with DataFrame.stack
, also if need remove total add drop
:
df = (df.drop('total', axis=1, level=1)
.set_index(df.columns[0])
.stack([0,1])
.rename_axis(['Age','Country','Gender'])
.reset_index(name='frequency'))
print (df)
Age Country Gender frequency
0 2-year-old UK female 1
1 2-year-old UK male 1
2 2-year-old USA female 3
3 2-year-old USA male 2
4 3-year-old UK female 9
5 3-year-old UK male 7
6 3-year-old USA female 8
7 3-year-old USA male 8
Or:
df = (df.set_index(df.columns[0])
.stack([0,1])
.rename_axis(['Age','Country','Gender'])
.reset_index(name='frequency'))
print (df)
Age Country Gender frequency
0 2-year-old UK female 1
1 2-year-old UK male 1
2 2-year-old UK total 2
3 2-year-old USA female 3
4 2-year-old USA male 2
5 2-year-old USA total 5
6 3-year-old UK female 9
7 3-year-old UK male 7
8 3-year-old UK total 16
9 3-year-old USA female 8
10 3-year-old USA male 8
11 3-year-old USA total 16