Home > Enterprise >  pandas to pivot or melt data
pandas to pivot or melt data

Time:07-26

I have a data frame looks like this:

{'year1_ID': [Id1, Id2], 'year1_Name':['Name1', 'Name2'], 'year1_Info': ['Info1','Info2'],'year1_AddInfo':['AddInfo1','AddInfo2'], 
'year2_ID': [Id3, Id4], 'year2_Name':['Name3', 'Name4'], 'year2_Info': ['Info3','Info4'],'year2_AddInfo':['AddInfo3','AddInfo4'],
'year3_ID': [Id5, Id6], 'year3_Name':['Name5', 'Name6'], 'year3_Info': ['Info5','Info6'],'year3_AddInfo':['AddInfo5','AddInfo6']}
year1   year1   year1   year1     year2   year2  year2   year2    year3  year3  year3  year3
ID      Name   Info   AddInfo     Id     Name   Info    AddInfo    Id     Name   Info   AddInfo
Id1     Name1  Info1  AddInfo1    Id3    Name3  Info3   AddInfo3   Id5    Name5  Info5  AddInfo5
Id2     Name2  Info2  AddInfo2    Id4    Name4  Info4   AddInfo4   Id6    Name6  Info6  AddInfo6

The ID, Name, Info and AddInfo is basically another headers that repeats across multiple years

I would like to convert this data frame into this:

Year  ID  Name  Info  AddInfo
year1 Id1 Name1 Info1 AddInfo1
year1 Id2 Name2 Info2 AddInfo2
year2 Id3 Name3 Info3 AddInfo3
year2 Id4 Name4 Info4 AddInfo4
year3 Id5 Name5 Info5 AddInfo5
year3 Id6 Name6 Info6 AddInfo6

I'm trying pd.melt() and pd.stack() but I haven't able to get it quite right.

CodePudding user response:

Given the provided DataFrame, you can set a MultiIndex and stack:

(df
 .set_axis(pd.MultiIndex.from_arrays(zip(*df.columns.str.split('_', n=1))), axis=1)
 .stack(0).rename_axis((None, 'Year'))
 .reset_index('Year')
 #.reset_index(drop=True)  # uncomment to have a clean index
)

output:

    Year   AddInfo   ID   Info   Name
0  year1  AddInfo1  Id1  Info1  Name1
0  year2  AddInfo3  Id3  Info3  Name3
0  year3  AddInfo5  Id5  Info5  Name5
1  year1  AddInfo2  Id2  Info2  Name2
1  year2  AddInfo4  Id4  Info4  Name4
1  year3  AddInfo6  Id6  Info6  Name6

CodePudding user response:

Here's an easy way:

df = pd.DataFrame(data)
df.columns = df.columns.str.split('_', expand=True)
df.stack(0).sort_index(level=1)

Output:

          AddInfo   ID   Info   Name
0 year1  AddInfo1  Id1  Info1  Name1
1 year1  AddInfo2  Id2  Info2  Name2
0 year2  AddInfo3  Id3  Info3  Name3
1 year2  AddInfo4  Id4  Info4  Name4
0 year3  AddInfo5  Id5  Info5  Name5
1 year3  AddInfo6  Id6  Info6  Name6

Removing the 0 and 1 then reset_index gives:

df.stack(0).sort_index(level=1).droplevel(0).reset_index()

Output:

   index   AddInfo   ID   Info   Name
0  year1  AddInfo1  Id1  Info1  Name1
1  year1  AddInfo2  Id2  Info2  Name2
2  year2  AddInfo3  Id3  Info3  Name3
3  year2  AddInfo4  Id4  Info4  Name4
4  year3  AddInfo5  Id5  Info5  Name5
5  year3  AddInfo6  Id6  Info6  Name6
  • Related