I have a process that takes a dataframe and turns a set of wide pandas columns into two long pandas columns, like so:
original wide:
import pandas as pd

wide = pd.DataFrame(
    {
        'id': ['foo'],
        'a': [1],
        'b': [2],
        'c': [3],
        'x': [4],
        'y': [5],
        'z': [6],
    }
)
wide
    id  a  b  c  x  y  z
0  foo  1  2  3  4  5  6
desired long:
lon = pd.DataFrame(
    {
        'id': ['foo', 'foo', 'foo', 'foo', 'foo', 'foo'],
        'type': ['a', 'b', 'c', 'x', 'y', 'z'],
        'val': [1, 2, 3, 4, 5, 6],
    }
)
lon
    id type  val
0  foo    a    1
1  foo    b    2
2  foo    c    3
3  foo    x    4
4  foo    y    5
5  foo    z    6
I found a way to do this by chaining the following pandas operations:
(wide
 .set_index('id')
 .T
 .unstack()
 .reset_index()
 .rename(columns={'level_1': 'type', 0: 'val'})
)
    id type  val
0  foo    a    1
1  foo    b    2
2  foo    c    3
3  foo    x    4
4  foo    y    5
5  foo    z    6
But when I scale up my data, this approach starts causing problems. I'm looking for an alternative to what I have already accomplished that is faster / more computationally efficient.
CodePudding user response:
I think you are looking for the pandas melt function.
Assuming your original dataframe is called wide, then:
# melt the wide frame, keeping 'id' as the identifier column
df = pd.melt(wide, id_vars="id")
# rename the auto-generated 'variable'/'value' columns
df.columns = ['id', 'type', 'val']
print(df)
Output:
    id type  val
0  foo    a    1
1  foo    b    2
2  foo    c    3
3  foo    x    4
4  foo    y    5
5  foo    z    6
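If you want to skip the separate column assignment, melt also accepts var_name and value_name arguments that name the long columns directly. A minimal sketch, reusing the wide frame defined above:

# same reshape in one step, naming the long columns up front
df = wide.melt(id_vars='id', var_name='type', value_name='val')
print(df)

This keeps the whole operation in a single reshape call and avoids the transpose, which tends to be the expensive step in your chained version once the frame gets large.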