I want to reshape this table:
Back Vowels | x | x | x | y | y | y |
---|---|---|---|---|---|---|
a: | -26.69 | -40.06 | -49.59 | -15.56 | -7.5 | -11.89 |
o: | ... | ... | ... | ... | ... | ... |
to the following format:
Back Vowels | x | y |
---|---|---|
a: | -26.69 | -15.56 |
a: | -40.06 | -7.5 |
a: | -49.59 | -11.89 |
o: | ... | .... |
What pandas function should I use?
Thank you!:)
I am having trouble formulating the right question, I looked into pivot_table()
, melt()
, and stack()
, but they seem to be doing other stuff.
CodePudding user response:
I don't know exactly how you got multiple columns with same name, so i supposed that come from some kind of merge. This input make the following output
df = pd.DataFrame({'Back Vowels': ["a:", "o:"],
'x': [-26.69,"..."],
'y': [-15.56,"..."],})
df2 = pd.DataFrame({'Back Vowels': ["a:", "o:"],
'x': [-40.06,"..."],
'y': [-11.89,"..."],})
df3 = pd.DataFrame({'Back Vowels': ["a:", "o:"],
'x': [-49.59,"..."],
'y': [-7.5,"..."],})
df = pd.merge(pd.merge(df,df2,'inner','Back Vowels'),df3,'inner', 'Back Vowels')
df= df.rename(columns={"x_x": "x", "x_y": "x", "y_x": "y" , "y_y": "y"})
Output:
Back Vowels x y x y x y
0 a: -26.69 -15.56 -40.06 -11.89 -49.59 -7.5
1 o: ... ... ... ... ... ...
This function will merge columns with same names together:
def same_merge(x): return ','.join(x[x.notnull()].astype(str))
Apply the function :
df_new = df.groupby(level=0, axis=1).apply(lambda x: x.apply(same_merge, axis=1))
Make the columns be arrays :
df_new['x'] = df_new['x'].str.split(",")
df_new['y'] = df_new['y'].str.split(",")
Finally explode the columns to get desired output:
df_final = df_new.explode(list('xy')).reset_index(drop=True)
Final output:
Back Vowels x y
0 a: -26.69 -15.56
1 a: -40.06 -11.89
2 a: -49.59 -7.5
3 o: ... ...
4 o: ... ...
5 o: ... ...
CodePudding user response:
TLDR
Transpose
the dataframe, aggregate
x
and y
into lists, then transpose
back and explode
:
df = df.set_index('Back Vowels').T.groupby(level=0).agg(list).T.explode(['x', 'y'])
# x y
# Back Vowels
# a: -26.69 -15.56
# a: -40.06 -7.5
# a: -49.59 -11.89
# o: 6.69 5.56
# o: 0.06 0.5
# o: 9.59 1.89
Details
Given this dataframe:
df = pd.DataFrame(
data=[['a:', -26.69, -40.06, -49.59, -15.56, -7.5, -11.89], ['o:', 6.69, 0.06, 9.59, 5.56, 0.5, 1.89]],
columns=['Back Vowels', *'xxx', *'yyy'],
)
# Back Vowels x x x y y y
# 0 a: -26.69 -40.06 -49.59 -15.56 -7.5 -11.89
# 1 o: 6.69 0.06 9.59 5.56 0.5 1.89
Work with the transposed dataframe:
df = df.set_index('Back Vowels').T
# Back Vowels a: o:
# x -26.69 6.69
# x -40.06 0.06
# x -49.59 9.59
# y -15.56 5.56
# y -7.50 0.50
# y -11.89 1.89
Group by the index (level 0) and aggregate
x
and y
into lists:
df = df.groupby(level=0).agg(list)
# Back Vowels a: o:
# x [-26.69, -40.06, -49.59] [6.69, 0.06, 9.59]
# y [-15.56, -7.5, -11.89] [5.56, 0.5, 1.89]
Then transpose back and explode
the x
and y
lists into rows:
df = df.T.explode(['x', 'y'])
# x y
# Back Vowels
# a: -26.69 -15.56
# a: -40.06 -7.5
# a: -49.59 -11.89
# o: 6.69 5.56
# o: 0.06 0.5
# o: 9.59 1.89