Home > front end >  How to reshape a dataframe with identical column names
How to reshape a dataframe with identical column names

Time:02-04

I want to reshape this table:

Back Vowels x x x y y y
a: -26.69 -40.06 -49.59 -15.56 -7.5 -11.89
o: ... ... ... ... ... ...

to the following format:

Back Vowels x y
a: -26.69 -15.56
a: -40.06 -7.5
a: -49.59 -11.89
o: ... ....

What pandas function should I use?

Thank you!:)

I am having trouble formulating the right question, I looked into pivot_table(), melt(), and stack(), but they seem to be doing other stuff.

CodePudding user response:

I don't know exactly how you got multiple columns with same name, so i supposed that come from some kind of merge. This input make the following output

df = pd.DataFrame({'Back Vowels': ["a:", "o:"],
                   'x': [-26.69,"..."],
                   'y': [-15.56,"..."],})

df2 = pd.DataFrame({'Back Vowels': ["a:", "o:"],
                   'x': [-40.06,"..."],
                   'y': [-11.89,"..."],})

df3 = pd.DataFrame({'Back Vowels': ["a:", "o:"],
                   'x': [-49.59,"..."],
                   'y': [-7.5,"..."],})

df = pd.merge(pd.merge(df,df2,'inner','Back Vowels'),df3,'inner', 'Back Vowels')

df= df.rename(columns={"x_x": "x", "x_y": "x", "y_x": "y" , "y_y": "y"})

Output:


     Back Vowels       x    y           x     y        x    y
0         a:      -26.69    -15.56  -40.06  -11.89  -49.59  -7.5
1         o:         ...    ...      ...    ...      ...    ...

This function will merge columns with same names together:

def same_merge(x): return ','.join(x[x.notnull()].astype(str))

Apply the function :

df_new = df.groupby(level=0, axis=1).apply(lambda x: x.apply(same_merge, axis=1))

Make the columns be arrays :

df_new['x'] = df_new['x'].str.split(",")
df_new['y'] = df_new['y'].str.split(",")

Finally explode the columns to get desired output:

df_final = df_new.explode(list('xy')).reset_index(drop=True)

Final output:


   Back Vowels       x         y
0    a:           -26.69    -15.56
1    a:           -40.06    -11.89
2    a:           -49.59    -7.5
3    o:              ...    ...
4    o:              ...    ...
5    o:              ...    ...
 

CodePudding user response:

TLDR

Transpose the dataframe, aggregate x and y into lists, then transpose back and explode:

df = df.set_index('Back Vowels').T.groupby(level=0).agg(list).T.explode(['x', 'y'])

#                  x       y
# Back Vowels               
# a:          -26.69  -15.56
# a:          -40.06    -7.5
# a:          -49.59  -11.89
# o:            6.69    5.56
# o:            0.06     0.5
# o:            9.59    1.89

Details

Given this dataframe:

df = pd.DataFrame(
    data=[['a:', -26.69, -40.06, -49.59, -15.56, -7.5, -11.89], ['o:', 6.69, 0.06, 9.59, 5.56, 0.5, 1.89]],
    columns=['Back Vowels', *'xxx', *'yyy'],
)

#   Back Vowels      x      x      x      y    y      y
# 0          a: -26.69 -40.06 -49.59 -15.56 -7.5 -11.89
# 1          o:   6.69   0.06   9.59   5.56  0.5   1.89

Work with the transposed dataframe:

df = df.set_index('Back Vowels').T

# Back Vowels     a:    o:
# x           -26.69  6.69
# x           -40.06  0.06
# x           -49.59  9.59
# y           -15.56  5.56
# y            -7.50  0.50
# y           -11.89  1.89

Group by the index (level 0) and aggregate x and y into lists:

df = df.groupby(level=0).agg(list)

# Back Vowels                        a:                  o:
# x            [-26.69, -40.06, -49.59]  [6.69, 0.06, 9.59]
# y              [-15.56, -7.5, -11.89]   [5.56, 0.5, 1.89]

Then transpose back and explode the x and y lists into rows:

df = df.T.explode(['x', 'y'])

#                  x       y
# Back Vowels               
# a:          -26.69  -15.56
# a:          -40.06    -7.5
# a:          -49.59  -11.89
# o:            6.69    5.56
# o:            0.06     0.5
# o:            9.59    1.89
  • Related