Pandas/Pyspark - How to melt multiple variable and value columns into one variable and one value col-CodePudding

I have the following table:

df = pd.DataFrame(
{
    'metric_1': {0: 'a', 1: 'b', 2: 'c'},
    'value_1': {0: 1, 1: 3, 2: 5},
    'metric_2': {0: 'x', 1: 'y', 2: 'z'},
    'value_2': {0: 2, 1: 4, 2: 6},
    'metric_3': {0: 'j', 1: 'k', 2: 'l'},
    'value_3': {0: 7, 1: 8, 2: 9}
})

I am trying to melt the table so it looks like this instead:

I have tried using melt as follows:

 pd.melt(
    df,
    id_vars=['metric_1', 'metric_2', 'metric_3'],
    value_vars=['value_1', 'value_2', 'value_3']
)

but end up something completely different.

Thank you in advance for any help I may receive on this. I'm also open to solutions via Pyspark.

CodePudding user response：

Use pd.wide_to_long, you are trying to do simultaneous melting:

pd.wide_to_long(df.reset_index(), ['metric', 'value'], 'index', 'No', '_', '\d ')

Output:

         metric  value
index No              
0     1       a      1
1     1       b      3
2     1       c      5
0     2       x      2
1     2       y      4
2     2       z      6
0     3       j      7
1     3       k      8
2     3       l      9

CodePudding user response：

You can use stack with some column manipulation.

df.columns = df.columns.str.split("_", expand=True)
df.stack() # .sort_index(level=1) if you care about that

    metric  value
0 1      a      1
  2      x      2
  3      j      7
1 1      b      3
  2      y      4
  3      k      8
2 1      c      5
  2      z      6
  3      l      9