Home > Software design >  Grouping several dataframe columns based on another columns values
Grouping several dataframe columns based on another columns values

Time:04-27

I have this dataframe:

    refid   col2    price1  factor1 price2  factor2 price3  factor3
  0   1       a      200       1      180     3     150       10
  1   2       b      500       1      450     3     400       10
  2   3       c      700       1      620     2     550        5

And I need to get this output:

   refid    col2    price   factor
0   1        a      200       1
1   1        b      500       1
2   1        c      700       1
3   2        a      180       3
4   2        b      450       3
5   2        c      620       2
6   3        a      150       10
7   3        b      400       10
8   3        c      550       5

Right now I'm trying to use df.melt method, but can't get it to work, this is the code and the current result:

df2_melt = df2.melt(id_vars=["refid","col2"],
        value_vars=["price1","price2","price3",
                   "factor1","factor2","factor3"],
        var_name="Price", 
        value_name="factor")



    refid   col2    price   factor
0       1   a      price1   200
1       2   b      price1   500
2       3   c      price1   700
3       1   a      price2   180
4       2   b      price2   450
5       3   c      price2   620
6       1   a      price3   150
7       2   b      price3   400
8       3   c      price3   550
9       1   a      factor1  1
10      2   b      factor1  1
11      3   c      factor1  1
12      1   a      factor2  3
13      2   b      factor2  3
14      3   c      factor2  2
15      1   a      factor3  10
16      2   b      factor3  10
17      3   c      factor3  5

CodePudding user response:

Since you have a wide DataFrame with common prefixes, you can use wide_to_long:

out = pd.wide_to_long(df, stubnames=['price','factor'], 
                      i=["refid","col2"], j='num').droplevel(-1).reset_index()

Output:

   refid col2  price  factor
0      1    a    200       1
1      1    a    180       3
2      1    a    150      10
3      2    b    500       1
4      2    b    450       3
5      2    b    400      10
6      3    c    700       1
7      3    c    620       2
8      3    c    550       5

Note that your expected output has an error where factors don't align with refids.

CodePudding user response:

You can melt two times and then concat them:

import pandas as pd  

df = pd.DataFrame({'refid': [1, 2, 3], 'col2': ['a', 'b', 'c'],
                   'price1': [200, 500, 700], 'factor1': [1, 1, 1],
                   'price2': [180, 450, 620], 'factor2': [3,3,2],
                   'price3': [150, 400, 550], 'factor3': [10, 10, 5]})
prices = [c for c in df if c.startswith('price')]
factors = [c for c in df if c.startswith('factor')]
df1 = pd.melt(df, id_vars=["refid","col2"], value_vars=prices, value_name='price').drop('variable', axis=1)
df2 = pd.melt(df, id_vars=["refid","col2"], value_vars=factors, value_name='factor').drop('variable', axis=1)
df3 = pd.concat([df1, df2['factor']],axis=1).reset_index().drop('index', axis=1)
print(df3)

Here is the output:

     refid  col2  price  factor
0      1    a    200       1
1      2    b    500       1
2      3    c    700       1
3      1    a    180       3
4      2    b    450       3
5      3    c    620       2
6      1    a    150      10
7      2    b    400      10
8      3    c    550       5
  • Related