I have this dataframe:
refid col2 price1 factor1 price2 factor2 price3 factor3
0 1 a 200 1 180 3 150 10
1 2 b 500 1 450 3 400 10
2 3 c 700 1 620 2 550 5
And I need to get this output:
refid col2 price factor
0 1 a 200 1
1 1 b 500 1
2 1 c 700 1
3 2 a 180 3
4 2 b 450 3
5 2 c 620 2
6 3 a 150 10
7 3 b 400 10
8 3 c 550 5
Right now I'm trying to use df.melt method, but can't get it to work, this is the code and the current result:
df2_melt = df2.melt(id_vars=["refid","col2"],
value_vars=["price1","price2","price3",
"factor1","factor2","factor3"],
var_name="Price",
value_name="factor")
refid col2 price factor
0 1 a price1 200
1 2 b price1 500
2 3 c price1 700
3 1 a price2 180
4 2 b price2 450
5 3 c price2 620
6 1 a price3 150
7 2 b price3 400
8 3 c price3 550
9 1 a factor1 1
10 2 b factor1 1
11 3 c factor1 1
12 1 a factor2 3
13 2 b factor2 3
14 3 c factor2 2
15 1 a factor3 10
16 2 b factor3 10
17 3 c factor3 5
CodePudding user response:
Since you have a wide DataFrame with common prefixes, you can use wide_to_long
:
out = pd.wide_to_long(df, stubnames=['price','factor'],
i=["refid","col2"], j='num').droplevel(-1).reset_index()
Output:
refid col2 price factor
0 1 a 200 1
1 1 a 180 3
2 1 a 150 10
3 2 b 500 1
4 2 b 450 3
5 2 b 400 10
6 3 c 700 1
7 3 c 620 2
8 3 c 550 5
Note that your expected output has an error where factor
s don't align with refid
s.
CodePudding user response:
You can melt two times and then concat them:
import pandas as pd
df = pd.DataFrame({'refid': [1, 2, 3], 'col2': ['a', 'b', 'c'],
'price1': [200, 500, 700], 'factor1': [1, 1, 1],
'price2': [180, 450, 620], 'factor2': [3,3,2],
'price3': [150, 400, 550], 'factor3': [10, 10, 5]})
prices = [c for c in df if c.startswith('price')]
factors = [c for c in df if c.startswith('factor')]
df1 = pd.melt(df, id_vars=["refid","col2"], value_vars=prices, value_name='price').drop('variable', axis=1)
df2 = pd.melt(df, id_vars=["refid","col2"], value_vars=factors, value_name='factor').drop('variable', axis=1)
df3 = pd.concat([df1, df2['factor']],axis=1).reset_index().drop('index', axis=1)
print(df3)
Here is the output:
refid col2 price factor
0 1 a 200 1
1 2 b 500 1
2 3 c 700 1
3 1 a 180 3
4 2 b 450 3
5 3 c 620 2
6 1 a 150 10
7 2 b 400 10
8 3 c 550 5