I have a very wide df with about 100 columns, within which there are several 'XYZ_rating' columns that I want to pivot longer.
df = pd.DataFrame({
'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
'first_rating': [1, 2, 3, 1, 2, 3, 1, 2, 3],
'second_rating': [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
'third_rating': [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9],
})
The df in question has about 100 other columns that I do not want to transform from wide to long. I also want to remove the '_rating' suffix from the the category string.
The solution I came up with didn't work and is roughly translated from my R background.
pd.melt(df, id_vars=str.contains('[^rating]'), value_vars=re.contains(`rating`), var_name='category', value_name='value')
Desired output df would look like:
id | category | rating | 100 other columns... |
---|---|---|---|
1 | first | 1 | ... |
1 | second | 2.8 | ... |
1 | third | 3.4 | ... |
... |
CodePudding user response:
One option is to reshape with pivot_longer from pyjanitor, using names_sep
to split into new columns:
# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(
# you could pass a regex here :
# re.compile(". rating")
column_names = "*rating", # uses the shell glob syntax
names_to = ("category", ".value"),
names_sep = "_")
)
id category rating
0 1 first 1.0
1 1 first 2.0
2 1 first 3.0
3 2 first 1.0
4 2 first 2.0
5 2 first 3.0
6 3 first 1.0
7 3 first 2.0
8 3 first 3.0
9 1 second 2.8
10 1 second 2.9
11 1 second 2.2
12 2 second 2.0
13 2 second 1.8
14 2 second 1.9
15 3 second 2.2
16 3 second 2.3
17 3 second 2.1
18 1 third 3.4
19 1 third 3.8
20 1 third 2.9
21 2 third 3.2
22 2 third 2.8
23 2 third 2.4
24 3 third 3.3
25 3 third 3.4
26 3 third 2.9