regex column selection inside pd.melt function-CodePudding

I have a very wide df with about 100 columns, within which there are several 'XYZ_rating' columns that I want to pivot longer.

df = pd.DataFrame({

    'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],

    'first_rating': [1, 2, 3, 1, 2, 3, 1, 2, 3],

    'second_rating': [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],

    'third_rating': [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9],

})

The df in question has about 100 other columns that I do not want to transform from wide to long. I also want to remove the '_rating' suffix from the the category string.

The solution I came up with didn't work and is roughly translated from my R background.

pd.melt(df, id_vars=str.contains('[^rating]'), value_vars=re.contains(`rating`), var_name='category', value_name='value')

Desired output df would look like:

id	category	rating	100 other columns...
1	first	1	...
1	second	2.8	...
1	third	3.4	...
...

CodePudding user response：

One option is to reshape with pivot_longer from pyjanitor, using names_sep to split into new columns:

# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(
    # you could pass a regex here : 
    # re.compile(". rating")
    column_names = "*rating", # uses the shell glob syntax
    names_to = ("category", ".value"), 
    names_sep = "_")
)
    id category  rating
0    1    first     1.0
1    1    first     2.0
2    1    first     3.0
3    2    first     1.0
4    2    first     2.0
5    2    first     3.0
6    3    first     1.0
7    3    first     2.0
8    3    first     3.0
9    1   second     2.8
10   1   second     2.9
11   1   second     2.2
12   2   second     2.0
13   2   second     1.8
14   2   second     1.9
15   3   second     2.2
16   3   second     2.3
17   3   second     2.1
18   1    third     3.4
19   1    third     3.8
20   1    third     2.9
21   2    third     3.2
22   2    third     2.8
23   2    third     2.4
24   3    third     3.3
25   3    third     3.4
26   3    third     2.9