Follow-up from my last question: regex column selection inside pd.melt function
I used the following code to pivot_longer columns that match the string "rating".
import pandas as pd
import janitor
(df
.pivot_longer(
column_names = "*rating",
names_to = ("category", ".value"))
)
id category rating
0 1 first 1.
1 1 first 2.0
2 1 first 3.0
3 2 first 1.0
4 2 first 2.0
5 2 first 3.0
6 3 first 1.0
7 3 first 2.0
8 3 first 3.0
9 1 second 2.8
10 1 second 2.9
11 1 second 2.2
12 2 second 2.0
13 2 second 1.8
14 2 second 1.9
15 3 second 2.2
16 3 second 2.3
17 3 second 2.1
18 1 third 3.4
19 1 third 3.8
20 1 third 2.9
21 2 third 3.2
22 2 third 2.8
23 2 third 2.4
24 3 third 3.3
25 3 third 3.4
26 3 third 2.9
Now, I also have columns that match the string "type" and "estimate" that have the prefix "first", "second", and "third".
df = pd.DataFrame({
'id': [1, 1, 1],
'first_rating': [1, 2, 3],
'second_rating': [2.8, 2.9, 2.2],
'third_rating': [3.4, 3.8, 2.9],
'first_estimate': [1.2, 2.4, 2.8],
'second_estimate': [2.4, 3, 2.4],
'third_estimate':[3.4, 3.8, 2.9],
'first_type': ['red', 'green', 'blue'],
'second_type': ['red', 'green', 'yellow'],
'third_type': ['red', 'red', 'blue'],
})
The header of my desired output is the following:
id | category | rating | estimate | type |
---|---|---|---|---|
1 | first | 1.0 | 1.2 | 'red' |
I want to pivot longer the type and estimate columns too, while having only one 'id' and 'category' column. I know I could pivot longer for ratings, estimates, and types separately and then join the dfs, but I imagine there is a more efficient solution.
I tried to the following but it doesn't work:
(df
.pivot_longer(
column_names = "*rating",
names_to = "category",
values_to = "rating")
.pivot_longer(
column_names = "*type",
names_to = "category",
values_to = "type")
.pivot_longer(
column_names = "*estimate",
names_to = "category",
values_to = "estimate"))
CodePudding user response:
The same concept as the previous solution applies, you just need to adapt the column names selection based on a regex or a shell glob (under the hood it uses fnmatch):
# pip install pyjanitor
import pandas as pd
import janitor
# import re - if you are using regex to select the column names
(df
.pivot_longer(
# you can use a regex here instead
# re.compile(". _. ")
column_names="*_*",
names_to = ("category", ".value"),
names_sep="_")
)
id category rating estimate type
0 1 first 1.0 1.2 red
1 1 first 2.0 2.4 green
2 1 first 3.0 2.8 blue
3 1 second 2.8 2.4 red
4 1 second 2.9 3.0 green
5 1 second 2.2 2.4 yellow
6 1 third 3.4 3.4 red
7 1 third 3.8 3.8 red
8 1 third 2.9 2.9 blue