Home > Net >  pyjanitor pivot_longer multiple sets of columns with common grouping variable and id column
pyjanitor pivot_longer multiple sets of columns with common grouping variable and id column

Time:12-30

Follow-up from my last question: regex column selection inside pd.melt function

I used the following code to pivot_longer columns that match the string "rating".

import pandas as pd
import janitor
(df
.pivot_longer(
    column_names = "*rating",
    names_to = ("category", ".value"))
)
    id category  rating
0    1    first     1.
1    1    first     2.0
2    1    first     3.0
3    2    first     1.0
4    2    first     2.0
5    2    first     3.0
6    3    first     1.0
7    3    first     2.0
8    3    first     3.0
9    1   second     2.8
10   1   second     2.9
11   1   second     2.2
12   2   second     2.0
13   2   second     1.8
14   2   second     1.9
15   3   second     2.2
16   3   second     2.3
17   3   second     2.1
18   1    third     3.4
19   1    third     3.8
20   1    third     2.9
21   2    third     3.2
22   2    third     2.8
23   2    third     2.4
24   3    third     3.3
25   3    third     3.4
26   3    third     2.9

Now, I also have columns that match the string "type" and "estimate" that have the prefix "first", "second", and "third".

df = pd.DataFrame({

    'id': [1, 1, 1],
    'first_rating': [1, 2, 3],
    'second_rating': [2.8, 2.9, 2.2],
    'third_rating': [3.4, 3.8, 2.9],
    'first_estimate': [1.2, 2.4, 2.8],
    'second_estimate': [2.4, 3, 2.4],
    'third_estimate':[3.4, 3.8, 2.9],
    'first_type': ['red', 'green', 'blue'],
    'second_type': ['red', 'green', 'yellow'],
    'third_type': ['red', 'red', 'blue'],
})

The header of my desired output is the following:

id category rating estimate type
1 first 1.0 1.2 'red'

I want to pivot longer the type and estimate columns too, while having only one 'id' and 'category' column. I know I could pivot longer for ratings, estimates, and types separately and then join the dfs, but I imagine there is a more efficient solution.

I tried to the following but it doesn't work:

(df
.pivot_longer(
    column_names = "*rating", 
    names_to = "category",
    values_to = "rating")
.pivot_longer(
    column_names = "*type", 
    names_to = "category",
    values_to = "type") 
.pivot_longer(
    column_names = "*estimate", 
    names_to = "category",  
    values_to = "estimate"))

CodePudding user response:

The same concept as the previous solution applies, you just need to adapt the column names selection based on a regex or a shell glob (under the hood it uses fnmatch):

# pip install pyjanitor
import pandas as pd
import janitor
# import re - if you are using regex to select the column names
(df
.pivot_longer(
    # you can use a regex here instead
    # re.compile(". _. ")
    column_names="*_*", 
    names_to = ("category", ".value"), 
    names_sep="_")
)

   id category  rating  estimate    type
0   1    first     1.0       1.2     red
1   1    first     2.0       2.4   green
2   1    first     3.0       2.8    blue
3   1   second     2.8       2.4     red
4   1   second     2.9       3.0   green
5   1   second     2.2       2.4  yellow
6   1    third     3.4       3.4     red
7   1    third     3.8       3.8     red
8   1    third     2.9       2.9    blue
  • Related