Home > Enterprise >  Pivoting multiple paired columns in R
Pivoting multiple paired columns in R

Time:10-01

in the below example I want to pivot the first dataframe to create the 2nd one:

df <- data.frame(area = c(1,2,3), 
           cust1_LastName = c("Jack", "Sam", "Mary"), 
           cust1_Money = c(10,40,50),
           cust2_LastName = c("Jane", "Kevin", "John"), 
           cust2_Money = c(5,20,30))



df2 <- data.frame(area = c(1,1,2,2,3,3),
                 customer = c("Jack", "Jane", "Sam", "Kevin","Mary",  "John"),
                 money_spend = c(10,5,40,20,50,30))

I saw a similar example in the pivot_longer documentation that I failed to implement. This is my failed attempt:

df2 <- df %>%
  pivot_longer(-area,
               names_to = (".value", "set"),
               names_pattern = "(customer)(money)"
  )

Anyone knows how to pivot multiple paired columns (e.g. 10 paired columns)?

CodePudding user response:

We need a regex in the names_pattern i.e. capture one of more characters not a digit (\\D ) followed by one or more characters that are a digit (\\d )

library(dplyr)
library(tidyr)
df %>%
     pivot_longer(cols = -area, names_to = c(".value", "set"), 
          names_pattern = "(\\D )(\\d )")

-output

# A tibble: 6 × 4
   area set   customer money_spend
  <dbl> <chr> <chr>          <dbl>
1     1 1     Jack              10
2     1 2     Jane               5
3     2 1     Sam               40
4     2 2     Kevin             20
5     3 1     Mary              50
6     3 2     John              30

For the updated data

df %>%
     pivot_longer(cols = -area, names_to = c("set",".value"), 
          names_pattern = "^[A-Za-z] (\\d )_([A-Za-z] )")

-output

# A tibble: 6 × 4
   area set   LastName Money
  <dbl> <chr> <chr>    <dbl>
1     1 1     Jack        10
2     1 1     Jane         5
3     2 1     Sam         40
4     2 1     Kevin       20
5     3 1     Mary        50
6     3 1     John        30

CodePudding user response:

In the regex pattern you may extract everything after an underscore (_) to differentiate the columns.

tidyr::pivot_longer(df, -area,
                     names_to = ".value",
                     names_pattern = ".*_(.*)")

#   area LastName Money
#  <dbl> <chr>    <dbl>
#1     1 Jack        10
#2     1 Jane         5
#3     2 Sam         40
#4     2 Kevin       20
#5     3 Mary        50
#6     3 John        30
  •  Tags:  
  • r
  • Related