Home > other >  How to pivot wide to long format for columns with different names?
How to pivot wide to long format for columns with different names?

Time:04-02

I have a wide df that I would like to convert to long form based on multiple columns. For example, all columns with "Type" would be in a single column and all columns with "Finding" would be in a single column. I believe dplyr is the best approach but have not had luck with pivot_longer.

Initial df

structure(list(Date = structure(c(1648512000, 1648598400, 1648166400, 1648166400), 
                                class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
               Site = c("A", "B", "C", "D"), 
               `Finding?` = c("Yes", "Yes", "Yes", "Yes"), 
               `Topic Area` = c("A", "B", "C", "D"), 
               `Type` = c("1", "2", "3", "4"), 
               `Risk Ranking` = c("Medium", "Low", "Medium", "Medium"), 
               `Additional Finding?` = c("Yes", "Yes", "Yes", "Yes"), 
               `Topic Area2` = c("A", "B", "C", "D"), 
               `Type2` = c("1", "2", "2", "3"), 
               `Risk Ranking2` = c("Medium", "Medium", "Low", "Medium")), 
          row.names = c(NA, -4L), 
          class = c("tbl_df", "tbl", "data.frame")) 

Desired output

data.frame(Date = structure(c(1648512000, 1648598400, 1648166400, 1648166400, 1648512000, 1648598400, 1648166400, 1648166400), 
                                class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
               Site = c("A", "B", "C", "D", "A", "B", "C", "D"), 
               "Finding?" = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"), 
               "Topic Area" = c("A", "B", "C", "D", "A", "B", "C", "D"), 
               `Type` = c("1", "2", "3", "4", "1", "2", "2", "3"), 
               "Risk Ranking" = c("Medium", "Low", "Medium", "Medium", "Medium", "Medium", "Low", "Medium"))

CodePudding user response:

One of the corresponding set of columns i.e Finding, 'Additional Finding' is slightly different than the other sets as those sets of column names have value 2 at the end. Thus, we change only the suffix part of the column 'Finding' columns and then use pivot_longer

library(dplyr)
library(tidyr)
library(stringr)
df1 %>% 
  rename_with(~ str_c("Finding", seq_along(.x)), contains("Finding")) %>% 
  pivot_longer(cols = -c(Date, Site), names_to = c(".value"), 
     names_pattern = "(\\D )\\d*$", values_drop_na = TRUE)

-output

# A tibble: 8 × 6
  Date                Site  Finding `Topic Area` Type  `Risk Ranking`
  <dttm>              <chr> <chr>   <chr>        <chr> <chr>         
1 2022-03-29 00:00:00 A     Yes     A            1     Medium        
2 2022-03-29 00:00:00 A     Yes     A            1     Medium        
3 2022-03-30 00:00:00 B     Yes     B            2     Low           
4 2022-03-30 00:00:00 B     Yes     B            2     Medium        
5 2022-03-25 00:00:00 C     Yes     C            3     Medium        
6 2022-03-25 00:00:00 C     Yes     C            2     Low           
7 2022-03-25 00:00:00 D     Yes     D            4     Medium        
8 2022-03-25 00:00:00 D     Yes     D            3     Medium        
  • Related