I have a wide df that I would like to convert to long form based on multiple columns. For example, all columns with "Type" would be in a single column and all columns with "Finding" would be in a single column. I believe dplyr
is the best approach but have not had luck with pivot_longer
.
Initial df
structure(list(Date = structure(c(1648512000, 1648598400, 1648166400, 1648166400),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Site = c("A", "B", "C", "D"),
`Finding?` = c("Yes", "Yes", "Yes", "Yes"),
`Topic Area` = c("A", "B", "C", "D"),
`Type` = c("1", "2", "3", "4"),
`Risk Ranking` = c("Medium", "Low", "Medium", "Medium"),
`Additional Finding?` = c("Yes", "Yes", "Yes", "Yes"),
`Topic Area2` = c("A", "B", "C", "D"),
`Type2` = c("1", "2", "2", "3"),
`Risk Ranking2` = c("Medium", "Medium", "Low", "Medium")),
row.names = c(NA, -4L),
class = c("tbl_df", "tbl", "data.frame"))
Desired output
data.frame(Date = structure(c(1648512000, 1648598400, 1648166400, 1648166400, 1648512000, 1648598400, 1648166400, 1648166400),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Site = c("A", "B", "C", "D", "A", "B", "C", "D"),
"Finding?" = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"),
"Topic Area" = c("A", "B", "C", "D", "A", "B", "C", "D"),
`Type` = c("1", "2", "3", "4", "1", "2", "2", "3"),
"Risk Ranking" = c("Medium", "Low", "Medium", "Medium", "Medium", "Medium", "Low", "Medium"))
CodePudding user response:
One of the corresponding set of columns i.e Finding, 'Additional Finding' is slightly different than the other sets as those sets of column names have value 2 at the end. Thus, we change only the suffix part of the column 'Finding' columns and then use pivot_longer
library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
rename_with(~ str_c("Finding", seq_along(.x)), contains("Finding")) %>%
pivot_longer(cols = -c(Date, Site), names_to = c(".value"),
names_pattern = "(\\D )\\d*$", values_drop_na = TRUE)
-output
# A tibble: 8 × 6
Date Site Finding `Topic Area` Type `Risk Ranking`
<dttm> <chr> <chr> <chr> <chr> <chr>
1 2022-03-29 00:00:00 A Yes A 1 Medium
2 2022-03-29 00:00:00 A Yes A 1 Medium
3 2022-03-30 00:00:00 B Yes B 2 Low
4 2022-03-30 00:00:00 B Yes B 2 Medium
5 2022-03-25 00:00:00 C Yes C 3 Medium
6 2022-03-25 00:00:00 C Yes C 2 Low
7 2022-03-25 00:00:00 D Yes D 4 Medium
8 2022-03-25 00:00:00 D Yes D 3 Medium