I'm trying to extract the character string "segments" separated by an "_" and make them their own columns. I know how to combine many columns to a single one, but I'm curious how going backwards works (single column to many).
My data:
structure(list(REF = c("M120", "M70", "M100", "M94", "M74", "M54",
"M99", "M09", "M111", "M17"), SEAS = c("DRY", "DRY", "DRY", "WET",
"DRY", "WET", "DRY", "WET", "WET", "WET"), YR = c(2019, 2008,
2007, 2021, 2019, 2015, 2020, 2017, 2019, 2012)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
df$REF_YR_SEAS <- paste(df$REF, df$YR, df$SEAS, sep = "_") # Combining many columns to one
I'm aware of:
df$REF2 <- sub("\\_.*", "", df$REF_YR_SEAS) # Extract everything to the left of "_"?
But, don't quite understand how to use the syntax to get the remaining pieces of the character string to make columns "YR" and "SEAS" from "REF_YR_SEAS".
CodePudding user response:
We can use separate()
from the tidyr
package to do this easily:
library(tidyr)
separate(data = df,
col = REF_YR_SEAS,
remove = FALSE,
c("ref_new","yr_new","sea_new"),
sep = "_")
#> # A tibble: 10 x 7
#> REF SEAS YR REF_YR_SEAS ref_new yr_new sea_new
#> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 M120 DRY 2019 M120_2019_DRY M120 2019 DRY
#> 2 M70 DRY 2008 M70_2008_DRY M70 2008 DRY
#> 3 M100 DRY 2007 M100_2007_DRY M100 2007 DRY
#> 4 M94 WET 2021 M94_2021_WET M94 2021 WET
#> 5 M74 DRY 2019 M74_2019_DRY M74 2019 DRY
#> 6 M54 WET 2015 M54_2015_WET M54 2015 WET
#> 7 M99 DRY 2020 M99_2020_DRY M99 2020 DRY
#> 8 M09 WET 2017 M09_2017_WET M09 2017 WET
#> 9 M111 WET 2019 M111_2019_WET M111 2019 WET
#> 10 M17 WET 2012 M17_2012_WET M17 2012 WET