Classifying columns based on str_detect

Time:06-23

I am currently working with a data frame that looks like this:

Example <- structure(list(
  ID = c(12301L, 12301L, 15271L, 11888L, 15271L, 15271L, 15271L),
  StationOwner = c("Brian", "Brian", "Simon", "Brian", "Simon", "Simon", "Simon"),
  StationName = c("Red", "Red", "Red", "Green", "Yellow", "Yellow", "Yellow"),
  Parameter = c("Rain - Daily", "Temperature -Daily", "VPD - Daily", "Rain - Daily",
                "Rain - Daily", "Temperature -Daily", "VPD - Daily")
), class = "data.frame", row.names = c(NA, -7L))

I would like to use str_detect to pick out, for example, all the observations that start with "Rain -" and put what comes after the hyphen into a new column called "Rain". I have been able to filter for the values that start with "Rain" using str_detect, but have not found a way to assign them to columns automatically. Is there a specific function that would help with this? I'd appreciate any pointers, thanks!

Example of desired output that I am trying to achieve:

Desired <- structure(list(
  ID = c(12301L, 15271L, 11888L, 15271L),
  StationOwner = c("Brian", "Simon", "Brian", "Simon"),
  StationName = c("Red", "Red", "Green", "Yellow"),
  Rain = c("Daily", NA, "Daily", "Daily"),
  Temperature = c("Daily", NA, NA, "Daily"),
  VPD = c(NA, "Daily", NA, "Daily")
), class = "data.frame", row.names = c(NA, -4L))

CodePudding user response:

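Directly using pivot_wider (from tidyr), with str_remove (from stringr) cleaning up both the new column names and the values:

library(tidyr)
library(stringr)
pivot_wider(Example, names_from = Parameter, values_from = Parameter,
        names_repair = ~str_remove(., ' .*'), values_fn = ~str_remove(., '.*- ?'))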

# A tibble: 4 x 6
     ID StationOwner StationName Rain  Temperature VPD  
  <int> <chr>        <chr>       <chr> <chr>       <chr>
1 12301 Brian        Red         Daily Daily       NA   
2 15271 Simon        Red         NA    NA          Daily
3 11888 Brian        Green       Daily NA          NA   
4 15271 Simon        Yellow      Daily Daily       Daily
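
To see why this works, it may help to run the two patterns on the Parameter strings by themselves. The snippet below is only an illustrative sketch; the vector name `params` and the result names are made up for the example.

```r
library(stringr)

# The three distinct Parameter values from Example
params <- c("Rain - Daily", "Temperature -Daily", "VPD - Daily")

# ' .*' removes everything from the first space onward,
# leaving the part that becomes the column name
col_names <- str_remove(params, " .*")
col_names
# [1] "Rain"        "Temperature" "VPD"

# '.*- ?' removes everything up to the hyphen plus one optional space,
# leaving the part that becomes the cell value
cell_values <- str_remove(params, ".*- ?")
cell_values
# [1] "Daily" "Daily" "Daily"
```

The optional space in the second pattern is what lets it handle both "Rain - Daily" and "Temperature -Daily".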

CodePudding user response:

This does not use str_detect, but it achieves the Desired output:

library(dplyr)
library(tidyr)  # separate() and pivot_wider() come from tidyr

Example %>%
  separate(Parameter, c('a', 'b'), sep = "-") %>%
  mutate(across(where(is.character), ~trimws(.x))) %>%
  pivot_wider(id_cols = c("ID","StationOwner", "StationName"), names_from = "a", values_from = "b")

     ID StationOwner StationName Rain  Temperature VPD  
  <int> <chr>        <chr>       <chr> <chr>       <chr>
1 12301 Brian        Red         Daily Daily       NA   
2 15271 Simon        Red         NA    NA          Daily
3 11888 Brian        Green       Daily NA          NA   
4 15271 Simon        Yellow      Daily Daily       Daily
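
If you do want str_detect in the solution, one possible sketch is to extract the value first and then pick it per station with str_detect inside summarise. This is not what either answer above does; the helper `value_for` and the result name `by_station` are hypothetical names introduced for illustration, and the prefixes have to be listed by hand, which is why pivot_wider is usually the more convenient route.

```r
library(dplyr)
library(stringr)

# Same data as Example in the question, rebuilt here so the sketch is self-contained
Example <- data.frame(
  ID = c(12301L, 12301L, 15271L, 11888L, 15271L, 15271L, 15271L),
  StationOwner = c("Brian", "Brian", "Simon", "Brian", "Simon", "Simon", "Simon"),
  StationName = c("Red", "Red", "Red", "Green", "Yellow", "Yellow", "Yellow"),
  Parameter = c("Rain - Daily", "Temperature -Daily", "VPD - Daily", "Rain - Daily",
                "Rain - Daily", "Temperature -Daily", "VPD - Daily"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: value of the first row whose Parameter starts with `prefix`,
# or NA when the group has no such row (indexing an empty vector with [1] gives NA)
value_for <- function(parameter, value, prefix) {
  value[str_detect(parameter, paste0("^", prefix))][1]
}

by_station <- Example %>%
  # Keep only the text after the hyphen, e.g. "Rain - Daily" becomes "Daily"
  mutate(value = str_trim(str_remove(Parameter, "^[^-]*-"))) %>%
  group_by(ID, StationOwner, StationName) %>%
  summarise(
    Rain        = value_for(Parameter, value, "Rain"),
    Temperature = value_for(Parameter, value, "Temperature"),
    VPD         = value_for(Parameter, value, "VPD"),
    .groups = "drop"
  )

by_station
```

Note that summarise orders the rows by the grouping columns, so the row order will differ from the pivot_wider output even though the contents match.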