Row-wise flatten_chr() or unlist() to convert string to vector

Time:07-09

I'm using a dataset that includes diagnosis codes, and I'm trying to check whether follow_up_code is in discharge_codes. Unfortunately, the discharge codes are provided as a single string rather than a vector.

library(tibble)

mre <- tribble(
 ~patient_id, ~discharge_codes, ~follow_up_code,
 1234       , "A_B_C"         , "A",
 4567       , "D_E_F"         , "C",
 7890       , "A_C_E"         , "E"
)

I've tried using flatten_chr() with str_split() so that I can test follow_up_code %in% discharge_codes, but this flattens discharge_codes across all rows (rather than per patient), and using rowwise() %>% mutate(... flatten_chr()) fails with the error ".x must be a list, not a character vector".

I feel I must be missing something, either in my approach or because there is a much more straightforward way to do this.
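To make the problem concrete, here is a minimal base-R sketch (no packages, so unlist() stands in for purrr's flatten_chr(); both collapse a list into one character vector). strsplit() returns one list element per row, and flattening that list merges every patient's codes, so %in% ends up testing each follow-up code against all codes in the data rather than against its own row's codes.

```r
# Same toy data as the question, as a base data.frame
mre <- data.frame(
  patient_id      = c(1234, 4567, 7890),
  discharge_codes = c("A_B_C", "D_E_F", "A_C_E"),
  follow_up_code  = c("A", "C", "E"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list: one character vector of codes per row
per_row <- strsplit(mre$discharge_codes, "_", fixed = TRUE)
str(per_row)

# Flattening discards the row boundaries: 9 codes, no link to patients
all_codes <- unlist(per_row)
print(all_codes)

# This asks "is the code anywhere in the data?", not "is it in this row?"
print(mre$follow_up_code %in% all_codes)  # TRUE TRUE TRUE
```

Patient 4567 comes back TRUE only because "C" appears among patient 7890's codes; the answers below keep the per-row structure instead.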

Answer:

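You can use grepl() to check whether the follow-up code appears in the discharge-codes string. Because grepl() uses only the first element of its pattern argument (with a warning), apply it row by row with mapply(). The match column then shows whether the code was found (TRUE/FALSE):

mre <- data.frame(patient_id = c(1234, 4567, 7890),
                  discharge_codes = c("A_B_C", "D_E_F", "A_C_E"),
                  follow_up_code = c("A", "C", "E"))

# Test each row's follow-up code against that row's string (literal match, not regex)
mre$match <- mapply(grepl, mre$follow_up_code, mre$discharge_codes,
                    MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
mre
#>   patient_id discharge_codes follow_up_code match
#> 1       1234           A_B_C              A  TRUE
#> 2       4567           D_E_F              C FALSE
#> 3       7890           A_C_E              E  TRUE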

Created on 2022-07-08 by the reprex package (v2.0.1)
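grepl() (like str_detect()) tests for a substring, not for a whole code, which starts to matter once codes have more than one character. A minimal sketch with hypothetical codes (invented for illustration, not taken from the question's data) shows the difference from splitting on "_" first:

```r
# Hypothetical multi-character codes where one code is a prefix of another
discharge <- c("A1_B2", "A10_C3")
follow    <- c("A10", "A1")

# Substring test, row by row: "A1" is found inside "A10", a false positive
substring_hit <- mapply(grepl, follow, discharge,
                        MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Whole-code test: split each string on "_" and compare complete codes
whole_code_hit <- mapply(function(code, x) code %in% strsplit(x, "_", fixed = TRUE)[[1]],
                         follow, discharge, USE.NAMES = FALSE)

print(substring_hit)   # FALSE TRUE
print(whole_code_hit)  # FALSE FALSE
```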

Answer:

You can split the text column into a list column of character vectors and then check, row by row, whether the follow-up code is in that row's vector. A benefit of this is that the split discharge codes remain available for other uses, if needed.

library(dplyr)
library(purrr)
library(stringr)

mre %>% 
  mutate(discharge_codes = str_split(discharge_codes, "_"),
         match = map2_lgl(discharge_codes, follow_up_code, ~ .y %in% .x))

You can see that discharge_codes is now a list column with character vectors.

# A tibble: 3 x 4
  patient_id discharge_codes follow_up_code match
       <dbl> <list>          <chr>          <lgl>
1       1234 <chr [3]>       A              TRUE 
2       4567 <chr [3]>       C              FALSE
3       7890 <chr [3]>       E              TRUE 
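For comparison, a base-R sketch of the same split-then-compare idea without purrr: strsplit() produces the per-row list and mapply() plays the role of map2_lgl(). The n_codes column is a hypothetical extra, included only to show the split codes being reused.

```r
mre <- data.frame(
  patient_id      = c(1234, 4567, 7890),
  discharge_codes = c("A_B_C", "D_E_F", "A_C_E"),
  follow_up_code  = c("A", "C", "E"),
  stringsAsFactors = FALSE
)

# One character vector of codes per row
code_lists <- strsplit(mre$discharge_codes, "_", fixed = TRUE)

# Pair each row's codes with that row's follow-up code
mre$match <- mapply(function(codes, code) code %in% codes,
                    code_lists, mre$follow_up_code, USE.NAMES = FALSE)

# Hypothetical reuse of the split codes: number of codes per patient
mre$n_codes <- lengths(code_lists)
print(mre)
```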

Answer:

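You can do this more simply with str_detect() from the stringr package. It is vectorised over both the string and the pattern, so each row's code is checked against that row's string: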

library(dplyr)
library(stringr)
library(tibble)

mre <- tibble::tribble(
  ~patient_id, ~discharge_codes, ~follow_up_code,
  1234,          "A_B_C",             "A",
  4567,          "D_E_F",             "C",
  7890,          "A_C_E",             "E"
)

mre %>% 
  mutate(
    matched = str_detect(discharge_codes, follow_up_code)
  )

#> # A tibble: 3 × 4
#>   patient_id discharge_codes follow_up_code matched
#>        <dbl> <chr>           <chr>          <lgl>  
#> 1       1234 A_B_C           A              TRUE   
#> 2       4567 D_E_F           C              FALSE  
#> 3       7890 A_C_E           E              TRUE

Created on 2022-07-08 by the reprex package (v2.0.1)
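If splitting isn't wanted, one way to keep a single-pass string test while matching only whole codes is to anchor each follow-up code between "_" delimiters or the ends of the string. This sketch is written in base R (mapply() is needed because grepl() uses only the first pattern); with stringr, the same pattern vector can be passed directly as str_detect(discharge_codes, pattern). It assumes codes contain no regex metacharacters such as "."; escape them first if they do.

```r
mre <- data.frame(
  patient_id      = c(1234, 4567, 7890),
  discharge_codes = c("A_B_C", "D_E_F", "A_C_E"),
  follow_up_code  = c("A", "C", "E"),
  stringsAsFactors = FALSE
)

# One regex per row: start of string or "_" before the code, "_" or end of string after it.
# Assumes codes contain no regex metacharacters (e.g. ".").
pattern <- paste0("(^|_)", mre$follow_up_code, "(_|$)")

# Apply each row's pattern to that row's string
mre$matched <- mapply(grepl, pattern, mre$discharge_codes, USE.NAMES = FALSE)
print(mre$matched)  # TRUE FALSE TRUE

# A prefix no longer counts as a match (hypothetical codes)
print(grepl("(^|_)A1(_|$)", "A10_C3"))  # FALSE
```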

Answer:

This approach keeps only the rows where there is a match (rows with no match are dropped by unnest()):

library(tidyverse)

mre %>% 
  mutate(match = str_match_all(discharge_codes, follow_up_code)) %>% 
  unnest(c(match))

  patient_id discharge_codes follow_up_code match[,1]
       <dbl> <chr>           <chr>          <chr>    
1       1234 A_B_C           A              A        
2       7890 A_C_E           E              E  
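A base-R sketch of the same filtering that tests whole codes per row and then subsets. One difference worth knowing: str_match_all() returns one row per match, so after unnest() a code that occurs twice in a string (as for the hypothetical patient 1111 below) would be expected to produce two output rows, whereas logical subsetting keeps each patient at most once.

```r
mre <- data.frame(
  patient_id      = c(1234, 4567, 7890, 1111),  # 1111 is a hypothetical extra patient
  discharge_codes = c("A_B_C", "D_E_F", "A_C_E", "A_B_A"),
  follow_up_code  = c("A", "C", "E", "A"),
  stringsAsFactors = FALSE
)

# TRUE where the row's follow-up code is one of that row's split codes
has_code <- mapply(function(codes, code) code %in% codes,
                   strsplit(mre$discharge_codes, "_", fixed = TRUE),
                   mre$follow_up_code, USE.NAMES = FALSE)

# Keep matching rows; each patient appears at most once
matched_rows <- mre[has_code, ]
print(matched_rows)
```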