I'm using a dataset that includes diagnosis coding, and I'm trying to see whether follow_up_code is in discharge_codes. Sadly, the discharge codes have been provided as a single string, rather than as a vector.
mre <- tribble(
  ~patient_id, ~discharge_codes, ~follow_up_code,
  1234, "A_B_C", "A",
  4567, "D_E_F", "C",
  7890, "A_C_E", "E"
)
I've tried using flatten_chr() with str_split() to allow me to search follow_up_code %in% discharge_codes, but this flattens discharge_codes entirely (rather than by patient), and using rowwise() %>% mutate(... flatten_chr()) errors with ".x must be a list, not a character vector".
I feel I must be missing something: is the approach I'm taking wrong, or is there a much more straightforward way to achieve this?
CodePudding user response:
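You can use grepl() to check whether the string occurs in your other column. Note that grepl() only uses the first element of a vector pattern (with a warning), so it has to be applied row by row, for example with mapply(). The match column then shows whether the code was found (TRUE/FALSE):
mre <- data.frame(patient_id = c(1234, 4567, 7890),
                  discharge_codes = c("A_B_C", "D_E_F", "A_C_E"),
                  follow_up_code = c("A", "C", "E"))
# Pair each follow-up code with its own row of discharge codes
mre$match <- mapply(grepl, mre$follow_up_code, mre$discharge_codes,
                    USE.NAMES = FALSE)
mre
#>   patient_id discharge_codes follow_up_code match
#> 1       1234           A_B_C              A  TRUE
#> 2       4567           D_E_F              C FALSE
#> 3       7890           A_C_E              E  TRUE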
Created on 2022-07-08 by the reprex package (v2.0.1)
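If whole-code matching is preferred in base R, one possible variation is to split each string on the underscore and test membership per row. This is only a sketch under the assumption that "_" is always the separator; the helper name code_list is made up for illustration.

```r
# Base R sketch: split each string on "_" and test exact membership per row.
mre <- data.frame(
  patient_id = c(1234, 4567, 7890),
  discharge_codes = c("A_B_C", "D_E_F", "A_C_E"),
  follow_up_code = c("A", "C", "E"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row.
# (code_list is a hypothetical helper name.)
code_list <- strsplit(mre$discharge_codes, "_", fixed = TRUE)

# mapply() walks both inputs in parallel, so each follow-up code is
# compared only with that patient's own discharge codes.
mre$match <- mapply(function(code, codes) code %in% codes,
                    mre$follow_up_code, code_list,
                    USE.NAMES = FALSE)
mre$match
```

Because the comparison uses %in% on the split codes rather than a regular expression, a follow-up code only matches when it equals one of the discharge codes in full.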
CodePudding user response:
You can split the text column into a list of character vectors and then check whether the code is within each patient's vector. The benefit of this is that the individual discharge_codes are then available for other uses, if needed.
library(dplyr)
library(purrr)
library(stringr)
mre %>%
  mutate(discharge_codes = str_split(discharge_codes, "_"),
         match = map2_lgl(discharge_codes, follow_up_code, ~ .y %in% .x))
You can see that discharge_codes is now a list column of character vectors.
# A tibble: 3 x 4
  patient_id discharge_codes follow_up_code match
       <dbl> <list>          <chr>          <lgl>
1       1234 <chr [3]>       A              TRUE
2       4567 <chr [3]>       C              FALSE
3       7890 <chr [3]>       E              TRUE
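For comparison with the rowwise() attempt in the question, here is a minimal sketch of the same per-patient check without purrr. It assumes dplyr, stringr and tibble are installed; the result name res is made up for illustration. Inside rowwise(), str_split() sees a single string and returns a list of length one, so [[1]] pulls out that row's character vector.

```r
library(dplyr)
library(stringr)
library(tibble)

mre <- tribble(
  ~patient_id, ~discharge_codes, ~follow_up_code,
  1234, "A_B_C", "A",
  4567, "D_E_F", "C",
  7890, "A_C_E", "E"
)

# res is a hypothetical name for the result.
res <- mre %>%
  rowwise() %>%
  # Within one row, split the string and take the single resulting vector.
  mutate(match = follow_up_code %in% str_split(discharge_codes, "_")[[1]]) %>%
  ungroup()

res
```

This leaves discharge_codes as the original string; the map2_lgl() version may be preferable when the split codes are needed again later.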
CodePudding user response:
You can do this with str_detect() from the stringr package.
library(dplyr)
library(stringr)
library(tibble)
mre <- tibble::tribble(
~patient_id, ~discharge_codes, ~follow_up_code,
1234, "A_B_C", "A",
4567, "D_E_F", "C",
7890, "A_C_E", "E"
)
mre %>%
  mutate(
    matched = str_detect(discharge_codes, follow_up_code)
  )
#> # A tibble: 3 × 4
#>   patient_id discharge_codes follow_up_code matched
#>        <dbl> <chr>           <chr>          <lgl>
#> 1       1234 A_B_C           A              TRUE
#> 2       4567 D_E_F           C              FALSE
#> 3       7890 A_C_E           E              TRUE
Created on 2022-07-08 by the reprex package (v2.0.1)
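One caveat worth checking against the real data: str_detect() looks for the pattern anywhere in the string, so it can report a match when one code is merely a prefix of another. The sketch below uses made-up multi-character codes (not from the question) to show the difference, and one possible workaround that wraps both sides in the "_" delimiter and matches literally with fixed(). The names demo and checked are hypothetical.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical codes, chosen only to illustrate the substring issue.
demo <- tribble(
  ~discharge_codes, ~follow_up_code,
  "A10_B2",         "A1",
  "A1_B2",          "A1"
)

checked <- demo %>%
  mutate(
    # Pattern found anywhere in the string: "A1" is found inside "A10".
    substring_match = str_detect(discharge_codes, follow_up_code),
    # Surround both sides with "_" so only complete codes can match.
    whole_code_match = str_detect(
      paste0("_", discharge_codes, "_"),
      fixed(paste0("_", follow_up_code, "_"))
    )
  )

checked
```

For the first row, substring_match is TRUE while whole_code_match is FALSE; for the second row both are TRUE. With the single-letter codes in the question the two checks agree.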
CodePudding user response:
This one keeps only the rows where there is a match, because unnest() drops rows whose match result is empty:
library(tidyverse)
mre %>%
  mutate(match = str_match_all(discharge_codes, follow_up_code)) %>%
  unnest(c(match))
  patient_id discharge_codes follow_up_code match[,1]
       <dbl> <chr>           <chr>          <chr>
1       1234 A_B_C           A              A
2       7890 A_C_E           E              E
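If only the matching rows are wanted and the extra match[,1] column is not, a possible alternative is to filter directly on str_detect(). This is a sketch rather than a drop-in replacement: it assumes the codes should be treated as literal text (hence fixed()), and matched_rows is a made-up name.

```r
library(dplyr)
library(stringr)
library(tibble)

mre <- tribble(
  ~patient_id, ~discharge_codes, ~follow_up_code,
  1234, "A_B_C", "A",
  4567, "D_E_F", "C",
  7890, "A_C_E", "E"
)

# Keep rows whose discharge_codes string contains the follow-up code.
# fixed() makes the comparison literal instead of a regular expression.
matched_rows <- mre %>%
  filter(str_detect(discharge_codes, fixed(follow_up_code)))

matched_rows
```

The result keeps the original three columns and contains the rows for patients 1234 and 7890.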