Home > OS >  Match text from one column with another column (vlookup like)
Match text from one column with another column (vlookup like)

Time:12-15

I'm trying to perform a match of 2 columns but without success. I have one DF1 with 2 columns, Id and JSON. In the second DF2, I have one column with a pattern to be matched in each row for DF1$json (something like vlookup like function).

As an output, I'd like to get DF1$Id but only where any of DF2 is matched with DF1$json.

enter image description here

I've tried some combinations with str_detect but it doesn't work on non-vector values. Maybe some tricks with grep or stringr functions?

For example:

str_detect(DF1$json, fixed(DF2[1,1], ignore_case = TRUE))

CodePudding user response:

df1 <- data.frame(
  Id = c("AA", "BB", "CC", "DD"),
  json = c("{xxx:yyy:zzz};{mmm:zzz:vvv}", "{ccc:yyy:zzz};{ddd:zzz:vvv}", "{ttt:yyy:zzz};{mmm:zzz:vvv}", "{uuu:yyy:zzz};{mmm:zzz:vvv}")
)
matches <- c("mmm:zzz:vvv", "mmm:yyy:zzz")

library(stringr) # needed for str_extract_all()

Solution using data.table

library(data.table)
setDT(df1)

df1[, match := any(str_extract_all(json, "(?<=\\{). ?(?=\\})")[[1]] %in% matches), by = Id]
df1[match == T, .(Id)]

Solution using dplyr

library(dplyr)

df1 %>% 
  group_by(Id) %>% 
  mutate(match = any(str_extract_all(json, "(?<=\\{). ?(?=\\})")[[1]] %in% matches)) %>%
  filter(match == T) %>%
  select(Id)

Or just directly filter()

df1 %>% 
  group_by(Id) %>% 
  filter(any(str_extract_all(json, "(?<=\\{). ?(?=\\})")[[1]] %in% matches)) %>%
  select(Id)

Output on both methods

   Id
1: AA
2: CC
3: DD

CodePudding user response:

Does this give you the expected result :

my_df <- data.frame("id" = c("AA", "BB", "CC", "DD"),
                "json" = c("{x:y:z};{m:z:v}", "{c:y:z};{d:z:v}", "{t:y:z};{m:z:v}", "{u:y:z};{m:z:v}"),
                "pattern" = c("m:z:v", "t:y:z", "m:z:v", "t"),
                stringsAsFactors = FALSE)


my_f <- function(x) {
  my_var <- paste(grep(pattern = my_df[x, "pattern"], x = my_df$json), collapse = " ")
  return (my_var)
}
my_df$Value <- lapply(1:nrow(my_df), my_f)
  • Related