Home > Enterprise >  Change value based on match to comma-separated list in a seperate dataframe
Change value based on match to comma-separated list in a seperate dataframe

Time:11-30

So I'm working with two different dataframes in R. One with user information and one with company information.

user <- data.frame(User_ID = 1:4, company = c("Ford", "Ford Motors", "Ford China", "amazon.com"))

company_info <- data.frame(company_ID = 1:2, company_name = c("Ford", "Amazon"), company_name_other =  c("Ford Motors, Ford China", "amazon.com, Amazon AWS"))

The goal is to change user$company value if it is found in the comma-separated list in company_info$company_name_other column -- I want to change it to the value in the company_info$company_name column. So the goal here would be to change all the company values in the user dataframe to be either "Ford" or "Amazon"

I tried the following code:

user %>% 
   mutate(company = ifelse(str_detect(company_info$company_name_other, company), company_info$company_name, company))

This changed three of the rows, but not the Ford Motor entry? What should I be tweaking?

CodePudding user response:

I suggest the use of fuzzyjoin here for a regex-based join:

fuzzyjoin::fuzzy_left_join(
  user, company_info,
  by = c("company" = "company_name_other"),
  match_fun = function(a, b) mapply(grepl, sprintf("(^|,\\s*)%s(\\s*,|$)", a), b)
) %>%
  mutate(company = coalesce(company_name, company)) %>%
  select(-starts_with("company_"))
#   User_ID company
# 1       1    Ford
# 2       2    Ford
# 3       3    Ford
# 4       4  Amazon

The sprintf(..) magic is to guard against partial matches: it requires that the match either (a) be preceded by a comma or at the start of the string, and (b) be followed by a comma or at the end of the string.

CodePudding user response:

Not a fan of this (not sure why) but here's one way.

user %>%
  rowwise() %>%
  mutate(
    test = coalesce(company_info[str_detect(company_info$company_name_other, company), 'company_name'],company) 
  ) %>% ungroup()

CodePudding user response:

This may be done with exact matching as well if we split the column with separate_rows and create a named vector

library(dplyr)
library(tidyr)
library(tibble)
nm1 <- company_info %>% 
   separate_rows(company_name_other, sep=",\\s*") %>% 
   select(3:2) %>% 
   deframe
user %>% 
  mutate(company = coalesce(nm1[company], company))
  User_ID company
1       1    Ford
2       2    Ford
3       3    Ford
4       4  Amazon

CodePudding user response:

Perhaps:

library(tidyverse)

user <- data.frame(User_ID = 1:4, company = c("Ford", "Ford Motors", "Ford China", "amazon.com"))

company_info <- data.frame(company_ID = 1:2, company_name = c("Ford", "Amazon"), company_name_other =  c("Ford Motors, Ford China", "amazon.com, Amazon AWS"))


mutate(user, company = map(user$company,
                           ~ company_info$company_name[str_detect(company_info$company_name_other, .)]))
#>   User_ID company
#> 1       1    Ford
#> 2       2    Ford
#> 3       3    Ford
#> 4       4  Amazon

Created on 2021-11-29 by the reprex package (v2.0.1)

  • Related