So I'm working with two different dataframes in R. One with user information and one with company information.
user <- data.frame(User_ID = 1:4, company = c("Ford", "Ford Motors", "Ford China", "amazon.com"))
company_info <- data.frame(company_ID = 1:2, company_name = c("Ford", "Amazon"), company_name_other = c("Ford Motors, Ford China", "amazon.com, Amazon AWS"))
The goal is to change user$company
value if it is found in the comma-separated list in company_info$company_name_other
column -- I want to change it to the value in the company_info$company_name
column. So the goal here would be to change all the company values in the user dataframe to be either "Ford" or "Amazon"
I tried the following code:
user %>%
mutate(company = ifelse(str_detect(company_info$company_name_other, company), company_info$company_name, company))
This changed three of the rows, but not the Ford Motor entry? What should I be tweaking?
CodePudding user response:
I suggest the use of fuzzyjoin
here for a regex-based join:
fuzzyjoin::fuzzy_left_join(
user, company_info,
by = c("company" = "company_name_other"),
match_fun = function(a, b) mapply(grepl, sprintf("(^|,\\s*)%s(\\s*,|$)", a), b)
) %>%
mutate(company = coalesce(company_name, company)) %>%
select(-starts_with("company_"))
# User_ID company
# 1 1 Ford
# 2 2 Ford
# 3 3 Ford
# 4 4 Amazon
The sprintf(..)
magic is to guard against partial matches: it requires that the match either (a) be preceded by a comma or at the start of the string, and (b) be followed by a comma or at the end of the string.
CodePudding user response:
Not a fan of this (not sure why) but here's one way.
user %>%
rowwise() %>%
mutate(
test = coalesce(company_info[str_detect(company_info$company_name_other, company), 'company_name'],company)
) %>% ungroup()
CodePudding user response:
This may be done with exact matching as well if we split the column with separate_rows
and create a named vector
library(dplyr)
library(tidyr)
library(tibble)
nm1 <- company_info %>%
separate_rows(company_name_other, sep=",\\s*") %>%
select(3:2) %>%
deframe
user %>%
mutate(company = coalesce(nm1[company], company))
User_ID company
1 1 Ford
2 2 Ford
3 3 Ford
4 4 Amazon
CodePudding user response:
Perhaps:
library(tidyverse)
user <- data.frame(User_ID = 1:4, company = c("Ford", "Ford Motors", "Ford China", "amazon.com"))
company_info <- data.frame(company_ID = 1:2, company_name = c("Ford", "Amazon"), company_name_other = c("Ford Motors, Ford China", "amazon.com, Amazon AWS"))
mutate(user, company = map(user$company,
~ company_info$company_name[str_detect(company_info$company_name_other, .)]))
#> User_ID company
#> 1 1 Ford
#> 2 2 Ford
#> 3 3 Ford
#> 4 4 Amazon
Created on 2021-11-29 by the reprex package (v2.0.1)