Given a named vector consisting of unit prices and associated price list names, I want to write a function that adds a new column to my dataset that should say which price list the price in the dataset belongs to.
The function should be able to handle missing values (NA and price not included in the price list) as well as duplicate entries in the price list. For (NA and price not included in the price list) something like "not found" is totally okay, and for duplicate entries i would take the first entry
pricelist = rlang::set_names(
x = c(11.12, 11.45, 14.45, 12.66, 12.96, 14.45),
nm = c("1", "2", "3", "4", "5", "6"))
data = tibble(
article = rep("article 34", 10),
price = c(11.12, NA, 11.45, 11.45, 11.45, 14.45, NA, 20, 12.96, 12.66))
The desired output would look like this:
data %>%
mutate(pricelist = pricelist_fn(price = price))
# A tibble: 10 x 3
article price pricelist
<chr> <dbl> <chr>
1 article 34 11.1 1
2 article 34 NA not found
3 article 34 11.4 2
4 article 34 11.4 2
5 article 34 11.4 2
6 article 34 14.4 3
7 article 34 NA not found
8 article 34 20 not found
9 article 34 13.0 5
10 article 34 12.7 4
CodePudding user response:
data %>%
mutate(result = names(pricelist)[match(price, pricelist)])
# # A tibble: 10 × 3
# article price result
# <chr> <dbl> <chr>
# 1 article 34 11.1 1
# 2 article 34 NA NA
# 3 article 34 11.4 2
# 4 article 34 11.4 2
# 5 article 34 11.4 2
# 6 article 34 14.4 3
# 7 article 34 NA NA
# 8 article 34 20 NA
# 9 article 34 13.0 5
# 10 article 34 12.7 4
If you want a function, you can do this for the same result:
pricelist_fn = function(price, pricelist) {
names(pricelist)[match(price, pricelist)]
}
data %>%
mutate(result = pricelist_fn(price, pricelist))
I would add a word of caution, equality testing (like in match()
) of floats is risky and can leading bugs due to floating point precision. See this FAQ for more details. If your prices are simply read in and kept, it's probably okay. But if you are doing math to calculate the prices or doing mathematical transformations of the prices, it can be risky and you may need to find another solution that checks for price matches within a tolerance.