Home > other >  Find the corresponding pricelist from a named vector based on a price
Find the corresponding pricelist from a named vector based on a price

Time:12-16

Given a named vector consisting of unit prices and associated price list names, I want to write a function that adds a new column to my dataset that should say which price list the price in the dataset belongs to.

The function should be able to handle missing values (NA and price not included in the price list) as well as duplicate entries in the price list. For (NA and price not included in the price list) something like "not found" is totally okay, and for duplicate entries i would take the first entry

pricelist = rlang::set_names(
  x = c(11.12, 11.45, 14.45, 12.66, 12.96, 14.45),
  nm = c("1", "2", "3", "4", "5", "6"))

data = tibble(
  article = rep("article 34", 10), 
  price = c(11.12, NA, 11.45, 11.45, 11.45, 14.45, NA, 20, 12.96, 12.66)) 

The desired output would look like this:

data %>%
  mutate(pricelist = pricelist_fn(price = price))

# A tibble: 10 x 3
   article    price pricelist
   <chr>      <dbl> <chr>    
 1 article 34  11.1 1        
 2 article 34  NA   not found
 3 article 34  11.4 2        
 4 article 34  11.4 2        
 5 article 34  11.4 2        
 6 article 34  14.4 3        
 7 article 34  NA   not found
 8 article 34  20   not found
 9 article 34  13.0 5        
10 article 34  12.7 4 

CodePudding user response:

data %>%
  mutate(result = names(pricelist)[match(price, pricelist)])
# # A tibble: 10 × 3
#    article    price result
#    <chr>      <dbl> <chr>  
#  1 article 34  11.1 1      
#  2 article 34  NA   NA     
#  3 article 34  11.4 2      
#  4 article 34  11.4 2      
#  5 article 34  11.4 2      
#  6 article 34  14.4 3      
#  7 article 34  NA   NA     
#  8 article 34  20   NA     
#  9 article 34  13.0 5      
# 10 article 34  12.7 4

If you want a function, you can do this for the same result:

pricelist_fn = function(price, pricelist) {
  names(pricelist)[match(price, pricelist)]
}
data %>%
  mutate(result = pricelist_fn(price, pricelist))

I would add a word of caution, equality testing (like in match()) of floats is risky and can leading bugs due to floating point precision. See this FAQ for more details. If your prices are simply read in and kept, it's probably okay. But if you are doing math to calculate the prices or doing mathematical transformations of the prices, it can be risky and you may need to find another solution that checks for price matches within a tolerance.

  •  Tags:  
  • r
  • Related