I have two data frames:
df <- data.frame(ID = 1:10, year = c(2005, 2005, 2006, 2007, 2007, 2005, 2006, 2007, 2006, 2005), resources = c("gold", "silver", "bronze", "gold, bronze", "silver, bronze", "gold", "gold, silver, bronze", "bronze", "gold, silver", "silver"))
and
prices <- data.frame(year = c(2005:2007, 2005:2007, 2005:2007), resource = c("gold", "gold", "gold", "silver", "silver", "silver", "bronze", "bronze", "bronze"), price = c(10, 11, 12, 8, 6, 7, 5, NA, NA))
which look like
ID year resources
1 1 2005 gold
2 2 2005 silver
3 3 2006 bronze
4 4 2007 gold, bronze
5 5 2007 silver, bronze
6 6 2005 gold
7 7 2006 gold, silver, bronze
8 8 2007 bronze
9 9 2006 gold, silver
10 10 2005 silver
and
year resource price
1 2005 gold 10
2 2006 gold 11
3 2007 gold 12
4 2005 silver 8
5 2006 silver 6
6 2007 silver 7
7 2005 bronze 5
8 2006 bronze NA
9 2007 bronze NA
respectively.
I want to calculate the mean price for each observation in df
by taking prices from prices
. I.e. if one observation in 2005 only has gold, the mean price should just be the price of gold in 2005, if it has gold and silver it should be the mean of the two prices in the correct year, if one price is NA
then the NA value should just be ignored.
The result should look like
ID year resources new_price
1 1 2005 gold 10
2 2 2005 silver 8
3 3 2006 bronze NA
4 4 2007 gold, bronze 12
5 5 2007 silver, bronze 7
6 6 2005 gold 10
7 7 2006 gold, silver, bronze 8.5
8 8 2007 bronze NA
9 9 2006 gold, silver 8.5
10 10 2005 silver 8
CodePudding user response:
Try:
This method uses stringr to form the resource column into a list, we then unnest the column to left join the other table, then it's plug and play.
df |>
mutate(resource = stringr::str_extract_all(resource, "[^, ]{1,}")) |>
unnest(cols = resource) |>
left_join(prices, by = c("year", "resource")) |>
group_by(ID) |>
summarise(resource = paste(unique(resource), collapse = ", "), price = mean(price, na.rm = TRUE ), .groups = "drop")
ID resource price
<int> <chr> <dbl>
1 1 gold 10
2 2 silver 8
3 3 bronze NaN
4 4 gold, bronze 12
5 5 silver, bronze 7
6 6 gold 10
7 7 gold, silver, bronze 8.5
8 8 bronze NaN
9 9 gold, silver 8.5
10 10 silver 8