I have a data frame:
df <- data.frame(resource = c("gold", "gold", "gold", "silver", "silver", "gold", "silver", "bronze"), amount = c(500, 2000, 4, 8, 100, 2000, 3, 5), unit = c("g", "g", "kg", "ton", "kg", "g", "ton", "kg"), price = c(10, 10, 10000, 50000, 50, 10, 50000, 20))
I want to calculate the total amount of each resource in the most common unit.
My idea to approach this would be to:
- calculate the total value of each resource
- determine the most common unit of measurement and the corresponding price
- divide the total value by this price while maintaining a variable that indicates the measurement unit and ideally the price as well.
The result should look like
resource value price unit amount
bronze 100 20 kg 100
gold 85000 10 g 8500
silver 555000 50000 ton 11.1
If two measurement units are equally frequent it can be random, which one is used.
CodePudding user response:
I doubt that the format you desire is really helpful (as PeterK pointed out in the comments), but here we go:
df <- data.frame(resource = c("gold", "gold", "gold", "silver", "silver", "gold", "silver", "bronze"), amount = c(500, 2000, 4, 8, 100, 2000, 3, 5), unit = c("g", "g", "kg", "ton", "kg", "g", "ton", "kg"), price = c(10, 10, 10000, 50000, 50, 10, 50000, 20))
# calculate total value
DT <- setDT(df)[, .(value = sum(amount * price)), by = resource]
# create wide data
# variables we want to cast wide
cols <- c("amount", "unit")
# cast to wide
DT.wide <- dcast(setDT(df), resource ~ rowid(resource), value.var = cols)
new_colorder <- CJ(unique(rowid(df$resource)), cols, sorted = FALSE)[, paste(cols, V1, sep = "_")]
# reorder the relevant columns
setcolorder(DT.wide, c(setdiff(names(DT.wide), new_colorder), new_colorder))
# join together
DT[DT.wide, on = .(resource)]
# resource value amount_1 unit_1 amount_2 unit_2 amount_3 unit_3 amount_4 unit_4
# 1: bronze 100 5 kg NA <NA> NA <NA> NA <NA>
# 2: gold 85000 500 g 2000 g 4 kg 2000 g
# 3: silver 555000 8 ton 100 kg 3 ton NA <NA>
CodePudding user response:
I found an answer to my edited question following this thread Return most frequent string value for each group and modifying the code to fit my needs:
df %>% group_by(resource) %>% summarize (value = sum(price * amount), freq_unit =names(which.max(table(unit))), freq_price = as.numeric(names(which.max(table(price)))), total_amount = value/freq_price)
gives me
resource value freq_unit freq_price total_amount
<chr> <dbl> <chr> <dbl> <dbl>
1 bronze 100 kg 20 5
2 gold 85000 g 10 8500
3 silver 555000 ton 50000 11.1