Home > Mobile >  How to calculate weighted average with Rstudio
How to calculate weighted average with Rstudio

Time:05-14

I am going to start with an example:

inv <- tibble::tribble(
           ~Date, ~Material, ~Quantity,
    "2020-01-01",   "nails",       10L,
    "2020-01-01",   "nails",      100L,
    "2020-02-02",   "nails",       50L,
    "2020-02-02",   "nails",       70L,
    "2020-01-01",  "hammer",       20L,
    "2020-01-01",  "hammer",       80L,
    "2020-02-02",  "hammer",       20L,
    "2020-02-02",  "hammer",       50L
    )

This is the example data set. The original data set is way longer with 100 items and dates.

  1. First I subset data for one day
  2. Sum all item quantity in that day
  3. Divide each item quantity with the sum(from previous step), lets call the result weight
  4. multiply each items quantity with the corresponding weight and sum the results for each item.
  5. Divide result from step 4 with the sum of all weights.
  6. Repeat for all days.

The result from the example should be:

  1. First all items in day 2020-01-01
  2. 10 100 20 80 = 210
  3. 10/210 = 0.048, 100/210 = 0.48, 20/210 = 0.095, 80/210 = 0.38
  4. 10 * 0.048 100 * 0.48 20 * 0.095 80 * 0.38 = 0.48 48 1.9 30.4 = 80.78
  5. 80.78/(0.048 0.48 0.095 0.38) = 80.78/1.003 = 80.54
  6. And the same process for 2020-02-02

I though that maybe its doable with a for loop, this is how far I got and now I am stuck:

inv <- read.csv("Book1.csv")
inv$Date <- as.Date(inv$Date)
n <- unique(inv$Date)
item <- vector()
length(n)
for (i in 1:length(n)){
  day <- subset(inv, subset = (Date == n[i]))
  day_total <- sum(day$Quantity)
  m <- unique(day$Material)
  for (j in 1:length(m)){
    material <- subset(day, subset = (Material == m[j]))
    material_total <- sum(material$Quantity)
    material_weight <- material_total/day_total
    item[j] <- material_total/day_total
    
    
  }
}

If you have any suggestion it will be greatly appreciated

CodePudding user response:

Here's the dplyr version of your algorithm:

library(dplyr)
inv %>%
  group_by(Date) %>%
  mutate(
    weight = Quantity / sum(Quantity),
  ) %>%
  summarize(
    result = sum(Quantity * weight)
  ) 
# # A tibble: 2 × 2
#   Date       result
#   <chr>       <dbl>
# 1 2020-01-01   80.5
# 2 2020-02-02   54.2

Or we can use the built-in weighted.mean function directly for the same result:

inv %>%
  group_by(Date) %>%
  summarize(
    result = weighted.mean(Quantity, w = Quantity / sum(Quantity))
  ) 

If I'm misunderstanding the goal, please edit your question to show the desired output for the sample input.

  • Related