Home > Mobile >  Sumproduct in R using dplyr
Sumproduct in R using dplyr

Time:06-30

Using dplyr, I´m trying to sumproduct in R two columns as it can be done in Excel but although I´ve tried several aproaches, none worked properly.

A small piece of my dataframe (almost 61000 rows and 20 columns) looks like this:

df <-structure(list(dem_sect = structure(c(4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("AB", 
"EP", "FE", "MF", "PA"), class = "factor"), cod_estr = c("a", 
"a", "b", "b", "c", "b", "b", "b", "a", "b", "c", "b", "b", "a", 
"b", "c", "c", "b", "c"), sect_estr = structure(c(10L, 10L, 11L, 
11L, 12L, 11L, 11L, 11L, 10L, 11L, 12L, 11L, 11L, 10L, 11L, 12L, 
12L, 11L, 12L), .Label = c("ABa", "ABb", "ABc", "EPa", "EPb", 
"EPc", "FEa", "FEb", "FEc", "MFa", "MFb", "MFc", "PAa", "PAb", 
"PAc"), class = "factor"), area = c(14L, 14L, 24L, 24L, 11L, 
24L, 24L, 24L, 14L, 24L, 11L, 24L, 24L, 14L, 24L, 11L, 11L, 24L, 
11L), kg_med = c(5.88125, 5.88125, 6.8505, 6.8505, 
3.0852, 6.8505, 6.8505, 6.8505, 5.88125, 6.8505, 3.0852, 6.8505, 
6.8505, 5.88125, 6.8505, 3.0852, 3.0852, 6.8505, 3.0852)), class = "data.frame", row.names = c(NA, 
-19L))

I want to add a new column (called for example sumprod) which would be the sumproduct of the unique values of both "area" and "kg_med" grouped by "dem_sect"

The expect output will be a column with the value 280.69 (If I did the calculation correctly in Excel)

Any help will be more than welcome. Thanks in advance.

CodePudding user response:

library(tidyverse)

df %>% 
  group_by(dem_sect) %>% 
  filter(!duplicated(area, kg_med)) %>% 
  summarise(sumproduct = sum(area * kg_med)) %>% 
  as.data.frame() # to view the result with more significant digits

#>   dem_sect sumproduct
#> 1       MF   280.6867

CodePudding user response:

df %>% 
  group_by(dem_sect) %>% 
  distinct(area, kg_med) %>% 
  summarise(sumprod=sum(area*kg_med))
  dem_sect sumprod
  <fct>      <dbl>
1 MF          281.

This solution assumes that each value of area is associated with a single value of kg_med and vice-versa. The desired behaviour with different numbers of unique values in each of the two columns is not well specified.

To check the accuracy of the answer to more than three significant digits...

old <- options(pillar.sigfig = 7)
df %>% 
  group_by(dem_sect) %>% 
  distinct(area, kg_med) %>% 
  summarise(sumprod=sum(area*kg_med))

# A tibble: 1 × 2
  dem_sect  sumprod
  <fct>       <dbl>
1 MF       280.6867

options(old)
  • Related