I have two dataframes that look somewhat like this (they're much bigger)
df1 <- data.frame(center = c("5012","5012","5025"),
product = c("Apple","Grape","Apple"),
value = c(20,30,50))
df1:
Center Product Value
1 5012 Apple 20
2 5012 Grape 30
3 5025 Apple 50
df2 <- data.frame(center = c("5012","5012","5012","5012","5012","5025"),
profitCenter = c("A","B","C","D","A","A"),
product = c("Apple","Apple","Apple", "Apple","Grape","Apple"),
volume = c(20,30,50,70,60,80))
df2:
Center Profitcenter Product Volume
1 5012 A Apple 20
2 5012 B Apple 30
3 5012 C Apple 50
4 5012 D Apple 70
5 5012 A Grape 60
6 5025 A Apple 80
I wanted to get the sum of the "volume" column from DF2 by "center" and "product" into DF1. In excel I would do a sumif, but I'm struggling to think on how to properly do this with R:
DF1:
Center Product Value Volume
1 5012 Apple 20 170
2 5012 Grape 30 60
3 5025 Apple 50 80
Currently I'm creating an aggregated version of DF2 (with dplyr's group_by) and then doing a left_join, but I have to do this a few more times and I'm sure there's a better way.
CodePudding user response:
Similar to @MrFlick commented but I would prefer to first summarise
in df2
and then add the resulting column to df1
. But outcome and performance are the same. As @r2evans pointed out:
library(tidyverse)
df2 %>%
group_by(center, product) %>%
summarise(Volume=sum(volume)) %>%
right_join(df1,by=c("center","product"))
Or:
library(tidyverse)
df1 %>%
left_join(
df2 %>%
group_by(center,product) %>%
summarise(sum(volume)),
by=c("center","product")
)
Output:
center product value sum(volume)
1 5012 Apple 20 170
2 5012 Grape 30 60
3 5025 Apple 50 80