I am using a simple command with dplyr to first filter a data frame by two columns and then report the sum of another column. However, I would like to create a loop so that the filtering criteria can be driven by a list of values. For example, here is the code for a single instance:
library(dplyr)
df = data.frame(Category1 = sample(c("FilterMe","DoNotFilterMe"), 15, replace=TRUE),
                Category2 = sample(c("1","3","5","10"), 15, replace=TRUE),
                Value = 1:15)
df %>%
  filter(Category1=="FilterMe" & Category2=="1") %>%
  summarize(result=sum(Value))
This works perfectly and I get a single value of 15. However, I would like to loop the command so that it runs for several values of Category2, defined by a list of integers (not sequential). I want it to loop over each value of i and give a separate output value each time. I tried the code below but was left with a null value.
library(dplyr)
for (i in c(1,3,5,10)){
  df %>%
    filter(Category1=="FilterMe" & Category2=="i") %>%
    summarize(result=sum(Value))}
If there is another way besides a loop that would achieve the same objective, that is fine by me.
CodePudding user response:
If I understood what you want to do, you are looking for group_by.
library(dplyr)
df %>%
  filter(Category1 == "FilterMe") %>%
  group_by(Category2) %>%
  summarize(result = sum(Value))
CodePudding user response:
We don't need a loop. It can be simplified with %in% instead of ==, followed by a group_by and sum approach.
library(dplyr)
df %>%
  filter(Category1 == "FilterMe" & Category2 %in% c(1, 3, 5, 10)) %>%
  group_by(Category2) %>%
  summarize(result = sum(Value))
-output
# A tibble: 4 × 2
  Category2 result
  <chr>      <int>
1 1              4
2 10            15
3 3             17
4 5             19
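Because Category2 is stored as character, the groups above sort as text ("10" comes before "3"). If numeric order matters, one option is to convert the column before grouping. The sketch below assumes a small hand-written data frame (hypothetical, used only so the result is reproducible) and a helper vector named wanted, which is not in the original.

```r
library(dplyr)

# Hypothetical fixed data; the question builds df with sample(), so its values vary.
df <- data.frame(
  Category1 = c("FilterMe", "DoNotFilterMe", "FilterMe", "FilterMe", "FilterMe", "FilterMe"),
  Category2 = c("1", "1", "10", "3", "5", "3"),
  Value     = 1:6
)

wanted <- c(1, 3, 5, 10)  # assumed name for the values of interest

res <- df %>%
  filter(Category1 == "FilterMe") %>%
  mutate(Category2 = as.integer(Category2)) %>%  # treat the codes as numbers
  filter(Category2 %in% wanted) %>%
  group_by(Category2) %>%
  summarize(result = sum(Value))                 # groups come back in numeric order

print(res)
```

With the column converted, the row for 10 should appear last rather than second.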
With a for loop, we need to store the output of each iteration, e.g. in a list
v1 <- c(1, 3, 5, 10)
lst1 <- vector('list', length(v1))
for (i in seq_along(v1)){
  lst1[[i]] <- df %>%
    filter(Category1 == "FilterMe" & Category2 == v1[i]) %>%
    summarize(result = sum(Value))
}
-output
> lst1
[[1]]
  result
1      4

[[2]]
  result
1     17

[[3]]
  result
1     19

[[4]]
  result
1     15
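For context, the loop in the question compares Category2 with the literal string "i" and never prints or stores its result (values are not auto-printed inside a for loop), which is why nothing shows up. Once the results are stored in a list as above, they can also be stacked into a single data frame. This is a minimal sketch, assuming a small hand-written df (hypothetical, for reproducibility) and an output name combined that is not in the original.

```r
library(dplyr)

# Hypothetical fixed data; the question builds df with sample(), so its values vary.
df <- data.frame(
  Category1 = c("FilterMe", "DoNotFilterMe", "FilterMe", "FilterMe", "FilterMe", "FilterMe"),
  Category2 = c("1", "1", "10", "3", "5", "3"),
  Value     = 1:6
)

v1 <- c(1, 3, 5, 10)
lst1 <- vector("list", length(v1))
for (i in seq_along(v1)) {
  # Compare against the value v1[i], not the string "i", and keep the result.
  lst1[[i]] <- df %>%
    filter(Category1 == "FilterMe", Category2 == v1[i]) %>%
    summarize(result = sum(Value))
}

names(lst1) <- v1                                # label each element with its filter value
combined <- bind_rows(lst1, .id = "Category2")   # one row per value, label in Category2
print(combined)
```

The .id column is taken from the list names, so it comes back as character.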
Or we may directly store the output in a list with map/lapply
library(purrr)
map(c(1, 3, 5, 10), ~
  df %>%
    filter(Category1 == "FilterMe", Category2 == .x) %>%
    summarise(result = sum(Value)))
-output
[[1]]
  result
1      4

[[2]]
  result
1     17

[[3]]
  result
1     19

[[4]]
  result
1     15
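The lapply variant mentioned above would look much the same without purrr. A rough sketch, again assuming a small hand-written df (hypothetical) and using setNames so that each list element is labelled with the value it was filtered on; the names v1 and res are only illustrative.

```r
library(dplyr)

# Hypothetical fixed data; the question builds df with sample(), so its values vary.
df <- data.frame(
  Category1 = c("FilterMe", "DoNotFilterMe", "FilterMe", "FilterMe", "FilterMe", "FilterMe"),
  Category2 = c("1", "1", "10", "3", "5", "3"),
  Value     = 1:6
)

v1 <- c(1, 3, 5, 10)

# lapply keeps the names of its input, so the results can be looked up by value.
res <- lapply(setNames(v1, v1), function(x) {
  df %>%
    filter(Category1 == "FilterMe", Category2 == x) %>%
    summarise(result = sum(Value))
})

print(res[["10"]])
```

Naming the elements avoids having to remember which position corresponds to which Category2 value.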