How can I use the filter function (dplyr) in a for loop?

Time:06-09

I have a macro that should return the minimum value within each range of a data frame. This is what I wrote:

for (i in 36:39) {
  a <- emi_sigma1 %>% filter(between(emi_sigma$V1, i, i + 0.15))
  b <- a %>% slice_min(emi_sigma1$V2)
  print(b)

  i == i + 0.15

}

The expected result is 20 rows, one for the minimum value in each range, but the macro returns only 3 rows, and two of them are empty. I already tested the function outside the loop and it worked. Any ideas?

CodePudding user response:

I am not sure what you wish to achieve. Could you post a sample dataset so we can run this code?

From my understanding, you may want to do:

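library(dplyr)

# seq() builds the lower bounds 36, 36.15, ..., 38.85 (20 ranges)
for (i in seq(36, 39 - 0.15, by = 0.15)) {
  # Use bare column names so they refer to the data being piped in
  a <- emi_sigma1 %>% filter(between(V1, i, i + 0.15))
  b <- a %>% slice_min(V2)
  print(b)
}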

or:

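library(dplyr)

# Initialise i
i <- 36
# Repeated floating-point addition can drift, so the seq() version is more predictable
while (i < 39) {
  a <- emi_sigma1 %>% filter(between(V1, i, i + 0.15))
  b <- a %>% slice_min(V2)
  print(b)

  # Assign with <- to advance i; == only compares
  i <- i + 0.15
}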
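
For context on why the original loop misbehaves: `36:39` only yields the four integers 36, 37, 38 and 39, and `i == i + 0.15` is a comparison, not an assignment, so it never changes `i` (a `for` loop would reset `i` on the next iteration anyway). Below is a minimal sketch that can be run without the original data. The data frame is made up, since the real `emi_sigma1` was not posted, and the names `lower`, `results` and `minima` are only illustrative. It assumes dplyr 1.0.0 or later for `slice_min()` and collects the per-range minima instead of printing them one by one.

```r
library(dplyr)

# Made-up stand-in for emi_sigma1: V1 spans 36 to 39, V2 is arbitrary
set.seed(1)
emi_sigma1 <- data.frame(V1 = runif(500, min = 36, max = 39),
                         V2 = rnorm(500))

lower <- seq(36, 39 - 0.15, by = 0.15)   # lower bound of each range
results <- vector("list", length(lower))

for (k in seq_along(lower)) {
  i <- lower[k]
  results[[k]] <- emi_sigma1 %>%
    filter(between(V1, i, i + 0.15)) %>%      # bare column name, not emi_sigma1$V1
    slice_min(V2, n = 1, with_ties = FALSE)   # at most one row per range
}

# One row for every range that contained at least one observation
minima <- bind_rows(results)
print(minima)
```

Note that `between()` includes both end points, so a value sitting exactly on a boundary belongs to two adjacent ranges.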

CodePudding user response:

How can one go about doing something like this with for loops? I understand that this particular exercise does not require loops, since one can easily get the average petal length of every species with, for example, the "group_by" function in dplyr. However, I am looking to output close to 100 unique tables and PDF files from the dataset I am working with, and knowing how to use for loops would really help for that purpose.

Solution

It is unfortunate that your code didn't raise any errors. If you run your code line by line, you'll see what I mean. For this example I will take the first iteration of your loop and replace i with "setosa":

iris %>% filter(iris$Species == unique(iris$Species)["setosa"])

[1] Sepal.Length Sepal.Width  Petal.Length Petal.Width  Species
<0 rows> (or 0-length row.names)

Your filter yields a data frame with no observations, so there is no point in going ahead, but for this example, let's run the rest of the code:

iris %>% filter(iris$Species == unique(iris$Species)["setosa"]) %>%
  summarize(mean(iris$Petal.Length))

  mean(iris$Petal.Length)
1                   3.758

What happened is that you're calling the iris dataset from within your code. A more obvious example would be:

filter(iris, iris$Species == unique(iris$Species)["setosa"]) %>%
  summarize(mean(mtcars$cyl))

  mean(mtcars$cyl)
1           6.1875

That's why you don't get the answer you expected: your filter didn't work and you got a summary statistic from another dataset.
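
The two effects described above can be checked separately. The sketch below is only an illustration (the names `lookup`, `empty`, `setosa` and `means` are made up for it): it shows that indexing the unnamed result of `unique()` by name gives `NA`, that `filter()` drops rows where the condition is `NA`, and that a bare column name inside `summarize()` refers to the filtered data while `iris$Petal.Length` refers to the full data set.

```r
library(dplyr)

# unique() returns an unnamed factor, so indexing it by name gives NA
lookup <- unique(iris$Species)["setosa"]
print(is.na(lookup))

# Comparing against NA gives NA for every row, and filter() drops those rows
empty <- iris %>% filter(Species == lookup)
print(nrow(empty))

# Comparing the bare column against the string keeps the setosa rows
setosa <- iris %>% filter(Species == "setosa")
print(nrow(setosa))

# Bare column: filtered data. iris$Petal.Length: the whole data set.
means <- setosa %>%
  summarize(filtered = mean(Petal.Length),
            unfiltered = mean(iris$Petal.Length))
print(means)
```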

As TJ Mahr mentioned, your code without specifying the dataset runs fine:

for (i in unique(iris$Species))
{
  iris %>% filter(Species==i) %>%
    summarize(mean(Petal.Length)) %>% print()
  print(i)
}

  mean(Petal.Length)
1              1.462
[1] "setosa"
  mean(Petal.Length)
1               4.26
[1] "versicolor"
  mean(Petal.Length)
1              5.552
[1] "virginica"
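
Since the goal in the question was to write many tables and PDF files, the same loop can be extended along these lines. This is only a sketch under assumptions: the output directory, the file names and the choice of a histogram are all made up for illustration, and the real dataset will need its own grouping column and plots.

```r
library(dplyr)

# Hypothetical output location; replace with a real directory as needed
out_dir <- file.path(tempdir(), "species_reports")
dir.create(out_dir, showWarnings = FALSE)

for (sp in unique(as.character(iris$Species))) {
  subset_df <- iris %>% filter(Species == sp)

  # One CSV table per species
  write.csv(subset_df, file.path(out_dir, paste0(sp, ".csv")), row.names = FALSE)

  # One PDF per species, here a simple histogram of petal length
  pdf(file.path(out_dir, paste0(sp, ".pdf")))
  hist(subset_df$Petal.Length, main = sp, xlab = "Petal length")
  dev.off()
}

print(list.files(out_dir))
```

Building the file name from the loop variable with `paste0()` is what keeps each iteration from overwriting the previous one.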