Use a for loop for each unique element in R

Time:12-24

I have a question about a for loop in R. I have used the following for loop

for (i in 1:length(unique(iris$Species))) {
  
  datu <- data.frame(ID = unique(i),
                    Sl = mean(iris$Sepal.Length),
                    Sw = mean(iris$Sepal.Width))
                    
  
}

to get the mean Sepal.Length and Sepal.Width for each unique species in iris. But my final data frame has only one row, whereas I want a separate row for each of setosa, versicolor and virginica. What should I change in this code? Thanks

Answer:

We don't need a loop. This can be done with a group-by approach:

setNames(aggregate(.~ Species, iris[c(1, 2, 5)], mean), c("ID", "Sl", "Sw"))

-output

        ID    Sl    Sw
1     setosa 5.006 3.428
2 versicolor 5.936 2.770
3  virginica 6.588 2.974
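To make the group-by idea concrete without any loop, the same split-apply-combine can also be written with base R tapply(), which groups one column by Species and applies mean to each group. This is an alternative sketch, not part of the answer above; the names sl and sw are illustrative only.

```r
# Group each column by Species and take the mean within each group
sl <- tapply(iris$Sepal.Length, iris$Species, mean)
sw <- tapply(iris$Sepal.Width, iris$Species, mean)
# tapply() returns named 1-d arrays; combine them into one data frame
datu <- data.frame(ID = names(sl), Sl = as.vector(sl), Sw = as.vector(sw))
datu
```

It should give the same three rows as the aggregate() call; aggregate() is more convenient once more than a couple of columns are involved.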

Or with the tidyverse:

library(dplyr)
library(stringr)
iris %>% 
  group_by(ID = Species) %>%
  summarise(across(starts_with("Sepal"), ~ mean(.x, na.rm = TRUE), 
     .names = "{str_to_title(str_remove_all(.col, '[a-z.]+'))}"))

-output

# A tibble: 3 × 3
  ID            Sl    Sw
  <fct>      <dbl> <dbl>
1 setosa      5.01  3.43
2 versicolor  5.94  2.77
3 virginica   6.59  2.97
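The .names template is the least obvious part: for each column it removes runs of lowercase letters and dots, then title-cases what is left, so "Sepal.Length" becomes "SL" and then "Sl". A base R sketch of the same renaming, assuming the intended pattern is '[a-z.]+' (with the + quantifier); nm and short are illustrative names.

```r
# Base R sketch of the renaming rule applied by the .names template
nm <- c("Sepal.Length", "Sepal.Width")
short <- gsub("[a-z.]+", "", nm)                 # drop lowercase letters and dots: "SL", "SW"
short <- paste0(substr(short, 1, 1),             # first letter is already uppercase
                tolower(substring(short, 2)))    # lowercase the rest, like str_to_title()
short
```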

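In the loop, unique(i) is just i; what was probably meant is unique(iris$Species)[i]. The means are also taken over the whole columns instead of only the rows of the current species. In addition, datu is overwritten in each iteration, so only the output of the last iteration is returned. Instead, the results can be stored in a list and rbind-ed afterwards, or appended with rbind inside the loop: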

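datu <- data.frame()
for (i in seq_along(unique(iris$Species))) {
  unqSp <- unique(iris$Species)[i]   # species for this iteration
  i1 <- iris$Species == unqSp        # rows belonging to that species
  # append one row of per-species means instead of overwriting datu
  datu <- rbind(datu, data.frame(ID = unqSp,
                    Sl = mean(iris$Sepal.Length[i1]),
                    Sw = mean(iris$Sepal.Width[i1])))
}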

-output

> datu
          ID    Sl    Sw
1     setosa 5.006 3.428
2 versicolor 5.936 2.770
3  virginica 6.588 2.974
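The list-then-rbind alternative mentioned above avoids growing datu inside the loop: each iteration fills one slot of a preallocated list, and all rows are bound once at the end. A minimal sketch; species and res are illustrative names, not from the original answer.

```r
species <- unique(iris$Species)
res <- vector("list", length(species))    # one slot per species
for (i in seq_along(species)) {
  i1 <- iris$Species == species[i]        # rows for the current species
  res[[i]] <- data.frame(ID = species[i],
                         Sl = mean(iris$Sepal.Length[i1]),
                         Sw = mean(iris$Sepal.Width[i1]))
}
datu <- do.call(rbind, res)               # bind all rows once at the end
datu
```

For three groups the difference is negligible, but binding once scales better than calling rbind in every iteration.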

Answer:

A tidyverse approach using dplyr::summarize. Its documentation says:

‘summarise()’ creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in the input.

So grouping by Species first gives one row per species:

library(dplyr)

iris %>% 
  group_by(Species) %>% 
  summarize(Sl = mean(Sepal.Length), Sw = mean(Sepal.Width))

-output

# A tibble: 3 × 3
  Species       Sl    Sw
  <fct>      <dbl> <dbl>
1 setosa      5.01  3.43
2 versicolor  5.94  2.77
3 virginica   6.59  2.97