Combine dplyr::group_by with dplyr::summarise and dplyr::summarise_if in a single step

Time:11-23

I would like to combine a summarise_if call (which summarises all numeric variables) with a summarise call that counts the number of observations. In the iris example, I would like to:

  1. count the number of observations per Species and add this count as a column in the new table
  2. summarise all numeric variables (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) by Species.

I can get number 1 with this code:

library(dplyr)

iris %>%
  group_by(Species) %>%
  summarise(n = n())

I can get number 2 with this code:

iris %>%
  group_by(Species) %>%
  summarise_if(is.numeric, median, na.rm = TRUE)

But I am struggling to combine the two. Simply piping one after the other gives me a different result. My desired output is a single table with one row per Species, containing the count n alongside the median of each numeric variable.

Answer:

Use across() inside a single summarise() call:

iris %>%
  group_by(Species) %>%
  summarise(n = n(), across(where(is.numeric), median, na.rm = TRUE))
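Two details of the call above may be worth guarding against. As of dplyr 1.1.0, passing extra arguments such as na.rm through the `...` of across() is deprecated in favour of an anonymous function. Also, because n is itself numeric, where(is.numeric) can pick it up when it is computed first; that is harmless with median but would turn n into NA with a function like sd. A minimal sketch that avoids both, assuming dplyr 1.1.0 or later (the name `result` is only illustrative):

```r
library(dplyr)

result <- iris %>%
  group_by(Species) %>%
  summarise(
    # wrap median() in a function so na.rm is passed explicitly
    across(where(is.numeric), function(x) median(x, na.rm = TRUE)),
    # count last, so across() above never sees the new n column
    n = n()
  ) %>%
  # move the count next to the grouping column
  relocate(n, .after = Species)

result
```

The relocate() step is optional; it only moves n so that it sits directly after Species.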

For those interested, the same thing in data.table:

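library(data.table)

# iris belongs to a locked package environment, so take a copy instead of calling setDT() on it
dt <- as.data.table(iris)
dt[, j = data.frame(n = .N, lapply(.SD, median, na.rm = TRUE)),
   .SDcols = names(dt)[sapply(dt, is.numeric)],
   by = Species]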