Mapping pipes to multiple columns in tidyverse

Time:10-03

I'm working with a table in which I need to count the rows that satisfy some criterion. I ended up with many repetitions of essentially the same pipe, differing only in the variable name.

Say I want to know, for each variable in mtcars, how many cars have a higher value than the Valiant (row 6). Example code for two variables is below:

library(tidyverse)

reference <- mtcars %>% 
     slice(6)

mpg <- mtcars  %>% 
  filter(mpg > reference$mpg) %>%
  count() %>% 
  pull()

cyl <- mtcars  %>% 
  filter(cyl > reference$cyl) %>%
  count() %>% 
  pull()

tibble(mpg, cyl)

However, suppose I need to do this for something like 100 variables; there must be a better way than repeating the same code for each one.

How can I rewrite the code above more concisely (maybe using map() or anything else that works nicely with pipes), so that the result is a tibble with the counts for all the variables in mtcars?

I feel the solution should be very easy but I'm stuck. Thank you!

CodePudding user response:

You could use summarise() with across() to count, in each column, the observations that are greater than that column's value in row 6.

library(dplyr)

mtcars %>%
  summarise(across(everything(), ~ sum(. > .[6])))

#   mpg cyl disp hp drat wt qsec vs am gear carb
# 1  18  14   15 22   30 11    1  0 13   17   25
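
If the reference row is kept as a separate object, as in the question, a variant of the same idea could look like the sketch below. It assumes dplyr 1.0.0 or later (for across() and cur_column()) and reuses the question's `reference` object; the name `counts` is only illustrative.

```r
library(dplyr)

# Reference row kept as a separate one-row data frame, as in the question
reference <- mtcars %>%
  slice(6)

# cur_column() returns the name of the column that across() is currently
# processing, so each column is compared with its own entry in `reference`
counts <- mtcars %>%
  summarise(across(everything(), ~ sum(.x > reference[[cur_column()]])))

counts
```

This form does not require the reference row to be part of the data being summarised, which may help if the reference values come from somewhere else.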

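  • Base R solutions:
# (1) Repeat row 6 so it has the same shape as mtcars, then compare element-wise
colSums(mtcars > mtcars[rep(6, nrow(mtcars)), ])

# (2) Sweep the values of row 6 across the columns using ">"
colSums(sweep(as.matrix(mtcars), 2, unlist(mtcars[6, ]), ">"))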

# mpg  cyl disp   hp drat   wt qsec   vs   am gear carb
#  18   14   15   22   30   11    1    0   13   17   25

CodePudding user response:

You can also do it in a loop, for example:

library(tidyverse)

reference <- mtcars %>% 
  slice(6)

# Empty list to save outcome
list_outcome <- list()

# Get the column names to loop over
loop_var <- colnames(reference)
for(i in loop_var){
  nr <- mtcars  %>% 
    filter(mtcars[, i] > reference[, i]) %>%
    count() %>% 
    pull()
  # Save the result of each iteration as a list element named after the variable
  list_outcome[[i]] <- data.frame(Variable = i, Value = nr)
}

# Combine all the data frames in the list into one final data frame
df_result <- do.call(rbind, list_outcome)
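
Since the question mentions map(), the same long-format result can probably be produced without an explicit loop. The following is a minimal sketch, assuming the purrr and tibble packages are available; `counts` and `df_map` are illustrative names, not part of the answer above.

```r
library(purrr)
library(tibble)

# Reference row (the Valiant), taken as in the loop above
reference <- mtcars[6, ]

# imap_int() passes each column as .x and its name as .y,
# and returns a named integer vector of counts
counts <- imap_int(mtcars, ~ sum(.x > reference[[.y]]))

# Long format with one row per variable, similar in shape to df_result
df_map <- enframe(counts, name = "Variable", value = "Value")

df_map
```

Here enframe() turns the named vector into a two-column tibble, so no intermediate list or do.call(rbind, ...) step is needed.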