Creating multiple variables from single call to piping?

Time:10-18

I am new to tidyverse and want to use pipes to create two new variables, one representing the sum of the petal lengths by Species, and one representing the number of instances of each Species, and then to represent that in a new list alongside the Species names.

The following code does the job, but it relies on two intermediate variables:

library(dplyr)

petal_lengths <- iris %>% group_by(Species) %>% summarise(total_petal_length = sum(Petal.Length))

totals_per_species <- iris %>% count(Species, name="Total")

combined_data <- modifyList(petal_lengths,totals_per_species)

My questions are:

  1. Is it possible to do this without creating the two intermediate variables petal_lengths and totals_per_species, i.e. through a single line of piping code rather than two?

  2. If so, is doing this desirable, either abstractly or according to standard conceptions of good tidyverse coding style?

I read here that

The pipe can only transport one object at a time, meaning it’s not so suited to functions that need multiple inputs or produce multiple outputs.

which makes me think maybe the answer to my first question is "No", but I'm not sure.

CodePudding user response:

You could achieve your desired result in one pipeline like so:

library(dplyr)

iris %>% 
  group_by(Species) %>% 
  summarise(total_petal_length = sum(Petal.Length), Total = n())
#> # A tibble: 3 × 3
#>   Species    total_petal_length Total
#>   <fct>                   <dbl> <int>
#> 1 setosa                   73.1    50
#> 2 versicolor              213      50
#> 3 virginica               278.     50

CodePudding user response:

I think Stefan's answer is the correct one for this particular example, and in general you can get the pipe to work with most data manipulation tasks without writing intermediate variables. However, there is perhaps a broader question here.

There are some situations in which the writing of intermediate variables is necessary, and other situations where you have to write more complicated code in the pipe to avoid creating intermediate variables.
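As a rough illustration of the second case, suppose you wanted to keep the two original summaries (the grouped sum and the count) rather than merging them into one summarise() call. A minimal sketch, assuming the source data frame can simply be referred to by name a second time, is to compute the count inline and join on Species. The name combined_data is reused from the question; left_join() is my substitution for modifyList(), since it matches rows by key rather than relying on row order:

```r
library(dplyr)

combined_data <- iris %>%
  group_by(Species) %>%
  summarise(total_petal_length = sum(Petal.Length)) %>%
  # The count is computed inline, so no intermediate variable is created;
  # joining on Species matches rows by key instead of by position
  left_join(count(iris, Species, name = "Total"), by = "Species")

combined_data
```

This avoids the intermediate variables, but only because iris is available under a fixed name. If the input were itself the result of earlier pipeline steps, it could not be referenced this way.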

I have used a little helper function in some situations to avoid this, which writes a new variable as a side effect. This variable can be re-used within the same pipeline:

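branch <- function(.data, newvar, value) {
  # Capture the unquoted name passed as `newvar` and convert it to a string
  newvar <- as.character(as.list(match.call())$newvar)
  # Side effect: create the variable two frames up the call stack
  assign(newvar, value, parent.frame(2))
  # Pass the input through unchanged so the pipeline can continue
  return(.data)
}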

You would use it in the pipeline like this:

iris %>% 
  branch(totals_per_species, count(., Species, name = "Total")) %>%
  group_by(Species) %>% 
  summarise(total_petal_length = sum(Petal.Length)) %>%
  modifyList(totals_per_species)

#> # A tibble: 3 x 3
#>   Species    total_petal_length Total
#>   <fct>                   <dbl> <int>
#> 1 setosa                   73.1    50
#> 2 versicolor              213      50
#> 3 virginica               278.     50

This function works quite well in interactive sessions, but there are probably scoping problems in more complex settings, since the target of assign() is fixed at parent.frame(2) and so depends on how deeply the call is nested. It's certainly not standard coding practice, though I have often wondered whether a more robust version might be a useful addition to the tidyverse.
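If the side effect is the main worry, one possible alternative (a sketch only, and it assumes the magrittr pipe %>%, not the native |> pipe) is to wrap the step in braces. Inside a braced block, magrittr makes the incoming object available as `.`, so it can be used more than once without assigning anything. The name combined_data and the use of left_join() are my choices for illustration:

```r
library(dplyr)

combined_data <- iris %>%
  {
    # `.` is the object arriving from the left-hand side of the pipe;
    # it is used twice here, once per summary
    left_join(
      summarise(group_by(., Species), total_petal_length = sum(Petal.Length)),
      count(., Species, name = "Total"),
      by = "Species"
    )
  }

combined_data
```

Nothing is written to the calling environment, so this should behave the same inside functions as at the console. The cost is that the nested calls inside the braces are harder to read than a flat pipeline.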
