How to keep a chain processing going referring to unnamed outputs in an intermediate process in R?

Time:10-27

I'm trying to connect multiple processes with the native pipe |> in R.

In the following MWE,

  1. I pass the data frame iris to both the first and the second process inside (\(x){...})();
  2. Inside (\(x){...})(), the output of the first process is assigned to setosa.Sepal.Length.np and that of the second process to versicolor.Sepal.Length.np with the rightward super-assignment operator ->>;
  3. The correlation between these two outputs is calculated with cor(), but this calculation is not connected to the preceding chain.

library(dplyr)

iris |>
  (\(x){
    ## First process
    filter(
      x,
      Species == "setosa"
      ) |>
    dplyr::select(Sepal.Length) ->>
    setosa.Sepal.Length.np

    ## Second process    
    filter(
      x,
      Species == "versicolor"
      ) |>
    dplyr::select(Sepal.Length) ->>
    versicolor.Sepal.Length.np
  })()

cor(setosa.Sepal.Length.np, versicolor.Sepal.Length.np)

I want to connect the correlation calculation directly to the previous processes with |>. To do so, I have to leave the outputs of the first and second process unnamed, that is, I must not create the objects setosa.Sepal.Length.np and versicolor.Sepal.Length.np. But how should I then refer to the outputs of the first and second process?

iris |>
  (\(x){
    filter(
      x,
      Species == "setosa"
      ) |>
    dplyr::select(Sepal.Length) # leave the first output unnamed

    filter(
      x,
      Species == "versicolor"
      ) |>
    dplyr::select(Sepal.Length) # leave the second output unnamed
  })() |>                       # send the outputs in this process to the next `cor()`
  cor(????, ????)               # How should I refer to the first and second output here?

Supplement code

The first MWE is equivalent to the following code using magrittr's %>%.

library(magrittr)
library(dplyr)
iris %>%
  {
    filter(
      .,
      Species == "setosa"
      ) %>%
    dplyr::select(Sepal.Length) ->>
    setosa.Sepal.Length

    filter(
      .,
      Species == "versicolor"
      ) %>%
    dplyr::select(Sepal.Length) ->>
    versicolor.Sepal.Length
  }

cor(setosa.Sepal.Length, versicolor.Sepal.Length)

CodePudding user response:

With the native pipe, the left-hand side is a single value that is passed as the first argument of the right-hand call (since R 4.2.0 the _ placeholder lets you pass it to one named argument instead, but it can appear only once). So there is no way to pipe two separate values into the two arguments of cor(). Instead, have the previous step return a list and pass it to an intermediate function that unpacks the list. For example, here we create a helper function corlist that accepts the list.

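library(dplyr)  # provides filter(), select() and %>%

# Helper that unpacks a two-element list into the two arguments of cor()
corlist <- function(x) cor(x[[1]], x[[2]])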
iris |>
  (\(x){
    list(
      filter(
        x,
        Species == "setosa"
      ) %>%
        dplyr::select(Sepal.Length),
      filter(
        x,
        Species == "versicolor"
      ) %>%
        dplyr::select(Sepal.Length)
    )
  })() |> 
  corlist()
#              Sepal.Length
# Sepal.Length  -0.08084973
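
The same "return a list, then unpack it" idea can also be sketched without a named helper and without dplyr. This is only a minimal sketch, assuming R >= 4.1 (for |> and \(x)); the object name res is introduced here purely for illustration. It relies on the fact that `lst |> do.call(what = cor)` is rewritten to `do.call(lst, what = cor)`, so the piped list ends up in the args parameter.

```r
# Base R only; assumes R >= 4.1 for |> and \(x).
res <- iris |>
  (\(x) split(x$Sepal.Length, x$Species))() |>  # named list: one numeric vector per species
  (\(s) s[c("setosa", "versicolor")])() |>      # keep the two species of interest
  unname() |>                                   # drop names so the vectors are matched by position
  do.call(what = cor)                           # i.e. do.call(<list>, what = cor), which calls cor(a, b)

res
```

The unname() step matters: with the names kept, do.call() would try to call cor(setosa = ..., versicolor = ...), which cor() does not accept. Unlike the data-frame version above, this returns a single number rather than a 1x1 matrix.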

CodePudding user response:

If you need to refer to the outputs by name, one possible way is to use with():

library(dplyr)
iris %>% {
  list(
    subset(., Species == 'setosa', select = 'Sepal.Length') %>%
      rename(., 'setosa.Sepal.Length' = 'Sepal.Length'),
    subset(., Species == 'versicolor', select = 'Sepal.Length') %>%
      rename(., 'versicolor.Sepal.Length' = 'Sepal.Length')
  )
} %>%
  do.call(cbind, .) %>% with(., cor(setosa.Sepal.Length, versicolor.Sepal.Length))

If only cor is needed, you can shorten this a bit and avoid dplyr (magrittr is still needed for %>%):

library(magrittr)
iris %>% {
  list(
    subset(., Species == 'setosa', select = 'Sepal.Length'),
    subset(., Species == 'versicolor', select = 'Sepal.Length')
  )
} %>%
  do.call(cbind, .) %>% cor                # 2x2 correlation matrix; or {cor(.[,1],.[,2])} for a single value, although less elegant
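
For the native pipe that the question asks about, the with() approach above could look roughly like the following. This is a sketch under assumptions, not a definitive solution: it assumes R >= 4.1, uses base R only, and depends on both species having the same number of rows (50 each in iris) so that the two columns fit in one data frame. The name named_res is introduced here only for illustration.

```r
# Base R only; assumes R >= 4.1 for |> and \(x).
named_res <- iris |>
  (\(x) data.frame(
    # The names exist only as columns of a temporary data frame,
    # so no objects are created in the global environment.
    setosa.Sepal.Length     = x$Sepal.Length[x$Species == "setosa"],
    versicolor.Sepal.Length = x$Sepal.Length[x$Species == "versicolor"]
  ))() |>
  with(cor(setosa.Sepal.Length, versicolor.Sepal.Length))  # with(<data frame>, expr)

named_res
```

Because the piped value becomes the first argument of with(), the column names can be used directly inside cor() and the chain stays unbroken.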
