I'm trying to connect multiple processes with the native pipe |> in R.
In the following MWE,
- I give the data frame iris to the first and second processes simultaneously inside (\(x){...})();
- Inside (\(x){...})(), the output of the first process is named setosa.Sepal.Length.np and that of the second process is named versicolor.Sepal.Length.np, using the rightward superassignment operator ->>;
- The correlation between these two outputs is calculated with cor(), and this calculation is not directly connected to the preceding pipe chain.
library(dplyr) # filter() and select() below are the dplyr versions
iris |>
  (\(x){
    ## First process
    filter(
      x,
      Species == "setosa"
    ) |>
      dplyr::select(Sepal.Length) ->>
      setosa.Sepal.Length.np
    ## Second process
    filter(
      x,
      Species == "versicolor"
    ) |>
      dplyr::select(Sepal.Length) ->>
      versicolor.Sepal.Length.np
  })()
cor(setosa.Sepal.Length.np, versicolor.Sepal.Length.np)
I want to connect the correlation calculation directly to the previous processes with |>. To do so, I have to leave the outputs of the first and second processes unnamed, and I must not create the objects setosa.Sepal.Length.np and versicolor.Sepal.Length.np. But then how should I refer to the outputs of the first and second processes?
iris |>
  (\(x){
    filter(
      x,
      Species == "setosa"
    ) %>%
      dplyr::select(Sepal.Length) # leave the first output unnamed
    filter(
      x,
      Species == "versicolor"
    ) %>%
      dplyr::select(Sepal.Length) # leave the second output unnamed
  })() |> # send the outputs of this process to the next `cor()`
  cor(????, ????) # How should I refer to the first and second output here?
Supplementary code
The first MWE is equivalent to the following code using magrittr's %>%.
library(magrittr)
library(dplyr) # for filter() and select()
iris %>%
  {
    filter(
      .,
      Species == "setosa"
    ) %>%
      dplyr::select(Sepal.Length) ->>
      setosa.Sepal.Length
    filter(
      .,
      Species == "versicolor"
    ) %>%
      dplyr::select(Sepal.Length) ->>
      versicolor.Sepal.Length
  }
cor(setosa.Sepal.Length, versicolor.Sepal.Length)
CodePudding user response:
The native pipe passes a single value to the next call, by default as its first argument (since R 4.2.0 the _ placeholder can send it to one named argument instead). It cannot fill both arguments of cor() with two separate values. Instead, have the previous step return a list and add an intermediate function that accepts that list. For example, here we create a helper function corlist to accept the list.
library(dplyr) # for filter(), select() and %>%
# Helper: correlate the first two elements of a list
corlist <- function(x) cor(x[[1]], x[[2]])
iris |>
  (\(x){
    list(
      filter(
        x,
        Species == "setosa"
      ) %>%
        dplyr::select(Sepal.Length),
      filter(
        x,
        Species == "versicolor"
      ) %>%
        dplyr::select(Sepal.Length)
    )
  })() |>
  corlist()
#              Sepal.Length
# Sepal.Length  -0.08084973
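If a separately named helper is not wanted, the same list-passing idea can be sketched with a second anonymous function. This is only an illustration under some assumptions: it uses base R subsetting instead of dplyr, needs R 4.1 or later for |> and \(x), and the names sepal_by_species and parts are made up for this example.

```r
# Hypothetical helper: Sepal.Length values for one species (base R only)
sepal_by_species <- function(df, species) {
  df[df$Species == species, "Sepal.Length"]
}

result <- iris |>
  # Step 1: return both subsets together as an unnamed list
  (\(x) list(
    sepal_by_species(x, "setosa"),
    sepal_by_species(x, "versicolor")
  ))() |>
  # Step 2: unpack the list inside a second anonymous function
  (\(parts) cor(parts[[1]], parts[[2]]))()

result
```

Because the subsets are plain numeric vectors here, result is a single number rather than a 1 x 1 matrix.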
CodePudding user response:
If you need to refer to the variables by name, one possible way is to use with():
library(dplyr)
iris %>% {
  list(
    subset(., Species == 'setosa', select = 'Sepal.Length') %>%
      rename(., 'setosa.Sepal.Length' = 'Sepal.Length'),
    subset(., Species == 'versicolor', select = 'Sepal.Length') %>%
      rename(., 'versicolor.Sepal.Length' = 'Sepal.Length')
  )
} %>%
  do.call(cbind, .) %>%
  with(., cor(setosa.Sepal.Length, versicolor.Sepal.Length))
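The with() idea also seems to carry over to the native pipe if the intermediate step returns a named list. The following is a rough sketch rather than a tested recipe for every case: it assumes R 4.1 or later, uses base R subsetting, and the element names setosa and versicolor are chosen just for this example.

```r
result <- iris |>
  # Return a named list so the next step can refer to each part by name
  (\(x) list(
    setosa     = x$Sepal.Length[x$Species == "setosa"],
    versicolor = x$Sepal.Length[x$Species == "versicolor"]
  ))() |>
  # The list becomes the first argument of with(); the expression is
  # evaluated with the list elements visible as variables
  with(cor(setosa, versicolor))

result
```

No object is created in the global environment apart from result, which addresses the original concern about ->>.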
If only cor() is needed, you can golf this a bit and drop the dplyr verbs:
iris %>% {
  list(
    subset(., Species == 'setosa', select = 'Sepal.Length'),
    subset(., Species == 'versicolor', select = 'Sepal.Length')
  )
} %>%
  do.call(cbind, .) %>% cor # or {cor(.[,1],.[,2])} although less elegant
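For comparison, a package-free variant of the shorter version might look like the sketch below. It assumes R 4.1 or later and relies on do.call() receiving the piped list as its args argument because what is given by name; the variable name groups is made up for this example.

```r
result <- iris |>
  # Split Sepal.Length into one numeric vector per species
  (\(x) split(x$Sepal.Length, x$Species))() |>
  # Keep two species and drop the names, so that do.call() passes the
  # vectors positionally as the x and y arguments of cor()
  (\(groups) unname(groups[c("setosa", "versicolor")]))() |>
  do.call(what = cor)

result
```

Dropping the names matters: with a named list, do.call() would try to call cor(setosa = ..., versicolor = ...), and cor() has no such arguments.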