How does the new native pipe placeholder work, exactly?

Time:06-03

I don't understand how the new native pipe placeholder works. Prior to R 4.2, the native pipe didn't have a placeholder, so you needed to create a dedicated anonymous function in order to pass the piped object to function arguments other than the first. Since the release of R 4.2, the native pipe has a dedicated placeholder, _, too. I'm also aware that this new placeholder only works if the name of the argument that will receive the placeholder is directly stated (see the blog post "R 4.2.0 Native Placeholder"). However, I'm still having some trouble and can't fully understand how to implement it.
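For reference, the two approaches described above can be sketched as follows. This assumes R >= 4.2.0; the gsub() call and the example string are purely illustrative and are not part of the code discussed below.

```r
# Illustrative input; x is the third argument of gsub(), not the first.
s <- "hello world"

# Before R 4.2: wrap the call in an anonymous function and call it.
via_lambda <- s |> (\(txt) gsub("o", "0", txt))()

# R >= 4.2: name the receiving argument and put the placeholder there.
via_placeholder <- s |> gsub("o", "0", x = _)

via_lambda
via_placeholder
```

Both calls should return the same string, since the placeholder form is just another way of writing gsub("o", "0", x = s).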

I'll give you an example. I wrote a simple piped code chunk that takes an object and returns how many missing values there are in each column.

x = c(NA, NA, 1, NA, 1, 2)
m = matrix(x, nrow = 3, ncol = 2)
m

#      [,1] [,2]
# [1,]   NA   NA
# [2,]   NA    1
# [3,]    1    2


#### CHECK FOR MISSING VALUES ####
m |> 
  { \(.) .colSums(is.na(.), NROW(.), NCOL(.)) }() |> 
  { \(sum.NA) rbind(names(m), sum.NA) }() |> 
  t()

#      sum.NA
# [1,]      2
# [2,]      1

The previous code uses the anonymous function method and works nicely. I'm not able to change this code into properly using the new placeholder. Do you have any suggestion?

CodePudding user response:

The placeholder was introduced in R 4.2.0. From R News, CHANGES IN R 4.2.0, section NEW FEATURES:

  • In a forward pipe |> expression it is now possible to use a named argument with the placeholder _ in the rhs call to specify where the lhs is to be inserted. The placeholder can only appear once on the rhs.

You can use the placeholder once only in a rhs named argument.

Though the blog post linked to in the question mentions the named argument obligation and gives right and wrong ways of using the placeholder, it does not mention that it can be used only once.
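A minimal sketch of these rules, assuming R >= 4.2.0. Misuse of the placeholder is rejected when the code is parsed, so the invalid forms are kept as strings and passed to parse(). The helper parses() and the names f, g, x, y, z are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: TRUE if the code parses, FALSE if the parser rejects it.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# Valid use: the placeholder appears once, as a named argument.
roots <- c(4, 9, 16) |> sapply(X = _, FUN = sqrt)

named_once <- parses("x |> f(y = _)")         # accepted
unnamed    <- parses("x |> f(_)")             # rejected: argument is not named
used_twice <- parses("x |> f(y = _, z = _)")  # rejected: appears more than once
nested     <- parses("x |> f(y = g(z = _))")  # rejected: not a top-level argument

c(named_once = named_once, unnamed = unnamed,
  used_twice = used_twice, nested = nested)
```

Only the first form should be reported as TRUE; the other three fail before any code runs, which is why they cannot be wrapped in try() as ordinary expressions.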


In the question's case, there is no need to use the new placeholder _.

x = c(NA, NA, 1, NA, 1, 2)
m = matrix(x, nrow = 3, ncol = 2)
m
#>      [,1] [,2]
#> [1,]   NA   NA
#> [2,]   NA    1
#> [3,]    1    2

m |>
  is.na() |>
  colSums() |>
  matrix(dimnames = list(NULL, 'sum.NA'))
#>      sum.NA
#> [1,]      2
#> [2,]      1

Created on 2022-06-02 by the reprex package (v2.0.1)


Another way, one function per step, this time using the placeholder.
(I only remembered to use cbind after reading Gabor's answer.)

m |>
  is.na() |>
  colSums() |>
  cbind(sum.NA = _)
#>      sum.NA
#> [1,]      2
#> [2,]      1


CodePudding user response:

You will need to restructure this a bit to take advantage of _. The placeholder does not directly address the problem of using the LHS multiple times on the RHS, nor the problem of nesting function calls on the RHS, and the code in the question runs into both. Also note that the code in the question refers to m again inside the pipeline, which really defeats the left-to-right idea of pipes. Finally, names(m) is NULL since m has no names.

We create a list with a single element named x and then use it in the next step via with(), which solves the problem of having to refer to it 3 times and also handles the nested calls. In the rbind we eliminated the reference to m since rbinding NULL is pointless. We did manage to use _ twice (once in each of two separate pipe steps, which is allowed) and to eliminate all the anonymous functions while keeping mostly to the idea of the code in the question.

m |>
  list(x = _) |>
  with(.colSums(is.na(x), NROW(x), NCOL(x))) |>
  rbind(sum.NA = _) |>
  t()
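
As a quick check, the list()/with() pipeline can be compared against an anonymous-function version of the same computation. This is a sketch assuming R >= 4.2.0; the names via_lambda and via_list are introduced here only for the comparison.

```r
m <- matrix(c(NA, NA, 1, NA, 1, 2), nrow = 3, ncol = 2)

# Anonymous-function version: the lambda may use its argument several times.
via_lambda <- m |>
  (\(x) .colSums(is.na(x), NROW(x), NCOL(x)))() |>
  rbind(sum.NA = _) |>
  t()

# Placeholder version: list(x = _) gives the piped object a name,
# and with() evaluates the nested call using that name.
via_list <- m |>
  list(x = _) |>
  with(.colSums(is.na(x), NROW(x), NCOL(x))) |>
  rbind(sum.NA = _) |>
  t()

identical(via_lambda, via_list)
```

The comparison should return TRUE: both pipelines produce a 2 x 1 matrix with a single column named sum.NA.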