R: transition from magrittr to native pipe and translation of a function


Please have a look at the reprex at the end of the post. For various reasons, I am transitioning from %>% to the native pipe. I sometimes struggle, and I would like some comments on a couple of functions. In the first case (rewriting the complete_data() function using |>), I do not understand why one approach works and another does not.

In the second case (the move_row() function), I have found a workaround, but it does not generalize well to other functions I have. With magrittr, I can write a chain of pipes that contains nrow(.) to pass the number of rows of whatever tibble I have at that point to a function. How can I do the same with the native pipe? Thanks a lot!

library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>     filter, lag
#> The following objects are masked from 'package:base':
#>     intersect, setdiff, setequal, union

## First look at these functions. They just try to discard incomplete rows in
## a tibble

complete_data <- function(data){
  res <- data %>% filter(complete.cases(.))
  return(res)
}


## By trial and error, I wrote this

complete_data_native <- function(data){
  res <- data |>  (\(data) filter(data, complete.cases(data)))()
  return(res)
}


## this was my first attempt, but why does it fail?

complete_data_native_wrong <- function(data){
  res <- data |>  (\(x) filter(x, complete.cases(x)))()
  return(res)
}


df <- structure(list(x = c(1, 2, NA, 4), y = c(NA, NA, 3, 4)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -4L))

df
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1    NA
#> 2     2    NA
#> 3    NA     3
#> 4     4     4

df |> complete_data()
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     4     4

df |> complete_data_native()
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     4     4

df |> complete_data_native_wrong() ### why does this fail
#> # A tibble: 3 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1    NA
#> 2     2    NA
#> 3     4     4

## Now another function. Given a tibble, it moves a row from ini_pos to fin_pos

move_row <- function(df, ini_pos, fin_pos){

    row_pick <- slice(df, ini_pos)

    if (fin_pos == "last"){
        res <- df %>%
            slice(-ini_pos) %>%
            add_row(row_pick, .before = nrow(.))
    } else {
        res <- df %>%
            slice(-ini_pos) %>%
            add_row(row_pick, .before = fin_pos)
    }

    return(res)
}

move_row_native_attempt <- function(df, ini_pos, fin_pos){

    ll <- nrow(df) ## it gets the job done, but I do not want this

    row_pick <- slice(df, ini_pos)

    if (fin_pos == "last"){
        res <- df |>
            slice(-ini_pos) |>
            add_row(row_pick, .before = ll) ## I want to use the native pipe
        ## to write the equivalent of nrow(.)
        ## with the magrittr placeholder, but I cannot do that
    } else {
        res <- df |>
            slice(-ini_pos) |>
            add_row(row_pick, .before = fin_pos)
    }

    return(res)
}

df
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1    NA
#> 2     2    NA
#> 3    NA     3
#> 4     4     4

df |> move_row(1,"last")
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     2    NA
#> 2    NA     3
#> 3     1    NA
#> 4     4     4

df |> move_row_native_attempt(1,"last") ## gets the job done, but it is not what I want. See comments in the function definition
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     2    NA
#> 2    NA     3
#> 3     4     4
#> 4     1    NA

Created on 2022-06-29 by the reprex package (v2.0.1)

complete_data_native_wrong <- function(data){
  res <- data |>  (\(x) filter(x, complete.cases(x)))()
  return(res)
}


Data masking is the reason that this lovely function doesn't work as expected.

"So, what actually happens?", you ask.

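dplyr::filter() evaluates its conditions with data masking, so it first looks for a column named x. It finds one, and passes the contents of that column, rather than the function argument x, to complete.cases(). The same happens when you use y instead of x.

complete.cases() ends up acting on a single vector instead of the whole data.frame, so only the row where column x is NA is dropped. That is why three rows come back instead of one.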
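
To see the difference concretely, here is a minimal base-R sketch (the values mirror the tibble in the question; the names demo, whole_frame and column_only are mine, for illustration only) comparing what complete.cases() returns for the whole data frame and for column x alone:

```r
# Same values as the question's tibble, stored as a plain data frame
demo <- data.frame(x = c(1, 2, NA, 4), y = c(NA, NA, 3, 4))

# What was intended: one logical per row, TRUE only if no column is missing
whole_frame <- complete.cases(demo)

# What data masking effectively evaluates: complete.cases() on column x only
column_only <- complete.cases(demo$x)

whole_frame  # FALSE FALSE FALSE  TRUE
column_only  #  TRUE  TRUE FALSE  TRUE
```

The second vector keeps rows 1, 2 and 4, which matches the three-row result shown in the question.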

"But... How do I ensure dplyr::filter() doesn't act that way?", you enquire.

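That's where the bang-bang operator !! comes in: it forces x to be evaluated right away, outside the data mask, so complete.cases() receives the whole data frame. We can now write complete_data_native_right():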

complete_data_native_right <- function(data){
  res <- data |>  (\(x) filter(x, complete.cases(!!x)))()
  # res <- data |>  (\(y) filter(y, complete.cases(!!y)))()
  return(res)
}
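
As a possible alternative to !!, the .env pronoun that is available inside data-masked dplyr verbs can make the lookup explicit. The following is only a sketch: complete_data_env is a hypothetical name, and the tibble is rebuilt here so the example runs on its own.

```r
library(dplyr)

df <- tibble(x = c(1, 2, NA, 4), y = c(NA, NA, 3, 4))

# Hypothetical helper: .env$x asks for the function argument x,
# not the column x that data masking would otherwise find first
complete_data_env <- function(data) {
  data |> (\(x) filter(x, complete.cases(.env$x)))()
}

res_env <- complete_data_env(df)
res_env  # only the row with x = 4, y = 4 remains
```

Either spelling should behave the same here; .env$x may read more clearly to people who are not used to the injection operators.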




For move_row(), you can use the anonymous-function shorthand \(x) without any hiccups:

move_row_native_attempt <- function(df, ini_pos, fin_pos){
  row_pick <- slice(df, ini_pos)
  if (fin_pos == "last"){
    res <- df |>
      slice(-ini_pos) |>
      (\(x) add_row(x, row_pick, .before = nrow(x)))()
  } else {
    res <- df |>
      slice(-ini_pos) |>
      add_row(row_pick, .before = fin_pos)
  }
  return(res)
}
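
One caveat worth checking: in the question's output, move_row(1, "last") leaves the moved row second to last, because .before = nrow(x) inserts before the final row of the sliced tibble. If the row should really end up last, a sketch along these lines, using add_row()'s .after argument, may be closer to the intent (move_row_last is a hypothetical name, and the tibble is rebuilt so the example runs on its own):

```r
library(dplyr)

df <- tibble(x = c(1, 2, NA, 4), y = c(NA, NA, 3, 4))

# Hypothetical variant: always moves the row at ini_pos to the very end
move_row_last <- function(df, ini_pos) {
  row_pick <- slice(df, ini_pos)
  df |>
    slice(-ini_pos) |>
    (\(x) add_row(x, row_pick, .after = nrow(x)))()
}

moved <- move_row_last(df, 1)
moved  # x comes out in the order 2, NA, 4, 1
```

Since add_row() appends after the last row by default, dropping the position argument entirely should give the same result without needing the anonymous function.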

I think it's simply because there is a column x in the data frame, and filter() is using this column x instead of the argument x of your inline function. If you change the variable name from x to z in your function definition, I think it works. Please see below.

Still, I think it's a strike against the base pipe that iris |> filter(complete.cases(_)) throws an error. Is the limitation that _ can only be used as a named argument to the piped function, and can't be used as a variable like . can?
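
On the placeholder question: as far as I know, in R 4.2 the _ placeholder may appear only once, and only as a named argument of the call on the right-hand side, so it cannot be nested inside complete.cases(). Here is a sketch of what is allowed, plus one dplyr-only alternative that avoids the placeholder altogether. It assumes R >= 4.2 and a dplyr version that provides if_all(); the object names are mine.

```r
library(dplyr)

df <- tibble(x = c(1, 2, NA, 4), y = c(NA, NA, 3, 4))

# Allowed: `_` used once, as a named argument of the piped call
only_x_present <- df |> filter(.data = _, !is.na(x))

# Not allowed (rejected by the parser): df |> filter(complete.cases(_))

# Placeholder-free alternative: keep rows where every column is non-missing
all_present <- df |> filter(if_all(everything(), \(col) !is.na(col)))
```

The if_all() version also avoids the name clash discussed above, because it never refers to the data frame by name.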

library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>     filter, lag
#> The following objects are masked from 'package:base':
#>     intersect, setdiff, setequal, union

complete_data_native_wrong <- function(data){
  res <- data |>  (\(z) filter(z, complete.cases(z)))()  # change to z
  return(res)
}

df <- structure(
  list(x = c(1, 2, NA, 4), 
       y = c(NA, NA, 3, 4)), 
  class = c("tbl_df", 
            "tbl", "data.frame"), 
  row.names = c(NA, -4L)
)

df |> complete_data_native_wrong()
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     4     4

Created on 2022-06-29 by the reprex package (v2.0.1)

