Why does mutate(across()) in dplyr create columns with "[,1]" at the end?

Time:12-31

See code below.

The mutate(across(everything(), scale, .names = "{.col}_z")) call generates columns with [,1] appended to their names.

Two questions:

  1. Why is this happening?
  2. How can I avoid or remove it?

library(dplyr)

# Input
df_test <- tibble(x = c(1, 2, 3, 4), y = c(5, 6, 7, 8))

# My code generating x_z and y_z
df_scaled <- df_test %>% 
  mutate(across(everything(), scale, .names = "{.col}_z"))

# Output
df_scaled
#> # A tibble: 4 × 4
#>       x     y x_z[,1] y_z[,1]
#>   <dbl> <dbl>   <dbl>   <dbl>
#> 1     1     5  -1.16   -1.16 
#> 2     2     6  -0.387  -0.387
#> 3     3     7   0.387   0.387
#> 4     4     8   1.16    1.16

Expected output

#> # A tibble: 4 × 4
#>       x     y     x_z     y_z
#>   <dbl> <dbl>   <dbl>   <dbl>
#> 1     1     5  -1.16   -1.16 
#> 2     2     6  -0.387  -0.387
#> 3     3     7   0.387   0.387
#> 4     4     8   1.16    1.16

Created on 2022-12-30 with reprex v2.0.2

CodePudding user response:

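scale() returns a one-column matrix, not a vector, and a tibble prints a matrix column with [,1] after its name. To get a plain vector, drop the dim attribute by wrapping the call in c() or as.numeric(), or extract the column with [, 1]: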

library(dplyr)
df_test %>% 
  mutate(across(everything(),
     ~ as.numeric(scale(.x)), .names = "{.col}_z"))

Output:

# A tibble: 4 × 4
      x     y    x_z    y_z
  <dbl> <dbl>  <dbl>  <dbl>
1     1     5 -1.16  -1.16 
2     2     6 -0.387 -0.387
3     3     7  0.387  0.387
4     4     8  1.16   1.16 
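
The other two options mentioned in the answer, c() and [, 1], can be sketched the same way. This is a minimal illustration that rebuilds df_test from the question; the names df_c and df_idx are made up for the example.

```r
library(dplyr)

df_test <- tibble(x = c(1, 2, 3, 4), y = c(5, 6, 7, 8))

# Option 1: c() drops the dim attribute, leaving a plain numeric vector
df_c <- df_test %>%
  mutate(across(everything(), ~ c(scale(.x)), .names = "{.col}_z"))

# Option 2: extract the single column of the matrix with [, 1]
df_idx <- df_test %>%
  mutate(across(everything(), ~ scale(.x)[, 1], .names = "{.col}_z"))

df_c
df_idx
```

All three variants should give the same values; which one to use is mostly a matter of taste.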

To see this, check the output of scale() on a single column:

> scale(df_test[[1]])
           [,1]
[1,] -1.1618950
[2,] -0.3872983
[3,]  0.3872983
[4,]  1.1618950
attr(,"scaled:center")
[1] 2.5
attr(,"scaled:scale")
[1] 1.290994
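
The same thing can be checked programmatically rather than by reading the printed output. A small base R sketch (the variable names x, z and z_vec are only for illustration):

```r
x <- c(1, 2, 3, 4)
z <- scale(x)

is.matrix(z)          # scale() gives back a matrix, even for vector input
dim(z)                # 4 rows, 1 column
names(attributes(z))  # dim plus the two "scaled:" attributes

# as.numeric() strips the attributes, leaving a bare numeric vector
z_vec <- as.numeric(z)
is.null(attributes(z_vec))
```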

The source code shows where the conversion happens:

> scale.default
function (x, center = TRUE, scale = TRUE) 
{
    x <- as.matrix(x) # the input is converted to a matrix here
...

The conversion is required because the rest of scale.default relies on apply/colMeans/sweep, which operate on matrices. So when we pass a vector to scale(), it is converted to a single-column matrix:

> as.matrix(df_test$x)
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
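
For the second question, if the matrix columns already exist, they can be flattened after the fact. The column names themselves are still x_z and y_z; the [,1] is only how a tibble displays a matrix column. A minimal sketch, assuming every matrix column in the data has a single column as here (df_fixed is a made-up name):

```r
library(dplyr)

df_test <- tibble(x = c(1, 2, 3, 4), y = c(5, 6, 7, 8))

# Reproduce the matrix columns from the question
df_scaled <- df_test %>%
  mutate(across(everything(), scale, .names = "{.col}_z"))

names(df_scaled)  # the names carry no "[,1]" suffix

# Flatten every matrix column back to a plain numeric vector
df_fixed <- df_scaled %>%
  mutate(across(where(is.matrix), as.numeric))

df_fixed
```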