See the code below. The mutate(across(everything(), scale, .names = "{.col}_z")) part of the code generates columns with [,1] appended to their names.
Two questions:
- Why is this happening?
- How can I avoid or remove it?
library(dplyr)
# Input
df_test <- tibble(x = c(1, 2, 3, 4), y = c(5, 6, 7, 8))
# My code generating x_z and y_z
df_scaled <- df_test %>%
  mutate(across(everything(), scale, .names = "{.col}_z"))
# Output
df_scaled
#> # A tibble: 4 × 4
#>       x     y x_z[,1] y_z[,1]
#>   <dbl> <dbl>   <dbl>   <dbl>
#> 1     1     5  -1.16   -1.16
#> 2     2     6  -0.387  -0.387
#> 3     3     7   0.387   0.387
#> 4     4     8   1.16    1.16
Expected output:
#> # A tibble: 4 × 4
#>       x     y    x_z    y_z
#>   <dbl> <dbl>  <dbl>  <dbl>
#> 1     1     5 -1.16  -1.16
#> 2     2     6 -0.387 -0.387
#> 3     3     7  0.387  0.387
#> 4     4     8  1.16   1.16
Created on 2022-12-30 with reprex v2.0.2
Answer:
scale returns a matrix. We can either wrap the result in c, extract the column with [, or use as.numeric to remove the dim attribute:
library(dplyr)
df_test %>%
  mutate(across(everything(),
    ~ as.numeric(scale(.x)), .names = "{.col}_z"))
Output:
# A tibble: 4 × 4
      x     y    x_z    y_z
  <dbl> <dbl>  <dbl>  <dbl>
1     1     5 -1.16  -1.16
2     2     6 -0.387 -0.387
3     3     7  0.387  0.387
4     4     8  1.16   1.16
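The other two options mentioned above (c and [) can be sketched the same way. This is a minimal example that reuses df_test from the question; the names df_c and df_idx are only illustrative.

```r
library(dplyr)

df_test <- tibble(x = c(1, 2, 3, 4), y = c(5, 6, 7, 8))

# Option 1: c() drops the dim attribute, leaving a plain numeric vector
df_c <- df_test %>%
  mutate(across(everything(), ~ c(scale(.x)), .names = "{.col}_z"))

# Option 2: extract the only column of the one-column matrix
df_idx <- df_test %>%
  mutate(across(everything(), ~ scale(.x)[, 1], .names = "{.col}_z"))

names(df_c)
names(df_idx)
```

Both variants should give the same x_z and y_z columns as the as.numeric version; which one to use is mostly a matter of taste.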
To see this, check the output of scale on a single column:
> scale(df_test[[1]])
           [,1]
[1,] -1.1618950
[2,] -0.3872983
[3,]  0.3872983
[4,]  1.1618950
attr(,"scaled:center")
[1] 2.5
attr(,"scaled:scale")
[1] 1.290994
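The scaled:center and scaled:scale attributes in that output are the column mean and standard deviation, so with the default arguments the same z-scores can be computed by hand without ever creating a matrix. A small sketch to check this (z_manual is just an illustrative name):

```r
x <- c(1, 2, 3, 4)

# Manual z-score: subtract the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# Compare with scale() after dropping its matrix structure and attributes
all.equal(z_manual, as.numeric(scale(x)))
```

Inside across, the equivalent would be a lambda such as ~ (.x - mean(.x)) / sd(.x), assuming the columns have no missing values.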
If we check the source code:
> scale.default
function (x, center = TRUE, scale = TRUE)
{
x <- as.matrix(x) # it is converting to matrix
...
The conversion is needed because the function goes on to use apply/colMeans/sweep. So when we pass a vector to scale, it is converted to a single-column matrix:
> as.matrix(df_test$x)
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
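To confirm that this is what happens inside the tibble, and to remove the [,1] from a data frame that has already been built, something like the following sketch should work. It reuses the question's code; df_fixed is an illustrative name, and the ends_with("_z") selector assumes only the scaled columns carry that suffix.

```r
library(dplyr)

df_test <- tibble(x = c(1, 2, 3, 4), y = c(5, 6, 7, 8))

df_scaled <- df_test %>%
  mutate(across(everything(), scale, .names = "{.col}_z"))

# Each new column is stored as a 4 x 1 matrix, which tibble prints as x_z[,1]
is.matrix(df_scaled$x_z)
dim(df_scaled$x_z)

# Convert the matrix columns back to plain numeric vectors after the fact
df_fixed <- df_scaled %>%
  mutate(across(ends_with("_z"), as.numeric))

is.matrix(df_fixed$x_z)
```

The column names themselves are already x_z and y_z; the [,1] is only part of how the matrix column is displayed, so nothing needs renaming.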