I have a data frame like so in R:
> race <- factor(c(0,1,0,1,1))
> income <- factor(c(1,1,1,0,0))
> df <- data.frame(race, income)
> df
race income
1 0 1
2 1 1
3 0 1
4 1 0
5 1 0
I want to convert it to a data matrix.
When I do so, I get this, where every value has been increased by 1:
t <- data.matrix(df)
> t
race income
[1,] 1 2
[2,] 2 2
[3,] 1 2
[4,] 2 1
[5,] 2 1
Why does this happen, and how do I ensure the values of the data matrix are the same as in the data frame?
CodePudding user response:
You can subtract 1 from the matrix to make the values align. This works here only because the factor levels are exactly 0 and 1.
t <- data.matrix(df)
t - 1
#> race income
#> [1,] 0 1
#> [2,] 1 1
#> [3,] 0 1
#> [4,] 1 0
#> [5,] 1 0
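As a small sketch of why subtracting 1 is specific to this data: with levels other than 0 and 1 (the values 1, 3, 7 below are made up for illustration), the matrix holds the level positions, so subtracting 1 does not recover the labels.

```r
# Hypothetical factor whose labels are not 0/1
race <- factor(c(1, 3, 7))
m <- data.matrix(data.frame(race))

m[, "race"]      # internal codes: 1 2 3
m[, "race"] - 1  # 0 1 2, not the original 1 3 7
```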
Or you can first convert each column to character, then to numeric, and then to a matrix.
lapply(df, as.character) |>
  lapply(as.numeric) |>
  as.data.frame() |>
  as.matrix()
#> race income
#> [1,] 0 1
#> [2,] 1 1
#> [3,] 0 1
#> [4,] 1 0
#> [5,] 1 0
Edit: Just saw your comment about only changing factor variables to numeric. In that case, try this:
library(dplyr)
df |>
  mutate(across(where(is.factor),
                \(x) as.numeric(as.character(x)))) |>
  as.matrix()
#> race income
#> [1,] 0 1
#> [2,] 1 1
#> [3,] 0 1
#> [4,] 1 0
#> [5,] 1 0
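If you would rather not load dplyr, a base R sketch of the same idea could look like this. The `age` column is a hypothetical non-factor column added only to show that it is left untouched.

```r
df <- data.frame(
  race   = factor(c(0, 1, 0, 1, 1)),
  income = factor(c(1, 1, 1, 0, 0)),
  age    = c(30, 41, 25, 58, 37)  # hypothetical numeric column
)

# Find the factor columns and convert only those via their labels
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], function(x) as.numeric(as.character(x)))

m <- as.matrix(df)
m
```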
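Some background: when a factor is converted to numeric, R returns its internal integer codes, which follow the order of the factor levels and count up from 1, not 0. data.matrix() replaces factors with these codes as well, which is why every value in your matrix is shifted by 1.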
factor("A") |> as.numeric()
#> [1] 1
factor(0) |> as.numeric()
#> [1] 1
factor(c(1,3,7)) |> as.numeric()
#> [1] 1 2 3
If your factor levels are numbers, you can preserve the exact values by converting to character first.
factor(0) |> as.character() |> as.numeric()
#> [1] 0
factor(c(1,3,7)) |> as.character() |> as.numeric()
#> [1] 1 3 7
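To make the distinction between labels and codes visible, here is a short sketch with a made-up factor `f`. It also shows the `as.numeric(levels(f))[f]` idiom described in `?factor`, which gives the same result as going through character.

```r
f <- factor(c(1, 3, 7, 3))

lv    <- levels(f)                # labels, stored as character: "1" "3" "7"
codes <- as.integer(f)            # internal codes: 1 2 3 2
vals  <- as.numeric(levels(f))[f] # original values: 1 3 7 3

vals
```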
CodePudding user response:
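Perhaps you can try `class<-` together with as.matrix, like below. as.matrix() turns the all-factor data frame into a character matrix of the level labels, and setting the class to "numeric" then converts those labels to numbers.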
> `class<-`(as.matrix(df),"numeric")
race income
[1,] 0 1
[2,] 1 1
[3,] 0 1
[4,] 1 0
[5,] 1 0
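The same conversion can be written in two steps, which may be easier to follow. This is only a sketch using the data from the question; `m_chr` and `m_num` are names chosen here for illustration, and `mode<-` stands in for `class<-`.

```r
race <- factor(c(0, 1, 0, 1, 1))
income <- factor(c(1, 1, 1, 0, 0))
df <- data.frame(race, income)

# Step 1: factor columns become a character matrix of their labels
m_chr <- as.matrix(df)

# Step 2: convert the labels to numbers while keeping the matrix shape
m_num <- m_chr
mode(m_num) <- "numeric"
m_num
```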