Converting dataframe with factors to data matrix in R

I have a data frame like so in R:

> race <- factor(c(0,1,0,1,1))
> income <- factor(c(1,1,1,0,0))
> df <- data.frame(race, income)
> df
  race income
1    0      1
2    1      1
3    0      1
4    1      0
5    1      0

I want to convert it to a data matrix.

When I do so, every value comes out increased by 1:

> t <- data.matrix(df)
> t
     race income
[1,]    1      2
[2,]    2      2
[3,]    1      2
[4,]    2      1
[5,]    2      1

Why does this happen, and how do I ensure the values of the data matrix are the same as in the data frame?

CodePudding user response：

You can subtract 1 from the matrix to make the values align. Note that this only works here because the factor levels are the consecutive integers 0 and 1.

t <- data.matrix(df)
t - 1
#>      race income
#> [1,]    0      1
#> [2,]    1      1
#> [3,]    0      1
#> [4,]    1      0
#> [5,]    1      0
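
To illustrate where the subtraction stops working, here is a small sketch with a hypothetical factor `f` whose levels are 1, 3 and 7 (not part of the original data). data.matrix() returns the internal codes, so shifting them by 1 does not give back the original values:

```r
# Hypothetical example data: levels are 1, 3, 7 rather than 0, 1
f <- factor(c(1, 3, 7, 3))
df2 <- data.frame(f)

m <- data.matrix(df2)   # internal integer codes: 1, 2, 3, 2
shifted <- m - 1        # 0, 1, 2, 1: still not the original values

# Going through the level labels recovers the original numbers
recovered <- as.numeric(as.character(f))   # 1, 3, 7, 3
```

So the subtraction is fine for 0/1 data like yours, but the character route is the safer general choice.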

Or you can first convert each column to character, then to numeric, and then build the matrix.

lapply(df, as.character) |> 
  lapply(as.numeric) |> 
  as.data.frame() |> 
  as.matrix()
#>      race income
#> [1,]    0      1
#> [2,]    1      1
#> [3,]    0      1
#> [4,]    1      0
#> [5,]    1      0

Edit: Just saw your comment about only changing factor variables to numeric. In that case try this:

library(dplyr)

df |> 
  mutate(across(where(is.factor), 
                \(x) as.numeric(as.character(x)))) |> 
  as.matrix()
#>      race income
#> [1,]    0      1
#> [2,]    1      1
#> [3,]    0      1
#> [4,]    1      0
#> [5,]    1      0
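
If you would rather not load dplyr, the same idea can probably be written in base R. This is only a sketch: `df3` and its `age` column are made up here to show a data frame that mixes factor and numeric columns.

```r
# Hypothetical mixed data frame: two factor columns and one numeric column
df3 <- data.frame(
  race   = factor(c(0, 1, 0)),
  income = factor(c(1, 1, 0)),
  age    = c(34, 51, 27)
)

# Find the factor columns and convert only those, via their labels
is_fac <- vapply(df3, is.factor, logical(1))
df3[is_fac] <- lapply(df3[is_fac], function(x) as.numeric(as.character(x)))

# All columns are numeric now, so as.matrix() returns a numeric matrix
m3 <- as.matrix(df3)
```

The numeric `age` column is left untouched, and only the factor columns go through the character step.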

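Some background: R stores a factor as integer codes that index into its levels, and those codes start at 1, not 0. Converting a factor directly to numeric returns the codes in level order, not the level labels, which is why data.matrix() shows 1 and 2 instead of 0 and 1.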

factor("A") |> as.numeric()
#> [1] 1
factor(0) |> as.numeric()
#> [1] 1
factor(c(1,3,7)) |> as.numeric()
#> [1] 1 2 3
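
A short sketch that may make the two parts of a factor easier to see, using the `race` values from the question (the variable names `labs` and `codes` are just for illustration):

```r
f <- factor(c(0, 1, 0, 1, 1))

labs  <- levels(f)       # the labels, stored as character strings: "0" "1"
codes <- as.integer(f)   # the codes, i.e. positions in levels(f): 1 2 1 2 2

# Indexing the labels by the codes reproduces what as.character(f) returns
rebuilt <- labs[codes]   # "0" "1" "0" "1" "1"
```

data.matrix() hands you `codes`, whereas what you want is the numeric version of `labs[codes]`.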

If your factor levels are numbers, you can preserve the exact values by converting to character first, so that the labels rather than the codes are turned into numbers.

factor(0) |> as.character() |> as.numeric()
#> [1] 0
factor(c(1,3,7)) |> as.character() |> as.numeric()
#> [1] 1 3 7
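
There is also an equivalent idiom mentioned in the ?factor help page, as.numeric(levels(f))[f], which converts the levels once and then indexes them by the factor. A minimal sketch with made-up example factors `f` and `g`:

```r
f <- factor(c(1, 3, 7, 3))

via_levels <- as.numeric(levels(f))[f]      # 1 3 7 3
via_chars  <- as.numeric(as.character(f))   # 1 3 7 3, same result

# Caveat: labels that are not numbers become NA (with a warning)
g <- factor(c("A", "B"))
not_numeric <- suppressWarnings(as.numeric(as.character(g)))   # NA NA
```

Either form should work for your data; both give NA for labels that cannot be parsed as numbers.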

CodePudding user response：

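Perhaps you can try applying `class<-` to the result of as.matrix, as below. as.matrix(df) returns a character matrix of the factor labels, and setting its class to "numeric" converts those labels to numbers: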

> `class<-`(as.matrix(df),"numeric")
     race income
[1,]    0      1
[2,]    1      1
[3,]    0      1
[4,]    1      0
[5,]    1      0
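
If the `class<-` call looks too cryptic, a variant that is probably easier to read is to change the storage mode of the character matrix. This is a sketch of the same idea, not a different method:

```r
df <- data.frame(race   = factor(c(0, 1, 0, 1, 1)),
                 income = factor(c(1, 1, 1, 0, 0)))

m <- as.matrix(df)            # factor columns become their labels: a character matrix
was_character <- is.character(m)

storage.mode(m) <- "double"   # convert the strings to numbers, keeping dim and dimnames
```

As with the other label-based approaches, this assumes every level label can be read as a number.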