Making all categorical variables start at 0 in R

Time:03-09

I have a data frame like so where every column is a categorical encoding:

> race <- factor(c(0,1,0,1,1))
> income <- factor(c(1,1,1,0,0))
> sex <- factor(c(1,1,1,3,2))
> df <- data.frame(race, income, sex)
> df
  race income  sex
1    0      1    1
2    1      1    1
3    0      1    1
4    1      0    3
5    1      0    2

How can I do this programmatically with dplyr so that every column starts at 0? For example, race and income would not change because their lowest value is already 0, but sex would need to change so that 1 is subtracted from every value.

expected output:

  race income  sex
1    0      1    0
2    1      1    0
3    0      1    0
4    1      0    2
5    1      0    1

Ideally the solution would use mutate and across, but I can't seem to get a solution.

CodePudding user response:

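library(dplyr)

df %>%
  mutate(
    # as.numeric() on a factor returns its integer level codes (1, 2, ...)
    across(everything(), as.numeric),
    # shift each column so that its smallest code becomes 0
    across(everything(), ~ .x - min(.x)),
    # convert back to factors to keep the columns categorical
    across(everything(), as.factor)
  )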
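
One caveat with converting a factor through as.numeric: it returns the internal level codes, not the labels you see when printing. For the data in the question the two give the same result, but they differ when the labels have gaps. A minimal base R sketch, using a made-up factor f that is not part of the question's data:

```r
# A made-up factor whose labels have gaps: 2, 4, 4, 7
f <- factor(c(2, 4, 4, 7))

codes  <- as.numeric(f)                # internal level codes: 1 2 2 3
labels <- as.numeric(as.character(f))  # printed label values: 2 4 4 7

by_code  <- factor(codes - min(codes))    # levels "0" "1" "2"
by_label <- factor(labels - min(labels))  # levels "0" "2" "5"

print(levels(by_code))
print(levels(by_label))
```

Which of the two is wanted depends on whether the spacing between categories carries meaning; for a pure encoding, the code-based version is usually enough.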

CodePudding user response:

You can convert each factor to its integer level codes and subtract 1. Level codes always start at 1, so the result starts at 0. Convert the result back to a factor if you want to keep the columns categorical.

Use across to apply the function to multiple columns.

library(dplyr)

relevel_to_0 <- function(x) {
  factor(as.integer(x) - 1)
}

df %>% mutate(across(everything(), relevel_to_0))

#  race income sex
#1    0      1   0
#2    1      1   0
#3    0      1   0
#4    1      0   2
#5    1      0   1

CodePudding user response:

You can pass the levels argument when you create the factor, so that 0 is a valid level even when it does not appear in the data:

race <- factor(c(0,1,0,1,1))
income <- factor(c(1,1,1,0,0))
sex <- factor(c(1,1,1,1,2), levels=0:2)

df <- data.frame(race, income, sex)
df[1,] <- 0
df

#   race income sex
# 1    0      0   0
# 2    1      1   1
# 3    0      1   1
# 4    1      0   1
# 5    1      0   2

CodePudding user response:

You can use the following code:

library(tidyverse)

race <- factor(c(0,1,0,1,1))
income <- factor(c(1,1,1,0,0))
sex <- factor(c(1,1,1,3,2))
df <- data.frame(race, income, sex)

> df
  race income sex
1    0      1   1
2    1      1   1
3    0      1   1
4    1      0   3
5    1      0   2

df %>%
  mutate_if(is.factor, .funs = ~ as.numeric(.)) %>%
  mutate_if(.predicate = ~ min(.) > 0, .funs = ~ as.factor(. - 1))

  race income sex
1    0      1   0
2    1      1   0
3    0      1   0
4    1      0   2
5    1      0   1

Or combine the two mutate_if calls into one:

df %>%
  mutate_if(
    .predicate = ~ (is.factor(.) & min(as.numeric(.)) > 0),
    .funs = ~ as.factor(as.numeric(.) - 1)
  )

  race income sex
1    0      1   0
2    1      1   0
3    0      1   0
4    1      0   2
5    1      0   1
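
Since the question asks for mutate and across, the same idea can also be sketched with across() and where() in place of mutate_if. This is a minimal sketch rather than the answerer's code: shift_labels_to_zero is a hypothetical helper name, and it assumes every factor label can be parsed as a number. It subtracts each column's minimum label value instead of a fixed 1.

```r
library(dplyr)

race <- factor(c(0, 1, 0, 1, 1))
income <- factor(c(1, 1, 1, 0, 0))
sex <- factor(c(1, 1, 1, 3, 2))
df <- data.frame(race, income, sex)

# Hypothetical helper: shift a factor with numeric labels so its minimum is 0.
# Assumes all labels are numeric strings such as "0", "1", "2".
shift_labels_to_zero <- function(x) {
  values <- as.numeric(as.character(x))  # label values, not level codes
  factor(values - min(values))
}

# Apply the helper to every factor column; other columns are left untouched.
result <- df %>%
  mutate(across(where(is.factor), shift_labels_to_zero))

print(result)
```

Columns whose minimum is already 0 come back unchanged, so no separate predicate on the minimum is needed.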