Recode a column in a data frame in R, based on < or > conditions

Time:07-01

I have a data frame df with two columns, income and level. level is fine because it is already categorical, but income is numerical and I want to recode it as categorical too. For example: if income < 130000, use the name "Less than 130000"; if income >= 130000 but < 500000, use "Between 130000 and 500000"; and finally, if income > 500000 but <= 2000000, use "Between 500000 and 2000000". This is what I tried:

df %>%  mutate_at(vars(one_of(df$income)), 
            function(x) case_when(
              x < 130000 ~ "less than 130000",
              x <500000 ~ "between 130000 and 500000",
              x <=20000000  ~ "between 500000 and 2000000"
            )) 

But it doesn't work. Any help is appreciated.

This is head(df) (image not available); please read ingresoph as income.

CodePudding user response:

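See below: a plain mutate() removes the need for both the function(x) wrapper and the _at variant. The original call fails because one_of() expects column names as character strings, whereas df$income passes the column's numeric values.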

library(dplyr)

df %>% mutate(income =
             case_when(
               income < 130000   ~ "less than 130000",
               income < 500000   ~ "between 130000 and 500000",
               income <= 2000000 ~ "between 500000 and 2000000",
               TRUE              ~ as.character(NA)  # anything above 2000000
             ))

Basically, only use mutate_at (or the other scoped variants) if there is a specific reason to, e.g. you want to transform all numeric columns, or all character columns, etc.
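
As a rough sketch of that "specific reason" case: current dplyr documents across() as the replacement for scoped verbs such as mutate_at, so recoding every numeric column at once could look like the following. The helper recode_amount, the data frame df2 and its savings column are made up for illustration and are not part of the question's data.

```r
library(dplyr)

# Hypothetical helper (not from the original posts): bins one numeric vector.
recode_amount <- function(x) {
  case_when(
    x < 130000   ~ "less than 130000",
    x < 500000   ~ "between 130000 and 500000",
    x <= 2000000 ~ "between 500000 and 2000000",
    TRUE         ~ NA_character_  # values above 2000000
  )
}

# Made-up example data: one character column and two numeric columns.
df2 <- data.frame(
  level   = c("a", "b", "c"),
  income  = c(100000, 250000, 900000),
  savings = c(600000, 50000, 3000000)
)

# Apply the helper to every numeric column; level is left untouched.
recoded <- df2 %>% mutate(across(where(is.numeric), recode_amount))
```

For a single column, as in the question, the plain mutate() call above stays simpler.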

Also, if you return NA for any values outside the ranges, make sure to wrap it in as.character() (or use NA_character_), because in older dplyr versions case_when() throws an error when the branches have different data types (logical and character).
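
A small sketch of both points on a bare vector (the values in x are invented for illustration): case_when() checks the conditions in order and the first match wins, which is why the second branch needs no explicit >= 130000, and NA_character_ keeps every branch of type character.

```r
library(dplyr)

# Invented test values, including both boundaries and one value out of range.
x <- c(50000, 130000, 499999, 2000000, 2500000)

labels <- case_when(
  x < 130000   ~ "less than 130000",
  x < 500000   ~ "between 130000 and 500000",   # only reached when x >= 130000
  x <= 2000000 ~ "between 500000 and 2000000",  # only reached when x >= 500000
  TRUE         ~ NA_character_                   # typed NA, same type as the labels
)
```

Here 130000 itself falls into the second group, 2000000 into the third, and 2500000 becomes NA.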

CodePudding user response:

A for loop with conditional expressions can also accomplish this using only base R.

#written in R version 4.2.1
#example data frame
level = letters[c(1,1,2,2,3,3,3,3,4,4)]
income = c(997413.1, 1922400.2, 488274.1, 1016208.6, 806846.4, 100000.0, 15000000.0, 907597.5, 810698.2, 2057985.5)

df = data.frame(income, factor(level))
df$desc = "Other"  #default label for incomes outside every range
for (i in seq_len(nrow(df))) {
  #else-if keeps the ranges contiguous, so exactly 500000 lands in the third group
  if (df$income[i] < 130000) {
    df$desc[i] = "less than 130000"
  } else if (df$income[i] < 500000) {
    df$desc[i] = "Between 130000 and 500000"
  } else if (df$income[i] <= 2000000) {
    df$desc[i] = "Between 500000 and 2000000"
  }
}
df$desc = factor(df$desc)

Result:

df
#       income factor.level.                       desc
#1    997413.1             a Between 500000 and 2000000
#2   1922400.2             a Between 500000 and 2000000
#3    488274.1             b  Between 130000 and 500000
#4   1016208.6             b Between 500000 and 2000000
#5    806846.4             c Between 500000 and 2000000
#6    100000.0             c           less than 130000
#7  15000000.0             c                      Other
#8    907597.5             c Between 500000 and 2000000
#9    810698.2             d Between 500000 and 2000000
#10  2057985.5             d                      Other

 str(df)
#'data.frame':   10 obs. of  3 variables:
# $ income       : num  997413 1922400 488274 1016209 806846 ...
# $ factor.level.: Factor w/ 4 levels "a","b","c","d": 1 1 2 2 3 3 3 3 4 4
# $ desc         : Factor w/ 4 levels "Between 130000 and 500000",..: 2 2 1 2 2 3 4 2 2 4
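
For comparison, base R's cut() can do the same binning without a loop. This is only a sketch under one assumption that differs slightly from the question: with right = FALSE every interval is closed on the left and open on the right, so an income of exactly 2000000 would land in "Other" rather than in the third group.

```r
# Same example incomes as in the data frame above.
income <- c(997413.1, 1922400.2, 488274.1, 1016208.6, 806846.4,
            100000.0, 15000000.0, 907597.5, 810698.2, 2057985.5)

# right = FALSE gives intervals [a, b): a value equal to a break
# belongs to the interval that starts at that break.
desc <- cut(
  income,
  breaks = c(-Inf, 130000, 500000, 2000000, Inf),
  labels = c("less than 130000", "Between 130000 and 500000",
             "Between 500000 and 2000000", "Other"),
  right = FALSE
)
```

The result is already a factor, with its levels in the order given in labels rather than sorted alphabetically.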