I have a data frame df with two columns, income and level.
level is fine because it is already categorical, but income is numeric and I want to recode it as categorical too. For example:
if income < 130000, use the label "Less than 130000";
if income < 500000 but >= 130000, use the label "Between 130000 and 500000";
finally, if income > 500000 but <= 2000000, use the label "Between 500000 and 2000000".
This is what I tried:
df %>% mutate_at(vars(one_of(df$income)),
function(x) case_when(
x < 130000 ~ "less than 130000",
x <500000 ~ "between 130000 and 500000",
x <=20000000 ~ "between 500000 and 2000000"
))
But it doesn't work. Any help is appreciated.
This is head(df) (please read ingresoph as income):
CodePudding user response:
See below: a plain mutate() removes the need for both function(x) and the _at variant.
df %>% mutate(income =
  case_when(
    income < 130000 ~ "less than 130000",
    income < 500000 ~ "between 130000 and 500000",
    income <= 2000000 ~ "between 500000 and 2000000",
    TRUE ~ as.character(NA)
  ))
Basically, only use mutate_at if there is a specific reason to, i.e. you want to apply the same function to several columns at once (all numeric columns, all character columns, etc.).
Also, if you want NA for any values outside the listed ranges, make sure to wrap it in as.character(). Otherwise mutate can throw an error because the data types differ (logical and character), at least in dplyr versions before 1.1.0.
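To see how the conditions behave at the band edges, here is a minimal sketch of the same case_when() approach on made-up data. The sample values and the income_band column name are assumptions for illustration only; NA_character_ is used as a typed equivalent of as.character(NA).

```r
library(dplyr)

# Hypothetical sample data, chosen to hit the band edges
df <- data.frame(
  income = c(100000, 130000, 488274.1, 500000, 2000000, 15000000),
  level  = c("a", "a", "b", "b", "c", "c")
)

# case_when() checks the conditions in order and uses the first one that is TRUE,
# so each later condition only sees the rows the earlier ones did not match
recoded <- df %>%
  mutate(income_band = case_when(
    income < 130000   ~ "less than 130000",
    income < 500000   ~ "between 130000 and 500000",
    income <= 2000000 ~ "between 500000 and 2000000",
    TRUE              ~ NA_character_   # anything above 2000000
  ))

recoded
```

Writing the result to a new column keeps the numeric income available; assigning to income instead overwrites it, as in the answer above.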
CodePudding user response:
A for loop with conditional expressions can also accomplish this using only base R.
# written in R version 4.2.1
# example data frame
level = letters[c(1,1,2,2,3,3,3,3,4,4)]
income = c(997413.1, 1922400.2, 488274.1, 1016208.6, 806846.4, 100000.0, 15000000.0, 907597.5, 810698.2, 2057985.5)
df = data.frame(income, factor(level))
df$desc = NA_character_
# classify each row; the else-if chain keeps the bands mutually exclusive
for(i in seq_len(nrow(df))){
  if(df$income[i] < 130000){
    df$desc[i] = "less than 130000"
  } else if(df$income[i] < 500000){
    df$desc[i] = "Between 130000 and 500000"
  } else if(df$income[i] <= 2000000){
    # an income of exactly 500000 falls in this band
    df$desc[i] = "Between 500000 and 2000000"
  } else {
    df$desc[i] = "Other"
  }
}
# convert the labels to a factor
df$desc = factor(df$desc)
#
Result:
df
# income level desc
#1 997413.1 a Between 500000 and 2000000
#2 1922400.2 a Between 500000 and 2000000
#3 488274.1 b Between 130000 and 500000
#4 1016208.6 b Between 500000 and 2000000
#5 806846.4 c Between 500000 and 2000000
#6 100000.0 c less than 130000
#7 15000000.0 c Other
#8 907597.5 c Between 500000 and 2000000
#9 810698.2 d Between 500000 and 2000000
#10 2057985.5 d Other
str(df)
#'data.frame': 10 obs. of 3 variables:
# $ income : num 997413 1922400 488274 1016209 806846 ...
# $ factor.level.: Factor w/ 4 levels "a","b","c","d": 1 1 2 2 3 3 3 3 4 4
# $ desc : Factor w/ 4 levels "Between 130000 and 500000",..: 2 2 1 2 2 3 4 2 2 4
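As a vectorized alternative to the loop, the same rules can be sketched with nested ifelse() calls from base R. This is a swapped-in technique rather than the method used in the answer above, and the desc vector name is just for illustration; it reuses the income values from the example data frame.

```r
# Same income values as in the example data frame above
income <- c(997413.1, 1922400.2, 488274.1, 1016208.6, 806846.4,
            100000.0, 15000000.0, 907597.5, 810698.2, 2057985.5)

# ifelse() works on the whole vector at once; nesting mimics if / else if / else
desc <- ifelse(income < 130000, "less than 130000",
        ifelse(income < 500000, "Between 130000 and 500000",
        ifelse(income <= 2000000, "Between 500000 and 2000000",
               "Other")))

# Count how many incomes fall in each band
table(desc)
```

The result is a character vector, so wrap it in factor() if a categorical column is needed, as the loop version does.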