mutate column based on another dataframe

Time:12-08

I need to assign a numeric code to each value in a large column that has 70 distinct character values.

When there are only a handful of distinct values I just recode by hand with nested ifelse() calls (probably not the best way, but it's quick and works), but this is not practical for 70 values:

mutate(df, gender = ifelse(gender == "Female", 0, ifelse(gender == "Male", 1, 2)))

I thought it would be best to create a new data frame (spec) with the 72 values in column 1 and a number in column 2 to reference:

speciality Code
a 1
b 2
c 3
d 4
e 5
f 6

...

I can't figure out how to mutate my data to swap each speciality for its numeric code.

Any help appreciated, especially if I'm going down the wrong route in the first place.

My R knowledge is still quite basic. I've tried:

mutate(df,speciality = ifelse(speciality==spec[,1],spec[,2],0))

but I get an error:

Error in env_has(env, name, inherit = TRUE) : 
  attempt to use zero-length variable name

CodePudding user response:

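We can use the built-in letters vector for matching. This assumes the specialities are literally the lower-case letters a to z; match() returns NA for any value that is not in letters.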

df$speciality <-  with(df, match(speciality, letters))
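
As a small illustration of what match() returns here, the sketch below uses made-up values (the "cardiology" entry is hypothetical and only stands in for anything that is not a single lower-case letter):

```r
# Hypothetical example values; "cardiology" represents a value not in `letters`
speciality <- c("b", "d", "a", "cardiology")

# match() returns the position of each value in `letters`, or NA if absent
codes <- match(speciality, letters)
codes
# [1]  2  4  1 NA
```

So this shortcut is probably only suitable when the real speciality names happen to be single letters.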

CodePudding user response:

Your approach will depend on how you want to assign numbers to each value. Here are a few options, using this example data:

set.seed(13)

df <- data.frame(speciality = sample(letters[1:4], 8, replace = TRUE))
df
#   speciality
# 1          b
# 2          d
# 3          d
# 4          a
# 5          d
# 6          d
# 7          c
# 8          a

If you want to code based on the order values appear in your dataset:

#either
mutate(df, speciality = match(speciality, unique(speciality)))

#or
mutate(df, speciality = as.integer(factor(speciality, unique(speciality))))

Both of these yield:

  speciality
1          1
2          2
3          2
4          3
5          2
6          2
7          4
8          3

If you instead want to code based on alphabetical order:

#either
mutate(df, speciality = match(speciality, sort(unique(speciality))))

#or
mutate(df, speciality = as.integer(factor(speciality)))

Both yield:

  speciality
1          2
2          4
3          4
4          1
5          4
6          4
7          3
8          1

If you don't care about the order, you can use any of these approaches.
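
If the codes have to come from a separate lookup table, as with the spec data frame described in the question, one possible approach is to match against that table's first column and pull the corresponding code. This is a minimal base R sketch; the contents of spec and df below are made up to mirror the question, and the column names speciality and Code are assumed from the table shown there.

```r
# Hypothetical lookup table shaped like the asker's spec
spec <- data.frame(
  speciality = c("a", "b", "c", "d"),
  Code = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Example data with the same values as the df used above
df <- data.frame(
  speciality = c("b", "d", "d", "a", "d", "d", "c", "a"),
  stringsAsFactors = FALSE
)

# match() finds the row of spec for each speciality;
# indexing spec$Code with those rows gives the code (NA if no match)
df$code <- spec$Code[match(df$speciality, spec$speciality)]
df$code
# [1] 2 4 4 1 4 4 3 1
```

The same expression should also work inside mutate(), for example mutate(df, speciality = spec$Code[match(speciality, spec$speciality)]). A join such as dplyr::left_join(df, spec, by = "speciality") is another option if you would rather keep the original column and add Code alongside it.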
