I need to assign a numeric code to each entry in a large column that contains 70 distinct character values.
When there are only a handful of values I just mutate by hand (probably not the best way, but it's quick and it works), but this is not practical for 70 values:
mutate(df, gender = ifelse(gender == "Female", 0, ifelse(gender == "Male", 1, 2)))
I thought it would be best to create a new df (spec) with the 72 values in column 1 and a numeric code in column 2 to reference:
speciality | Code |
---|---|
a | 1 |
b | 2 |
c | 3 |
d | 4 |
e | 5 |
f | 6 |
...
I can't figure out how to mutate my data to swap speciality for the numeric code.
Any help appreciated, especially if I'm going down the wrong route in the first place.
My R knowledge is still quite basic. I've tried:
mutate(df,speciality = ifelse(speciality==spec[,1],spec[,2],0))
but get an error
Error in env_has(env, name, inherit = TRUE) :
attempt to use zero-length variable name
CodePudding user response:
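We may use the built-in letters vector for matching. This works when the values are the single lowercase letters a, b, c, ... as in the example table: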
df$speciality <- with(df, match(speciality, letters))
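To show what match() does here, a small sketch with made-up values (the vector below is illustrative, not the asker's data). It returns the position of each value in letters, and NA for anything it cannot find, so real speciality names that are not single letters would all come back as NA:

```r
# Illustrative values only; "zz" is deliberately not in letters
speciality <- c("b", "d", "a", "zz")

# match() gives the position of each element in the built-in letters vector
codes <- match(speciality, letters)
codes
# [1]  2  4  1 NA
```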
CodePudding user response:
Your approach will depend on how you want to assign numbers to each value. Here are a few options, using this example data:
set.seed(13)
df <- data.frame(speciality = sample(letters[1:4], 8, replace = TRUE))
df
# speciality
# 1 b
# 2 d
# 3 d
# 4 a
# 5 d
# 6 d
# 7 c
# 8 a
If you want to code based on the order values appear in your dataset:
#either
mutate(df, speciality = match(speciality, unique(speciality)))
#or
mutate(df, speciality = as.integer(factor(speciality, unique(speciality))))
Both of these yield:
# speciality
# 1 1
# 2 2
# 3 2
# 4 3
# 5 2
# 6 2
# 7 4
# 8 3
If you instead want to code based on alphabetical order:
#either
mutate(df, speciality = match(speciality, sort(unique(speciality))))
#or
mutate(df, speciality = as.integer(factor(speciality)))
Both yield:
# speciality
# 1 2
# 2 4
# 3 4
# 4 1
# 5 4
# 6 4
# 7 3
# 8 1
If you don't care about the order, you can use any of these approaches.
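If the codes have to come from a lookup table like spec in the question, the same match() idea still applies. Note that ifelse(speciality == spec[,1], ...) compares the two columns position by position rather than looking each value up, so it is not a lookup. Below is a minimal sketch, assuming spec has columns named speciality and Code as in the question's table; the four-row spec and the new column name code are made up for illustration:

```r
library(dplyr)

# Hypothetical lookup table shaped like spec in the question
spec <- data.frame(speciality = c("a", "b", "c", "d"), Code = 1:4)

# Same values as the example data above, typed out directly
df <- data.frame(speciality = c("b", "d", "d", "a", "d", "d", "c", "a"))

# Option 1: find each value's row in spec, then pull that row's Code.
# Values missing from spec come back as NA.
df_match <- mutate(df, code = spec$Code[match(speciality, spec$speciality)])

# Option 2: join the lookup table on the shared column.
# left_join() keeps every row of df, in its original order.
df_join <- left_join(df, spec, by = "speciality")
```

Both options keep the original speciality column next to the code, which makes it easy to check the mapping before dropping the text column.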