I have a table with multiple variables.
One of the variables is the Sex column of the participants of a study conducted by some other lab.
The problem is that sex is sometimes recorded as F, FEMALE, or female, and the same goes for the males.
I need to recode all the females as F and all the males as M, using the mapvalues function from the plyr package (it's very old, I know).
This is the data frame; it's pretty basic:
mre_data <- structure(list(name = c("John", "Clara", "Smith", "Ray", "karen", "Ruba", "Josh", "Jennifer", "David", "Maria", "Sam"),
sex = c("Male", "F", "MALE", "M", "FEMALE", "female", "MALE", "F", "male", "FEMALE", "M"),
age = c(30L, 32L, 54L, 42L, 11L, 34L, 67L, 49L, 27L, 18L, 30L)),
class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
CodePudding user response:
sex <- c("fema", "f", "F", "M", "male", "MALe")
We don't need plyr or any external packages:
toupper(substr(sex, 1, 1))
Returns:
[1] "F" "F" "F" "M" "M" "M"
Edit
Use the solution above like this, with df standing in for your data frame and Sex for its column (in R, unlike many other programming languages, we very rarely mutate an object in place, so we almost always assign the result):
df$Sex <- toupper(substr(df$Sex, 1, 1))
Or, if you want to preserve the original column:
df$Sex_fixed <- toupper(substr(df$Sex, 1, 1))
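Applied to the data frame from the question, the same rule might look like the sketch below. It rebuilds only the name and sex columns, and the trimws() call is an extra precaution that assumes values could carry stray whitespace; nothing in the question says they do.

```r
# Rebuild the question's data frame (only the columns needed here).
mre_data <- data.frame(
  name = c("John", "Clara", "Smith", "Ray", "karen", "Ruba",
           "Josh", "Jennifer", "David", "Maria", "Sam"),
  sex  = c("Male", "F", "MALE", "M", "FEMALE", "female",
           "MALE", "F", "male", "FEMALE", "M"),
  stringsAsFactors = FALSE
)

# Keep the first character of each entry and upper-case it.
# trimws() is an assumption: it guards against leading or trailing spaces.
mre_data$sex_fixed <- toupper(substr(trimws(mre_data$sex), 1, 1))

# Cross-tabulate old against new values to check that only F and M remain.
table(mre_data$sex, mre_data$sex_fixed)
```

The cross-tabulation is a quick way to spot any raw value that did not collapse to F or M, for example an entry that starts with a different letter.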
Another Edit:
As requested, a solution using plyr::mapvalues:
mre_data$sex_fixed <- plyr::mapvalues(mre_data$sex, c("Male", "MALE", "M", "male", "F", "FEMALE", "female"), c("M", "M", "M", "M", "F", "F", "F"))
mre_data
Is now:
name sex age sex_fixed
1 John Male 30 M
2 Clara F 32 F
3 Smith MALE 54 M
4 Ray M 42 M
5 karen FEMALE 11 F
6 Ruba female 34 F
7 Josh MALE 67 M
8 Jennifer F 49 F
9 David male 27 M
10 Maria FEMALE 18 F
11 Sam M 30 M
While this works, it doesn't make much sense to me, since we have to specify each replacement pair individually. What we actually want is to apply a rule (take the first letter of the word, in upper case) to each entry. mapvalues is intended for use cases where there is no simple rule and each replacement pair has to be spelled out.
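For the case where each pair really does have to be listed, the same explicit mapping can also be sketched in base R with a named lookup vector. This is not what plyr::mapvalues does internally, just an illustration of the idea; the names lookup and sex_fixed are made up for this example.

```r
sex <- c("Male", "F", "MALE", "M", "FEMALE", "female",
         "MALE", "F", "male", "FEMALE", "M")

# Named vector: the names are the raw values, the elements the replacements.
lookup <- c(Male = "M", MALE = "M", M = "M", male = "M",
            F = "F", FEMALE = "F", female = "F")

# Index the lookup by the raw values; unname() drops the carried-over names.
sex_fixed <- unname(lookup[sex])
sex_fixed
```

One difference to keep in mind: a value that is missing from lookup comes back as NA here, whereas mapvalues leaves unmatched values unchanged.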
Or, and this doesn't make sense at all, so you really should take the direct route via x <- toupper(substr(...)), we can combine the rule-based solution with the mapvalues solution to create a very confused and overcomplicated one ;) (but at least we don't have to hardcode each replacement pair):
mre_data$sex_fixed2 <- plyr::mapvalues(mre_data$sex, unique(mre_data$sex), toupper(substr(unique(mre_data$sex), 1, 1)))
mre_data
Is now:
name sex age sex_fixed sex_fixed2
1 John Male 30 M M
2 Clara F 32 F F
3 Smith MALE 54 M M
4 Ray M 42 M M
5 karen FEMALE 11 F F
6 Ruba female 34 F F
7 Josh MALE 67 M M
8 Jennifer F 49 F F
9 David male 27 M M
10 Maria FEMALE 18 F F
11 Sam M 30 M M