How to use the mapvalues function to change values inside a specific column? (Using R)


I have a table with multiple variables. One of them is the sex column for the participants of a study conducted by another lab. The problem is that sex is sometimes recorded as F, FEMALE or female, and the same goes for the males. I need to recode all the females as F and all the males as M using the mapvalues function from the plyr package (it's very old, I know). This is the data frame; it's pretty basic:

mre_data <- structure(list(name = c("John", "Clara", "Smith", "Ray", "karen", "Ruba", "Josh", "Jennifer", "David", "Maria", "Sam"), 
               sex = c("Male",  "F", "MALE", "M", "FEMALE", "female", "MALE", "F", "male", "FEMALE", "M"), 
               age = c(30L, 32L, 54L, 42L, 11L, 34L, 67L, 49L, 27L, 18L, 30L)), 
          class = "data.frame", row.names = c("1", "2", "3", "4",  "5", "6", "7", "8", "9", "10", "11"))

CodePudding user response:

sex <- c("fema", "f", "F", "M", "male", "MALe")

We don't need plyr or any external packages:

toupper(substr(sex, 1, 1))

[1] "F" "F" "F" "M" "M" "M"


Use the solution above like this (in R, unlike in many other programming languages, we very rarely mutate an object in place, so we almost always want to assign the result):

df$Sex <- toupper(substr(df$Sex, 1, 1))

Or if you want to preserve the original column:

df$Sex_fixed <- toupper(substr(df$Sex, 1, 1))
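
The first-letter rule assumes every entry starts with an f or an m. As a minimal sketch of a more defensive variant, the snippet below trims stray whitespace first and turns anything that is neither F nor M into NA. The input vector is hypothetical and not taken from the question's data.

```r
# Hypothetical input with stray whitespace and one unexpected entry.
sex <- c("Male", " f", "FEMALE", "m ", "unknown")

# Trim whitespace, then apply the first-letter rule from the answer.
first_letter <- toupper(substr(trimws(sex), 1, 1))

# Anything that is not F or M becomes NA instead of a misleading code.
sex_fixed <- ifelse(first_letter %in% c("F", "M"), first_letter, NA_character_)

print(sex_fixed)
```

Whether unexpected entries should become NA or be kept as they are depends on the study; NA is only one possible choice here.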

Another Edit:

As requested, a solution using plyr::mapvalues:

mre_data$sex_fixed <- plyr::mapvalues(mre_data$sex, c("Male", "MALE", "M", "male", "F", "FEMALE", "female"), c("M", "M", "M", "M", "F", "F", "F"))


mre_data is now:

         name    sex age sex_fixed
  1      John   Male  30         M
  2     Clara      F  32         F
  3     Smith   MALE  54         M
  4       Ray      M  42         M
  5     karen FEMALE  11         F
  6      Ruba female  34         F
  7      Josh   MALE  67         M
  8  Jennifer      F  49         F
  9     David   male  27         M
  10    Maria FEMALE  18         F
  11      Sam      M  30         M

While this works, it doesn't make much sense to me, since we have to specify each replacement pair individually. What we actually want is to apply a rule (first letter of the word, in uppercase) to each entry. mapvalues is intended for use cases where there is no simple rule and each replacement pair has to be specified.
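
For comparison, an explicit pair-by-pair mapping can also be sketched in base R with a named lookup vector. This is an alternative to mapvalues, not how mapvalues works internally, and it behaves differently for unmatched values: a value missing from the lookup becomes NA here, whereas mapvalues leaves it unchanged. The names lookup and sex_fixed are chosen only for illustration.

```r
# The sex column from the question, as a plain character vector.
sex <- c("Male", "F", "MALE", "M", "FEMALE", "female",
         "MALE", "F", "male", "FEMALE", "M")

# Names are the old values, elements are the replacements.
lookup <- c("Male" = "M", "MALE" = "M", "M" = "M", "male" = "M",
            "F" = "F", "FEMALE" = "F", "female" = "F")

# Index the lookup by name; unname() drops the old values from the result.
sex_fixed <- unname(lookup[sex])

print(sex_fixed)
```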

Or, and this doesn't make sense at all and you really should use the direct route via x <- toupper(substr(...)), but here we go: we can combine the "automatic" rule with mapvalues to create a very confused and overcomplicated solution ;) (but at least we don't have to hardcode each replacement pair):

mre_data$sex_fixed2 <- plyr::mapvalues(mre_data$sex, unique(mre_data$sex), toupper(substr(unique(mre_data$sex), 1, 1)))


mre_data is now:

         name    sex age sex_fixed sex_fixed2
  1      John   Male  30         M          M
  2     Clara      F  32         F          F
  3     Smith   MALE  54         M          M
  4       Ray      M  42         M          M
  5     karen FEMALE  11         F          F
  6      Ruba female  34         F          F
  7      Josh   MALE  67         M          M
  8  Jennifer      F  49         F          F
  9     David   male  27         M          M
  10    Maria FEMALE  18         F          F
  11      Sam      M  30         M          M
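
One way to audit a recoding like this is to cross-tabulate the original spellings against the cleaned values. The sketch below uses only base R and rebuilds the sex column as a standalone vector so that it runs on its own; the name audit is just illustrative.

```r
# The sex column from the question, as a plain character vector.
sex <- c("Male", "F", "MALE", "M", "FEMALE", "female",
         "MALE", "F", "male", "FEMALE", "M")

# Apply the first-letter rule.
sex_fixed <- toupper(substr(sex, 1, 1))

# Rows are the original spellings, columns are the cleaned codes.
audit <- table(original = sex, fixed = sex_fixed)
print(audit)

# Every cleaned value should be one of the two expected codes.
stopifnot(all(sex_fixed %in% c("F", "M")))
```

Each row of the table should have a non-zero count in exactly one column, which makes a wrongly mapped spelling easy to spot.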