Mutate entries in a column in dplyr

Time:07-23

I have the following dataframe in R. Is there a way in which I can clean this column to have all of the "y" or "yes" entries displayed as "Yes" (similarly all the "nop" entries displayed as "No") in dplyr?

structure(list(has_elevator = c("Yes", "y", "y", "yes", "y", 
"Yes", "yes", "y", "Yes", "yes", "yes", "Yes", "Yes", "y", "Yes", 
"No", "Yes", "No", "y", "nop", "Yes", "yes", "Yes", "No", "Yes", 
"y", "Yes", "yes", "nop", "yes", "Yes", "nop", "yes", "Yes", 
"y", "y", "Yes", "no", "y", "Yes", "nop", "y", "y", "y", "No", 
"no", "y", "y", "Yes", "no")), class = "data.frame", row.names = c(NA, 
-50L))

CodePudding user response:

Here is an alternative approach: use str_detect() with a pattern built by regex(..., ignore_case = TRUE), wrapped in ifelse(). Any value containing a "y" (in either case) becomes "Yes"; every other value becomes "No".

library(dplyr)
library(stringr)

df %>% 
  mutate(has_elevator = ifelse(str_detect(has_elevator, regex("y", ignore_case = TRUE)), "Yes", "No"))
 has_elevator
1           Yes
2           Yes
3           Yes
4           Yes
5           Yes
6           Yes
7           Yes
8           Yes
9           Yes
10          Yes
11          Yes
12          Yes
13          Yes
14          Yes
15          Yes
16           No
17          Yes
18           No
19          Yes
20           No
21          Yes
22          Yes
23          Yes
24           No
25          Yes
26          Yes
27          Yes
28          Yes
29           No
30          Yes
31          Yes
32           No
33          Yes
34          Yes
35          Yes
36          Yes
37          Yes
38           No
39          Yes
40          Yes
41           No
42          Yes
43          Yes
44          Yes
45           No
46           No
47          Yes
48          Yes
49          Yes
50           No
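
The pattern above matches a "y" anywhere in the string and sends everything else to "No", so an unexpected entry would be coerced silently. A stricter variant, as a minimal sketch: the sample below is made up (the values "YES", "maybe" and NA are not in the question's data), and the anchored patterns are one possible choice, not the only one.

```r
library(dplyr)
library(stringr)

# Hypothetical sample, including values the question's data does not contain
df <- data.frame(
  has_elevator = c("Yes", "y", "YES", "nop", "no", "maybe", NA),
  stringsAsFactors = FALSE
)

df_strict <- df %>%
  mutate(
    has_elevator = case_when(
      # whole string is "y" or "yes", in any letter case
      str_detect(has_elevator, regex("^y(es)?$", ignore_case = TRUE)) ~ "Yes",
      # whole string is "n", "no" or "nop", in any letter case
      str_detect(has_elevator, regex("^n(op?)?$", ignore_case = TRUE)) ~ "No",
      # anything else (including NA) is left as it was
      TRUE ~ has_elevator
    )
  )

df_strict
```

Here "maybe" stays "maybe" instead of turning into "Yes", which makes stray values easier to spot afterwards.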

CodePudding user response:

You can use case_when() within mutate() to recode your variable. Some of your values are no rather than No, so I recoded those as well.

# Your example data
df <- structure(list(has_elevator = c("Yes", "y", "y", "yes", "y", 
                                "Yes", "yes", "y", "Yes", "yes", "yes", "Yes", "Yes", "y", "Yes", 
                                "No", "Yes", "No", "y", "nop", "Yes", "yes", "Yes", "No", "Yes", 
                                "y", "Yes", "yes", "nop", "yes", "Yes", "nop", "yes", "Yes", 
                                "y", "y", "Yes", "no", "y", "Yes", "nop", "y", "y", "y", "No", 
                                "no", "y", "y", "Yes", "no")), class = "data.frame", row.names = c(NA, 
                                                                                                   -50L))
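
Before recoding, it can help to list which spellings actually occur, which is how variants such as no turn up. A small sketch using dplyr's count() on a shortened, made-up sample with the same spellings as the data above:

```r
library(dplyr)

# Shortened, made-up sample with the same spellings as the question's data
df <- data.frame(
  has_elevator = c("Yes", "y", "yes", "No", "nop", "no", "y", "Yes"),
  stringsAsFactors = FALSE
)

# One row per distinct spelling, with its frequency in column n
spellings <- df %>% count(has_elevator)
spellings
```

Each row of spellings is a variant that the recoding has to cover.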

Using case_when()

library(dplyr)

# Using case_when()
df_new <- df %>% mutate(
  has_elevator = case_when(
    has_elevator %in% c("y", "yes") ~ "Yes",
    has_elevator %in% c("nop", "no") ~ "No",
    TRUE ~ has_elevator
  )
)

df_new$has_elevator %>% table()
#> .
#>  No Yes 
#>  11  39

Using recode()

library(dplyr)

df_new <- df %>% mutate(
  has_elevator = recode(has_elevator, y = "Yes", yes = "Yes", nop = "No", no = "No")
)

df_new$has_elevator %>% table()
#> .
#>  No Yes 
#>  11  39

Combining string substitution with either function

You can skip recoding values to the proper case by using a regular expression that capitalizes the first letter of each string, whatever it is. Case variants such as yes and no then need no entry of their own; only the abbreviations y and nop have to be recoded explicitly.

This uses base R's sub(), so it doesn't require the stringr package.

df_new <- df %>% mutate(
  has_elevator = case_when(
    has_elevator == "y" ~ "Yes",
    has_elevator == "nop" ~ "No",  # "nop" would otherwise end up as "Nop"
    TRUE ~ has_elevator),
  # \\U upper-cases the captured first character (requires perl = TRUE)
  has_elevator = sub('^(\\w?)', '\\U\\1', has_elevator, perl = TRUE)
)
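
To see what the capitalization step does by itself, here is a base R sketch on a small made-up vector. It shows that the regular expression only fixes the letter case, so the abbreviations nop and y come out as "Nop" and "Y" unless they are recoded first.

```r
# Made-up vector covering the lower-case spellings plus one already-correct value
x <- c("yes", "no", "nop", "Yes", "y")

# With perl = TRUE, \\U upper-cases what follows it in the replacement;
# here that is only the captured first word character
capitalised <- sub("^(\\w?)", "\\U\\1", x, perl = TRUE)

capitalised
# capitalised is now: "Yes", "No", "Nop", "Yes", "Y"
```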