Given the following using R:
County_or_City <- c("Butte County", "Oroville", "Solano Cnty", "Redding", "Maripossa county")
data.frame(County_or_City)
County_or_City
1 Butte County
2 Oroville
3 Solano Cnty
4 Redding
5 Maripossa county
I would like to create a new column with a dummy variable for rows that contain Cnty, County, or county. Sorry, I know this is very basic, but I'm learning. How do I do this?
CodePudding user response:
Using base R
transform(data.frame(County_or_City),
dummy = grepl('C(ou)?nty', County_or_City, ignore.case = TRUE))
-output
County_or_City dummy
1 Butte County TRUE
2 Oroville FALSE
3 Solano Cnty TRUE
4 Redding FALSE
5 Maripossa county TRUE
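If a 0/1 column is wanted rather than TRUE/FALSE, the logical result can be converted with as.integer. The sketch below is only an illustration: the names places, has_county and dummy are made up for this example. In the pattern, (ou)? makes the "ou" optional, so it matches both "County" and "Cnty", and ignore.case = TRUE also covers "county".

```r
# Hypothetical vector name; same values as in the question
places <- c("Butte County", "Oroville", "Solano Cnty", "Redding", "Maripossa county")

# "(ou)?" makes "ou" optional; ignore.case = TRUE matches any capitalisation
has_county <- grepl("C(ou)?nty", places, ignore.case = TRUE)

# Convert TRUE/FALSE to 1/0
dummy <- as.integer(has_county)
dummy
# [1] 1 0 1 0 1
```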
CodePudding user response:
Code
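library(dplyr)
library(stringr)
county_words <- c("County", "county", "Cnty")
# Collapse the terms into one regex ("County|county|Cnty") so that every row
# is tested against all three terms, not one recycled term per row
data.frame(County_or_City) %>%
  mutate(dummy = str_detect(County_or_City, paste(county_words, collapse = "|")))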
Output
County_or_City dummy
1 Butte County TRUE
2 Oroville FALSE
3 Solano Cnty TRUE
4 Redding FALSE
5 Maripossa county TRUE
CodePudding user response:
In base R you could use grepl, which searches for a pattern in each string and returns TRUE or FALSE. Build the pattern with paste and collapse = "|", which joins your search terms with the regex "or" operator so that a match on any one of them counts. Then multiply by 1 to turn the logical result into a dichotomous dummy variable (0 = FALSE, 1 = TRUE):
County_or_City <- c("Butte County", "Oroville", "Solano Cnty", "Redding", "Maripossa county")
df <- data.frame(County_or_City)
srchtrms <- c("County","county","Cnty")
df$new <- grepl(paste(srchtrms, collapse = "|"), df$County_or_City) * 1
df
Output:
County_or_City new
1 Butte County 1
2 Oroville 0
3 Solano Cnty 1
4 Redding 0
5 Maripossa county 1
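To see what the paste step actually hands to grepl, it can help to print the intermediate pattern. This is just a small sketch; the names pattern, places and matches are made up for the illustration.

```r
srchtrms <- c("County", "county", "Cnty")

# Join the search terms with "|" (regex "or")
pattern <- paste(srchtrms, collapse = "|")
pattern
# [1] "County|county|Cnty"

# Hypothetical vector name; same values as in the question
places <- c("Butte County", "Oroville", "Solano Cnty", "Redding", "Maripossa county")

# Logical result before it is multiplied by 1
matches <- grepl(pattern, places)
matches
# [1]  TRUE FALSE  TRUE FALSE  TRUE
```

Note that this pattern is case sensitive, so a spelling such as "COUNTY" would not match unless it is added to srchtrms or ignore.case = TRUE is passed to grepl.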