I have this sample dataset:
  province  region_vn region_en sub_region_vn sub_region_en province_latin
  <chr>     <chr>     <chr>     <chr>         <chr>         <chr>
1 Điện Biên Bắc Bộ    Northern  Tây Bắc Bộ    Northwest     Dien Bien
2 Lạng Sơn  Bắc Bộ    Northern  Tây Bắc Bộ    Northeast     Lang Son
How do I combine the two sub_region_en values, Northwest and Northeast, and rename them to Northern midlands and mountain areas?
The expected outcome would be:
  province  region_vn region_en sub_region_vn sub_region_en                        province_latin
  <chr>     <chr>     <chr>     <chr>         <chr>                                <chr>
1 Điện Biên Bắc Bộ    Northern  Tây Bắc Bộ    Northern midlands and mountain areas Dien Bien
2 Lạng Sơn  Bắc Bộ    Northern  Tây Bắc Bộ    Northern midlands and mountain areas Lang Son
I would appreciate any help.
CodePudding user response:
For example, if your dataset is called "df", you can loop over its rows and overwrite the matching values:
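# seq_len(nrow(df)) is safer than 1:dim(df)[1], which yields c(1, 0) for an empty data frame
for (i in seq_len(nrow(df))) {
  if (df$sub_region_en[i] %in% c("Northwest", "Northeast")) {
    df$sub_region_en[i] <- "Northern midlands and mountain areas"
  }
}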
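The same replacement can also be done without an explicit loop, because %in% is vectorized. The following is only a minimal sketch: the data frame below is a hypothetical stand-in for your data, and the third row ("Ha Noi", "Red River Delta") is made up purely to show that non-matching values are left alone.

```r
# Hypothetical stand-in for the question's data; only the relevant columns are kept.
# The third row is an invented example of a value that should NOT be changed.
df <- data.frame(
  province_latin = c("Dien Bien", "Lang Son", "Ha Noi"),
  sub_region_en  = c("Northwest", "Northeast", "Red River Delta"),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for rows whose sub-region should be merged
to_merge <- df$sub_region_en %in% c("Northwest", "Northeast")

# Overwrite only those rows; every other value is left untouched
df$sub_region_en[to_merge] <- "Northern midlands and mountain areas"

df
```

On a large data frame this should generally be faster than the row-by-row loop, and it behaves the same way on a data frame with zero rows.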
CodePudding user response:
Another option is to match the values with a regular expression and replace them with the gsub() function. Here are the steps:
# A simplified version of your data
yourdf <- structure(list(region_en = c("Northern", "Northern"),
                         sub_region_en = c("Northwest", "Northeast")),
                    class = "data.frame", row.names = c(NA, -2L))
yourdf
# region_en sub_region_en
#1 Northern Northwest
#2 Northern Northeast
# Substitute the data
yourdf$sub_region_en <- gsub("Northwest|Northeast",
                             "Northern midlands and mountain areas",
                             yourdf$sub_region_en)
# The result
yourdf
# region_en sub_region_en
#1 Northern Northern midlands and mountain areas
#2 Northern Northern midlands and mountain areas
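One thing to keep in mind: the pattern "Northwest|Northeast" matches anywhere inside a string, so a longer value that merely contains one of those words would be partially rewritten. If that matters for your data, anchoring the pattern with ^ and $ restricts the replacement to exact matches. This is a small sketch; the vector x and the value "Northwest coast" are hypothetical and only there to show the difference.

```r
# Hypothetical values; "Northwest coast" is invented to show a partial match
x <- c("Northwest", "Northeast", "Northwest coast")

# Unanchored: the substring "Northwest" inside the third value is replaced too
unanchored <- gsub("Northwest|Northeast",
                   "Northern midlands and mountain areas", x)

# Anchored: only values that are exactly "Northwest" or "Northeast" are replaced
anchored <- sub("^(Northwest|Northeast)$",
                "Northern midlands and mountain areas", x)

unanchored
anchored
```

With the anchored pattern, the third value stays "Northwest coast", while the unanchored version turns it into "Northern midlands and mountain areas coast".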