In R, I tried changing some strings in a particular column with mutate() and str_replace(), but I'm surprised that only 3 of the 5 strings changed while the other 2 remain unchanged.
food_quantity <- food_quantity %>%
  mutate(Food_Group = str_replace(Food_Group, "Live meat animals (1000) 2/", "Live meat animals")) %>%
  mutate(Food_Group = str_replace(Food_Group, "Fish and shellfish 3/", "Fish and shellfish")) %>%
  mutate(Food_Group = str_replace(Food_Group, "Fruits 4/", "Fruits")) %>%
  mutate(Food_Group = str_replace(Food_Group, "Other foods 5/", "Other foods")) %>%
  mutate(Food_Group = str_replace(Food_Group, "Beverages (1000 KL) 6/", "Beverages"))
The image shows that "Live meat animals (1000) 2/" and "Beverages (1000 KL) 6/" remain unchanged, while the other strings are replaced as expected.
Answer:
Why not use a single regex? The parts you want to strip follow a rule: each one begins with a space followed by either ( or a digit, and runs to the end of the string:
library(dplyr)
data.frame(Food_group) %>%
  mutate(Food_group = sub("\\s(\\(|\\d).*$", "", Food_group))
Food_group
1 Live meat animals
2 Fish and shellfish
If you prefer stringr:
library(dplyr)
library(stringr)
data.frame(Food_group) %>%
  mutate(Food_group = str_replace(Food_group, "\\s(\\(|\\d).*$", ""))
Data:
Food_group <- c("Live meat animals (1000) 2/", "Fish and shellfish 3/")
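The data above covers only two of the labels. As a quick check, here is a base R sketch that applies the same regex to all five labels from the question (the vector names `labels` and `cleaned` are just for illustration):

```r
# Base R sketch: apply the answer's regex to all five labels from the question.
labels <- c("Live meat animals (1000) 2/",
            "Fish and shellfish 3/",
            "Fruits 4/",
            "Other foods 5/",
            "Beverages (1000 KL) 6/")

# Remove everything from the first whitespace that is followed by "(" or a
# digit, up to the end of the string. \s and \d are supported by R's default
# (TRE) regex engine.
cleaned <- sub("\\s(\\(|\\d).*$", "", labels)
print(cleaned)
```

All five reduce to the bare food group: "Live meat animals", "Fish and shellfish", "Fruits", "Other foods" and "Beverages". Keep in mind the rule's limitation: a label whose kept part itself contains a space followed by a digit or "(" would be cut short at that point.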
Note:
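The pattern "Beverages (1000 KL) 6/" leaves the string unchanged in str_replace() (or any other regex function) because parentheses are regex metacharacters: "(1000 KL)" is read as a capture group, so the pattern actually looks for "Beverages 1000 KL 6/" without the parentheses. The same applies to "Live meat animals (1000) 2/". To match the parentheses literally, escape them. The regex is Beverages \(1000 KL\) 6/, which you write as "Beverages \\(1000 KL\\) 6/" in R, because the backslash itself must be doubled inside an R string.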
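A small base R sketch of the point in the note: the unescaped pattern fails on the real label but matches the same text without parentheses, while an escaped pattern or `fixed = TRUE` both work (variable names are illustrative):

```r
# Base R sketch: why the unescaped pattern fails, and two ways to fix it.
x <- "Beverages (1000 KL) 6/"

# Unescaped: "(1000 KL)" is a capture group, so the regex matches the text
# "Beverages 1000 KL 6/" (no parentheses) and finds nothing in x.
unescaped <- grepl("Beverages (1000 KL) 6/", x)
without_parens <- grepl("Beverages (1000 KL) 6/", "Beverages 1000 KL 6/")

# Escaped: "\\(" in R source is the regex \( , i.e. a literal parenthesis.
escaped <- sub("Beverages \\(1000 KL\\) 6/", "Beverages", x)

# fixed = TRUE: the pattern is taken literally, so no escaping is needed.
literal <- sub("Beverages (1000 KL) 6/", "Beverages", x, fixed = TRUE)

print(c(unescaped = unescaped, without_parens = without_parens))
print(c(escaped = escaped, literal = literal))
```

Here `unescaped` is FALSE and `without_parens` is TRUE, while both `sub()` calls return "Beverages". `fixed = TRUE` is the base R counterpart of wrapping a pattern in stringr's fixed().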
Answer:
Looks like you're after recode() rather than str_replace(), as you want to recode exact categories:
library(dplyr)
food_quantity |>
  mutate(Food_Group_new = recode(Food_Group,
    "Live meat animals (1000) 2/" = "Live meat animals",
    "Fish and shellfish 3/" = "Fish and shellfish",
    "Fruits 4/" = "Fruits",
    "Other foods 5/" = "Other foods",
    "Beverages (1000 KL) 6/" = "Beverages"))
Output:
# A tibble: 5 × 2
Food_Group Food_Group_new
<chr> <chr>
1 Live meat animals (1000) 2/ Live meat animals
2 Fish and shellfish 3/ Fish and shellfish
3 Fruits 4/ Fruits
4 Other foods 5/ Other foods
5 Beverages (1000 KL) 6/ Beverages
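For comparison, the same exact-match recoding can be sketched in base R with a named lookup vector instead of recode(). This is an alternative, not part of the answer above, and the names `food_group` and `lookup` are illustrative:

```r
# Base R sketch: exact-match recoding with a named lookup vector.
food_group <- c("Live meat animals (1000) 2/", "Fish and shellfish 3/",
                "Fruits 4/", "Other foods 5/", "Beverages (1000 KL) 6/")

lookup <- c("Live meat animals (1000) 2/" = "Live meat animals",
            "Fish and shellfish 3/"       = "Fish and shellfish",
            "Fruits 4/"                   = "Fruits",
            "Other foods 5/"              = "Other foods",
            "Beverages (1000 KL) 6/"      = "Beverages")

# Indexing by name is an exact string comparison, so "(" needs no escaping.
recoded <- unname(lookup[food_group])

# Values with no entry in the lookup come back as NA; keep the original there.
recoded <- ifelse(is.na(recoded), food_group, recoded)
print(recoded)
```

Because names are compared as plain strings, no regex is involved at all, which is the same reason recode() is unaffected by the parentheses.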
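If you want to use stringr, you could wrap each pattern in fixed() (or coll()) so it is matched as a literal string rather than as a regex, which avoids escaping the parentheses. fixed() compares the raw bytes, while coll() compares strings using locale-aware collation rules.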
library(stringr)
library(dplyr)
food_quantity %>%
  mutate(Food_Group = str_replace(Food_Group, fixed("Live meat animals (1000) 2/"), "Live meat animals")) %>%
  mutate(Food_Group = str_replace(Food_Group, fixed("Fish and shellfish 3/"), "Fish and shellfish")) %>%
  mutate(Food_Group = str_replace(Food_Group, fixed("Fruits 4/"), "Fruits")) %>%
  mutate(Food_Group = str_replace(Food_Group, fixed("Other foods 5/"), "Other foods")) %>%
  mutate(Food_Group = str_replace(Food_Group, fixed("Beverages (1000 KL) 6/"), "Beverages"))
Output:
# A tibble: 5 × 1
Food_Group
<chr>
1 Live meat animals
2 Fish and shellfish
3 Fruits
4 Other foods
5 Beverages
Data:
library(tibble)
food_quantity <- tibble(Food_Group = c("Live meat animals (1000) 2/",
                                       "Fish and shellfish 3/",
                                       "Fruits 4/",
                                       "Other foods 5/",
                                       "Beverages (1000 KL) 6/"))
Answer:
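You could use str_replace_all() to perform multiple replacements by passing it a named vector (c(pattern1 = replacement1)). The names are still treated as regular expressions, so the parentheses must be escaped; otherwise the first and last labels stay unchanged, exactly as in the question.
library(dplyr)
library(stringr)
food_quantity %>%
  mutate(Food_Group = str_replace_all(Food_Group,
    c("Live meat animals \\(1000\\) 2/" = "Live meat animals",
      "Fish and shellfish 3/" = "Fish and shellfish",
      "Fruits 4/" = "Fruits",
      "Other foods 5/" = "Other foods",
      "Beverages \\(1000 KL\\) 6/" = "Beverages")))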
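With many labels, escaping them by hand is error-prone. A minimal base R sketch, assuming a hypothetical helper escape_regex() (not a function from base R or stringr) that backslash-escapes regex metacharacters, so literal labels can be combined with regex syntax such as the ^ and $ anchors:

```r
# Hypothetical helper: prefix each regex metacharacter with a backslash so a
# literal label can be embedded safely in a regular expression.
escape_regex <- function(x) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)
}

food_group <- c("Live meat animals (1000) 2/", "Fish and shellfish 3/",
                "Fruits 4/", "Other foods 5/", "Beverages (1000 KL) 6/")

lookup <- c("Live meat animals (1000) 2/" = "Live meat animals",
            "Fish and shellfish 3/"       = "Fish and shellfish",
            "Fruits 4/"                   = "Fruits",
            "Other foods 5/"              = "Other foods",
            "Beverages (1000 KL) 6/"      = "Beverages")

# Anchor each escaped label with ^ and $ so only whole values are replaced.
patterns <- paste0("^", escape_regex(names(lookup)), "$")

out <- food_group
for (i in seq_along(patterns)) {
  out <- sub(patterns[i], lookup[[i]], out)
}
print(out)
```

If you only need literal matching without extra regex syntax, fixed() (or fixed = TRUE in base R) is simpler; the escaping approach is only worth it when literal text and regex have to be mixed.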