How to effectively do find and replace in R

Time:08-31

In R, I tried changing some strings in a particular column with mutate() and str_replace(), but I was surprised to find that only 3 of the 5 strings changed and 2 remained unchanged.

food_quantity <- food_quantity %>% 
  mutate(Food_Group = str_replace(Food_Group, "Live meat animals (1000) 2/", "Live meat animals")) %>%
  mutate(Food_Group = str_replace(Food_Group, "Fish and shellfish 3/", "Fish and shellfish")) %>% 
  mutate(Food_Group = str_replace(Food_Group, "Fruits 4/", "Fruits")) %>%
  mutate(Food_Group = str_replace(Food_Group, "Other foods 5/", "Other foods")) %>%
  mutate(Food_Group = str_replace(Food_Group, "Beverages (1000 KL) 6/", "Beverages"))

The image shows that "Live meat animals (1000) 2/" and "Beverages (1000 KL) 6/" remain unchanged, while the other three strings changed.

CodePudding user response:

Why not use a single regex? The parts you want to remove follow a rule: each one begins with a space followed by either ( or a digit, and runs to the end of the string:

library(dplyr)
data.frame(Food_group) %>%
  mutate(Food_group = sub("\\s(\\(|\\d).*$", "", Food_group))

Output:

          Food_group
1  Live meat animals
2 Fish and shellfish

If you prefer stringr:

library(dplyr)
library(stringr)
data.frame(Food_group) %>%
  mutate(Food_group = str_replace(Food_group, "\\s(\\(|\\d).*$", ""))

Data:

Food_group <- c("Live meat animals (1000) 2/", "Fish and shellfish 3/")
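As a quick check, the same pattern can be run against all five labels from the question. This is a minimal base R sketch; the names food_group and cleaned are illustrative and not part of the original answer.

```r
# The five labels from the question.
food_group <- c("Live meat animals (1000) 2/",
                "Fish and shellfish 3/",
                "Fruits 4/",
                "Other foods 5/",
                "Beverages (1000 KL) 6/")

# \\s        one whitespace character
# (\\(|\\d)  followed by a literal "(" or a digit
# .*$        then everything up to the end of the string
cleaned <- sub("\\s(\\(|\\d).*$", "", food_group)
print(cleaned)
```

One caveat: the pattern cuts at the first space that is followed by a digit or "(", so a label that legitimately contains such a sequence earlier in the text would be truncated there as well.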

Note:

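The string "Beverages (1000 KL) 6/" remains unchanged when you pass it as the pattern to str_replace (or any other regex function) because parentheses are regex metacharacters: unescaped, they define a group instead of matching a literal ( or ). To match them literally they must be escaped, so the correct pattern is "Beverages \\(1000 KL\\) 6/" (the backslashes are doubled because R string literals need them escaped too).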
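The difference can be seen with base R's grepl() and sub(), which treat parentheses in a regex the same way. A small sketch follows; the variable names are illustrative.

```r
x <- "Beverages (1000 KL) 6/"

# Unescaped: ( ) form a regex group, so the pattern looks for the text
# "Beverages 1000 KL 6/" (no literal parentheses) and does not match x.
unescaped <- grepl("Beverages (1000 KL) 6/", x)

# The same unescaped pattern does match a string without the parentheses.
matches_plain <- grepl("Beverages (1000 KL) 6/", "Beverages 1000 KL 6/")

# Escaped: "\\(" in R source is the regex \( and matches a literal "(".
escaped <- grepl("Beverages \\(1000 KL\\) 6/", x)

# With the escaped pattern the replacement goes through.
result <- sub("Beverages \\(1000 KL\\) 6/", "Beverages", x)

print(c(unescaped = unescaped, matches_plain = matches_plain, escaped = escaped))
print(result)
```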

CodePudding user response:

It looks like you're after recode() rather than str_replace(), since you want to recode exact categories:

library(dplyr)

food_quantity |> 
  mutate(Food_Group_new = recode(Food_Group,
                                 "Live meat animals (1000) 2/" = "Live meat animals",
                                 "Fish and shellfish 3/" = "Fish and shellfish",
                                 "Fruits 4/" = "Fruits",
                                 "Other foods 5/" = "Other foods",
                                 "Beverages (1000 KL) 6/" = "Beverages"))

Output:

# A tibble: 5 × 2
  Food_Group                  Food_Group_new    
  <chr>                       <chr>             
1 Live meat animals (1000) 2/ Live meat animals 
2 Fish and shellfish 3/       Fish and shellfish
3 Fruits 4/                   Fruits            
4 Other foods 5/              Other foods       
5 Beverages (1000 KL) 6/      Beverages         
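The idea behind exact-category recoding can also be sketched without dplyr, using a named lookup vector. This is an illustration of the technique, not how recode() is implemented; the names lookup, x and recoded, and the extra "Unlisted group" value, are made up for the example.

```r
# Old label -> new label. Names are compared as whole strings,
# so the parentheses need no escaping.
lookup <- c("Live meat animals (1000) 2/" = "Live meat animals",
            "Fish and shellfish 3/"       = "Fish and shellfish",
            "Fruits 4/"                   = "Fruits",
            "Other foods 5/"              = "Other foods",
            "Beverages (1000 KL) 6/"      = "Beverages")

# The five labels plus one hypothetical value that is not in the lookup.
x <- c(names(lookup), "Unlisted group")

# Replace values found in the lookup; leave everything else as it is.
recoded <- unname(ifelse(x %in% names(lookup), lookup[x], x))
print(recoded)
```

Because the comparison is on whole strings, a value only changes when it equals one of the listed labels exactly; values that are not listed pass through untouched.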

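If you want to use stringr, you could wrap each pattern in fixed() (or coll()) so that it is matched as a literal string rather than as a regex, which avoids having to escape the parentheses. fixed() compares exact bytes, while coll() compares using locale collation rules.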

library(stringr)
library(dplyr)

food_quantity %>% 
  mutate(Food_Group = str_replace(Food_Group, fixed("Live meat animals (1000) 2/"), "Live meat animals")) %>%
  mutate(Food_Group = str_replace(Food_Group, fixed("Fish and shellfish 3/"), "Fish and shellfish")) %>% 
  mutate(Food_Group = str_replace(Food_Group, fixed("Fruits 4/"), "Fruits")) %>%
  mutate(Food_Group = str_replace(Food_Group, fixed("Other foods 5/"), "Other foods")) %>%
  mutate(Food_Group = str_replace(Food_Group, fixed("Beverages (1000 KL) 6/"), "Beverages"))

Output:

# A tibble: 5 × 1
  Food_Group        
  <chr>             
1 Live meat animals 
2 Fish and shellfish
3 Fruits            
4 Other foods       
5 Beverages     

Data:

library(tibble)

food_quantity <- tibble(Food_Group  = c("Live meat animals (1000) 2/",
                                        "Fish and shellfish 3/",
                                        "Fruits 4/",
                                        "Other foods 5/",
                                        "Beverages (1000 KL) 6/"))

CodePudding user response:

You could use str_replace_all() to perform multiple replacements by passing a named vector (c(pattern1 = replacement1)) to it. The names are interpreted as regular expressions, so the parentheses have to be escaped.

food_quantity %>% 
  mutate(Food_Group = str_replace_all(Food_Group,
    # names are regex patterns: escape ( and ) to match them literally
    c("Live meat animals \\(1000\\) 2/" = "Live meat animals",
      "Fish and shellfish 3/"           = "Fish and shellfish",
      "Fruits 4/"                       = "Fruits",
      "Other foods 5/"                  = "Other foods",
      "Beverages \\(1000 KL\\) 6/"      = "Beverages")))
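If escaping each pattern by hand feels error-prone, one alternative is to apply the replacements one at a time with literal matching. The sketch below uses base R's sub() with fixed = TRUE in a loop; it is a stand-in for the stringr call above, not an equivalent implementation, and the names replacements, x and cleaned are illustrative.

```r
# Literal pattern -> replacement. No escaping is needed because each
# pattern is matched with fixed = TRUE below.
replacements <- c("Live meat animals (1000) 2/" = "Live meat animals",
                  "Fish and shellfish 3/"       = "Fish and shellfish",
                  "Fruits 4/"                   = "Fruits",
                  "Other foods 5/"              = "Other foods",
                  "Beverages (1000 KL) 6/"      = "Beverages")

x <- names(replacements)  # the five original labels

cleaned <- x
for (i in seq_along(replacements)) {
  # fixed = TRUE treats the pattern as plain text, not as a regex.
  cleaned <- sub(names(replacements)[i], replacements[[i]], cleaned,
                 fixed = TRUE)
}
print(cleaned)
```

The replacements are applied in order, so a later pattern could in principle match text produced by an earlier one. With these five labels that does not happen.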