Remove a Varying Substring from a Character column

Time:07-12

I have a dataframe that looks like so:

                                                                  Item Quantity Price Totals
1                                  24" Box Desert Museum (14" Caliper)       10    92    920
2                                     24" Box Mastic tree (1" Caliper)       28   135   3780
3                              36" Box Thornless Mesquite (2" Caliper)        9   335   3015

I am trying to remove the parentheses, along with what is inside them. In the end, I want it to look like so:

                                                                  Item Quantity Price Totals
1                                                24" Box Desert Museum       10    92    920
2                                                  24" Box Mastic tree       28   135   3780
3                                           36" Box Thornless Mesquite        9   335   3015

The issue I am having is not so much removing the parentheses and their contents; it's that the number inside varies in digits, which makes every case slightly different from the others.

CodePudding user response:

Assuming the term in parentheses always occurs at the end of the item description, we can use sub() as follows:

df$Item <- sub("\\s*\\(.*?\\)$", "", df$Item)

We can also use str_replace to do this the stringr way:

library(stringr)
df$Item <- str_replace(df$Item, "\\s*\\(.*?\\)$", "")
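To check the pattern against the sample rows from the question, here is a minimal base R sketch. The data frame is rebuilt by hand from the printed output, so the column types are an assumption. The pattern matches optional whitespace, then a parenthesised group at the end of the string, so the number of digits inside the parentheses does not matter.

```r
# Sample data rebuilt from the question (types are assumed, not confirmed)
df <- data.frame(
  Item = c('24" Box Desert Museum (14" Caliper)',
           '24" Box Mastic tree (1" Caliper)',
           '36" Box Thornless Mesquite (2" Caliper)'),
  Quantity = c(10, 28, 9),
  Price = c(92, 135, 335),
  Totals = c(920, 3780, 3015),
  stringsAsFactors = FALSE
)

# Drop optional whitespace plus a trailing "(...)" group of any length
df$Item <- sub("\\s*\\(.*?\\)$", "", df$Item)

print(df$Item)
```

Note that R is case-sensitive, so the column must be referenced as Item, matching the header shown in the question.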