Removing different words from vector in R

Time:12-02

Let's say I have a long data frame in R like this:

var1 <- c("Los Angeles - CA", "New York - NY", "Seattle - WA", "Los Angeles - CA", "New York - NY")
var2 <- c(1, 2, 3, 4, 5)

df <- data.frame(var1, var2)

I want to remove the " - State" suffix to get a result like this:

var1 <- c("Los Angeles", "New York", "Seattle", "Los Angeles", "New York")
var2 <- c(1, 2, 3, 4, 5)
df <- data.frame(var1, var2)

I couldn't figure out how to do this: I have more than 5,000 rows, and I don't think I can use gsub because I'd have to list every state abbreviation to remove. That is, there are dozens of patterns (" - State") that I'd have to define a priori before using such functions.

Is there an easy way to remove every " - State" suffix from that column at once, using some splitting pattern that I haven't figured out yet?

CodePudding user response:

Couple of options.

The most basic would be to just remove the last 5 characters, which works as long as every entry ends in a two-letter abbreviation:

library(stringr)
str_sub(var1, 1L, -6L)

Or maybe search for the pattern and delete that:

gsub(" - \\w+$", "", var1)

or

str_remove_all(var1, " - \\w+$")

All three give the same result:

[1] "Los Angeles" "New York"    "Seattle"     "Los Angeles" "New York"   
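Since the question is about a data frame column rather than a bare vector, here is a minimal sketch of writing the cleaned values back into df. It assumes every entry ends in " - " followed by word characters, and it swaps in base R's sub() (which replaces only the first match in each string) because the $ anchor allows at most one match per string anyway.

```r
# Rebuild the example data frame from the question
var1 <- c("Los Angeles - CA", "New York - NY", "Seattle - WA",
          "Los Angeles - CA", "New York - NY")
var2 <- c(1, 2, 3, 4, 5)
df <- data.frame(var1, var2)

# Strip a trailing " - <word characters>" from each entry and
# overwrite the column in place; rows without the suffix are left unchanged
df$var1 <- sub(" - \\w+$", "", df$var1)

print(df$var1)
```

The pattern is vectorised over the whole column, so no state abbreviation has to be listed by hand, whether the column has 5 rows or 5,000.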

CodePudding user response:

var1 <- c("Los Angeles - CA", "New York - NY", "Seattle - WA", "Los Angeles - CA", "New York - NY")
gsub(" - [A-Z]+$", "", var1)
[1] "Los Angeles" "New York"    "Seattle"     "Los Angeles" "New York" 
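The question also asks about a splitting approach. A rough sketch with base R's strsplit() might look like the following; it assumes " - " (space, hyphen, space) appears only as the separator before the state, so a city name that itself contains " - " would be cut at its first occurrence. The names parts and city are just illustrative.

```r
var1 <- c("Los Angeles - CA", "New York - NY", "Seattle - WA",
          "Los Angeles - CA", "New York - NY")

# Split each string on the literal separator " - ";
# fixed = TRUE means the separator is not treated as a regex
parts <- strsplit(var1, " - ", fixed = TRUE)

# Keep the first piece of every split (the city name)
city <- vapply(parts, `[`, character(1), 1)

print(city)
```

Compared with the regex versions, this keeps the second piece available too (for example, to store the state in its own column) at the cost of relying on the separator being unambiguous.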