Trying to remove "ZCTA" from rows

Time:10-27

I am trying to extract only the ZIP code values from my imported ACS data file. However, every row includes "ZCTA" before the 5-digit ZIP code. Is there a way to remove that prefix so that only the 5-digit ZIP code remains?

Example:

Image of data frame with ZCTA and Zip

I tried using strtrim on the data, but I can't figure out how to target the last 5 digits. I imagine there is a function or loop that could do this, since the dataset is so large.

CodePudding user response:

To remove "ZCTA5":

gsub("ZCTA5\\s*", "", df$zip) # df is your data frame; \\s* also drops any space after the prefix

or

library(stringr)
str_replace(df$zip, "ZCTA5\\s*", "") # \\s* also drops any space after the prefix

To extract the ZIP code (the last 5 characters):

str_sub(df$zip,-5,-1)
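
None of the calls above modify df; the result has to be assigned back to a column. Here is a minimal base R sketch of that step. The sample values, the "ZCTA5 " prefix with a trailing space, and the new column names zip_clean and zip_last5 are assumptions for illustration, since the original data is only shown as an image.

```r
# Hypothetical sample data: the column name `zip` and the "ZCTA5 " prefix are assumed
df <- data.frame(
  zip = c("ZCTA5 00601", "ZCTA5 00602", "ZCTA5 10001"),
  stringsAsFactors = FALSE
)

# Remove the prefix plus any whitespace after it and store the result
df$zip_clean <- sub("^ZCTA5\\s*", "", df$zip)

# Alternative: keep the last five characters, whatever the prefix is
n <- nchar(df$zip)
df$zip_last5 <- substr(df$zip, n - 4, n)

print(df)
```

Both versions are vectorized, so no loop is needed even for a large dataset. It is probably worth keeping the result as character: converting to numeric would drop the leading zeros in codes such as 00601.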

CodePudding user response:

Here are a few others for fun:

#option 1
stringr::str_extract(df$zip, "(?<=\\s)\\d+$")

#option 2
gsub("^.*\\s(\\d+)$", "\\1", df$zip)
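
The two options behave differently when a row does not end in digits, which may matter for messy data. The sketch below shows this in base R only; the sample vector x is made up for illustration. An extraction with a lookbehind finds nothing for such a row (stringr::str_extract returns NA there), while a capture-group replacement returns the input unchanged.

```r
# Hypothetical sample values; the last one has no trailing digits
x <- c("ZCTA5 00601", "ZCTA5 10001", "no zip here")

# Lookbehind: digits at the end of the string, preceded by whitespace.
# Base R needs perl = TRUE for lookbehind support.
m <- regexpr("(?<=\\s)\\d+$", x, perl = TRUE)
hit <- as.integer(m) > 0            # regexpr() returns -1 where nothing matched
extracted <- rep(NA_character_, length(x))
extracted[hit] <- regmatches(x, m)  # regmatches() returns only the matches

# Capture group: strings that do not match come back unchanged
replaced <- sub("^.*\\s(\\d+)$", "\\1", x)

print(extracted)
print(replaced)
```

If silent pass-through of bad rows is a concern, the extraction form is probably the safer choice, since the missing values are easy to spot with is.na().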