I am trying to extract only the zip code values from my imported ACS data file. However, every row includes "ZCTA" before the 5-digit zip code. Is there a way to remove that so that only the 5-digit zip code remains?
Example:
I tried using strtrim on the data, but I can't figure out how to target the last 5 digits. I imagine there is a function or loop that could do this, since the dataset is so large.
CodePudding user response:
To remove "ZCTA5":
trimws(gsub("ZCTA5", "", df$zip)) # df is your data.frame name; trimws() drops any space left behind
or
library(stringr)
str_trim(str_replace(df$zip, "ZCTA5", ""))
To extract the zip code directly (the last five characters of each value):
str_sub(df$zip, -5, -1)
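For reference, a minimal base R sketch of both ideas, with no extra packages. It assumes the column is named `zip` and that the values look like "ZCTA5 01001"; the sample data below is made up, so adjust it to match your file.

```r
# Hypothetical sample data; the real ACS column name and format may differ.
df <- data.frame(zip = c("ZCTA5 01001", "ZCTA5 90210", "ZCTA5 60614"),
                 stringsAsFactors = FALSE)

# Option A: drop the literal "ZCTA5" prefix, then trim the leftover space.
df$zip_a <- trimws(sub("ZCTA5", "", df$zip, fixed = TRUE))

# Option B: keep only the last five characters of each value.
n <- nchar(df$zip)
df$zip_b <- substr(df$zip, n - 4, n)

print(df)
```

Both calls are vectorized, so no loop is needed even on a large dataset. Keep the result as character: converting to numeric would drop the leading zero in codes such as "01001".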
CodePudding user response:
Here are a few others for fun:
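# option 1: digits at the end of the string that follow a whitespace character
stringr::str_extract(df$zip, "(?<=\\s)\\d+$")
# option 2: capture the trailing digits and replace the whole string with them
gsub("^.*\\s(\\d+)$", "\\1", df$zip)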
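If some rows might not follow the expected pattern, it may be worth checking the result afterwards. The sketch below assumes values like "ZCTA5 01001"; the vector `zip_raw` is made up, and one entry is deliberately malformed.

```r
# Hypothetical sample values; the last one is deliberately malformed.
zip_raw <- c("ZCTA5 01001", "ZCTA5 90210", "ZCTA5 606")

# Capture the trailing digits after the last whitespace character.
zip <- sub("^.*\\s(\\d+)$", "\\1", zip_raw)

# Flag anything that is not exactly five digits and set it to NA.
ok <- grepl("^\\d{5}$", zip)
zip[!ok] <- NA_character_

zip
```

Rows that come back as NA can then be inspected in the original column instead of silently carrying a bad code forward.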