I have a dataset on population in US counties. I want to add a column for the state each county is in and one for the state code. Both are already available in the dataset, but hidden inside other columns.
For instance, from the output we can see that the first observation says NAME = "Ada County, Idaho" and GEOID = "16001". I want one column with State = "Idaho" and one column with StateID = "16".
Thank you!
structure(list(NAME = c("Ada County, Idaho", "Ada County, Idaho",
"Ada County, Idaho", "Ada County, Idaho", "Ada County, Idaho",
"Ada County, Idaho"), GEOID = c("16001", "16001", "16001", "16001",
"16001", "16001"), year = c("2007", "2007", "2007", "2007", "2007",
"2007"), POP25 = c(205888, 205888, 205888, 205888, 205888, 205888
), EMPLOY25 = c(205888, 208506, 212770, 212272, 216058, 220856
)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L), groups = structure(list(NAME = "Ada County, Idaho", GEOID = "16001",
.rows = structure(list(1:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE))
CodePudding user response:
Perhaps this helps: remove the substring in 'NAME' up to and including the comma followed by one or more spaces (\\s+) to create the 'State' column, and create 'StateID' from the first two characters of the 'GEOID' column using substr.
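library(dplyr)
library(stringr)

df1 %>%
  ungroup() %>%  # drop the NAME/GEOID grouping before adding columns
  mutate(State = str_remove(NAME, ".*,\\s+"),  # strip everything through the comma and the spaces after it
         StateID = substr(GEOID, 1, 2))        # first two characters of GEOID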
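If you would rather not load any packages, the same idea can be sketched in base R. This is only an illustration: the data frame name counties and its second row are assumptions added to show a county that is named after a different state; only the first row comes from the question.

```r
# Hypothetical example data; only the first row is taken from the question.
counties <- data.frame(
  NAME  = c("Ada County, Idaho", "Washington County, Oregon"),
  GEOID = c("16001", "41067"),
  stringsAsFactors = FALSE
)

# Drop everything up to the last comma and any whitespace after it.
counties$State <- sub(".*,\\s*", "", counties$NAME)

# The first two characters of the five-character GEOID are the state code.
counties$StateID <- substr(counties$GEOID, 1, 2)

counties
```

Because ".*" is greedy, the match runs to the last comma in NAME, so a county name that itself contains a comma should still leave only the state.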
CodePudding user response:
Here is an alternative using str_extract and str_sub:
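library(dplyr)
library(stringr)

# state.name is R's built-in vector of the 50 state names.
# Anchor the match at the end of NAME so that a county named after a state
# (e.g. "Washington County, Oregon") still yields the state, not the county.
pattern <- paste0("(", paste(state.name, collapse = "|"), ")$")

df %>%
  mutate(State = str_extract(NAME, pattern),
         StateID = str_sub(GEOID, 1, 2))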
NAME GEOID year POP25 EMPLOY25 State StateID
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
1 Ada County, ~ 16001 2007 205888 205888 Idaho 16
2 Ada County, ~ 16001 2007 205888 208506 Idaho 16
3 Ada County, ~ 16001 2007 205888 212770 Idaho 16
4 Ada County, ~ 16001 2007 205888 212272 Idaho 16
5 Ada County, ~ 16001 2007 205888 216058 Idaho 16
6 Ada County, ~ 16001 2007 205888 220856 Idaho 16
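One thing to watch for: taking the first two characters only works while GEOID is a five-character string, as it is in the question. If the column was ever read in as a number, codes with a leading zero lose it (Autauga County, Alabama, "01001", becomes 1001). A minimal sketch of restoring the padding first, where geoid_num is a hypothetical numeric vector and not part of the original data:

```r
# Hypothetical case: GEOID was imported as numeric, so "01001" became 1001.
geoid_num <- c(1001, 16001)

# Restore the five-digit, zero-padded form before slicing.
geoid_chr <- sprintf("%05d", as.integer(geoid_num))

# Now the first two characters are the state code again.
state_id <- substr(geoid_chr, 1, 2)

state_id
```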