How can I add columns to a data frame with a value determined by values in other columns?

Time:03-31

I have a dataset on population in US counties. I want to add one column for the state each county is in and one for the state code. Both are already available in the dataset, but hidden inside other columns.

For instance, from the output we can see that the first observation says NAME = "Ada County, Idaho" and GEOID = "16001". I want one column with State = "Idaho" and one column with StateID = "16".

Thank you!

structure(list(NAME = c("Ada County, Idaho", "Ada County, Idaho", 
"Ada County, Idaho", "Ada County, Idaho", "Ada County, Idaho", 
"Ada County, Idaho"), GEOID = c("16001", "16001", "16001", "16001", 
"16001", "16001"), year = c("2007", "2007", "2007", "2007", "2007", 
"2007"), POP25 = c(205888, 205888, 205888, 205888, 205888, 205888
), EMPLOY25 = c(205888, 208506, 212770, 212272, 216058, 220856
)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-6L), groups = structure(list(NAME = "Ada County, Idaho", GEOID = "16001", 
    .rows = structure(list(1:6), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE)) 

CodePudding user response:

Perhaps this helps: to create 'State', remove the substring in 'NAME' up to and including the comma and the one or more whitespace characters that follow it (\\s+). To create 'StateID', take the first two characters of the 'GEOID' column using substr.

library(dplyr)
library(stringr)
df1 %>% 
  ungroup() %>%
  mutate(State = str_remove(NAME, ".*,\\s+"), 
         StateID = substr(GEOID, 1, 2))
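
To see what the pattern does without the full data frame, here is a minimal sketch on a small toy vector. It uses base R's sub, which should behave the same as str_remove for this pattern. The West Virginia row is an illustrative value that is not in the question's data, and the names name, geoid, state and state_id are made up for this example.

```r
# Toy inputs; the second element is illustrative and not from the question's data
name  <- c("Ada County, Idaho", "Kanawha County, West Virginia")
geoid <- c("16001", "54039")

# ".*," is greedy, so it consumes everything up to the last comma;
# "\\s+" then consumes the whitespace that follows it
state <- sub(".*,\\s+", "", name)

# GEOID must stay a character vector: substr() on a number would lose
# any leading zero in the code
state_id <- substr(geoid, 1, 2)

data.frame(name, state, state_id)
```

Because the pattern keys on the comma rather than on a list of known states, it also handles multi-word state names, but it assumes every NAME has the form "County, State".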

CodePudding user response:

Here is an alternative using str_extract and str_sub:

library(dplyr)
library(stringr)
pattern <- paste(state.name, collapse="|")

df %>% 
  mutate(State = str_extract(NAME, pattern),
         StateID = str_sub(GEOID, 1, 2))

Output:

  NAME          GEOID year   POP25 EMPLOY25 State StateID
  <chr>         <chr> <chr>  <dbl>    <dbl> <chr> <chr>  
1 Ada County, ~ 16001 2007  205888   205888 Idaho 16     
2 Ada County, ~ 16001 2007  205888   208506 Idaho 16     
3 Ada County, ~ 16001 2007  205888   212770 Idaho 16     
4 Ada County, ~ 16001 2007  205888   212272 Idaho 16     
5 Ada County, ~ 16001 2007  205888   216058 Idaho 16     
6 Ada County, ~ 16001 2007  205888   220856 Idaho 16  
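
One caveat with matching against state.name: str_extract returns the first match anywhere in the string, so a county that is itself named after a state can yield the wrong result. A possible workaround, sketched here under the assumption that the state always comes last in NAME, is to anchor the alternation to the end of the string. The name pattern_end and the test vector x are hypothetical and not part of the answer above.

```r
library(stringr)

# Unanchored pattern, as in the answer above
pattern <- paste(state.name, collapse = "|")

# Anchored variant: the state name must sit at the very end of the string
pattern_end <- paste0("(", pattern, ")$")

# Illustrative values; only the first one appears in the question's data
x <- c("Ada County, Idaho",
       "Washington County, Idaho",
       "Kanawha County, West Virginia")

unanchored <- str_extract(x, pattern)      # second element matches "Washington"
anchored   <- str_extract(x, pattern_end)  # second element matches "Idaho"
```

Note that state.name contains only the 50 states, so entries such as the District of Columbia would come back as NA with either pattern.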