I want to take the state name from one data frame, and put it in another based on the county fips code. Below is a snippet of the two
> head(arr)
state_abb fips_state_county_code
1 CO 8001
2 CO 8001
3 CO 8001
4 CO 8001
5 CO 8001
6 CO 8001
> head(countyname)
county countyname
245 8001 Adams County
246 8003 Alamosa County
247 8005 Arapahoe County
248 8007 Archuleta County
249 8009 Baca County
250 8011 Bent County
When I use left_join
, I get a ton of duplicates
xcountyname= left_join(arr, countyname, by =c("county" = "fips_state_county_code"))
> head(xcountyname)
county countyname state_abb
1 8001 Adams County CO
2 8001 Adams County CO
3 8001 Adams County CO
4 8001 Adams County CO
5 8001 Adams County CO
6 8001 Adams County CO
It should only be adding the state column, but it has now gone from 535 rows to over 71k rows. Is there a better way to do this so that the state only fills in the existing rows of the xcountyname data frame?
Edit: arr is a subset of another dataset, that's why there are duplicates. But it still stands that I only want the rows with the state name to transfer with the existing fips codes in xcountyname
CodePudding user response:
Update:
One state abb:
xcountyname= left_join(arr, countyname, by =c("fips_state_county_code"="county")) %>%
distinct()
state_abb fips_state_county_code countyname
1 CO 8001 Adams County
All states abb:
xcountyname= full_join(arr, countyname, by =c("fips_state_county_code"="county")) %>%
distinct()
state_abb fips_state_county_code countyname
1 CO 8001 Adams County
2 <NA> 8003 Alamosa County
3 <NA> 8005 Arapahoe County
4 <NA> 8007 Archuleta County
5 <NA> 8009 Baca County
6 <NA> 8011 Bent County
First answer:
Do a full_join
:
If you want to remove the duplicates, just add distinct()
in the final line:
library(dplyr)
xcountyname= full_join(arr, countyname, by =c("fips_state_county_code"="county"))
state_abb fips_state_county_code countyname
1 CO 8001 Adams County
2 CO 8001 Adams County
3 CO 8001 Adams County
4 CO 8001 Adams County
5 CO 8001 Adams County
6 CO 8001 Adams County
7 <NA> 8003 Alamosa County
8 <NA> 8005 Arapahoe County
9 <NA> 8007 Archuleta County
10 <NA> 8009 Baca County
11 <NA> 8011 Bent County
CodePudding user response:
You only want one state abb for each fips, so use distinct()
left_join(
distinct(arr %>% select(state_abb, county=fips_state_county_code)),
countyname
)