I have two datasets, one for migration inflow to county A from other counties and other for migration outflow from county A to other counties. In order to combine the two data sets as:
Desired output:
Key County State FIPS Inflow Outflow FiscalYear Year
510012012 Accomack County VA 51001 NA 27 2011 - 2012 2012
160012012 Ada County ID 16001 12 18 2011 - 2012 2012
80012012 Adams County CO 8001 22 39 2011 - 2012 2012
80012011 Adams County CO 8001 42 31 2010 - 2011 2011
450032012 Aiken County SC 45003 NA 21 2011 - 2012 2012
120012012 Alachua County FL 12001 433 NA 2011 - 2012 2012
How can I combine the two into one dataset in such a way that I don't have to hardcode each and every common county and state name and FIPS and Year? Missing values would be NA.
The common value between the two data sets is the key
.
My original migration outflow data has 517 observations and migration inflow has 441, thus different number of counties in each dataset.
Sample data:
# People moving out of county A to other counties
inflow_df = structure(list(Origin_FIPS = c(12001L, 8001L, 16001L, 12001L,
8001L, 16001L), Origin_StateName = c("FL", "CO", "ID", "FL",
"CO", "ID"), Origin_Place = c("Alachua County", "Adams County",
"Ada County", "Alachua County", "Adams County", "Ada County"),
InIndividuals = c(433L, 30L, 16L, 381L, 42L, 21L), FiscalYear = c("2011 - 2012",
"2011 - 2012", "2011 - 2012", "2010 - 2011", "2010 - 2011",
"2010 - 2011"), Year = c(2012L, 2012L, 2012L, 2011L, 2011L,
2011L), Key = c(120012012L, 80012012L, 160012012L, 120012011L,
80012011L, 160012011L)), class = "data.frame", row.names = c(NA,
-6L))
# People moving in county A from other counties
outflow_df = structure(list(Dest_FIPS = c(51001L, 16001L, 8001L, 8001L, 45003L
), Dest_StateName = c("VA", "ID", "CO", "CO", "SC"), Dest_Place = c("Accomack County",
"Ada County", "Adams County", "Adams County", "Aiken County"),
OutIndividuals = c(27L, 16L, 39L, 31L, 21L), FiscalYear = c("2011 - 2012",
"2011 - 2012", "2011 - 2012", "2010 - 2011", "2011 - 2012"
), Year = c(2012L, 2012L, 2012L, 2011L, 2012L), Key = c(510012012L,
160012012L, 80012012L, 80012011L, 450032012L)), class = "data.frame", row.names = c(NA,
-5L))
CodePudding user response:
We can collate the two tables by giving them consistent names (presumably Origin_Place
in one should match with Dest_Place
in the other) and then performing a join. full_join
outputs all the keys found in either table, in this case c("Key", "County", "State", "FIPS", "FiscalYear", "Year")
.
I would have expected that the inflow_df
would reflect the counties that are seeing inflows (ie the destinations) and outflow_df
would reflect the counties that have outflows (ie the origins), so it seems possible the table names are swapped in the question.
inflow2 <-
inflow_df %>%
transmute(Key,
County = Origin_Place,
State = Origin_StateName,
FIPS = Origin_FIPS,
Inflow = InIndividuals,
FiscalYear,
Year)
outflow2 <-
outflow_df %>%
transmute(Key,
County = Dest_Place,
State = Dest_StateName,
FIPS = Dest_FIPS,
Outflow = OutIndividuals,
FiscalYear,
Year)
inflow2 %>%
full_join(outflow2)
Result (btw, the desired output doesn't seem consistent with the given data, but I hope this is what you're looking for)
Joining, by = c("Key", "County", "State", "FIPS", "FiscalYear", "Year")
Key County State FIPS Inflow FiscalYear Year Outflow
1 120012012 Alachua County FL 12001 433 2011 - 2012 2012 NA
2 80012012 Adams County CO 8001 30 2011 - 2012 2012 39
3 160012012 Ada County ID 16001 16 2011 - 2012 2012 16
4 120012011 Alachua County FL 12001 381 2010 - 2011 2011 NA
5 80012011 Adams County CO 8001 42 2010 - 2011 2011 31
6 160012011 Ada County ID 16001 21 2010 - 2011 2011 NA
7 510012012 Accomack County VA 51001 NA 2011 - 2012 2012 27
8 450032012 Aiken County SC 45003 NA 2011 - 2012 2012 21