Home > Blockchain >  Combine migration in and out data
Combine migration in and out data

Time:11-02

I have two datasets, one for migration inflow to county A from other counties and other for migration outflow from county A to other counties. In order to combine the two data sets as:

Desired output:

Key         County          State   FIPS    Inflow  Outflow FiscalYear  Year
510012012   Accomack County VA      51001   NA      27      2011 - 2012 2012
160012012   Ada County      ID      16001   12      18      2011 - 2012 2012
80012012    Adams County    CO      8001    22      39      2011 - 2012 2012
80012011    Adams County    CO      8001    42      31      2010 - 2011 2011
450032012   Aiken County    SC      45003   NA      21      2011 - 2012 2012
120012012   Alachua County  FL      12001   433     NA      2011 - 2012 2012

How can I combine the two into one dataset in such a way that I don't have to hardcode each and every common county and state name and FIPS and Year? Missing values would be NA.

The common value between the two data sets is the key.

My original migration outflow data has 517 observations and migration inflow has 441, thus different number of counties in each dataset.

Sample data:

# People moving out of county A to other counties
        inflow_df =  structure(list(Origin_FIPS = c(12001L, 8001L, 16001L, 12001L, 
8001L, 16001L), Origin_StateName = c("FL", "CO", "ID", "FL", 
"CO", "ID"), Origin_Place = c("Alachua County", "Adams County", 
"Ada County", "Alachua County", "Adams County", "Ada County"), 
    InIndividuals = c(433L, 30L, 16L, 381L, 42L, 21L), FiscalYear = c("2011 - 2012", 
    "2011 - 2012", "2011 - 2012", "2010 - 2011", "2010 - 2011", 
    "2010 - 2011"), Year = c(2012L, 2012L, 2012L, 2011L, 2011L, 
    2011L), Key = c(120012012L, 80012012L, 160012012L, 120012011L, 
    80012011L, 160012011L)), class = "data.frame", row.names = c(NA, 
-6L))
        
# People moving in county A from other counties
  outflow_df =  structure(list(Dest_FIPS = c(51001L, 16001L, 8001L, 8001L, 45003L
    ), Dest_StateName = c("VA", "ID", "CO", "CO", "SC"), Dest_Place = c("Accomack County", 
    "Ada County", "Adams County", "Adams County", "Aiken County"), 
        OutIndividuals = c(27L, 16L, 39L, 31L, 21L), FiscalYear = c("2011 - 2012", 
        "2011 - 2012", "2011 - 2012", "2010 - 2011", "2011 - 2012"
        ), Year = c(2012L, 2012L, 2012L, 2011L, 2012L), Key = c(510012012L, 
        160012012L, 80012012L, 80012011L, 450032012L)), class = "data.frame", row.names = c(NA, 
    -5L))

CodePudding user response:

We can collate the two tables by giving them consistent names (presumably Origin_Place in one should match with Dest_Place in the other) and then performing a join. full_join outputs all the keys found in either table, in this case c("Key", "County", "State", "FIPS", "FiscalYear", "Year").

I would have expected that the inflow_df would reflect the counties that are seeing inflows (ie the destinations) and outflow_df would reflect the counties that have outflows (ie the origins), so it seems possible the table names are swapped in the question.

inflow2 <- 
  inflow_df %>%
  transmute(Key,
            County = Origin_Place,
            State  = Origin_StateName,
            FIPS   = Origin_FIPS,
            Inflow = InIndividuals,
            FiscalYear,
            Year)

outflow2 <- 
  outflow_df %>%
  transmute(Key,
            County  = Dest_Place,
            State   = Dest_StateName,
            FIPS    = Dest_FIPS,
            Outflow = OutIndividuals,
            FiscalYear,
            Year)

inflow2 %>%
  full_join(outflow2)

Result (btw, the desired output doesn't seem consistent with the given data, but I hope this is what you're looking for)

Joining, by = c("Key", "County", "State", "FIPS", "FiscalYear", "Year")
        Key          County State  FIPS Inflow  FiscalYear Year Outflow
1 120012012  Alachua County    FL 12001    433 2011 - 2012 2012      NA
2  80012012    Adams County    CO  8001     30 2011 - 2012 2012      39
3 160012012      Ada County    ID 16001     16 2011 - 2012 2012      16
4 120012011  Alachua County    FL 12001    381 2010 - 2011 2011      NA
5  80012011    Adams County    CO  8001     42 2010 - 2011 2011      31
6 160012011      Ada County    ID 16001     21 2010 - 2011 2011      NA
7 510012012 Accomack County    VA 51001     NA 2011 - 2012 2012      27
8 450032012    Aiken County    SC 45003     NA 2011 - 2012 2012      21
  • Related