Combine migration in and out data by different common columns-CodePudding

I have two datasets, one for migration inflow to county A from other counties and other for migration outflow from county A to other counties. In order to combine the two data sets as:

County State Key Inflow Outflow Year

The common columns between the two datasets are Origin_Place, Origin_StateName and Year in migration inflow and Dest_place, Dest_StateName and Year in migration outflow.

Part of the problem is that the common columns have unequal number of rows. Another problem is that different counties can belong to the same state as shown in the dummy data below. So, what I am thinking is to have a unique key (Key) for both destination and origin county by concatenating FIPS (unique for each county) and Year. That way I can combine the counties with their respective states and the rest of the associated columns values in a single row.

How can I combine the two into one dataset in such a way that I don't have to hardcode each and every common county and state name and FIPS and Year? Missing values would be NA.

My original migration outflow data has 517 observations and migration inflow has 441, thus different number of counties in each dataset.

Desired output:

County  State   Key     Inflow      Outflow Year
A       FL      12019   111         223     2019
A       FL      12019   8888        224     2019
A       FL      12019   NA          2333    2019
A       FL      12019   NA          4444    2019
A       FL      12019   NA          5555    2019
A       FL      12019   NA          6666    2019
A       FL      12019   NA          7777    2019
A       FL      12020   9999        NA      2020
B       BB      22019   223         NA      2019
C       CC      32019   224         NA      2019
D       FL      42019   2333        111     2019
E       FL      52019   4444        8888    2019
F       FL      62019   5555        9999    2020
G       GG      72019   6666        NA      2019
H       HH      82019   7777        NA      2019

Dummy data:

# People moving out of county A to other counties
Origin_Place = c("A", "A", "A", "A", "A", "A", "A")

FIPS_Origin_County = c(1, 1, 1, 1, 1, 1, 1)

Origin_StateName = c("FL", "FL", "FL", "FL", "FL", "FL", "FL")

Individuals = c(223, 224, 2333, 4444, 5555, 6666, 7777)

Dest_place = c("B", "C", "D", "E", "F", "G", "H")

FIPS_Dest_County = c(2, 3, 4, 5, 6, 7, 8)

Dest_StateName = c("BB", "CC", "FL", "FL", "FL", "GG", "HH")

Year = c(2019, 2019, 2019, 2019, 2020, 2020, 2020)

Outflow_df = data.frame(Origin_County_Name, FIPS_Origin_County,  Origin_StateName,  Individuals, Dest_place, FIPS_Dest_County, Dest_StateName, Year)

# People moving in county A from other counties
Origin_Place = c("D", "E", "F")

FIPS_Origin_County = c(5, 6, 7)    

Origin_StateName = c("FL", "FL", "FL")    

Individuals = c(111, 8888, 9999)

Dest_place = c("A", "A", "A")

FIPS_Dest_County = c(1, 1, 1)

Dest_StateName = c("FL", "FL", "FL")

Year = c(2019, 2019, 2020)

Inflow_df = data.frame(Origin_Place, FIPS_Origin_County,  Origin_StateName,  Individuals, Dest_County_Name, FIPS_Dest_County,  Dest_StateName, Year)

CodePudding user response：

Perhaps this helps

library(dplyr)
library(tidyr)
library(stringr)
library(data.table)
bind_rows(lst(Inflow_df, Outflow_df), .id = 'datname') %>% 
  pivot_longer(cols = contains("_"), names_to = ".value",
   names_pattern = ".*_([^_] $)") %>% 
  mutate(Key = str_c(County, Year), rn = rowid(Key, datname))  %>% 
  pivot_wider(names_from = datname, values_from = Individuals) %>% 
  arrange(rn) %>% 
  select(-rn)