Combine migration in and out data by county-CodePudding

I have two datasets, one for migration inflow to county A from other counties and other for migration outflow from county A to other counties. In order to combine the two data sets as:

County Inflow Outflow

The common column between the two datasets is Origin_Place in migration inflow and Dest_place in migration outflow. Part of the problem is that Origin_Place and Dest_place has unequal number of rows.

How can combine the two into one dataset in such a way that I don't have to hardcode each and every common county name?

My original migration outflow data has 517 observations and migration inflow has 441, thus different number of counties in each dataset.

Dummy data:

# People moving out of county A to other counties
Origin_County_Name = c("A", "A", "A", "A", "A", "A", "A")

Individuals = c(223, 224, 2333, 4444, 5555, 6666, 7777)

Dest_place = c("B", "C", "D", "E", "F", "G", "H")

Outflow_df = data.frame(Origin_County_Name, Individuals, Dest_place)

# People moving in county A from other counties
Origin_Place = c("D", "E", "F")

Individuals = c(111, 8888, 9999)

Dest_County_Name = c("A", "A", "A")

Inflow_df = data.frame(Origin_Place, Individuals, Dest_County_Name)

CodePudding user response：

I think this is what you are after?

names(Outflow_df)[names(Outflow_df) == "Dest_place"] <- "County"
names(Outflow_df)[names(Outflow_df) == "Individuals"] <- "Outflow"
names(Inflow_df)[names(Inflow_df) == "Origin_Place"] <- "County"
names(Inflow_df)[names(Inflow_df) == "Individuals"] <- "Inflow"

merge(Outflow_df[, c("County", "Outflow")], Inflow_df[, c("County", "Inflow")], all = TRUE)

CodePudding user response：

You need to do an outer merge. Use the all=T argument for the merge() function

dfNew <- merge(Outflow_df,Inflow_df,by.x='Dest_place',by.y='Origin_Place',all=TRUE)
dfNew <- dfNew[,c('Dest_place','Individuals.y','Individuals.x')] 
dfNew <- setnames(dfNew,old=c('Dest_place','Individuals.y','Individuals.x'),
              new=c('County','Inflow','Outflow'))
dfNew[is.na(dfNew)] <- ''