I have two datasets, one for migration inflow
to county A
from other counties and other for migration outflow
from county A
to other counties. In order to combine the two data sets as:
County Inflow Outflow
The common column between the two datasets is Origin_Place
in migration inflow
and Dest_place
in migration outflow
. Part of the problem is that Origin_Place
and Dest_place
has unequal number of rows.
How can combine the two into one dataset in such a way that I don't have to hardcode each and every common county name?
My original migration outflow data has 517 observations and migration inflow has 441, thus different number of counties in each dataset.
Dummy data:
# People moving out of county A to other counties
Origin_County_Name = c("A", "A", "A", "A", "A", "A", "A")
Individuals = c(223, 224, 2333, 4444, 5555, 6666, 7777)
Dest_place = c("B", "C", "D", "E", "F", "G", "H")
Outflow_df = data.frame(Origin_County_Name, Individuals, Dest_place)
# People moving in county A from other counties
Origin_Place = c("D", "E", "F")
Individuals = c(111, 8888, 9999)
Dest_County_Name = c("A", "A", "A")
Inflow_df = data.frame(Origin_Place, Individuals, Dest_County_Name)
CodePudding user response:
I think this is what you are after?
names(Outflow_df)[names(Outflow_df) == "Dest_place"] <- "County"
names(Outflow_df)[names(Outflow_df) == "Individuals"] <- "Outflow"
names(Inflow_df)[names(Inflow_df) == "Origin_Place"] <- "County"
names(Inflow_df)[names(Inflow_df) == "Individuals"] <- "Inflow"
merge(Outflow_df[, c("County", "Outflow")], Inflow_df[, c("County", "Inflow")], all = TRUE)
CodePudding user response:
You need to do an outer merge. Use the all=T
argument for the merge() function
dfNew <- merge(Outflow_df,Inflow_df,by.x='Dest_place',by.y='Origin_Place',all=TRUE)
dfNew <- dfNew[,c('Dest_place','Individuals.y','Individuals.x')]
dfNew <- setnames(dfNew,old=c('Dest_place','Individuals.y','Individuals.x'),
new=c('County','Inflow','Outflow'))
dfNew[is.na(dfNew)] <- ''