Below is the sample code. Seems simple enough but what I find online makes this far too complex. I am simply wanting to create a third data frame that contains one column.. the areas that common to firstdf and seconddf.
area1 <- c("001","005","007","009","011","013","015","017","019","021","023","025")
Employment1 <- c(2,4,6,8,110,12,14,15,16,17,12,20)
firstdf <- data.frame(area1,Employment1)
area2 <- c("005","007","011","013","015","021","027","033")
Employment2 <- c(100,101,102,103,104,111,321,522)
seconddf <- data.frame(area2,Employment2)
intended result
thirddf =>
area
005
007
011
013
015
021
CodePudding user response:
We can use intersect
:
data.frame(area = intersect(firstdf$area1, seconddf$area2))
# area
# 1 005
# 2 007
# 3 011
# 4 013
# 5 015
# 6 021
though there's no need for the frame, it can easily be intersect(..)
without the wrapping data.frame(.)
.
Candidly, though, it is common to want to preserve the other columns from either or both frames. Some thoughts:
### rows where `area` is in both
merge(firstdf, seconddf, by.x = "area1", by.y = "area2")
# area1 Employment1 Employment2
# 1 005 4 100
# 2 007 6 101
# 3 011 110 102
# 4 013 12 103
# 5 015 14 104
# 6 021 17 111
### always keep first frame
merge(firstdf, seconddf, by.x = "area1", by.y = "area2", all.x = TRUE)
# area1 Employment1 Employment2
# 1 001 2 NA
# 2 005 4 100
# 3 007 6 101
# 4 009 8 NA
# 5 011 110 102
# 6 013 12 103
# 7 015 14 104
# 8 017 15 NA
# 9 019 16 NA
# 10 021 17 111
# 11 023 12 NA
# 12 025 20 NA
### always keep second frame
merge(firstdf, seconddf, by.x = "area1", by.y = "area2", all.y = TRUE)
# area1 Employment1 Employment2
# 1 005 4 100
# 2 007 6 101
# 3 011 110 102
# 4 013 12 103
# 5 015 14 104
# 6 021 17 111
# 7 027 NA 321
# 8 033 NA 522
### keep all rows, NAs abound
merge(firstdf, seconddf, by.x = "area1", by.y = "area2", all = TRUE)
# area1 Employment1 Employment2
# 1 001 2 NA
# 2 005 4 100
# 3 007 6 101
# 4 009 8 NA
# 5 011 110 102
# 6 013 12 103
# 7 015 14 104
# 8 017 15 NA
# 9 019 16 NA
# 10 021 17 111
# 11 023 12 NA
# 12 025 20 NA
# 13 027 NA 321
# 14 033 NA 522
There are dplyr
-variants of the merge(..)
calls above that might be easier to read if you're already learning/using that dialect. The joining is over-kill if you never want all of the other columns, but I offer it in case that's the direction you're headed.
CodePudding user response:
library(dplyr)
tibble(intersect(firstdf$area1, seconddf$area2))
1 005
2 007
3 011
4 013
5 015
6 021