I am trying to figure out how to work with Joins in dplyr. When I join a and b with full_join, I get 4 states with missing values for statefips.
- Is there a better way to join, and avoid this problem altogether, without loosing any data?
- Can I add the statefips after joining a and b (the real a and b contains 4000 rows)?
library(tidyverse)
# create df's
a <- tibble::tribble(
~statename, ~statefips, ~date, ~emp,
"Alabama", 1, "2020-01-14", 2,
"California", 6, "2020-01-14", 2,
"Alabama", 1, "2020-01-15", 2,
"California", 6, "2020-01-15", 2,
"Alabama", 1, "2020-01-16", 3,
"California", 6, "2020-01-16", 3,
"Alabama", 1, "2020-01-17", 3,
"California", 6, "2020-01-17", 3,
"Alabama", 1, "2020-01-18", 4,
"California", 6, "2020-01-18", 4,
"Alabama", 1, "2020-01-19", 4,
"California", 6, "2020-01-19", 4,
"Alabama", 1, "2020-01-20", 4,
"California", 6, "2020-01-20", 5,
"Alabama", 1, "2020-01-21", 5,
"California", 6, "2020-01-21", 5,
"Alabama", 1, "2020-01-22", 5,
"California", 6, "2020-01-22", 5,
"Alabama", 1, "2020-01-21", 5,
"California", 6, "2020-01-21", 4,
"Alabama", 1, "2020-01-22", 4,
"California", 6, "2020-01-22", 4,
"Alabama", 1, "2020-01-23", 4,
"California", 6, "2020-01-23", 4,
"Alabama", 1, "2020-01-24", 4,
"California", 6, "2020-01-24", 4
)
b <- tibble::tribble(
~statename, ~date, ~ui_claims,
"Alabama", "2020-01-04", "0.5",
"California", "2020-01-04", "0.5",
"Alabama", "2020-01-11", "0.5",
"California", "2020-01-11", "2.5",
"Alabama", "2020-01-18", "2.5",
"California", "2020-01-18", "1.5"
)
# Join a and b
full_join <- full_join(a, b, by = c("statename", "date")) %>% arrange(date)
# my try to fix missing NA's (doesn't work)
state_id <- tibble::tribble(
~statename, ~statefips,
"Alabama", 1,
"California", 6
)
full_join_fix <- full_join(full_join, state_id, by = "statename") %>% arrange(date)
CodePudding user response:
I am not quite sure, if this is what you are lookung for, but after a full_join
we could arrange
and then fill
:
library(dplyr)
library(tidyr)
a %>%
full_join(b, by = c("statename", "date")) %>%
arrange(statename) %>%
fill(statefips, .direction = "down") %>%
print(n=40)
statename statefips date emp ui_claims
<chr> <dbl> <chr> <dbl> <chr>
1 Alabama 1 2020-01-14 2 NA
2 Alabama 1 2020-01-15 2 NA
3 Alabama 1 2020-01-16 3 NA
4 Alabama 1 2020-01-17 3 NA
5 Alabama 1 2020-01-18 4 2.5
6 Alabama 1 2020-01-19 4 NA
7 Alabama 1 2020-01-20 4 NA
8 Alabama 1 2020-01-21 5 NA
9 Alabama 1 2020-01-22 5 NA
10 Alabama 1 2020-01-21 5 NA
11 Alabama 1 2020-01-22 4 NA
12 Alabama 1 2020-01-23 4 NA
13 Alabama 1 2020-01-24 4 NA
14 Alabama 1 2020-01-04 NA 0.5
15 Alabama 1 2020-01-11 NA 0.5
16 California 6 2020-01-14 2 NA
17 California 6 2020-01-15 2 NA
18 California 6 2020-01-16 3 NA
19 California 6 2020-01-17 3 NA
20 California 6 2020-01-18 4 1.5
21 California 6 2020-01-19 4 NA
22 California 6 2020-01-20 5 NA
23 California 6 2020-01-21 5 NA
24 California 6 2020-01-22 5 NA
25 California 6 2020-01-21 4 NA
26 California 6 2020-01-22 4 NA
27 California 6 2020-01-23 4 NA
28 California 6 2020-01-24 4 NA
29 California 6 2020-01-04 NA 0.5
30 California 6 2020-01-11 NA 2.5