Home > database >  How do I change / avoid NA's in column with dplyr Joins?
How do I change / avoid NA's in column with dplyr Joins?

Time:10-04

I am trying to figure out how to work with Joins in dplyr. When I join a and b with full_join, I get 4 states with missing values for statefips.

  1. Is there a better way to join, and avoid this problem altogether, without loosing any data?
  2. Can I add the statefips after joining a and b (the real a and b contains 4000 rows)?
library(tidyverse)


# create df's
a <- tibble::tribble(
                                                        ~statename, ~statefips,        ~date, ~emp,
                                                         "Alabama",          1, "2020-01-14",    2,
                                                      "California",          6, "2020-01-14",    2,
                                                         "Alabama",          1, "2020-01-15",    2,
                                                      "California",          6, "2020-01-15",    2,
                                                         "Alabama",          1, "2020-01-16",    3,
                                                      "California",          6, "2020-01-16",    3,
                                                         "Alabama",          1, "2020-01-17",    3,
                                                      "California",          6, "2020-01-17",    3,
                                                         "Alabama",          1, "2020-01-18",    4,
                                                      "California",          6, "2020-01-18",    4,
                                                         "Alabama",          1, "2020-01-19",    4,
                                                      "California",          6, "2020-01-19",    4,
                                                         "Alabama",          1, "2020-01-20",    4,
                                                      "California",          6, "2020-01-20",    5,
                                                         "Alabama",          1, "2020-01-21",    5,
                                                      "California",          6, "2020-01-21",    5,
                                                         "Alabama",          1, "2020-01-22",    5,
                                                      "California",          6, "2020-01-22",    5,
                                                         "Alabama",          1, "2020-01-21",    5,
                                                      "California",          6, "2020-01-21",    4,
                                                         "Alabama",          1, "2020-01-22",    4,
                                                      "California",          6, "2020-01-22",    4,
                                                         "Alabama",          1, "2020-01-23",    4,
                                                      "California",          6, "2020-01-23",    4,
                                                         "Alabama",          1, "2020-01-24",    4,
                                                      "California",          6, "2020-01-24",    4
                                                      )
b <- tibble::tribble(
                                                        ~statename,        ~date, ~ui_claims,
                                                         "Alabama", "2020-01-04",      "0.5",
                                                      "California", "2020-01-04",      "0.5",
                                                         "Alabama", "2020-01-11",      "0.5",
                                                      "California", "2020-01-11",      "2.5",
                                                         "Alabama", "2020-01-18",      "2.5",
                                                      "California", "2020-01-18",      "1.5"
                                                      )
# Join a and b
full_join <- full_join(a, b, by = c("statename", "date")) %>% arrange(date)

# my try to fix missing NA's (doesn't work)

state_id <- tibble::tribble(
                                                        ~statename, ~statefips,
                                                         "Alabama",          1,
                                                      "California",          6
                                                      )

full_join_fix <- full_join(full_join, state_id, by = "statename") %>% arrange(date)

CodePudding user response:

I am not quite sure, if this is what you are lookung for, but after a full_join we could arrange and then fill:

library(dplyr)
library(tidyr)

a %>% 
  full_join(b, by = c("statename", "date")) %>% 
  arrange(statename) %>% 
  fill(statefips, .direction = "down") %>% 
  print(n=40)
   statename  statefips date         emp ui_claims
   <chr>          <dbl> <chr>      <dbl> <chr>    
 1 Alabama            1 2020-01-14     2 NA       
 2 Alabama            1 2020-01-15     2 NA       
 3 Alabama            1 2020-01-16     3 NA       
 4 Alabama            1 2020-01-17     3 NA       
 5 Alabama            1 2020-01-18     4 2.5      
 6 Alabama            1 2020-01-19     4 NA       
 7 Alabama            1 2020-01-20     4 NA       
 8 Alabama            1 2020-01-21     5 NA       
 9 Alabama            1 2020-01-22     5 NA       
10 Alabama            1 2020-01-21     5 NA       
11 Alabama            1 2020-01-22     4 NA       
12 Alabama            1 2020-01-23     4 NA       
13 Alabama            1 2020-01-24     4 NA       
14 Alabama            1 2020-01-04    NA 0.5      
15 Alabama            1 2020-01-11    NA 0.5      
16 California         6 2020-01-14     2 NA       
17 California         6 2020-01-15     2 NA       
18 California         6 2020-01-16     3 NA       
19 California         6 2020-01-17     3 NA       
20 California         6 2020-01-18     4 1.5      
21 California         6 2020-01-19     4 NA       
22 California         6 2020-01-20     5 NA       
23 California         6 2020-01-21     5 NA       
24 California         6 2020-01-22     5 NA       
25 California         6 2020-01-21     4 NA       
26 California         6 2020-01-22     4 NA       
27 California         6 2020-01-23     4 NA       
28 California         6 2020-01-24     4 NA       
29 California         6 2020-01-04    NA 0.5      
30 California         6 2020-01-11    NA 2.5 
  • Related