Is there a function in R to fill in missing data in a variable-CodePudding

I have a set of panel data, df1, that take the following form:

df1:

year     state          eligibility      coverage     
1990       AL             .87              .70
1991       AL             .78              .61
1992       AL             .82              .63
1993       AL             .79              .69
1994       AL             .82              .73
1990       AK             .91              .88
1991       AK             .83              .79
1992       AK             .82              .71
1993       AK             .77              .69
1994       AK             .82              .73

I need to add a variable "professionalism" from a different set of data, df2, but the problem is that df2 only has observations measured on even years. df2 thus takes the following form:

df2:

year     state          professionalism 
1990       AL             1.33           
1992       AL             1.40             
1994       AL             1.42             
1990       AK            -0.92             
1992       AK            -0.98              
1994       AK            -1.02

Is there a function R that will add the odd years into df2, copying the value of the year 1, producing the following output:

df2':

year     state          professionalism 
1990       AL             1.33 
1991       AL             1.40                       
1992       AL             1.40 
1993       AL             1.42            
1994       AL             1.42             
1990       AK            -0.92  
1991       AK            -0.98            
1992       AK            -0.98
1993       AK            -1.02               
1994       AK            -1.02

I can then merge the professionalism variable from the new df2' to df1... is this possible?

CodePudding user response：

We can use complete with fill

library(dplyr)
library(tidyr)
df2 %>%
   complete(year = 1990:1994, state) %>%
   group_by(state) %>%
   fill(professionalism, .direction = "updown") %>%
   ungroup %>%
   arrange(state, year)

-output

# A tibble: 10 x 3
    year state professionalism
   <int> <chr>           <dbl>
 1  1990 AK              -0.92
 2  1991 AK              -0.98
 3  1992 AK              -0.98
 4  1993 AK              -1.02
 5  1994 AK              -1.02
 6  1990 AL               1.33
 7  1991 AL               1.4 
 8  1992 AL               1.4 
 9  1993 AL               1.42
10  1994 AL               1.42

data

df2 <- structure(list(year = c(1990L, 1992L, 1994L, 1990L, 1992L, 1994L
), state = c("AL", "AL", "AL", "AK", "AK", "AK"), professionalism = c(1.33, 
1.4, 1.42, -0.92, -0.98, -1.02)), class = "data.frame", row.names = c(NA, 
-6L))

CodePudding user response：

I think the easiest way to do it is to create a new column in df1 that rounds the year to an even value, and then left_join the data from df2:

library(tidyverse)

#Setting up example data
df1 <- tribble(
~year, ~state,   ~eligibility,   ~coverage,     
1990,  "AL",     .87,            .70,
1991,  "AL",     .78,            .61,
1992,  "AL",     .82,            .63,
1993,  "AL",     .79,            .69,
1994,  "AL",     .82,            .73,
1990,  "AK",     .91,            .88,
1991,  "AK",     .83,            .79,
1992,  "AK",     .82,            .71,
1993,  "AK",     .77,            .69,
1994,  "AK",     .82,            .73)

df2 <- tribble(
~year, ~state, ~professionalism,
1990,   "AL",             1.33,         
1992,   "AL",             1.40,           
1994,   "AL",             1.42,           
1990,   "AK",            -0.92,           
1992,   "AK",            -0.98,            
1994,   "AK",            -1.02)

#Create a "year even" variable in df1, then left join from df2
df1 <- df1 %>% mutate(year_even = ceiling(year/2)*2)
df1 <- left_join(df1, df2, by = c("year_even" = "year", "state" = "state"))