Home > database >  R dplyr full_join(x, y): fill NA with values from x
R dplyr full_join(x, y): fill NA with values from x

Time:06-21

I would like to joint two data frames:

library(dplyr)

set.seed(666)
x <- data.frame(id = 1:10, c1 = rnorm(10), c2 = rnorm(10), c3 = rnorm(10))
y <- data.frame(id = 1:10,  c1 = rnorm(10))

joined <- x |>
    full_join(y) |>
    arrange(id)

What is an elegant way to fill NAs of the new rows from y with the values from the columns of x?

target:

   id          c1          c2          c3
1   1  0.75331105  2.15004262 -0.69209929
2   1  0.75499616  2.15004262 -0.69209929
3   2  2.01435467 -1.77023084 -1.18304354
4   2 -0.64148890 -1.77023084 -1.18304354
                    ...

EDIT: tidyr::fill() works fine but appears to be extremely slow on moderate-large data sets (e.g. >100k rows, >20cols). I would be happy to see a data.table alternative

CodePudding user response:

Add a another more lines of code.

choose the columns you need to fill.

Edit: grouping isn't necessary in this situation. Since each new group starts with a number.

joined <- x |>
  full_join(y) |>
  arrange(id) |> 
  fill(c2:c3, .direction = "down")

 id      c1      c2      c3
   <int>   <dbl>   <dbl>   <dbl>
 1     1 -0.0822  1.18   -0.889 
 2     1  1.58    1.18   -0.889 
 3     2  0.120   0.0288  0.278 
 4     2  1.64    0.0288  0.278 
 5     3  0.0213 -0.166  -1.20  
 6     3 -0.404  -0.166  -1.20  
 7     4 -0.274  -1.53   -0.660 
 8     4 -0.0456 -1.53   -0.660 
 9     5 -0.881  -0.335  -1.02  
10     5 -2.47   -0.335  -1.02  

  • Related