I would like to joint two data frames:
library(dplyr)
set.seed(666)
x <- data.frame(id = 1:10, c1 = rnorm(10), c2 = rnorm(10), c3 = rnorm(10))
y <- data.frame(id = 1:10, c1 = rnorm(10))
joined <- x |>
full_join(y) |>
arrange(id)
What is an elegant way to fill NAs of the new rows from y
with the values from the columns of x
?
target:
id c1 c2 c3
1 1 0.75331105 2.15004262 -0.69209929
2 1 0.75499616 2.15004262 -0.69209929
3 2 2.01435467 -1.77023084 -1.18304354
4 2 -0.64148890 -1.77023084 -1.18304354
...
EDIT: tidyr::fill() works fine but appears to be extremely slow on moderate-large data sets (e.g. >100k rows, >20cols). I would be happy to see a data.table alternative
CodePudding user response:
Add a another more lines of code.
choose the columns you need to fill.
Edit: grouping isn't necessary in this situation. Since each new group starts with a number.
joined <- x |>
full_join(y) |>
arrange(id) |>
fill(c2:c3, .direction = "down")
id c1 c2 c3
<int> <dbl> <dbl> <dbl>
1 1 -0.0822 1.18 -0.889
2 1 1.58 1.18 -0.889
3 2 0.120 0.0288 0.278
4 2 1.64 0.0288 0.278
5 3 0.0213 -0.166 -1.20
6 3 -0.404 -0.166 -1.20
7 4 -0.274 -1.53 -0.660
8 4 -0.0456 -1.53 -0.660
9 5 -0.881 -0.335 -1.02
10 5 -2.47 -0.335 -1.02