I am trying to "clean" a dataset that has many "empty" rows deleted, however, I want these empty rows back (and adding NA
). Here is a toy dataset:
values <- rnorm(12)
data <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 5),
event = c("A", "B", "C", "A", "B", "A", "B", "C", "B", "A", "B", "C"),
value = values) #values are random
What I want is to insert rows that are missing, i.e. ID 2 is missing group C, and 4 is missing A and C. And the expected result is as follows:
data_expanded <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5),
event = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"),
value = c(values[1:5], NA, values[6:8], NA, values[9], NA, values[10:12]))
The rows with NA
can be added at the end of the data frame (not necessarily to be grouped as in the example I provided). My real dataset has many rows, therefore, a method that is memory-efficient is highly appreciated. I do prefer the method using R, tidyr (or tidyverse).
CodePudding user response:
tidyr::complete()
does exactly what you want:
library(tidyr)
values <- rnorm(12)
data <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 5),
event = c("A", "B", "C", "A", "B", "A", "B", "C", "B", "A", "B", "C"),
value = values) #values are random
data |>
complete(ID, event)
#> # A tibble: 15 × 3
#> ID event value
#> <dbl> <chr> <dbl>
#> 1 1 A 0.397
#> 2 1 B -0.595
#> 3 1 C 0.743
#> 4 2 A -0.0421
#> 5 2 B 1.47
#> 6 2 C NA
#> 7 3 A 0.218
#> 8 3 B -0.525
#> 9 3 C 1.05
#> 10 4 A NA
#> 11 4 B -1.79
#> 12 4 C NA
#> 13 5 A 1.18
#> 14 5 B -1.39
#> 15 5 C 0.748
Created on 2022-12-12 with reprex v2.0.2