Home > Blockchain >  Expanding a column with row of NA when there is no match in R
Expanding a column with row of NA when there is no match in R

Time:12-13

I am trying to "clean" a dataset that has many "empty" rows deleted, however, I want these empty rows back (and adding NA). Here is a toy dataset:

values <- rnorm(12)
data <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 5),
                   event = c("A", "B", "C", "A", "B", "A", "B", "C", "B", "A", "B", "C"),
                   value = values) #values are random

What I want is to insert rows that are missing, i.e. ID 2 is missing group C, and 4 is missing A and C. And the expected result is as follows:

data_expanded <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5),
                   event = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"),
                   value = c(values[1:5], NA, values[6:8], NA, values[9], NA, values[10:12]))

The rows with NA can be added at the end of the data frame (not necessarily to be grouped as in the example I provided). My real dataset has many rows, therefore, a method that is memory-efficient is highly appreciated. I do prefer the method using R, tidyr (or tidyverse).

CodePudding user response:

tidyr::complete() does exactly what you want:

library(tidyr)

values <- rnorm(12)
data <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 5),
                   event = c("A", "B", "C", "A", "B", "A", "B", "C", "B", "A", "B", "C"),
                   value = values) #values are random

data |> 
  complete(ID, event)

#> # A tibble: 15 × 3
#>       ID event   value
#>    <dbl> <chr>   <dbl>
#>  1     1 A      0.397 
#>  2     1 B     -0.595 
#>  3     1 C      0.743 
#>  4     2 A     -0.0421
#>  5     2 B      1.47  
#>  6     2 C     NA     
#>  7     3 A      0.218 
#>  8     3 B     -0.525 
#>  9     3 C      1.05  
#> 10     4 A     NA     
#> 11     4 B     -1.79  
#> 12     4 C     NA     
#> 13     5 A      1.18  
#> 14     5 B     -1.39  
#> 15     5 C      0.748

Created on 2022-12-12 with reprex v2.0.2

  • Related