Home > Blockchain >  adding rows in datasets for missing values with R
adding rows in datasets for missing values with R

Time:06-10

I am working with R. i have a list of datasets where each of those sets should have a row length 5 for each month (Jan-May). it should look like this:

data.frame(name = rep("B", 5), 
           doc_month = c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05"), 
           i_name = rep("Aa",5), 
           aggregation = rep("34"), 5)

but some of my datasets dont have data for certain months and therefore have a shorter row length. like this:

data.frame(name = "A", 
           doc_month = "2022.01", 
           i_name = "Aa", 
           aggregation = "34")

I would like to extend each dataset with the specific months , copy all the other information into the row and put a 0 for aggregation.

I tried to use extend and complete by tidyr but couldnt make it work.

CodePudding user response:

With tidyr's complete.

Also tweaked aggregation = rep(34, 5).

library(tidyverse)

df1 <- data.frame(name = rep("B", 5), 
           doc_month = c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05"), 
           i_name = rep("Aa",5), 
           aggregation = rep(34, 5))

df2 <- data.frame(name = "A", 
           doc_month = "2022.01", 
           i_name = "Aa", 
           aggregation = 34)

bind_rows(df1, df2) |> 
  complete(doc_month, nesting(name, i_name), fill = list(aggregation = 0))
#> # A tibble: 10 × 4
#>    doc_month name  i_name aggregation
#>    <chr>     <chr> <chr>        <dbl>
#>  1 2022.01   A     Aa              34
#>  2 2022.01   B     Aa              34
#>  3 2022.02   A     Aa               0
#>  4 2022.02   B     Aa              34
#>  5 2022.03   A     Aa               0
#>  6 2022.03   B     Aa              34
#>  7 2022.04   A     Aa               0
#>  8 2022.04   B     Aa              34
#>  9 2022.05   A     Aa               0
#> 10 2022.05   B     Aa              34

Created on 2022-06-09 by the reprex package (v2.0.1)

CodePudding user response:

You could create a skeleton dataset with the five months and then join it to each of your partial datasets.

library(dplyr)
library(tidyr)

reference <- data.frame(doc_month = c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05"))

data_A |>
        full_join(reference, by = "doc_month") |> 
        mutate(aggregation = replace_na(aggregation, "0"))
  • Related