I am working with R. i have a list of datasets where each of those sets should have a row length 5 for each month (Jan-May). it should look like this:
data.frame(name = rep("B", 5),
doc_month = c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05"),
i_name = rep("Aa",5),
aggregation = rep("34"), 5)
but some of my datasets dont have data for certain months and therefore have a shorter row length. like this:
data.frame(name = "A",
doc_month = "2022.01",
i_name = "Aa",
aggregation = "34")
I would like to extend each dataset with the specific months , copy all the other information into the row and put a 0 for aggregation.
I tried to use extend and complete by tidyr but couldnt make it work.
CodePudding user response:
With tidyr's complete
.
Also tweaked aggregation = rep(34, 5)
.
library(tidyverse)
df1 <- data.frame(name = rep("B", 5),
doc_month = c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05"),
i_name = rep("Aa",5),
aggregation = rep(34, 5))
df2 <- data.frame(name = "A",
doc_month = "2022.01",
i_name = "Aa",
aggregation = 34)
bind_rows(df1, df2) |>
complete(doc_month, nesting(name, i_name), fill = list(aggregation = 0))
#> # A tibble: 10 × 4
#> doc_month name i_name aggregation
#> <chr> <chr> <chr> <dbl>
#> 1 2022.01 A Aa 34
#> 2 2022.01 B Aa 34
#> 3 2022.02 A Aa 0
#> 4 2022.02 B Aa 34
#> 5 2022.03 A Aa 0
#> 6 2022.03 B Aa 34
#> 7 2022.04 A Aa 0
#> 8 2022.04 B Aa 34
#> 9 2022.05 A Aa 0
#> 10 2022.05 B Aa 34
Created on 2022-06-09 by the reprex package (v2.0.1)
CodePudding user response:
You could create a skeleton dataset with the five months and then join it to each of your partial datasets.
library(dplyr)
library(tidyr)
reference <- data.frame(doc_month = c("2022.01", "2022.02", "2022.03", "2022.04", "2022.05"))
data_A |>
full_join(reference, by = "doc_month") |>
mutate(aggregation = replace_na(aggregation, "0"))