Creating a very large data frame with multiple groups in columns and 0 values in R


I want to create a very large data frame that has 4 columns (date, location_name, brand, DIV). The DIV column should be 0 in every row, and the date column should run from 27.12.2020 to the current date.

I created a small version of the desired data frame as an example. Desired data frame:

date           location_name      brand        DIV
27.12.2020         x               a             0
27.12.2020         x               b             0
27.12.2020         y               a             0
27.12.2020         y               b             0
27.12.2020         z               a             0
27.12.2020         z               b             0
28.12.2020         x               a             0
28.12.2020         x               b             0
28.12.2020         y               a             0
28.12.2020         y               b             0
28.12.2020         z               a             0
28.12.2020         z               b             0

I am using this code, but it does not give what I want:

dummy_df <- data.frame(DIV = rep(0, each =6),
                       date = rep('27.12.2020',each=6),
                       location_name = rep(c('x',"y","z"), each=2),
                       brand = rep(c('a',"b"),each=3))     

How can I write this code efficiently?

I appreciate any help.

Answer:

With expand.grid() you can create a data frame containing every combination of location_name and brand for each date (DIV is always zero, as requested).

# every combination of date, location_name and brand; DIV is always 0
dummy_df <- expand.grid(date = seq.Date(as.Date("2020-12-27"), Sys.Date(), by = "day"),
                        location_name = c("x", "y", "z"),
                        brand = c("a", "b"),
                        DIV = 0)
dim(dummy_df)
#> [1] 3294    4
head(dummy_df)
#>         date location_name brand DIV
#> 1 2020-12-27             x     a   0
#> 2 2020-12-28             x     a   0
#> 3 2020-12-29             x     a   0
#> 4 2020-12-30             x     a   0
#> 5 2020-12-31             x     a   0
#> 6 2021-01-01             x     a   0

Created on 2022-06-28 by the reprex package (v2.0.1)
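expand.grid() varies its first argument fastest, so in the result above the dates run down each location_name/brand block, and they print as yyyy-mm-dd. A minimal base R sketch, assuming you want the row order and dd.mm.yyyy format shown in the question, passes the columns in reverse order and then rearranges them. The variable name `dates` is only illustrative, and the `format()` line turns the dates into character strings, so drop it if you need real `Date` values.

```r
# Sketch (assumption: rows should be ordered by date, then location_name, then brand).
dates <- seq(as.Date("2020-12-27"), Sys.Date(), by = "day")

# expand.grid() varies the FIRST argument fastest, so brand goes first
# and date goes last to get the question's row order.
dummy_df <- expand.grid(brand = c("a", "b"),
                        location_name = c("x", "y", "z"),
                        date = dates,
                        stringsAsFactors = FALSE)

# Put the columns in the requested order and add the all-zero DIV column.
dummy_df <- dummy_df[, c("date", "location_name", "brand")]
dummy_df$DIV <- 0

# Optional: show dates as dd.mm.yyyy, as in the question (converts them to character).
dummy_df$date <- format(dummy_df$date, "%d.%m.%Y")

head(dummy_df)
```

The row count is simply 6 times the number of days (3 locations × 2 brands per day), so it grows linearly with the date range.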

Answer:


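library(dplyr)      # for tibble() and %>%
library(lubridate)  # for today()

# Note: sample() picks ONE random location_name and brand per date,
# so this gives one row per day, not every combination.
dummy_df <- tibble(
  date = seq(as.Date("2020-12-27"), today(), by = "day") %>%
    format("%d.%m.%Y"),
  location_name = sample(c("x", "y", "z"), length(date), replace = TRUE),
  brand = sample(c("a", "b", "c"), length(date), replace = TRUE),
  div = 0
)
dummy_df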
#> # A tibble: 549 x 4
#>    date       location_name brand   div
#>    <chr>      <chr>         <chr> <dbl>
#>  1 27.12.2020 x             c         0
#>  2 28.12.2020 x             b         0
#>  3 29.12.2020 y             b         0
#>  4 30.12.2020 y             b         0
#>  5 31.12.2020 z             a         0
#>  6 01.01.2021 z             c         0
#>  7 02.01.2021 y             c         0
#>  8 03.01.2021 z             b         0
#>  9 04.01.2021 z             a         0
#> 10 05.01.2021 z             b         0
#> # ... with 539 more rows

Created on 2022-06-28 by the reprex package (v2.0.1)
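Because the sample() calls above draw one random location_name and brand per date, that result has one row per day (549 rows) rather than every combination. If you prefer the tidyverse, a sketch using tidyr::expand_grid() (assuming tidyr 1.0.0 or later is installed) builds the full grid instead. Unlike expand.grid(), expand_grid() varies its first column slowest, so the rows already come out ordered by date, then location_name, then brand.

```r
library(tidyr)  # expand_grid() (assumed tidyr >= 1.0.0)

# Full grid: every date x location_name x brand, with DIV fixed at 0.
dummy_df <- expand_grid(
  date = seq(as.Date("2020-12-27"), Sys.Date(), by = "day"),
  location_name = c("x", "y", "z"),
  brand = c("a", "b"),
  DIV = 0
)

# Optional: dd.mm.yyyy display format as in the question (converts to character).
dummy_df$date <- format(dummy_df$date, "%d.%m.%Y")

dummy_df
```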
