Home > Back-end >  adding multiple rows based on observation dates
adding multiple rows based on observation dates

Time:05-12

I have data for multiple products, their launch date, and sales; there are other variables as well. But, these are the ones that I am using for manipulation. I want to add a column and a row in my dataset; the new variable/ column indicates months after the product's launch (launch). So, launch 1 will indicate the first month for each product and 2 will be the second month, and so on. I also want to add an observation (row) for each product with launch as 0 and sales as 0.

months<- as.Date(c("2011-04-01", "2011-05-01" , "2011-06-01", 
                 "2012-10-01", "2012-11-01", "2012-12-01",
                 "2011-04-01", "2011-05-01" , "2011-06-01",
                 "2013-06-01", "2013-07-01", "2013-08-01"))


product <- c("A", "A" , "A", 
     "B", "B", "B",
     "C", "C" , "C",
     "D", "D", "D")

sales<- c(75, 78,80,
         67, 65, 75, 
         86, 87, 87,
         90, 92, 94)

#This is how data looks right now..

input_data<- data.frame(months, product, sales)

Right now, I can add the launch column and assign the launch value the same as row_number after group_ by product, and it populates the launch as 1,2,3, etc. based on the months. However, I don't know how to add additional observations for each product. Right now, I am identifying the entry date of each product and creating a data frame with the 0 launch and sales, and binding the dataset. But, it is tedious and I am sure it could be done more efficiently.

#Expected outcome:
#I don't care about the additional dates row too much it can remain as NA, here I added it for making data frame

months1 <- as.Date(c ("2011-03-01",  "2011-04-01", "2011-05-01" , "2011-06-01", 
                     "2012-9-01",  "2012-10-01", "2012-11-01", "2012-12-01" ,
                     " 2011-03-01",  "2011-04-01", "2011-05-01" , "2011-06-01" ,
                  "2013-06-01", "2013-06-01", "2013-07-01", "2013-08-01"))
                 
launch<- c(0, 1, 2, 3,
         0, 1, 2, 3, 
         0, 1, 2, 3,
         0, 1, 2, 3)

product1 <- c("A", "A" , "A", "A",
     "B", "B", "B", "B",
     "C", "C" , "C", "C",
     "D", "D", "D", "D")

sales1<- c(0, 75, 78,80,
         0, 67, 65, 75, 
         0, 86, 87, 87,
         0, 90, 92, 94)

output_data <- data.frame (months1, launch, product1, sales1)

CodePudding user response:

We may use complete to expand the data after grouping by 'product'

library(lubridate)
library(dplyr)
library(tidyr)
input_data %>% 
  group_by(product) %>% 
  complete(months = first(months) %m % months(-1:2), 
    fill = list(sales = 0)) %>%
  mutate(launch = row_number() - 1) %>%
  ungroup %>%
  select(months, launch, product, sales)

-output

# A tibble: 16 × 4
   months     launch product sales
   <date>      <dbl> <chr>   <dbl>
 1 2011-03-01      0 A           0
 2 2011-04-01      1 A          75
 3 2011-05-01      2 A          78
 4 2011-06-01      3 A          80
 5 2012-09-01      0 B           0
 6 2012-10-01      1 B          67
 7 2012-11-01      2 B          65
 8 2012-12-01      3 B          75
 9 2011-03-01      0 C           0
10 2011-04-01      1 C          86
11 2011-05-01      2 C          87
12 2011-06-01      3 C          87
13 2013-05-01      0 D           0
14 2013-06-01      1 D          90
15 2013-07-01      2 D          92
16 2013-08-01      3 D          94
  •  Tags:  
  • r
  • Related