I have data for multiple products, their launch date, and sales; there are other variables as well. But, these are the ones that I am using for manipulation. I want to add a column and a row in my dataset; the new variable/ column indicates months after the product's launch (launch). So, launch 1 will indicate the first month for each product and 2 will be the second month, and so on. I also want to add an observation (row) for each product with launch as 0 and sales as 0.
months<- as.Date(c("2011-04-01", "2011-05-01" , "2011-06-01",
"2012-10-01", "2012-11-01", "2012-12-01",
"2011-04-01", "2011-05-01" , "2011-06-01",
"2013-06-01", "2013-07-01", "2013-08-01"))
product <- c("A", "A" , "A",
"B", "B", "B",
"C", "C" , "C",
"D", "D", "D")
sales<- c(75, 78,80,
67, 65, 75,
86, 87, 87,
90, 92, 94)
#This is how data looks right now..
input_data<- data.frame(months, product, sales)
Right now, I can add the launch column and assign the launch value the same as row_number after group_ by product, and it populates the launch as 1,2,3, etc. based on the months. However, I don't know how to add additional observations for each product. Right now, I am identifying the entry date of each product and creating a data frame with the 0 launch and sales, and binding the dataset. But, it is tedious and I am sure it could be done more efficiently.
#Expected outcome:
#I don't care about the additional dates row too much it can remain as NA, here I added it for making data frame
months1 <- as.Date(c ("2011-03-01", "2011-04-01", "2011-05-01" , "2011-06-01",
"2012-9-01", "2012-10-01", "2012-11-01", "2012-12-01" ,
" 2011-03-01", "2011-04-01", "2011-05-01" , "2011-06-01" ,
"2013-06-01", "2013-06-01", "2013-07-01", "2013-08-01"))
launch<- c(0, 1, 2, 3,
0, 1, 2, 3,
0, 1, 2, 3,
0, 1, 2, 3)
product1 <- c("A", "A" , "A", "A",
"B", "B", "B", "B",
"C", "C" , "C", "C",
"D", "D", "D", "D")
sales1<- c(0, 75, 78,80,
0, 67, 65, 75,
0, 86, 87, 87,
0, 90, 92, 94)
output_data <- data.frame (months1, launch, product1, sales1)
CodePudding user response:
We may use complete
to expand the data after grouping by 'product'
library(lubridate)
library(dplyr)
library(tidyr)
input_data %>%
group_by(product) %>%
complete(months = first(months) %m % months(-1:2),
fill = list(sales = 0)) %>%
mutate(launch = row_number() - 1) %>%
ungroup %>%
select(months, launch, product, sales)
-output
# A tibble: 16 × 4
months launch product sales
<date> <dbl> <chr> <dbl>
1 2011-03-01 0 A 0
2 2011-04-01 1 A 75
3 2011-05-01 2 A 78
4 2011-06-01 3 A 80
5 2012-09-01 0 B 0
6 2012-10-01 1 B 67
7 2012-11-01 2 B 65
8 2012-12-01 3 B 75
9 2011-03-01 0 C 0
10 2011-04-01 1 C 86
11 2011-05-01 2 C 87
12 2011-06-01 3 C 87
13 2013-05-01 0 D 0
14 2013-06-01 1 D 90
15 2013-07-01 2 D 92
16 2013-08-01 3 D 94