I have a dataset in R and I'm trying to fill out two missing values at the same time. I had used the pad
function from library(padr)
to fill out the data frame with missing date values. Now I have two additional fields that are NA
.
I know what these values should be but I don't understand an easy way to code them into the dataframe and the dataframe is too long to do it manually.
The missing field for the sales column should be 0
. The harder part here is the store
column. There are three options for stores: store1
, store2
, store3
. And each value in the Date
will be listed three times. I don't know which store is missing for each day. In the example I'm including here, store2
is missing but later in the data frame it might be store1
or store3
. Is there a way to fill out the missing store by knowing the other two stores that are missing?
Here is a screenshot of my dataframe.
And here is a section of it so it's reproducible.
structure(list(date = structure(c(18628, 18628, 18628, 18629,
18629, 18629, 18630, 18630, 18630, 18631, 18631, 18631), class = "Date"),
store = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, NA, 3L, 1L,
2L, 2L), .Label = c("store1", "store2", "store3"), class = "factor"),
sales = c(153461, 2332, 1734, 176912, 53063, 17484, 243581,
NA, 412, 1739263, 427311, 9772)), row.names = c(NA, -12L), groups = structure(list(
store = structure(c(1L, 2L, 3L, NA), .Label = c("store1",
"store2", "store3"), class = "factor"), .rows = structure(list(
c(1L, 4L, 7L, 10L), c(2L, 5L, 11L, 12L), c(3L, 6L, 9L
), 8L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
CodePudding user response:
I guess you want a balanced pannel (for each date, three rows, one per store). I would go as follows:
Create a balanced dataset with dates and stores.
stores<-c('store_1','store_2','store_3')
dates<-seq(as.Date('2000-01-01'),as.Date('2001-01-01'),by='month')
data<-data.frame(expand.grid(stores,dates))
And now, left join your dataset. It will leave NA the sales column if it is not there, but you can fill it with a 0 easily.