Home > Back-end >  R missing variable that is conditional on the others that are present
R missing variable that is conditional on the others that are present

Time:03-19

I have a dataset in R and I'm trying to fill out two missing values at the same time. I had used the pad function from library(padr) to fill out the data frame with missing date values. Now I have two additional fields that are NA.

I know what these values should be but I don't understand an easy way to code them into the dataframe and the dataframe is too long to do it manually.

The missing field for the sales column should be 0. The harder part here is the store column. There are three options for stores: store1, store2, store3. And each value in the Date will be listed three times. I don't know which store is missing for each day. In the example I'm including here, store2 is missing but later in the data frame it might be store1 or store3. Is there a way to fill out the missing store by knowing the other two stores that are missing?

Here is a screenshot of my dataframe.

enter image description here

And here is a section of it so it's reproducible.

structure(list(date = structure(c(18628, 18628, 18628, 18629, 
18629, 18629, 18630, 18630, 18630, 18631, 18631, 18631), class = "Date"), 
    store = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, NA, 3L, 1L, 
    2L, 2L), .Label = c("store1", "store2", "store3"), class = "factor"), 
    sales = c(153461, 2332, 1734, 176912, 53063, 17484, 243581, 
    NA, 412, 1739263, 427311, 9772)), row.names = c(NA, -12L), groups = structure(list(
    store = structure(c(1L, 2L, 3L, NA), .Label = c("store1", 
    "store2", "store3"), class = "factor"), .rows = structure(list(
        c(1L, 4L, 7L, 10L), c(2L, 5L, 11L, 12L), c(3L, 6L, 9L
        ), 8L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

CodePudding user response:

I guess you want a balanced pannel (for each date, three rows, one per store). I would go as follows:

Create a balanced dataset with dates and stores.

stores<-c('store_1','store_2','store_3')
dates<-seq(as.Date('2000-01-01'),as.Date('2001-01-01'),by='month')
data<-data.frame(expand.grid(stores,dates))

And now, left join your dataset. It will leave NA the sales column if it is not there, but you can fill it with a 0 easily.

  •  Tags:  
  • r
  • Related