How to use fill inside the tidyverse complete function to fill all dataframe columns?

Time:05-13

Running the function state_inflow on the original dataframe data, using the code below, produces the following dataframe test:

> test
   Previous_State 1 2 3
1:             X0 2 0 0
2:             X1 0 0 0
3:             X2 0 0 1

    library(data.table)
    library(dplyr)
    library(tidyverse)
    
    data <- 
      data.frame(
        ID = c(1,1,1,2,2,2,3,3,3),
        Period_1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
        Period_2 = c("2020-01","2020-02","2020-03","2020-04","2020-05","2020-06","2020-02","2020-03","2020-04"),
        Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
        State = c("X0","X1","X2","X0","X2","X0", "X2","X1","X3")
      )
    
    state_inflow <- function(mydat, target_state, period_col_name, fct) {
      dcast(
        setDT(mydat)[, Previous_State := factor(shift(State, fill = target_state)), by = ID][
          , period_factor := lapply(.SD, factor), .SDcols = period_col_name],
        Previous_State ~ period_factor, fct, 
        value.var = "Values", subset = .(State == target_state), drop = FALSE
      ) 
    }
    
    test <- state_inflow(data, "X0", "Period_1", length) 

I want to add a row for every state that never touches the target_state category (see ID 3 in the data dataframe: it never reaches the target state X0 in any period and is therefore excluded from the test output shown above), with all columns of that new row set to 0. I currently do this as follows:

test %>%
  complete(Previous_State = unique(data$State)) %>%
  replace(is.na(.), 0)

which gives me correct output of:

# A tibble: 4 x 4
  Previous_State   `1`   `2`   `3`
  <chr>          <int> <int> <int>
1 X0                 2     0     0
2 X1                 0     0     0
3 X2                 0     0     1
4 X3                 0     0     0

See how row 4, "X3", was added with all 0's? That's correct output.

I'm trying to learn how to use complete(... ,fill = ...). How would I accomplish what I did above, but by instead using fill = ... inside the complete(...) function?

CodePudding user response:

The fill argument of complete expects a named list that gives the fill value for each individual column. By default, the newly created rows get NA in every column. You can change this by setting the desired fill value for each column separately:

test %>%
  complete(Previous_State = unique(data$State),
    fill = list(`1` = 0, `2` = 0, `3` = 0))

# A tibble: 4 x 4
#  Previous_State   `1`   `2`   `3`
#  <chr>          <dbl> <dbl> <dbl>
#1 X0                 2     0     0
#2 X1                 0     0     0
#3 X2                 0     0     1
#4 X3                 0     0     0
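If you do not want to type every column name by hand, the named list can also be built programmatically. The following is a minimal sketch, not the only way to do it: the small test data frame is rebuilt by hand so the example runs on its own, and the names states, value_cols and fill_values are made up for illustration. It uses 0L rather than 0 on the assumption that the counts are integers, as they are when length is the aggregation function.

```r
library(tidyr)

# Hand-built stand-in for the `test` table from the question (assumption:
# Previous_State as character, counts as integers).
test <- data.frame(
  Previous_State = c("X0", "X1", "X2"),
  `1` = c(2L, 0L, 0L),
  `2` = c(0L, 0L, 0L),
  `3` = c(0L, 0L, 1L),
  check.names = FALSE
)

# All states that should appear as rows (unique(data$State) in the question).
states <- c("X0", "X1", "X2", "X3")

# Every column except the key column gets the same fill value.
value_cols  <- setdiff(names(test), "Previous_State")
fill_values <- setNames(rep(list(0L), length(value_cols)), value_cols)

result <- complete(test, Previous_State = states, fill = fill_values)
result
```

Because fill_values is derived from names(test), the same call keeps working if the number of period columns changes.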

Since your question is about the tidyverse: a tidy data frame is normalized so that you usually have only one column for each property. Reshaping to long format first makes the completion much easier, because only a single value column needs a fill value, and it achieves the same result:

test %>%
  pivot_longer(matches("^[0-9]+$")) %>%
  complete(Previous_State = unique(data$State), name,
    fill = list(value = 0)) %>%
  pivot_wider()