Running the function state_inflow on the original data dataframe (code below) produces the following test dataframe:
> test
Previous_State 1 2 3
1: X0 2 0 0
2: X1 0 0 0
3: X2 0 0 1
library(data.table)
library(dplyr)
library(tidyverse)
data <-
  data.frame(
    ID = c(1,1,1,2,2,2,3,3,3),
    Period_1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    Period_2 = c("2020-01","2020-02","2020-03","2020-04","2020-05","2020-06","2020-02","2020-03","2020-04"),
    Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
    State = c("X0","X1","X2","X0","X2","X0", "X2","X1","X3")
  )
state_inflow <- function(mydat, target_state, period_col_name, fct) {
  dcast(
    setDT(mydat)[, Previous_State := factor(shift(State, fill = target_state)), by = ID][
      , period_factor := lapply(.SD, factor), .SDcols = period_col_name],
    Previous_State ~ period_factor, fct,
    value.var = "Values", subset = .(State == target_state), drop = FALSE
  )
}
test <- state_inflow(data, "X0", "Period_1", length)
I'm adding a row to the dataframe for those "state" combinations that never touch the target_state category (see ID 3 in the data dataframe: across periods it never touches the target state of X0 and is therefore excluded from the original test output shown above), and populating all of the columns in that new row with 0's. I am currently doing this as follows:
test %>%
  complete(Previous_State = unique(data$State)) %>%
  replace(is.na(.), 0)
which gives me correct output of:
# A tibble: 4 x 4
Previous_State `1` `2` `3`
<chr> <int> <int> <int>
1 X0 2 0 0
2 X1 0 0 0
3 X2 0 0 1
4 X3 0 0 0
See how row 4, "X3", was added with all 0's? That's correct output.
I'm trying to learn how to use complete(..., fill = ...). How would I accomplish what I did above, but by using fill = ... inside the complete(...) function instead?
CodePudding user response:
The fill argument of complete expects a named list that sets the fill value for each individual column. By default, this is NA for all columns. You can change this by setting the desired fill value for each column separately:
test %>%
complete(Previous_State = unique(data$State),
fill = list(`1` = 0, `2` = 0, `3` = 0))
# A tibble: 4 x 4
# Previous_State `1` `2` `3`
# <chr> <dbl> <dbl> <dbl>
#1 X0 2 0 0
#2 X1 0 0 0
#3 X2 0 0 1
#4 X3 0 0 0
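If the number of period columns is not known in advance, the fill list can also be built programmatically instead of being typed out. The following is a minimal sketch, assuming that every column except Previous_State is a count column. The small test data frame is retyped by hand from the output above, and all_states, count_cols and fill_values are illustrative names that do not appear in the original code:

```r
library(tidyr)

# Hand-typed copy of the `test` output shown in the question (an assumption,
# used here so the example runs without the data.table step).
test <- data.frame(
  Previous_State = c("X0", "X1", "X2"),
  `1` = c(2L, 0L, 0L),
  `2` = c(0L, 0L, 0L),
  `3` = c(0L, 0L, 1L),
  check.names = FALSE
)

# Stands in for unique(data$State) from the question.
all_states <- c("X0", "X1", "X2", "X3")

# Every column except the key column gets a fill value of 0L.
count_cols <- setdiff(names(test), "Previous_State")
fill_values <- setNames(as.list(rep(0L, length(count_cols))), count_cols)

# Add the missing states and fill their count columns with 0.
result <- complete(test, Previous_State = all_states, fill = fill_values)
print(result)
```

Using 0L rather than 0 is a deliberate choice here: it matches the integer type of the count columns, so the filled columns should not need to be converted to double.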
Since your question is about the tidyverse: A tidy data frame is normalized so that you usually have only one column for each property. In that long form, completing is much easier and achieves the same result:
test %>%
  pivot_longer(matches("^[0-9]+$")) %>%
  complete(Previous_State = unique(data$State), name,
           fill = list(value = 0)) %>%
  pivot_wider()
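For reference, the long-format approach can also be run end to end without the data.table step. This is only a sketch, under the assumption that test looks like the output shown in the question (it is retyped by hand here) and that all_states stands in for unique(data$State); long_result is an illustrative name:

```r
library(tidyr)
library(dplyr)

# Hand-typed copy of the `test` output from the question (an assumption).
test <- data.frame(
  Previous_State = c("X0", "X1", "X2"),
  `1` = c(2L, 0L, 0L),
  `2` = c(0L, 0L, 0L),
  `3` = c(0L, 0L, 1L),
  check.names = FALSE
)

# Stands in for unique(data$State).
all_states <- c("X0", "X1", "X2", "X3")

long_result <- test %>%
  # Stack the period columns into `name` (period) and `value` (count).
  pivot_longer(matches("^[0-9]+$")) %>%
  # Create every state/period combination; only `value` needs a fill value.
  complete(Previous_State = all_states, name, fill = list(value = 0L)) %>%
  # Spread back to one column per period (defaults: name -> columns, value -> cells).
  pivot_wider()

print(long_result)
```

The point of this design is that only a single column, value, has to be named in fill, regardless of how many periods the data contains.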