Home > Software engineering >  R Apply Complete-function after condition has been met (per group)
R Apply Complete-function after condition has been met (per group)

Time:11-28

I have monthly values for a factor variable Alphabet (containing factorlevels X, Y and Z) for 2021. See tibble DF below.

Some observations only start occuring later in the year. For example, recordings of X only start in june 2021.

My problem is that the source of the data omits values if they are zero. So for example I have values for Y in January (Sum is 2), but the data on for February is omitted by the source since there are no values recorded. Then March Y has a Sum of 5 so it's back.

What I want to do is complete this dataset. So for Y, I want a row that says feb 2021, Y, 0. I tried using the complete function for this.

complete(Month2, nesting (Alphabet), fill =  list( Sum=0)) 

However, this also completes the values for X for the whole year, which is incorrect. X only exists since june 2021. See tibble DF_Overcomplete below.

My solution is to have complete function conditional per factor level. So only after it has encountered a Sum that is not zero, it should start to complete and populate the Month2, Alphabat, and Sum variables. For Y it can complete the whole year (since January has a Sum !=0), but for X I only want to apply the complete function from June.

DF:

  structure(list(Month2 = structure(c(6L, 7L, 9L, 1L, 3L, 6L, 8L, 
    9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("jan 2021", 
    "feb 2021", "mrt 2021", "apr 2021", "mei 2021", "jun 2021", "jul 2021", 
    "aug 2021", "sep 2021", "okt 2021"), class = "factor"), Alphabet = structure(c(1L, 
    1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L), .Label = c("X", "Y", "Z"), class = "factor"), Sum = c(1L, 
    1L, 2L, 2L, 5L, 10L, 4L, 2L, 2L, 2L, 9L, 24L, 46L, 71L, 91L, 
    97L, 54L, 57L, 81L)), row.names = c(NA, -19L), class = c("tbl_df", 
    "tbl", "data.frame"))

DF_Overcomplete:

structure(list(Month2 = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("jan 2021", "feb 2021", 
"mrt 2021", "apr 2021", "mei 2021", "jun 2021", "jul 2021", "aug 2021", 
"sep 2021", "okt 2021"), class = "factor"), Alphabet = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("X", 
"Y", "Z"), class = "factor"), Sum = c(0, 0, 0, 0, 0, 1, 1, 0, 
2, 0, 2, 0, 5, 0, 0, 10, 0, 4, 2, 2, 2, 9, 24, 46, 71, 91, 97, 
54, 57, 81)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-30L))

So basically I want this:

enter image description here

Thanks in advance!

CodePudding user response:

First change the factor Month2 to a Date so it's easily ordered. Then calculate the min and max month for each Alphabet. Use the complete as before, join and filter by the min and max.

library('dplyr')
library('tidyr')
df <- df %>% mutate(yr_mo = as.Date(paste('01', Month2), format = '%d %b %Y'))
start_end <- df %>% group_by(Alphabet) %>% summarise(min(yr_mo), max(yr_mo))
df_complete <- df %>% 
  complete(Month2, nesting(Alphabet), fill =  list(Sum = 0L)) %>% 
  mutate(yr_mo = as.Date(paste('01', Month2), format = '%d %b %Y')) %>% 
  left_join(start_end) %>% 
  filter(yr_mo >= `min(yr_mo)`, yr_mo <= `max(yr_mo)`) %>% 
  select(-c(`min(yr_mo)`, `max(yr_mo)`, yr_mo))

df_complete %>% print(n = Inf)
# A tibble: 24 x 3
   Month2   Alphabet   Sum
   <fct>    <fct>    <int>
 1 jan 2021 Y            2
 2 jan 2021 Z            2
 3 feb 2021 Y            0
 4 feb 2021 Z            9
 5 mrt 2021 Y            5
 6 mrt 2021 Z           24
 7 apr 2021 Y            0
 8 apr 2021 Z           46
 9 mei 2021 Y            0
10 mei 2021 Z           71
11 jun 2021 X            1
12 jun 2021 Y           10
13 jun 2021 Z           91
14 jul 2021 X            1
15 jul 2021 Y            0
16 jul 2021 Z           97
17 aug 2021 X            0
18 aug 2021 Y            4
19 aug 2021 Z           54
20 sep 2021 X            2
21 sep 2021 Y            2
22 sep 2021 Z           57
23 okt 2021 Y            2
24 okt 2021 Z           81
  • Related