Home > OS >  Sum across columns if date is not missing and if year of date is after a certain number
Sum across columns if date is not missing and if year of date is after a certain number

Time:02-08

I want to create a new variable called bankruptcy_c, which will equal to the number of times the year of bankruptcy1, bankruptcy2, ..., bankruptcy10 is between 2007 and 2019, inclusive. At first, I created a variable that counted the number of times the columns had a date using the following code:

vars <- df %>%
  mutate(bankruptcy_c = rowSums(!is.na(across(contains("bankruptcy")))))

However, now I'm looking to only count each instance if the year is between 2007 and 2019. Here is some sample data:

structure(list(rowid = c(16L, 28L, 29L, 37L, 80778L), bankruptcy1 = structure(c(9282, 
11438, NA, 8664, 14048), class = "Date"), bankruptcy2 = structure(c(12892, 
NA, NA, 14809, 15012), class = "Date"), bankruptcy3 = structure(c(14713, 
NA, NA, NA, NA), class = "Date"), bankruptcy4 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy5 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy6 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy7 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy8 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy9 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy10 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date")), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

Using the sample data above, bankruptcy_c would equal to 1 for rowid = 16, 0 for rowid = 28 (because the dates are out of range), 0 for rowid = 29 (since all of the 10 instances are missing), 1 for rowid = 37, and 2 for rowid = 80778. Any help would be greatly appreciated!

CodePudding user response:

One approach would be to reshape the data longer, filter, and count.

library(tidyverse)
df %>%
  pivot_longer(-rowid) %>%
  filter(year(as.Date(value)) %in% 2007:2019) %>%
  count(rowid) %>%
  complete(rowid = df$rowid, fill = list(n=0)) # EDIT to list all rowids


# alternate filter syntax if you care about specific dates:
filter(value >= as.Date("2007-01-01"), value < as.Date("2020-01-01")) %>% 

Result

# A tibble: 5 x 2
  rowid     n
  <int> <dbl>
1    16     1
2    28     0
3    29     0
4    37     1
5 80778     2
  •  Tags:  
  • Related