I want to create a new variable called bankruptcy_c
, which will equal to the number of times the year of bankruptcy1, bankruptcy2, ..., bankruptcy10
is between 2007 and 2019, inclusive. At first, I created a variable that counted the number of times the columns had a date using the following code:
vars <- df %>%
mutate(bankruptcy_c = rowSums(!is.na(across(contains("bankruptcy")))))
However, now I'm looking to only count each instance if the year is between 2007 and 2019. Here is some sample data:
structure(list(rowid = c(16L, 28L, 29L, 37L, 80778L), bankruptcy1 = structure(c(9282,
11438, NA, 8664, 14048), class = "Date"), bankruptcy2 = structure(c(12892,
NA, NA, 14809, 15012), class = "Date"), bankruptcy3 = structure(c(14713,
NA, NA, NA, NA), class = "Date"), bankruptcy4 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy5 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy6 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy7 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy8 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy9 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy10 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date")), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
Using the sample data above, bankruptcy_c
would equal to 1
for rowid = 16
, 0
for rowid = 28
(because the dates are out of range), 0
for rowid = 29
(since all of the 10 instances are missing), 1
for rowid = 37
, and 2
for rowid = 80778
. Any help would be greatly appreciated!
CodePudding user response:
One approach would be to reshape the data longer, filter, and count.
library(tidyverse)
df %>%
pivot_longer(-rowid) %>%
filter(year(as.Date(value)) %in% 2007:2019) %>%
count(rowid) %>%
complete(rowid = df$rowid, fill = list(n=0)) # EDIT to list all rowids
# alternate filter syntax if you care about specific dates:
filter(value >= as.Date("2007-01-01"), value < as.Date("2020-01-01")) %>%
Result
# A tibble: 5 x 2
rowid n
<int> <dbl>
1 16 1
2 28 0
3 29 0
4 37 1
5 80778 2