How to expand and count consecutively in R?-CodePudding

How do you expand a dataset and count consecutively by year?

Specifically, my dataset has 15 rows and 4 columns: the firm ID, sector code, year and month. This data has 15 monthly information over 2 years. A firm that took a monthly action is observed on that month. If every firm was observed each month, I would have had 30obs. I want to expand the data so that a firm that did not take a monthly action is recorded as 0 and a firm that did take an action is recorded as 1. I generate a series of 3 variables. The first is Treat, which equals 1 if a firm was previously in the dataset, and 0 if the firm's obs is added (because they did not take an action on that month). For example, firm 1 will have six 1s whereas firm 2 will get a . when month=3 and year=2010. Next, I generate Treat2, which replaces . with 0 and keeps the 1s unchanged. Finally, I generate a variable Treat3, which counts whether a given firm has consecutively taken an action at least three times within a year.

Here is what the original data looks like:

Firm_ID1	Sector	Year	Month
1	1	2010	1
1	1	2010	2
1	1	2010	3
1	1	2011	1
1	1	2011	2
1	1	2011	3
2	1	2010	1
2	1	2010	2
2	1	2011	1
2	1	2011	2
3	1	2010	1
4	2	2011	1
4	2	2011	2
5	2	2011	2
5	2	2011	3

Here is what I am looking for

Firm_ID1	Sector	Year	Month	Treat	Treat1	Treat2
1	1	2010	1	1	1	1
1	1	2010	2	1	1	1
1	1	2010	3	1	1	1
1	1	2011	1	1	1	1
1	1	2011	2	1	1	1
1	1	2011	3	1	1	1
2	1	2010	1	1	1	1
2	1	2010	2	1	1	1
2	1	2010	3	.	0	0
2	1	2011	1	1	1	0
2	1	2011	2	1	1	0
2	1	2011	3	.	0	0
3	1	2010	1	1	1	0
3	1	2010	2	.	0	0
3	1	2010	3	.	0	0
3	1	2011	1	.	0	0
3	1	2011	2	.	0	0
3	1	2011	3	.	0	0
4	2	2010	1	.	0	0
4	2	2010	2	.	0	0
4	2	2010	3	.	0	0
4	2	2011	1	1	0	0
4	2	2011	2	1	0	0
4	2	2011	3	.	0	0
5	2	2010	1	.	0	0
5	2	2010	2	.	0	0
5	2	2010	3	.	0	0
5	2	2011	1	.	0	0
5	2	2011	2	1	1	0
5	2	2011	3	1	1	0

This what I try without success:

  dfdata1 <- dfdata %>%
    group_by(Firm_ID1,Year)%>%
    mutate(isconsecutive =duplicated(rbind(dfdata, transform(dfdata, Month=Month 1), transform(dfdata, Month= Month-1)), fromLast = TRUE)[1:nrow(dfdata)]
    )

CodePudding user response：

You could use tidyr's complete function:

library(dplyr)
library(tidyr)

data %>%
  complete(Firm_ID1, Year, Month) %>% 
  mutate(Treat2 =  !is.na(Sector)) %>% 
  group_by(Firm_ID1, Year) %>% 
  mutate(Treat3 =  all(
    !is.na(Sector), 
    !is.na(lag(Sector, n = 1, default = TRUE)), 
    !is.na(lag(Sector, n = 2, default = TRUE))
    )
    ) %>% 
  ungroup()

This returns

# A tibble: 30 x 6
   Firm_ID1  Year Month Sector Treat2 Treat3
      <dbl> <dbl> <dbl>  <dbl>  <int>  <int>
 1        1  2010     1      1      1      1
 2        1  2010     2      1      1      1
 3        1  2010     3      1      1      1
 4        1  2011     1      1      1      1
 5        1  2011     2      1      1      1
 6        1  2011     3      1      1      1
 7        2  2010     1      1      1      0
 8        2  2010     2      1      1      0
 9        2  2010     3     NA      0      0
10        2  2011     1      1      1      0
11        2  2011     2      1      1      0
12        2  2011     3     NA      0      0
13        3  2010     1      1      1      0
14        3  2010     2     NA      0      0
15        3  2010     3     NA      0      0
16        3  2011     1     NA      0      0
17        3  2011     2     NA      0      0
18        3  2011     3     NA      0      0
19        4  2010     1     NA      0      0
20        4  2010     2     NA      0      0
21        4  2010     3     NA      0      0
22        4  2011     1      2      1      0
23        4  2011     2      2      1      0
24        4  2011     3     NA      0      0
25        5  2010     1     NA      0      0
26        5  2010     2     NA      0      0
27        5  2010     3     NA      0      0
28        5  2011     1     NA      0      0
29        5  2011     2      2      1      0
30        5  2011     3      2      1      0

I skipped the . part for Treat one because it breaks the integer type. It's easy to create if neccessary.

CodePudding user response：

here is a different data.table approach... not sure if this is what you are looking for, but it's how I would approach things

library(data.table)
# cast to wide and calculate length
new_DT <- dcast(DT, Firm_ID1   Year ~ Month, value.var = "Firm_ID1", 
                fun.aggregate = length, drop = FALSE)
# calculate year actions
new_DT[, Treat2 := ifelse(rowSums(.SD) >= 3, 1, 0), .SDcols = patterns("^[0-9] $")][]
#    Firm_ID1 Year 1 2 3 Treat2
# 1:        1 2010 1 1 1      1
# 2:        1 2011 1 1 1      1
# 3:        2 2010 1 1 0      0
# 4:        2 2011 1 1 0      0
# 5:        3 2010 1 0 0      0
# 6:        3 2011 0 0 0      0
# 7:        4 2010 0 0 0      0
# 8:        4 2011 1 1 0      0
# 9:        5 2010 0 0 0      0
#10:        5 2011 0 1 1      0

sample data

DT <- fread("Firm_ID1   Sector  Year    Month
1   1   2010    1
1   1   2010    2
1   1   2010    3
1   1   2011    1
1   1   2011    2
1   1   2011    3
2   1   2010    1
2   1   2010    2
2   1   2011    1
2   1   2011    2
3   1   2010    1
4   2   2011    1
4   2   2011    2
5   2   2011    2
5   2   2011    3")