Home > OS >  How to expand and count consecutively in R?
How to expand and count consecutively in R?

Time:12-27

How do you expand a dataset and count consecutively by year?

Specifically, my dataset has 15 rows and 4 columns: the firm ID, sector code, year and month. This data has 15 monthly information over 2 years. A firm that took a monthly action is observed on that month. If every firm was observed each month, I would have had 30obs. I want to expand the data so that a firm that did not take a monthly action is recorded as 0 and a firm that did take an action is recorded as 1. I generate a series of 3 variables. The first is Treat, which equals 1 if a firm was previously in the dataset, and 0 if the firm's obs is added (because they did not take an action on that month). For example, firm 1 will have six 1s whereas firm 2 will get a . when month=3 and year=2010. Next, I generate Treat2, which replaces . with 0 and keeps the 1s unchanged. Finally, I generate a variable Treat3, which counts whether a given firm has consecutively taken an action at least three times within a year.

Here is what the original data looks like:

Firm_ID1 Sector Year Month
1 1 2010 1
1 1 2010 2
1 1 2010 3
1 1 2011 1
1 1 2011 2
1 1 2011 3
2 1 2010 1
2 1 2010 2
2 1 2011 1
2 1 2011 2
3 1 2010 1
4 2 2011 1
4 2 2011 2
5 2 2011 2
5 2 2011 3

Here is what I am looking for

Firm_ID1 Sector Year Month Treat Treat1 Treat2
1 1 2010 1 1 1 1
1 1 2010 2 1 1 1
1 1 2010 3 1 1 1
1 1 2011 1 1 1 1
1 1 2011 2 1 1 1
1 1 2011 3 1 1 1
2 1 2010 1 1 1 1
2 1 2010 2 1 1 1
2 1 2010 3 . 0 0
2 1 2011 1 1 1 0
2 1 2011 2 1 1 0
2 1 2011 3 . 0 0
3 1 2010 1 1 1 0
3 1 2010 2 . 0 0
3 1 2010 3 . 0 0
3 1 2011 1 . 0 0
3 1 2011 2 . 0 0
3 1 2011 3 . 0 0
4 2 2010 1 . 0 0
4 2 2010 2 . 0 0
4 2 2010 3 . 0 0
4 2 2011 1 1 0 0
4 2 2011 2 1 0 0
4 2 2011 3 . 0 0
5 2 2010 1 . 0 0
5 2 2010 2 . 0 0
5 2 2010 3 . 0 0
5 2 2011 1 . 0 0
5 2 2011 2 1 1 0
5 2 2011 3 1 1 0

This what I try without success:

  dfdata1 <- dfdata %>%
    group_by(Firm_ID1,Year)%>%
    mutate(isconsecutive =duplicated(rbind(dfdata, transform(dfdata, Month=Month 1), transform(dfdata, Month= Month-1)), fromLast = TRUE)[1:nrow(dfdata)]
    )
  

CodePudding user response:

You could use tidyr's complete function:

library(dplyr)
library(tidyr)

data %>%
  complete(Firm_ID1, Year, Month) %>% 
  mutate(Treat2 =  !is.na(Sector)) %>% 
  group_by(Firm_ID1, Year) %>% 
  mutate(Treat3 =  all(
    !is.na(Sector), 
    !is.na(lag(Sector, n = 1, default = TRUE)), 
    !is.na(lag(Sector, n = 2, default = TRUE))
    )
    ) %>% 
  ungroup() 

This returns

# A tibble: 30 x 6
   Firm_ID1  Year Month Sector Treat2 Treat3
      <dbl> <dbl> <dbl>  <dbl>  <int>  <int>
 1        1  2010     1      1      1      1
 2        1  2010     2      1      1      1
 3        1  2010     3      1      1      1
 4        1  2011     1      1      1      1
 5        1  2011     2      1      1      1
 6        1  2011     3      1      1      1
 7        2  2010     1      1      1      0
 8        2  2010     2      1      1      0
 9        2  2010     3     NA      0      0
10        2  2011     1      1      1      0
11        2  2011     2      1      1      0
12        2  2011     3     NA      0      0
13        3  2010     1      1      1      0
14        3  2010     2     NA      0      0
15        3  2010     3     NA      0      0
16        3  2011     1     NA      0      0
17        3  2011     2     NA      0      0
18        3  2011     3     NA      0      0
19        4  2010     1     NA      0      0
20        4  2010     2     NA      0      0
21        4  2010     3     NA      0      0
22        4  2011     1      2      1      0
23        4  2011     2      2      1      0
24        4  2011     3     NA      0      0
25        5  2010     1     NA      0      0
26        5  2010     2     NA      0      0
27        5  2010     3     NA      0      0
28        5  2011     1     NA      0      0
29        5  2011     2      2      1      0
30        5  2011     3      2      1      0

I skipped the . part for Treat one because it breaks the integer type. It's easy to create if neccessary.

CodePudding user response:

here is a different data.table approach... not sure if this is what you are looking for, but it's how I would approach things

library(data.table)
# cast to wide and calculate length
new_DT <- dcast(DT, Firm_ID1   Year ~ Month, value.var = "Firm_ID1", 
                fun.aggregate = length, drop = FALSE)
# calculate year actions
new_DT[, Treat2 := ifelse(rowSums(.SD) >= 3, 1, 0), .SDcols = patterns("^[0-9] $")][]
#    Firm_ID1 Year 1 2 3 Treat2
# 1:        1 2010 1 1 1      1
# 2:        1 2011 1 1 1      1
# 3:        2 2010 1 1 0      0
# 4:        2 2011 1 1 0      0
# 5:        3 2010 1 0 0      0
# 6:        3 2011 0 0 0      0
# 7:        4 2010 0 0 0      0
# 8:        4 2011 1 1 0      0
# 9:        5 2010 0 0 0      0
#10:        5 2011 0 1 1      0
 

sample data

DT <- fread("Firm_ID1   Sector  Year    Month
1   1   2010    1
1   1   2010    2
1   1   2010    3
1   1   2011    1
1   1   2011    2
1   1   2011    3
2   1   2010    1
2   1   2010    2
2   1   2011    1
2   1   2011    2
3   1   2010    1
4   2   2011    1
4   2   2011    2
5   2   2011    2
5   2   2011    3")
  • Related