How do you expand a dataset and count consecutively by year?
Specifically, my dataset has 15 rows and 4 columns: the firm ID, sector code, year and month. This data has 15 monthly information over 2 years. A firm that took a monthly action is observed on that month. If every firm was observed each month, I would have had 30obs. I want to expand the data so that a firm that did not take a monthly action is recorded as 0 and a firm that did take an action is recorded as 1. I generate a series of 3 variables. The first is Treat, which equals 1 if a firm was previously in the dataset, and 0 if the firm's obs is added (because they did not take an action on that month). For example, firm 1 will have six 1s whereas firm 2 will get a . when month=3 and year=2010. Next, I generate Treat2, which replaces . with 0 and keeps the 1s unchanged. Finally, I generate a variable Treat3, which counts whether a given firm has consecutively taken an action at least three times within a year.
Here is what the original data looks like:
Firm_ID1 | Sector | Year | Month |
---|---|---|---|
1 | 1 | 2010 | 1 |
1 | 1 | 2010 | 2 |
1 | 1 | 2010 | 3 |
1 | 1 | 2011 | 1 |
1 | 1 | 2011 | 2 |
1 | 1 | 2011 | 3 |
2 | 1 | 2010 | 1 |
2 | 1 | 2010 | 2 |
2 | 1 | 2011 | 1 |
2 | 1 | 2011 | 2 |
3 | 1 | 2010 | 1 |
4 | 2 | 2011 | 1 |
4 | 2 | 2011 | 2 |
5 | 2 | 2011 | 2 |
5 | 2 | 2011 | 3 |
Here is what I am looking for
Firm_ID1 | Sector | Year | Month | Treat | Treat1 | Treat2 |
---|---|---|---|---|---|---|
1 | 1 | 2010 | 1 | 1 | 1 | 1 |
1 | 1 | 2010 | 2 | 1 | 1 | 1 |
1 | 1 | 2010 | 3 | 1 | 1 | 1 |
1 | 1 | 2011 | 1 | 1 | 1 | 1 |
1 | 1 | 2011 | 2 | 1 | 1 | 1 |
1 | 1 | 2011 | 3 | 1 | 1 | 1 |
2 | 1 | 2010 | 1 | 1 | 1 | 1 |
2 | 1 | 2010 | 2 | 1 | 1 | 1 |
2 | 1 | 2010 | 3 | . | 0 | 0 |
2 | 1 | 2011 | 1 | 1 | 1 | 0 |
2 | 1 | 2011 | 2 | 1 | 1 | 0 |
2 | 1 | 2011 | 3 | . | 0 | 0 |
3 | 1 | 2010 | 1 | 1 | 1 | 0 |
3 | 1 | 2010 | 2 | . | 0 | 0 |
3 | 1 | 2010 | 3 | . | 0 | 0 |
3 | 1 | 2011 | 1 | . | 0 | 0 |
3 | 1 | 2011 | 2 | . | 0 | 0 |
3 | 1 | 2011 | 3 | . | 0 | 0 |
4 | 2 | 2010 | 1 | . | 0 | 0 |
4 | 2 | 2010 | 2 | . | 0 | 0 |
4 | 2 | 2010 | 3 | . | 0 | 0 |
4 | 2 | 2011 | 1 | 1 | 0 | 0 |
4 | 2 | 2011 | 2 | 1 | 0 | 0 |
4 | 2 | 2011 | 3 | . | 0 | 0 |
5 | 2 | 2010 | 1 | . | 0 | 0 |
5 | 2 | 2010 | 2 | . | 0 | 0 |
5 | 2 | 2010 | 3 | . | 0 | 0 |
5 | 2 | 2011 | 1 | . | 0 | 0 |
5 | 2 | 2011 | 2 | 1 | 1 | 0 |
5 | 2 | 2011 | 3 | 1 | 1 | 0 |
This what I try without success:
dfdata1 <- dfdata %>%
group_by(Firm_ID1,Year)%>%
mutate(isconsecutive =duplicated(rbind(dfdata, transform(dfdata, Month=Month 1), transform(dfdata, Month= Month-1)), fromLast = TRUE)[1:nrow(dfdata)]
)
CodePudding user response:
You could use tidyr
's complete
function:
library(dplyr)
library(tidyr)
data %>%
complete(Firm_ID1, Year, Month) %>%
mutate(Treat2 = !is.na(Sector)) %>%
group_by(Firm_ID1, Year) %>%
mutate(Treat3 = all(
!is.na(Sector),
!is.na(lag(Sector, n = 1, default = TRUE)),
!is.na(lag(Sector, n = 2, default = TRUE))
)
) %>%
ungroup()
This returns
# A tibble: 30 x 6
Firm_ID1 Year Month Sector Treat2 Treat3
<dbl> <dbl> <dbl> <dbl> <int> <int>
1 1 2010 1 1 1 1
2 1 2010 2 1 1 1
3 1 2010 3 1 1 1
4 1 2011 1 1 1 1
5 1 2011 2 1 1 1
6 1 2011 3 1 1 1
7 2 2010 1 1 1 0
8 2 2010 2 1 1 0
9 2 2010 3 NA 0 0
10 2 2011 1 1 1 0
11 2 2011 2 1 1 0
12 2 2011 3 NA 0 0
13 3 2010 1 1 1 0
14 3 2010 2 NA 0 0
15 3 2010 3 NA 0 0
16 3 2011 1 NA 0 0
17 3 2011 2 NA 0 0
18 3 2011 3 NA 0 0
19 4 2010 1 NA 0 0
20 4 2010 2 NA 0 0
21 4 2010 3 NA 0 0
22 4 2011 1 2 1 0
23 4 2011 2 2 1 0
24 4 2011 3 NA 0 0
25 5 2010 1 NA 0 0
26 5 2010 2 NA 0 0
27 5 2010 3 NA 0 0
28 5 2011 1 NA 0 0
29 5 2011 2 2 1 0
30 5 2011 3 2 1 0
I skipped the .
part for Treat
one because it breaks the integer type. It's easy to create if neccessary.
CodePudding user response:
here is a different data.table
approach... not sure if this is what you are looking for, but it's how I would approach things
library(data.table)
# cast to wide and calculate length
new_DT <- dcast(DT, Firm_ID1 Year ~ Month, value.var = "Firm_ID1",
fun.aggregate = length, drop = FALSE)
# calculate year actions
new_DT[, Treat2 := ifelse(rowSums(.SD) >= 3, 1, 0), .SDcols = patterns("^[0-9] $")][]
# Firm_ID1 Year 1 2 3 Treat2
# 1: 1 2010 1 1 1 1
# 2: 1 2011 1 1 1 1
# 3: 2 2010 1 1 0 0
# 4: 2 2011 1 1 0 0
# 5: 3 2010 1 0 0 0
# 6: 3 2011 0 0 0 0
# 7: 4 2010 0 0 0 0
# 8: 4 2011 1 1 0 0
# 9: 5 2010 0 0 0 0
#10: 5 2011 0 1 1 0
sample data
DT <- fread("Firm_ID1 Sector Year Month
1 1 2010 1
1 1 2010 2
1 1 2010 3
1 1 2011 1
1 1 2011 2
1 1 2011 3
2 1 2010 1
2 1 2010 2
2 1 2011 1
2 1 2011 2
3 1 2010 1
4 2 2011 1
4 2 2011 2
5 2 2011 2
5 2 2011 3")