Below is the sample data. A bit clueless on how to attack this. The goal is simply to find the acct-run combinations with naics 454 in 2021 q4 that are present in both 2021 q4 and 2022 q1. Should I do I pivot wider and make the values into columns and from there, filter out those that are NA for the 2022 q1 set?
year <- c(2021,2021,2021,2021,2021,2021,2021,2022,2022,2022,2022,2022,2022,2022)
qtr <- c(4,4,4,4,4,4,4,1,1,1,1,1,1,1)
acct <- c("001234","001245","001256","001267","001278","001314","001314","001234","001245","001256","001200","001201","001314","001314")
run <- c(1,1,1,1,1,1,2,1,1,1,1,1,1,2)
naics <- c("454","454","454","451","451","451","451","441","442","441","451","451","451","451")
employment <- c(120,130,110,500,501,502,503,105,125,109,498,499,500,506)
test <- data.frame (year,qtr,acct,run,naics,employment)
Desired result
acct run employment year qtr
001234 1 120 2021 4
001245 1 130 2021 4
001256 1 110 2021 4
001314 1 502 2021 4
001314 2 503 2021 4
001234 1 105 2022 1
001245 1 125 2022 1
001256 1 109 2022 1
001314 1 500 2021 4
001314 2 506 2021 4
CodePudding user response:
Using a grouped filter in dplyr:
library(dplyr)
test %>%
group_by(acct, run) %>%
filter(
any(year == 2021 & qtr == 4 & naics == 454),
any(year == 2022 & qtr == 1)
) %>%
ungroup()
# A tibble: 6 × 6
year qtr acct run naics employment
<dbl> <dbl> <chr> <dbl> <chr> <dbl>
1 2021 4 001234 1 454 120
2 2021 4 001245 1 454 130
3 2021 4 001256 1 454 110
4 2022 1 001234 1 441 105
5 2022 1 001245 1 442 125
6 2022 1 001256 1 441 109
CodePudding user response:
We could do
library(dplyr)
library(stringr)
test %>%
group_by(acct, run) %>%
filter(all(c('20214454', '20221') %in% str_c(year, qtr,
replace(naics, naics != 454, '')))) %>%
ungroup
-output
# A tibble: 6 × 6
year qtr acct run naics employment
<dbl> <dbl> <chr> <dbl> <chr> <dbl>
1 2021 4 001234 1 454 120
2 2021 4 001245 1 454 130
3 2021 4 001256 1 454 110
4 2022 1 001234 1 441 105
5 2022 1 001245 1 442 125
6 2022 1 001256 1 441 109