Since it is hard for me to solve another puzzle posted on this site due to computation limits, I am trying something new as a substitute, which should work. I have two large datasets, one with monthly firm data, one with monthly bank data :
Data_firm <- data.frame(
Firm = c("A", "A", "B"),
time = c("1", "3", "2"),
postal= c("20", "20", "67")
)
Data_bank <- data.frame(
Bank = c("AB", "AC","BD"),
Postal_bank = c("20", "67","20"),
entry = c("1","1","2"),
exit = c("6","5","7"))
I would need to add a column with the number of banks present in the same department as the firm at each time, accounting for the fact that some banks enter and some do exit (entry, exit variables in "Databank"). In theory, I should have the fllowing column in the example above :
nbbankindepartment = c(1,2,1)
I have tried the following
for (row in 1:nrow(Data_firm){
dep <- Data_firm[row, "postal"]
time <- Data_firm[row, "time"]
count <- sum(Data_bank$Postal_bank == dep & Data_bank$entry <= time & Data_bank$exit > time, na.rm = TRUE)
Data_firm[row,"nbbankindepartment"]<-count
}
But I only get 0s as a result. Does anybody have a solution ? What I am not doing right ?
Thanks in advance,
CodePudding user response:
Here is my best guess for your problem:
library(dplyr)
Data_firm %>%
left_join(Data_bank, by = c("postal" = "Postal_bank")) %>%
filter(entry <= time, exit > time) %>%
group_by(Firm, time, postal) %>%
summarise(nb_bank_in_department = n(), .groups = "drop")
This returns
# A tibble: 3 x 4
Firm time postal nb_bank_in_department
<chr> <chr> <chr> <int>
1 A 1 20 1
2 A 3 20 2
3 B 2 67 1