Search second data set for values that occur in list from first data set-CodePudding

the below code creates two datasets

df1 <- 
  read.table(textConnection("ID   Code   Date1 Date2
1 I611 01/01/2021 03/01/2021
2 L111 04/01/2021 09/01/2021
3 L111 01/01/2021 03/01/2021
4 Z538 08/01/2021 11/01/2021
5 I613 08/08/2021 09/09/2021
"), header=TRUE)

df2 <- 
  read.table(textConnection("ID   State   
  1 Washington
  49 California
  1 Washington
  40 Texas
  1 Texas
  2 Texas
  2 Washington
  50 Minnesota
  60 Washington"), header=TRUE)

What I am looking to achieve is to search the second dataset for ID values that occur at least once in the first dataset 'ID' column and then group matches by the 'State' column in the second dataset. So the output would be:

State	IDs
Washington	3
Texas	2

Any assistance will be very much appreciated - thank you.

CodePudding user response：

Filter df2 based on df1, then aggregate:

aggregate(ID ~ State, data = df2[ df2$ID %in% df1$ID, ], length)
#        State ID
# 1      Texas  2
# 2 Washington  3

CodePudding user response：

dplyr::semi_join() is useful for filtering one table based on values in another. You can then count() occurrences of each state in your filtered table.

library(dplyr)

semi_join(df2, df1) %>%
  count(State, name = "IDs")

Joining, by = "ID"

       State IDs
1      Texas   2
2 Washington   3