the below code creates two datasets
df1 <-
read.table(textConnection("ID Code Date1 Date2
1 I611 01/01/2021 03/01/2021
2 L111 04/01/2021 09/01/2021
3 L111 01/01/2021 03/01/2021
4 Z538 08/01/2021 11/01/2021
5 I613 08/08/2021 09/09/2021
"), header=TRUE)
df2 <-
read.table(textConnection("ID State
1 Washington
49 California
1 Washington
40 Texas
1 Texas
2 Texas
2 Washington
50 Minnesota
60 Washington"), header=TRUE)
What I am looking to achieve is to search the second dataset for ID values that occur at least once in the first dataset 'ID' column and then group matches by the 'State' column in the second dataset. So the output would be:
State | IDs |
---|---|
Washington | 3 |
Texas | 2 |
Any assistance will be very much appreciated - thank you.
CodePudding user response:
Filter df2 based on df1, then aggregate:
aggregate(ID ~ State, data = df2[ df2$ID %in% df1$ID, ], length)
# State ID
# 1 Texas 2
# 2 Washington 3
CodePudding user response:
dplyr::semi_join()
is useful for filtering one table based on values in another. You can then count()
occurrences of each state in your filtered table.
library(dplyr)
semi_join(df2, df1) %>%
count(State, name = "IDs")
Joining, by = "ID"
State IDs
1 Texas 2
2 Washington 3