Home > Blockchain >  filter data by values (common values but different data entry) stored in another dataframe
filter data by values (common values but different data entry) stored in another dataframe

Time:11-15

Based on the data below how can I filter data by values stored in another dataframe object?

Sample data:

# Data to be filtered
Dest_FIPS = c(1,2,3,4)
Dest_county = c("West Palm Beach County","Brevard County","Bay County","Miami-Dade County")
Dest_State = c("FL", "FL", "FL", "FL")
OutFlow = c(111, 222, 333, 444)
Orig_county = c("Broward County", "Broward County", "Broward County", "Broward County")
Orig_FIPS = c(5,5,5,5)
Orig_State = c("FL", "FL", "FL", "FL") 

df = data.frame(Dest_FIPS, Dest_county, Dest_State, OutFlow, Orig_county, Orig_FIPS, Orig_State)

# rows to be filtered in column Dest_county based on the values in val_df 
COUNTY_NAM = c("WEST PALM BEACH","BAY","MIAMI-DADE") #(values are actually stored in a CSV, so will be imported as a dataframe)
val_df = data.frame(COUNTY_NAM) # will use val_df to filter df

Desired output:

Dest_FIPS Dest_county               OutFlow Orig_county
1         West Palm Beach County    111     Broward County
3         Bay County                333     Broward County
4         Miami-Dade County         444     Broward County

CodePudding user response:

Transform df$Dest_county to match the format in val_df, then check which values are %in% val_df$COUNTY_NAM.

Base R:

df[toupper(gsub(" County", "", df$Dest_county)) %in% val_df$COUNTY_NAM,]

tidyverse:

library(dplyr)

filter(df, str_to_upper(str_remove(Dest_county, " County")) %in% val_df$COUNTY_NAM)

Output for both:

    Dest_FIPS            Dest_county Dest_State OutFlow    Orig_county Orig_FIPS Orig_State
1           1 West Palm Beach County         FL     111 Broward County         5         FL
2           3             Bay County         FL     333 Broward County         5         FL
3           4      Miami-Dade County         FL     444 Broward County         5         FL
  • Related