Home > other >  Filtering rows with text values in one data frame containing value from another data frame
Filtering rows with text values in one data frame containing value from another data frame

Time:01-26

I have two unequal lengths of data frame.

The first df is 100K rows containing Hotel Names with cities or without cities, example-

structure(list(Hotel Name = c("ancdr Wyndam NY vbhejn", "rifhwe Wynham SFO fgrnhie", "efuaschiw Marriott DC fgyweuinh", "hfeiwefj Marriott elsn"), Col2 = c("x", "x", "x", "x"), Col3 = c("x", "x", "x", "x" ), Col 4 = c("x", "x", "x", "x")), row.names = c(NA, -4L ), class = c("tbl_df", "tbl", "data.frame"))

The second df is ~100 rows containing city names, example -

Cities

NY

SFO

DC

My desired output should only be those rows from first data frame which have city names from second data frame:

Hilton NY
Marriott NY
Wyndham NY

I tried using %in% but it's returning an error saying unequal length of rows

CodePudding user response:

library(dplyr)
df <- tibble(hotels = c("Hilton", "Hilton", "Hilton", "Hilton", "Hilton", "Hilton"), cities1 = c("Washington", "NY", "San Francisco", "Warsaw", NA, "Wrocław"))
df2 <- tibble(cities2 = c('NY', 'Warsaw', 'Wrocław'))

df |>
  filter(cities1 %in% df2$cities2)
#> # A tibble: 3 × 2
#>   hotels cities1
#>   <chr>  <chr>  
#> 1 Hilton NY     
#> 2 Hilton Warsaw 
#> 3 Hilton Wrocław

Created on 2022-01-25 by the reprex package (v2.0.1)

CodePudding user response:

To just filter df1 based on membership of df2$city in df1$Hotel_Name, you can use stringr::str_detect(). If you want to provide multiple options for the pattern to match you can separate them with |. Therefore I provided paste0(df2$city, collapse = "|") as the pattern to match. You can run just that line of code on it's own to see what it looks like.

library(tidyverse)

df1 <- data.frame(
    Hotel_Name = c(
      "ancdr Wyndam NY vbhejn",
      "rifhwe Wynham SFO fgrnhie",
      "efuaschiw Marriott DC fgyweuinh",
      "hfeiwefj Marriott elsn"
    ),
    Col2 = c("x", "x", "x", "x"),
    Col3 = c("x", "x", "x", "x"),
    Col4 = c("x", "x", "x", "x")
)

df2 <- data.frame(city = c("NY", "SFO", "DC"))

df1 %>% 
  filter(str_detect(Hotel_Name, paste0(df2$city, collapse = "|")))
#>                        Hotel_Name Col2 Col3 Col4
#> 1          ancdr Wyndam NY vbhejn    x    x    x
#> 2       rifhwe Wynham SFO fgrnhie    x    x    x
#> 3 efuaschiw Marriott DC fgyweuinh    x    x    x

Created on 2022-01-25 by the reprex package (v2.0.1)

  •  Tags:  
  • Related