I have two unequal lengths of data frame.
The first df is 100K rows containing Hotel Names with cities or without cities, example-
structure(list(Hotel Name
= c("ancdr Wyndam NY vbhejn", "rifhwe Wynham SFO fgrnhie",
"efuaschiw Marriott DC fgyweuinh", "hfeiwefj Marriott elsn"),
Col2 = c("x", "x", "x", "x"), Col3 = c("x", "x", "x", "x"
), Col 4
= c("x", "x", "x", "x")), row.names = c(NA, -4L
), class = c("tbl_df", "tbl", "data.frame"))
The second df is ~100 rows containing city names, example -
Cities
NY
SFO
DC
My desired output should only be those rows from first data frame which have city names from second data frame:
Hilton NY
Marriott NY
Wyndham NY
I tried using %in% but it's returning an error saying unequal length of rows
CodePudding user response:
library(dplyr)
df <- tibble(hotels = c("Hilton", "Hilton", "Hilton", "Hilton", "Hilton", "Hilton"), cities1 = c("Washington", "NY", "San Francisco", "Warsaw", NA, "Wrocław"))
df2 <- tibble(cities2 = c('NY', 'Warsaw', 'Wrocław'))
df |>
filter(cities1 %in% df2$cities2)
#> # A tibble: 3 × 2
#> hotels cities1
#> <chr> <chr>
#> 1 Hilton NY
#> 2 Hilton Warsaw
#> 3 Hilton Wrocław
Created on 2022-01-25 by the reprex package (v2.0.1)
CodePudding user response:
To just filter df1
based on membership of df2$city
in df1$Hotel_Name
, you can use stringr::str_detect()
. If you want to provide multiple options for the pattern to match you can separate them with |
. Therefore I provided paste0(df2$city, collapse = "|")
as the pattern to match. You can run just that line of code on it's own to see what it looks like.
library(tidyverse)
df1 <- data.frame(
Hotel_Name = c(
"ancdr Wyndam NY vbhejn",
"rifhwe Wynham SFO fgrnhie",
"efuaschiw Marriott DC fgyweuinh",
"hfeiwefj Marriott elsn"
),
Col2 = c("x", "x", "x", "x"),
Col3 = c("x", "x", "x", "x"),
Col4 = c("x", "x", "x", "x")
)
df2 <- data.frame(city = c("NY", "SFO", "DC"))
df1 %>%
filter(str_detect(Hotel_Name, paste0(df2$city, collapse = "|")))
#> Hotel_Name Col2 Col3 Col4
#> 1 ancdr Wyndam NY vbhejn x x x
#> 2 rifhwe Wynham SFO fgrnhie x x x
#> 3 efuaschiw Marriott DC fgyweuinh x x x
Created on 2022-01-25 by the reprex package (v2.0.1)