I have a df (reference image) that I create that shows an aggregation of all the combinations each publisher has to another and then does calculations based on said pair(s).
I want to pull every distinct pair that only contains 2 publishers and all the other field values that are tied to that pair (example would be Amazon, CBS but twice since there is one for month 10 and one for month 11 and so on.
How do I extract this or apply some dplyr function to only pull those? Was thinking of using a regex function in with a pipe but not sure how to do it.
Expected output
Publisher | month_grp
Amazon, CBS 10
Amazon, CBS 11
Amazon, CW 10
Amazon, CW 11
Amazon, ESPN 10
Amazon, ESPN 11
CodePudding user response:
I think that you want the rows with just one comma in the publishers column. You can get these using
df[grep('^[^,] ,[^,] $', df$publishers),]
CodePudding user response:
Similar answer to one posted seconds ago, but with dplyr::filter
and stringr::str_detect
:
library(tidyverse)
tribble(~ Publisher, ~month_grp,
"AWS, CBS", 4,
"AWS, CBS, CW", 2,
"AWS, ESPN", 6,
"AWS, CBS, ESPN", 2,
"AWS, Samsung TV plus", 4,
"AWS, ESPN, CBS", 4
) |>
filter(str_detect(Publisher, "^[\\w ]*, [\\w ]*$"))
#> # A tibble: 3 × 2
#> Publisher month_grp
#> <chr> <dbl>
#> 1 AWS, CBS 4
#> 2 AWS, ESPN 6
#> 3 AWS, Samsung TV plus 4