Home > OS >  Exclude items that can have two different values
Exclude items that can have two different values

Time:11-11

I have a table containing the history of products bought and sold. I would like to extract the items that were only bought (never sold). The bought/sale information is contained in the same column. I found a solution but i would like to know if it was possible to do it in one step

s: sell b: bought

library(dplyr)

df <- data.frame("Date"    = paste0("Date", 1:5), 
                 "Product" = c("A","B","B","C","D"),
                 "Action"  = c("b", "b", "s", "b", "b"))

# df1 <- df %>% 
#   filter(Action == "b") %>%
#   select(Product)
# 
# df2 <- df %>% 
#   filter(Action == "s") %>%
#   select(Product)
# 
# anti_join(df1, df2)

CodePudding user response:

You may try

df %>%
  group_by(Product) %>%
  filter(!("s" %in% Action))


  Date  Product Action
  <chr> <chr>   <chr> 
1 Date1 A       b     
2 Date4 C       b     
3 Date5 D       b   

CodePudding user response:

We can also do this in one line in data.table, by checking the number of unique actions per product and filtering down to those that are only ever bought:

library(data.table)
df <- setDT(df)
df <- df[,unique_actions:=uniqueN(Action), by = Product][Action == "b" & unique_actions == 1][,.(Date, Product,Action)]

> df
    Date Product Action
1: Date1       A      b
2: Date4       C      b
3: Date5       D      b

CodePudding user response:

You may remove all the Product that have 'sell' Action in them.

In base R -

subset(df, !Product %in% Product[Action == 's'])

#   Date Product Action
#1 Date1       A      b
#2 Date4       C      b
#3 Date5       D      b

If you prefer dplyr the same can be written as -

library(dplyr)
df %>% filter(!Product %in% Product[Action == 's'])
  • Related