Home > database >  Filter all product id of customers' first order
Filter all product id of customers' first order

Time:10-06

I have a dataset in which I have customer id's, order id's and product id's, like so

customerID orderID Product ID
123 1 A
123 1 B
456 2 C
123 3 C
789 4 D
789 4 D
789 4 E
123 5 C
789 6 D

I'm looking for a way to only keep the product id rows for every customers first order, like so:

customerID orderID Product ID
123 1 A
123 1 B
456 2 C
789 4 D
789 4 D
789 4 E

I have tried

first_order <-
  df %>% 
  group_by(customerid) %>% 
  filter(row_number()==1)

However, this gives me only the first row of a customerid, meaning it only returns one product id, where it can be possible for a order to include more than one product id.

I guess I should make changes to the filter(row_number()==1) statement, but I'm not sure how.

Any help is much appreciated! Thanks in advance

CodePudding user response:

Use first:

df %>% 
  group_by(customerID) %>% 
  filter(orderID == first(orderID))

In this case, using slice_min also works, assuming that the first order has the lowest number.

df %>% 
  group_by(customerID) %>% 
  slice_min(orderID)

CodePudding user response:

In base R,

subset(df1, orderID %in% orderID[!duplicated(customerID)])

-output

   customerID orderID ProductID
1        123       1         A
2        123       1         B
3        456       2         C
5        789       4         D
6        789       4         D
7        789       4         E
  • Related