Home > Enterprise >  Python - How to filter dataframe to keep Order number with only one product?
Python - How to filter dataframe to keep Order number with only one product?

Time:10-10

I have dataframe that contains Several Columns: Customerid, OrderNumber, PartNumber,Description,Revenue. Something like this:

    CustomerId  OrderNumber PartNumber  Description Revenue
0   6512345 1   ABC1    KitKat  2
1   6512345 1   ABC2    Chips   3
2   6512345 1   ABC3    Coke    2
3   4213124 2   GFA1    Sprite  2
4   2132142 3   FGA1    Beer    3
5   3242342 4   DCA1    Mentos  4
6   3242342 4   DCA2    Fanta   2
7   8768655 5   ABC1    KitKat  2

What I need is to filter this dataframe to contain only orders where the number of item for that order is one. So in this case those would be order with OrderNumber 2, 3 and 5 since these are the one with single transaction.

So since I have few rows of data and i know that here i need to exclude OrderNumbers 1 and 4 i can use this function:

sample = pd.read_excel('../data/Sample.xlsx')
sample = sample[(sample.OrderNumber != 1) & (sample.OrderNumber != 4)]
sample

Which gives me the output like this:

    CustomerId  OrderNumber PartNumber  Description Revenue
3   4213124 2   GFA1    Sprite  2
4   2132142 3   FGA1    Beer    3
7   8768655 5   ABC1    KitKat  2

Now i have over 100k rows and i need to be able to create funciton that will for each order check how many items is related to that order and then exclude the orders that have more than one item.

How would i do that?

CodePudding user response:

One quick way is to use groupby.filter:

df.groupby('OrderNumber').filter(lambda g: len(g) == 1)

   CustomerId  OrderNumber PartNumber Description  Revenue
3     4213124            2       GFA1      Sprite        2
4     2132142            3       FGA1        Beer        3
7     8768655            5       ABC1      KitKat        2

Or filter by transform('size'):

df[df.CustomerId.groupby(df.OrderNumber).transform('size') == 1]

   CustomerId  OrderNumber PartNumber Description  Revenue
3     4213124            2       GFA1      Sprite        2
4     2132142            3       FGA1        Beer        3
7     8768655            5       ABC1      KitKat        2
  • Related