Home > Enterprise >  Get values in one column based on an AND condition in another column in python
Get values in one column based on an AND condition in another column in python

Time:05-08

I have the below dataframe in python with column names "Order_number" and "item_id" which represents the order number and the items consisting in that order. I need to fetch order numbers that consist of item_ids A AND B so my result should be Order Number 12345 and 84573

Order_number item_id
12345 A
12345 B
34235 B
43543 B
84573 A
84573 B
45846 A

I have the below code but I am getting an error:

df[df['item_id'] == 'A' & df['item_id'] == 'B'] 

TypeError: Cannot perform 'rand_' with a dtyped [object] array and scalar of type [bool]

CodePudding user response:

You can groupby.apply a lambda that checks if the unique "item_id"s include both "A" and "B" for each "Order_number"; then filter the ones that do:

out = df.groupby('Order_number')['item_id'].apply(lambda x: {'A','B'}.issubset(x.unique().tolist())).pipe(lambda x: x.index[x]).tolist()

Another option is to use groupby.any twice; once for "A" and again for "B" to create boolean Series objects that return True if an item_id exists for an Order_number; then since we want both to exist, we use & and filter the "Order_number"s:

out = (df['item_id'].eq('A').groupby(df['Order_number']).any() & df['item_id'].eq('B').groupby(df['Order_number']).any()).pipe(lambda x: x.index[x].tolist())

Output:

[12345, 84573]

CodePudding user response:

I would solve as follows: Group by order number and create a set. Filter set that equal to what we want.


(
  df
   .groupby("Order_number")
   .agg(set)
   .loc[lambda d: d['item_id'].eq({'A','B'})]
   .index
   .values
)

# results 
# array([12345, 84573])

If it’s subset we want, assuming that there can be more than A and B, change the filter .loc to:

…
.loc[lambda d:d['item_id'].map({'A','B'}.issubset)]
…

# results 
# array([12345, 84573])

Oh, and if you want a list instead of array, change .values to tolist().

  • Related