Home > Software engineering >  Subsets of a data frame where certain columns satisfy a condition
Subsets of a data frame where certain columns satisfy a condition

Time:11-09

I have a dataset catalog with 3 columns: product id, brand name and product class.

import pandas as pd
catalog = {'product_id': [1, 2, 3, 1, 2, 4, 3, 5, 6],
        'brand_name': ['FW', 'GW', 'FK','FW','GW','WU','FK','MU', 'AS'],
        'product_class': ['ACCESSORIES', 'DRINK', 'FOOD', 'ACCESSORIES', 'DRINK', 'FURNITURE','FOOD', 'ELECTRONICS', 'APPAREL']}

df = pd.DataFrame(data=catalog)

Assume I have a list of product id prod = [1,3,4]. Now, with Python, I want to list all the brand names corresponding to this list prod based on the product_id. How can I do this using only groupby() and get_group() functions? I can do this using pd.DataFrame() combined with the zip() function, but it is too inefficient, as I would need to obtain each column individually.

Expected output (in dataframe)

Product_id      Brand_name
1               'FW'
3               'FK'
4               'WU'                

Can anyone give some help on this?

CodePudding user response:

You can use pandas functions isin() and drop_duplicates() to achieve this:

prod = [1,3,4]
print(df[df.product_id.isin(prod)][["product_id", "brand_name"]].drop_duplicates())

Output:

   product_id brand_name
0           1         FW
2           3         FK
5           4         WU
  • Related