Is there a simpler way to get object from groupby and putting in dictionary?-CodePudding

So my dataframe looks like this: I am trying to find a simpler way to get the objects from a Groupby and then putting it in a dictionary. I had to obtain the index and then do a for loop to get the exact string of each row in Product.

More details if needed: My goal was to find duplicate Order ID and then take the products from the column and add to dictionary with:

key = Product
value = no of times the Product is found to be ordered together

(I am not looking for ways to optimize finding duplicates, i know I could use df.duplicated )

Code:

for date, df in df1.groupby('Order Date'):
    
    if  df.Product.count() > 1:

        indice = df.Product.index
        for data in indice:
            product = df.loc[data].at['Product']
            #update dictionary counter
            product_dict[product] = product_dict.get(product)   1
      
    else:
        continue

for ease you can use this df instead. I listed as dictionary:

{'Order ID': ['147268', '148041', '149343', '149964', '149350', '141732', '149620', '142451', '146039', '143498', '141316', '144804', '144804', '145270', '142789'],
 'Product': ['Wired Headphones', 'USB-C Charging Cable', 'Apple Airpods Headphones', 'AAA Batteries (4-pack)', 'USB-C Charging Cable', 'iPhone', 'Lightning Charging Cable', 'AAA Batteries (4-pack)', '34in Ultrawide Monitor', 'AA Batteries (4-pack)', 'AAA Batteries (4-pack)', 'Wired Headphones', 'iPhone', 'Google Phone', 'AAA Batteries (4-pack)']}

CodePudding user response：

Maybe I misunderstand, but this seems to do what you are trying to achieve by using a Counter:

from collections import Counter

mask = (
    df.groupby(["Order Date", "Order ID"], sort=False)["Product"]
      .transform("count")
      .gt(1)
)
product_dict = Counter(df.loc[mask, "Product"])

Result for the slightly modified sample dataframe (added an Order Date column)

   Order Date Order ID                   Product
0  2021-11-11   147268          Wired Headphones
1  2021-11-11   148041      USB-C Charging Cable
2  2021-11-11   149343  Apple Airpods Headphones
3  2021-11-11   149964    AAA Batteries (4-pack)
4  2021-11-11   149350      USB-C Charging Cable
5  2021-11-12   141732                    iPhone
6  2021-11-12   149620  Lightning Charging Cable
7  2021-11-12   142451    AAA Batteries (4-pack)
8  2021-11-12   146039    34in Ultrawide Monitor
9  2021-11-12   143498     AA Batteries (4-pack)
10 2021-11-12   141316    AAA Batteries (4-pack)
11 2021-11-12   144804          Wired Headphones
12 2021-11-12   144804                    iPhone
13 2021-11-12   145270              Google Phone
14 2021-11-12   142789    AAA Batteries (4-pack)

Counter({'Wired Headphones': 1, 'iPhone': 1})

Maybe it's enough to groupby over Order ID, but since you are grouping over Order Date I suspect it isn't.

CodePudding user response：

If I understood it correctly, you want to count the value pairs ('Order ID', 'Product'). In that case, you can do

res = df.groupby('Order ID')['Product'].value_counts()

Or, if you want to select only the duplicated products in a given order (counts > 1), and convert to a dictionary with the format {(Order ID, Product): <Product count with the same Order ID>}

res = df.groupby('Order ID')['Product'].value_counts().loc[lambda counts: counts > 1].to_dict()