So my dataframe looks like this: I am trying to find a simpler way to get the objects from a Groupby and then putting it in a dictionary. I had to obtain the index and then do a for loop to get the exact string of each row in Product.
More details if needed: My goal was to find duplicate Order ID and then take the products from the column and add to dictionary with:
key = Product
value = no of times the Product is found to be ordered together
(I am not looking for ways to optimize finding duplicates, i know I could use df.duplicated
)
Code:
for date, df in df1.groupby('Order Date'):
if df.Product.count() > 1:
indice = df.Product.index
for data in indice:
product = df.loc[data].at['Product']
#update dictionary counter
product_dict[product] = product_dict.get(product) 1
else:
continue
for ease you can use this df instead. I listed as dictionary:
{'Order ID': ['147268', '148041', '149343', '149964', '149350', '141732', '149620', '142451', '146039', '143498', '141316', '144804', '144804', '145270', '142789'],
'Product': ['Wired Headphones', 'USB-C Charging Cable', 'Apple Airpods Headphones', 'AAA Batteries (4-pack)', 'USB-C Charging Cable', 'iPhone', 'Lightning Charging Cable', 'AAA Batteries (4-pack)', '34in Ultrawide Monitor', 'AA Batteries (4-pack)', 'AAA Batteries (4-pack)', 'Wired Headphones', 'iPhone', 'Google Phone', 'AAA Batteries (4-pack)']}
CodePudding user response:
Maybe I misunderstand, but this seems to do what you are trying to achieve by using a Counter
:
from collections import Counter
mask = (
df.groupby(["Order Date", "Order ID"], sort=False)["Product"]
.transform("count")
.gt(1)
)
product_dict = Counter(df.loc[mask, "Product"])
Result for the slightly modified sample dataframe (added an Order Date
column)
Order Date Order ID Product
0 2021-11-11 147268 Wired Headphones
1 2021-11-11 148041 USB-C Charging Cable
2 2021-11-11 149343 Apple Airpods Headphones
3 2021-11-11 149964 AAA Batteries (4-pack)
4 2021-11-11 149350 USB-C Charging Cable
5 2021-11-12 141732 iPhone
6 2021-11-12 149620 Lightning Charging Cable
7 2021-11-12 142451 AAA Batteries (4-pack)
8 2021-11-12 146039 34in Ultrawide Monitor
9 2021-11-12 143498 AA Batteries (4-pack)
10 2021-11-12 141316 AAA Batteries (4-pack)
11 2021-11-12 144804 Wired Headphones
12 2021-11-12 144804 iPhone
13 2021-11-12 145270 Google Phone
14 2021-11-12 142789 AAA Batteries (4-pack)
is
Counter({'Wired Headphones': 1, 'iPhone': 1})
Maybe it's enough to groupby
over Order ID
, but since you are grouping over Order Date
I suspect it isn't.
CodePudding user response:
If I understood it correctly, you want to count the value pairs ('Order ID', 'Product'). In that case, you can do
res = df.groupby('Order ID')['Product'].value_counts()
Or, if you want to select only the duplicated products in a given order (counts > 1
), and convert to a dictionary with the format {(Order ID, Product): <Product count with the same Order ID>}
res = df.groupby('Order ID')['Product'].value_counts().loc[lambda counts: counts > 1].to_dict()