I have a dataframe with price quotes for a variety of parts and makers. ~10k parts and 10 makers, so my dataset contains up to 100k rows, looking roughly like this:
Part | Maker | Price |
---|---|---|
1 | Alpha | 1.00 |
2 | Alpha | 1.30 |
3 | Alpha | 1.25 |
1 | Bravo | 1.10 |
2 | Bravo | 1.02 |
3 | Bravo | 1.15 |
4 | Bravo | 1.19 |
1 | Charlie | .99 |
2 | Charlie | 1.10 |
3 | Charlie | 1.12 |
4 | Charlie | 1.19 |
I am wanting to return two dictionaries based on the best price, Part/Maker and Part/Price. My main issue is when two makers have the same best price.
I want my result to end up like this:
1:.99
2:1.1
3: 1.02
4:1.19
and the second one to be:
1:Charlie
2: Charlie
3: Bravo
4: [Bravo, Charlie]
The first dictionary is easy. Second one is what I'm stuck on. Here's what I have so far:
winning_price_dict={}
winning_mfg_dict={}
for index, row in quote_df.iterrows():
if row['Part'] not in winning_price_dict:
winning_price_dict[row['Part']] = row['Proposed Quote']
winning_mfg_dict[row['Part']] = list(row['Maker'])
if winning_price_dict[row['Part']]>row['Proposed Quote']:
winning_price_dict[row['Part']] = row['Proposed Quote']
winning_mfg_dict[row['Part']] = row['Maker']
if winning_price_dict[row['Part']]==row['Proposed Quote']:
winning_price_dict[row['Part']] = row['Proposed Quote']
winning_mfg_dict[row['Part']] = winning_mfg_dict[row['Part']].append(row['Maker']) #this is the only line that I don't believe works
When I run it as is, it says 'str' object has no attribute 'append'. However, I thought that it should be a list because of the list(row['Maker']) command.
When I change the relevant lines to this:
for index, row in quote_df.iterrows():
if row['Part'] not in winning_price_dict:
winning_mfg_dict[row['Part']] = list(row['Mfg'])
if winning_price_dict[row['Part']]>row['Proposed Quote']:
winning_mfg_dict[row['Part']] = list(row[['Mfg']])
if winning_price_dict[row['Part']]==row['Proposed Quote']:
winning_mfg_dict[row['Part']] = list(winning_mfg_dict[row['Part']]).append(row['Mfg'])
The winning_mfg_dict is all the part numbers and NoneType values, not the maker names.
What do I need to change to get it to return the list of suitable makers?
Thanks!
CodePudding user response:
In your original code, the actual problem was on line 9 of the first fragment: you set vale to a string, not to a list. Also, calling list(some_string)
dos not what you expect: it creates a list of single chars, not a [some_string]
.
I took the liberty to improve the overall readability by extracting common keys to variables, and joined two branches with same bodies. Something like this should work:
winning_price_dict = {}
winning_mfg_dict = {}
for index, row in quote_df.iterrows():
# Extract variables, saving a few accesses and reducing line lengths
part = row['Part']
quote = row['Proposed Quote']
maker = row['Maker']
if part not in winning_price_dict or winning_price_dict[part] > quote:
# First time here or higher value found - reset to initial
winning_price_dict[part] = quote
winning_mfg_dict[part] = [maker]
elif winning_price_dict[part] == quote:
# Add one more item with same value
# Not updating winning_price_dict - we already know it's proper
winning_mfg_dict[part].append(maker)
CodePudding user response:
You can use groupby to get all quotes for one part
best_quotes = quote_df.groupby("part").apply(lambda df: df[df.price == df.price.min()])
Then you get a dataframe with the part number and the previous index as Multiindex. The lambda function selects only the quotes with the minimum price.
You can get the first dictionary with
winning_price_dict = {part : price for (part, _), price in best_quotes.price.iteritems()}
and the second one with
winning_mfg_dict = {part:list(best.loc[part]["maker"]) for part in best_quotes.index.get_level_values("part")}