I would like to create a dictionary using the values in the pandas' data frame multiple columns as tuple keys and single-column values as the value(s). And where there are no values for a particular tuple pair, I would like to assign a generic value of say 99999. This latter part is proving to be a challenge and I wish to seek help from this forum on how to achieve this task. Thank you.
Sample extract data:
Periods(Days) | Factory | Warehouse | Sales Outlets | Products | Dist Fact-Whse | |
---|---|---|---|---|---|---|
0 | 1 | Berlin | Teltow | Magdeburg | Maracuja | 19.6 |
1 | 2 | Hamburg | Wismar | Lubeck | Himbeer | 126.2 |
2 | 3 | Berlin | Kleinmachnow | Halle | Malaga | 26.9 |
3 | 4 | Hamburg | Wismar | Lubeck | Waldmeister | 126.2 |
4 | 5 | Berlin | Kleinmachnow | Leipzig | Walnuss | 26.9 |
Based on the above data set, the following piece of code is how I am creating my dictionary object(s) from the data frame:
F = df.Factory.drop_duplicates().to_list()
W = df.Warehouse.drop_duplicates().to_list()
dist1 = {};
for i in df.index:
key = (df.at[i, 'Factory'], df.at[i, 'Warehouse'])
value = df.at[i, 'Dist Fact-Whse']
dicT = {key : value}
dist1.update(dicT)
for f in F:
for w in W:
if (f, w) not in dist1:
dist1[(f, w)] = 9999
I get my desired or expected outcome: {('Berlin', 'Teltow'): 19.6, ('Hamburg', 'Wismar'): 126.2, ('Berlin', 'Kleinmachnow'): 26.9, ('Berlin', 'Wismar'): 9999, ('Hamburg', 'Teltow'): 9999, ('Hamburg', 'Kleinmachnow'): 9999},
but this is too elaborous, time consuming, and not efficient as I have a of other parameter similar to "dist1" to create in my entire code.
I kindly welcome a more elegant and smart solution to this issue.
CodePudding user response:
Here is a proposition with pandas.DataFrame.groupby
and itertools.product
from itertools.
d = df.groupby(["Factory","Warehouse"])["Dist Fact-Whse"].first().to_dict()
combs = list(product(df["Factory"].unique(), df["Warehouse"].unique()))
dist1 = {k:v for v,k in zip([d.get(c, 99999) for c in combs], combs)}
This looks shorter (and maybe fancier) but not sure if it's faster. I'll let you give it a try yourself !
Output :
print(dist1)
{('Berlin', 'Teltow'): 19.6,
('Berlin', 'Wismar'): 99999,
('Berlin', 'Kleinmachnow'): 26.9,
('Hamburg', 'Teltow'): 99999,
('Hamburg', 'Wismar'): 126.2,
('Hamburg', 'Kleinmachnow'): 99999}