Home > Enterprise >  Python how to create a dictionary using the values in multiple pandas dataframe columns as tuple key
Python how to create a dictionary using the values in multiple pandas dataframe columns as tuple key

Time:01-27

I would like to create a dictionary using the values in the pandas' data frame multiple columns as tuple keys and single-column values as the value(s). And where there are no values for a particular tuple pair, I would like to assign a generic value of say 99999. This latter part is proving to be a challenge and I wish to seek help from this forum on how to achieve this task. Thank you.

Sample extract data:

Periods(Days) Factory Warehouse Sales Outlets Products Dist Fact-Whse
0 1 Berlin Teltow Magdeburg Maracuja 19.6
1 2 Hamburg Wismar Lubeck Himbeer 126.2
2 3 Berlin Kleinmachnow Halle Malaga 26.9
3 4 Hamburg Wismar Lubeck Waldmeister 126.2
4 5 Berlin Kleinmachnow Leipzig Walnuss 26.9

Based on the above data set, the following piece of code is how I am creating my dictionary object(s) from the data frame:

F = df.Factory.drop_duplicates().to_list()
W = df.Warehouse.drop_duplicates().to_list()

dist1 = {};

for i in df.index:
    key = (df.at[i, 'Factory'], df.at[i, 'Warehouse'])
    value = df.at[i, 'Dist Fact-Whse']
    dicT = {key : value}
    dist1.update(dicT)

for f in F:
    for w in W:
        if (f, w) not in dist1:
            dist1[(f, w)] = 9999

I get my desired or expected outcome: {('Berlin', 'Teltow'): 19.6, ('Hamburg', 'Wismar'): 126.2, ('Berlin', 'Kleinmachnow'): 26.9, ('Berlin', 'Wismar'): 9999, ('Hamburg', 'Teltow'): 9999, ('Hamburg', 'Kleinmachnow'): 9999},

but this is too elaborous, time consuming, and not efficient as I have a of other parameter similar to "dist1" to create in my entire code.

I kindly welcome a more elegant and smart solution to this issue.

CodePudding user response:

Here is a proposition with pandas.DataFrame.groupby and itertools.product from .

d = df.groupby(["Factory","Warehouse"])["Dist Fact-Whse"].first().to_dict()
combs = list(product(df["Factory"].unique(), df["Warehouse"].unique()))
dist1 = {k:v for v,k in zip([d.get(c, 99999) for c in combs], combs)}

This looks shorter (and maybe fancier) but not sure if it's faster. I'll let you give it a try yourself !

Output :

print(dist1)

{('Berlin', 'Teltow'): 19.6,
 ('Berlin', 'Wismar'): 99999,
 ('Berlin', 'Kleinmachnow'): 26.9,
 ('Hamburg', 'Teltow'): 99999,
 ('Hamburg', 'Wismar'): 126.2,
 ('Hamburg', 'Kleinmachnow'): 99999}
  • Related