How can I create a conditional column from existing DataFrame columns and a dictionary without loops?

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({"type":  ["A" ,"A1" ,"A" ,"A1","B" ],
                  "group":  ["g1", "g2","g2","g2","g1"]})

And I have a dictionary:

dic = {"AlphaA": {"A":  {"g1": "A_GRP1",  "g2": "A_GRP2"},
                  "A1": {"g1": "A1_GRP1", "g2": "A1_GRP2"}},
       "AlphaB": {"B":  {"g1": "B_GRP1",  "g2": "B_GRP2"}}}

I need to create a column named "value" whose entries are looked up in the dictionary using the DataFrame's "type" and "group" columns.

Conditions to be applied:

If type is "A" or "A1", look under the dictionary key "AlphaA", take the value for the row's type and group, and assign it to the new column.
If type is "B", look under the dictionary key "AlphaB" and take the value for the row's type and group.

Example of row one:
type is "A", hence the dictionary key is "AlphaA"
group is "g1"
therefore:

dic["AlphaA"]["A"]["g1"]          # would be the answer
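
To make the rule concrete, the lookup for a single row could be written as a small function. This is only a sketch: lookup_value is a hypothetical helper name, and it searches every outer key for the type instead of hard-coding "AlphaA" and "AlphaB", which assumes each type appears under exactly one outer key.

```python
# Same nested dictionary as in the question.
dic = {"AlphaA": {"A":  {"g1": "A_GRP1",  "g2": "A_GRP2"},
                  "A1": {"g1": "A1_GRP1", "g2": "A1_GRP2"}},
       "AlphaB": {"B":  {"g1": "B_GRP1",  "g2": "B_GRP2"}}}


def lookup_value(type_, group):
    """Return the value for (type_, group), or None if there is no match.

    Hypothetical helper, not part of the original question.
    """
    for inner in dic.values():      # inner maps type -> {group: value}
        if type_ in inner:
            return inner[type_].get(group)
    return None


print(lookup_value("A", "g1"))
```

Calling this once per row is the slow, loop-style approach; it is shown only to pin down the expected result for each (type, group) pair.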

Required Output

final_df = pd.DataFrame({"type":  ["A", "A1", "A", "A1", "B"],
                         "group": ["g1", "g2", "g2", "g2", "g1"],
                         "value": ["A_GRP1", "A1_GRP2", "A_GRP2",
                                   "A1_GRP2", "B_GRP1"]})

I was able to achieve this using loops, but it takes a lot of time,
so I am looking for a faster technique.

CodePudding user response：

Assuming dic is the input dictionary, you can flatten its outer level into a single dictionary (with the help of ChainMap), convert that to a DataFrame, unstack it into a Series indexed by (type, group), and merge it with df:

from collections import ChainMap
s = pd.DataFrame(dict(ChainMap(*dic.values()))).unstack()

# without ChainMap
# d = {k: v for d in dic.values() for k,v in d.items()}
# pd.DataFrame(d).unstack()

out = df.merge(s.rename('value'), left_on=['type', 'group'], right_index=True)

output (the index shows that the rows are not in their original order here; out.sort_index() restores it):

  type group    value
0    A    g1   A_GRP1
1   A1    g2  A1_GRP2
3   A1    g2  A1_GRP2
2    A    g2   A_GRP2
4    B    g1   B_GRP1
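
If the original row order matters, or if some (type, group) pairs might be missing from the dictionary, a left merge is one option. The sketch below rebuilds the question's data so that it runs on its own; passing how="left" is my addition, not part of the answer above.

```python
from collections import ChainMap

import pandas as pd

df = pd.DataFrame({"type":  ["A", "A1", "A", "A1", "B"],
                   "group": ["g1", "g2", "g2", "g2", "g1"]})

dic = {"AlphaA": {"A":  {"g1": "A_GRP1",  "g2": "A_GRP2"},
                  "A1": {"g1": "A1_GRP1", "g2": "A1_GRP2"}},
       "AlphaB": {"B":  {"g1": "B_GRP1",  "g2": "B_GRP2"}}}

# Flatten the outer level, then turn the nested dict into a Series
# whose MultiIndex is (type, group).
s = pd.DataFrame(dict(ChainMap(*dic.values()))).unstack()

# how="left" keeps every row of df; unmatched pairs would get NaN.
out = df.merge(s.rename("value"), left_on=["type", "group"],
               right_index=True, how="left")
print(out)
```

With the default inner merge, rows whose pair is not in the dictionary are dropped silently, so the left merge is the safer choice when the result must have the same length as df.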

CodePudding user response：

Use DataFrame.join with Series created from dictionary by dict comprehension:

# dic is the nested dictionary from the question
d1 = {(k1, k2): v2 for k, v in dic.items() for k1, v1 in v.items() for k2, v2 in v1.items()}
df = df.join(pd.Series(d1).rename('value'), on=['type','group'])
print(df)
  type group    value
0    A    g1   A_GRP1
1   A1    g2  A1_GRP2
2    A    g2   A_GRP2
3   A1    g2  A1_GRP2
4    B    g1   B_GRP1
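
One thing to be aware of with the join: a (type, group) pair that is not in the dictionary does not raise an error, it just produces NaN. A small sketch of that behavior follows; the type "C" and the "UNKNOWN" placeholder are made up for illustration.

```python
import pandas as pd

dic = {"AlphaA": {"A":  {"g1": "A_GRP1",  "g2": "A_GRP2"},
                  "A1": {"g1": "A1_GRP1", "g2": "A1_GRP2"}},
       "AlphaB": {"B":  {"g1": "B_GRP1",  "g2": "B_GRP2"}}}

# Flatten to {(type, group): value}; tuple keys become a MultiIndex.
d1 = {(k1, k2): v2
      for v in dic.values()
      for k1, v1 in v.items()
      for k2, v2 in v1.items()}
lookup = pd.Series(d1).rename("value")

# "C" is a made-up type that does not exist in dic.
df2 = pd.DataFrame({"type":  ["A", "B", "C"],
                    "group": ["g1", "g2", "g1"]})

out2 = df2.join(lookup, on=["type", "group"])
missing = out2["value"].isna()            # True only for the "C" row
out2["value"] = out2["value"].fillna("UNKNOWN")
print(out2)
```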

CodePudding user response：

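You can remove the outer key of the original dictionary (dic in the question) and then look up each row with apply:

# flatten dic to {type: {group: value}}
d = {k:v for vs in dic.values() for k, v in vs.items()}
# map each type to its inner dict, then pick the entry for the row's group
df['value'] = (df.assign(value=df['type'].map(d))
               .apply(lambda row: row['value'][row['group']], axis=1)
               )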

print(d)

{'A': {'g1': 'A_GRP1', 'g2': 'A_GRP2'}, 'A1': {'g1': 'A1_GRP1', 'g2': 'A1_GRP2'}, 'B': {'g1': 'B_GRP1', 'g2': 'B_GRP2'}}

print(df)

  type group    value
0    A    g1   A_GRP1
1   A1    g2  A1_GRP2
2    A    g2   A_GRP2
3   A1    g2  A1_GRP2
4    B    g1   B_GRP1
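
Since the question is about speed: apply with axis=1 calls a Python function once per row, which is typically slow on large frames. A plain list comprehension over the two columns does the same lookup and is usually faster. This is a sketch rather than a benchmark, and flat is just an illustrative name for the flattened dictionary.

```python
import pandas as pd

df = pd.DataFrame({"type":  ["A", "A1", "A", "A1", "B"],
                   "group": ["g1", "g2", "g2", "g2", "g1"]})

dic = {"AlphaA": {"A":  {"g1": "A_GRP1",  "g2": "A_GRP2"},
                  "A1": {"g1": "A1_GRP1", "g2": "A1_GRP2"}},
       "AlphaB": {"B":  {"g1": "B_GRP1",  "g2": "B_GRP2"}}}

# Drop the outer key: {type: {group: value}}.
flat = {k: v for vs in dic.values() for k, v in vs.items()}

# Look up each (type, group) pair without building a Series per row.
# Raises KeyError if a type or group is missing from the dictionary.
df["value"] = [flat[t][g] for t, g in zip(df["type"], df["group"])]
print(df)
```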