How to transform dictionary keys into a dataframe column based on the values if the values are lists

Time:07-28

I have a dictionary whose keys are numbers and whose values are lists of strings. I want to create a dataframe column whose values are the dictionary keys, where the key for each row is selected by matching the value of another column in that row against the items in the dictionary's value lists. Sample starting dataframe and dictionary:

dict_x = {1: ['a'], 2: ['b', 'c', 'e'], 3: ['d', 'f']}
df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e', 'f']})

Desired output:

df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e', 'f'], 'Number': [1, 2, 2, 3, 2, 3]})

I thought something like df['Number'] = df['ID'].apply(lambda x : ???) would work, but I'm struggling with the condition. I also tried writing some for loops, but only the result of the last iteration was preserved when I wrote the column.

CodePudding user response:

Simply invert the dictionary dict_x by swapping the roles of keys and values (loop over the list elements to do that), so that each list element maps to its key.

import pandas as pd

# set up the dictionary and dataframe with properly quoted strings
dict_x = {1:['a'], 2:['b', 'c', 'e'], 3:['d', 'f']}
df = pd.DataFrame({'ID':['a', 'b', 'c', 'd', 'e', 'f']})

# reverse dictionary
rev_dict_x = dict()
for k,v in dict_x.items():
    for v_elem in v:
        rev_dict_x[v_elem] = k
        
# replace elements
df['Number'] = df['ID'].replace(rev_dict_x)

>df

   ID  Number
0   a       1
1   b       2
2   c       2
3   d       3
4   e       2
5   f       3

Note that this assumes each element appears in only one of the lists. If an element appears in more than one list, building rev_dict_x overwrites the earlier key with the later one.
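As a rough sketch of the same idea, the inverted dictionary can also be built with a comprehension and applied with Series.map, together with a quick check of the uniqueness assumption. The extra 'z' row and the names n_items and has_duplicates are hypothetical additions for illustration only. Unlike replace, map leaves NaN where an ID has no entry in the inverted dictionary.

```python
import pandas as pd

dict_x = {1: ['a'], 2: ['b', 'c', 'e'], 3: ['d', 'f']}
# 'z' is a hypothetical extra ID that appears in none of the lists
df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e', 'f', 'z']})

# invert the dictionary: each list element becomes a key pointing to its number
rev_dict_x = {item: key for key, items in dict_x.items() for item in items}

# if an element occurred in more than one list, the inverted dictionary
# would contain fewer entries than there are list elements in total
n_items = sum(len(items) for items in dict_x.values())
has_duplicates = len(rev_dict_x) != n_items

# map looks up every ID; IDs without a match become NaN
df['Number'] = df['ID'].map(rev_dict_x)
print(has_duplicates)
print(df)
```

With replace, an unmatched ID such as 'z' would stay in the Number column unchanged, so map may be the safer choice when missing matches should be visible.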

CodePudding user response:

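I hope I've understood you correctly. This builds a new dataframe directly from the dictionary, with one row per list element:

import pandas as pd

# dict_x as defined in the question; one (Number, ID) pair per list element
df = pd.DataFrame(
    [(k, i) for k, v in dict_x.items() for i in v], columns=["Number", "ID"]
)
print(df)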

Prints:

   Number ID
0       1  a
1       2  b
2       2  c
3       2  e
4       3  d
5       3  f

Or:

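df = (
    pd.DataFrame([dict_x])  # one row; columns are the keys, cells are the lists
    .melt()                 # long format: "variable" = key, "value" = list
    .explode("value")       # one row per list element
    .rename(columns={"variable": "Number", "value": "ID"})
)
print(df)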
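Both variants above create a new dataframe whose rows are ordered by dictionary key. If the goal is to add the Number column to an existing dataframe and keep its row order, one possible sketch is a left merge against such a lookup table. The names lookup and result are hypothetical, and the sample data is taken from the question.

```python
import pandas as pd

dict_x = {1: ['a'], 2: ['b', 'c', 'e'], 3: ['d', 'f']}
df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e', 'f']})

# lookup table with one (Number, ID) row per list element
lookup = pd.DataFrame(
    [(k, i) for k, v in dict_x.items() for i in v], columns=['Number', 'ID']
)

# a left merge keeps every row of df in its original order
result = df.merge(lookup, on='ID', how='left')
print(result)
```

If an ID appeared in more than one list, the merge would produce one output row per match instead of overwriting, which may or may not be what is wanted.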