Home > Mobile >  Create dictionary from several columns based on position of values
Create dictionary from several columns based on position of values

Time:03-03

I have a dataframe like this

import pandas as pd

df = pd.DataFrame(
    {
        'C1': list('aabbab'),
        'C2': list('abbbaa'),
        'value': range(11, 17)
    }
)

  C1 C2  value
0  a  a     11
1  a  b     12
2  b  b     13
3  b  b     14
4  a  a     15
5  b  a     16

and I would like to generate a dictionary like this:

{'C1': {'a': {1: 11, 2: 12, 3: 15}, 'b': {1: 13, 2: 14, 3: 16}},
'C2': {'a': {1: 11, 2: 15, 3: 16}, 'b': {1: 12, 2: 13, 3: 14}}}

Logic is as follows:

In df I go to the column C1 and the first a I find in the column corresponds to value 11, the second one to value 12 and the third one to 15. The position of the a and the corresponding value should be stored in the dictionary for the keys C1 and a.

I could do something like this

df_ss = df.loc[df['C1'] == 'a', 'value']
d = {ind: val for ind, val in enumerate(df_ss.values, 1)}

which yields for d:

{1: 11, 2: 12, 3: 15}

which is indeed the desired output. I could then put this into a loop and generate all required dictionaries.

Does anyone sees something more efficient than this?

CodePudding user response:

You could use a groupby and a nested dict comprehension:

import pandas as pd

df = pd.DataFrame(
    {
        'C1': list('aabbab'),
        'C2': list('abbbaa'),
        'value': range(11, 17)
    }
)

d = {
    c: {k: dict(enumerate(g["value"], 1)) for k, g in df.groupby(c)}
    for c in ["C1", "C2"]
}

Which outputs:

{'C1': {'a': {1: 11, 2: 12, 3: 15}, 'b': {1: 13, 2: 14, 3: 16}},
 'C2': {'a': {1: 11, 2: 15, 3: 16}, 'b': {1: 12, 2: 13, 3: 14}}}

CodePudding user response:

Use a dictionary comprehension on the groupby for each C-value with enumerate:

cols = ['C1', 'C2']
# or, programmatically
# cols = df.filter(regex='^C').columns

out = {c: {k: dict(enumerate(g, start=1)) for k,g in df.groupby(c)['value']} 
       for c in cols}

output:

{'C1': {'a': {1: 11, 2: 12, 3: 15}, 'b': {1: 13, 2: 14, 3: 16}},
 'C2': {'a': {1: 11, 2: 15, 3: 16}, 'b': {1: 12, 2: 13, 3: 14}}}

CodePudding user response:

This is probably a bit slower than @mozway and @Alex methods:

{
    c: df.set_index([df.groupby(c).cumcount()   1, c])["value"]
         .unstack(0)
         .to_dict("index")
    for c in ["C1", "C2"]
}

Output:

{'C1': {'a': {1: 11, 2: 12, 3: 15}, 'b': {1: 13, 2: 14, 3: 16}},
 'C2': {'a': {1: 11, 2: 15, 3: 16}, 'b': {1: 12, 2: 13, 3: 14}}}
  • Related