Group values of a chosen column into a list when creating a dictionary from a pandas data frame

I have a dataframe that looks like this

_____________________
|col1 | col2 | col3 |
---------------------
| a   | b    | c    |
| d   | b    | c    |
| e   | f    | g    |
| h   | f    | j    |
---------------------

I want to get a dictionary structure that looks as follows

{
    b : { col1: [a,d], col2: b, col3: c},
    f : { col1: [e, h], col2: f, col3: [g, j]}
}

I have seen this answer. But it seems like overkill for what I want to do as it converts every value of the key inside the nested dictionary into a list. I would only like to convert col1 into a list when creating the dictionary. Is this possible?

CodePudding user response：

Use a custom lambda function that returns the unique values as a list when a group holds more than one distinct value, and the scalar itself otherwise:

d = (df.set_index('col2', drop=False)
       .groupby(level=0)
       .agg(lambda x: list(set(x)) if len(set(x)) > 1 else list(set(x))[0])
       .to_dict('index'))
print (d)
{'b': {'col1': ['d', 'a'], 'col2': 'b', 'col3': 'c'}, 
 'f': {'col1': ['h', 'e'], 'col2': 'f', 'col3': ['j', 'g']}}
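
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt from the table in the question, and the helper name collapse is only illustrative (it is not part of pandas). dtype=object is an assumption made so the columns hold plain Python strings regardless of pandas version. Because a set has no guaranteed iteration order, the order inside each list may differ from the output shown above.

```python
import pandas as pd

# Sample frame rebuilt from the question's table (object dtype is an assumption).
df = pd.DataFrame(
    {
        "col1": ["a", "d", "e", "h"],
        "col2": ["b", "b", "f", "f"],
        "col3": ["c", "c", "g", "j"],
    },
    dtype=object,
)


def collapse(values):
    """Return a list of the unique values, or the bare value if there is only one."""
    unique = set(values)
    # Set iteration order is arbitrary, so the list order is not guaranteed.
    return list(unique) if len(unique) > 1 else next(iter(unique))


d = (df.set_index("col2", drop=False)  # keep col2 as a column and as the index
       .groupby(level=0)               # group on the index, i.e. on col2
       .agg(collapse)                  # applied to each column of each group
       .to_dict("index"))
print(d)
```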

If the order of the values matters, use dict.fromkeys to remove duplicates, since it keeps the order of first appearance:

d = (df.set_index('col2', drop=False)
       .groupby(level=0)
       .agg(lambda x: list(dict.fromkeys(x)) if len(set(x)) > 1 else list(set(x))[0])
       .to_dict('index'))
print (d)
{'b': {'col1': ['a', 'd'], 'col2': 'b', 'col3': 'c'},
 'f': {'col1': ['e', 'h'], 'col2': 'f', 'col3': ['g', 'j']}}
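
If the goal is literally to turn only col1 into a list, one possible alternative is to pass a per-column mapping to agg. This is a sketch under an assumption that does not hold for the sample data: that keeping the first value of the other columns is acceptable. For group f it keeps 'g' and drops 'j' from col3, so it does not reproduce the desired output in the question. Unlike the versions above, col1 is always a list, duplicates included.

```python
import pandas as pd

# Sample frame rebuilt from the question's table (object dtype is an assumption).
df = pd.DataFrame(
    {
        "col1": ["a", "d", "e", "h"],
        "col2": ["b", "b", "f", "f"],
        "col3": ["c", "c", "g", "j"],
    },
    dtype=object,
)

d_col1_only = (
    df.set_index("col2", drop=False)
      .groupby(level=0)
      # col1 is collected into a list; the other columns keep their first value.
      .agg({"col1": list, "col2": "first", "col3": "first"})
      .to_dict("index")
)
print(d_col1_only)
```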