Python: How to create nested Dictionary from tuples?-CodePudding

I think this problem is similar to the previous questions in this forum. However, I still have a problem with how to create a nested dictionary from a list of tuples.

Suppose I have the following tuple:

my_list = [
 ('actor', 'actor_id', 'integer', 'NO'),
 ('actor', 'first_name', 'character varying', 'NO'),
 ('actor_info', 'actor_id', 'integer', 'YES'),
 ('actor_info', 'first_name', 'character varying', 'YES')]

# convert into DataFrame
col = ['table', 'col_name', 'dtype', 'isnull']
df = pd.DataFrame(mylist, columns=col)

print(df)

        table     col_name                        dtype   isnull
0       actor     actor_id                      integer     NO
1       actor   first_name            character varying     NO
2  actor_info     actor_id                      integer    YES
3  actor_info   first_name            character varying    YES

Current result:

{
 'actor': {
    'actor_id': {'dtype': 'integer', 'isnull': 'NO'},
    'first_name': {'dtype': 'character varying', 'isnull': 'NO'}
    },
 'actor_info': {
    'actor_id': {'dtype': 'integer', 'isnull': 'YES'},
    'first_name': {'dtype': 'character varying', 'isnull': 'YES'}
    }
}

The expected result (it should be grouped by table_name):

{
 'actor': 
    [
        {'column': 'actor_id', 'dtype': 'integer', 'isnull': 'NO'},
        {'column': 'first_name', 'dtype': 'character varying', 'isnull': 'NO'}
    ],
 'actor_info': 
    [
        {'column': 'actor_id', 'dtype': 'integer', 'isnull': 'YES'},
        {'column': 'first_name', 'dtype': 'character varying', 'isnull': 'YES'}
    ]
}

I've tried to make a nested dictionary by converting my_list to DataFrame. However, I cannot get the desired output result. This is my current code:

# convert to nested dictionary
ff = df.iloc.groupby(['table'])[['col_name','dtype','isnull']].apply(lambda x: x.set_index('col_name').to_dict(orient='index')).to_dict()

# convert to JSON
print(json.dumps(ff, indent=1))

Could you help me to solve this kind of problem?

I'm also curious about how to solve this problem without converting to DataFrame (e.g., list comprehension, nested looping). Any help to clear this problem up would be much appreciated. Thanks

CodePudding user response：

As per my comment, I'm assuming you'll want a list of columns metadata per table name, rather dict of dicts.

If so, this is one straightforward approach.

from collections import defaultdict

d = defaultdict(list)

for tablename, col, dtype, isnull in my_list:
    d[tablename].append({ 'column': col, 'dtype': dtype, 'isnull': isnull })

Output (in ipython):

In [19]: d
Out[19]:
defaultdict(list,
            {'actor': [{'column': 'actor_id',
               'dtype': 'integer',
               'isnull': 'NO'},
              {'column': 'first_name',
               'dtype': 'character varying',
               'isnull': 'NO'}],
             'actor_info': [{'column': 'actor_id',
               'dtype': 'integer',
               'isnull': 'YES'},
              {'column': 'first_name',
               'dtype': 'character varying',
               'isnull': 'YES'}]})

In [20]: d['actor']
Out[20]:
[{'column': 'actor_id', 'dtype': 'integer', 'isnull': 'NO'},
 {'column': 'first_name', 'dtype': 'character varying', 'isnull': 'NO'}]

In [21]: d['actor_info']
Out[21]:
[{'column': 'actor_id', 'dtype': 'integer', 'isnull': 'YES'},
 {'column': 'first_name', 'dtype': 'character varying', 'isnull': 'YES'}]

CodePudding user response：

Quite easy with a dictionary comprehension:

from itertools import groupby

{k: {a: dict(zip(('column', 'dtype', 'isnull'), b])) for _,a,*b in g}
 for k,g in groupby(my_list, lambda t: t[0])}

NB. groupby assumes that the initial array is sorted by the grouping key, if not you need to sort it first

Output:

{'actor': {'actor_id': {'column': 'integer', 'dtype': 'NO'},
  'first_name': {'column': 'character varying', 'dtype': 'NO'}},
 'actor_info': {'actor_id': {'column': 'integer', 'dtype': 'YES'},
  'first_name': {'column': 'character varying', 'dtype': 'YES'}}}