I think this problem is similar to the previous questions in this forum. However, I still have a problem with how to create a nested dictionary from a list of tuples.
Suppose I have the following tuple:
my_list = [
('actor', 'actor_id', 'integer', 'NO'),
('actor', 'first_name', 'character varying', 'NO'),
('actor_info', 'actor_id', 'integer', 'YES'),
('actor_info', 'first_name', 'character varying', 'YES')]
# convert into DataFrame
col = ['table', 'col_name', 'dtype', 'isnull']
df = pd.DataFrame(mylist, columns=col)
print(df)
table col_name dtype isnull
0 actor actor_id integer NO
1 actor first_name character varying NO
2 actor_info actor_id integer YES
3 actor_info first_name character varying YES
Current result:
{
'actor': {
'actor_id': {'dtype': 'integer', 'isnull': 'NO'},
'first_name': {'dtype': 'character varying', 'isnull': 'NO'}
},
'actor_info': {
'actor_id': {'dtype': 'integer', 'isnull': 'YES'},
'first_name': {'dtype': 'character varying', 'isnull': 'YES'}
}
}
The expected result (it should be grouped by table_name):
{
'actor':
[
{'column': 'actor_id', 'dtype': 'integer', 'isnull': 'NO'},
{'column': 'first_name', 'dtype': 'character varying', 'isnull': 'NO'}
],
'actor_info':
[
{'column': 'actor_id', 'dtype': 'integer', 'isnull': 'YES'},
{'column': 'first_name', 'dtype': 'character varying', 'isnull': 'YES'}
]
}
I've tried to make a nested dictionary by converting my_list to DataFrame
. However, I cannot get the desired output result. This is my current code:
# convert to nested dictionary
ff = df.iloc.groupby(['table'])[['col_name','dtype','isnull']].apply(lambda x: x.set_index('col_name').to_dict(orient='index')).to_dict()
# convert to JSON
print(json.dumps(ff, indent=1))
Could you help me to solve this kind of problem?
I'm also curious about how to solve this problem without converting to DataFrame (e.g., list comprehension, nested looping). Any help to clear this problem up would be much appreciated. Thanks
CodePudding user response:
As per my comment, I'm assuming you'll want a list of columns metadata per table name, rather dict of dicts.
If so, this is one straightforward approach.
from collections import defaultdict
d = defaultdict(list)
for tablename, col, dtype, isnull in my_list:
d[tablename].append({ 'column': col, 'dtype': dtype, 'isnull': isnull })
Output (in ipython):
In [19]: d
Out[19]:
defaultdict(list,
{'actor': [{'column': 'actor_id',
'dtype': 'integer',
'isnull': 'NO'},
{'column': 'first_name',
'dtype': 'character varying',
'isnull': 'NO'}],
'actor_info': [{'column': 'actor_id',
'dtype': 'integer',
'isnull': 'YES'},
{'column': 'first_name',
'dtype': 'character varying',
'isnull': 'YES'}]})
In [20]: d['actor']
Out[20]:
[{'column': 'actor_id', 'dtype': 'integer', 'isnull': 'NO'},
{'column': 'first_name', 'dtype': 'character varying', 'isnull': 'NO'}]
In [21]: d['actor_info']
Out[21]:
[{'column': 'actor_id', 'dtype': 'integer', 'isnull': 'YES'},
{'column': 'first_name', 'dtype': 'character varying', 'isnull': 'YES'}]
CodePudding user response:
Quite easy with a dictionary comprehension:
from itertools import groupby
{k: {a: dict(zip(('column', 'dtype', 'isnull'), b])) for _,a,*b in g}
for k,g in groupby(my_list, lambda t: t[0])}
NB. groupby assumes that the initial array is sorted by the grouping key, if not you need to sort it first
Output:
{'actor': {'actor_id': {'column': 'integer', 'dtype': 'NO'},
'first_name': {'column': 'character varying', 'dtype': 'NO'}},
'actor_info': {'actor_id': {'column': 'integer', 'dtype': 'YES'},
'first_name': {'column': 'character varying', 'dtype': 'YES'}}}