Normalize a list with jsons to a dataframe in steps

Time:05-10

I have a problem. I have a list of JSON objects (dictionaries) and want to build a complete dataframe in steps. For example, my list contains 100 elements and the step size should be 25, so len(list) / size = 100 / 25 = 4. That gives 4 runs of the for loop, each concatenating a small dataframe onto the complete one. For the MVP I have built a list with 4 elements and a step size of 2, so every loop run should concatenate two elements.

At the end, my complete dataframe (df_complete) contains only two rows. What is causing this?

The first loop run should contain my_Dict and my_Dict2, the second my_Dict2 and my_Dict2. So the list should be processed as indices 0-1 and then 2-3, with two elements per loop run.

import pandas as pd

my_Dict = {
'_key': '1',
 'group': 'test',
 'data': {},
 'type': '',
 'code': '007',
 'conType': '1',
 'flag': None,
 'createdAt': '2021',
 'currency': 'EUR',
 'detail': {
        'selector': {
            'number': '12312',
            'isTrue': True,
            'requirements': [{
                'type': 'customer',
                'requirement': '1'}]
            }
        }   
 }

my_Dict2 = {
'_key': '2',
 'group': 'test',
 'data2': {},
 'type': '',
 'code': '007',
 'conType': '1',
 'flag': None,
 'createdAt': '2021',
 'currency': 'EUR',
 'detail2': {
        'selector': {
            'number': '12312',
            'isTrue': True,
            'requirements': [{
                'type': 'customer',
                'requirement': '1'}]
            }
        }   
 }
list_dictionaries = [my_Dict, my_Dict2, my_Dict2, my_Dict2]

df_complete = pd.DataFrame()

size= 1
for i in range((len(list_dictionaries) // size)):
    print(i)
    df = pd.json_normalize(list_dictionaries[i], sep='_')
    df_complete= pd.concat([df_complete, df])

print(df_complete)

[OUT]

    _key    group   type    code    conType flag    createdAt   currency    detail_selector_number  detail_selector_isTrue  detail_selector_requirements    detail2_selector_number detail2_selector_isTrue detail2_selector_requirements
0   1   test        007 1   None    2021    EUR 12312   True    [{'type': 'customer', 'requirement': '1'}]  NaN NaN NaN
0   2   test        007 1   None    2021    EUR NaN NaN NaN 12312   True    [{'type': 'customer', 'requirement': '1'}]

Expected output

    _key    group   type    code    conType flag    createdAt   currency    detail_selector_number  detail_selector_isTrue  detail_selector_requirements    detail2_selector_number detail2_selector_isTrue detail2_selector_requirements
0   1   test        007 1   None    2021    EUR 12312   True    [{'type': 'customer', 'requirement': '1'}]  NaN NaN NaN
1   2   test        007 1   None    2021    EUR NaN NaN NaN 12312   True    [{'type': 'customer', 'requirement': '1'}]
2   2   test        007 1   None    2021    EUR NaN NaN NaN 12312   True    [{'type': 'customer', 'requirement': '1'}]
3   2   test        007 1   None    2021    EUR NaN NaN NaN 12312   True    [{'type': 'customer', 'requirement': '1'}]

CodePudding user response:

The problem most likely arose because the first iteration produced an empty dataframe, since the slice list_dictionaries[:0] was used. Try the code below.

list_dictionaries = [my_Dict, my_Dict2, my_Dict2, my_Dict2]
df_complete = pd.DataFrame()

for i in range(0, len(list_dictionaries)):
    df = pd.json_normalize(list_dictionaries[i], sep='_')
    df_complete = pd.concat([df_complete, df])


# drop=True discards the old all-zero index instead of keeping it as a column
print(df_complete.reset_index(drop=True))

Is that what you need?

If you need two dictionaries at each iteration:

for i in range(0, len(list_dictionaries), 2):
    print(list_dictionaries[i:i + 2])
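
To make the slicing concrete, here is a small sketch with a plain list of strings standing in for the dictionaries. The names items and size are placeholders, not taken from the original code. It also shows that a slice such as [:0] yields nothing, which is the situation suspected above for the first iteration.

```python
# Plain strings stand in for the dictionaries; `items` and `size` are illustrative names.
items = ["a", "b", "c", "d"]
size = 2

# A slice that ends at index 0 is always empty.
empty = items[:0]

# Stepping the start index by `size` gives consecutive, non-overlapping chunks.
chunks = [items[start:start + size] for start in range(0, len(items), size)]

print(empty)   # []
print(chunks)  # [['a', 'b'], ['c', 'd']]
```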

If you want to concatenate all normalized frames in two iterations:

for i in range(0, len(list_dictionaries), 2):
    df1 = pd.json_normalize(list_dictionaries[i], sep='_')
    df2 = pd.json_normalize(list_dictionaries[i + 1], sep='_')
    df_complete = pd.concat([df_complete, df1, df2])

# drop=True discards the old index instead of keeping it as a column
df_complete = df_complete.reset_index(drop=True)
print(df_complete)

Or, more generally, to avoid creating four dictionaries in 'list_dictionaries' unnecessarily, pass the loop a list of index pairs, one pair per iteration, and use them to pick the elements. The first iteration takes the first and second dictionaries [0, 1]; the second takes the second dictionary twice [1, 1].

list_dictionaries = [my_Dict, my_Dict2]
df_complete = pd.DataFrame()
for i in [[0, 1], [1, 1]]:
    df1 = pd.json_normalize(list_dictionaries[i[0]], sep='_')
    df2 = pd.json_normalize(list_dictionaries[i[1]], sep='_')
    df_complete = pd.concat([df_complete, df1, df2])

# drop=True discards the old index instead of keeping it as a column
df_complete = df_complete.reset_index(drop=True)
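
Putting the pieces together, a chunked version might look like the sketch below. It relies on pd.json_normalize accepting a list of dictionaries as well as a single one, collects the per-chunk frames in a list, and concatenates once with ignore_index=True so the rows are numbered from 0 upward. The helper name normalize_in_chunks and the shortened sample records are illustrative only, not part of the original code.

```python
import pandas as pd


def normalize_in_chunks(records, size, sep="_"):
    """Normalize `records` in slices of `size` and return one combined frame.

    Hypothetical helper for illustration; not part of pandas.
    """
    frames = []
    for start in range(0, len(records), size):
        chunk = records[start:start + size]  # last chunk may be shorter
        frames.append(pd.json_normalize(chunk, sep=sep))
    if not frames:
        return pd.DataFrame()
    # ignore_index=True renumbers the rows 0..n-1 across all chunks
    return pd.concat(frames, ignore_index=True)


# Shortened stand-ins for my_Dict and my_Dict2 from the question.
records = [
    {"_key": "1", "detail": {"selector": {"number": "12312"}}},
    {"_key": "2", "detail2": {"selector": {"number": "12312"}}},
    {"_key": "2", "detail2": {"selector": {"number": "12312"}}},
    {"_key": "2", "detail2": {"selector": {"number": "12312"}}},
]

df_complete = normalize_in_chunks(records, size=2)
print(df_complete)
```

Concatenating once at the end also avoids copying the growing df_complete on every loop run.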