I have a problem. I have a list with JSONs. I want to create a complete dataframe in steps. My idea is, for example: my list contains 100 elements and I want the step size to be 25. So len(list) / size = 100 / 25 = 4. That means 4 runs of the for loop, and 4 times concatenating the small dataframe to the complete one. For the MVP I have built a list with 4 elements and a step of 2, so every loop run should concatenate two elements.
At the end my df_complete contains only two rows. What is the problem?
The first loop run should contain my_Dict and my_Dict2, the second my_Dict2 and my_Dict2. So the list should go from 0-1 and from 2-3, and every loop run should contain two elements.
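A minimal sketch of the intended step logic, assuming a hypothetical helper `batch_bounds` (not part of the original code). It only computes the start and stop index of each step: 100 elements with a step size of 25 give 4 steps, and the 4-element MVP list with a step of 2 gives the ranges 0-1 and 2-3 (slice stops are exclusive).

```python
def batch_bounds(n_items, size):
    """Return (start, stop) slice bounds for consecutive steps of `size` items.

    Hypothetical helper: the last step is shorter if n_items is not a multiple of size.
    """
    return [(start, min(start + size, n_items)) for start in range(0, n_items, size)]


# 100 elements in steps of 25 -> 4 steps
print(len(batch_bounds(100, 25)))  # 4

# MVP: 4 elements in steps of 2 -> (0, 2) covers indexes 0-1, (2, 4) covers 2-3
print(batch_bounds(4, 2))  # [(0, 2), (2, 4)]
```

Each pair can then be used directly as a slice, e.g. list_dictionaries[start:stop].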
import pandas as pd
my_Dict = {
    '_key': '1',
    'group': 'test',
    'data': {},
    'type': '',
    'code': '007',
    'conType': '1',
    'flag': None,
    'createdAt': '2021',
    'currency': 'EUR',
    'detail': {
        'selector': {
            'number': '12312',
            'isTrue': True,
            'requirements': [{
                'type': 'customer',
                'requirement': '1'}]
        }
    }
}
my_Dict2 = {
    '_key': '2',
    'group': 'test',
    'data2': {},
    'type': '',
    'code': '007',
    'conType': '1',
    'flag': None,
    'createdAt': '2021',
    'currency': 'EUR',
    'detail2': {
        'selector': {
            'number': '12312',
            'isTrue': True,
            'requirements': [{
                'type': 'customer',
                'requirement': '1'}]
        }
    }
}
list_dictionaries = [my_Dict, my_Dict2, my_Dict2, my_Dict2]
df_complete = pd.DataFrame()
size = 1
for i in range(len(list_dictionaries) // size):
    print(i)
    df = pd.json_normalize(list_dictionaries[i], sep='_')
    df_complete = pd.concat([df_complete, df])
print(df_complete)
[OUT]
_key group type code conType flag createdAt currency detail_selector_number detail_selector_isTrue detail_selector_requirements detail2_selector_number detail2_selector_isTrue detail2_selector_requirements
0 1 test 007 1 None 2021 EUR 12312 True [{'type': 'customer', 'requirement': '1'}] NaN NaN NaN
0 2 test 007 1 None 2021 EUR NaN NaN NaN 12312 True [{'type': 'customer', 'requirement': '1'}]
Expected output
_key group type code conType flag createdAt currency detail_selector_number detail_selector_isTrue detail_selector_requirements detail2_selector_number detail2_selector_isTrue detail2_selector_requirements
0 1 test 007 1 None 2021 EUR 12312 True [{'type': 'customer', 'requirement': '1'}] NaN NaN NaN
1 2 test 007 1 None 2021 EUR NaN NaN NaN 12312 True [{'type': 'customer', 'requirement': '1'}]
2 2 test 007 1 None 2021 EUR NaN NaN NaN 12312 True [{'type': 'customer', 'requirement': '1'}]
3 2 test 007 1 None 2021 EUR NaN NaN NaN 12312 True [{'type': 'customer', 'requirement': '1'}]
Answer:
The problem most likely arose because the first iteration produced an empty dataframe: the slice list_dictionaries[:0] was used, and it contains no elements. Try the code below.
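list_dictionaries = [my_Dict, my_Dict2, my_Dict2, my_Dict2]
df_complete = pd.DataFrame()
# Normalize one dictionary per iteration and append it to the result
for i in range(0, len(list_dictionaries)):
    df = pd.json_normalize(list_dictionaries[i], sep='_')
    df_complete = pd.concat([df_complete, df])
# drop=True renumbers the rows 0..n-1 without keeping the old index as a column
print(df_complete.reset_index(drop=True))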
Is that what you need?
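A small sketch to check the diagnosis above, using simplified stand-in records (an assumption, not the full my_Dict / my_Dict2). It shows that normalizing an empty slice gives an empty dataframe, and that concatenating single-row frames repeats index 0 unless ignore_index=True is passed.

```python
import pandas as pd

# Simplified stand-ins for my_Dict / my_Dict2 (assumption: only a few keys kept)
records = [
    {"_key": "1", "detail": {"selector": {"number": "12312"}}},
    {"_key": "2", "detail2": {"selector": {"number": "12312"}}},
]

# An empty slice normalizes to an empty DataFrame, so concatenating it adds no rows
empty = pd.json_normalize(records[:0], sep="_")
print(empty.empty)  # True

# Each single-dict frame has index 0, so a plain concat repeats 0;
# ignore_index=True renumbers the result 0..n-1
frames = [pd.json_normalize(r, sep="_") for r in records]
print(list(pd.concat(frames).index))  # [0, 0]
print(list(pd.concat(frames, ignore_index=True).index))  # [0, 1]
```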
If you need two dictionaries at each iteration:
for i in range(0, len(list_dictionaries), 2):
    print(list_dictionaries[i:i + 2])
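Since pd.json_normalize also accepts a list of dicts, each slice can be normalized in a single call. This sketch assumes simplified stand-in records and reuses the question's idea of a step size variable named size; it collects the per-step frames in a list and concatenates them once at the end. Unlike indexing i and i + 1, slicing also handles a shorter last step when the list length is not a multiple of size.

```python
import pandas as pd

# Simplified stand-ins for my_Dict / my_Dict2 (assumption: only a few keys kept)
my_Dict = {"_key": "1", "detail": {"selector": {"number": "12312", "isTrue": True}}}
my_Dict2 = {"_key": "2", "detail2": {"selector": {"number": "12312", "isTrue": True}}}
list_dictionaries = [my_Dict, my_Dict2, my_Dict2, my_Dict2]

size = 2  # dictionaries per step
frames = []
for start in range(0, len(list_dictionaries), size):
    batch = list_dictionaries[start:start + size]  # indexes 0-1, then 2-3
    # json_normalize accepts a list of dicts and returns one row per dict
    frames.append(pd.json_normalize(batch, sep="_"))

# A single concat at the end; ignore_index=True numbers the rows 0..n-1
df_complete = pd.concat(frames, ignore_index=True)
print(len(frames), len(df_complete))  # 2 4
```

Concatenating once at the end also avoids rebuilding df_complete on every loop run, which matters more as the number of steps grows.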
If you want to concatenate all normalized frames, two per iteration:
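df_complete = pd.DataFrame()  # start again from an empty frame
for i in range(0, len(list_dictionaries), 2):
    # Normalize the dictionaries at positions i and i + 1
    # (assumes an even number of dictionaries, otherwise i + 1 is out of range)
    df1 = pd.json_normalize(list_dictionaries[i], sep='_')
    df2 = pd.json_normalize(list_dictionaries[i + 1], sep='_')
    df_complete = pd.concat([df_complete, df1, df2])
df_complete = df_complete.reset_index(drop=True)
print(df_complete)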
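The NaN values in the expected output come from pd.concat aligning columns by name: a column that exists only in my_Dict's frame (detail_*) is filled with NaN for rows coming from my_Dict2, and vice versa. A minimal sketch with simplified stand-in records (an assumption, not the full dictionaries):

```python
import pandas as pd

# Two frames with partly different columns, mimicking my_Dict vs my_Dict2 (simplified)
df1 = pd.json_normalize({"_key": "1", "detail": {"selector": {"number": "12312"}}}, sep="_")
df2 = pd.json_normalize({"_key": "2", "detail2": {"selector": {"number": "12312"}}}, sep="_")

# concat takes the union of the columns; a column missing from a frame becomes NaN
combined = pd.concat([df1, df2], ignore_index=True)
print(combined.shape)  # (2, 3)
print(int(combined["detail_selector_number"].isna().sum()))  # 1
```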
Or, more generally, you can avoid storing my_Dict2 three times in list_dictionaries: loop over a list of index lists, one per iteration, and take the dictionaries at those indexes. The first iteration uses the first and second dictionaries ([0, 1]); the second uses the second dictionary twice ([1, 1]).
list_dictionaries = [my_Dict, my_Dict2]
df_complete = pd.DataFrame()
# Each inner list holds the indexes of the two dictionaries used in one iteration
for i in [[0, 1], [1, 1]]:
    df1 = pd.json_normalize(list_dictionaries[i[0]], sep='_')
    df2 = pd.json_normalize(list_dictionaries[i[1]], sep='_')
    df_complete = pd.concat([df_complete, df1, df2])
df_complete = df_complete.reset_index(drop=True)
print(df_complete)
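The index-list idea also works for batches of any length if each batch is turned into a list of dicts and normalized in one call. A sketch with simplified stand-in records; the names batches and frames are illustrative, not from the original answer:

```python
import pandas as pd

# Simplified stand-ins (assumption): only a few keys of the original dictionaries
my_Dict = {"_key": "1", "detail": {"selector": {"number": "12312"}}}
my_Dict2 = {"_key": "2", "detail2": {"selector": {"number": "12312"}}}
list_dictionaries = [my_Dict, my_Dict2]

# Each inner list names the dictionaries for one iteration;
# an index may repeat, so my_Dict2 appears twice in the second batch
batches = [[0, 1], [1, 1]]

frames = [
    pd.json_normalize([list_dictionaries[j] for j in batch], sep="_")
    for batch in batches
]
df_complete = pd.concat(frames, ignore_index=True)
print(df_complete["_key"].tolist())  # ['1', '2', '2', '2']
```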