I have a problem. I have a list that contains 2,549,150 elements. However, I don't want to convert the whole list into a DataFrame at once using the pd.json_normalize method.
I would like to convert the list into a DataFrame step by step: first the first 100,000 elements of the list, then the next 100,000 elements starting at element 100,001, and so on.
However, the problem is that my DataFrame contains 2,500,000 elements at the end instead of 2,549,150, so I end up with the wrong number of elements. How can I fix the error?
In summary, I would like to convert the list into a DataFrame in chunks of 100,000.
import pandas as pd
my_Dict = {
    '_key': '1',
    'group': 'test',
    'data': {},
    'type': '',
    'code': '007',
    'conType': '1',
    'flag': None,
    'createdAt': '2021',
    'currency': 'EUR',
    'detail': {
        'selector': {
            'number': '12312',
            'isTrue': True,
            'requirements': [{
                'type': 'customer',
                'requirement': '1'}]
        }
    }
}
a1D = [my_Dict] * 2549150
size = 25 # Didn't want to calculate this myself, but didn't know how else to solve it.
df_complete = pd.DataFrame()
for i in range(0, len(a1D), len(a1D)//size):
    #print(i)
    df = pd.json_normalize(a1D[i:i+100000], sep='_')
    #print(df.shape)
    df_complete = pd.concat([df_complete, df])
df_complete.shape
>>> [OUT]
>>> (2500000, 11)
CodePudding user response:
Your loop steps by len(a1D)//size = 101,966 but each slice only takes 100,000 elements, so 1,966 elements are skipped after every chunk, and 25 chunks × 100,000 gives the 2,500,000 rows you see. Rather than stepping by your guess at how many chunks there should be, step by the chunk size up to the length of the list instead:
df_complete = pd.DataFrame()
chunk = 100000
for i in range(0, len(a1D), chunk):
    df = pd.json_normalize(a1D[i:i + chunk], sep='_')
    df_complete = pd.concat([df_complete, df])
df_complete.shape
Output:
(2549150, 11)
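As an aside, calling pd.concat inside the loop re-copies all previously accumulated rows on every iteration. A common alternative (a minimal sketch, assuming the same a1D and chunk size as above) is to collect the per-chunk DataFrames in a list and concatenate once at the end:
import pandas as pd

chunk = 100000
frames = []
for i in range(0, len(a1D), chunk):
    # normalize one chunk at a time; the final slice is simply shorter
    frames.append(pd.json_normalize(a1D[i:i + chunk], sep='_'))
# a single concat avoids repeatedly copying the growing DataFrame
df_complete = pd.concat(frames, ignore_index=True)
df_complete.shape  # (2549150, 11)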