I need to process a large dataframe in chunks and I applied this function:
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

for i in chunker(df, chunk_size):
    ....
However, when I run this, I get an error: ValueError: invalid literal for int() with base 10:
Is there another way to process a dataframe in chunks, or a way to adjust the script above?
Thanks!
CodePudding user response:
You need to use iloc to slice the rows by position:
def chunker(seq, size):
    return (seq.iloc[pos:pos + size] for pos in range(0, len(seq), size))

for i in chunker(df, chunk_size):
    ....
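The reason is that df[] is primarily for looking up columns. It does accept a row slice, but whether that slice is interpreted by label or by position depends on the index type, so it is easy to get wrong. df.loc is for lookups by row-index label, and those labels do not necessarily match the incremental, position-based numbering 0, 1, 2, ... that the chunker relies on. df.iloc always works by position. You can read this for a more detailed explanation.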
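As a minimal sketch of the difference between position-based and label-based slicing (the dataframe, its index labels and the chunk size here are made up for illustration, they are not taken from the question):

```python
import pandas as pd

# Hypothetical example data: the index labels are deliberately not 0..n-1.
df = pd.DataFrame({"a": range(5)}, index=[10, 20, 30, 40, 50])


def chunker(seq, size):
    # iloc slices by position, regardless of what the index labels are
    return (seq.iloc[pos:pos + size] for pos in range(0, len(seq), size))


chunks = list(chunker(df, 2))
chunk_lengths = [len(chunk) for chunk in chunks]  # 2, 2 and 1 rows

# loc slices by label and includes both endpoints, so these two differ:
by_position = df.iloc[0:2]  # the first two rows (labels 10 and 20)
by_label = df.loc[10:30]    # labels 10 through 30, i.e. three rows
```

Concatenating the chunks with pd.concat(chunks) should give back the original dataframe, which is a quick way to check that no rows were dropped or duplicated.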
CodePudding user response:
Thanks for your quick answer. I tried again following your recommendation, but I still get the error:
ValueError: invalid literal for int() with base 10: 'xxx'
If I run the same code on a different dataframe, built using np.random.randn, it works fine. Also, the error does not occur if I do not divide the dataframe into chunks. Any clue?