pandas chunksize how to slice chunk and directly jump into target chunk

Time:10-09

I have a large file and I use pandas chunksize to split it into about 500 chunks:

  import pandas as pd

  index = 0
  for df_ia in pd.read_csv("/path/to/file/file.TXT", chunksize=100000, iterator=True, low_memory=False):
      index += 1
      if index < 500:
          continue
      elif index > 560:
          break
      # real logic on chunks 500..560 goes here

The problem, as the code shows, is that to reach the 500th chunk I have to iterate through every chunk from the first one. That takes about 200 s before I can do any real logic.

My question: is there any way to slice the chunks and jump directly to chunk 500, something like:

for df_ia in pd.read_csv("/path/to/file/file.TXT", chunksize=100000, iterator=True, low_memory=False):
    if chunk_index == 500:
        do logic

or something like:

for df_ia in pd.read_csv("/path/to/file/file.TXT", chunksize=100000,iterator=True, low_memory=False , chunk[500:]):

Notice that I use slicing chunk[500:]

CodePudding user response:

IIUC, use the skiprows parameter to skip the rows of the chunks before the target. To avoid dropping the header, start the np.arange at 1 so that row 0 is not skipped:

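import numpy as np
import pandas as pd

n = 100000
for i, df_ia in enumerate(pd.read_csv("/path/to/file/file.TXT",
                          chunksize=n,
                          # skip data rows 1..500*n; row 0 (the header) is kept
                          skiprows=np.arange(1, 500 * n + 1),
                          iterator=True,
                          low_memory=False)):
    if i == 0:
        # first chunk after the skipped rows: do logic here
        pass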
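
To process a range of chunks such as 500 to 560, the same skiprows idea can be wrapped in a small helper. The following is only a sketch: the function name `iter_chunk_range` and its parameters are made up for illustration, chunks are counted from 1 as in the question, and the tiny in-memory CSV stands in for the real file. Note that the parser still has to scan past the skipped lines to find where the rows end, so this is not a true seek; it should mainly save the cost of building DataFrames for chunks that are not needed.

```python
import io
import itertools

import pandas as pd


def iter_chunk_range(source, first_chunk, last_chunk, chunksize):
    """Yield (chunk_number, DataFrame) for chunks first_chunk..last_chunk.

    Hypothetical helper: chunk numbers are 1-based, and the file is
    assumed to have a single header line (row 0).
    """
    rows_before = (first_chunk - 1) * chunksize
    n_chunks = last_chunk - first_chunk + 1
    reader = pd.read_csv(
        source,
        # skip data rows 1..rows_before; row 0 (the header) is kept
        skiprows=range(1, rows_before + 1),
        chunksize=chunksize,
    )
    # islice stops pulling from the reader after n_chunks chunks
    for offset, chunk in enumerate(itertools.islice(reader, n_chunks)):
        yield first_chunk + offset, chunk


# Small stand-in for the real file: header "a", then values 0..24
csv_text = "a\n" + "\n".join(str(i) for i in range(25)) + "\n"
chunks = list(iter_chunk_range(io.StringIO(csv_text), 3, 4, chunksize=5))
for number, df in chunks:
    print(number, df["a"].tolist())
```

For the file in the question, the call would look like `iter_chunk_range("/path/to/file/file.TXT", 500, 560, chunksize=100000)`.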