I have a large file and I use pandas chunksize to split it into about 500 chunks:
import pandas as pd

index = 0
for df_ia in pd.read_csv("/path/to/file/file.TXT", chunksize=100000,
                         iterator=True, low_memory=False):
    index += 1
    if index < 500:
        continue
    elif index > 560:
        break
    # real logic on chunks 500-560 goes here
The problem, as the code shows, is that to reach the 500th chunk and run my logic I have to walk through the file chunk by chunk from the 1st chunk, which takes about 200s before any real work can start.
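For reference, my counting loop is equivalent to this itertools.islice version; it is tidier but still parses every earlier chunk, so it does not save the ~200s:

import pandas as pd
from itertools import islice

reader = pd.read_csv("/path/to/file/file.TXT", chunksize=100000,
                     iterator=True, low_memory=False)
# islice(reader, 499, 560) yields the 500th-560th chunks (counting from 1),
# but pandas still reads and parses all the chunks before them
for df_ia in islice(reader, 499, 560):
    pass  # per-chunk logic would go here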
My question is: is there any way to slice the chunks and jump directly to chunk 500 to do the logic, something like:
for df_ia in pd.read_csv("/path/to/file/file.TXT", chunksize=100000,
                         iterator=True, low_memory=False):
    if chunk_index == 500:
        # do logic
or something like:
for df_ia in pd.read_csv("/path/to/file/file.TXT", chunksize=100000, iterator=True, low_memory=False, chunk[500:]):
Notice that I am slicing with chunk[500:].
CodePudding user response:
IIUC, use the skiprows parameter to omit the data rows of the first 500 chunks (500 * n rows; use 499 * n if you count chunks from 1 the way your loop does). To avoid dropping the header, which is row 0, build the skip list with np.arange starting at 1:
import numpy as np
import pandas as pd

n = 100000
for i, df_ia in enumerate(pd.read_csv("/path/to/file/file.TXT",
                                      chunksize=n,
                                      # skip data rows 1 .. 500*n, keep row 0 (the header)
                                      skiprows=np.arange(1, 500 * n + 1),
                                      iterator=True,
                                      low_memory=False)):
    if i == 0:
        # do logic on the first remaining chunk (the original chunk 500)
        ...
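One caveat: np.arange(1, 500 * n + 1) materializes 50 million row indices. skiprows also accepts a callable, which pandas evaluates against each row index, so you avoid building that array at the cost of one Python call per row. A minimal sketch, assuming you want the question's chunk range 500-560 (the break threshold is an assumption):

import pandas as pd

n = 100000
# callable skiprows: skip every data row of the first 500 chunks,
# keep row 0 (the header) and everything from row 500*n + 1 onward
reader = pd.read_csv("/path/to/file/file.TXT",
                     chunksize=n,
                     skiprows=lambda r: 0 < r <= 500 * n,
                     iterator=True,
                     low_memory=False)
for i, df_ia in enumerate(reader):
    if i > 60:  # i == 60 is the original chunk 560; stop after it
        break
    ...  # do logic on the original chunks 500-560

Either way pandas still reads through the skipped lines, so the speedup comes from not building DataFrames for them, not from seeking past the bytes.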