How to remove the already iterated rows of a csv file


I have a csv file with 100 records. In the first iteration, I want to write the first 50 records to a new csv file, 'newFile.csv'. In the second iteration, I want to read the next 50 records from the original csv file and write them to 'newFile.csv'.

I am able to perform the first iteration, but in the second iteration I do not get the expected values, i.e. the next 50 rows that have to be written to the csv file. Can someone please help me out with this? Thank you.

Here is the code:

import pandas as pd

oldData = pd.read_csv('oldFile.csv') # Has 100 rows

for i in range(2):
    newData = pd.read_csv('oldFile.csv', nrows=50) # Has 50 rows
    newCsv = newData.to_csv('newFile.csv', index=False)
    newData = newData.iloc[50:] # Removes those 50 rows
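
For reference, here is a minimal sketch of the intended behaviour (not taken from the answers below): it assumes oldFile.csv has a single header row, uses skiprows so each read starts after the rows handled previously, and appends to newFile.csv instead of overwriting it.

import pandas as pd

nRows = 50

for i in range(2):
    # Skip the data rows already handled in earlier iterations (the header, line 0, is kept)
    newData = pd.read_csv('oldFile.csv', nrows=nRows, skiprows=range(1, i * nRows + 1))
    # Write the first batch normally, then append without repeating the header
    newData.to_csv('newFile.csv', index=False, mode='w' if i == 0 else 'a', header=(i == 0))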

    

CodePudding user response:

import pandas as pd

# Read oldFile.csv in chunks of 50 rows; each iteration yields a 50-row DataFrame
for newData in pd.read_csv('oldFile.csv', chunksize=50):
    newData.to_csv('newFile.csv', index=False)

This way, each read of the .csv file returns 50 rows: the first iteration gets the first 50 rows, the second the rows from 51 to 100, and so on.
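
As written, each chunk overwrites newFile.csv, so only the last 50 rows survive. If the goal is to end up with all 100 rows in newFile.csv, one possible variation (an assumption about the intent, not part of the answer above) is to append every chunk after the first:

import pandas as pd

first = True
for newData in pd.read_csv('oldFile.csv', chunksize=50):
    # First chunk creates the file with a header; later chunks are appended without one
    newData.to_csv('newFile.csv', index=False, mode='w' if first else 'a', header=first)
    first = False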

CodePudding user response:

You can read oldFile.csv in chunks of 50 rows and then process each chunk individually, e.g.:

import pandas as pd

nRows = 50

with pd.read_csv('oldFile.csv', chunksize=nRows, header=None) as reader:
    # The reader yields successive DataFrames of nRows rows each
    for chunk in reader:
        print(chunk)
        chunk.to_csv('newFile.csv', index=False, header=False)

Note that newFile.csv is being overwritten on each iteration.
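
If overwriting is not wanted, another option is to give each chunk its own output file; the newFile_0.csv, newFile_1.csv, ... names below are only an illustration, not something from the answer:

import pandas as pd

nRows = 50

with pd.read_csv('oldFile.csv', chunksize=nRows, header=None) as reader:
    for i, chunk in enumerate(reader):
        # One file per 50-row chunk, e.g. newFile_0.csv, newFile_1.csv (hypothetical names)
        chunk.to_csv(f'newFile_{i}.csv', index=False, header=False)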
