Split Large CSV at a point and create new file

Time:06-18

I have about 40,000 lines of data, and the blocks of data are not all the same size. The blocks are separated by two lines that get read in as [',,,'], as shown in the screenshot below. I want to iterate through the data, split at these rows, and write the data up to each split into a new CSV. Any help would be appreciated.

[Screenshot: how the data is split]

CodePudding user response:

You could try this as a starting point:

from csv import reader

# open file (newline="" is what the csv module docs recommend for readers)
with open("Demo.csv", "r", newline="") as my_file:
    # pass the file object to reader()
    file_reader = reader(my_file)
    # do this for all the rows
    for i in file_reader:
        # print the rows
        print(i)

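Then add a check on each row for the ",,," separator (note that csv.reader returns an unquoted ,,, line as ['', '', '', '']). Keep a counter of the rows read so far; once you know where the separator is, you can pass that count to pandas, e.g. pd.read_csv('file.csv', nrows=n), to load only the rows before it. For the blocks after the first one, you would also need skiprows to skip past the earlier rows.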

By the way, you don't need to print the rows like the code above does; it's just there as an example of how to iterate through the CSV.
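A minimal sketch of that counting step, using only the standard csv module. The helper names `is_separator` and `chunk_bounds` are hypothetical, and the sketch assumes a separator row is either all empty cells (how csv.reader parses ,,,) or the single value ',,,' as reported in the question. It records where each block starts and how many rows it has, which is the information nrows needs.

```python
import csv


def is_separator(row):
    """Return True if a parsed row is a block separator.

    csv.reader turns an unquoted ",,," line into ['', '', '', ''];
    the question reports seeing [',,,'], so accept that form too.
    """
    return row == [",,,"] or (len(row) > 1 and all(cell == "" for cell in row))


def chunk_bounds(path):
    """Return (first_row, n_rows) pairs for each block of data rows.

    Rows are counted from 0, separators included. Empty blocks
    (e.g. two separators in a row) are skipped.
    """
    bounds = []
    start = 0  # index of the first row in the current block
    count = 0  # number of rows read so far
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if is_separator(row):
                if count > start:
                    bounds.append((start, count - start))
                start = count + 1  # next block begins after the separator
            count += 1
    # the last block has no separator after it
    if count > start:
        bounds.append((start, count - start))
    return bounds
```

Each (start, n) pair could then be fed to pandas as pd.read_csv(path, header=None, skiprows=start, nrows=n). That mapping assumes no header row, no blank lines, and no quoted fields spanning several lines, since an integer skiprows is documented as a number of lines to skip at the start of the file.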

CodePudding user response:

From your description, you want to split the file wherever a line consists only of the separator ",,,". Since the separator is a whole line, the fact that it's a CSV doesn't really matter: you can read the file line by line without parsing it. Something like this:

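count = 1
with open("my.csv") as fd_in:
    # start writing the first output file
    fd_out = open(f"out-{count}.csv", "w")
    for line in fd_in:
        # a line that is just ",,," marks the end of a block
        if line.rstrip() == ",,,":
            fd_out.close()
            count += 1  # move on to the next numbered file
            fd_out = open(f"out-{count}.csv", "w")
            continue

        # copy every other line to the current output file
        fd_out.write(line)
    fd_out.close()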

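It's a little 'hacky', but it produces numbered output files (out-1.csv, out-2.csv, and so on). Note that two consecutive separator lines produce an empty output file between them.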
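If the 'hacky' part bothers you, here is a hedged variation on the same line-based idea (not the answerer's code). It collects the lines of each block and writes a file only when the block is non-empty, so repeated separators don't leave empty files, and every file is opened with `with` so it gets closed even if something fails. The function name `split_on_separator` and its `out_pattern` parameter are hypothetical; it buffers one block in memory at a time, which should be fine for the blocks of a 40,000-line file.

```python
def split_on_separator(in_path, out_pattern="out-{}.csv", separator=",,,"):
    """Write each block of lines between separator lines to its own file.

    Returns the list of files written, numbered from 1. Consecutive
    separators do not produce empty files.
    """
    written = []
    chunk = []  # lines of the block currently being collected

    def flush():
        # write the collected block (if any) to the next numbered file
        if chunk:
            path = out_pattern.format(len(written) + 1)
            with open(path, "w", newline="") as out:
                out.writelines(chunk)
            written.append(path)
            chunk.clear()

    # newline="" keeps the original line endings when copying lines
    with open(in_path, newline="") as fd_in:
        for line in fd_in:
            if line.rstrip() == separator:
                flush()
            else:
                chunk.append(line)
    flush()  # the last block has no separator after it
    return written
```

Calling split_on_separator("my.csv") would write out-1.csv, out-2.csv, ... to the current directory and return their names.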
