I have about 40,000 lines of data, and the data is not split evenly. The sections are separated by two lines that get read in as [',,,'], as shown in the picture below. I want to iterate through the data, split at these rows, and write the data up to each split into a new CSV. Any help would be appreciated.
[Image: how the data is split]
CodePudding user response:
You could start from this reference example:
from csv import reader

# open file
with open("Demo.csv", "r") as my_file:
    # pass the file object to reader()
    file_reader = reader(my_file)
    # do this for all the rows
    for i in file_reader:
        # print the rows
        print(i)
Then add a check on each row for ",,,". You would need a counter to capture the number of rows read up to that point, which you can then use in pd.read_csv('file.csv', nrows=n).
You don't need to print the rows like the above code does, by the way. It's just there as an example of how to iterate through the CSV.
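As a rough sketch of the counter idea, the snippet below records the row number of every separator row. The names `is_separator` and `find_separator_rows` are made up for illustration. It also assumes a separator may show up either as `[',,,']` (as the question reports) or as `['', '', '', '']` (what `csv.reader` normally yields for a `,,,` line), so it accepts both.

```python
import csv

def is_separator(row):
    # Hypothetical helper: a non-empty row whose fields contain nothing
    # but commas and blanks, e.g. [',,,'] or ['', '', '', ''].
    return bool(row) and "".join(row).strip(", ") == ""

def find_separator_rows(path):
    """Return the 0-based row numbers of every separator row in the file."""
    with open(path, newline="") as my_file:
        return [n for n, row in enumerate(csv.reader(my_file)) if is_separator(row)]
```

The returned row numbers are the counter values described above; each section runs from one separator row to the next.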
CodePudding user response:
From your description, I think you want to split a file wherever a line equals a given string (",,,"). I don't think the fact that it's a CSV makes a difference. In that case, you're looking for something like:
count = 1
with open("my.csv") as fd_in:
    fd_out = open(f"out-{count}.csv", "w")
    for line in fd_in:
        if line.rstrip() == ",,,":
            # separator line: close the current file and start the next one
            fd_out.close()
            count += 1
            fd_out = open(f"out-{count}.csv", "w")
            continue
        fd_out.write(line)
    fd_out.close()
It's a little 'hacky' but produces numbered output files.
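The question says sections are separated by two ",,," lines in a row, so a variant that skips empty chunks may be useful. The following is only a sketch under that assumption: the function name `split_on_separator` and its `pattern` parameter are invented for illustration, and each chunk is buffered in memory before it is written, which should be fine for about 40,000 lines.

```python
def split_on_separator(src, separator=",,,", pattern="out-{}.csv"):
    """Split src at separator lines and return the names of the files written."""
    written = []  # output file names, in order
    chunk = []    # lines collected since the last separator

    def flush():
        # Write the buffered lines to the next numbered file, skipping empty
        # chunks so back-to-back separators do not create empty files.
        if chunk:
            name = pattern.format(len(written) + 1)
            with open(name, "w") as fd_out:
                fd_out.writelines(chunk)
            written.append(name)
            chunk.clear()

    with open(src) as fd_in:
        for line in fd_in:
            if line.rstrip() == separator:
                flush()
            else:
                chunk.append(line)
    flush()  # write whatever follows the last separator
    return written
```

Calling `split_on_separator("my.csv")` would write `out-1.csv`, `out-2.csv`, and so on, one file per non-empty section.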