I have a TSV file with the data separated like this
class (this is tab header)
256-56-8411 Bob 3.61 junior cs
471-44-7458 Tim 3.45 senior ce
326-56-4286 Rajesh 2.97 freshman te
548-66-1124 Eric 2.88 sophomore ee
447-21-3599 John 2.51 junior cs
911-41-1256 Rebecca 3.92 senior cs
854-22-6372 Robin 2.45 freshman te
criteria (this is tab header)
A B C D D 1
A B C D E 2
S D F G H 2
S D F G B 4
class (this is tab header)
447-21-3599 John 2.51 junior cs
911-41-1256 Rebecca 3.92 senior cs
854-22-6372 Robin 2.45 freshman te
256-56-8411 Bob 3.61 junior cs
471-44-7458 Tim 3.45 senior ce
326-56-4286 Rajesh 2.97 freshman te
548-66-1124 Eric 2.88 sophomore ee
criteria (this is tab header)
S D F G H 2
S D F G B 4
A B C D D 1
A B C D E 2
I need help writing a python script that will extract and merge all of the data for each tab into One file for respective tabs . One for class , another for tab criteria. The out dataset will look something like this.
output file for class
256-56-8411 Bob 3.61 junior cs
471-44-7458 Tim 3.45 senior ce
326-56-4286 Rajesh 2.97 freshman te
548-66-1124 Eric 2.88 sophomore ee
447-21-3599 John 2.51 junior cs
911-41-1256 Rebecca 3.92 senior cs
854-22-6372 Robin 2.45 freshman te
447-21-3599 John 2.51 junior cs
911-41-1256 Rebecca 3.92 senior cs
854-22-6372 Robin 2.45 freshman te
256-56-8411 Bob 3.61 junior cs
471-44-7458 Tim 3.45 senior ce
326-56-4286 Rajesh 2.97 freshman te
548-66-1124 Eric 2.88 sophomore ee
output file for criteria
A B C D D 1
A B C D E 2
S D F G H 2
S D F G B 4
S D F G H 2
S D F G B 4
A B C D D 1
A B C D E 2
Thanks for your anticipated help.
CodePudding user response:
Given the file you gave this will sort your TSV into the two lists.
It will fail if there is any not empty line above one of the * (this is tab header)
lines.
NB1. as I wrote it right now it will replace the original file with the new sorted file.
NB2. this reads the file in memory, if you're working with huge files it might be best to read/write to a pair of tempfile.TemporaryFile
and then concatenate those.
lines = {"class": [], "criteria": []}
with open("file.tsv", "r") as f:
while line := f.readline():
if line == "\n":
continue
elif line == "class (this is tab header)\n":
current = "class"
elif line == "criteria (this is tab header)\n":
current = "criteria"
else:
lines[current].append(line)
with open("file.tsv", "w") as f: # note this truncates the file to 0
f.write("class (this is tab header)\n")
f.writelines(lines["class"])
f.write("criteria (this is tab header)\n")
f.writelines(lines["criteria"])
CodePudding user response:
You could use a dictionary to hold suitable file handles and write the lines as they are read in. This would then work on any file size:
output = {
'class' : open('class.tsv', 'w'),
'criteria' : open('criteria.tsv', 'w')
}
# Write suitable headers
for header, handle in output.items():
handle.write(f"{header} (this is a tab header)\n")
with open('input.tsv') as f_input:
for line in f_input:
if line.strip(): # skip any blank lines
if line.startswith('class'):
mode = 'class'
elif line.startswith('criteria'):
mode = 'criteria'
else:
output[mode].write(line)
for handle in output.values():
handle.close()