As can be seen in the code, I created two output files: one for the output after splitting, and a second for the actual output after removing duplicate lines. How can I produce only one output file? Sorry if this sounds stupid, I'm a beginner.
import sys
txt = sys.argv[1]
lines_seen = set()  # holds lines already seen
outfile = open("out.txt", "w")
actualout = open("output.txt", "w")
for line in open(txt, "r"):
    line = line.split("?", 1)[0]
    outfile.write(line + "\n")
outfile.close()
for line in open("out.txt", "r"):
    if line not in lines_seen:  # not a duplicate
        actualout.write(line)
        lines_seen.add(line)
actualout.close()
CodePudding user response:
You can add the lines from the input file directly into the set. Since sets cannot have duplicates, you don't even need to check for those. Try this:
import sys
txt = sys.argv[1]
lines_seen = set()  # holds each distinct line once
actualout = open("output.txt", "w")
for line in open(txt, "r"):
    # strip the newline first so lines with and without "?" compare equal
    line = line.rstrip("\n").split("?", 1)[0]
    lines_seen.add(line + "\n")
for line in lines_seen:
    actualout.write(line)
actualout.close()
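One caveat with the set-based approach: a set does not keep insertion order, so the lines may come out in a different order than they appeared in the input. If order matters, a possible alternative is to use dictionary keys, which are unique and keep insertion order in Python 3.7+. The sketch below is only an illustration; the function name `unique_prefixes` and the sample data are made up for this example.

```python
def unique_prefixes(lines, sep="?"):
    """Return the text before the first `sep` on each line,
    without duplicates, in first-seen order."""
    # Strip the trailing newline so "a?x\n" and "a\n" both become "a"
    prefixes = (line.rstrip("\n").split(sep, 1)[0] for line in lines)
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(prefixes))


sample = ["a?x=1\n", "b\n", "a?y=2\n", "b?z\n"]  # hypothetical input lines
print(unique_prefixes(sample))  # prints ['a', 'b']
```

Any iterable of strings works as input, so an open file object could be passed in directly.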
CodePudding user response:
First, iterate over every line in the file, split each line on your delimiter, and store the results in a list. Then iterate over the list and write each line you have not seen before to your output file.
import sys
txt = sys.argv[1]
lines_seen = set()  # holds lines already seen
actualout = open("output.txt", "w")
# keep only the text before the first "?" on each line
data = [line.rstrip("\n").split("?", 1)[0] for line in open(txt, "r")]
for line in data:
    if line not in lines_seen:  # not a duplicate
        actualout.write(line + "\n")
        lines_seen.add(line)
actualout.close()
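The intermediate list is not strictly necessary. As a rough sketch, the split and the duplicate check can happen in the same loop, writing straight to a single output file. The function name `write_unique` and its parameters are invented for this example, and `with` is used here so both files are closed automatically.

```python
def write_unique(src_path, dst_path, sep="?"):
    """Write the part of each line before `sep` to dst_path, skipping repeats.

    Returns the number of distinct lines written.
    """
    seen = set()  # prefixes already written
    # `with` closes both files even if an error occurs part-way through
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            prefix = line.rstrip("\n").split(sep, 1)[0]
            if prefix not in seen:  # first time this prefix appears
                seen.add(prefix)
                dst.write(prefix + "\n")
    return len(seen)
```

To match the script in the question, this could be called as `write_unique(sys.argv[1], "output.txt")` after `import sys`. Because each line is written as soon as it is first seen, the output keeps the input order.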