I have this file (with thousands of lines). Each line contains two numbers separated by whitespace:
3466 937
3466 5233
3466 8579
3466 10310
3466 15931
3466 17038
3466 18720
3466 19607
10310 1854
10310 3466
10310 4583
10310 5233
10310 9572
10310 10841
10310 13056
10310 14982
10310 16310
I need to delete, in Python, the lines that are repeated in reverse order. For example, 10310 3466 and 3466 10310 should appear only once (as either 10310 3466 or 3466 10310). Any ideas? Thank you.
Answer:
One approach is to use frozenset to generate keys that are order-insensitive:
# change data.csv to the name of your file
with open("data.csv") as infile:
    uniques = set(frozenset(line.strip().split()) for line in infile)

for value in uniques:
    print(*value)
Output (for the input given):
10310 3466
5233 10310
10310 4583
19607 3466
1854 10310
3466 8579
10310 9572
10310 13056
10310 14982
5233 3466
17038 3466
15931 3466
10310 10841
937 3466
18720 3466
16310 10310
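Because `uniques` is a set, the lines above come out in arbitrary order, and each frozenset prints its two numbers in arbitrary order too (compare "5233 10310" with "10310 3466"). If you want to keep the input order and the orientation of the first occurrence, here is a minimal sketch that tracks the frozenset keys already seen. The function name `dedupe_reversed` is hypothetical, and it assumes each line holds two whitespace-separated numbers.

```python
def dedupe_reversed(lines):
    """Yield each line once, treating "a b" and "b a" as the same line.

    Keeps the first occurrence (and its orientation) in input order.
    Assumes each line holds two whitespace-separated numbers.
    """
    seen = set()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        key = frozenset(parts)  # order-insensitive key
        if key not in seen:
            seen.add(key)
            # write the original tokens, not the key, so order is preserved
            yield " ".join(parts)


sample = ["3466 10310", "3466 15931", "10310 3466", "10310 4583"]
for kept in dedupe_reversed(sample):
    print(kept)
```

With a real file, pass the open file object instead of `sample` (iterating over a file yields its lines). Since it joins `parts` rather than the frozenset itself, a line such as `5 5` keeps both numbers instead of collapsing to `5`.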
Alternatively, use sorted to convert each line to the same key:
# change data.csv to the name of your file
with open("data.csv") as infile:
    uniques = set(" ".join(sorted(line.strip().split())) for line in infile)

for value in uniques:
    print(value)
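Note that sorted compares the numbers as strings here, so "10310" sorts before "3466". The keys are still consistent, so the de-duplication is correct; only the displayed order is lexicographic. If you would rather print the smaller number first, a small variant passes `key=int` (this assumes every token is an integer; `canonical` is a hypothetical helper name):

```python
def canonical(line):
    """Return the line's numbers in ascending numeric order, space-joined."""
    # key=int compares "3466" and "10310" as numbers, not as strings
    return " ".join(sorted(line.split(), key=int))


lines = ["3466 10310", "10310 3466", "10310 9572"]
uniques = set(canonical(line) for line in lines)
print(sorted(uniques))
```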
To better understand the approach using frozenset, see the code below:
frozenset((1, 2)) == frozenset((2, 1))
Out[2]: True
As you can see, two frozensets are equal regardless of the order of the elements in the tuples used to build them. The same holds for regular sets, but frozensets are also hashable. From the documentation:
The frozenset type is immutable and hashable — its contents cannot be altered after it is created; it can therefore be used as a dictionary key or as an element of another set.
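To see why hashability matters in practice, here is a short check: a regular set cannot be placed inside another set (Python raises `TypeError: unhashable type`), while a frozenset can serve as a key. The Counter below, using made-up sample lines, counts how often each unordered pair occurs:

```python
from collections import Counter

lines = ["3466 10310", "10310 3466", "3466 937"]

# A plain set is mutable and therefore unhashable.
try:
    {set(("3466", "10310"))}
except TypeError as exc:
    print("set as element fails:", exc)

# frozenset keys let Counter treat "a b" and "b a" as the same pair.
counts = Counter(frozenset(line.split()) for line in lines)
print(counts[frozenset(("10310", "3466"))])  # 2
```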
Note: to write the de-duplicated lines to a new file, do:
# change data.csv to the name of your file
with open("data.csv") as infile:
    uniques = set(frozenset(line.strip().split()) for line in infile)

# change output.csv to the name of your output file
with open("output.csv", mode="w") as outfile:
    for value in uniques:
        outfile.write(f'{" ".join(value)}\n')
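Because string hashing is randomized per process, the set above may iterate in a different order on each run, so the output file can differ between runs. A minimal sketch that sorts the numbers numerically, both within each pair and across pairs, gives reproducible output. The function name `dedupe_file` is hypothetical, it assumes two integer tokens per line, and the demo uses a temporary directory instead of your real files. One caveat of frozenset keys remains: a line such as `5 5` collapses to a one-element set and is written as `5`.

```python
import os
import tempfile


def dedupe_file(in_path, out_path):
    """Write each unordered pair from in_path to out_path exactly once."""
    with open(in_path) as infile:
        # frozenset makes "a b" and "b a" the same key; blank lines are skipped
        uniques = {frozenset(line.split()) for line in infile if line.strip()}
    # Order the numbers inside each pair, then the pairs themselves,
    # numerically, so the output file is identical from run to run.
    rows = sorted(sorted(int(n) for n in value) for value in uniques)
    with open(out_path, mode="w") as outfile:
        for row in rows:
            outfile.write(" ".join(map(str, row)) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data.csv")
    dst = os.path.join(tmp, "output.csv")
    with open(src, "w") as f:
        f.write("3466 10310\n10310 3466\n10310 9572\n\n")
    dedupe_file(src, dst)
    with open(dst) as f:
        result = f.read()

print(result)
```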
Answer:
It seems the order of the numbers is not important, so you could do it like this:
filename = 'data.txt'
pairs = []  # avoid shadowing the built-in list
with open(filename) as file:
    for line in file:
        nums = line.split()  # split on any whitespace
        if len(nums) < 2:
            continue  # skip blank or malformed lines
        a, b = int(nums[0]), int(nums[1])
        # put the smaller number first so "a b" and "b a" give the same string
        low, high = min(a, b), max(a, b)
        pairs.append(str(low) + ' ' + str(high))

uniqueSet = set(pairs)
with open("output.txt", mode="w") as outfile:
    for l in uniqueSet:
        outfile.write(l + '\n')
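If you don't need the intermediate strings, a variant of the same min/max idea keeps (smaller, larger) tuples of ints and uses `dict.fromkeys` to drop duplicates while keeping first-seen order (dicts preserve insertion order since Python 3.7). The function name `unique_pairs` and the sample lines are hypothetical:

```python
def unique_pairs(lines):
    """Return (smaller, larger) int pairs, duplicates removed, in first-seen order."""
    pairs = []
    for line in lines:
        nums = line.split()
        if len(nums) < 2:
            continue  # skip blank or malformed lines
        a, b = int(nums[0]), int(nums[1])
        pairs.append((min(a, b), max(a, b)))  # normalise orientation
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(pairs))


sample = ["3466 937", "3466 10310", "10310 3466", "10310 1854"]
for low, high in unique_pairs(sample):
    print(low, high)
```

Comparing ints rather than strings also means the smaller number really comes first (3466 before 10310).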