Compare two files and store data in new file in python

Time:01-06

Sorry, I have updated my question. I have two files, file1.txt and file2.txt, and their respective data is as follows:

file1.txt:

admin:admin
admin:meunsm
admin:12345

The format of each line in file1.txt is:

username:password

file2.txt:

192.168.0.114:1137   >   192.168.0.193:21 csanders:echo

The format of each line in file2.txt is:

source ip:source port > destination ip:destination port username:password

Now, what I want from Python is to compare these files and extract only the username. If a username in file1.txt doesn't exist in file2.txt, that username must be stored in a new text file. I have updated my question with the data from the .txt files. There can be hundreds of thousands of rows in both files, and a for loop should be used because I want to save each username in my database table.

I picked this code sample from Stack Overflow. It compares both files, and any lines common to both files are written to a new file:

sample:

with open('file1.txt') as file1:
    with open('file2.txt') as file2:
        newfile = open('newfile.txt','w')
        common_lines = set(file1.readlines()) & set(file2.readlines())
        for line in common_lines:
            newfile.write(line)

        newfile.close()

But my scenario is quite different. I want to compare the two files, and if data in file1.txt doesn't exist in file2.txt, that data must be stored in newfile.txt.

CodePudding user response:

It is quite easy with a for loop.

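with open('file1.txt') as file1:
  with open('file2.txt') as file2:
    newfile = open('newfile.txt','w')
    # Read file2 only once: a second readlines() call on the same
    # file object returns an empty list, which would break the check below.
    lines2 = set(file2.readlines())
    different_lines = []

    for line1 in file1.readlines():
        if line1 not in lines2:
            different_lines.append(line1)

    for line in different_lines:
      newfile.write(line)

    newfile.close()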

You can also write it more compactly with a Python list comprehension.

with open('file1.txt') as file1:
  with open('file2.txt') as file2:
    newfile = open('newfile.txt','w')
    # Read file2 once, outside the comprehension
    lines2 = set(file2.readlines())
    different_lines = [l1 for l1 in file1.readlines() if l1 not in lines2]
    for line in different_lines:
      newfile.write(line)

    newfile.close()
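
Both loops above compare whole lines, but the two files in the question use different line formats, so a line from file1.txt can never equal a line from file2.txt. Here is a minimal sketch that compares usernames instead. It assumes the username is the text before the first colon in file1.txt, and the text before the colon in the last whitespace-separated field in file2.txt. The helper names are made up for illustration.

```python
def username_from_file1(line):
    # file1.txt format: username:password
    return line.strip().split(':', 1)[0]


def username_from_file2(line):
    # file2.txt format: src_ip:port > dst_ip:port username:password
    # Assumption: the credentials are the last whitespace-separated field.
    return line.split()[-1].split(':', 1)[0]


def missing_usernames(lines1, lines2):
    # Usernames seen in file2, kept in a set for fast membership tests.
    seen = {username_from_file2(l) for l in lines2 if l.strip()}
    emitted = set()
    missing = []
    for line in lines1:
        if not line.strip():
            continue  # skip blank lines
        user = username_from_file1(line)
        if user not in seen and user not in emitted:
            missing.append(user)
            emitted.add(user)
    return missing


lines1 = ["admin:admin\n", "admin:meunsm\n", "admin:12345\n"]
lines2 = ["192.168.0.114:1137   >   192.168.0.193:21 csanders:echo\n"]
print(missing_usernames(lines1, lines2))  # ['admin']
```

With the sample data from the question, only "admin" is reported, once, because repeated usernames are written a single time.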

CodePudding user response:

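You can use a set difference. (symmetric_difference would also return lines that appear only in file2.txt, which is not what the question asks for.)

Something like:

with open('file1.txt') as file1, \
     open('file2.txt') as file2, \
     open('newfile.txt', 'w') as newfile:
    # lines that are in file1.txt but not in file2.txt
    diff = set(file1.readlines()).difference(file2.readlines())
    for line in diff:
        # keep only the username (the text before the first colon)
        newfile.write(f"{line.strip().split(':')[0]}\n")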

Note: using set does not guarantee the order of lines.
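
Since a set loses the original order, one option is to use the set only for membership tests and iterate over file1.txt in order. The following is a sketch, assuming the usernames from file2.txt fit in memory. The function name and the parsing rule (the username is the text before the colon in the last whitespace-separated field of a file2.txt line) are illustrative assumptions, not something stated in the answers.

```python
def write_missing_usernames(path1, path2, out_path):
    # Collect usernames from file2 (last field is assumed to be username:password).
    with open(path2) as file2:
        known = {line.split()[-1].split(':', 1)[0] for line in file2 if line.strip()}

    written = set()  # avoids writing the same username twice
    with open(path1) as file1, open(out_path, 'w') as newfile:
        for line in file1:  # iterates lazily, one line at a time
            user = line.strip().split(':', 1)[0]
            if user and user not in known and user not in written:
                newfile.write(user + '\n')
                written.add(user)
    return len(written)
```

Calling write_missing_usernames('file1.txt', 'file2.txt', 'newfile.txt') writes each missing username once, in the order it first appears in file1.txt, and returns how many were written.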

CodePudding user response:

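Here is some simple code that maintains the order and is memory efficient. It will also work on huge files because it uses iterators. Note that it compares the two files line by line in lockstep, so it assumes that matching lines appear in the same order in both files.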

with open("file1.txt") as fp1, open("file2.txt") as fp2, open("newfile.txt", "w") as fp3:
    i = 0
    k = 0
    while True:
        try:
            if i == 0:
                # on the first pass, read one line from each file
                l1 = next(fp1)
                l2 = next(fp2)
            # if both lines are equal, advance both files
            if l1 == l2:
                try:
                    l1 = next(fp1)
                except StopIteration:
                    break
                l2 = next(fp2)
            # if the lines are not equal, write l1 to the new file
            else:
                fp3.write(l1)
                try:
                    l1 = next(fp1)
                except StopIteration:
                    break
            i += 1
        except StopIteration:
            # a file ran out of lines; stop once this has happened twice
            k += 1
            if k == 2:
                break
        except Exception as e:
            print(e)
            break
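
The question also mentions saving the usernames to a database table. The database is not specified, so the following is only a sketch using Python's built-in sqlite3 module. The table name missing_users and the function name are hypothetical.

```python
import sqlite3


def save_usernames(conn, usernames):
    # Hypothetical schema: one row per username, duplicates ignored.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS missing_users (username TEXT PRIMARY KEY)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO missing_users (username) VALUES (?)",
        [(name,) for name in usernames],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
save_usernames(conn, ["admin", "admin", "guest"])
rows = conn.execute("SELECT username FROM missing_users ORDER BY username").fetchall()
print(rows)  # [('admin',), ('guest',)]
```

Using executemany inserts all usernames in one call instead of one statement per loop iteration, which tends to matter with hundreds of thousands of rows.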