Looking to compare values in two different files

I have two CSV files that have been renamed to text files. I need to compare a column in each one (a date) to confirm they have been updated.

For example, c:\temp\oldfile.txt has 6 columns and the last one is called version. I need to make sure that c:\temp\newfile.txt has a different value for version. It doesn't need to do any date verification; as long as the comparison sees that the two values are different, the script can proceed. If possible, I would prefer to stick with 'standard' libraries, as I'm just learning and don't want to start creating dictionaries or learning pandas and numpy just yet.

Edit

Here's a copy of oldfile.txt and newfile.txt.

oldfile.txt:

feed_publisher_name,feed_publisher_url,feed_lang,feed_start_date,feed_end_date,feed_version
MyStuff,http://www.mystuff.com,en,20220103,20220417,22APR_20220401

newfile.txt:

feed_publisher_name,feed_publisher_url,feed_lang,feed_start_date,feed_end_date,feed_version
MyStuff,http://www.mystuff.com,en,20220103,20220417,22APR_20220414

In this case the comparison would note that the last column has a different value and would know to proceed with the rest of the script. Otherwise, if the values are the same, it will know that it was not updated and I'll have the program exit.

CodePudding user response：

You can do it by using the csv module in the standard library since that's the format of your files.

import csv

with open('oldfile.txt', 'r', newline='') as oldfile, \
     open('newfile.txt', 'r', newline='') as newfile:

    old_reader = csv.DictReader(oldfile)
    new_reader = csv.DictReader(newfile)

    old_row = next(old_reader)
    new_row = next(new_reader)

    same = old_row['feed_version'] == new_row['feed_version']
    print(f"The files are {'the same' if same else 'different'}.")
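To tie this back to the "proceed or exit" requirement in the question, the same csv approach could be wrapped in small functions. This is only a sketch: the names `read_version` and `version_changed` are made up for illustration, and it assumes each file has a header line followed by at least one data row.

```python
import csv


def read_version(path, column="feed_version"):
    """Return the value of `column` from the first data row of a CSV file."""
    with open(path, "r", newline="") as f:
        # next(..., None) gives None instead of raising if there is no data row
        row = next(csv.DictReader(f), None)
    if row is None:
        raise ValueError(f"{path} has no data rows")
    return row[column]


def version_changed(old_path, new_path, column="feed_version"):
    """Return True when the two files hold different values in `column`."""
    return read_version(old_path, column) != read_version(new_path, column)
```

The rest of the script could then call `version_changed('oldfile.txt', 'newfile.txt')` and use `sys.exit()` when it returns False.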

CodePudding user response：

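If you are only interested in checking whether the two files are identical (that is, whether the new one was changed at all), you can compute the hash of one file and compare it with the hash of the other. Note that this compares the whole file, so a change in any column, not just the version, makes the hashes differ.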

To compute hash (for example, sha256), you can use the following function:

import hashlib
def sha256sum(filename):
    # Opens the file
    with open(filename, 'rb') as file:
        content = file.read()
    hasher = hashlib.sha256()
    hasher.update(content)
    return hasher.hexdigest()

hashlib is part of the Python standard library, so nothing extra needs to be installed.

For example, if you write "v1.0" in a text document, the hasher function will give "fa8b919c909d5eb9e373d090928170eb0e7936ac20ccf413332b96520903168e"

If you later change it to "v1.1", the hasher function will give "eb79768c42dbbf9f10733e525a06ea9eb08f28b7b8edf9c6dcacb63940aedcb0".

These are two different hexdigest values, which means the two files are different.
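Putting the hash idea to work on two files might look like the sketch below. The helper name `files_differ` and the `chunk_size` parameter are assumptions added for illustration; reading in chunks is a variation on the function above so that large files are not loaded into memory at once.

```python
import hashlib


def sha256sum(filename, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with open(filename, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def files_differ(path_a, path_b):
    """Return True when the two files have different contents."""
    return sha256sum(path_a) != sha256sum(path_b)
```

Keep in mind that this reports a difference for any changed byte, including whitespace or line endings, so it answers "did the file change?" rather than "did the version column change?".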

CodePudding user response：

Reading the file-
We don't need any libraries for this. We just open the file, read it, and do a little parsing:

a, b = "", ""  # placeholders for the two values being compared

with open("c:/temp/oldfile.txt") as f:  # open the file as f
    text = f.read().split('\n')[1]  # read the whole file, then keep only the second line (the data row)
    a = text.split(',')[5]  # split the line on ',' into a list, then take the 6th element

Then opening the other one:

with open("c:/temp/newfile.txt") as f:
    text = f.read().split('\n')[1]
    b = text.split(',')[5]

Comparing the lines-

if a == b:
    print("The date is the same!")
else:
    print("The date is different...")

Of course, you can wrap this in a function that returns whether or not the values are equal, then use that result to decide whether the program continues.
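A rough sketch of that function form, still without any parsing libraries. The names `last_field`, `was_updated`, and `exit_if_not_updated` are invented for this example, and it assumes the value of interest is the last comma-separated field on the second line of each file.

```python
import sys


def last_field(path):
    """Return the last comma-separated field on the second line of a file."""
    with open(path) as f:
        lines = f.read().splitlines()
    # lines[0] is the header, lines[1] is the data row
    return lines[1].split(",")[-1].strip()


def was_updated(old_path, new_path):
    """Return True when the last field differs between the two files."""
    return last_field(old_path) != last_field(new_path)


def exit_if_not_updated(old_path, new_path):
    """Stop the program with a message when the value has not changed."""
    if not was_updated(old_path, new_path):
        sys.exit("The version is unchanged, nothing to do.")
```

Calling `exit_if_not_updated("c:/temp/oldfile.txt", "c:/temp/newfile.txt")` near the top of the script would let everything after it run only when the value changed.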

Hope this helps!