Working in git/Python...
Background:
I have a Python script that uncompresses content that must be versioned, reads that content, and then re-compresses it to save space. Every day when my 'over-night' script runs, git shows that all of the ".tar.gz" files have changed, even though I have checked the contents and found that in many of the .tar.gz files nothing has changed.
~~ Edit Start ~~
I have checked the MD5 sum of each file along the way: each of the contents of the 'tar', the MD5 of the 'tar', and the MD5 of the 'gz'. It seems that the MD5s of each layer are a perfect match, so the only thing changing may be the metadata on the file itself, such as the date/timestamp.
~~ Edit End ~~
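One way to test the timestamp suspicion is to look at the gzip header directly. The gzip format (RFC 1952) stores a 4-byte modification time in bytes 4 to 7 of the stream, so two archives with identical content can still differ in those bytes. The sketch below is only an illustration: the helper names `gzip_header_mtime` and `gzip_bytes` are made up for this example, and the payload is compressed in memory rather than read from the real files.

```python
import gzip
import io
import struct


def gzip_header_mtime(data: bytes) -> int:
    """Return the MTIME field (Unix seconds) from a gzip stream's header."""
    if data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    # RFC 1952: bytes 4..7 hold MTIME as a little-endian unsigned 32-bit int.
    return struct.unpack("<I", data[4:8])[0]


def gzip_bytes(payload: bytes, mtime: int) -> bytes:
    """Compress payload in memory with an explicit header timestamp."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


payload = b"same content every night"
first = gzip_bytes(payload, mtime=1)
second = gzip_bytes(payload, mtime=2)

# The header timestamps differ, so the compressed bytes differ,
# while the decompressed content is identical.
print(gzip_header_mtime(first), gzip_header_mtime(second))
print(first == second)
print(gzip.decompress(first) == gzip.decompress(second))
```

Running `gzip_header_mtime(open(path, 'rb').read(8))` on yesterday's and today's archive would show whether that field is what changes between runs.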
Question:
I'm looking for help or information on whether I should only gzip the files, in case 'tar' is adding a hash that changes even though the content inside the 'tar' is the same.
-- OR --
How can I make git ignore the hash of the .tar.gz without adding it to the .gitignore file? The content in these files may change and would then need to be updated in the git repo.
CodePudding user response:
You can put them in a specific folder and ignore all the contents of that folder, like:
.gitignore
/files/*
CodePudding user response:
After much consideration I have decided to compare the old and new files, so that the new file is only used once it's confirmed that the files are in fact different. I'm still testing this more fully, but I think it may "fix" the issue.
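import hashlib
import os
import shutil
import tarfile

def cleanPath(path):
    # Normalize forward slashes to Windows-style backslashes.
    path = path.replace('/', '\\')
    return path
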
def untar(tar_path):
    print(f'Trying to unzip... {tar_path}')
    try:
        tar_path = cleanPath(tar_path)
        # path_date = tar_path.split("\\")[-1].replace(".tar.gz","")
        # Extract into a folder named after the archive, minus ".tar.gz".
        path_date = tar_path.replace('.tar.gz', '')
        print(path_date)
        # Open the archive; the context manager closes it even on error.
        with tarfile.open(tar_path) as file:
            # extracting file
            file.extractall(path_date)
    except Exception as ex:
        print("Unzip Failure")
        print(ex)
        return False
    return True

def tar(tar_path):
    print(f'Trying to zip...{tar_path}')
    # Existing archive next to the folder, and a temporary name for the new one.
    old_file_path = f"{tar_path}.tar.gz"
    tar_file = f"{tar_path}_new.tar.gz"
    # TAR the folder
    try:
        print("tar_path")
        print(tar_path)
        with tarfile.open(tar_file, 'w:gz') as tar_handle:
            for r, d, f in os.walk(tar_path):
                for gz_file in f:
                    # Store each file under its bare name (flattens subfolders).
                    tar_handle.add(os.path.join(r, gz_file), gz_file)
        # Remove the extracted folder once the archive has been written.
        try:
            shutil.rmtree(tar_path)
        except Exception as ex:
            print(ex)
    except Exception as ex:
        print("Zip Failure")
        print(ex)
        return False
    # Check the MD5 of each side, leave the old file if possible
    old_hash = None
    new_hash = None
    try:
        with open(old_file_path, 'rb') as old_handle:
            old_hash = hashlib.md5(old_handle.read()).hexdigest()
    except Exception as ex:
        print(ex)
    try:
        with open(tar_file, 'rb') as new_handle:
            new_hash = hashlib.md5(new_handle.read()).hexdigest()
    except Exception as ex:
        print(ex)
    if new_hash is not None and new_hash == old_hash:
        # leave the old file
        os.remove(tar_file)
    else:
        # remove the old file
        try:
            os.remove(old_file_path)
        except Exception as ex:
            print(ex)
        # rename the new file to the old file name
        os.rename(tar_file, old_file_path)
    return True
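
One caveat with comparing MD5s of the archives themselves: the gzip header stores a timestamp (and, when available, the file name), and each tar member header stores the file's mtime and owner. As far as I know, `tarfile.open(..., 'w:gz')` writes the current time into the gzip header, so two archives built on different nights can hash differently even with identical content. A minimal sketch of one way around that is below. The names `make_reproducible_tar_gz`, `_normalize` and `md5_of` are made up for this example, and unlike the code above it keeps relative paths inside the archive and skips empty folders; treat it as a starting point, not a drop-in replacement.

```python
import gzip
import hashlib
import os
import tarfile
import tempfile


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Strip per-run metadata so equal content gives equal tar headers."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def make_reproducible_tar_gz(src_dir: str, out_path: str) -> None:
    """Write src_dir to out_path so identical input gives identical bytes."""
    with open(out_path, "wb") as raw:
        # filename="" and mtime=0 keep the gzip header constant between runs.
        with gzip.GzipFile(filename="", fileobj=raw, mode="wb", mtime=0) as gz:
            with tarfile.open(fileobj=gz, mode="w") as tar:
                for root, dirs, files in os.walk(src_dir):
                    dirs.sort()  # make the traversal order stable
                    for name in sorted(files):
                        full = os.path.join(root, name)
                        arcname = os.path.relpath(full, src_dir)
                        tar.add(full, arcname=arcname, filter=_normalize)


def md5_of(path: str) -> str:
    """Return the hex MD5 digest of the file at path."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


# Build the same folder twice under different names and compare the digests.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "content")
    os.makedirs(src)
    with open(os.path.join(src, "a.txt"), "w") as handle:
        handle.write("hello")
    first = os.path.join(tmp, "first.tar.gz")
    second = os.path.join(tmp, "second.tar.gz")
    make_reproducible_tar_gz(src, first)
    make_reproducible_tar_gz(src, second)
    same = md5_of(first) == md5_of(second)
    print(same)
```

With archives written this way, git should only see a change when the content (or a file mode) actually changes, and the old-versus-new MD5 check becomes a meaningful comparison.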