I have video files recorded from a webcam. The camera produces many files, and I want to delete the duplicates based on modification time, at minute resolution.
For example, I have the 3 video files below, with modification times shown as hour:minute:second.
- Ek001.AVI - modification time is 08:30:15
- Ek002.AVI - modification time is 08:30:40
- Ek003.AVI - modification time is 08:32:55
I want these files to remain:
- Ek001.AVI - modification time is 08:30:15 (the first file created in that minute is kept)
- Ek003.AVI
This is the code I have so far to find the modification time:
import os
import glob
from datetime import datetime

for file in glob.glob('C:\\Users\\xxx\\*.AVI'):
    time_mod = os.path.getmtime(file)
    print(datetime.fromtimestamp(time_mod).strftime('%Y-%m-%d %H:%M:%S'), '-->', file)
Please help me adapt my code to delete duplicate files based on modification time, at minute resolution.
CodePudding user response:
Here is my suggested solution. See the comments in the code itself for a detailed explanation, but the basic idea is that you build up a dictionary of lists of 2-element tuples, where the keys of the dictionary are the number of whole minutes since the start of Unix time, and the 2-tuples contain the filename and the remaining seconds. You then loop over the values of the dictionary (lists of tuples for files modified within the same calendar minute), sort these by the seconds, and delete all except the first.
The use of a defaultdict here is just a convenience to avoid the need to explicitly add new lists to the dictionary when looping over files, because these will be added automatically when needed.
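As a small aside, the difference between a plain dict and a defaultdict can be sketched as follows. The (minute, name) pairs are made up purely for illustration and are not read from disk.

```python
from collections import defaultdict

# Hypothetical (minute, name) pairs, used only for illustration
samples = [(510, "Ek001.AVI"), (510, "Ek002.AVI"), (512, "Ek003.AVI")]

plain = {}
for minute, name in samples:
    # a plain dict needs the list created explicitly (here via setdefault)
    plain.setdefault(minute, []).append(name)

auto = defaultdict(list)
for minute, name in samples:
    # a missing key is filled with an empty list on first access
    auto[minute].append(name)
```

Both loops end up with the same grouping; the defaultdict version just hides the "create the list if missing" step.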
import os
import glob
from collections import defaultdict

files_by_minute = defaultdict(list)

# group together all the files according to the number of minutes since the
# start of Unix time, storing the filename and the number of remaining seconds
for filename in glob.glob("C:\\Users\\xxx\\*.AVI"):
    time_mod = os.path.getmtime(filename)
    mins = time_mod // 60
    secs = time_mod % 60
    files_by_minute[mins].append((filename, secs))

# go through each of these lists of files, removing the newer ones if
# there is more than one
for fileset in files_by_minute.values():
    if len(fileset) > 1:
        # sort tuples by second element (i.e. the seconds)
        fileset.sort(key=lambda t: t[1])
        # remove all except the first
        for file_info in fileset[1:]:
            filename = file_info[0]
            print(f"removing {filename}")
            os.remove(filename)
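Since the script above deletes files immediately, it may be worth trying the same grouping idea in a dry run first. The following is only a sketch under that assumption: the function name remove_same_minute_duplicates and its dry_run parameter are hypothetical and not part of the answer above.

```python
import glob
import os
from collections import defaultdict


def remove_same_minute_duplicates(pattern, dry_run=True):
    """Keep the oldest file per calendar minute; return the other paths."""
    files_by_minute = defaultdict(list)
    for filename in glob.glob(pattern):
        mtime = os.path.getmtime(filename)
        # key: whole minutes since the start of Unix time
        files_by_minute[int(mtime // 60)].append((mtime, filename))

    removed = []
    for fileset in files_by_minute.values():
        # sorting (mtime, filename) tuples puts the oldest file first
        for _, filename in sorted(fileset)[1:]:
            removed.append(filename)
            if not dry_run:
                os.remove(filename)
    return removed
```

Calling it with the glob pattern and the default dry_run=True only reports which files would go; passing dry_run=False performs the deletion.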
CodePudding user response:
I think you can solve this by using a set. Convert the Unix time (mtime) to integer minutes, then iterate over the files sorted in ascending order of modification time. If a minute is already in the set, you already have a file for that minute (delete the file). If not, add the minute to the set. Here's how this can look in principle:
ts = [83015, 83145, 83045, 83115]
s = set()
for t in sorted(ts):
    # to minute; note that it would be //60 if using Unix time (seconds)
    mins = t//100
    if mins in s:
        print(f"delete {t}")
    else:
        s.add(mins)
# delete 83045
# delete 83145
In practice, that could look like:
from datetime import datetime
from pathlib import Path

src = Path('...') # insert your path
files = sorted(src.glob('...'), key=lambda p: p.stat().st_mtime) # use your search pattern
s = set()
for f in files:
    mins = int(f.stat().st_mtime)//60
    if mins in s:
        print(f"delete {f}")
        print(datetime.fromtimestamp(f.stat().st_mtime).strftime('%Y-%m-%d %H:%M:%S'))
    else:
        print(f"keep {f}")
        print(datetime.fromtimestamp(f.stat().st_mtime).strftime('%Y-%m-%d %H:%M:%S'))
        s.add(mins)
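The snippet above only prints its decisions. One possible way to separate the decision from the deletion is sketched below; the function name same_minute_duplicates and its two parameters are hypothetical, chosen here for illustration.

```python
from pathlib import Path


def same_minute_duplicates(folder, pattern):
    """Return (keep, delete) lists of paths, one kept file per minute."""
    # oldest first, so the first file seen in each minute is the one kept
    files = sorted(Path(folder).glob(pattern), key=lambda p: p.stat().st_mtime)
    seen_minutes = set()
    keep, delete = [], []
    for f in files:
        minute = int(f.stat().st_mtime) // 60
        if minute in seen_minutes:
            delete.append(f)
        else:
            seen_minutes.add(minute)
            keep.append(f)
    return keep, delete
```

After checking the returned delete list, calling f.unlink() on each of its paths would remove the files.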
CodePudding user response:
I gave this a try. As I understood it, you want to keep only the latest file. Why would you have to specify the minutes? It is enough to count the time since the last change in seconds.
My code has a lot of comments that hopefully clarify my logic. But roughly:
- Find all files and calculate the time since the last save
- Add the filename and time to a dict
- Find the min value in the dict (= latest file) and delete all other files
Hope this helps
import os
import time

fileDir = '/path/to/files'
time_dict = {}
# loop through files in dir
for file in sorted(os.listdir(fileDir)):
    # build the full path; os.listdir returns bare filenames
    path = os.path.join(fileDir, file)
    # find time since last save
    time_since_change = int(time.time() - os.path.getmtime(path))
    # if-statement in case you have your files in the same dir as your code
    if '.py' not in file:
        # save path & time since last save into dict
        time_dict[path] = time_since_change
# prints dict just to check that I later will delete the correct file
print(time_dict)
# loop through dict, might not be necessary
for k, v in list(time_dict.items()):
    # if value is not the min time since last save, the file is not the latest saved
    if v != min(time_dict.values()):
        print("remove file: ", k, "\t", v)
        # to uncomment when you actually want to test the deletion:
        # os.remove(k)
        # del time_dict[k]
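The same "keep only the latest file" idea can also be sketched more compactly by comparing modification times directly. This is an illustrative variant, not the code above: the function name older_files and the skip_suffix parameter are hypothetical.

```python
import os


def older_files(file_dir, skip_suffix=".py"):
    """Return full paths of every file except the most recently modified."""
    paths = [
        os.path.join(file_dir, name)
        for name in os.listdir(file_dir)
        if not name.endswith(skip_suffix)
    ]
    # ignore subdirectories
    paths = [p for p in paths if os.path.isfile(p)]
    if not paths:
        return []
    # the largest mtime belongs to the latest saved file
    newest = max(paths, key=os.path.getmtime)
    return sorted(p for p in paths if p != newest)
```

The returned list can be inspected first and then passed one path at a time to os.remove.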