Python move file to folder based on both partial of file name and value of another partial of file n

Time:02-01

There are 3 files in a folder:

my_floder:

Review Report - 2020-3.20230110151743889.xlsx

Review Report - 2020-3.20230110151753535.xlsx

Review Report - 2019-4.20230110151744423.xlsx

Each file name has 3 parts. Taking the first file as an example:

First Part: "Review Report -"
Second Part: "2020-3"
Third Part: ".20230110151743889"

The logic is: if several files share the same second part, only the one with the largest third part should be moved to another folder. The third part of the file name is a timestamp, yyyymmddhhmm...

For example, the second part of the first 2 files is the same, but the second file, Review Report - 2020-3.20230110151753535.xlsx, has the larger third part, '20230110151753535'. So only the second and third files will be moved to the other folder; the first file will be skipped.

Some helpful script:

parts_list = os.path.basename(filename).split(".")

output is:
['Review Report - 2020-3', '20230110151743889', 'xlsx']

second_part = parts_list[0].split(" - ")[1]

output is:
'2020-3'

third_part = parts_list[1]

output is:
'20230110151743889'

The best that I can do:

    unique = []
    for filename in glob.glob(my_floder):
        parts_list = os.path.basename(filename).split(".")
        second_part = parts_list[0].split(" - ")[1]
        third_part = parts_list[1]
        if second_part not in unique:
            unique.append(second_part)
        else:
            # Here I need to compare the third parts and move the file with
            # the larger one to another folder, but I have no idea how to do that.
            pass

Can anyone help?

CodePudding user response:

I would use a dictionary:

    unique = {}
    for filename in glob.glob(my_floder):
        parts_list = os.path.basename(filename).split(".")
        second_part = parts_list[0].split(" - ")[1]
        third_part = parts_list[1]
        # Keep the largest third part seen so far for each second part
        if second_part not in unique:
            unique.update({second_part: third_part})
        elif float(unique.get(second_part)) < float(third_part):
            unique.update({second_part: third_part})

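Since the third_part is a string of digits, it's easiest to convert it to a float for the comparison. A float cannot hold all 17 digits exactly, so the last digit gets rounded, but I hope we can assume different versions aren't saved within a few milliseconds of each other.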
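To make the precision point concrete, here is a small check using the two timestamps from the question. It is only an illustration: it shows that a round trip through float does not reproduce the 17-digit value, while int() or a plain string comparison (valid here because both strings have the same length) keeps every digit.

```python
# The two timestamps from the question that share the second part "2020-3".
older = "20230110151743889"
newer = "20230110151753535"

# Near 2e16 a double can only represent multiples of 4, so the odd
# value above cannot survive a round trip through float.
float_is_exact = int(float(older)) == int(older)
print(float_is_exact)  # False

# int() is exact for any number of digits.
int_says_newer = int(newer) > int(older)
print(int_says_newer)  # True

# Digit strings of equal length also order correctly as plain strings.
str_says_newer = newer > older
print(str_says_newer)  # True
```

For these two files the float comparison still gives the right answer, since they differ by several seconds; the rounding only matters for timestamps a few milliseconds apart.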

There is probably a better way by merging the if and elif but there could be some complications if the first condition is not met; I will leave that up to you.

Not the neatest or best solution, but it should work.
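One possible way to merge the if and elif is to let dict.get supply a default for second parts that have not been seen yet. The sketch below is a variation on the loop above, not the original answer's code: the function name latest_by_second_part is made up for illustration, it takes a list of file names instead of calling glob, and it compares the third part as an int rather than a float.

```python
import os


def latest_by_second_part(filenames):
    """Map each second part to the largest third part seen for it."""
    latest = {}
    for filename in filenames:
        parts_list = os.path.basename(filename).split(".")
        second_part = parts_list[0].split(" - ")[1]
        third_part = int(parts_list[1])
        # A default of -1 covers the "not seen yet" case, so a single
        # comparison replaces the if/elif pair.
        if third_part > latest.get(second_part, -1):
            latest[second_part] = third_part
    return latest


names = [
    "Review Report - 2020-3.20230110151743889.xlsx",
    "Review Report - 2020-3.20230110151753535.xlsx",
    "Review Report - 2019-4.20230110151744423.xlsx",
]
result = latest_by_second_part(names)
print(result)
# {'2020-3': 20230110151753535, '2019-4': 20230110151744423}
```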

CodePudding user response:

You can build a dict with collections.defaultdict and, in each iteration, keep the latest datetime for each second part.

import glob
import shutil
from collections import defaultdict
from datetime import datetime

# Assumes my_floder holds a glob pattern such as '*.xlsx' and that the
# script runs from inside the folder, so names contain no directory part.
res = defaultdict(lambda: 0)
for file in glob.glob(my_floder):
    fs, t = file.split('.', 1)   # t is e.g. '20230110151753535.xlsx'
    f, s = fs.split(' - ')       # s is the second part, e.g. '2020-3'
    tmp = datetime.strptime(t, '%Y%m%d%H%M%S%f.xlsx')
    if (not res[s]) or tmp > datetime.strptime(res[s], '%Y%m%d%H%M%S%f.xlsx'):
        res[s] = t

# res now maps each second part to its latest timestamp.
# Rebuild the file name and move it; new_path must already exist.
for k, v in res.items():
    file_name = f"Review Report - {k}.{v}"
    print(file_name)
    shutil.move(file_name, f'new_path/{file_name}')

Output:

Review Report - 2020-3.20230110151753535.xlsx
Review Report - 2019-4.20230110151744423.xlsx
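
If the files do not live in the current working directory, the same idea can be written with full paths. The following is a minimal sketch under a few assumptions: the function name move_latest and its two parameters are hypothetical, every file follows the "Review Report - <second part>.<timestamp>.xlsx" pattern from the question, and the timestamp is compared as an int instead of being parsed with strptime.

```python
import shutil
from pathlib import Path


def move_latest(src_dir, dst_dir):
    """Move the newest file of each second-part group from src_dir to dst_dir."""
    latest = {}  # second part -> (timestamp string, Path)
    for path in Path(src_dir).glob("Review Report - *.xlsx"):
        # "Review Report - 2020-3.20230110151753535.xlsx" splits into
        # "Review Report - 2020-3", "20230110151753535" and "xlsx".
        head, stamp, _ext = path.name.split(".")
        second_part = head.split(" - ")[1]
        if second_part not in latest or int(stamp) > int(latest[second_part][0]):
            latest[second_part] = (stamp, path)

    # Create the destination folder if it is missing, then move the winners.
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    moved = []
    for _stamp, path in latest.values():
        shutil.move(str(path), str(Path(dst_dir) / path.name))
        moved.append(path.name)
    return sorted(moved)
```

Calling move_latest("my_floder", "new_path") on the three files from the question would move the second and third files and leave the first one in place.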