I have a lot of XML files which are named like:
First_ExampleXML_Only_This_Should_Be_Name_20211234567.xml
Second_ExampleXML_OnlyThisShouldBeName_202156789.xml
Third_ExampleXML_Only_This_Should_Be_Name1_2021445678.xml
Fourth_ExampleXML_Only_This_Should_Be_Name2_20214567.xml
I have to make a script that will go through all of the files and rename them, so only this is left from the example:
Only_This_Should_Be_Name.xml
OnlyThisShouldBeName.xml
Only_This_Should_Be_Name1.xml
Only_This_Should_Be_Name2.xml
At the moment I have something like this, but I'm really struggling to get exactly what I need. I guess I have to count from the second _ up to _202 and take everything in between.
from os import listdir

fnames = listdir('.')
for fname in fnames:
    # replace .xml with type of file you want this to have impact on
    if fname.endswith('.xml'):
        ...  # this is where the rename should happen
Does anyone have an idea what the best approach would be?
CodePudding user response:
There are two problems here:
Finding files of one kind in the directory
Whilst listdir will work, you might as well glob them:
from pathlib import Path

for fn in Path("/path").glob("*.xml"):
    ...
Renaming files
In this case your files are named "file_name_NUMBERS.xml" and we want to strip the numbers out, so we'll use a regex:
import re
from pathlib import Path

for fn in Path("dir").glob("*.xml"):
    # keep everything before the first "_" that is followed by digits
    new_name = re.search(r"(.*?)_[0-9]+", fn.stem).group(1)
    fn.rename(fn.with_name(new_name + ".xml"))
Note that you can do all this without pathlib, but you asked for the best way ;)
Lastly, to answer an implicit question, nothing stops you wrapping all this in a function and passing an argument to glob for different types of files.
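As a rough sketch of that last point, the loop could be wrapped in a function that takes the glob pattern as an argument. The names `PATTERN`, `core_name` and `rename_all` are made up for illustration, and the regex is an assumption: it also drops the first two underscore-separated fields, which is what the question's expected output shows, and it expects the stem to end in `_<digits>`.

```python
import re
from pathlib import Path

# Assumed layout: <field>_<field>_<part to keep>_<digits>
PATTERN = re.compile(r"^[^_]+_[^_]+_(.+)_[0-9]+$")


def core_name(stem):
    """Return the part of the stem to keep, or None if it does not match."""
    match = PATTERN.match(stem)
    return match.group(1) if match else None


def rename_all(directory, glob_pattern="*.xml"):
    """Rename matching files in directory and return the new paths."""
    renamed = []
    # sorted() builds the full list first, so renaming does not
    # interfere with the directory listing while we iterate.
    for path in sorted(Path(directory).glob(glob_pattern)):
        core = core_name(path.stem)
        if core is None:
            continue  # name does not follow the assumed layout
        target = path.with_name(core + path.suffix)
        if target.exists():
            continue  # never overwrite an existing file
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling `rename_all("dir")` handles the XML files, and `rename_all("dir", "*.json")` would do the same for another extension. Files whose names do not fit the pattern are left untouched.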
CodePudding user response:
You can split each XML file name on underscores, drop the first two fields and the trailing number, and rename the file to what is left, as below.
import os

fnames = os.listdir('.')
for fname in fnames:
    # replace .xml with type of file you want this to have impact on
    if fname.endswith('.xml'):
        # e.g. ['First', 'ExampleXML', 'Only', ..., 'Name', '20211234567.xml']
        parts = fname.split("_")
        # keep everything between the second "_" and the last "_"
        os.rename(fname, "_".join(parts[2:-1]) + ".xml")
Here you are dropping the two fields before the second "_" and the number after the last "_".
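Since renaming is destructive, it may help to work out the new names first and check them before touching the disk. Below is a minimal sketch of the split-based idea, assuming every file name has two leading fields and a purely numeric last field. `plan_renames` and `apply_renames` are hypothetical helpers, not something from the answer above.

```python
import os


def plan_renames(fnames, ext=".xml"):
    """Map each old name to its new name without touching the disk."""
    plan = {}
    for fname in fnames:
        if not fname.endswith(ext):
            continue
        parts = fname[:-len(ext)].split("_")
        # Need two leading fields, at least one kept field, and a numeric tail.
        if len(parts) < 4 or not parts[-1].isdigit():
            continue
        plan[fname] = "_".join(parts[2:-1]) + ext
    return plan


def apply_renames(plan, directory="."):
    """Carry out the plan, refusing to run if two files map to the same name."""
    targets = list(plan.values())
    if len(set(targets)) != len(targets):
        raise ValueError("two files would end up with the same name")
    for old, new in plan.items():
        os.rename(os.path.join(directory, old), os.path.join(directory, new))
```

Printing `plan_renames(os.listdir('.'))` gives a dry run. Passing that result to `apply_renames` performs the renames only when no two files would collide.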