I am new to regex. I have read various tutorials, but I still cannot get my simple code to run.
My files are named like "c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c8_aa_3", "c1c3c4_aa_4", ... "c1c2c4_bb_41", "c1c8c9_cc_58", "c1c3c11_aa_19".
I want to find all the files whose names include "aa" (such as "c1c2c3_aa_3") and rename them so that "aa" becomes "zz" (giving "c1c2c3_zz_3").
So the trailing number and the first string before "_" should stay fixed, and only the "aa" in the middle should change.
"c1", "c2", "c3" are some conditions. Also, the trailing numbers are fairly random, so I cannot list them in advance.
I am interested in using regex.
I tried this:
con_list1 = ["c1", "c2", ... "c8"]
con_list2 = ["c1", "c2", ... "c11"]
con_list3 = ["c1", "c2", ... "c10"]
for con1 in con_list1:
    for con2 in con_list2:
        for con3 in con_list3:
            if(os.path.exists("./" + con1 + con2 + con3 + "_aa(.*)")):
                os.rename("./" + con1 + con2 + con3 + "_aa(.*)", "./" + con1 + con2 + con3 + "_zz(.*)")
I want the trailing number of each renamed file to stay the same:
"c1c2c3_aa_3" -> "c1c2c3_zz_3"
"c1c2c3_aa_13" -> "c1c2c3_zz_13"
I would also like to learn how to use regex and (.*) correctly.
However, the above code does not seem to work.
I would appreciate help getting this code to work.
CodePudding user response:
If you have a list like con_list1 = ["c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c8_aa_3", "c1c3c4_aa_4"], you may try something like:
import re
con_list1 = ["c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c8_aa_3", "c1c3c4_aa_4"]
regex = r"_aa_"
for test_str in con_list1:
    # search() returns a match object, or None when "_aa_" is absent
    match = re.search(regex, test_str)
    if match:
        # keep the text before and after the match, replace the middle
        new_name = test_str[:match.start()] + '_zz_' + test_str[match.end():]
        print(new_name)
However, the simplest way is:
con_list1 = ["c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c8_aa_3", "c1c3c4_aa_4"]
for test_str in con_list1:
    # str.replace returns a new string, so the result must be used
    print(test_str.replace('_aa_', '_zz_'))
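Since the question also asks how to use capture groups, the same replacement could be sketched with re.sub. This is only a minimal illustration, assuming every name to change ends in _aa_ followed by digits; the helper name rename_aa_to_zz and the sample list are hypothetical.

```python
import re

# Group 1 captures everything before "_aa_", group 2 the trailing number.
PATTERN = re.compile(r"^(.*)_aa_(\d+)$")

def rename_aa_to_zz(name):
    # Names that do not match the pattern are returned unchanged.
    return PATTERN.sub(r"\g<1>_zz_\g<2>", name)

con_list1 = ["c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c4_bb_41", "c1c3c11_aa_19"]
new_names = [rename_aa_to_zz(name) for name in con_list1]
print(new_names)
```

Unlike a plain replace, the anchored pattern only touches names that have the full "prefix, _aa_, number" shape.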
CodePudding user response:
Try this to find all names: "[a-z0-9]+_aa_[0-9]+"
names = re.findall(r'"[a-z0-9]+_aa_[0-9]+"', files_names_list.text, flags=re.I)
files_names_list is where you have all your file names; its .text must be a single string, because re.findall searches a string, not a list.
I hope I understood you correctly.
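To make the pattern above concrete, here is a small sketch run against an in-memory string. The variable text and its contents are assumed sample data, standing in for wherever the quoted names actually come from.

```python
import re

# Assumed sample input: one string containing the quoted file names.
text = '"c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c4_bb_41", "c1c3c11_aa_19"'

# A quote, letters/digits, "_aa_", digits, and a closing quote.
names = re.findall(r'"[a-z0-9]+_aa_[0-9]+"', text, flags=re.I)

# Drop the surrounding quotes to get the bare names.
stripped = [n.strip('"') for n in names]
print(stripped)
```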
CodePudding user response:
Assuming the files to rename exist in the current directory, would you please try the following:
import os, re
for f in os.listdir('.'):
    m = re.match(r'((?:c\d{1,2}){3})_aa_(\d{1,2})$', f)
    if m:
        newname = m.group(1) + '_zz_' + m.group(2)
        os.rename(f, newname)
- ((?:c\d{1,2}){3}) matches three repetitions of c followed by one or two digits.
- (\d{1,2}) matches one or two digits.
- As the regexes above are enclosed in parentheses, the matched substrings are captured by m.group(1) and m.group(2) individually.
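The two groups can be checked without touching the filesystem. The sketch below applies the same pattern to a few sample names; the helper plan_rename is hypothetical and only computes the new name.

```python
import re

pattern = re.compile(r'((?:c\d{1,2}){3})_aa_(\d{1,2})$')

def plan_rename(name):
    # Return the new name, or None when the name does not match.
    m = pattern.match(name)
    if m is None:
        return None
    # group(1) is the three conditions, group(2) the trailing number
    return m.group(1) + '_zz_' + m.group(2)

for name in ["c1c2c3_aa_3", "c1c3c11_aa_19", "c1c2c4_bb_41"]:
    print(name, "->", plan_rename(name))
```

Note that (\d{1,2}) accepts at most two trailing digits, so a name ending in a three-digit number would not match this pattern.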
CodePudding user response:
You can use
import os, re
con_list1 = ["c1", "c2", "c3","c4","c5","c6","c7","c8"]
con_list2 = ["c1", "c2", "c3","c4","c5","c6","c7","c8", "c9","c10", "c11"]
con_list3 = ["c1", "c2", "c3","c4","c5","c6","c7","c8", "c9","c10"]
regex = re.compile(f'^((?:{"|".join(map(re.escape, con_list1))})(?:{"|".join(map(re.escape, con_list2))})(?:{"|".join(map(re.escape, con_list3))}))_aa_')
rootdir = "YOUR_ROOT_DIR"
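for root, dirs, files in os.walk(rootdir):
    for file in files:
        if regex.search(file):
            # os.walk yields bare file names, so join them with their directory
            os.rename(os.path.join(root, file), os.path.join(root, regex.sub(r'\g<1>_zz_', file)))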
Note: os.walk() searches all subdirectories recursively. If you do not need that behavior, see Non-recursive os.walk().
This is not the most efficient way to create a dynamic pattern (a regex trie would be better), but it shows a viable approach. The regex will look like
^((?:c1|c2|c3|c4|c5|c6|c7|c8)(?:c1|c2|c3|c4|c5|c6|c7|c8|c9|c10|c11)(?:c1|c2|c3|c4|c5|c6|c7|c8|c9|c10))_aa_
See the regex demo. Note that each item in your condition lists is passed through re.escape to make sure special chars do not prevent your file names from matching.
Details:
- ^ - start of string
- ((?:c1|c2|c3|c4|c5|c6|c7|c8)(?:c1|c2|c3|c4|c5|c6|c7|c8|c9|c10|c11)(?:c1|c2|c3|c4|c5|c6|c7|c8|c9|c10)) - Group 1 (\g<1> refers to this group value; if _zz_ is not a placeholder for text starting with a digit, you can even use \1 instead): a value from con_list1, then a value from con_list2 and then a value from con_list3
- _aa_ - a fixed _aa_ string.
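The pattern described above can be tried on plain strings before renaming anything. This is a minimal sketch, assuming the three condition lists run c1..c8, c1..c11 and c1..c10 as in the code above; the helper alternation is a hypothetical name.

```python
import re

con_list1 = [f"c{i}" for i in range(1, 9)]    # c1..c8
con_list2 = [f"c{i}" for i in range(1, 12)]   # c1..c11
con_list3 = [f"c{i}" for i in range(1, 11)]   # c1..c10

def alternation(items):
    # Build a non-capturing group such as (?:c1|c2|...), escaping each item.
    return "(?:" + "|".join(map(re.escape, items)) + ")"

regex = re.compile(
    "^(" + alternation(con_list1) + alternation(con_list2)
    + alternation(con_list3) + ")_aa_"
)

for name in ["c1c2c3_aa_3", "c1c2c3_aa_13", "c1c8c9_cc_58"]:
    # Names without a matching prefix followed by _aa_ come back unchanged.
    print(name, "->", regex.sub(r"\g<1>_zz_", name))
```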