I'm new to Python and I need to remove part of the file names in this list.
I have been trying something like:
for x in documents:
    x.replace("Sint", "")
But I can't get it to remove all of the words at once.
I have this list:
documents = ['SintEstatuto1009908_17032016.rtf.txt', 'SintEstatuto16545345_15042016.rtf.txt', 'Estatuto124452336145_02052016.rtf.txt', 'SintEstatuto1645649_04042014.rtf.txt', 'MartEstatuto2592451_20072011.rtf.txt', 'Estatuto77845645858_29645615.rtf.txt', 'Estatuto149453456678_2547042016.rtf.txt', 'BrewEstatuto128634565661_14042014.rtf.txt', 'MartEstatuto11454536186_26022014.rtf.txt', 'MartEstatuto1635456456462_09042016.rtf.txt', 'SintEstatuto64565468987_22012015.rtf.txt', 'ColdEstatuto9645668602_18042016.rtf.txt', 'SintEstatuto1374534196_26032013.rtf.txt', 'SintEstatuto12964456455654040_22122008.rtf.txt', 'SintEstatuto1559914_27042016.rtf.txt', 'SintEstatuto145645152097_24042015.rtf.txt', 'MartEstatuto01064590027_21082015.rtf.txt', 'SintEstatuto1060307_04032016.rtf.txt', 'SintEstatuto8404454566046_18102014.rtf.txt', 'ColdEstatuto123545345921_30042013.rtf.txt', 'BrewEstatuto45656456791_07032015.rtf.txt', 'BrewEstatuto129754345353_29042011.rtf.txt', 'MartEstatuto1526456924_14062016.rtf.txt', 'MartEstatuto1524536924_03042014.rtf.txt', 'SintEstatuto80233287_20032016.rtf.txt', 'SintEstatuto1604998_23032015.rtf.txt', 'SintEstatuto4295435438890_22112013.rtf.txt', 'BrewEstatuto991778678639_24042014.rtf.txt', 'BrewEstatuto1330354387_1045343082011.rtf.txt']
And I want to remove these words:
names = ['Sint', 'Mart', 'Cold', 'Brew']
So I want this result:
documents = ['Estatuto1009908_17032016.rtf.txt', 'Estatuto16545345_15042016.rtf.txt', 'Estatuto124452336145_02052016.rtf.txt', 'Estatuto1645649_04042014.rtf.txt', 'Estatuto2592451_20072011.rtf.txt', 'Estatuto77845645858_29645615.rtf.txt', 'Estatuto149453456678_2547042016.rtf.txt', 'Estatuto128634565661_14042014.rtf.txt', 'Estatuto11454536186_26022014.rtf.txt', 'Estatuto1635456456462_09042016.rtf.txt', 'Estatuto64565468987_22012015.rtf.txt', 'Estatuto9645668602_18042016.rtf.txt', 'Estatuto1374534196_26032013.rtf.txt', 'Estatuto12964456455654040_22122008.rtf.txt', 'Estatuto1559914_27042016.rtf.txt', 'Estatuto145645152097_24042015.rtf.txt', 'Estatuto01064590027_21082015.rtf.txt', 'Estatuto1060307_04032016.rtf.txt', 'Estatuto8404454566046_18102014.rtf.txt', 'Estatuto123545345921_30042013.rtf.txt', 'Estatuto45656456791_07032015.rtf.txt', 'Estatuto129754345353_29042011.rtf.txt', 'Estatuto1526456924_14062016.rtf.txt', 'Estatuto1524536924_03042014.rtf.txt', 'Estatuto80233287_20032016.rtf.txt', 'Estatuto1604998_23032015.rtf.txt', 'Estatuto4295435438890_22112013.rtf.txt', 'Estatuto991778678639_24042014.rtf.txt', 'Estatuto1330354387_1045343082011.rtf.txt']
How can I do it?
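The loop in the attempt above has no visible effect because Python strings are immutable: x.replace("Sint", "") returns a new string, and that result is thrown away. A minimal sketch that keeps the results, using a few of the file names from the question. It assumes each name may be removed wherever it occurs in the string, not only at the start:

```python
documents = ['SintEstatuto1009908_17032016.rtf.txt',
             'Estatuto124452336145_02052016.rtf.txt',
             'MartEstatuto2592451_20072011.rtf.txt']
names = ['Sint', 'Mart', 'Cold', 'Brew']

cleaned = []
for x in documents:
    # str.replace returns a new string, so reassign x after each replacement
    for name in names:
        x = x.replace(name, "")
    # Store the final string; rebinding x alone would not change the list
    cleaned.append(x)

print(cleaned)
```

Note that replace also removes a name that appears in the middle of a file name. With this data that never happens, but the prefix-based answers are stricter.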
Answer:
You could build a regex alternation of the keywords to remove, anchored at the start of the string, then use re.sub:
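import re

names = ['Sint', 'Mart', 'Cold', 'Brew']
# Builds ^(?:Sint|Mart|Cold|Brew): any one of the names, only at the start of the string
regex = r'^(?:' + r'|'.join(names) + r')'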
documents = ['SintEstatuto1009908_17032016.rtf.txt', 'SintEstatuto16545345_15042016.rtf.txt', 'Estatuto124452336145_02052016.rtf.txt', 'SintEstatuto1645649_04042014.rtf.txt', 'MartEstatuto2592451_20072011.rtf.txt', 'Estatuto77845645858_29645615.rtf.txt', 'Estatuto149453456678_2547042016.rtf.txt', 'BrewEstatuto128634565661_14042014.rtf.txt', 'MartEstatuto11454536186_26022014.rtf.txt', 'MartEstatuto1635456456462_09042016.rtf.txt', 'SintEstatuto64565468987_22012015.rtf.txt', 'ColdEstatuto9645668602_18042016.rtf.txt', 'SintEstatuto1374534196_26032013.rtf.txt', 'SintEstatuto12964456455654040_22122008.rtf.txt', 'SintEstatuto1559914_27042016.rtf.txt', 'SintEstatuto145645152097_24042015.rtf.txt', 'MartEstatuto01064590027_21082015.rtf.txt', 'SintEstatuto1060307_04032016.rtf.txt', 'SintEstatuto8404454566046_18102014.rtf.txt', 'ColdEstatuto123545345921_30042013.rtf.txt', 'BrewEstatuto45656456791_07032015.rtf.txt', 'BrewEstatuto129754345353_29042011.rtf.txt', 'MartEstatuto1526456924_14062016.rtf.txt', 'MartEstatuto1524536924_03042014.rtf.txt', 'SintEstatuto80233287_20032016.rtf.txt', 'SintEstatuto1604998_23032015.rtf.txt', 'SintEstatuto4295435438890_22112013.rtf.txt', 'BrewEstatuto991778678639_24042014.rtf.txt', 'BrewEstatuto1330354387_1045343082011.rtf.txt']
output = [re.sub(regex, '', x) for x in documents]
print(output)
This prints:
['Estatuto1009908_17032016.rtf.txt', 'Estatuto16545345_15042016.rtf.txt',
'Estatuto124452336145_02052016.rtf.txt', ..., 'Estatuto1330354387_1045343082011.rtf.txt']
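If the names to remove could ever contain regex metacharacters such as . or +, it is safer to pass each one through re.escape so it is matched literally. A sketch of that variant, which also compiles the pattern once and reuses it. The sample list is a subset of the question's data:

```python
import re

names = ['Sint', 'Mart', 'Cold', 'Brew']
documents = ['BrewEstatuto128634565661_14042014.rtf.txt',
             'ColdEstatuto9645668602_18042016.rtf.txt',
             'Estatuto77845645858_29645615.rtf.txt']

# Escape each name so it is matched literally; ^ keeps the match at the start
pattern = re.compile('^(?:' + '|'.join(map(re.escape, names)) + ')')

output = [pattern.sub('', x) for x in documents]
print(output)
```

Because of the ^ anchor, a name that appears later in the string (for example 'EstatutoSint') is left untouched, and at most one leading name is removed per file.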
Answer:
One option is to use str.removeprefix (available since Python 3.9):
from functools import reduce
# Start from each item and apply removeprefix once for every name in turn
out = [reduce(lambda x, y: x.removeprefix(y), names, item) for item in documents]
The same code with an explicit loop:
out = []
for item in documents:
    for name in names:
        item = item.removeprefix(name)
    out.append(item)
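On Python versions older than 3.9, str.removeprefix does not exist. The same loop can be written with startswith and slicing. This is a sketch: strip_prefixes is a hypothetical helper name, and the sample list is a subset of the question's data:

```python
names = ['Sint', 'Mart', 'Cold', 'Brew']
documents = ['SintEstatuto1559914_27042016.rtf.txt',
             'Estatuto149453456678_2547042016.rtf.txt',
             'MartEstatuto1526456924_14062016.rtf.txt']

def strip_prefixes(item, prefixes):
    """Hypothetical helper: remove each prefix in turn if item starts with it."""
    for prefix in prefixes:
        if item.startswith(prefix):
            # Slice off exactly the matched prefix, like removeprefix does
            item = item[len(prefix):]
    return item

out = [strip_prefixes(item, names) for item in documents]
print(out)
```

This mirrors the explicit removeprefix loop: each name is checked once, in the order given in names.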