I want to write a regex that removes every combination of a group of characters from the end of strings. For instance, it should remove "k", "t", "a", "u" and all of their combinations from the end of each string:
Input:
["Rajakatu","Lapinlahdenktau","Nurmenkaut","Linnakoskenkuat"]
Output:
["Raja","Lapinlahden","Nurmen","Linnakosken"]
CodePudding user response:
How about something like this: [ktau]{4}\b ?
https://regex101.com/r/BVwTcs/1
This matches those character combinations at the end of a word, for example k followed by u, followed by a, followed by t.
It matches any sequence of exactly four characters from the set at the end of a word, not only the rearrangements of k, t, a, u, so it can also match aaaa. Take that into account.
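As a rough sketch of how that pattern could be applied in Python (the question does not name a language, so Python is an assumption here), re.sub can replace the match with an empty string. The second pattern, STRICT, is not part of the answer above; it is a hypothetical stricter variant that uses lookaheads so that only true rearrangements of k, t, a, u at the very end of the string are removed.

```python
import re

words = ["Rajakatu", "Lapinlahdenktau", "Nurmenkaut", "Linnakoskenkuat"]

# Loose pattern from the answer above: any four characters from the set
# immediately before a word boundary (this also removes "aaaa").
LOOSE = re.compile(r"[ktau]{4}\b")

# Stricter variant (an assumption, not from the answer): each lookahead
# requires one of the letters to occur in the final four characters,
# so only rearrangements of "ktau" at the end of the string match.
STRICT = re.compile(r"(?=[ktau]*k)(?=[ktau]*t)(?=[ktau]*a)(?=[ktau]*u)[ktau]{4}$")

loose_result = [LOOSE.sub("", w) for w in words]
strict_result = [STRICT.sub("", w) for w in words]

print(loose_result)
print(strict_result)
# The two patterns differ on a repeated-letter ending such as "Xaaaa".
print(LOOSE.sub("", "Xaaaa"), STRICT.sub("", "Xaaaa"))
```

Note that \b matches at the end of every word, so on a multi-word string the loose pattern could strip more than one ending, while the $ in the strict variant only anchors at the end of the string.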
CodePudding user response:
Below is my approach; please try it:
from itertools import permutations

mystr = [["Rajakatu", "Lapinlahdenktau", "Nurmenkaut", "Linnakoskenkuat"]]

# take the last four letters of the first string; we need their permutations
x = mystr[0][0]
exclude = x[-4:]

# build every permutation of those four letters
perms = [''.join(p) for p in permutations(exclude)]

# remove the last four letters of each string if they form one of the permutations
for i in range(len(mystr[0])):
    curr = mystr[0][i]
    last4 = curr[-4:]
    if last4 in perms:
        mystr[0][i] = curr[:-4]

print(mystr)
OUTPUT: [['Raja', 'Lapinlahden', 'Nurmen', 'Linnakosken']]
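The same idea can probably be expressed without generating the permutations at all, by comparing the sorted ending with the sorted letters. The following is only a minimal sketch: the helper name strip_suffix_permutation and its letters parameter are hypothetical and not part of the answer above.

```python
def strip_suffix_permutation(word, letters="ktau"):
    """Drop the last len(letters) characters if they are a rearrangement of letters."""
    n = len(letters)
    # Two strings are rearrangements of each other exactly when their
    # sorted characters are equal.
    if len(word) >= n and sorted(word[-n:]) == sorted(letters):
        return word[:-n]
    return word


words = ["Rajakatu", "Lapinlahdenktau", "Nurmenkaut", "Linnakoskenkuat"]
result = [strip_suffix_permutation(w) for w in words]
print(result)
```

Passing the letters explicitly avoids depending on the first string in the list happening to end with the right characters.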