import re
line = "treinta y un" #example 1
line = "veinti un " #example 2
line = "un" #example 3
line = "un " #example 4
line = "uno" #example 5
line = "treinta yun" #example 6
line = "treinta y unghhg" #example 7
re_for_identificate_1 = "(?<!^)un"
re_for_identificate_2 = " un"
line = re.sub(re_for_identificate_2, " un ", line)
line = re.sub(re_for_identificate_1, "un ", line)
print(repr(line))
How can I obtain these outputs from those inputs?
"treinta y un " #for example 1
"veinti un " #for example 2
"un " #for example 3
"un " #for example 4
"uno" #for example 5
"treinta yun" #for example 6
"treinta y unghhg" #for example 7
Note that for examples 2, 4, 5, 6 and 7 the regex should not make any changes: either a space already follows the word, or, as in "uno", the "un" is not at the end of the string, or, as in "treinta yun", the substring "un" is not preceded by one or more spaces.
CodePudding user response:
If you want to use regex, you can use \bun$, which checks that the last whole word in the string is un and that nothing follows it in the string. If that is the case, a space is added to the end of the string:
import re
lines = ["treinta y un", "veinti un ", "un", "un ",
"uno", "treinta yun", "treinta y unghhg"]
result = [re.sub(r'\bun$', 'un ', line) for line in lines]
Output:
[
'treinta y un ',
'veinti un ',
'un ',
'un ',
'uno',
'treinta yun',
'treinta y unghhg'
]
CodePudding user response:
I'm not sure you need regular expressions. The following code appears to achieve what you want.
Three checks are performed:
- The content is a string
- The last two characters are "un"
- The last word is "un"
Here I've wrapped the logic in a list comprehension to demonstrate.
lines = ["treinta y un", "veinti un ", "un", "un ",
"uno", "treinta yun", "treinta y unghhg"]
result = [ line + " " if (isinstance(line, str)
and (line[-2:] == "un")
and (line.split()[-1] == "un"))
else line
for line in lines ]
for line in result:
print(f"'{line}'")
Output:
'treinta y un '
'veinti un '
'un '
'un '
'uno'
'treinta yun'
'treinta y unghhg'
CodePudding user response:
If you assign line = repeatedly like that in your code, each assignment overwrites the previous one.
Using (?<!^)un asserts that the position directly to the left is not the start of the string.
If you also want to exclude a match such as #un, you can use (?<!\S) instead, which asserts a whitespace boundary to the left.
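To see the difference between the two lookbehinds, here is a small sketch. The variable names and the sample list are my own, chosen for illustration; only the two patterns come from the discussion above.

```python
import re

# Pattern from the question: "un" anywhere except at the very start of the string.
not_at_start = r"(?<!^)un"
# Whitespace boundary: "un" must be preceded by whitespace or by nothing at all.
whitespace_left = r"(?<!\S)un"

samples = ["un", "treinta y un", "treinta yun", "#un"]

# For each sample, record whether each pattern finds a match.
found = {
    text: (bool(re.search(not_at_start, text)),
           bool(re.search(whitespace_left, text)))
    for text in samples
}

for text, (start_check, boundary_check) in found.items():
    print(repr(text), start_check, boundary_check)
```

The first pattern rejects a bare "un" at the start of the string but accepts "yun" and "#un", which is the opposite of what the question asks for.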
To make sure the pattern is at the end of the string, you can use the anchor $.
The code example uses single lines, but if you want to do the replacement in a string that contains multiple lines, you have to pass the multiline flag re.MULTILINE to re.sub.
Example:
import re
pattern = r"(?<!\S)un$"
lines = ["treinta y un", "veinti un ", "un", "un ",
"uno", "treinta yun", "treinta y unghhg", "#un"]
print([re.sub(pattern, 'un ', line) for line in lines])
Output:
[
'treinta y un ',
'veinti un ',
'un ',
'un ',
'uno',
'treinta yun',
'treinta y unghhg',
'#un'
]
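The note about re.MULTILINE can be sketched as follows. The sample text is an assumption of mine, picked so that the last line matches even without the flag; the pattern is the same one used in the example.

```python
import re

pattern = r"(?<!\S)un$"

# One string holding several lines (hypothetical sample input).
text = "treinta y un\nuno\nun"

# Without the flag, $ only matches at the very end of the whole string.
without_flag = re.sub(pattern, "un ", text)

# With re.MULTILINE, $ also matches right before each newline.
with_flag = re.sub(pattern, "un ", text, flags=re.MULTILINE)

print(repr(without_flag))
print(repr(with_flag))
```

Without the flag only the final "un" gets a trailing space. With the flag the "un" at the end of the first line is handled too.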