You don't need to know Arabic to answer this question. Just note that Arabic is written from right to left, while numbers in Arabic are written from left to right.
I am trying to translate an English item name into Arabic and print it. For example, "paper roll 2.50m x 3.36m VIP" in Arabic is "VIP لفة ورق 2.50 م × 3.36 م".
I use a regex to detect tokens that should not be reversed (English words and numbers):
import re
import arabic_reshaper
from deep_translator import GoogleTranslator  # assumed: deep_translator's GoogleTranslator
english = re.compile(r"^[A-Za-z0-9_.]+$")
item_name = "paper roll 2.50m x 3.36m VIP"
''.join(s if english.match(s) else s[::-1] for s in reversed(re.split(r'(\w+)', arabic_reshaper.reshape(GoogleTranslator(source='en', target='ar').translate(item_name)))))
The issue is that the split treats "2.50" as three separate tokens, "2", "." and "50", and reversing the token order turns them into "50.2", so the output becomes "VIP لفة ورق 50.2 م × 36.3 م", which is incorrect.
Is there a way to use a regex to detect that a token is a decimal number and leave it unreversed?
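Here is a minimal reproduction of the splitting step on its own, with translation and reshaping left out and an ASCII stand-in string ("2.50 m") so the output is easy to read. The helper name reverse_tokens is only for this illustration:

```python
import re

english = re.compile(r"^[A-Za-z0-9_.]+$")

def reverse_tokens(text):
    # Same idea as the one-liner above, minus translation/reshaping:
    # split into word and non-word runs, reverse their order,
    # and reverse the characters of anything that isn't "English".
    parts = re.split(r'(\w+)', text)
    return ''.join(s if english.match(s) else s[::-1] for s in reversed(parts))

print(re.split(r'(\w+)', "2.50 m"))  # ['', '2', '.', '50', ' ', 'm', '']
print(reverse_tokens("2.50 m"))      # m 50.2
```

Because `\w+` stops at the dot, "2" and "50" end up as separate tokens, and reversing the token order swaps them.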
Answer:
I haven't attempted the Arabic translation part; you're handling that fine, so I don't think it's needed for the solution. Instead, I've simply reversed the non-numeric parts of the string.
That being said, does this do what you need?
import re

matchStringNum = re.compile(r"[A-Za-z\s*]+(?=[0-9])?|[\d\.]*")
item_name = "paper roll 2.50m x 3.36m VIP"
reversedString = ''
for string in matchStringNum.findall(item_name)[::-1]:
    try:
        float(string)
    except ValueError:
        # not a number: reverse its characters
        reversedString = reversedString + string[::-1]
    else:
        # a number: keep its digits in order
        reversedString = reversedString + string
print(reversedString)
Output:
PIV m3.36 x m2.50 llor repap
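For comparison, here is a tighter sketch of the same tokenize-and-reverse idea (a variation, not the answerer's code). The optional lookahead (?=[0-9])? can always match zero times, so it never changes what is matched and is dropped; [\d.]+ replaces [\d\.]*, so findall no longer returns empty strings; and a regex check replaces float(). As in the code above, findall silently skips characters that neither class matches (Arabic letters, '×', '_'), so the classes would need widening before this is run on translated text.

```python
import re

# Runs of ASCII letters/whitespace/'*', or runs of digits and dots
TOKEN = re.compile(r"[A-Za-z\s*]+|[\d.]+")
NUMBER = re.compile(r"[\d.]+")

def reverse_keeping_numbers(text):
    # Reverse the token order; reverse characters only in non-numeric tokens
    parts = TOKEN.findall(text)
    return ''.join(p if NUMBER.fullmatch(p) else p[::-1] for p in reversed(parts))

print(reverse_keeping_numbers("paper roll 2.50m x 3.36m VIP"))  # PIV m3.36 x m2.50 llor repap
```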
Answer:
If you already have the wrongly ordered output, you can fix the decimal numbers afterwards with the re library:
import re

text = 'VIP لفة ورق 50.2 م × 36.3 م'
# Match every number made of digits, a dot, then more digits
reversed_numbers = re.findall(r'\d+\.\d+', text)
for value in reversed_numbers:
    # swap the parts before and after the dot to undo the reversal
    tt = value[value.index('.') + 1:] + '.' + value[:value.index('.')]
    # replace the reversed value in the text
    text = text.replace(value, tt)
# final result
print(text)  # ==> VIP لفة ورق 2.50 م × 3.36 م
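A one-pass variant of the same fix uses re.sub with a replacement function (unreverse_decimals is a hypothetical name). It also avoids a pitfall of str.replace, which rewrites every occurrence, including one embedded in a longer number: with the loop, "1.2 11.2" becomes "2.1 12.1" instead of "2.1 2.11". Like the loop, it assumes each number was split into exactly two digit runs around one dot, which is what the question's split produces.

```python
import re

def unreverse_decimals(text):
    # Each match is a number whose integer and fractional parts were swapped
    # by the token reversal, e.g. "50.2" -> "2.50"; swap them back in one pass.
    return re.sub(r'(\d+)\.(\d+)', lambda m: f'{m.group(2)}.{m.group(1)}', text)

print(unreverse_decimals('VIP لفة ورق 50.2 م × 36.3 م'))  # VIP لفة ورق 2.50 م × 3.36 م
print(unreverse_decimals('1.2 11.2'))                     # 2.1 2.11
```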
Answer:
You can build an iterator that yields a sequence of 2-tuples, each pairing a substring with a boolean that says whether it is a number:
import re

def get_parts(s):
    """
    Iterator which yields a sequence of tuples
    (is_number, substring)
    """
    start = 0
    for m in re.finditer(r'\d+\.?\d*|\d*\.?\d+', s):
        yield from _get_non_digit_parts(s[start:m.start()])
        yield (True, m.group())
        start = m.end()
    yield from _get_non_digit_parts(s[start:])

def _get_non_digit_parts(s):
    """
    Helper function: splits up a part which is known not to contain
    numbers
    """
    for part in re.split(r'(\w+)', s):
        if part:
            yield (False, part)
With this example calling code:
s = "paper roll 2.50m x 3.36m VIP"
for is_number, part in get_parts(s):
    print(f'{is_number} "{part}"')
you will get:
False "paper"
False " "
False "roll"
False " "
True "2.50"
False "m"
False " "
False "x"
False " "
True "3.36"
False "m"
False " "
False "VIP"
Then you can process these in whatever way you want.
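For example, a hypothetical helper reverse_non_numbers (not part of the answer) could reverse the order of the parts and the characters of each non-number part, leaving numbers intact. It accepts any iterable of (is_number, substring) tuples, so it can consume get_parts(s) directly:

```python
from typing import Iterable, Tuple

def reverse_non_numbers(parts: Iterable[Tuple[bool, str]]) -> str:
    # Reverse the order of the parts; reverse characters only in non-number parts
    parts = list(parts)
    return ''.join(part if is_number else part[::-1] for is_number, part in reversed(parts))

# The tuples listed above for "paper roll 2.50m x 3.36m VIP"
parts = [(False, 'paper'), (False, ' '), (False, 'roll'), (False, ' '),
         (True, '2.50'), (False, 'm'), (False, ' '), (False, 'x'), (False, ' '),
         (True, '3.36'), (False, 'm'), (False, ' '), (False, 'VIP')]
print(reverse_non_numbers(parts))  # PIV m3.36 x m2.50 llor repap
```

For the question's use case you would probably also skip reversing English tokens, for example by checking english.match(part) alongside is_number.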
Answer:
I don't have Google Translate installed, but you might try:
re.findall(r'(\d+\.\d+)|(\w+)', item_name)
instead of the re.split you are using. That will produce a list of tuples like [('', 'paper'), ('', 'roll'), ('2.50', ''), ('', 'm'), ('', 'x'), ('3.36', ''), ('', 'm'), ('', 'VIP')]
Then use that list of tuples in your conditional expression, something like:
t[0] if t[0] else (t[1] if english.match(t[1]) else t[1][::-1]) for t in ...
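Putting this answer's pieces together, a runnable sketch might look like the following. It skips translation and reshaping (you would pass it the reshaped Arabic string), and reverse_for_display is a hypothetical name. The \W+ alternative is an addition not in the answer: with only (\d+\.\d+)|(\w+), findall drops the spaces and symbols between words.

```python
import re

english = re.compile(r"^[A-Za-z0-9_.]+$")
# Group 1: a decimal number; group 2: a word, or a run of non-word characters
# (the \W+ part is an assumption added so spaces and symbols are kept)
token = re.compile(r'(\d+\.\d+)|(\w+|\W+)')

def reverse_for_display(text):
    pieces = []
    for number, other in reversed(token.findall(text)):
        if number:
            pieces.append(number)        # decimal: keep digit order
        elif english.match(other):
            pieces.append(other)         # ASCII word: keep as-is
        else:
            pieces.append(other[::-1])   # anything else: reverse characters
    return ''.join(pieces)

print(reverse_for_display("size 2.50 m"))  # m 2.50 size
```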