I'm currently working on neural text-to-speech, and processing the data takes several steps. One step is to convert the numerals in a string into English words. The closest thing I could find is num2words, but I'm not sure how to apply it to an existing string. Here's my use case:
I have a list of strings like this:
list_string = ['I spent 140 dollar yesterday','I have 3 brothers and 2 sisters']
I want to convert it into:
output_string = ['I spent one hundred forty dollar yesterday','I have three brothers and two sisters']
The difficulty is that one text might contain several numbers, and even if I can extract the numbers using re.match, I'm not sure how to put them back into the string.
No need to worry about floating-point numbers or years for now, since I don't have that kind of number in my strings.
Thanks
CodePudding user response:
There is a very quick way to do it in one line, using a regex to match the digits and replace them in the string:
from num2words import num2words
import re
list_string = ['I spent 140 dollar yesterday','I have 3 brothers and 2 sisters']
output_string = [re.sub(r'\d+', lambda m: num2words(m.group()), sentence) for sentence in list_string]
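To make the mechanics of that one-liner explicit: re.sub accepts a function as the replacement, calls it once per match, and splices the return value back into the string, which is how the numbers get "put back". Below is a minimal sketch. The names replace_numbers and fake_num2words are hypothetical, and fake_num2words is only a stand-in so the example runs without num2words installed; in practice you would pass num2words itself.

```python
import re

def replace_numbers(sentence, to_words):
    """Replace every run of digits in `sentence` with to_words(<int value>)."""
    # re.sub calls the lambda once per match and inserts what it returns.
    return re.sub(r'\d+', lambda m: to_words(int(m.group())), sentence)

# Hypothetical stand-in for num2words; it only knows the numbers used below.
def fake_num2words(n):
    return {2: 'two', 3: 'three', 140: 'one hundred and forty'}[n]

list_string = ['I spent 140 dollar yesterday', 'I have 3 brothers and 2 sisters']
output_string = [replace_numbers(s, fake_num2words) for s in list_string]
print(output_string)
```

Because the converter is passed in as an argument, swapping the stand-in for the real library should not require touching the regex part.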
Otherwise, you can iterate through the words contained in each sentence and replace them in case they are numbers. Please see the code below:
from num2words import num2words
list_string = ['I spent 140 dollar yesterday','I have 3 brothers and 2 sisters']
output_string = []
for sentence in list_string:
    output_sentence = []
    for word in sentence.split():
        if word.isdigit():
            output_sentence.append(num2words(word))
        else:
            output_sentence.append(word)
    output_string.append(' '.join(output_sentence))
print(output_string)
# Output
# ['I spent one hundred and forty dollar yesterday', 'I have three brothers and two sisters']
CodePudding user response:
For each sentence in your list_string, find any numbers and replace them using num2words:
from num2words import num2words
list_string = ['I spent 140 dollar yesterday', 'I have 3 brothers and 2 sisters']
output_list = []
for sentence in list_string:
    numbers = [s for s in sentence.split() if s.isdigit()]
    for number in numbers:
        sentence = sentence.replace(number, num2words(number))
    output_list.append(sentence)
print(output_list)
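One caveat with str.replace is that it replaces every occurrence of the substring, so a short number such as 3 is also rewritten inside a longer one such as 30. The sketch below reproduces that and shows one possible way around it, splitting on digit runs with re.split. The function names and the fake_num2words stand-in are hypothetical and exist only so the example runs without num2words installed.

```python
import re

def naive_replace(sentence, to_words):
    # Same idea as the loop above: replace each numeric token everywhere.
    for number in [s for s in sentence.split() if s.isdigit()]:
        sentence = sentence.replace(number, to_words(int(number)))
    return sentence

def token_replace(sentence, to_words):
    # With a capturing group, re.split keeps the digit runs in the result,
    # always at the odd positions, so each number is converted exactly once.
    pieces = re.split(r'(\d+)', sentence)
    return ''.join(to_words(int(p)) if i % 2 else p
                   for i, p in enumerate(pieces))

# Hypothetical stand-in for num2words; it only knows the numbers used below.
def fake_num2words(n):
    return {3: 'three', 30: 'thirty'}[n]

sentence = 'I have 3 brothers and 30 sisters'
print(naive_replace(sentence, fake_num2words))
# I have three brothers and three0 sisters
print(token_replace(sentence, fake_num2words))
# I have three brothers and thirty sisters
```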
CodePudding user response:
As long as the positions of those strings don't change, this could work. For each entry in integercoords, element [0] is the start index, element [1] is the end index, and element [-1] (or [2]) is the index in list_string of the sentence the number belongs to.
list_string = ['I spent 140 dollar yesterday', 'I have 3 brothers and 2 sisters']
integercoords = []
sentence_index = -1
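for sentence in list_string:
    sentence_index += 1
    listoindex = []
    numbersignal = False
    # enumerate gives the real position of each character; sentence.index()
    # would always return the first occurrence of a repeated digit
    for position, indexthing in enumerate(sentence):
        if indexthing == '0' or indexthing == '1' or indexthing == '2' or indexthing == '3'\
                or indexthing == '4' or indexthing == '5' or indexthing == '6' or indexthing == '7'\
                or indexthing == '8' or indexthing == '9':
            listoindex.append(position)
            numbersignal = True
        else:
            numbersignal = False
        if listoindex:
            if numbersignal:
                pass
            else:
                # the digit run just ended: store [start, end, sentence index]
                integercoords.append([listoindex[0], listoindex[-1], sentence_index])
                listoindex.clear()
    # a number at the very end of the sentence is still pending here
    if listoindex:
        integercoords.append([listoindex[0], listoindex[-1], sentence_index])
print(integercoords)
for data in integercoords:
    print(list_string[data[-1]][data[0]:data[1] + 1])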
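The same start/end bookkeeping can also be obtained from re.finditer, whose match objects expose start() and end(). The sketch below collects the coordinates and then uses them to put the words back by slicing. Two things here are assumptions rather than part of the answer above: the end index is exclusive (unlike the inclusive end used above), and fake_num2words is a hypothetical stand-in so the example runs without num2words installed.

```python
import re

list_string = ['I spent 140 dollar yesterday', 'I have 3 brothers and 2 sisters']

# Hypothetical stand-in for num2words; it only knows the numbers used below.
def fake_num2words(n):
    return {2: 'two', 3: 'three', 140: 'one hundred and forty'}[n]

# (start, end, sentence_index) for every digit run; end is exclusive.
integercoords = [
    (m.start(), m.end(), i)
    for i, sentence in enumerate(list_string)
    for m in re.finditer(r'\d+', sentence)
]
print(integercoords)
# [(8, 11, 0), (7, 8, 1), (22, 23, 1)]

output_list = list(list_string)
# Replace from right to left so the offsets of earlier numbers stay valid.
for start, end, i in reversed(integercoords):
    s = output_list[i]
    output_list[i] = s[:start] + fake_num2words(int(s[start:end])) + s[end:]
print(output_list)
```

Going right to left matters because each replacement changes the length of the string, which would shift every offset after it.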
CodePudding user response:
I tried doing it in bash; the script looks like this. Save it as conver2words.sh and invoke it from your Python script.
digits=(
"" one two three four five six seven eight nine
ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen
)
tens=("" "" twenty thirty forty fifty sixty seventy eighty ninety)
units=("" thousand million billion trillion)
number2words() {
    local -i number=$((10#$1))
    local -i u=0
    local words=()
    local group
    while ((number > 0)); do
        group=$(hundreds2words $((number % 1000)) )
        [[ -n "$group" ]] && group="$group ${units[u]}"
        words=("$group" "${words[@]}")
        ((u++))
        ((number = number / 1000))
    done
    echo "${words[*]}"
}
hundreds2words() {
    local -i num=$((10#$1))
    if ((num < 20)); then
        echo "${digits[num]}"
    elif ((num < 100)); then
        echo "${tens[num / 10]} ${digits[num % 10]}"
    else
        echo "${digits[num / 100]} hundred $("$FUNCNAME" $((num % 100)) )"
    fi
}
with_commas() {
    # sed -r ':a;s/^([0-9]+)([0-9]{3})/\1,\2/;ta' <<<"$1"
    # or, with just bash
    while [[ $1 =~ ^([0-9]+)([0-9]{3})(.*) ]]; do
        set -- "${BASH_REMATCH[1]},${BASH_REMATCH[2]}${BASH_REMATCH[3]}"
    done
    echo "$1"
}
for arg; do
    [[ $arg == *[^0-9]* ]] && result="NaN" || result=$(number2words "$arg")
    printf "%s\t%s\n" "$(with_commas "$arg")" "$result"
done
In action:
$ bash ./num2text.sh 9 98 987
9 nine
98 ninety eight
987 nine hundred eighty seven
You can check whether the string contains a number and call this script to get its words.
Here is the Python code for this; it's a draft, but it should give you the idea:
import subprocess

list_string = ['I spent 140 dollar yesterday','I have 3 brothers and 2 sisters']
for sentence in list_string:
    for word in sentence.split(" "):
        if word.isnumeric():
            # The word is numeric, so call the bash script and capture its output.
            # subprocess.run is used because subprocess.call and os.system
            # only return the exit status, not what the script printed.
            result = subprocess.run(['bash', 'conver2words.sh', word],
                                    capture_output=True, text=True)
            # The script prints "<number>\t<words>"; keep only the words.
            print(result.stdout.strip().split('\t')[-1])
        else:
            print(word)
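If starting a bash process for every number is undesirable, the same grouping idea (split the number into chunks of three digits, spell out each chunk, append thousand/million/...) can be written directly in Python. This is a rough port under some assumptions, not a replacement for num2words: the function names are hypothetical, only non-negative integers below 10**15 are handled, and 0 is spelled "zero", whereas the bash script prints an empty string for it.

```python
ONES = ['', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight',
        'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen',
        'sixteen', 'seventeen', 'eighteen', 'nineteen']
TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy',
        'eighty', 'ninety']
UNITS = ['', 'thousand', 'million', 'billion', 'trillion']

def hundreds_to_words(num):
    """Spell out 0 <= num < 1000; returns '' for 0."""
    parts = []
    if num >= 100:
        parts += [ONES[num // 100], 'hundred']
        num %= 100
    if num >= 20:
        parts.append(TENS[num // 10])
        num %= 10
    if num:
        parts.append(ONES[num])
    return ' '.join(parts)

def number_to_words(number):
    """Spell out a non-negative integer below 10**15."""
    if number == 0:
        return 'zero'  # assumption: the bash version prints nothing here
    groups = []
    unit = 0
    while number > 0:
        # Peel off the lowest three digits on each pass.
        number, chunk = divmod(number, 1000)
        if chunk:
            groups.append(f'{hundreds_to_words(chunk)} {UNITS[unit]}'.strip())
        unit += 1
    return ' '.join(reversed(groups))

for n in (9, 98, 987, 140):
    print(n, number_to_words(n))
```

Like the bash script, this produces "one hundred forty" without the "and" that num2words inserts, which happens to match the output format asked for in the question.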