Convert numbers inside a string to English words

Time:10-27

I'm currently working on neural text-to-speech, and processing the data takes several steps. One step is converting the numerals in a string into English words. The closest thing I could find is num2words, but I'm not sure how to apply it to an existing string. Here's my use case:

I have a list of strings like this:

list_string = ['I spent 140 dollar yesterday','I have 3 brothers and 2 sisters']

I want to convert it into:

output_string = ['I spent one hundred forty dollar yesterday','I have three brothers and two sisters']

The difficulty is that one text might contain several numbers, and even if I can get the numbers using re.match, I'm not sure how to put them back into the string.

No need to worry about floating-point numbers or years for now, since I don't have those kinds of numbers in my strings.

Thanks

CodePudding user response:

There is a very quick way to do it in one line, using a regex to match runs of digits and replace them in the string:

from num2words import num2words
import re

list_string = ['I spent 140 dollar yesterday','I have 3 brothers and 2 sisters']
output_string = [re.sub(r'(\d+)', lambda m: num2words(m.group()), sentence) for sentence in list_string]

Otherwise, you can iterate through the words contained in each sentence and replace them in case they are numbers. Please see the code below:

from num2words import num2words

list_string = ['I spent 140 dollar yesterday','I have 3 brothers and 2 sisters']
output_string = []

for sentence in list_string:
    output_sentence = []
    for word in sentence.split():
        if word.isdigit():
            output_sentence.append(num2words(word))
        else:
            output_sentence.append(word)
    output_string.append(' '.join(output_sentence))

print(output_string)

# Output
# ['I spent one hundred and forty dollar yesterday', 'I have three brothers and two sisters']
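
One difference between the two approaches above may matter on real text: split() plus isdigit() only converts numbers that stand alone as whitespace-separated words, while the regex converts any run of digits, including one attached to punctuation such as "140,". A minimal sketch of that difference follows. Here to_words is a hypothetical stand-in for num2words (it only marks what would be converted) so the example runs without installing anything:

```python
import re

def to_words(number_text):
    # Hypothetical stand-in for num2words(...): marks the text that would be converted.
    return f"<{number_text}>"

def convert_with_split(sentence):
    # Word-by-word approach: only tokens made entirely of digits are converted.
    return ' '.join(to_words(w) if w.isdigit() else w for w in sentence.split())

def convert_with_regex(sentence):
    # Regex approach: every run of digits is converted, wherever it appears.
    return re.sub(r'\d+', lambda m: to_words(m.group()), sentence)

sentence = 'I spent 140, then 20 more.'
print(convert_with_split(sentence))  # I spent 140, then <20> more.
print(convert_with_regex(sentence))  # I spent <140>, then <20> more.
```

If your sentences never have punctuation glued to a number, both approaches should behave the same.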

CodePudding user response:

For each sentence in your list_string, find any numbers, and replace them using num2words:

from num2words import num2words

list_string = ['I spent 140 dollar yesterday', 'I have 3 brothers and 2 sisters']
output_list = []

for sentence in list_string:
    numbers = [s for s in sentence.split() if s.isdigit()]
    for number in numbers:
        sentence = sentence.replace(number, num2words(number))
    output_list.append(sentence)

print(output_list)
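
One caveat with str.replace: it substitutes every occurrence of the text, so a short number such as "3" is also replaced inside a longer one such as "30". A small sketch of the problem and one possible workaround (a regex with word boundaries) is below. The to_words function is a hypothetical stand-in for num2words, limited to the two numbers in the example:

```python
import re

def to_words(number_text):
    # Hypothetical stand-in for num2words(...), covering only this example.
    return {'3': 'three', '30': 'thirty'}[number_text]

sentence = 'I have 3 brothers and 30 cousins'

# str.replace approach: the "3" inside "30" is replaced as well.
replaced = sentence
for number in [s for s in sentence.split() if s.isdigit()]:
    replaced = replaced.replace(number, to_words(number))
print(replaced)  # I have three brothers and three0 cousins

# Regex approach: \b...\b matches whole numbers only, each exactly once.
safe = re.sub(r'\b\d+\b', lambda m: to_words(m.group()), sentence)
print(safe)  # I have three brothers and thirty cousins
```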

CodePudding user response:

As long as the positions within those strings don't change, this could work. Each entry of integercoords holds three values: entry[0] is the start index, entry[1] is the end index, and entry[-1] (or entry[2]) is the index in list_string of the sentence the number belongs to.

list_string = ['I spent 140 dollar yesterday', 'I have 3 brothers and 2 sisters']

integercoords = []
for sentence_index, sentence in enumerate(list_string):
    listoindex = []  # positions of the digits in the current run
    for position, character in enumerate(sentence):
        if character in '0123456789':
            listoindex.append(position)
        elif listoindex:
            # the run of digits just ended: store [start, end, sentence index]
            integercoords.append([listoindex[0], listoindex[-1], sentence_index])
            listoindex.clear()
    if listoindex:
        # the sentence ended with a number
        integercoords.append([listoindex[0], listoindex[-1], sentence_index])

print(integercoords)
for data in integercoords:
    print(list_string[data[-1]][data[0]:data[1] + 1])
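
The same [start, end, sentence index] triples can probably be collected more compactly with re.finditer, and then used to splice the words back in. This is only a sketch: the words dictionary is a hypothetical lookup standing in for a real converter such as num2words, and the replacement runs from right to left so that earlier indices stay valid while the string grows:

```python
import re

list_string = ['I spent 140 dollar yesterday', 'I have 3 brothers and 2 sisters']

# One [start, end, sentence_index] entry per run of digits (end is inclusive).
integercoords = [
    [m.start(), m.end() - 1, sentence_index]
    for sentence_index, sentence in enumerate(list_string)
    for m in re.finditer(r'\d+', sentence)
]
print(integercoords)  # [[8, 10, 0], [7, 7, 1], [22, 22, 1]]

# Hypothetical lookup standing in for a number-to-words converter.
words = {'140': 'one hundred forty', '3': 'three', '2': 'two'}

output_string = list(list_string)
# Go through the spans in reverse so replacing one number does not shift the others.
for start, end, sentence_index in reversed(integercoords):
    sentence = output_string[sentence_index]
    number_text = sentence[start:end + 1]
    output_string[sentence_index] = sentence[:start] + words[number_text] + sentence[end + 1:]

print(output_string)
# ['I spent one hundred forty dollar yesterday', 'I have three brothers and two sisters']
```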

CodePudding user response:

I tried doing it in bash; the script looks like this. Save it as conver2words.sh and invoke it from your Python script.


digits=(
    "" one two three four five six seven eight nine
    ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen
)
tens=("" "" twenty thirty forty fifty sixty seventy eighty ninety)
units=("" thousand million billion trillion)

number2words() {
    local -i number=$((10#$1))
    local -i u=0
    local words=()
    local group

    while ((number > 0)); do
        group=$(hundreds2words $((number % 1000)) )
        [[ -n "$group" ]] && group="$group ${units[u]}"

        words=("$group" "${words[@]}")

        ((u++))
        ((number = number / 1000))
    done
    echo "${words[*]}"
}

hundreds2words() {
    local -i num=$((10#$1))
    if ((num < 20)); then
        echo "${digits[num]}"
    elif ((num < 100)); then
        echo "${tens[num / 10]} ${digits[num % 10]}"
    else
        echo "${digits[num / 100]} hundred $("$FUNCNAME" $((num % 100)) )"
    fi
}

with_commas() {
    # sed -r ':a;s/^([0-9]+)([0-9]{3})/\1,\2/;ta' <<<"$1"
    # or, with just bash
    while [[ $1 =~ ^([0-9]+)([0-9]{3})(.*) ]]; do
        set -- "${BASH_REMATCH[1]},${BASH_REMATCH[2]}${BASH_REMATCH[3]}"
    done
    echo "$1"
}

for arg; do
    [[ $arg == *[^0-9]* ]] && result="NaN" || result=$(number2words "$arg")
    printf "%s\t%s\n" "$(with_commas "$arg")" "$result"
done

In action:

$ bash ./conver2words.sh 9 98 987
9       nine
98      ninety eight
987     nine hundred eighty seven

You can check whether a word in the string is a number and call this script to get its words.

Here is the Python code for this. It is a draft, but it should give you the idea:

import subprocess

list_string = ['I spent 140 dollar yesterday','I have 3 brothers and 2 sisters']

for sentence in list_string:
    for word in sentence.split(" "):
        if word.isnumeric():
            # The word is a number: call the bash script and read its output.
            # subprocess.run captures what the script prints; os.system would
            # only return the exit status.
            out = subprocess.run(['bash', 'conver2words.sh', word],
                                 capture_output=True, text=True)
            # The script prints "<number>\t<words>"; keep only the words.
            line = out.stdout.strip().split('\t')[-1]
            print(line)
        else:
            print(word)
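
If starting a bash process for every number turns out to be slow or awkward, the same grouping-by-thousands logic could be ported to Python directly. The following is a minimal sketch, not a replacement for num2words: it assumes non-negative integers below one quadrillion (the range of the units table above), the function names number_to_words and hundreds_to_words are invented for this example, and returning "zero" for 0 is an addition the bash script does not have:

```python
import re

DIGITS = ['', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine',
          'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen',
          'seventeen', 'eighteen', 'nineteen']
TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
UNITS = ['', 'thousand', 'million', 'billion', 'trillion']

def hundreds_to_words(num):
    """Return the words for 0 <= num < 1000 (empty string for 0)."""
    if num < 20:
        return DIGITS[num]
    if num < 100:
        return f'{TENS[num // 10]} {DIGITS[num % 10]}'.strip()
    return f'{DIGITS[num // 100]} hundred {hundreds_to_words(num % 100)}'.strip()

def number_to_words(number):
    """Return the words for a non-negative integer below 10**15."""
    if number == 0:
        return 'zero'  # assumption: the bash script prints nothing for 0
    groups = []
    unit = 0
    while number > 0:
        group = hundreds_to_words(number % 1000)
        if group:
            # Prepend, because groups are produced from least to most significant.
            groups.insert(0, f'{group} {UNITS[unit]}'.strip())
        unit += 1
        number //= 1000
    return ' '.join(groups)

list_string = ['I spent 140 dollar yesterday', 'I have 3 brothers and 2 sisters']
output_string = [
    re.sub(r'\d+', lambda m: number_to_words(int(m.group())), sentence)
    for sentence in list_string
]
print(output_string)
# ['I spent one hundred forty dollar yesterday', 'I have three brothers and two sisters']
```

Like the bash script, this produces "one hundred forty" without "and", which happens to match the output asked for in the question.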