How to convert words to integers in Python - CodePudding

I need to convert a string containing number words in the range [zero to ten] to an integer.

Example Input 1:

a=two3four

Needed Output: 234

Example 2:

b=fivesixseven

Needed Output: 567

My code:

def w2n (number):
  words = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]
  return "".join(str(words.index(number[i])) for i in range(0,len(number)))

print(w2n("onetwoseven"))

I am getting an error with this code:

Traceback (most recent call last):
  File "HelloWorld.py", line 5, in <module>
    print(w2n("onetwoseven"))    
  File "HelloWorld.py", line 3, in w2n
    return "".join(str(words.index(number[i])) for i in range(0,len(number)))
  File "HelloWorld.py", line 3, in <genexpr>
    return "".join(str(words.index(number[i])) for i in range(0,len(number)))
ValueError: 'o' is not in list

Please explain why this error occurs and how to get the needed output for my two example inputs.

CodePudding user response：

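The error occurs because you are iterating over the string itself. Iterating over a string in Python goes through it one character at a time, so in for i in range(0,len(number)) the expression number[i] is a single character such as 'o', and words.index('o') raises the ValueError. For the lookup to work, the function has to receive the words already separated, for example print(w2n(["one", "two", "seven"])) (a list of words).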

Or you can change your code to take only one word at a time and get rid of the for loop.

Just a tip: if you need the elements of an iterable (for example a list) and you are not using the index, you can write the loop without range:

return "".join(str(words.index(word)) for word in number)
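To make the cause of the error concrete, here is a minimal sketch. It first shows that looping over a string yields single characters, then looks up words that have already been split into a list. The names WORDS and w2n_list are hypothetical and only used for illustration.

```python
WORDS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]

# Iterating over a string yields one character at a time, not whole words.
chars = [ch for ch in "onetwoseven"]
print(chars[:3])  # ['o', 'n', 'e']


def w2n_list(number_words):
    """Convert a list of number words to a string of their numeric values."""
    # Each element is a whole word, so WORDS.index() can find it.
    return "".join(str(WORDS.index(word)) for word in number_words)


print(w2n_list(["one", "two", "seven"]))  # 127
```

This only works when the words are already separated; it does not split a string like "onetwoseven" on its own.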

CodePudding user response：

You can use a regular expression to extract the digits and the digit names, then replace each extracted name with its digit character and join the results into a string. Bear in mind that "ten" is not a digit and should not be on the list.

import re

# A dictionary of names of digits
digits = {'zero': '0', 'one': '1', 'two': '2', 'three': '3', 'four': '4', 
          'five': '5', 'six': '6', 'seven': '7', 'eight': '8', 'nine': '9'}
digits.update({str(i): str(i) for i in range(10)})

# The regular expression for searching the names
numbers = re.compile("|".join(digits))

"".join(digits[w] for w in numbers.findall("two3four"))
#'234'
"".join(digits[w] for w in numbers.findall("fivesixseven"))
#'567'
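The same regex idea could be wrapped in a reusable function. The sketch below is one possible way to do that; the names DIGIT_NAMES, TOKEN and words_to_digits are hypothetical, and the pattern uses \d for literal digits instead of adding them to the dictionary.

```python
import re

# Digit names mapped to their digit characters ("ten" is left out on purpose).
DIGIT_NAMES = {'zero': '0', 'one': '1', 'two': '2', 'three': '3', 'four': '4',
               'five': '5', 'six': '6', 'seven': '7', 'eight': '8', 'nine': '9'}

# Match either a digit name or a single literal digit.
TOKEN = re.compile("|".join(DIGIT_NAMES) + r"|\d")


def words_to_digits(text):
    """Return the digits found in text, with digit names converted."""
    # Literal digits are not dictionary keys, so .get() returns them unchanged.
    return "".join(DIGIT_NAMES.get(tok, tok) for tok in TOKEN.findall(text))


print(words_to_digits("two3four"))      # 234
print(words_to_digits("fivesixseven"))  # 567
```

Note that findall silently skips anything that is neither a digit nor a digit name, so words_to_digits("two-x-3") returns '23'.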

CodePudding user response：

The regex answer is very clever, but if you want a non-regex solution, you could use the length of the numbers-as-words to get their start & end index in the original input:

def w2n(number):
    words = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]

    for word in words:
        if word in number:
            start = number.index(word)
            end = start + len(word)
            print(f'Word "{word}" starts at {start} and ends at {end}')
            extracted_word = number[start:end]
            number = number.replace(word, str(words.index(extracted_word)) )

    return number

print( w2n('two3four') )

print( w2n('fivesixseven') )

Because each word is replaced in place with its numeric value, any digits already in the input are left untouched, as you can see with the "two3four" example.

This would output:

Word "two" starts at 0 and ends at 3
Word "four" starts at 2 and ends at 6
234

Word "five" starts at 0 and ends at 4
Word "six" starts at 1 and ends at 4
Word "seven" starts at 2 and ends at 7
567
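As a variation on the non-regex approach, the input could also be scanned from left to right with str.startswith instead of calling replace once per word. This is a different technique from the loop above, shown only as a sketch; the names WORDS and w2n_scan are hypothetical.

```python
WORDS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]


def w2n_scan(text):
    """Scan text left to right, converting number words and keeping the rest."""
    result = []
    i = 0
    while i < len(text):
        for value, word in enumerate(WORDS):
            # Check whether a number word starts at position i.
            if text.startswith(word, i):
                result.append(str(value))
                i += len(word)
                break
        else:
            # No word starts here: keep the character (e.g. a literal digit).
            result.append(text[i])
            i += 1
    return "".join(result)


print(w2n_scan("two3four"))      # 234
print(w2n_scan("fivesixseven"))  # 567
```

Each position in the input is visited once, and characters that do not start a known word are copied to the output unchanged.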