Python: Get single words from a string

Time:03-28

I'm trying to make a string analyzer in Python. I'm starting with this input as an example:

toAnalyze = "Hello!!gyus-- lol\n"

As output I want something like this:

>Output: ['Hello', '!!', 'guys', '--', ' ', 'lol']

I want every group kept in its original order.

My idea was to scan every character of the original string up to the "\n" character, and I came up with this solution:

toAnalyze = "Hello!!gyus-- lol\n"
final = ""
for char in toAnalyze:
    # collect every character that is not whitespace
    if char != " \n\t" and char != " " and char != "\n" and char != "\n\t":
        final += char
    # at whitespace, split what was collected into letters and symbols
    elif char == " " or char == "\n" or char == "\n\t" or char == " \n\t":
        if not final.isalnum():
            word = ""
            thing = ""
            for l in final:
                if l.isalnum():
                    word += l
                else:
                    thing += l
            print("word: " + word)
            print("thing: " + thing)

And my current output is:

>Output: thing: !!-- word: Hellogyus lol

Do you have an idea? The output I want is:

>Output: ['Hello', '!!', 'guys', '--', ' ', 'lol']

Thanks in advance and have a nice day

Answer:

I'm not a Python guy, but I want to help you get started. Here is a working solution that you can try to improve to make it more Pythonic:

toAnalyze = 'Hello!!gyus-- lol\n'

word = ''
separator = ''
tokens = []

for ch in toAnalyze:
    if ch.isalnum():
        word += ch

    # we met the first character of a separator, so save a word
    if not ch.isalnum() and word:
        tokens.append(word)
        word = ''

    # 1. we met the first alphanumeric after a separator, so save the separator or
    # 2. we met a new separator right after another one, also save the old separator
    if (ch.isalnum() and separator) or (separator and separator[-1] != ch):
        tokens.append(separator)
        separator = ''

    if not ch.isalnum():
        separator += ch

# the loop only saves a word when a separator follows it, so save a trailing word here
if word:
    tokens.append(word)

print(tokens)

The output for your example is:

['Hello', '!!', 'gyus', '--', ' ', 'lol']
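Since the loop above is meant as a starting point to make more Pythonic, here is one possible sketch using the standard library's `itertools.groupby`. It assumes the same grouping rule the loop implements (a run of alphanumerics is one token, a run of one repeated non-alphanumeric character is one token) and that trailing newlines should be dropped, as in the wanted output. The function name `tokenize` is just illustrative.

```python
from itertools import groupby


def tokenize(text: str) -> list[str]:
    """Split text into runs of alphanumerics and runs of one repeated symbol.

    Assumption: trailing newlines are not tokens, as in the wanted output.
    """
    text = text.rstrip("\n")

    def kind(ch: str):
        # All alphanumerics share the key None, so letters and digits merge into one word.
        # Any other character is its own key, so only identical symbols merge ('!!', '--').
        return None if ch.isalnum() else ch

    # groupby yields consecutive runs of characters that have the same key
    return ["".join(group) for _, group in groupby(text, key=kind)]


print(tokenize("Hello!!gyus-- lol\n"))
# ['Hello', '!!', 'gyus', '--', ' ', 'lol']
```

`groupby` only merges adjacent characters with equal keys, so the tokens stay in their original order, and it emits the last group by itself, so no end-of-loop bookkeeping is needed. The example input spells "gyus", which is why that spelling appears in the result.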