I'm trying to write a string analyzer in Python. I'm starting with this input as an example:
toAnalyze= "Hello!!gyus-- lol\n"
As output, I want something like this:
>Output: ['Hello', '!!', 'gyus', '--', ' ', 'lol']
I want every group kept in its original order.
My idea was to scan every character of the original string up to the "\n" character, and I came up with this solution:
toAnalyze = "Hello!!gyus-- lol\n"
final = ""
for char in toAnalyze:
    if char != " \n\t" and char != " " and char != "\n" and char != "\n\t":
        final += char
    elif char == " " or char == "\n" or char == "\n\t" or char == " \n\t":
        if not final.isalnum():
            word = ""
            thing = ""
            for l in final:
                if l.isalnum():
                    word += l
                else:
                    thing += l
            print("word: " + word)
            print("thing: " + thing)
And my current output is:
>Output: thing: !!-- word: Hellogyus lol
Do you have any idea? The desired output is:
>Output: ['Hello', '!!', 'gyus', '--', ' ', 'lol']
Thanks in advance and have a nice day
CodePudding user response:
I'm not a Python guy, but I want to help you get started. Here is a working solution, which you can try to improve to make it more Pythonic:
toAnalyze = 'Hello!!gyus-- lol\n'
word = ''
separator = ''
tokens = []
for ch in toAnalyze:
    if ch.isalnum():
        word += ch
    # we met the first character of a separator, so save the word
    if not ch.isalnum() and word:
        tokens.append(word)
        word = ''
    # 1. we met the first alphanumeric after a separator, so save the separator, or
    # 2. we met a different separator right after another one, so also save the old separator
    if (ch.isalnum() and separator) or (separator and separator[-1] != ch):
        tokens.append(separator)
        separator = ''
    if not ch.isalnum():
        separator += ch
print(tokens)
The output for your example is:
['Hello', '!!', 'gyus', '--', ' ', 'lol']
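As one possible way to make this more Pythonic, the same grouping can be sketched with `itertools.groupby` from the standard library. This is only a sketch, not the loop above rewritten line for line: the names `tokenize` and `run_key` are made up for illustration, and it assumes that only a trailing newline should be dropped, so its behavior at the very end of the string can differ from the loop above.

```python
from itertools import groupby


def tokenize(text):
    """Split text into runs of alphanumerics and runs of one repeated separator."""

    def run_key(ch):
        # All alphanumeric characters share the same key, so they form one run.
        # Every other character is keyed by itself, so only repeats of the
        # same separator (like "!!" or "--") are merged.
        return (True, "") if ch.isalnum() else (False, ch)

    # Assumption: only the trailing newline is discarded before grouping.
    stripped = text.rstrip("\n")
    return ["".join(run) for _, run in groupby(stripped, key=run_key)]


print(tokenize("Hello!!gyus-- lol\n"))
# prints ['Hello', '!!', 'gyus', '--', ' ', 'lol']
```

`groupby` only merges neighbouring characters with equal keys, so the groups stay in their original order without any extra bookkeeping.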