How can I count occurrences of words specified in an array in Python?

I am working on a small program in which the user enters text and I would like to check how many times the given words occur in the given input.

import sys

# Read user input
print("Input your code: \n")

user_input = sys.stdin.read()
print(user_input)

For example, the text that I enter into the program is:

a=1
b=3
if (a == 1):
    print("A is a number 1")
elif(b == 3):
    print ("B is 3")
else: 
    print("A isn't 1 and B isn't 3")

The words to look for are specified in an array:

wordsToFind = ["if", "elif", "else", "for", "while"]

Basically, I would like to print how many times "if", "elif" and "else" occur in the input.

How can I count occurrences of words like "if", "elif", "else", "for", "while" in a given string by user input?

CodePudding user response:

I think the best option is to use Python's built-in tokenize module:

# Let's say this is tokens.py
import sys
from collections import Counter
from io import BytesIO
from tokenize import tokenize

# Get input from stdin
code_text = sys.stdin.read()

# Tokenize the input as python code
tokens = tokenize(BytesIO(code_text.encode("utf-8")).readline)

# Filter the ones in wordsToFind
wordsToFind = ["if", "elif", "else", "for", "while"]
words = [token.string for token in tokens if token.string in wordsToFind]

# Count the occurrences
counter = Counter(words)

print(counter)
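The same idea can be wrapped in a reusable function. This is only a sketch: the names count_keywords and WORDS_TO_FIND are my own, and it assumes you also want to restrict matches to NAME tokens (the token type that tokenize reports for keywords). It uses tokenize.generate_tokens, which reads str lines directly, so the text does not have to be encoded to bytes first.

```python
import io
import tokenize
from collections import Counter

# Hypothetical constant name; the word list itself comes from the question.
WORDS_TO_FIND = ["if", "elif", "else", "for", "while"]


def count_keywords(code_text, words=WORDS_TO_FIND):
    """Count how often each word in `words` appears as a NAME token."""
    wanted = set(words)
    # generate_tokens() takes a readline callable that returns str lines.
    tokens = tokenize.generate_tokens(io.StringIO(code_text).readline)
    return Counter(
        tok.string
        for tok in tokens
        if tok.type == tokenize.NAME and tok.string in wanted
    )


# The example input from the question.
sample = (
    "a=1\n"
    "b=3\n"
    "if (a == 1):\n"
    '    print("A is a number 1")\n'
    "elif(b == 3):\n"
    '    print ("B is 3")\n'
    "else: \n"
    '    print("A isn\'t 1 and B isn\'t 3")\n'
)
print(count_keywords(sample))
```

Words that never occur are simply absent from the Counter; looking them up (for example counter["for"]) returns 0.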

Test

Let's say you have a test.py:

a=1
b=3
if (a == 1):
    print("A is a number 1")
elif(b == 3):
    print ("B is 3")
else: 
    print("A isn't 1 and B isn't 3")

and then you run:

cat test.py | python tokens.py

Output:

Counter({'if': 1, 'elif': 1, 'else': 1})

Advantages

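The input is tokenized as Python source, so input that cannot be tokenized (for example, an unclosed bracket) raises an error instead of being silently counted. Note that tokenize only does lexical analysis; it does not check the full syntax.
Only the Python keywords are counted, not every occurrence of "if" in the code text. For example, you can have a line like:

a = "if inside str"

That "if" should not be counted, I think.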
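To make the difference concrete, here is a small comparison between a plain word count and the token-based count. It is a sketch under my own assumptions: count_plain and count_tokens are hypothetical helper names, and returning None for input that cannot be tokenized is just one possible way to handle the error.

```python
import io
import re
import tokenize
from collections import Counter

WORDS = {"if", "elif", "else", "for", "while"}


def count_plain(text):
    """Naive count: every whole-word match, wherever it appears."""
    return Counter(w for w in re.findall(r"\b\w+\b", text) if w in WORDS)


def count_tokens(text):
    """Token-based count; returns None if the input cannot be tokenized."""
    try:
        tokens = tokenize.generate_tokens(io.StringIO(text).readline)
        return Counter(
            t.string
            for t in tokens
            if t.type == tokenize.NAME and t.string in WORDS
        )
    except (tokenize.TokenError, SyntaxError):
        # Raised for e.g. an unclosed bracket or an inconsistent dedent.
        return None


snippet = 'a = "if inside str"  # else in a comment\nif a:\n    pass\n'
print(count_plain(snippet))        # also counts words in the string and comment
print(count_tokens(snippet))       # counts only the real `if` statement
print(count_tokens("print((1\n"))  # incomplete input, cannot be tokenized
```

The plain count picks up the "if" inside the string and the "else" inside the comment, while the token-based count sees them as part of a STRING and a COMMENT token and skips them.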