Coding a function to find the validity of certain words in a sentence

A university assignment asks us to write a Python program that analyzes tweets. Part of the assignment is a function that determines whether each word in a sentence is valid, so that the valid words can be counted. Here's the question:

Task 8 Valid Words

We also might want to look at only valid words in our data set. A word will be a valid word if all three of the following conditions are true:

• The word contains only letters, hyphens, and/or punctuation* (no digits).

• There is at most one hyphen '-'. If present, it must be surrounded by characters ("a-b" is valid, but "-ab" and "ab-" are not valid).

• There is at most one punctuation mark. If present, it must be at the end of the word ("ab,", "cd!", and "." are valid, but "a!b" and "c.," are not valid).

NB: for this question, the 3rd condition will also apply to apostrophes despite real "valid" words containing them.

Write a function valid_words_mask(sentence) that takes an input parameter sentence (type string) and returns the tuple: (int, list[]), where:

• int is the number of valid words found.

• list[] contains the booleans True or False for each word in sequence depending on whether that word is valid.

*Assume that a punctuation mark is any character that is not an alphanumeric (except for apostrophes, and for hyphens, which are handled separately as per the instructions).
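The footnote defines punctuation as any character that is not alphanumeric, which is broader than Python's string.punctuation (only 32 ASCII symbols). A minimal sketch of the footnote's rule, assuming hyphens are excluded because they have their own rule, and that apostrophes count as punctuation as the NB says; is_punctuation is a hypothetical helper name:

```python
import string

HYPHEN = "-"

def is_punctuation(ch):
    """Footnote rule (assumed): any non-alphanumeric character except the hyphen.

    Apostrophes count as punctuation here, following the NB.
    str.isalnum() is Unicode-aware, so letters such as 'é' are not punctuation.
    """
    return not ch.isalnum() and ch != HYPHEN

# For each character, print the footnote's verdict and whether
# string.punctuation contains it (curly quote, ellipsis and emoji do not).
for ch in ["'", "!", "\u201c", "\u2026", "\U0001F600"]:
    print(repr(ch), is_punctuation(ch), ch in string.punctuation)
```

This matters for tweets, where curly quotes, ellipses and emoji are common: a check based only on string.punctuation will not recognise them as punctuation.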

Here's the code I have written so far, after many days of struggling. It seems to run only one iteration of the loop. Keep in mind that I am a beginner programmer and have only applied the few concepts we have learned. :) Thanks for the feedback.

def valid_words_mask(sentence):

   
    """Takes a string sentence input and determines whether words are valid"""

    import string
    punctuation = list(string.punctuation)
    punctuation.remove("-")
    word_list = " ".split(sentence)
    valid_count = 0
    valid_list = []

    for word in word_list:

        hyphen_count = 0
        digit_count = 0
        punctuation_count = 0

       
        for i in range (0, len(word)):   
            
            #Checks whether given character is a hyphen
            
            if word[i] == "-":  
                hyphen_count += 1  
        
        for i in range (0, len(word)):
            
            #Checks whether given character is a digit
            
            if word[i].isdigit() == True:
                digit_count += 1
                
        for i in range (0, (len(word) - 1)):
            
            if word[i] in punctuation:
                
                punctuation_count += 1
                
        if digit_count < 1 and hyphen_count < 2 and punctuation_count < 1:
            if word[0] != "-" and word[-1] != "-":
                validity = True
        else: validity = False
        
        if validity == True:
            valid_count += 1
        
        valid_list.append(validity)
            
     
    final_tuple = (valid_count, valid_list)
    
    return final_tuple
                

                
                
sentence = "these are valid  words"
print(valid_words_mask(sentence))

Answer:

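The problem is with the line word_list = " ".split(sentence). That calls split() on the one-character string " " and uses your whole sentence as the separator; since the separator never occurs, word_list is [" "], a list holding a single space. That is why the loop runs only once.

Use word_list = sentence.split() instead. With no argument, split() breaks the string on any run of whitespace, so the double space in your test sentence does not produce an empty word.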
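To see the difference, compare the three calls directly on the test sentence from the question. str.split(sep) splits the string it is called on, so the receiver and the argument matter:

```python
sentence = "these are valid  words"

# Splits the one-character string " " on the separator `sentence`;
# the separator never occurs, so the result is just [" "].
wrong = " ".split(sentence)

# Splits `sentence` on runs of whitespace: four words, no empty strings.
right = sentence.split()

# Splitting on exactly one space keeps an empty string between the two
# spaces, and word[0] on "" would raise IndexError in the loop.
single_space = sentence.split(" ")

print(wrong)
print(right)
print(single_space)
```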

Answer:

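Some of your indentation seems off. In your version the else belongs to the outer if, so a word that passes the counts but starts or ends with a hyphen never assigns validity: it keeps the previous word's value, or raises UnboundLocalError on the first word. Give validity a default and increment the count inside the inner if (this replaces your separate if validity == True: block; keep the valid_list.append(validity) line). Try:

    validity = False
    if digit_count < 1 and hyphen_count < 2 and punctuation_count < 1:
        if word[0] != "-" and word[-1] != "-":
            validity = True
            valid_count += 1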
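Combining the split fix from the first answer with a validity value that is assigned for every word gives a version close to the original counting approach. This is a sketch that keeps the question's interpretation of the rules (punctuation is anything in string.punctuation except the hyphen, and a hyphen only has to avoid the first and last positions); it also iterates over characters directly instead of by index:

```python
import string

def valid_words_mask(sentence):
    """Return (number of valid words, list of True/False for each word)."""
    punctuation = list(string.punctuation)
    punctuation.remove("-")          # hyphens are handled by their own rule
    word_list = sentence.split()     # split the sentence on whitespace
    valid_count = 0
    valid_list = []

    for word in word_list:
        hyphen_count = 0
        digit_count = 0
        punctuation_count = 0

        for ch in word:
            if ch == "-":
                hyphen_count += 1
            if ch.isdigit():
                digit_count += 1

        # Punctuation is only allowed as the last character,
        # so count it everywhere except the end.
        for ch in word[:-1]:
            if ch in punctuation:
                punctuation_count += 1

        # A single boolean expression, so validity is set for every word.
        validity = (digit_count == 0
                    and hyphen_count <= 1
                    and punctuation_count == 0
                    and word[0] != "-"
                    and word[-1] != "-")

        if validity:
            valid_count += 1
        valid_list.append(validity)

    return valid_count, valid_list

print(valid_words_mask("these are valid  words"))
print(valid_words_mask("a-b -ab ab- ab, a!b c., 2x ."))
```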

Answer:

The instructions for this task are confusing about what counts as punctuation, so the following code may not match your assignment's exact interpretation.

However, you should think about breaking the functionality down into its component parts. You have three "rules", so write three short functions, one per rule. It then becomes easy to combine them in a "driver" function. Here's an example:

from string import ascii_lowercase, punctuation

HYPHEN = '-'
PUNCTUATION = punctuation.replace(HYPHEN, '')
VCHARS = ascii_lowercase + punctuation  # letters, hyphen and punctuation; no digits

def valid_chars(word):
    return all(c in VCHARS for c in word)

def valid_hyphens(word):
    # At most one hyphen, and never as the first or last character
    count = word.count(HYPHEN)
    return count == 0 or (count == 1 and word[0] != HYPHEN and word[-1] != HYPHEN)

def valid_punctuation(word):
    pcount = sum(1 for c in word if c in PUNCTUATION)
    return pcount == 0 or (pcount == 1 and word[-1] in PUNCTUATION)


def valid_words_mask(sentence):
    valid_count = 0
    valid_list = list()
    for word in sentence.lower().split():
        # v holds the combined result (the := operator needs Python 3.8+)
        if v := valid_chars(word) and valid_punctuation(word) and valid_hyphens(word):
            valid_count += 1
        valid_list.append(v)
    return valid_count, valid_list

print(valid_words_mask('Hello world??'))

Output:

(1, [True, False])
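If regular expressions are allowed in your course, the three rules can also be folded into one pattern per word. This is an alternative technique, not a fix to any of the code above, and it rests on the same assumptions as the helper-function version: ASCII letters after lower-casing, punctuation meaning string.punctuation without the hyphen, and a hyphen only needing to avoid the first and last positions. PUNCT_CLASS and VALID_WORD are names made up for this sketch:

```python
import re
import string

# Assumed punctuation set: string.punctuation minus the hyphen,
# escaped so it can sit safely inside a [...] character class.
PUNCT_CLASS = "[" + re.escape(string.punctuation.replace("-", "")) + "]"

# (?!-)          the word may not start with a hyphen
# (?!.*-$)       ...or end with one
# [a-z]*         letters only (no digits)
# (?:-[a-z]*)?   at most one hyphen
# PUNCT_CLASS?   at most one punctuation mark, and only at the very end
VALID_WORD = re.compile(r"(?!-)(?!.*-$)[a-z]*(?:-[a-z]*)?" + PUNCT_CLASS + "?")

def valid_words_mask(sentence):
    # fullmatch() requires the whole word to fit the pattern
    mask = [VALID_WORD.fullmatch(word) is not None
            for word in sentence.lower().split()]
    return sum(mask), mask

print(valid_words_mask("Hello world??"))
```

The pattern is compact, but the helper functions are easier to adjust if your marker interprets "punctuation" or "surrounded by characters" differently.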