advanced text processing: how to find words that do not contain 'e'

Time:03-29

Write a program that iterates over the provided jane_eyre_sentences.txt file and counts the number of words without an "e", including both upper and lower case. For each sentence in which the relative amount of words without "e" is over 70%, print out how many words in that sentence contain no "e", and how many words there are in total. Also, let your program print out the corresponding line number (starting to count from zero).

Here is my work:

line_no = -1
for line in open("jane_eyre_sentences.txt"):
  total = 0
  line_no += 1
  lines = line.strip()
  word = lines.split()
  for val in word:
    if not "e" in val.lower():
      total += 1
      if (total/len(word)) > 0.7:
        print("{}: {} out of {} words contain no 'e'.".format(line_no, total, len(word)))

The output is:
0: 8 out of 10 words contain no 'e'.
0: 9 out of 10 words contain no 'e'.
22: 13 out of 18 words contain no 'e'.
22: 14 out of 18 words contain no 'e'.
24: 11 out of 15 words contain no 'e'.
31: 7 out of 9 words contain no 'e'.
33: 19 out of 27 words contain no 'e'.
36: 19 out of 26 words contain no 'e'.
38: 11 out of 15 words contain no 'e'.

But the correct output should be:
0: 9 out of 10 words contain no 'e'.
22: 14 out of 18 words contain no 'e'.
24: 11 out of 15 words contain no 'e'.
31: 7 out of 9 words contain no 'e'.
33: 19 out of 27 words contain no 'e'.
36: 19 out of 26 words contain no 'e'.
38: 11 out of 15 words contain no 'e'.

What's wrong with my code?

CodePudding user response:

I don't want to give you the solution to what is clearly homework, but some hints:

Think about where exactly you should check if you are over 70%.

Also print the output of lines.split() (or look at it in a debugger) and see if it really is what you expect.
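To make the second hint concrete, here is a small sketch of what split() returns for a single line. The sample sentence is the opening line of the novel and is only an assumption about what the file contains; your first line may differ.

```python
# Sample line for illustration only; assumed, not read from jane_eyre_sentences.txt.
line = "There was no possibility of taking a walk that day.\n"

# strip() removes the trailing newline, split() breaks on runs of whitespace.
words = line.strip().split()
print(words)
print(len(words))
```

Note that punctuation stays attached to the neighbouring word ("day."), which does not matter for an "e" check but is worth knowing about.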

CodePudding user response:

The hint by Olli should have given you the solution. So in the end your code was fundamentally correct, as you will have discovered.

However, you may find it useful to compare your code with a more concise, more Pythonic version like this one (see the inline comments):

with open("jane_eyre_sentences.txt") as infile:  # the context manager closes the file for you
    for line_no, line in enumerate(infile):  # enumerate() takes care of line_no, starting at 0
        words = line.strip().split()  # a list of words, so the plural name reads better
        if not words:  # skip blank lines to avoid dividing by zero
            continue
        total = len(list(filter(lambda w: 'e' not in w.lower(), words)))  # filter() keeps the words without an 'e'
        if total / len(words) > 0.7:  # checked once per line, after all words are counted
            print("{}: {} out of {} words contain no 'e'.".format(line_no, total, len(words)))
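If you want to try the logic without the text file, the same idea can be sketched as a function that takes any iterable of lines. The name report_low_e_lines, the threshold parameter and the sample sentences are made up for this illustration; sum() over a generator is just one alternative to filter(), not the only way to do it.

```python
def report_low_e_lines(lines, threshold=0.7):
    """Return (line_no, no_e_count, word_count) for each line whose share
    of words without an 'e' is above the threshold.

    Hypothetical helper for illustration; not part of the assignment.
    """
    results = []
    for line_no, line in enumerate(lines):
        words = line.split()
        if not words:  # blank line: nothing to count, avoid dividing by zero
            continue
        # True counts as 1, so sum() gives the number of words without an 'e'.
        no_e = sum("e" not in w.lower() for w in words)
        if no_e / len(words) > threshold:  # checked once per line
            results.append((line_no, no_e, len(words)))
    return results


# Made-up sample input; the real file may contain different sentences.
sample = [
    "There was no possibility of taking a walk that day.",
    "We never ever see these three trees.",
    "",
]
for line_no, no_e, total in report_low_e_lines(sample):
    print("{}: {} out of {} words contain no 'e'.".format(line_no, no_e, total))
```

Since an open file is also an iterable of lines, you should be able to pass the file object from the with block straight to the function.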