Find the words from the file whose first letter is capitalized

Time:11-07

Write a script that finds all of the capitalized words (not words in all caps, just the initial letter) in a text file and presents them in alphabetical order.

I used this logic:

re.findall(r'\b[A-Z][a-z]*\b', line)

and my function returns this output:

Enter the file name: bzip2.txt
['A', 'All', 'Altered', 'C', 'If', 'Julian', 'July', 'R', 'Redistribution', 'Redistributions', 'Seward', 'The', 'This']

How can I remove the single-letter words (A, C and R)?

CodePudding user response:

You can do this within the regex itself, no need to filter the list. Just use + instead of *:

re.findall(r'\b[A-Z][a-z]+\b', line)

In regex, * means to match zero or more times, while + means to match one or more times. Hence, your original pattern allowed the lowercase letters to match zero times, so a lone capital letter counted as a word. With the +, at least one lowercase letter must follow the capital. You can learn more about this from this question and its answers.
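To make the difference between the two quantifiers concrete, here is a small sketch that runs both patterns on the same string. The sample sentence and the variable names are made up for illustration and are not taken from bzip2.txt.

```python
import re

# Hypothetical sample text, not the contents of bzip2.txt.
sample = "A copy of This file, written by Julian Seward in C, is BSD licensed."

# '*' lets the lowercase part match zero times, so lone capitals get through.
star = re.findall(r'\b[A-Z][a-z]*\b', sample)

# '+' requires at least one lowercase letter after the capital.
plus = re.findall(r'\b[A-Z][a-z]+\b', sample)

print(star)               # ['A', 'This', 'Julian', 'Seward', 'C']
print(plus)               # ['This', 'Julian', 'Seward']
print(sorted(set(plus)))  # ['Julian', 'Seward', 'This']
```

Both patterns skip the all-caps word BSD, because the trailing \b cannot match between two letters. Wrapping the result in sorted(set(...)) removes duplicates and gives the alphabetical order the question asks for.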

Also, credit where credit is due: blhsing also pointed this out in the comments of the original question while I was writing this answer.

CodePudding user response:

Instead of using a regex, split the text into words and directly check that each word

  • has at least 2 characters
  • starts with a capital letter

Then call sorted() to get a sorted list (my_collection below is assumed to hold the words)

>>> alphabet = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
>>> sorted(filter(lambda word: len(word) >= 2 and word[0] in alphabet, my_collection))
['All', 'Altered', 'If', 'Julian', 'July', 'Redistribution', 'Redistributions', 'Seward', 'The', 'This']
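As a rough end-to-end sketch of this split-and-check approach: the function name capitalized_words and the sample text are hypothetical. Two things here go beyond the answer and are assumptions of this sketch: punctuation is stripped from each word, and all-caps words are rejected with isupper(), since checking only the first letter would otherwise accept a word like BSD.

```python
import string

def capitalized_words(text):
    """Return the sorted, unique words that start with an ASCII capital.

    Hypothetical helper: words must have at least 2 characters and must
    not be written entirely in capitals.
    """
    uppercase = set(string.ascii_uppercase)
    # Split on whitespace, then trim punctuation such as ',' or '.'.
    words = (w.strip(string.punctuation) for w in text.split())
    return sorted({
        w for w in words
        if len(w) >= 2 and w[0] in uppercase and not w.isupper()
    })

# Hypothetical sample text, not the contents of bzip2.txt.
sample = "The file, by Julian Seward. A BSD license; The end."
result = capitalized_words(sample)
print(result)  # ['Julian', 'Seward', 'The']
```

To run it on a file, something like capitalized_words(open(filename).read()) should work, with filename taken from the input() prompt shown in the question.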