Could you please help me check whether any strings that are next to each other in a list have the same first letter? I'm pretty new to Python, and my approach was to first tokenise the text and lowercase the resulting list. Then I create a nested list:
import nltk
myStrings = "Bob build a house"
myStrings_words = nltk.word_tokenize(myStrings)
myStings_words_lower = [word.lower() for word in myStrings_words]
nested_list = [list(x) for x in myStings_words_lower]
Now I'm not sure how to compare each word's first letter with that of its neighbour in the list. Maybe a for loop, accessing the first letters with myStings_words_lower[x][0]?
The output should be the neighbouring words that start with the same letter, so in this case bob and build.
Thank you in advance, Paul
CodePudding user response:
You can use itertools.groupby to help you with this. Let's assume you have your list of lowercase words:
import nltk
myStrings = "Bob build a house"
myStrings_words = nltk.word_tokenize(myStrings)
myStings_words_lower = [word.lower() for word in myStrings_words]
To group them into any neighbours that share a first letter, you can do:
import itertools
# define a grouping helper
first_letter = lambda x: x[0]
# get the groups
grouped_words = itertools.groupby(myStings_words_lower, key=first_letter)
print(f"The number of words is {len(myStings_words_lower)} and the number of groups is {len(list(grouped_words))}")
If the number of groups is equal to the number of words, then no consecutive words share a starting letter. If there are fewer groups than words, then at least one group contains neighbouring entries that share a starting letter.
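Since the question asks for the words themselves rather than a count, the same groupby idea can be extended to collect each run of neighbours. This is only a minimal sketch: the helper name neighbouring_groups is made up for illustration, and it uses str.split() instead of nltk.word_tokenize so that it runs without extra dependencies.

```python
import itertools


def neighbouring_groups(words):
    """Return runs of two or more adjacent words that share a first letter."""
    runs = []
    # groupby only merges *consecutive* items with the same key,
    # which is exactly the "next to each other" requirement.
    for _, group in itertools.groupby(words, key=lambda w: w[0]):
        run = list(group)
        if len(run) > 1:
            runs.append(run)
    return runs


words = "Bob build a house".lower().split()
print(neighbouring_groups(words))  # [['bob', 'build']]
```

Note that a run can hold more than two words, so three neighbours starting with the same letter come back as one group of three.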
CodePudding user response:
Another approach:
In [6]: myString = "Bob build an aeroplane, boat and a haunted house"
In [7]: my_words = [word.lower() for word in myString.split()]
In [8]: my_words
Out[8]: ['bob', 'build', 'an', 'aeroplane,', 'boat', 'and', 'a', 'haunted', 'house']
# Iterate over the words and, at each step, check whether the current word
# and the next word have the same first letter. (We use len(my_words) - 1
# because the loop accesses i + 1, so it must stop at the penultimate word.)
In [9]: for i in range(len(my_words) - 1):
   ...:     if my_words[i][0] == my_words[i + 1][0]:
   ...:         print(my_words[i], my_words[i + 1])
   ...:
bob build
an aeroplane,
and a
haunted house
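The index-based loop can also be written without explicit indices by pairing the list with itself shifted by one. This is a sketch of an alternative, not part of the original answer; the function name adjacent_pairs is hypothetical.

```python
def adjacent_pairs(words):
    """Return (word, next_word) tuples whose first letters match."""
    # zip(words, words[1:]) yields each word together with its right-hand
    # neighbour and stops at the penultimate word automatically.
    return [(a, b) for a, b in zip(words, words[1:]) if a[0] == b[0]]


my_words = "Bob build an aeroplane, boat and a haunted house".lower().split()
for first, second in adjacent_pairs(my_words):
    print(first, second)
```

For the sample sentence this prints the same four pairs as the loop above.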
Cheers!