I want to collect each word from a list that appears in a string in Python. I found some solutions, but this is what I have so far:
data = "Today I gave my dog some carrots to eat in the car"
tweet = data.lower() #convert to lower case
split = tweet.split()
matchers = ['dog','car','sushi']
matching = [s for s in split if any(xs in s for xs in matchers)]
print(matching)
The result is
['dog', 'carrots', 'car']
How do I get only dog and car in the result, without adding spaces to my matchers?
Also, how would I remove any $ signs (for example) from the data string, but leave other special characters like @?
CodePudding user response:
How do I get only dog and car in the result, without adding spaces to my matchers?
To do this with your current code, replace this line:
matching = [s for s in split if any(xs in s for xs in matchers)]
With this:
matching = []
# iterate over all matcher words
for word in matchers:
    if word in split:  # check if word is one of the split-up words (exact match)
        matching.append(word)  # add word to list
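One thing to keep in mind: the loop above only finds a word when it is a whole token after split(), so "car." with a trailing period would not match "car". If that matters for your data, a regular expression with word boundaries is one possible alternative. This is only a sketch; the trailing period in the sample string is added here for illustration and is not in the original data.

```python
import re

# Sample sentence; the trailing period is an assumption added for illustration
data = "Today I gave my dog some carrots to eat in the car."
matchers = ['dog', 'car', 'sushi']

# \b is a word boundary, so 'car' cannot match inside 'carrots'.
# re.escape guards against matchers that contain regex metacharacters.
pattern = re.compile(r"\b(" + "|".join(re.escape(m) for m in matchers) + r")\b")

# findall returns the matches in the order they appear in the text
matching = pattern.findall(data.lower())
print(matching)  # ['dog', 'car']
```

Unlike the loop, this returns matches in the order they occur in the text, and a word that appears twice is listed twice.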
You also mention this:
Also, how would I remove any $ signs (for example) from the data string, but leave other special characters like @?
To do this, I would create a list that contains characters you want to remove, like so:
things_to_remove = ['$', '*', '#'] # this can be anything you want to take out
Then, simply remove each of those characters from the tweet string before you split it.
for remove_me in things_to_remove:
    tweet = tweet.replace(remove_me, "")
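If you prefer to do the removal in a single pass instead of one replace() call per character, str.translate with a table built by str.maketrans should give the same result. A minimal sketch, assuming every entry in things_to_remove is a single character (the variable names table and cleaned are just for illustration):

```python
tweet = "today i@@ gave my dog## some carrots to eat in the$ car"
things_to_remove = ['$', '*', '#']

# The third argument of str.maketrans lists the characters to delete
table = str.maketrans('', '', ''.join(things_to_remove))
cleaned = tweet.translate(table)

print(cleaned)  # today i@@ gave my dog some carrots to eat in the car
```

The @ characters are left alone because they are not in things_to_remove.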
Here is a final code block that puts all of this together:
data = "Today I@@ gave my dog## some carrots to eat in the$ car"
tweet = data.lower()  # convert to lower case
things_to_remove = ['$', '*', '#']
for remove_me in things_to_remove:
    tweet = tweet.replace(remove_me, "")
print("After removing characters I don't want:")
print(tweet)
split = tweet.split()
matchers = ['dog','car','sushi']
matching = []
# iterate over all matcher words
for word in matchers:
    if word in split:  # check if word is one of the split-up words (exact match)
        matching.append(word)  # add word to list
print(matching)