I'm working on a personal project where I have Reddit comments from a thread in a subreddit. I now have those comments in a pandas data frame, `comm`, with the comment text in its `body` column. In a separate data frame, `ticker_symbols`, I have a column of stock ticker symbols (`ACT Symbol`).
With this in mind, is there a way to use the tickers in `ticker_symbols` as a dictionary and then output the three most-mentioned tickers in `comm.body`?
Answer:
You could join all the comments into a single string and then, for each symbol, use `re.findall()` to count how many times it appears:
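```python
import re
comments = ', '.join(comm['body'].values)
# re.escape guards against regex metacharacters; \b...\b counts whole symbols only
counts = ticker_symbols.assign(num=ticker_symbols.apply(
    lambda x: len(re.findall(r'\b' + re.escape(x['ACT Symbol']) + r'\b', comments)), axis=1))
```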
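Matching the bare symbol counts substrings, so a one-letter ticker such as `A` is also counted inside `AMC` and `AAPL`; wrapping the escaped symbol in `\b` word boundaries counts only whole words. Here is a small self-contained comparison. The comments, the symbols, and the helper `count_mentions` are hypothetical, made up purely for illustration, and matching is case-sensitive:

```python
import re

import pandas as pd

# Hypothetical toy data standing in for the question's data frames
comm = pd.DataFrame({'body': [
    'Loading up on GME and AMC today',
    'GME to the moon',
    'A reminder: AAPL earnings are next week',
]})
ticker_symbols = pd.DataFrame({'ACT Symbol': ['GME', 'AMC', 'AAPL', 'A']})

comments = ', '.join(comm['body'].values)


def count_mentions(symbol: str, text: str, whole_word: bool) -> int:
    """Hypothetical helper: count symbol in text, optionally as a whole word only."""
    pattern = re.escape(symbol)
    if whole_word:
        pattern = r'\b' + pattern + r'\b'
    return len(re.findall(pattern, text))


# One column per matching strategy, side by side
counts = ticker_symbols.assign(
    substring=ticker_symbols['ACT Symbol'].apply(lambda s: count_mentions(s, comments, False)),
    whole_word=ticker_symbols['ACT Symbol'].apply(lambda s: count_mentions(s, comments, True)),
)
print(counts)
```

In this toy data, `A` gets a substring count of 4 but a whole-word count of 1, while the longer symbols agree under both strategies.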
If you only want a list of the top three symbols, sort by that count and keep the first three rows:
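```python
import re

comments = ', '.join(comm['body'].values)
result = list(
    ticker_symbols
    .assign(num=ticker_symbols.apply(
        # re.escape guards against regex metacharacters; \b...\b counts whole symbols only
        lambda x: len(re.findall(r'\b' + re.escape(x['ACT Symbol']) + r'\b', comments)),
        axis=1,
    ))
    .sort_values(by='num', ascending=False)[:3]['ACT Symbol'].values
)
```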
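Since the question asks about treating the tickers as a dictionary, a different technique is also worth sketching: compile all symbols into one alternation regex, scan each comment once, and tally matches with `collections.Counter` (a `dict` subclass mapping symbol to count), whose `most_common(3)` gives the top three. This is a minimal sketch assuming case-sensitive, whole-word matching; the toy comments and symbols are hypothetical:

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical toy data; substitute the real comm and ticker_symbols frames
comm = pd.DataFrame({'body': [
    'GME and AMC are both up',
    'Holding GME, selling TSLA',
    'GME again, AMC once more',
]})
ticker_symbols = pd.DataFrame({'ACT Symbol': ['GME', 'AMC', 'TSLA', 'AAPL']})

# One alternation of all escaped symbols, anchored on word boundaries
symbols = ticker_symbols['ACT Symbol'].dropna().unique()
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, symbols)) + r')\b')

# Single pass over every comment; Counter acts as the symbol -> count dictionary
counts = Counter(
    match
    for body in comm['body'].dropna()
    for match in pattern.findall(body)
)
top_three = [symbol for symbol, _ in counts.most_common(3)]
print(top_three)
```

Unlike the `assign`/`sort_values` version, symbols that are never mentioned do not appear in `counts` at all, and `most_common` orders tied counts by first occurrence.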