Two Pandas DataFrames Word Count

Time:06-25

I'm working on a personal project where I have Reddit comments from a thread in a subreddit, loaded into a pandas DataFrame (comm). In a separate DataFrame (ticker_symbols), I have a column containing stock ticker symbols. What I have is the following: [image: the first few entries of each of my dataframes]

With this in mind, is there a way to use the tickers in ticker_symbols as a dictionary and then output the three most mentioned tickers in comm.body?

CodePudding user response:

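You could join all the comments into one string and then, for each symbol, use re.findall() to count how many times it appears. Wrapping the symbol in re.escape() and \b word boundaries keeps a short ticker such as A from also matching inside longer words, and because assign() returns a new DataFrame, the result needs to be stored:

import re

# Join every comment into one searchable string (astype(str) guards against missing values).
comments = ', '.join(comm['body'].astype(str))
# Count whole-word matches per symbol; re.escape protects symbols containing regex characters.
counts = ticker_symbols.assign(num=ticker_symbols['ACT Symbol'].apply(
    lambda s: len(re.findall(rf'\b{re.escape(s)}\b', comments))))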
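To see how the per-symbol count behaves, here is a minimal self-contained sketch on made-up data. The sample comments, the sample tickers, and the helper name count_mentions are invented for illustration and are not from the question. The helper wraps each symbol in re.escape and \b word boundaries so that a short symbol such as A is not counted inside AAPL.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the question's two frames.
comm = pd.DataFrame({"body": [
    "AAPL to the moon, also holding TSLA",
    "TSLA and AAPL again. A great day!",
    "Sold my GME, bought AAPL",
]})
ticker_symbols = pd.DataFrame({"ACT Symbol": ["A", "AAPL", "GME", "TSLA"]})

# One long string containing every comment.
comments = ", ".join(comm["body"].astype(str))


def count_mentions(symbol: str, text: str) -> int:
    """Count whole-word, case-sensitive occurrences of symbol in text."""
    pattern = rf"\b{re.escape(symbol)}\b"
    return len(re.findall(pattern, text))


# Add a 'num' column holding the mention count for each symbol.
counts = ticker_symbols.assign(
    num=ticker_symbols["ACT Symbol"].apply(lambda s: count_mentions(s, comments))
)
print(counts)
```

Note that the article "A" in the second comment is still counted as a mention of the ticker A. Word boundaries cannot tell a ticker from an ordinary word, so one-letter or dictionary-word symbols may need extra filtering.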

If you really just want a list of the top three you can do the following:

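import re

comments = ', '.join(comm['body'].astype(str))
result = list(
    ticker_symbols
    .assign(num=ticker_symbols['ACT Symbol'].apply(
        # Whole-word, escaped match so 'A' is not counted inside 'AAPL'.
        lambda s: len(re.findall(rf'\b{re.escape(s)}\b', comments))
    ))
    # Sort by count, keep the three largest, then return just the symbols.
    .sort_values(by='num', ascending=False).head(3)['ACT Symbol']
)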
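If the ticker list is long, scanning the full comment string once per symbol can get slow. A different technique, not the one used in the answer above, is to tokenise the comments once and tally only the tokens that are known tickers with collections.Counter. The sketch below uses made-up data and assumes tickers consist solely of uppercase letters; symbols such as BRK.A would need a different tokeniser.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical sample data standing in for the question's two frames.
comm = pd.DataFrame({"body": [
    "AAPL to the moon, also holding TSLA",
    "TSLA and AAPL again",
    "Sold my GME, bought AAPL",
]})
ticker_symbols = pd.DataFrame({"ACT Symbol": ["AAPL", "GME", "MSFT", "TSLA"]})

# Set lookup makes the "is this token a ticker?" check cheap.
symbols = set(ticker_symbols["ACT Symbol"])

# Assumption: a candidate ticker is a whole word made only of uppercase letters.
tokens = re.findall(r"\b[A-Z]+\b", " ".join(comm["body"].astype(str)))

# Tally only the tokens that appear in the ticker list.
mention_counts = Counter(tok for tok in tokens if tok in symbols)

# The three most mentioned tickers, most frequent first.
top_three = [sym for sym, _ in mention_counts.most_common(3)]
print(top_three)
```

This makes one pass over the text regardless of how many tickers there are, at the cost of fixing in advance what a ticker token looks like.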