I have two datasets. One contains a column of company names, and the other contains a column of news headlines. I want to find all the news items whose headline contains a company name from the other dataset. Basically the two datasets are like this, and I want to select the news that mentions specific company names.
I have tried using a for loop, but it takes too much time, and I think pandas or some other library can do this more easily.
I am a beginner in Python.
CodePudding user response:
If I understand correctly, you have two datasets with different columns. First, loop through the dataset that contains the company names. Then, for each name, check every headline with the string method find(), e.g. headline.find(name), which returns -1 when the name is not present. Also, if the data is stored in CSV format, you could use split() on each line to get only the column you want to use.
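A minimal sketch of that loop in plain Python might look like the following. The lists companies and headlines and their contents are made-up sample data, not taken from the question.

```python
# Hypothetical sample data standing in for the two datasets.
companies = ["Apple", "Tesla"]
headlines = [
    "Apple unveils new phone",
    "Markets rally on rate cut",
    "Tesla recalls cars",
]

matches = []
for headline in headlines:
    for name in companies:
        # str.find returns the index of the first occurrence, or -1 if absent.
        if headline.find(name) != -1:
            matches.append((headline, name))

print(matches)
```

This checks every company against every headline, so it gets slow as both lists grow, which is probably the slowdown described in the question.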
CodePudding user response:
Supposing that you have saved your company names in a pd.Series called company, and the headlines and texts in a pd.DataFrame called df, this will be what you are looking for:
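# it will add a column called "company" to your initial df
# note: every company is checked against every headline
# (zip would only compare the i-th company with the i-th headline)
for org in company:
    for headline in df['headline']:
        if org in headline:
            df.loc[df['headline'] == headline, 'company'] = org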
You should pay attention to upper and lower case letters, as this is a plain substring check: it will only find a company if its name appears in the headline with exactly the same spelling and capitalization.
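If the loop is too slow or the case sensitivity is a problem, one possible alternative is to let pandas do a single case-insensitive regex search over the whole column. This is only a sketch: the sample data is invented, and the names pattern, lookup and matched are hypothetical helpers introduced here for illustration.

```python
import re

import pandas as pd

# Hypothetical sample data; replace with your own Series / DataFrame.
company = pd.Series(["Apple", "Tesla", "AT&T"])
df = pd.DataFrame({
    "headline": [
        "apple unveils new phone",
        "Markets rally on rate cut",
        "TESLA recalls cars",
        "At&t expands network",
    ]
})

# Build one alternation pattern; re.escape keeps characters such as "&" or "." literal.
pattern = "(" + "|".join(re.escape(name) for name in company) + ")"

# Pull out the first company name found in each headline, ignoring case.
found = df["headline"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Map the matched text back to the spelling used in the company list.
lookup = {name.lower(): name for name in company}
df["company"] = found.str.lower().map(lookup)

# Keep only the news whose headline mentions one of the companies.
matched = df[df["company"].notna()]
print(matched)
```

Like the loop, this is still substring matching, so a short name can match inside a longer word, and only the first company found in each headline is recorded.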