I am a beginner in Python, so I sometimes get stuck on easy things.
Consider a DataFrame df1 with a column of security names and a column of tickers:
Security | Tickers |
---|---|
GOOG | |
TWTR | |
Logitech | LOGI |
and then consider a column of news headlines in df2:
headlines |
---|
Twitter bought by rich entrepreneur |
Netflix lost 5m subscribers |
Amazon stocks raised 3 percent |
I want to create a new column in df2 holding the ticker associated with each headline, whenever a security from df1 appears in df2["headlines"]. Otherwise, I want to delete that row from df1.
I tried several versions of code.
The simplest one was:
for i in range(len(df2["headlines"])):
    if df1["Security"][i] in df2["Headlines"][i]:
        df2["Tickers"] = df1["Tickers"][i]
    else:
        data.drop(labels=[i],axis=0)
The problem here is that df1 has 500 rows, while df2 has 30k rows. The loop over df1 should restart for every headline, since I want to check whether any security is present in any of the headlines of df2.
From there I tried other things, including df.isin, but nothing worked. What do you suggest? Thanks!
CodePudding user response:
Try this:
# Create the new column in df2
for i in range(len(df1)):
    for j in range(len(df2)):
        if df2['Headlines'].str.contains(df1["Security"][i])[j]:
            df2.loc[j, "Tickers"] = df1["Tickers"][i]
# Restrict df1 to the companies included in df2
df1 = df1[df1['Tickers'].isin(df2['Tickers'])]
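With 500 securities and 30k headlines the nested loop can be slow, because it re-runs str.contains over the whole column for every pair of rows. A possible vectorized alternative is sketched below. It is only a sketch under assumptions: the sample frames are made up for illustration (the Netflix/NFLX row is not from the question), "Security" is assumed to hold the company name as it appears in the headlines, each name is assumed to appear only once in df1, and the column is assumed to be called "headlines" as in the table (adjust if yours is "Headlines"). Matching is case-sensitive and only the first name found in each headline is kept.

```python
import re
import pandas as pd

# Hypothetical sample data; real frames would come from your own files.
df1 = pd.DataFrame({
    "Security": ["Twitter", "Netflix", "Logitech"],
    "Tickers": ["TWTR", "NFLX", "LOGI"],
})
df2 = pd.DataFrame({
    "headlines": [
        "Twitter bought by rich entrepreneur",
        "Netflix lost 5m subscribers",
        "Amazon stocks raised 3 percent",
    ]
})

# One alternation pattern covering every security name, escaped so that
# the names are matched literally rather than as regular expressions.
pattern = "(" + "|".join(re.escape(name) for name in df1["Security"]) + ")"

# First security name found in each headline (NaN when none is found).
matched = df2["headlines"].str.extract(pattern, expand=False)

# Translate each matched name into its ticker (requires unique names in df1).
df2["Tickers"] = matched.map(df1.set_index("Security")["Tickers"])

# Keep only the securities that appear in at least one headline.
df1 = df1[df1["Tickers"].isin(df2["Tickers"])]
```

Headlines that mention no security end up with NaN in "Tickers"; if you want to drop them as well, df2.dropna(subset=["Tickers"]) should do it.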