I have a dataset of text messages. I want to copy the rows that contain specific keywords into another CSV file.
Please find the sample dataset here: https://docs.google.com/spreadsheets/d/1B7LgkNn2pLchbmjRggAWkq6O7GrWi79aiJJOUpmDGIc/edit?usp=sharing
I wrote this, but it's not working well. I need some help to point me in the right direction:
keywords = ["SBI", "HDFC", "Canara", "HSBC", "KTK"]
listMatchPosition = []
listMatchDescription = []
df = pd.read_csv("SMS.csv", sep=",")
for i in range(len(df.index)):
    if any(df['text'][i] for x in keywords):
        listMatchDescription.append(df['text'][i])
output = pd.DataFrame({'senderAddress':listMatchDescription})
output.to_csv("new_data.csv", index=False)
CodePudding user response:
SO won't let me post sample code with bit.ly links in it, so here's a solution without the sample data setup.
I had to add Zomato to the keywords list so that there is an actual match, as none of the other keywords you have are present in your sample text.
keywords = ["Zomato", "HDFC", "Canara", "HSBC", "KTK"]
# Keep only the rows whose text contains at least one keyword
matches = df.loc[df['text'].apply(lambda x: any(k in x for k in keywords)), ['senderAddress', 'text']]
print(matches)
Output:
  senderAddress                                               text
0     JK-SmplPL  Rs.95.15 on Zomato charged via Simpl.\r\n--\r\...
3     BP-ACKOGI  Mohd,\nCheck the incredible Acko insurance pol...
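If you'd rather avoid apply, a vectorized variant with str.contains should also work, and it covers the "write to another CSV" part of the question. This is only a sketch: the three sample rows below are made up to stand in for SMS.csv, and case=False (case-insensitive matching) is my assumption about what you want, not something stated in the question.

```python
import io
import re

import pandas as pd

# Hypothetical stand-in for SMS.csv; with the real file use pd.read_csv("SMS.csv")
sample_csv = io.StringIO(
    "senderAddress,text\n"
    "JK-SmplPL,Rs.95.15 on Zomato charged via Simpl.\n"
    "AX-OFFERS,Flat 20% off on shoes this weekend\n"
    "VM-HDFCBK,Your HDFC account was credited with Rs.500\n"
)
df = pd.read_csv(sample_csv)

keywords = ["Zomato", "HDFC", "Canara", "HSBC", "KTK"]

# One regex alternation; re.escape protects keywords containing special characters
pattern = "|".join(re.escape(k) for k in keywords)

# na=False treats rows with missing text as non-matching
mask = df["text"].str.contains(pattern, case=False, na=False)
matches = df.loc[mask, ["senderAddress", "text"]]

# Write the matching rows to a separate file, as in the question
matches.to_csv("new_data.csv", index=False)
```

With the made-up rows above, the first and third messages are kept and the second is dropped.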
CodePudding user response:
Edit: use apply to execute a custom function:
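def check_string(text):
    # True if any keyword occurs in the message text
    for k in keywords:
        if k in text:
            return True
    return False

# Apply to the 'text' column and keep only the matching rows
output = df[df['text'].apply(check_string)]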
I hope this works
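As for why the loop in the question keeps every row: any(df['text'][i] for x in keywords) never looks at x. It yields the message text itself once per keyword, and any non-empty string is truthy, so the condition passes regardless of the keywords. A minimal sketch of the difference, using a made-up message:

```python
keywords = ["SBI", "HDFC", "Canara", "HSBC", "KTK"]
text = "Flat 20% off on shoes this weekend"  # hypothetical message containing no keyword

# Condition from the question: yields the non-empty (truthy) text once per keyword
original_check = any(text for x in keywords)

# Intended condition: tests whether each keyword is a substring of the text
fixed_check = any(x in text for x in keywords)

print(original_check, fixed_check)  # True False
```

Swapping the condition in the loop for the second form should be enough to make the original approach filter correctly.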