Python: show rows if there's a certain keyword from the list and show which keyword was detected

Time:06-05

I'm trying to build a data frame of spam messages so I can analyze them. This is what the original CSV file looks like:

[Image: original data frame]

I want it to look like this: [Image: filtered data frame]

This is what I tried:

```python
# Import the original CSV (a simplified sample with only two columns: sender, text)
import pandas as pd
df = pd.read_csv("spam.csv")

# If any of these appears in the text column, the row goes into the new data frame
keyword = ["prize", "bit.ly", "shorturl"]

# Put the rows that contain a keyword into a new data frame
spam_list = df[df['text'].str.contains('|'.join(keyword))]

# Create a new column 'detected word' and try to show which keyword was detected
spam_list['detected word'] = keyword
spam_list
```
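One detail in the filter step is worth checking: Series.str.contains treats its pattern as a regular expression by default, so the "." in "bit.ly" matches any character. The sketch below, which uses a few invented sample rows in place of spam.csv, escapes each keyword with re.escape before joining them so that every keyword is matched literally, and takes a .copy() of the filtered rows so that adding a column later does not trigger a SettingWithCopyWarning.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for spam.csv (columns: sender, text)
df = pd.DataFrame({
    "sender": ["a", "b", "c", "d"],
    "text": [
        "You won a prize!",
        "Click bit.ly/xyz now",
        "See you at lunch",
        "visit bitxly.com",  # only matches 'bit.ly' when '.' acts as a wildcard
    ],
})

keyword = ["prize", "bit.ly", "shorturl"]

# Unescaped pattern: '.' matches any character, so "bitxly" is a false positive
loose = df[df["text"].str.contains("|".join(keyword))]

# Escaped pattern: each keyword is matched literally
pattern = "|".join(re.escape(k) for k in keyword)
# .copy() gives an independent frame, so later column assignments are safe
spam_list = df[df["text"].str.contains(pattern)].copy()

print(len(loose), len(spam_list))  # 3 2
```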

However, the "detected word" column just follows the order of the list. I know that's because I assigned the list itself to the new column, but I couldn't think of or find a better way to do this. Should I have used a for loop? Or am I approaching this in a completely wrong way?

Answer:

You can define a function that gets the result for each row:

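```python
def detect_keyword(row):
    # Return the first keyword (in list order) that appears in this row's text
    for key in keyword:
        if key in row['text']:
            return key
    return None  # no keyword found in this row
```

Then apply it to every row with DataFrame.apply() (axis=1) and save the results as a new column:

```python
df['detected_word'] = df.apply(detect_keyword, axis=1)
```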
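Putting this answer's two pieces together gives the self-contained sketch below; the sample rows are invented for illustration. Because detect_keyword checks the keywords in list order, a message containing several keywords gets the first one from the list, and a message containing none gets None. Passing the function directly to apply is equivalent to wrapping it in a lambda.

```python
import pandas as pd

keyword = ["prize", "bit.ly", "shorturl"]

# Hypothetical sample data (columns: sender, text)
df = pd.DataFrame({
    "sender": ["a", "b", "c"],
    "text": ["Claim your prize at shorturl.at/x", "Hi, how are you?", "bit.ly/abc"],
})

def detect_keyword(row):
    # Return the first keyword (in list order) found in this row's text, else None
    for key in keyword:
        if key in row["text"]:
            return key
    return None

# axis=1 passes each row (as a Series) to detect_keyword
df["detected_word"] = df.apply(detect_keyword, axis=1)
print(df)
```

If only the spam rows are wanted, the same apply call can be run on the filtered spam_list instead of df, so no row ends up with an empty detected_word.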

Answer:

You can use the code shown in the picture below to solve this. I couldn't paste it as text because Stack Overflow wouldn't let me post the short links it contains. A link to the code is also available.

[Image: screenshot of the code]

The code has been adapted from here
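The screenshot's code is not reproduced on this page, so the following is an independent, loop-free alternative and not necessarily what that code does. Series.str.extract with a single capture group returns the first keyword match in each row (NaN where nothing matches), and Series.str.findall returns every match. The keyword list is the one from the question; the sample rows are invented.

```python
import re
import pandas as pd

keyword = ["prize", "bit.ly", "shorturl"]

# Hypothetical sample data (columns: sender, text)
df = pd.DataFrame({
    "sender": ["a", "b", "c"],
    "text": ["Claim your prize at shorturl.at/x", "Hi, how are you?", "bit.ly/abc"],
})

# One capture group around the escaped keywords
pattern = "(" + "|".join(re.escape(k) for k in keyword) + ")"

# expand=False returns a Series: the leftmost match in the text, or NaN
df["detected_word"] = df["text"].str.extract(pattern, expand=False)

# Keep only rows where some keyword was found
spam_list = df[df["detected_word"].notna()].copy()

# Every keyword occurrence in each message, as a list
spam_list["all_detected"] = spam_list["text"].str.findall(pattern)
print(spam_list)
```

Note that str.extract reports the keyword that occurs first in the message text, whereas the loop in the first answer reports the first keyword in the list; for "shorturl and prize" the two approaches would give "shorturl" and "prize" respectively.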
