Adding new column in Pandas dataframe with keyword from a list that is contained in another column

Time:06-22

I want to add a column to a dataframe that holds one keyword from a list, chosen by which keyword appears in another (description) column. If the description contains several keywords, I need to pick the first match. I then need to sum the values for each unique keyword. Here is my attempt, but I can't work out how to handle multiple matches. Can someone please help?

    import pandas as pd

    kw = ['alpha', 'beta', 'theta', 'delta']
    data = {'description':['This text contains alpha', 'Here are delta & beta', 'It is beta', 'Another Alpha', 'sometimes Theta too', 'One more BETA'],
        'value': [100,200,300,400,500,600]}
    df = pd.DataFrame(data)

    #add column based on which keyword appears in description
    df['keys'] = df['description'].str.lower().str.findall('|'.join(kw)).apply(set).str.join(',') #Is there a simpler way to code this?
    print(f"new df = \n{df}\n")

    #add values of unique keywords
    df2 = df.groupby('keys').sum()
    print(f"with key values = \n{df2}\n")

Output:

new df = 
                description  value        keys
0  This text contains alpha    100       alpha
1     Here are delta & beta    200  delta,beta
2                It is beta    300        beta
3             Another Alpha    400       alpha
4       sometimes Theta too    500       theta
5             One more BETA    600        beta

with key values = 
            value
keys             
alpha         500
beta          900
delta,beta    200
theta         500

Answer:

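You can take the first element of each match list instead:

    df['keys'] = df['description'].str.lower().str.findall('|'.join(kw)).str[0]

str.findall returns the matches in the order they appear in the text, and .str[0] picks the first element of each list, so the Greek letter that appears first in the description is kept. Converting the list to a set discards that order (a set has no guaranteed order) and still keeps every distinct match, which is why row 1 ended up as 'delta,beta'. Rows with no match get NaN.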
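As a possible alternative to findall plus .str[0], here is a minimal sketch that uses Series.str.extract with a single capture group, which returns only the first match in each description. The names pattern and totals are introduced here for illustration, and re.escape is an added precaution in case a keyword ever contains regex metacharacters. It assumes "first match" means first in the text, not first in the kw list.

```python
import re
import pandas as pd

kw = ['alpha', 'beta', 'theta', 'delta']
data = {
    'description': ['This text contains alpha', 'Here are delta & beta', 'It is beta',
                    'Another Alpha', 'sometimes Theta too', 'One more BETA'],
    'value': [100, 200, 300, 400, 500, 600],
}
df = pd.DataFrame(data)

# One capture group containing every keyword; escaping is a precaution (assumption).
pattern = '(' + '|'.join(re.escape(k) for k in kw) + ')'

# extract() keeps only the first match per row; lower() normalises 'Alpha', 'BETA', etc.
df['keys'] = (
    df['description']
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)
    .str.lower()
)

# Sum the values per keyword; rows with no keyword (NaN) are dropped by groupby.
totals = df.groupby('keys')['value'].sum()

print(df)
print(totals)
```

With the sample data, row 1 is assigned 'delta' because it appears before 'beta' in the text. Note that the pattern matches substrings too, so 'alpha' would also match inside a word such as 'alphabet'; wrapping the group in \b word boundaries is one way to avoid that if it matters for your data.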
