How to create a pandas column with the words from another column that are contained in a list

I want to extract words, specified in a list, from the strings of a pandas column and build another column from them. I have this example, inspired by the question "python pandas if column string contains word flag":

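import numpy as np
import pandas as pd

listing = ['test', 'big']
df = pd.DataFrame({'Title': ['small test', 'huge Test', 'big', 'nothing', np.nan, 'a', 'b']})
# Flag rows whose Title contains any word from listing (case-insensitive; NaN counts as no match)
df['Test_Flag'] = np.where(df['Title'].str.contains('|'.join(listing), case=False, na=False), 'T', '')
print(df)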

        Title Test_Flag
0  small test         T
1   huge Test         T
2         big         T
3     nothing
4         NaN
5           a
6           b

But what if, instead of "T", I want to put the actual word from the list that was found? The result would be:

        Title Test_Flag
0  small test      test
1   huge Test      test
2         big       big
3     nothing
4         NaN
5           a
6           b

CodePudding user response:

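Using the .apply method with a custom function should give you what you are looking for. Note that the function below does a case-insensitive substring match and returns a list of every matching word, so a title like 'smalltest' is also flagged: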

import pandas as pd
import numpy as np

# Define the listing list with the words you want to extract
listing  = ['test', 'big']
# Define the DataFrame ('smalltest' is included to show that substrings match too)
df = pd.DataFrame({'Title':['small test','huge Test', 'big','nothing', np.nan, 'a', 'b', 'smalltest']})

# Define the function which takes a string and a list of words to extract as inputs
def listing_splitter(text, listing):
    # Try except to handle np.nans in input
    try:
        # Extract the list of flags
        flags = [l for l in listing if l in text.lower()]
        # If any flags were extracted then return the list
        if flags:
            return flags
        # Otherwise return np.nan
        else:
            return np.nan
    except AttributeError:
        return np.nan

# Apply the function to the column
df['Test_Flag'] = df['Title'].apply(lambda x: listing_splitter(x, listing))
df

Output:

        Title Test_Flag
0  small test    [test]
1   huge Test    [test]
2         big     [big]
3     nothing       NaN
4         NaN       NaN
5           a       NaN
6           b       NaN
7   smalltest    [test]
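
If a single word per row is wanted rather than a list, as in the desired result in the question, a regex-based approach with Series.str.extract may be closer. The following is only a sketch of that alternative, not part of the .apply answer above. It assumes that the first matching word is enough, that matches should be reported in lowercase, and that non-matching rows should hold an empty string; the name `pattern` is introduced here for illustration.

```python
import re

import numpy as np
import pandas as pd

listing = ['test', 'big']
df = pd.DataFrame({'Title': ['small test', 'huge Test', 'big', 'nothing', np.nan, 'a', 'b']})

# Build one capture group holding all the words, escaped in case they
# contain regex metacharacters: "(test|big)"
pattern = '(' + '|'.join(re.escape(word) for word in listing) + ')'

df['Test_Flag'] = (
    df['Title']
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)  # first match, NaN if none
    .str.lower()   # report 'Test' as 'test' (assumption: lowercase output is wanted)
    .fillna('')    # empty string for rows without a match, including NaN titles
)
print(df)
```

Like the .apply version, this matches substrings, so 'smalltest' would also yield 'test'. Wrapping the alternation in word boundaries, for example r'\b(test|big)\b', would restrict it to whole words.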