I want to extract words, given in a list, from the strings in a pandas column and put them in a new column. I have this example, inspired by the question "python pandas if column string contains word flag":
import numpy as np
import pandas as pd

listing = ['test', 'big']
df = pd.DataFrame({'Title': ['small test', 'huge Test', 'big', 'nothing', np.nan, 'a', 'b']})
df['Test_Flag'] = np.where(df['Title'].str.contains('|'.join(listing), case=False, na=False), 'T', '')
print(df)
        Title Test_Flag
0  small test         T
1   huge Test         T
2         big         T
3     nothing
4         NaN
5           a
6           b
But what if, instead of "T", I want to put the actual word from the list that was found? The result would be:
        Title Test_Flag
0  small test      test
1   huge Test      test
2         big       big
3     nothing
4         NaN
5           a
6           b
CodePudding user response:
Using the .apply method with a custom function should give you what you are looking for:
import pandas as pd
import numpy as np

# Define the list of words you want to extract
listing = ['test', 'big']

# Define the DataFrame
df = pd.DataFrame({'Title': ['small test', 'huge Test', 'big', 'nothing', np.nan, 'a', 'b', 'smalltest']})

# Define the function, which takes a string and a list of words to extract as inputs
def listing_splitter(text, listing):
    # try/except handles np.nan in the input (a float has no .lower())
    try:
        # Extract the list of flags (case-insensitive substring match)
        flags = [word for word in listing if word in text.lower()]
        # If any flags were extracted, return the list
        if flags:
            return flags
        # Otherwise return np.nan
        else:
            return np.nan
    except AttributeError:
        return np.nan

# Apply the function to the column
df['Test_Flag'] = df['Title'].apply(lambda x: listing_splitter(x, listing))
df
Output:
        Title Test_Flag
0  small test  ['test']
1   huge Test  ['test']
2         big   ['big']
3     nothing       NaN
4         NaN       NaN
5           a       NaN
6           b       NaN
7   smalltest  ['test']
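If you need the matched word itself as a plain string (as in the desired output in the question) rather than a list, a regex-based variant may be simpler. This is only a minimal sketch, assuming that the first matching word per title is enough and that the result should be lower-cased; the names `pattern` and `first_match` are illustrative and not part of the original code.

```python
import re

import numpy as np
import pandas as pd

listing = ['test', 'big']
df = pd.DataFrame({'Title': ['small test', 'huge Test', 'big', 'nothing', np.nan, 'a', 'b']})

# Build a single capture group of escaped alternatives: (test|big)
pattern = '(' + '|'.join(re.escape(word) for word in listing) + ')'

# str.extract returns the first match per row (NaN where nothing matches);
# expand=False gives a Series instead of a one-column DataFrame.
first_match = df['Title'].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Assumption: lower-case the match so 'Test' becomes 'test',
# and use an empty string for rows without a match.
df['Test_Flag'] = first_match.str.lower().fillna('')

print(df)
```

Like the .apply version, this matches substrings, so a title such as 'smalltest' would also be flagged. If a title can contain several of the words and you want all of them, `df['Title'].str.findall(pattern, flags=re.IGNORECASE)` returns a list per row instead.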