I am trying to use a list comprehension for some complex column creation in pandas.
For instance, I want to use a list as a reference to create another column in a pandas DataFrame:
fruit = ['watermelon', 'apple', 'grape']
string                new_column
watermelons are cool  watermelon
apples are good       apple
oranges are on sale   NaN
I tried to use a list comprehension:
df['new_column'] = [f in fruit if any(f in s for f in fruit) for s in df['string']]
I don't think this is correct. Any help would be appreciated!
CodePudding user response:
The best approach is to use str.extract:
fruit = ['watermelon', 'apple', 'grape']
import re
df['new_column'] = df['string'].str.extract(f"({'|'.join(map(re.escape, fruit))})")
Output:
                 string  new_column
0  watermelons are cool  watermelon
1       apples are good       apple
2   oranges are on sale         NaN
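For reference, here is a self-contained sketch of the str.extract approach. The DataFrame construction is an assumption based on the table in the question, and expand=False is added so that str.extract returns a Series rather than a one-column DataFrame.

```python
import re

import pandas as pd

fruit = ['watermelon', 'apple', 'grape']

# Assumed sample data, reconstructed from the table in the question.
df = pd.DataFrame({'string': ['watermelons are cool',
                              'apples are good',
                              'oranges are on sale']})

# Build an alternation pattern: (watermelon|apple|grape).
# re.escape protects against regex metacharacters in the list items.
pattern = f"({'|'.join(map(re.escape, fruit))})"

# Rows with no match get NaN; only the first match per row is returned.
df['new_column'] = df['string'].str.extract(pattern, expand=False)
print(df)
```

Note that this is a substring match, so a string such as "pineapples" would also yield "apple". If that matters, word boundaries in the pattern may be worth considering.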
CodePudding user response:
This will do the job:
import pandas as pd
import numpy as np
fruit = ['watermelon', 'apple', 'grape']
df = pd.DataFrame()
df['string'] = ['watermelons are cool', 'apples are good', 'oranges are on sale', 'apples are not watermelons']
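# Join every fruit found in the string with commas (empty string if none match)
output = df['string'].apply(lambda x: ','.join([f for f in fruit if f in x]))
# Replace empty strings with NaN for rows without a match
output[output == ''] = np.nan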
print(output)
Output:
0          watermelon
1               apple
2                 NaN
3    watermelon,apple
Name: string, dtype: object
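Since the question asked specifically about a list comprehension, here is a minimal sketch of how the original attempt could be written. It assumes the same sample strings as above and that only the first matching fruit (in the order of the fruit list) is wanted; next() with a default supplies NaN when nothing matches.

```python
import numpy as np
import pandas as pd

fruit = ['watermelon', 'apple', 'grape']

# Assumed sample data, matching the strings used in the answer above.
df = pd.DataFrame({'string': ['watermelons are cool',
                              'apples are good',
                              'oranges are on sale',
                              'apples are not watermelons']})

# For each string, take the first fruit from the list that occurs in it,
# falling back to NaN when none of them do.
df['new_column'] = [next((f for f in fruit if f in s), np.nan)
                    for s in df['string']]
print(df)
```

Unlike the comma-joined version, the last row here yields only 'watermelon', because that fruit comes first in the fruit list.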