How can I create a dummy variable for the following condition: the column 'lemmatised' contains at least two words from 'innovation_words'? innovation_words is a list I defined myself:
innovation_words = ['community', 'local', 'charity', 'event', 'partner',
                    'volunteering', 'plastic', 'surplusfood']
The lemmatised column looks like this (I'm fine changing the type or formatting if needed):
So if an observation includes, for example, both 'local' and 'plastic', I would like the dummy variable 'innovation' to equal 1. I hope someone can help me with this. Here is some code I already tried:
conditions = [df_posts['lemmatised'].isin(innovation_words),
              df_posts['lemmatised'].isin(innovation_words)]
dummy = [1,0]
df_posts['innovation'] = np.select(conditions, dummy)
CodePudding user response:
Maybe you can try this:
df_posts['innovation'] = 0
df_posts.loc[df_posts.lemmatised.isin(innovation_words), 'innovation'] = 1
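One caveat worth checking before relying on isin. The sketch below assumes each 'lemmatised' cell is a single space-separated string, and the sample rows are made up for illustration. Under that assumption, isin compares the whole cell value against the list, so it only flags rows whose entire text equals one of the words:

```python
import pandas as pd

innovation_words = ['community', 'local', 'charity', 'event', 'partner',
                    'volunteering', 'plastic', 'surplusfood']

# Hypothetical sample data; the real column may look different.
df_posts = pd.DataFrame(
    {'lemmatised': ['local plastic cleanup', 'local', 'nice weather']}
)

# isin tests each whole cell for equality with a list entry, not word by word.
whole_cell_match = df_posts['lemmatised'].isin(innovation_words)
print(whole_cell_match.tolist())  # [False, True, False]
```

The first row contains two innovation words but is not flagged, which is why a per-word count is needed for the "at least two words" condition.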
CodePudding user response:
You can use this code:
df['new'] = df.lemmatised.map(lambda w: len([i for i in innovation_words if i in w]) > 1)
Just rename the variables to match your own.
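For reference, here is a minimal sketch of a whole-word variant that stores 1/0 rather than True/False. It assumes each 'lemmatised' cell is a space-separated string; the sample rows and the helper name count_innovation_words are hypothetical. It counts distinct innovation words, so a word repeated within one row counts once:

```python
import pandas as pd

innovation_words = ['community', 'local', 'charity', 'event', 'partner',
                    'volunteering', 'plastic', 'surplusfood']
innovation_set = set(innovation_words)

# Hypothetical sample data; the real column may look different.
df_posts = pd.DataFrame({'lemmatised': [
    'local charity collect plastic',
    'community garden open',
    'nothing relevant here',
]})

def count_innovation_words(text):
    # Split into whole words so that 'event' does not match inside 'eventually'.
    return len(innovation_set & set(text.split()))

# 1 if at least two distinct innovation words occur in the row, else 0.
df_posts['innovation'] = (
    df_posts['lemmatised'].map(count_innovation_words) >= 2
).astype(int)

print(df_posts['innovation'].tolist())  # [1, 0, 0]
```

If the column holds lists of tokens instead of strings, replacing text.split() with the list itself should give the same result.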