I want to add a boolean/binary 'CONDITION' column to an existing data frame. Each row's 'DICTIONARY' column holds a dictionary of simple string-string pairs, and 'CONDITION' should indicate whether a predefined value is present among that dictionary's keys. I tried to avoid writing my own loop with:
df.loc['KEYWORD' in df['DICTIONARY'], 'CONDITION'] = 1
df.loc['KEYWORD' not in df['DICTIONARY'], 'CONDITION'] = 0
But it gives the error:
KeyError: 'cannot use a single bool to index into setitem'
The same error showed up when I tried:
condition = ('KEYWORD' in (i for i in df['DICTIONARY']))
df.loc[condition, 'CONDITION'] = 1
I also tried the following, but it produces a generator that I could not use:
condition = ('KEYWORD' in i for i in df['DICTIONARY'].tolist())
CodePudding user response:
If possible, convert the values to strings and then test for the substring, using either of the following (the second needs import numpy as np):
df['CONDITION'] = df['DICTIONARY'].astype(str).str.contains('KEYWORD').astype(int)
df['CONDITION'] = np.where(df['DICTIONARY'].astype(str).str.contains('KEYWORD'), 1, 0)
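One caveat with the string-based test: astype(str) turns each dictionary into its printed form, so str.contains matches the keyword anywhere in that text, including inside a value or as part of a longer key. A small sketch of this, using made-up sample rows (the data below is hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical sample data: only the first row has 'KEYWORD' as an exact key.
df = pd.DataFrame({
    'DICTIONARY': [
        {'KEYWORD': 'a'},       # keyword is a key
        {'other': 'KEYWORD'},   # keyword appears only as a value
        {'KEYWORD_2': 'b'},     # keyword is a substring of a longer key
        {'other': 'c'},         # keyword does not appear at all
    ]
})

# Plain substring test on the string form of each dict (regex disabled).
as_text = (
    df['DICTIONARY']
    .astype(str)
    .str.contains('KEYWORD', regex=False)
    .astype(int)
)
print(as_text.tolist())  # [1, 1, 1, 0]
```

So this variant only fits if the keyword cannot occur in values or inside other keys.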
Or, depending on the data, test each dictionary's keys directly:
df['CONDITION'] = df['DICTIONARY'].map(lambda x: 'KEYWORD' in x).astype(int)
df['CONDITION'] = np.where(df['DICTIONARY'].map(lambda x: 'KEYWORD' in x), 1, 0)
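For completeness, here is a minimal runnable sketch of the key-based approach, assuming every cell in 'DICTIONARY' really is a dict (the sample rows are invented for illustration). It also shows why the attempts in the question fail: `in` applied to a Series checks the Series index labels and returns one plain bool, which df.loc cannot use as a row mask.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    'DICTIONARY': [
        {'KEYWORD': 'a'},
        {'other': 'KEYWORD'},
        {'KEYWORD_2': 'b'},
    ]
})

# `in` on a Series tests the index labels (0, 1, 2 here), so this is a single bool.
single = 'KEYWORD' in df['DICTIONARY']
print(type(single), single)

# Element-wise membership test on each dict's keys gives a boolean Series.
mask = df['DICTIONARY'].map(lambda d: 'KEYWORD' in d)
df['CONDITION'] = mask.astype(int)

# Same result via np.where.
alt = np.where(mask, 1, 0)

print(df['CONDITION'].tolist())  # [1, 0, 0]
```

If some cells might be NaN or another non-dict value, `'KEYWORD' in d` raises a TypeError; a guard such as `isinstance(d, dict) and 'KEYWORD' in d` inside the lambda is one way to handle that.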