Question
Consider the following:
import pandas as pd
word = 'analphabetic'
df = pd.DataFrame({'substring': list('abcdefgh') + ['ab', 'phobic']})
Note that a substring is not necessarily a single letter!
I want to add a column named after word, where each row shows True/False depending on whether the substring in that row occurs in word. Can I do this with a built-in pandas method?
Desired output:
  substring  analphabetic
0         a          True
1         b          True
2         c          True
3         d         False
4         e          True
5         f         False
6         g         False
7         h          True
8        ab          True
9    phobic         False
The other way around can be done with pandas.Series.str.contains, for example df.substring.str.contains(word). For my case, I guess you could do something like:
df[word] = [i in word for i in df.substring]
But by the same logic, what the built-in str.contains() does could also be written as:
string = 'a'
df = pd.DataFrame({'words': ['these', 'are', 'some', 'random', 'words']})
df[string] = [string in i for i in df.words]
So I suspect there is also a built-in method that does what I want.
CodePudding user response:
A possible solution (which should work for substrings longer than a single letter):
df['analphabetic'] = df['substring'].map(lambda x: x in word)
Output:
  substring  analphabetic
0         a          True
1         b          True
2         c          True
3         d         False
4         e          True
5         f         False
6         g         False
7         h          True
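As a side note, here is a minimal sketch of why str.contains does not cover this case directly: it tests whether a pattern occurs inside each element of the Series, which is the opposite direction from what the question asks. The variable names forward and reverse are only illustrative, and regex=False is passed so that word is treated as a literal string.

```python
import pandas as pd

word = 'analphabetic'
df = pd.DataFrame({'substring': list('abcdefgh') + ['ab', 'phobic']})

# str.contains asks: does each element contain `word`? (opposite direction)
forward = df['substring'].str.contains(word, regex=False)

# The question asks: does `word` contain each element?
reverse = df['substring'].map(lambda s: s in word)

print(forward.any())     # False: no short substring contains the whole word
print(reverse.tolist())
```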
Using list comprehension:
df['analphabetic'] = [x in word for x in df.substring]
Using apply:
df['analphabetic'] = df['substring'].apply(lambda x: x in word)
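To compare the variants, here is a self-contained sketch that rebuilds the frame from the question and checks that map, the list comprehension, and apply give the same column. The names via_map, via_listcomp and via_apply are illustrative only. It assumes every entry in substring is a string; a missing value such as NaN would make x in word raise a TypeError.

```python
import pandas as pd

word = 'analphabetic'
df = pd.DataFrame({'substring': list('abcdefgh') + ['ab', 'phobic']})

# Three ways to evaluate `x in word` for every row
via_map = df['substring'].map(lambda x: x in word)
via_listcomp = pd.Series([x in word for x in df['substring']], index=df.index)
via_apply = df['substring'].apply(lambda x: x in word)

# Each variant calls the Python `in` operator once per element,
# so the results should be identical
assert via_map.equals(via_apply)
assert via_map.tolist() == via_listcomp.tolist()

# Name the new column after the word, as asked in the question
df[word] = via_map
print(df)
```

Since all three do the same per-element work, the choice between them is probably mostly a matter of style.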