I'm trying to search for strings inside lists that are stored in a pandas DataFrame column. Here is an example:
       userAuthor     hashtagsMessage
post_1    nytimes            [#Emmys]
post_2        TMZ                  []
post_3     Forbes        [#BTSatUNGA]
post_4    nytimes            [#Emmys]
post_5     Forbes  [#BTS, #BTSatUNGA]
As you can see, the column that holds the lists is 'hashtagsMessage'. I've tried the conventional string-search methods, but none of them work on this column.
If the column held plain strings, I could look for '#BTS' with something like:
df['hashtagsMessage'].str.contains("#BTS", case=False)
or
df['hashtagsMessage']=="#BTS"
Or something similar. Unfortunately, these approaches do not work for lists. I suppose I need an extra step that looks inside each list while searching the DataFrame, but I'm not sure how to do that part.
Any help is greatly appreciated!
CodePudding user response:
Use map or apply:
>>> df['hashtagsMessage'].map(lambda x: '#BTS' in x)
post_1 False
post_2 False
post_3 False
post_4 False
post_5 True
Name: hashtagsMessage, dtype: bool
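To make the above reproducible, here is a minimal sketch that rebuilds the example frame from the question and then uses the boolean mask to select the matching rows. The variable names `mask` and `matches` are just illustrative choices.

```python
import pandas as pd

# Rebuild the example data from the question
df = pd.DataFrame(
    {
        "userAuthor": ["nytimes", "TMZ", "Forbes", "nytimes", "Forbes"],
        "hashtagsMessage": [
            ["#Emmys"],
            [],
            ["#BTSatUNGA"],
            ["#Emmys"],
            ["#BTS", "#BTSatUNGA"],
        ],
    },
    index=["post_1", "post_2", "post_3", "post_4", "post_5"],
)

# `in` on a list compares whole elements, so '#BTSatUNGA' does not count as '#BTS'
mask = df["hashtagsMessage"].map(lambda tags: "#BTS" in tags)

# Boolean indexing keeps only the rows where the mask is True
matches = df[mask]
print(matches)
```

Because `in` tests list membership rather than substrings, this gives an exact match on the tag.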
Update
A more vectorized way, using explode:
>>> df.loc[df['hashtagsMessage'].explode().eq('#BTS').loc[lambda x: x].index]
       userAuthor     hashtagsMessage
post_5     Forbes  [#BTS, #BTSatUNGA]
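If you would rather end up with a boolean mask aligned to the original index (as with map), one possible variation on the explode idea is to collapse the exploded comparison back with groupby(level=0).any(). This is only a sketch on the question's sample data; it also has the side effect that a post is returned once even if the same tag appears in its list more than once.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "userAuthor": ["nytimes", "TMZ", "Forbes", "nytimes", "Forbes"],
        "hashtagsMessage": [
            ["#Emmys"],
            [],
            ["#BTSatUNGA"],
            ["#Emmys"],
            ["#BTS", "#BTSatUNGA"],
        ],
    },
    index=["post_1", "post_2", "post_3", "post_4", "post_5"],
)

# One row per tag; the index label is repeated for each tag and
# an empty list becomes a single NaN row
exploded = df["hashtagsMessage"].explode()

# Compare every tag, then reduce back to one boolean per original row
mask = exploded.eq("#BTS").groupby(level=0, sort=False).any()

matches = df[mask]
print(matches)
```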
CodePudding user response:
You can search the column as text, using a raw-string pattern.
If the column holds plain strings rather than actual lists, use:
df['hashtagsMessage'].str.contains(r'#BTS')
If it holds actual lists, convert them to strings first:
df['hashtagsMessage'].astype(str).str.contains(r'#BTS')
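One thing to keep in mind with the string-conversion route is that a plain '#BTS' pattern is a substring search, so it also matches '#BTSatUNGA'. A rough sketch of the difference, assuming hashtags consist only of word characters so that a trailing `\b` is enough to mark the end of the tag (that assumption is mine, not something stated in the question):

```python
import pandas as pd

tags = pd.Series(
    [["#Emmys"], [], ["#BTSatUNGA"], ["#Emmys"], ["#BTS", "#BTSatUNGA"]],
    index=["post_1", "post_2", "post_3", "post_4", "post_5"],
    name="hashtagsMessage",
)

# Each list becomes its text form, e.g. "['#BTS', '#BTSatUNGA']"
as_text = tags.astype(str)

# Substring search: post_3 matches too, because '#BTSatUNGA' contains '#BTS'
substring_hits = as_text.str.contains("#BTS", regex=False)

# Word boundary after the tag: '#BTS' must not be followed by another word character
exact_hits = as_text.str.contains(r"#BTS\b")

print(substring_hits.tolist())
print(exact_hits.tolist())
```

If the tags can contain other characters (a hyphen, for instance), the boundary check would need adjusting, and the list-membership approaches are probably the safer choice.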
CodePudding user response:
You could use a simple anonymous function that combines a generator expression with any(), e.g.:
Edit: I originally assumed you wanted any tag containing '#BTS'; I've since edited this to find only exact matches :)
In [10]: df = pd.DataFrame({'hashtagsMessage': [
    ...:     [], ["#BTSatUNGA"],
    ...:     ["#Emmys"], ['#BTS', '#BTSatUNGA']]})
In [18]: df['hashtagsMessage'].apply(lambda lst: any(s == "#BTS" for s in lst))
Out[18]:
0 False
1 False
2 False
3 True
Name: hashtagsMessage, dtype: bool
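Since the question's str.contains attempt used case=False, here is a possible case-insensitive variant of the same any() idea. It is a sketch only: the `target` name and the choice of str.casefold() for the comparison are my own, and it assumes every element of each list is a string.

```python
import pandas as pd

df = pd.DataFrame({'hashtagsMessage': [
    [], ["#BTSatUNGA"],
    ["#Emmys"], ['#BTS', '#BTSatUNGA']]})

# Normalise the search term once, outside the lambda
target = "#bts".casefold()

# Exact match on each whole tag, ignoring case
mask = df['hashtagsMessage'].apply(
    lambda lst: any(s.casefold() == target for s in lst)
)

print(mask)
```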