pandas Series.str.contains returns rows that shouldn't match

Time:03-16

I have data like this:

[image: sample data]

I'm extracting the rows that contain the address 1.1.1.1 with Series.str.contains(pttn, regex=False):

import pandas as pd

pttn = '1.1.1.1'
dd = pd.Series(['1.1.1.1, 2.22.3.107','1.1.1.1','2.2.2.2, 1.1.1.1', '2.2.2.2, 1.1.1.14','1.1.1.15','1.1.1.100','1.1.1.101','1.1.1.109'])
dd[dd.str.contains(pttn, regex=False, na=False)]

but I got this unexpected result:

0    1.1.1.1, 2.22.3.107
1                1.1.1.1
2       2.2.2.2, 1.1.1.1
3      2.2.2.2, 1.1.1.14
4               1.1.1.15
5              1.1.1.100
6              1.1.1.101
7              1.1.1.109
dtype: object

What I actually want is only:

0    1.1.1.1, 2.22.3.107
1                1.1.1.1
2       2.2.2.2, 1.1.1.1
dtype: object

CodePudding user response:

Simply use

newdd = dd[dd == pttn]

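Your solution uses contains, which is a substring test. Every value in dd contains the string '1.1.1.1' (for example, '1.1.1.14' begins with it), so they all match. Note that dd == pttn keeps only the rows whose entire value equals pttn, so rows holding several addresses are dropped.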
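If you would rather stay with contains, one possible workaround is a regular expression that refuses a digit or a dot on either side of the address. This is only a sketch, not something from the question: the name bounded and the lookaround pattern are my own assumptions, and they assume the addresses are separated by something other than digits and dots (here, a comma and a space).

```python
import re
import pandas as pd

pttn = '1.1.1.1'
dd = pd.Series(['1.1.1.1, 2.22.3.107', '1.1.1.1', '2.2.2.2, 1.1.1.1',
                '2.2.2.2, 1.1.1.14', '1.1.1.15', '1.1.1.100',
                '1.1.1.101', '1.1.1.109'])

# Plain substring test: '1.1.1.14' contains '1.1.1.1', which is why
# contains(..., regex=False) keeps that row.
is_substring = '1.1.1.1' in '1.1.1.14'

# Escape the dots so they are literal, then require that no digit or dot
# touches the address on the left or on the right.
# 'bounded' is a hypothetical helper name used only for this illustration.
bounded = r'(?<![\d.])' + re.escape(pttn) + r'(?![\d.])'

result = dd[dd.str.contains(bounded, regex=True, na=False)]
print(result)
```

With the sample data this should keep rows 0, 1 and 2 only.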

CodePudding user response:

Update

>>> dd[dd.str.split(', ').explode().loc[lambda x: x == pttn].index]
0    1.1.1.1, 2.22.3.107
1                1.1.1.1
2       2.2.2.2, 1.1.1.1
dtype: object
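To make the one-liner above easier to follow, here it is broken into steps. The last step is a variation of mine rather than the exact code above: it builds a boolean mask with groupby(level=0).any() instead of indexing by label, on the assumption that you want each row returned once even if the address appears in it twice. The names parts, exploded, mask and selected are illustrative only.

```python
import pandas as pd

pttn = '1.1.1.1'
dd = pd.Series(['1.1.1.1, 2.22.3.107', '1.1.1.1', '2.2.2.2, 1.1.1.1',
                '2.2.2.2, 1.1.1.14', '1.1.1.15', '1.1.1.100',
                '1.1.1.101', '1.1.1.109'])

# Step 1: split every row into a list of addresses.
parts = dd.str.split(', ')

# Step 2: put one address on each row; the original index label is
# repeated for rows that held more than one address.
exploded = parts.explode()

# Step 3: compare each address exactly, then collapse back to one flag
# per original row (True if any of its addresses equals pttn).
mask = (exploded == pttn).groupby(level=0).any()

selected = dd[mask]
print(selected)
```

Because the comparison is done per address and not per substring, '1.1.1.14' no longer counts as a match.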

Old answer

You are looking for str.fullmatch:

>>> dd[dd.str.fullmatch(pttn)]
0    1.1.1.1
dtype: object

Or

>>> dd[dd == pttn]
0    1.1.1.1
dtype: object

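The advantage of str.fullmatch is that it accepts a regular expression and lets you control case sensitivity through its case argument. Keep in mind that the pattern is treated as a regular expression, so an unescaped '.' matches any character.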
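A small sketch of both points, using made-up sample values that are not from the question (the Series contents and the names loose, strict, hosts and ci are hypothetical). It shows why escaping the pattern with re.escape can matter and how case=False behaves.

```python
import re
import pandas as pd

pttn = '1.1.1.1'

# Hypothetical sample, chosen to show the effect of unescaped dots.
sample = pd.Series(['1.1.1.1', '1a1b1c1', '1.1.1.15'])

# Unescaped: each '.' matches any single character, so '1a1b1c1' passes.
loose = sample.str.fullmatch(pttn)

# Escaped: the dots are literal, so only the exact address passes.
strict = sample.str.fullmatch(re.escape(pttn))

# Case-insensitive full match on another hypothetical Series.
hosts = pd.Series(['Localhost', 'localhost', 'localhost2'])
ci = hosts.str.fullmatch('localhost', case=False)

print(loose.tolist(), strict.tolist(), ci.tolist())
```

For the data in the question the unescaped pattern happens to give the right answer, but escaping it is the safer habit when the pattern is meant literally.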
