Using the code below, I am trying to keep only the rows of the dataset whose Col_B (a string column) contains the value 9.
The problem is that this code returns only the rows whose Col_B is exactly the string "9", whereas I also want to keep the rows where 9 appears among other values.
Original input:
Col_A Col_B
0 5 1,2,9,0,6,4
1 2 3,1,0
2 46 0
3 184 1,5,7
4 31 9
5 81 9,0
Code used:
desired_numbers = ['9,', ',9', ',9,', '9']
df1 = df1[df1['Col_B'].isin(desired_numbers)]
df1
Original output:
Col_A Col_B
0 31 9
Desired output:
Col_A Col_B
0 5 1,2,9,0,6,4
1 31 9
2 81 9,0
Please let me know if you have any recommendations! Thanks
CodePudding user response:
Try str.contains with a regex word boundary:
out = df[df.Col_B.str.contains(r'\b9\b')]
Out[336]:
Col_A Col_B
0 5 1,2,9,0,6,4
4 31 9
5 81 9,0
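A quick sanity check (on a small made-up frame, not the question's data) that the \b word boundaries keep the pattern from matching 9 inside multi-digit values such as 19 or 90:

```python
import pandas as pd

# Hypothetical frame: only the first two rows contain 9 as a whole value
df = pd.DataFrame({"Col_B": ["9", "9,0", "19,0", "3,90"]})

# \b9\b requires a non-word character (or string edge) on both sides of 9,
# so "19" and "90" do not match
mask = df["Col_B"].str.contains(r"\b9\b")
print(mask.tolist())  # [True, True, False, False]
```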
CodePudding user response:
More along the lines of Barmar's suggestion, though I think BENY's answer is more efficient:
out = df[df.Col_B.str.split(',').apply(set) >= {'9'}]  # keep rows whose set of values is a superset of {'9'}
print(out)
...
Col_A Col_B
0 5 1,2,9,0,6,4
4 31 9
5 81 9,0
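Because the left-hand side is a Series of sets, the same >= (superset) test extends naturally to requiring several values at once. A small sketch, using the question's frame and a made-up requirement that rows contain both 9 and 0:

```python
import pandas as pd

df = pd.DataFrame({"Col_A": [5, 2, 46, 184, 31, 81],
                   "Col_B": ["1,2,9,0,6,4", "3,1,0", "0", "1,5,7", "9", "9,0"]})

# >= is the superset test for sets, so this keeps rows whose
# comma-separated values include every element of {'9', '0'}
out = df[df["Col_B"].str.split(",").apply(set) >= {"9", "0"}]
print(out)
```

Only the rows with Col_B "1,2,9,0,6,4" and "9,0" survive.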
CodePudding user response:
Although I like BENY's approach (and it is more general in case there are separators other than the comma), you can split the strings into lists using str.split and then use str.contains with regex=False to check whether each list contains the value 9.
out = df[df['Col_B'].str.split(',').str.contains('9', regex=False)]
Or using Series.apply, which might be less efficient but more straightforward:
out = df[df['Col_B'].apply(lambda vals: '9' in vals.split(','))]
Output:
>>> out
Col_A Col_B
0 5 1,2,9,0,6,4
4 31 9
5 81 9,0
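The reason regex=False matters here: pandas then falls back to Python's in operator, and for a list that is a whole-element membership test rather than a substring search, so 9 cannot match inside 19. A small made-up check:

```python
import pandas as pd

df = pd.DataFrame({"Col_B": ["19,0", "9,0"]})

# After split, each row holds a list; with regex=False, contains('9')
# evaluates '9' in list, so the element '19' does not match
mask = df["Col_B"].str.split(",").str.contains("9", regex=False)
print(mask.tolist())  # [False, True]
```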