Using the code below, I am trying to keep only the rows of the dataset whose Col_B (a string column) contains the value 9.
The problem is that this code returns only the rows whose Col_B is exactly the string "9", whereas I also want to keep the rows where 9 appears among other values.
Original input:
Col_A Col_B
0 5 1,2,9,0,6,4
1 2 3,1,0
2 46 0
3 184 1,5,7
4 31 9
5 81 9,0
Code used:
desired_numbers = ['9,', ',9', ',9,', '9']
df1 = df1[df1['Col_B'].isin(desired_numbers)]
df1
Original output:
Col_A Col_B
0 31 9
Desired output:
Col_A Col_B
0 5 1,2,9,0,6,4
1 31 9
2 81 9,0
Please let me know if you have any recommendations! Thanks
CodePudding user response:
Try str.contains with a regex word boundary:
out = df[df.Col_B.str.contains(r'\b9\b')]
Out[336]:
Col_A Col_B
0 5 1,2,9,0,6,4
4 31 9
5 81 9,0
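A quick sanity check (on a small made-up frame, not the question's data) that the \b word boundaries keep the pattern from matching 9 inside multi-digit values such as 19 or 90:

```python
import pandas as pd

# Hypothetical frame: only the first two rows contain 9 as a whole value
df = pd.DataFrame({"Col_B": ["9", "9,0", "19,0", "3,90"]})

# \b9\b requires a non-word character (or string edge) on both sides of 9,
# so "19" and "90" do not match
mask = df["Col_B"].str.contains(r"\b9\b")
print(mask.tolist())  # [True, True, False, False]
```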
CodePudding user response:
More along the lines of Barmar's suggestion, though I think BENY's answer is more efficient:
out = df[df.Col_B.str.split(',').apply(set) >= {'9'}]  # keep rows whose set of values is a superset of {'9'}
print(out)
...
Col_A Col_B
0 5 1,2,9,0,6,4
4 31 9
5 81 9,0
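Because the left-hand side is a Series of sets, the same >= (superset) test extends naturally to requiring several values at once. A small sketch, using the question's frame and a made-up requirement that rows contain both 9 and 0:

```python
import pandas as pd

df = pd.DataFrame({"Col_A": [5, 2, 46, 184, 31, 81],
                   "Col_B": ["1,2,9,0,6,4", "3,1,0", "0", "1,5,7", "9", "9,0"]})

# >= is the superset test for sets, so this keeps rows whose
# comma-separated values include every element of {'9', '0'}
out = df[df["Col_B"].str.split(",").apply(set) >= {"9", "0"}]
print(out)
```

Only the rows with Col_B "1,2,9,0,6,4" and "9,0" survive.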
CodePudding user response:
Although I like BENY's approach (and it is more general in case there are separators other than the comma), you can split the strings into lists using str.split and then use str.contains with regex=False to check whether each list contains the value 9.
out = df[df['Col_B'].str.split(',').str.contains('9', regex=False)]
Or using Series.apply, which might be less efficient but more straightforward:
out = df[df['Col_B'].apply(lambda vals: '9' in vals.split(','))]
Output:
>>> out
Col_A Col_B
0 5 1,2,9,0,6,4
4 31 9
5 81 9,0
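The reason regex=False matters here: pandas then falls back to Python's in operator, and for a list that is a whole-element membership test rather than a substring search, so 9 cannot match inside 19. A small made-up check:

```python
import pandas as pd

df = pd.DataFrame({"Col_B": ["19,0", "9,0"]})

# After split, each row holds a list; with regex=False, contains('9')
# evaluates '9' in list, so the element '19' does not match
mask = df["Col_B"].str.split(",").str.contains("9", regex=False)
print(mask.tolist())  # [False, True]
```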