I have a dataframe such as
COL1
jcf7180001991334_2-HSPs_ __SP_1
jcf7180001991334:23992-26263( ):SP_2
jcf7180001988059:2889-4542(-):SP_3
and a list :
the_list = ['jcf7180001991334_2-HSPs_ __SP_1', 'not_in_tab1', 'jcf7180001991334:23992-26263( ):SP_2', 'not_intab2', 'not_intab3', 'jcf7180001988059:2889-4542(-):SP_3']
and by iterating over that list such as :
for element in the_list:
    if element in df['COL1']:
        print(element, " in df")
    else:
        print(element, " not in df")
I should then get the following output :
jcf7180001991334_2-HSPs_ __SP_1 in df
not_in_tab1 not in df
jcf7180001991334:23992-26263( ):SP_2 in df
not_intab2 not in df
not_intab3 not in df
jcf7180001988059:2889-4542(-):SP_3 in df
But instead I cannot find any of them in the df, and I get:
jcf7180001991334_2-HSPs_ __SP_1 not in df
not_in_tab1 not in df
jcf7180001991334:23992-26263( ):SP_2 not in df
not_intab2 not in df
not_intab3 not in df
jcf7180001988059:2889-4542(-):SP_3 not in df
I guess it is because of the special characters within the elements, such as parentheses or -? Does someone know how to deal with that?
CodePudding user response:
By default, the in operator on a pandas Series checks whether the value is in the index, not in the values, so the special characters are not the problem.
To test against the values instead, use df['COL1'].values:
import pandas as pd
data = {
"COL1": ['jcf7180001991334_2-HSPs_ __SP_1', 'jcf7180001991334:23992-26263( ):SP_2', 'jcf7180001988059:2889-4542(-):SP_3']}
df = pd.DataFrame(data)
the_list=['jcf7180001991334_2-HSPs_ __SP_1', 'not_in_tab1', 'jcf7180001991334:23992-26263( ):SP_2', 'not_intab2', 'not_intab3','jcf7180001988059:2889-4542(-):SP_3']
for element in the_list:
    if element in df['COL1'].values:  # look in the values, not the index
        print(element, " in df")
    else:
        print(element, " not in df")
[Output]
jcf7180001991334_2-HSPs_ __SP_1 in df
not_in_tab1 not in df
jcf7180001991334:23992-26263( ):SP_2 in df
not_intab2 not in df
not_intab3 not in df
jcf7180001988059:2889-4542(-):SP_3 in df
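To make the index-versus-values point concrete, here is a minimal sketch on a toy Series. The Series s and the three variable names are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Toy Series (hypothetical data); the default index labels are 0, 1, 2
s = pd.Series(["a", "b", "c"])

label_hit = 0 in s           # True: 0 is an index label
value_as_label = "a" in s    # False: "a" is a value, not an index label
value_hit = "a" in s.values  # True: membership is tested on the values

print(label_hit, value_as_label, value_hit)
```

This is why every element in the question was reported as "not in df": none of the strings is a label in the default integer index.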
CodePudding user response:
Give this a try:
["in df" if x in df['COL1'].values else "not in df" for x in the_list]
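If the list or the column is large, a possible variation is to build a set from the column once, or to let pandas compute a boolean mask with Series.isin. The sketch below assumes the same df and the_list as in the question; the names known, labels and mask are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "COL1": [
        "jcf7180001991334_2-HSPs_ __SP_1",
        "jcf7180001991334:23992-26263( ):SP_2",
        "jcf7180001988059:2889-4542(-):SP_3",
    ]
})
the_list = [
    "jcf7180001991334_2-HSPs_ __SP_1", "not_in_tab1",
    "jcf7180001991334:23992-26263( ):SP_2", "not_intab2",
    "not_intab3", "jcf7180001988059:2889-4542(-):SP_3",
]

# Build the set once so each lookup does not rescan the column
known = set(df["COL1"])
labels = ["in df" if x in known else "not in df" for x in the_list]

# Same check as a boolean mask: is each list element among the column values?
mask = pd.Series(the_list).isin(df["COL1"])

for element, label in zip(the_list, labels):
    print(element, label)
```

Both approaches compare whole strings for equality, so parentheses, colons and dashes in the identifiers need no escaping.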