Checking whether values with special characters are in a column in Python

Time:11-09

I have a DataFrame such as:

COL1 
jcf7180001991334_2-HSPs_ __SP_1
jcf7180001991334:23992-26263( ):SP_2
jcf7180001988059:2889-4542(-):SP_3

and a list:

the_list = ['jcf7180001991334_2-HSPs_ __SP_1', 'not_in_tab1', 'jcf7180001991334:23992-26263( ):SP_2', 'not_intab2', 'not_intab3', 'jcf7180001988059:2889-4542(-):SP_3']

and I iterate over that list like this:

for element in the_list:
    if element in df['COL1']:
        print(element, " in df")
    else:
        print(element, " not in df")

I expect to get the following output:

jcf7180001991334_2-HSPs_ __SP_1 in df 
not_in_tab1 not in df
jcf7180001991334:23992-26263( ):SP_2 in df
not_intab2 not in df
not_intab3 not in df
jcf7180001988059:2889-4542(-):SP_3 in df

But instead none of them are found in the df, and I get:

jcf7180001991334_2-HSPs_ __SP_1 not in df 
not_in_tab1 not in df
jcf7180001991334:23992-26263( ):SP_2 not in df
not_intab2 not in df
not_intab3 not in df
jcf7180001988059:2889-4542(-):SP_3 not in df

I guess it is because of the special characters within the elements, such as parentheses and/or "-"? Does anyone know how to deal with that?

Answer:

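By default, the in operator on a pandas Series checks whether the value is one of the index labels, not one of the stored values. The special characters are not the problem.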
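A minimal sketch of that behavior on a tiny hypothetical Series (the name s and its contents are illustrative only), showing that in matches index labels while .values matches the data:

```python
import pandas as pd

# A small Series whose default index is 0, 1, 2 (a RangeIndex)
s = pd.Series(["a", "b", "c"])

# The in operator on a Series tests the index labels, not the stored values
print(0 in s)    # True: 0 is an index label
print("a" in s)  # False: "a" is a value, not an index label

# To test membership among the values, use .values (a NumPy array)
print("a" in s.values)  # True
```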

So look in the values instead, via df['COL1'].values:

import pandas as pd
data = {
  "COL1": ['jcf7180001991334_2-HSPs_ __SP_1', 'jcf7180001991334:23992-26263( ):SP_2', 'jcf7180001988059:2889-4542(-):SP_3']}

df = pd.DataFrame(data)

the_list=['jcf7180001991334_2-HSPs_ __SP_1', 'not_in_tab1', 'jcf7180001991334:23992-26263( ):SP_2', 'not_intab2', 'not_intab3','jcf7180001988059:2889-4542(-):SP_3'] 

for element in the_list:
    if element in df['COL1'].values:  # check the column's values, not its index
        print(element, " in df")
    else:
        print(element, " not in df")

[Output]

jcf7180001991334_2-HSPs_ __SP_1  in df
not_in_tab1  not in df
jcf7180001991334:23992-26263( ):SP_2  in df
not_intab2  not in df
not_intab3  not in df
jcf7180001988059:2889-4542(-):SP_3  in df
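As an alternative to the explicit loop, pandas also provides Series.isin, which tests every element of a list-like against the column in one vectorized, exact-match call (no regex, so characters like "(" or "-" are compared literally). A minimal sketch, reusing the sample data from this answer; the names candidates and mask are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "COL1": [
        "jcf7180001991334_2-HSPs_ __SP_1",
        "jcf7180001991334:23992-26263( ):SP_2",
        "jcf7180001988059:2889-4542(-):SP_3",
    ]
})

the_list = [
    "jcf7180001991334_2-HSPs_ __SP_1", "not_in_tab1",
    "jcf7180001991334:23992-26263( ):SP_2", "not_intab2",
    "not_intab3", "jcf7180001988059:2889-4542(-):SP_3",
]

# isin returns a boolean Series: True where the element equals some value in COL1
candidates = pd.Series(the_list)
mask = candidates.isin(df["COL1"])

# Report each element together with its membership status
for element, found in zip(candidates, mask):
    print(element, "in df" if found else "not in df")
```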

Answer:

Give this list comprehension a try; it returns the labels in the same order as the_list:

["in df" if x in df['COL1'].values else "not in df" for x in the_list]
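The one-liner above returns only the labels, and each x in df['COL1'].values check scans the whole array. If the column is large (an assumption, not something the question states), one option is to build a Python set of the column once and key the results by element. A minimal sketch; col_values and result are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({"COL1": ["jcf7180001991334_2-HSPs_ __SP_1",
                            "jcf7180001991334:23992-26263( ):SP_2",
                            "jcf7180001988059:2889-4542(-):SP_3"]})

the_list = ["jcf7180001991334_2-HSPs_ __SP_1", "not_in_tab1",
            "jcf7180001991334:23992-26263( ):SP_2", "not_intab2",
            "not_intab3", "jcf7180001988059:2889-4542(-):SP_3"]

# Build the set once; each membership test is then an average O(1) hash lookup
col_values = set(df["COL1"])

# Pair each element with its status (assumes the elements of the_list are unique)
result = {x: ("in df" if x in col_values else "not in df") for x in the_list}

for element, status in result.items():
    print(element, status)
```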
