How to find columns that contain a punctuation mark as a single value in Python Pandas?

Time:12-24

I have a DataFrame like the one below:

COL1 | COL2 | COL3
-----|------|--------
abc  | P    | 123
b.bb | ,    | 22
  1  | B    | 2
...  |...   | ...

I need to find the columns in which at least one value consists only of a single punctuation mark, such as !"#$%&'()* ,-./:;<=>?@[]^_`{|}~

As a result I need something like the output below (only COL2; COL1 also contains a punctuation mark, but there it is mixed with other characters).

COL2 
-------
 P    
 ,    
 B   
... 

CodePudding user response:

Using a regex with str.fullmatch and any:

import re

chars = '''!"#$%&'()* ,-./:;<=>?@[]^_`{|}~'''
pattern = f'[{re.escape(chars)}]'
# [!"\#\$%\&'\(\)\*\ ,\-\./:;<=>\?@\[\]\^_`\{\|\}\~]

out = df.loc[:, df.astype(str).apply(lambda s: s.str.fullmatch(pattern).any())]

Or with isin:

out = df.loc[:, df.isin(set(chars)).any()]

Output:

  COL2
0    P
1    ,
2    B
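
For anyone who wants to run both approaches end to end, here is a minimal, self-contained sketch. The DataFrame is reconstructed from the three rows shown in the question, and the cell types are an assumption (COL1 and COL2 as strings, COL3 as integers). The names mask_regex, mask_isin, out_regex and out_isin are only illustrative.

```python
import re

import pandas as pd

# Sample data reconstructed from the question; the dtypes are assumed.
df = pd.DataFrame({
    "COL1": ["abc", "b.bb", "1"],
    "COL2": ["P", ",", "B"],
    "COL3": [123, 22, 2],
})

chars = '''!"#$%&'()* ,-./:;<=>?@[]^_`{|}~'''
pattern = f"[{re.escape(chars)}]"

# Regex approach: a column qualifies if at least one cell, cast to str,
# is exactly one character from `chars`.
mask_regex = df.astype(str).apply(lambda s: s.str.fullmatch(pattern).any())

# isin approach: a column qualifies if at least one cell equals one of the characters.
mask_isin = df.isin(list(chars)).any()

out_regex = df.loc[:, mask_regex]
out_isin = df.loc[:, mask_isin]

print(out_regex.columns.tolist())
print(out_isin.columns.tolist())
```

For this sample both masks should keep only COL2, since "b.bb" in COL1 is not a full match for a single character.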

CodePudding user response:

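punc = set("!\"#$%&'()* ,-./:;<=>?@[]^_`{|}~")
# Cast to str first: set(x) raises TypeError on non-string cells such as the integers in COL3.
# applymap is deprecated since pandas 2.1; DataFrame.map is its replacement there.
df.loc[:, df.astype(str).applymap(lambda x: set(x).issubset(punc)).any()]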
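
Note that a subset test like the one above is true for any string made up only of punctuation, so it also accepts values such as "..." and the empty string. If exactly one punctuation character is required, a stricter check is possible. The sketch below is one way to do it. The helper names only_punct and single_punct, the extra column COL4, and the sample data are assumptions for illustration, not part of the answer.

```python
import pandas as pd

punc = set("!\"#$%&'()* ,-./:;<=>?@[]^_`{|}~")


def only_punct(x):
    """Subset test: True when every character of str(x) is in punc."""
    return set(str(x)).issubset(punc)


def single_punct(x):
    """Stricter test: True only for exactly one character taken from punc."""
    s = str(x)
    return len(s) == 1 and s in punc


# Sample data; COL4 is a hypothetical column added to show the difference.
df = pd.DataFrame({
    "COL1": ["abc", "b.bb", "1"],
    "COL2": ["P", ",", "B"],
    "COL3": [123, 22, 2],
    "COL4": ["x", "...", ""],
})

# Names of columns where at least one cell passes each test.
loose = [c for c in df.columns if df[c].map(only_punct).any()]
strict = [c for c in df.columns if df[c].map(single_punct).any()]

print(loose)
print(strict)
```

With this data the subset test also picks up COL4 because of "..." and the empty string, while the stricter test keeps only COL2.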