How do I drop pandas dataframe columns that contain special characters such as @ / ] [ } { - _ etc.?
For example, I have the following dataframe (called df):
I need to drop the columns Name and Matchkey because they contain some special characters.
Also, how can I specify a list of special characters based on which the columns will be dropped?
For example: I'd like to drop the columns that contain (in any record, in any cell) any of the following special characters:
listOfSpecialCharacters: ¬,`,!,",£,$,£,#,/,\
CodePudding user response:
One option is to use a regex with str.contains and apply, then use boolean indexing to drop the columns:
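import re
# Characters to look for (the backslash must be escaped in a Python string)
chars = '¬`!"£$£#/\\'
# Build a character class, escaping any regex metacharacters
regex = f'[{"".join(map(re.escape, chars))}]'
# '[¬`!"£\\$£\\#/\\\\]'
# Keep only the columns where no cell matches; astype(str) avoids errors on non-string columns
df2 = df.loc[:, ~df.apply(lambda c: c.astype(str).str.contains(regex).any())]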
Example:
# input
A B C
0 123 12! 123
1 abc abc a¬b
# output
A
0 123
1 abc
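For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same approach. The helper name drop_columns_with_chars and the extra numeric column D are assumptions made for illustration and are not part of the original answer. The cast with astype(str) is also an addition, so that non-string columns can be checked without raising on the .str accessor.

```python
import re

import pandas as pd

# Characters from the question's list; the backslash is escaped for Python.
chars = '¬`!"£$#/\\'
# Build a character class; re.escape protects regex metacharacters such as $ and \.
regex = '[' + ''.join(map(re.escape, chars)) + ']'


def drop_columns_with_chars(df: pd.DataFrame, pattern: str) -> pd.DataFrame:
    """Return df without the columns in which any cell matches pattern.

    Hypothetical helper, named here for illustration only.
    """
    # Cast each column to str so numeric columns do not fail on .str.
    has_match = df.apply(
        lambda col: col.astype(str).str.contains(pattern, regex=True).any()
    )
    # Boolean indexing on the column axis: keep columns without a match.
    return df.loc[:, ~has_match]


df = pd.DataFrame({
    'A': ['123', 'abc'],
    'B': ['12!', 'abc'],
    'C': ['123', 'a¬b'],
    'D': [1, 2],  # numeric column, assumed here to exercise the astype(str) cast
})

df2 = drop_columns_with_chars(df, regex)
print(list(df2.columns))  # ['A', 'D']
```

To drop on a different set of characters, change chars and rebuild regex; the rest stays the same.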