Negative lookbehind when filtering pandas columns

Consider this simple example

import pandas as pd

df = pd.DataFrame({'good_one' : [1,2,3],
                   'bad_one' : [1,2,3]})

Out[7]: 
   good_one  bad_one
0         1        1
1         2        2
2         3        3

In this artificial example I would like to keep only the columns that do NOT start with bad. I can apply a regex to the column names using .filter(), but I am not able to make it work with a negative lookbehind.

See here

df.filter(regex = 'one')
Out[8]: 
   good_one  bad_one
0         1        1
1         2        2
2         3        3

but now

df.filter(regex = '(?<!bad).*')
Out[9]: 
   good_one  bad_one
0         1        1
1         2        2
2         3        3

does not do anything. Am I missing something?

Thanks

CodePudding user response:

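A lookbehind such as (?<!bad) inspects the text before the current position. At the start of a column name nothing precedes it, so the assertion always succeeds there and .* then matches the whole name. Use a negative lookahead anchored at the start instead.

To remove columns whose names start with bad: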

df = pd.DataFrame({'good_one' : [1,2,3],
                   'not_bad_one' : [1,2,3],
                   'bad_one' : [1,2,3]})


#https://stackoverflow.com/a/5334825/2901002
df1 = df.filter(regex=r'^(?!bad).*$')
print (df1)
   good_one  not_bad_one
0         1            1
1         2            2
2         3            3

^ asserts position at the start of the string (here, the column name)

(?!bad) is a negative lookahead: it asserts that the characters at the current position are not bad

. matches any character except a newline

* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)

$ asserts position at the end of the string
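
To see how the two patterns behave on the column names themselves, here is a minimal sketch using only the re module. It assumes that .filter(regex=...) keeps a label whenever re.search finds a match in it, which is how the pandas documentation describes it. The names columns, kept_by_lookbehind and kept_by_lookahead are made up for this illustration.

```python
import re

# Column names from the example DataFrame above
columns = ['good_one', 'not_bad_one', 'bad_one']

# Assumption: DataFrame.filter(regex=...) keeps a label
# when re.search finds a match somewhere in that label.
lookbehind = re.compile(r'(?<!bad).*')   # pattern from the question
lookahead = re.compile(r'^(?!bad).*$')   # pattern from this answer

kept_by_lookbehind = [c for c in columns if lookbehind.search(c)]
kept_by_lookahead = [c for c in columns if lookahead.search(c)]

print(kept_by_lookbehind)  # ['good_one', 'not_bad_one', 'bad_one']
print(kept_by_lookahead)   # ['good_one', 'not_bad_one']
```

The lookbehind version keeps every name because the search already succeeds at position 0, where there is no preceding text to reject.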

To remove all columns whose names contain bad anywhere:

df2 = df.filter(regex=r'^(?!.*bad).*$')
print (df2)
   good_one
0         1
1         2
2         3

^ asserts position at the start of the string (here, the column name)

(?!.*bad) is a negative lookahead: it asserts that bad does not occur anywhere after the current position

Inside the lookahead, .* matches any run of characters and bad matches the characters bad literally

. matches any character except a newline

* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)

$ asserts position at the end of the string
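
As a sanity check, both patterns can be compared against plain string tests on the same column names. This is only a sketch on the three sample names, not a proof for arbitrary input, and the variable names are hypothetical.

```python
import re

columns = ['good_one', 'not_bad_one', 'bad_one']

starts_with_bad = re.compile(r'^(?!bad).*$')   # rejects names that start with "bad"
contains_bad = re.compile(r'^(?!.*bad).*$')    # rejects names with "bad" anywhere

for name in columns:
    # Each regex result should agree with the equivalent plain string check.
    assert bool(starts_with_bad.search(name)) == (not name.startswith('bad'))
    assert bool(contains_bad.search(name)) == ('bad' not in name)

no_bad_prefix = [c for c in columns if not c.startswith('bad')]
no_bad_anywhere = [c for c in columns if 'bad' not in c]

print(no_bad_prefix)    # ['good_one', 'not_bad_one']
print(no_bad_anywhere)  # ['good_one']
```

If a regex is not strictly needed, the same selection could probably be written on the DataFrame with the string accessor, for example df.loc[:, ~df.columns.str.startswith('bad')] or df.loc[:, ~df.columns.str.contains('bad')].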