Negation / inversion of Python pandas DataFrame.filter

Time:03-26

How do I filter columns of a data frame not containing a given string in their label?

DataFrame.filter can, for example, select all columns of a data frame whose labels contain a given string.

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(([1, 2, 3], [4, 5, 6])),
    columns=['beat', 'meat', 'street']
)
df.filter(like="eat", axis=1)  # yields the columns "beat" and "meat"

Is there a way to invert this logic, so that I keep only those columns whose labels do not contain "eat"? Alternatively: is there a way to drop the columns containing "eat"?

CodePudding user response:

Use the regex parameter with a negative lookahead, which matches only labels that do not contain "eat":

print(df.filter(regex=r'^(?!.*eat).*$'))
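
To make the behaviour concrete, here is a minimal sketch that reuses the frame from the question. The regex is tested against each label, and the lookahead `(?!.*eat)` rejects any label in which "eat" occurs anywhere. The two alternatives (a boolean mask over the labels, and dropping whatever `filter(like=...)` selects) are not part of the answer above; they are shown only as options that should give the same result, and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(([1, 2, 3], [4, 5, 6])),
    columns=["beat", "meat", "street"],
)

# Negative lookahead: keep labels in which "eat" does not occur anywhere.
kept_regex = df.filter(regex=r"^(?!.*eat).*$")

# Alternative 1 (assumption, not from the answer): boolean mask over the
# column labels, using a literal substring match instead of a regex.
kept_mask = df.loc[:, ~df.columns.str.contains("eat", regex=False)]

# Alternative 2 (assumption, not from the answer): drop the columns that
# filter(like=...) would have selected.
kept_drop = df.drop(columns=df.filter(like="eat").columns)

print(list(kept_regex.columns))  # ['street']
```

The mask and drop variants avoid regex escaping altogether, which may be easier to read when the substring contains special characters.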

CodePudding user response:

Based on @jezrael's answer, one could parameterize the solution like this:

import re

def neg_filter(df, not_like, axis):
    """Keep only the labels along `axis` that do not contain `not_like`."""
    # re.escape makes sure `not_like` is matched literally, not as a regex.
    pattern = r"^(?!.*" + re.escape(not_like) + r").*$"
    return df.filter(regex=pattern, axis=axis)
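
A short usage sketch, with the helper restated so the snippet runs on its own. The row labels "row.a" and "rowXa" are invented here purely to show why `re.escape` matters: without it, the "." in ".a" would act as a wildcard and both rows would be dropped.

```python
import re

import pandas as pd


def neg_filter(df, not_like, axis):
    """Keep only the labels along `axis` that do not contain `not_like`."""
    pattern = r"^(?!.*" + re.escape(not_like) + r").*$"
    return df.filter(regex=pattern, axis=axis)


# Hypothetical frame: columns from the question, made-up row labels.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["row.a", "rowXa"],
    columns=["beat", "meat", "street"],
)

cols = neg_filter(df, "eat", axis=1)  # filter on column labels
rows = neg_filter(df, ".a", axis=0)   # filter on row labels, "." is literal

print(list(cols.columns))  # ['street']
print(list(rows.index))    # ['rowXa']
```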