How do I filter columns of a data frame not containing a given string in their label?
DataFrame.filter can, for example, select all columns of a data frame whose labels contain a given string.
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(([1, 2, 3], [4, 5, 6])),
    columns=['beat', 'meat', 'street']
)
df.filter(like="eat", axis=1)  # yields the columns "beat" and "meat"
Is there a way to invert this logic, so that I keep only those columns whose label does not contain "eat"? Alternatively: is there a way to drop the columns containing "eat"?
CodePudding user response:
Use the regex parameter. The negative lookahead (?!.*eat) rejects any label that contains "eat" anywhere:
print(df.filter(regex=r'^(?!.*eat).*$'))
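If a regex feels heavy for this, a boolean mask over the column labels might be easier to read. The following is only a sketch of that alternative, not part of the answer above. It assumes all column labels are strings, and the names has_eat, kept and dropped are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(([1, 2, 3], [4, 5, 6])),
    columns=['beat', 'meat', 'street']
)

# True for every column label that contains the literal substring "eat"
# (regex=False treats "eat" as plain text, not as a pattern)
has_eat = df.columns.str.contains("eat", regex=False)

# Keep only the columns whose label does NOT contain "eat"
kept = df.loc[:, ~has_eat]

# Same result, phrased as "drop the matching columns"
dropped = df.drop(columns=df.columns[has_eat])
```

Both kept and dropped should contain the same columns as the regex version; which one to use is mostly a matter of taste.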
CodePudding user response:
Based on @jezrael's answer, one could parameterize the solution like this:
import re

def neg_filter(df, not_like, axis):
    """Only keep labels from axis that satisfy `not_like not in label`."""
    # re.escape makes regex metacharacters in not_like match literally
    pattern = r"^(?!.*" + re.escape(not_like) + r").*$"
    return df.filter(regex=pattern, axis=axis)
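To see why escaping the search string matters, here is a small sketch comparing an escaped and an unescaped pattern. The frame df_dots and its labels are hypothetical and chosen only so that one label contains a regex metacharacter.

```python
import re
import pandas as pd

# Hypothetical labels: "a.b" contains a literal dot, "axb" does not
df_dots = pd.DataFrame([[1, 2, 3]], columns=["a.b", "axb", "c"])

# Unescaped: "." matches any character, so "axb" is excluded as well
unescaped = df_dots.filter(regex=r"^(?!.*a.b).*$", axis=1)

# Escaped: only labels containing the literal text "a.b" are excluded
escaped = df_dots.filter(regex=r"^(?!.*" + re.escape("a.b") + r").*$", axis=1)

print(list(unescaped.columns))
print(list(escaped.columns))
```

Without re.escape the filter silently removes more columns than intended whenever the search string contains characters such as ".", "(" or "+".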