I have a dataframe df
that contains around 2 million records.
Some of the columns contain only alphanumeric values (e.g. "wer345", "gfer34", "123fdst").
Is there a pythonic way to drop those columns (e.g. using isalnum()
)?
CodePudding user response:
Apply Series.str.isalnum
column-wise to mask all the alphanumeric values of the DataFrame. Then use DataFrame.all
to find the columns that only contain alphanumeric values. Invert the resulting boolean Series to select only the columns that contain at least one non-alphanumeric value.
is_alnum_col = df.apply(lambda col: col.str.isalnum()).all()
res = df.loc[:, ~is_alnum_col]
Example
import pandas as pd
df = pd.DataFrame({
'a': ['aas', 'sd12', '1232'],
'b': ['sdds', 'nnm!!', 'ab-2'],
'c': ['sdsd', 'asaas12', '12.34'],
})
is_alnum_col = df.apply(lambda col: col.str.isalnum()).all()
res = df.loc[:, ~is_alnum_col]
Output:
>>> df
a b c
0 aas sdds sdsd
1 sd12 nnm!! asaas12
2 1232 ab-2 12.34
>>> df.apply(lambda col: col.str.isalnum())
a b c
0 True True True
1 True False True
2 True False False
>>> is_alnum_col
a True
b False
c False
dtype: bool
>>> res
b c
0 sdds sdsd
1 nnm!! asaas12
2 ab-2 12.34