Home > Blockchain >  Drop pandas column with constant alphanumeric values
Drop pandas column with constant alphanumeric values

Time:04-05

I have a dataframe df that contains around 2 million records. Some of the columns contain only alphanumeric values (e.g. "wer345", "gfer34", "123fdst").

Is there a pythonic way to drop those columns (e.g. using isalnum())?

CodePudding user response:

Apply Series.str.isalnum column-wise to mask all the alphanumeric values of the DataFrame. Then use DataFrame.all to find the columns that only contain alphanumeric values. Invert the resulting boolean Series to select only the columns that contain at least one non-alphanumeric value.

is_alnum_col = df.apply(lambda col: col.str.isalnum()).all()
res = df.loc[:, ~is_alnum_col]

Example

import pandas as pd

df = pd.DataFrame({
    'a': ['aas', 'sd12', '1232'],
    'b': ['sdds', 'nnm!!', 'ab-2'],
    'c': ['sdsd', 'asaas12', '12.34'],
})

is_alnum_col = df.apply(lambda col: col.str.isalnum()).all()
res = df.loc[:, ~is_alnum_col]

Output:

>>> df

      a      b        c
0   aas   sdds     sdsd
1  sd12  nnm!!  asaas12
2  1232   ab-2    12.34

>>> df.apply(lambda col: col.str.isalnum())

      a      b      c
0  True   True   True
1  True  False   True
2  True  False  False

>>> is_alnum_col

a     True
b    False
c    False
dtype: bool

>>> res

       b        c
0   sdds     sdsd
1  nnm!!  asaas12
2   ab-2    12.34
  • Related