How to remove records that are not purely alphabetic in a specific column of a pandas DataFrame?

Time:06-14

I have a DataFrame (df) as shown below, and I want to drop the records whose value in column A contains anything other than letters (special characters, numbers, or spaces).

INPUT

df

 A       B
 ASR     IN
 33AB    ST
 AS_TY   YT
 45 TYY  IN
 TY HG   SG
 TRD     US
 YTR     WS

EXPECTED OUTPUT

 A    B
 ASR  IN
 TRD  US
 YTR  WS

How can this be achieved in a pandas DataFrame?

CodePudding user response:

This is a good job for a regex.

If you want to use only A:

out = df[df['A'].str.match('(?i)[a-z]+$')]

For all columns:

out = df[df.apply(lambda c: c.str.match('(?i)[a-z]+$')).all(axis=1)]

output:

     A   B
0  ASR  IN
5  TRD  US
6  YTR  WS
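
For reference, here is a self-contained sketch that rebuilds the question's data and applies the same idea. It swaps in str.fullmatch (available in pandas 1.1 and later) for match with a trailing $. The DataFrame construction is an assumption based on the table in the question, with '45 TYY' and 'TY HG' treated as single values in column A.

```python
import pandas as pd

# Data reconstructed from the question (the column split is assumed)
df = pd.DataFrame({
    'A': ['ASR', '33AB', 'AS_TY', '45 TYY', 'TY HG', 'TRD', 'YTR'],
    'B': ['IN', 'ST', 'YT', 'IN', 'SG', 'US', 'WS'],
})

pattern = r'[A-Za-z]+'

# Keep rows where column A consists only of ASCII letters
out_a = df[df['A'].str.fullmatch(pattern)]

# Keep rows where every column consists only of ASCII letters
out_all = df[df.apply(lambda col: col.str.fullmatch(pattern)).all(axis=1)]

print(out_a)
```

Because fullmatch requires the whole value to match, the pattern needs no anchors.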

CodePudding user response:

Use str.isalpha:

df[df.A.astype(str).str.isalpha()]
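
One thing to be aware of: isalpha follows Unicode rules, so accented and non-Latin letters also count as alphabetic. If only A-Z and a-z should pass, a regex is stricter. A small sketch of the difference, using made-up sample values that are not from the question:

```python
import pandas as pd

# Hypothetical sample values, chosen to show the Unicode behaviour
s = pd.Series(['ASR', 'Ünal', '33AB'])

# True for any Unicode letters, including 'Ü'
unicode_mask = s.str.isalpha()

# True only for ASCII letters A-Z / a-z
ascii_mask = s.str.fullmatch(r'[A-Za-z]+')

print(pd.DataFrame({'value': s, 'isalpha': unicode_mask, 'ascii_only': ascii_mask}))
```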

CodePudding user response:

You can use a regex match to do the same, for example:

df[df.A.str.contains(r'^[A-Za-z]+$', regex=True, na=False)]
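
The ^ and $ anchors matter here because str.contains searches anywhere in the string. A minimal sketch of the difference, using three values taken from the question's column A:

```python
import pandas as pd

s = pd.Series(['ASR', '33AB', 'AS_TY'])

# No anchors: True whenever at least one letter occurs anywhere in the value
unanchored = s.str.contains(r'[A-Za-z]+', regex=True, na=False)

# Anchored: True only when the whole value is made of letters
anchored = s.str.contains(r'^[A-Za-z]+$', regex=True, na=False)

print(pd.DataFrame({'value': s, 'unanchored': unanchored, 'anchored': anchored}))
```

Note that Python regex patterns are written without the surrounding / delimiters used in some other languages.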

CodePudding user response:

A simple way to do this is to use a helper column to mask the rows that contain non-alphanumeric characters, as follows:

import pandas as pd

# dummy example
df = pd.DataFrame({'A' : ['ABC', 'acfx', 'a34xxf_', 'a_9R', 'rty']})

# create helper column
df['mask'] = df['A'].str.isalnum()

# get the rows that contain only alphanumeric characters
df[df['mask']].drop(columns='mask')

>>>
    A
0   ABC
1   acfx
4   rty
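
Note that isalnum also accepts digits, so on the question's data it would keep 33AB. A short sketch comparing it with isalpha on the question's column A (values reconstructed from the question's table, which is an assumption about how the columns split), filtering directly without a helper column:

```python
import pandas as pd

# Column A from the question (assumed split of the table)
a = pd.Series(['ASR', '33AB', 'AS_TY', '45 TYY', 'TY HG', 'TRD', 'YTR'], name='A')

# isalnum: letters and digits are both allowed
alnum_kept = a[a.str.isalnum()].tolist()

# isalpha: letters only, which matches the expected output in the question
alpha_kept = a[a.str.isalpha()].tolist()

print(alnum_kept)
print(alpha_kept)
```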
