I have a DataFrame (df) as shown below, and I want to drop the records that contain special characters or numbers.
INPUT
df
A       B
ASR     IN
33AB    ST
AS_TY   YT
45 TYY  IN
TY HG   SG
TRD     US
YTR     WS
EXPECTED OUTPUT
A B
ASR IN
TRD US
YTR WS
How can this be achieved in a pandas DataFrame?
CodePudding user response:
This is a good job for a regex.
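To filter on column A only (str.match anchors at the start of the string, and the trailing $ anchors the end):
out = df[df['A'].str.match('(?i)[a-z]+$')]
To require letters only in all columns:
out = df[df.apply(lambda c: c.str.match('(?i)[a-z]+$')).all(axis=1)]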
output:
A B
0 ASR IN
5 TRD US
6 YTR WS
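For reference, here is a self-contained sketch of the all-columns filter on the question's data. The DataFrame is rebuilt by hand from the INPUT table (the split of "45 TYY IN" and "TY HG SG" into columns is an assumption), and it uses str.fullmatch, which should be available in pandas 1.1 and later, in place of str.match with a trailing $:

```python
import pandas as pd

# Question's data, retyped by hand (column split is assumed)
df = pd.DataFrame({
    "A": ["ASR", "33AB", "AS_TY", "45 TYY", "TY HG", "TRD", "YTR"],
    "B": ["IN", "ST", "YT", "IN", "SG", "US", "WS"],
})

# For each column, test whether the whole string is ASCII letters only,
# then keep the rows where every column passes.
mask = df.apply(lambda col: col.str.fullmatch(r"[A-Za-z]+")).all(axis=1)
out = df[mask]
print(out)
```

The original index labels (0, 5, 6) are kept; call out.reset_index(drop=True) if a fresh index is wanted.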
CodePudding user response:
You can check each value with str.isalpha:
df[df.A.astype(str).str.isalpha()]
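One thing to keep in mind: str.isalpha follows Python's Unicode rules, so accented or non-Latin letters also count as alphabetic. If only A-Z and a-z should pass, a regex is stricter. A small sketch on made-up values (the sample Series is illustrative, not from the question):

```python
import pandas as pd

# Illustrative values; "\u00c9T\u00c9" is "ETE" with accented E's
s = pd.Series(["ASR", "33AB", "AS_TY", "\u00c9T\u00c9", ""])

unicode_letters = s.str.isalpha()                 # Unicode-aware check
ascii_letters = s.str.fullmatch(r"[A-Za-z]+")     # ASCII letters only

print(unicode_letters.tolist())
print(ascii_letters.tolist())
```

Both checks reject the empty string, digits, and underscores; they differ only on the accented value.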
CodePudding user response:
You can use a regex to keep only the rows that match. Python regexes do not take /.../ delimiters, so pass the bare pattern:
df[df.A.str.contains('^[A-Za-z]+$', regex=True, na=False)]
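Because str.contains searches anywhere in the string, the ^ and $ anchors are what make this a whole-value check. A short sketch of the difference, on sample values chosen for illustration (the None entry is there to show the effect of na=False):

```python
import pandas as pd

# Illustrative values, including a missing entry
s = pd.Series(["ASR", "33AB", "AS_TY", None])

# Without anchors: True if any run of letters appears somewhere
unanchored = s.str.contains(r"[A-Za-z]+", regex=True, na=False)

# With anchors: True only if the entire value is letters
anchored = s.str.contains(r"^[A-Za-z]+$", regex=True, na=False)

print(unanchored.tolist())
print(anchored.tolist())
```

With na=False, missing values are treated as non-matching, so they are dropped by the filter rather than raising an error in the boolean indexing.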
CodePudding user response:
One way to do this is to use a helper column that masks the rows containing non-alphanumeric characters. Note that isalnum() keeps values with digits; use isalpha() instead if rows containing numbers must also be dropped:
import pandas as pd
# dummy example
df = pd.DataFrame({'A' : ['ABC', 'acfx', 'a34xxf_', 'a_9R', 'rty']})
# create helper column: True where the value is alphanumeric only
df['mask'] = df['A'].str.isalnum()
# keep the rows that contain only alphanumeric characters, then drop the helper
df[df['mask']].drop(columns='mask')
>>>
A
0 ABC
1 acfx
4 rty
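Applied to column A from the question, the two checks give different results, since isalnum accepts digits. A minimal sketch without the helper column (the data is retyped from the question's INPUT table, and the column split is an assumption):

```python
import pandas as pd

# Column A from the question, retyped by hand
df = pd.DataFrame({"A": ["ASR", "33AB", "AS_TY", "45 TYY", "TY HG", "TRD", "YTR"]})

# isalnum: letters and digits allowed, so "33AB" survives
alnum_rows = df[df["A"].str.isalnum()]

# isalpha: letters only, which matches the expected output
alpha_rows = df[df["A"].str.isalpha()]

print(alnum_rows)
print(alpha_rows)
```

Using the boolean Series directly as the indexer avoids adding and then dropping a temporary column.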