I have a dataframe and I need to identify values that contain numbers or symbols so that I can eliminate them. Only letters and spaces are allowed. The dataframe is quite large, and what I am trying has no effect:
df.NAME=df.NAME.replace(r"(/^[a-zA-Z\s]*$/)",np.nan,regex=True)
Any suggestions? Thank you
CodePudding user response:
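If you need to keep only the items that consist of letters and spaces, you need a solution based on Series.str.contains, not replace. Your pattern also wraps the expression in /.../ delimiters, which Python treats as literal slashes, so it never matches anything: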
df = df[df['NAME'].str.contains(r"^[a-zA-Z\s]*$", regex=True)]
That will keep only those items in the NAME column that contain nothing but ASCII letters and/or whitespace.
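As a quick check, here is a minimal sketch of that filter on a small made-up dataframe. The sample names are hypothetical, and `na=False` is an addition not in the line above: it makes missing values count as non-matching so the boolean mask can be used for indexing.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"NAME": ["John Smith", "R2D2", "Anna-Maria", "Mary Ann", None]})

# True only where the whole value is ASCII letters and/or whitespace;
# na=False makes missing values count as "no match"
mask = df["NAME"].str.contains(r"^[a-zA-Z\s]*$", regex=True, na=False)

# Keep only the matching rows
filtered = df[mask]
print(filtered["NAME"].tolist())
```

Note that because of the `*` quantifier, an empty string would also pass this check; use `+` instead if empty names should be rejected.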
To support any Unicode letters, you'd need
df = df[df['NAME'].str.contains(r"^(?:[^\W\d_]|\s)*$", regex=True)]
where (?:[^\W\d_]|\s) matches either any Unicode letter (together with most diacritics) or a whitespace character.
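If you would rather blank out the invalid values than drop the rows (as your original attempt with np.nan suggests), the same boolean mask can be passed to Series.where. The following is only a sketch under some assumptions: the sample names and the NAME_CLEAN column are made up, and the column is forced to object dtype so that pandas evaluates the pattern with Python's re module, where \W is Unicode-aware; other string backends may treat \W differently.

```python
import pandas as pd

# Hypothetical sample data; dtype=object so Python's re engine is used
df = pd.DataFrame(
    {"NAME": ["José Álvarez", "Zoë", "R2D2", "under_score", "Müller 3"]},
    dtype=object,
)

# Unicode letters and whitespace only
pattern = r"^(?:[^\W\d_]|\s)*$"
valid = df["NAME"].str.contains(pattern, regex=True, na=False)

# Option 1: drop the rows with invalid names
kept = df[valid]

# Option 2: keep every row, but replace invalid names with NaN
# (NAME_CLEAN is an illustrative column name)
df["NAME_CLEAN"] = df["NAME"].where(valid)

print(kept["NAME"].tolist())
print(df["NAME_CLEAN"].isna().tolist())
```

Option 2 keeps the dataframe's shape intact, which can be handy if other columns in those rows are still needed.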