Replace by NaN if string contains digits or symbols

Time:08-11

I have a dataframe and I need to identify values that contain digits or symbols in order to eliminate them. Only letters and spaces are allowed. The dataframe is quite large, and what I have tried has no effect:

df.NAME=df.NAME.replace(r"(/^[a-zA-Z\s]*$/)",np.nan,regex=True)

Any suggestions? Thank you

CodePudding user response:

If you need to keep only the items made up of letters and spaces, you need a solution based on Series.str.contains, not replace:

df['NAME'] = df['NAME'].where(df['NAME'].str.contains(r"^[a-zA-Z\s]*$", regex=True, na=False))

That keeps the values in the NAME column that contain only ASCII letters and/or whitespace; every other value becomes NaN.
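
As a minimal sketch of the ASCII-only check, assuming the goal is to replace invalid names with NaN rather than drop the rows: the sample data and the `mask` variable below are made up for illustration; only the NAME column comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the NAME column name comes from the question.
df = pd.DataFrame({"NAME": ["John Smith", "R2D2", "Anna-Marie", "Bob", np.nan]})

# True where the whole value consists of ASCII letters and whitespace.
# na=False makes missing values count as non-matching instead of yielding NaN.
mask = df["NAME"].str.contains(r"^[a-zA-Z\s]*$", regex=True, na=False)

# Series.where keeps values where mask is True and puts NaN everywhere else.
df["NAME"] = df["NAME"].where(mask)
```

If the rows should be removed instead of blanked, the same mask can be used for boolean indexing, e.g. `df = df[mask]`.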

To support any Unicode letters, you'd need

df['NAME'] = df['NAME'].where(df['NAME'].str.contains(r"^(?:[^\W\d_]|\s)*$", regex=True, na=False))

where (?:[^\W\d_]|\s) matches either any Unicode letter (together with most diacritics) or a whitespace char.
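
To see how the two patterns differ, here is a small sketch on invented names. The Series contents and variable names are assumptions for illustration. The Series is created with dtype=object on the assumption that matching should go through Python's re module, where \w is Unicode-aware for str patterns.

```python
import pandas as pd

# Hypothetical names; the second and third contain non-ASCII letters.
# dtype=object so that matching goes through Python's re module.
names = pd.Series(["Maria", "Søren", "Мария Иванова", "Tom_3"], dtype=object)

# ASCII letters and whitespace only.
ascii_only = names.str.contains(r"^[a-zA-Z\s]*$", regex=True)

# Any Unicode letter or whitespace: [^\W\d_] is "word char, minus digits and underscore".
any_letter = names.str.contains(r"^(?:[^\W\d_]|\s)*$", regex=True)

# Keep values made of letters and whitespace, replace the rest with NaN.
cleaned = names.where(any_letter)
```

Here the ASCII pattern accepts only "Maria", while the Unicode pattern also accepts "Søren" and "Мария Иванова"; "Tom_3" is rejected by both because of the underscore and the digit.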
