Issue while filtering out a string with special characters and numbers in Pandas

Time:06-20

I have a data frame (location) as shown below. I have also pasted my current code below; it filters out all records containing numbers and special characters.

My issue arises when there is a space character between words, e.g. NEWYORK CITY or NEW YORK CITY. I do not want to filter out rows that only contain a space between words.

INPUT

location.head(8)

    CITY             COUNTRY
    AGNIN34          FR
    (REYDON)         GB
    MARSCIANO        IT
    SANXIANG TOWN    CN
    SIZIANO          IT
    APELDOORN        NL
    REYDON           GB
    NEWYORK CITY     US

My current code:

out = location[location.apply(lambda c: c.str.match('(?i)[a-z]+$')).all(1)]

Expected Output

    CITY             COUNTRY
    MARSCIANO        IT
    SANXIANG TOWN    CN
    SIZIANO          IT
    APELDOORN        NL
    REYDON           GB
    NEWYORK CITY     US

How can this be done?

CodePudding user response:

Try str.match with a character class that also allows spaces:

out = location[location.CITY.astype(str).str.match('^[a-zA-Z ]*$')]
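As a minimal sketch of how this behaves, the snippet below rebuilds the sample frame from the question (the construction of location is an assumption; only its printed head is shown in the question) and applies the same str.match filter. Note that * also accepts an empty string; use + instead if empty city names should be dropped.

```python
import pandas as pd

# Sample data reconstructed from the question's location.head(8) output.
location = pd.DataFrame({
    "CITY": ["AGNIN34", "(REYDON)", "MARSCIANO", "SANXIANG TOWN",
             "SIZIANO", "APELDOORN", "REYDON", "NEWYORK CITY"],
    "COUNTRY": ["FR", "GB", "IT", "CN", "IT", "NL", "GB", "US"],
})

# Keep rows whose CITY contains only ASCII letters and spaces.
# The space inside the character class is what lets multi-word names through.
mask = location["CITY"].astype(str).str.match(r"^[a-zA-Z ]*$")
out = location[mask]

print(out)
```

Rows such as AGNIN34 and (REYDON) are dropped because a digit or parenthesis falls outside the character class, while SANXIANG TOWN and NEWYORK CITY are kept.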

CodePudding user response:

Use str.contains with na=False, so that missing values are treated as non-matches:

out = location[location["CITY"].str.contains(r'^[A-Za-z ]+$', na=False, regex=True)]
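The effect of na=False can be sketched with a small frame that contains a missing city. The data here is hypothetical and not from the question, and the pattern is assumed to be ^[A-Za-z ]+$ (one or more letters or spaces).

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing CITY value, for illustration only.
location = pd.DataFrame({
    "CITY": ["AGNIN34", "NEW YORK CITY", np.nan, "REYDON"],
    "COUNTRY": ["FR", "US", "GB", "GB"],
})

# na=False makes the missing value count as "no match", so the mask is
# purely boolean and can be used directly for row selection.
mask = location["CITY"].str.contains(r"^[A-Za-z ]+$", na=False, regex=True)
out = location[mask]

print(out)
```

Because the pattern is anchored with ^ and $, str.contains behaves like a full-string match here; the row with the missing city is excluded rather than propagating a missing value into the mask.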