Remove strings with punctuation from list

I am learning Python and I am extracting some data from a CSV file. My goal is to put one column, for example 'names', into a list. I already know how to do that part.

After that, I would like to delete the incorrect names from the list. For example:

lists = [Anderson, Byrne, Clark, Cooper, Davies, Evans,  Miller, Moore, Thom@s, W!lson]

and I want :

lists = [Anderson, Byrne, Clark, Cooper, Davies, Evans, Moore]

Because some names contain @ or !. I don't want to replace the character, I want to delete those names entirely.

I tried a classic if/else, but it didn't work.

CodePudding user response：

You can use str.isalpha:

names = [
    'Anderson',
    'Byrne',
    'Clark',
    'Cooper',
    'Davies',
    'Evans',
    ' Miller',
    'Moore',
    'Thom@s',
    'W!lson'
]

only_names = [n for n in names if n.isalpha()]
# only_names = list(filter(str.isalpha, names))

print(only_names)
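
Note that ' Miller' is dropped by this check because of its leading space, and that str.isalpha also accepts non-ASCII letters. If names with stray surrounding whitespace should be kept instead, one option is to strip each string before testing it. A minimal sketch, assuming that behaviour is wanted (the question's expected output actually omits Miller); keep_alpha_names is a hypothetical helper name, not something from the original answer:

```python
def keep_alpha_names(names):
    """Return the names that are purely alphabetic after trimming whitespace."""
    cleaned = (n.strip() for n in names)  # drop leading/trailing whitespace
    return [n for n in cleaned if n.isalpha()]  # '' is rejected by isalpha()


names = ['Anderson', ' Miller', 'Thom@s', 'W!lson', '']
print(keep_alpha_names(names))
# ['Anderson', 'Miller']
```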

CodePudding user response：

Assuming all elements in the list are strings, you can get the result you want with:

names = [
    'Anderson',
    'Byrne',
    'Clark',
    'Cooper',
    'Davies',
    'Evans',
    ' Miller',
    'Moore',
    'Thom@s',
    'W!lson'
]

names_filtered = [
    name for name in names if ''.join(c for c in name if c.isalpha()) == name
]
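
The comparison above keeps a name only when removing its non-letter characters leaves it unchanged. One edge case may matter for CSV data: an empty string passes this check, whereas str.isalpha rejects it. A small sketch to illustrate the difference; join_check is a hypothetical name for the condition used in the comprehension:

```python
def join_check(name):
    """True when dropping non-letters leaves the string unchanged."""
    return ''.join(c for c in name if c.isalpha()) == name


# Columns: sample, result of join_check, result of str.isalpha
for sample in ['Moore', 'Thom@s', ' Miller', '']:
    print(repr(sample), join_check(sample), sample.isalpha())
# 'Moore' True True
# 'Thom@s' False False
# ' Miller' False False
# '' True False
```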

CodePudding user response：

You can use a regular expression for this. Python's standard library provides a regular expression module called re.

For example, assume a valid name consists only of ASCII letters. You can combine a list comprehension with re to keep only the valid names:

import re

names = [
    'Anderson',
    'Byrne',
    'Clark',
    'Cooper',
    'Davies',
    'Evans',
    ' Miller',
    'Moore',
    'Thom@s',
    'W!lson'
]

# fullmatch with + requires at least one letter and nothing else in the string
only_names = [n for n in names if re.fullmatch(r"[A-Za-z]+", n)]

print(only_names)
# ['Anderson', 'Byrne', 'Clark', 'Cooper', 'Davies', 'Evans', 'Moore']

For more information, check the re module documentation.
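
Two edge cases are worth checking with any pattern of this kind: the empty string and a string ending in a newline, both of which can turn up in data read from a file. A minimal sketch comparing a `*`/`$` pattern under re.match with a `+` pattern under re.fullmatch; the variable names loose and strict are only for illustration:

```python
import re

loose = re.compile(r"^[A-Za-z]*$")  # zero or more letters; $ also matches before a final newline
strict = re.compile(r"[A-Za-z]+")   # one or more letters, applied with fullmatch below

# Columns: sample, loose.match result, strict.fullmatch result
for sample in ["Moore", "", "Moore\n", "Thom@s"]:
    print(repr(sample), bool(loose.match(sample)), bool(strict.fullmatch(sample)))
# 'Moore' True True
# '' True False
# 'Moore\n' True False
# 'Thom@s' False False
```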

CodePudding user response：

For filtering valid names:

names = ["john", "mich@el", "suraj"]
valid_names = [name for name in names if name.isalpha()]
print(valid_names)

Output: ['john', 'suraj']

As asked by OP in the comments below, for filtering valid emails with valid usernames:

emails = ["[email protected]", "[email protected]", "[email protected]"]
valid_emails = []
allowed = "._"

for email in emails:
    username = email.split("@")[0]

    for ch in allowed:
        username = username.replace(ch, "")
    
    if username.isalnum():
        valid_emails.append(email)

print(valid_emails)

Output: ['[email protected]', '[email protected]']
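
The same username check can be wrapped in a function so it is easy to reuse and test. This is only a sketch of the loop above: has_valid_username is a hypothetical name, and the sample addresses are made up for illustration, not taken from the original post.

```python
def has_valid_username(email, allowed="._"):
    """Check that the part before '@' is alphanumeric once allowed characters are removed."""
    username = email.split("@")[0]
    for ch in allowed:
        username = username.replace(ch, "")
    return username.isalnum()  # False for an empty username


# Hypothetical sample addresses
emails = ["john.doe@example.com", "mich!ael@example.com", "suraj_k@example.com"]
valid_emails = [e for e in emails if has_valid_username(e)]
print(valid_emails)
# ['john.doe@example.com', 'suraj_k@example.com']
```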

Hope this helps!

CodePudding user response：

This code runs without creating an additional list. It modifies the list in place with a for loop, iterating backwards so that deleting an item does not shift the indexes of the items still to be checked:

lst = ['Anderson', 'Byrne', 'Clark', 'Cooper', 'Davies', 'Evans', ' Miller', 'Moore', 'Thom@s', 'W!lson']
for ind in range(len(lst) -1, -1, -1):
    if not lst[ind].isalpha():
        del lst[ind]
print(lst)
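
Another way to keep the same list object is slice assignment. Unlike the loop above, this sketch does build a temporary list, so it does not save memory; it only guarantees that other references to the list see the filtered result. The name alias is just for illustration:

```python
lst = ['Anderson', 'Byrne', ' Miller', 'Moore', 'Thom@s', 'W!lson']
alias = lst  # second reference to the same list object

# Replace the contents of the existing list instead of rebinding the name
lst[:] = [name for name in lst if name.isalpha()]

print(lst)
# ['Anderson', 'Byrne', 'Moore']
print(alias is lst)
# True
```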