Python Clean Specific Elements in List of Lists

Time:12-01

I have a noisy list with 3 different rows that looks like this:

array = ['Apple Mug Seaweed Wallet Toilet Bear Toy Key Alcohol Paper',
         'cup, egg, pillow, leash, banana, raindrop, phone, animal, shirt, basket', 
         '1. Dog 2. America 3. Notebook 4. Moisturizer  5. ADHD 6. Balloon  7. Contacts 8. Blanket 9. Home 10. Pencil']

How do I process each row to normalize the strings, removing any "," as well as any numerical values, so that the list looks like this:

array = ['Apple Mug Seaweed Wallet Toilet Bear Toy Key Alcohol Paper',
         'cup egg pillow leash banana raindrop phone animal shirt basket', 
         'Dog America Notebook Moisturizer ADHD Balloon Contacts Blanket Home Pencil']

I have tried:

for i in array:
    for j in i:
        j.strip(" ")
        j.strip(",")

However, I am confused about the order in which to store the words back into their specific rows. Thanks.

CodePudding user response:

One approach is to use a list comprehension (optional) and a regular expression whose pattern keeps only alphabetical characters (e.g. [a-zA-Z]+, meaning one or more alphabetic characters).

The str.join() method combines the list of matches returned by re.findall() into a single string, which becomes one element of the output list.

For example:

import re

out = [' '.join(re.findall('[a-zA-Z]+', i)) for i in array]

Output:

['Apple Mug Seaweed Wallet Toilet Bear Toy Key Alcohol Paper',
 'cup egg pillow leash banana raindrop phone animal shirt basket',
 'Dog America Notebook Moisturizer ADHD Balloon Contacts Blanket Home Pencil']
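If the list comprehension hides the "order and sequence" part of the question, the same idea can be written as an explicit loop. This is only a sketch: clean_rows is a hypothetical helper name, not something from the question, and it assumes every word consists of plain ASCII letters. Note that str.strip() returns a new string rather than changing the original, which is why the attempt in the question has no visible effect; here each cleaned row is appended to a new list instead.

```python
import re

array = [
    'Apple Mug Seaweed Wallet Toilet Bear Toy Key Alcohol Paper',
    'cup, egg, pillow, leash, banana, raindrop, phone, animal, shirt, basket',
    '1. Dog 2. America 3. Notebook 4. Moisturizer  5. ADHD 6. Balloon  7. Contacts 8. Blanket 9. Home 10. Pencil',
]


def clean_rows(rows):
    """Return a new list with one cleaned string per input row.

    Hypothetical helper for illustration; assumes words are ASCII letters only.
    """
    cleaned = []
    for row in rows:
        # Collect every run of one or more letters; commas, digits,
        # periods and extra spaces are simply never matched.
        words = re.findall(r'[a-zA-Z]+', row)
        # Rebuild the row with single spaces and keep it at the same position.
        cleaned.append(' '.join(words))
    return cleaned


out = clean_rows(array)
```

Because one string is appended per input row, out keeps the rows in their original order. One caveat with this pattern: a word containing an apostrophe or hyphen would be split into two words, and accented letters would be dropped.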