How to remove certain number values from a list in Python?

I have been having trouble removing certain string values that are numbers from a list:

["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]

How can I get the list to only show ["December 31, 2020", "testing"]? I tried a couple of the built-in Python functions, but could not figure out how to get the list I want.

Code:

number_list = ["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]
new_list = []
garbage_list = []
for i in number_list:
    if i.isdigit():
        garbage_list.append(i)
    else:
        new_list.append(i)
print(new_list)

Output: ["December 31, 2020", "10.00%", "$50", "1,452", "testing", "(1)", "(1000)"]

CodePudding user response：

If your only goal is to keep those two elements, you can do:

import string
number_list = ["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]
new_list = []
garbage_list = []

for elem in number_list:
    # keep the element if it shares at least one character with a-z
    if set(elem.lower()) & set(string.ascii_lowercase):
        new_list.append(elem)
    else:
        garbage_list.append(elem)

print(new_list)

The key idea is to use the & operator (set intersection) between the set of characters in the current element (lowercased) and the set of lowercase ASCII letters. If the intersection is nonempty, the element contains at least one letter, so we keep it in new_list.

Note that string.ascii_lowercase is just the string abcde...xyz.

Output:

['December 31, 2020', 'testing']
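The same "keep anything that contains a letter" test can also be written without sets. The following is only a sketch: has_letter is a hypothetical helper name, and str.isalpha() also accepts non-ASCII letters, so it is slightly broader than the string.ascii_lowercase check above.

```python
def has_letter(text):
    """Return True if text contains at least one alphabetic character."""
    return any(ch.isalpha() for ch in text)


number_list = ["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]

# Split the input into kept and discarded elements, preserving order.
new_list = [elem for elem in number_list if has_letter(elem)]
garbage_list = [elem for elem in number_list if not has_letter(elem)]

print(new_list)
```

For the sample list this keeps the same two elements as the set-based version.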

CodePudding user response：

It would help if you described what kinds of values can appear in number_list, such as dates and times (in other formats) or currency (in other formats, like 500 won), and which of them you do or do not want to keep.

I can see that you want to keep a date and a plain string and exclude everything else, which can be done like this:

number_list = ["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]
new_list = []
garbage_list = []
for i in number_list:
    # only check the first or second character for digits;
    # slicing avoids an IndexError on one-character strings
    if any(ch.isdigit() for ch in i[:2]):
        garbage_list.append(i)
    else:
        new_list.append(i)
print(new_list)

Output:

['December 31, 2020', 'testing']

A more robust solution would require more information.
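As one possible direction, here is a sketch that treats a string as numeric if it parses as a float once common decorations are stripped. The helper name looks_numeric and the set of stripped characters ($, %, comma, parentheses) are assumptions based only on the sample list, not a general rule.

```python
def looks_numeric(text):
    """Return True if text parses as a float after removing $ % , ( )."""
    # Delete the decoration characters, then trim surrounding whitespace.
    stripped = text.translate(str.maketrans("", "", "$%,()")).strip()
    try:
        float(stripped)
    except ValueError:
        return False
    return True


number_list = ["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]
new_list = [item for item in number_list if not looks_numeric(item)]
print(new_list)
```

Note that float() also accepts strings such as "inf" or "1e5", so this would classify those as numbers too.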

CodePudding user response：

import re

number_list = ["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]

# remove every character that is not a word character, whitespace, or a dot
cleaned_number_list = [(i, re.sub(r"[^\w\s.]", "", i)) for i in number_list]
# output: [('December 31, 2020', 'December 31 2020'), ('10.00%', '10.00'), ('$50', '50'), ('1,452', '1452'), ('7', '7'), ('testing', 'testing'), ('(1)', '1'), ('(1000)', '1000')]

# keep the original string only if its cleaned form does not start with a number
non_number_list = [i[0] for i in cleaned_number_list if not re.match(r"\d+\.?\d*", i[1])]
print(non_number_list)
# output: ['December 31, 2020', 'testing']

It would be better if you could specify which string formats you accept and exactly what outcome you want. The question is a little vague, but this should work for your specific example.
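If the formats can be pinned down, a stricter variant is to match the whole string against an explicit pattern instead of only its start. This is a sketch: the NUMBER_LIKE pattern is an assumption covering just the shapes in the sample list (optional parentheses, optional $, digits with commas, optional decimals, optional %).

```python
import re

# Hypothetical pattern for "number-like" strings, based on the sample data.
NUMBER_LIKE = re.compile(r"""
    \(?            # optional opening parenthesis
    \$?            # optional dollar sign
    \d[\d,]*       # digits, possibly with thousands separators
    (?:\.\d+)?     # optional decimal part
    %?             # optional percent sign
    \)?            # optional closing parenthesis
""", re.VERBOSE)

number_list = ["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]

# fullmatch requires the entire string to fit the pattern.
non_number_list = [s for s in number_list if not NUMBER_LIKE.fullmatch(s)]
print(non_number_list)
```

Because fullmatch must consume the entire string, a value like "7 days" is kept, whereas a re.match on the prefix would discard it.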