I have been having trouble removing string values that represent numbers from a list:
["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]
How can I get the list to contain only ["December 31, 2020", "testing"]? I tried a couple of the built-in Python functions, but could not figure out how to get the list I want.
Code:
number_list = ["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]
new_list = []
garbage_list = []
for i in number_list:
    if i.isdigit():
        garbage_list.append(i)
    else:
        new_list.append(i)
print(new_list)
Output: ["December 31, 2020", "10.00%", "$50", "1,452", "testing", "(1)", "(1000)"]
CodePudding user response:
If your only goal is to keep those two elements, you can do:
import string
number_list = ["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]
new_list = []
garbage_list = []
for elem in number_list:
    # keep the element if it contains at least one ASCII letter
    if set(elem.lower()) & set(string.ascii_lowercase):
        new_list.append(elem)
    else:
        garbage_list.append(elem)
print(new_list)
The key idea is that I use the & operator (set intersection) between the set of lowercase letters and the set of characters in the element we have reached in our iteration. If the resulting set is nonempty, the element contains at least one letter, so we keep it in our new list.
Note that string.ascii_lowercase is just the string abcde...xyz.
Output:
['December 31, 2020', 'testing']
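As a minimal sketch of the same intersection test, it can be wrapped in a small helper and used in a list comprehension. The names has_letter and LETTERS are hypothetical and only for illustration; the check covers ASCII letters only.

```python
import string

# Hypothetical constant: the set of ASCII lowercase letters, built once.
LETTERS = set(string.ascii_lowercase)

def has_letter(text):
    """Return True if text contains at least one ASCII letter."""
    # A non-empty intersection means at least one character is a letter.
    return bool(set(text.lower()) & LETTERS)

number_list = ["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]
new_list = [elem for elem in number_list if has_letter(elem)]
print(new_list)
```

This keeps any element containing a letter, so a value such as "500 won" would also be kept.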
CodePudding user response:
It would help if you could describe all the kinds of values that can appear in number_list, such as dates/times (in other formats), currency (in other formats, like 500 won), and so on, and which of them you do or do not want to keep.
I can see that you want to keep a date and a plain string and exclude everything else, which can be done like this:
number_list = ["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]
new_list = []
garbage_list = []
for i in number_list:
    # only check the first two characters for digits
    # (slicing avoids an IndexError on one-character strings)
    if any(ch.isdigit() for ch in i[:2]):
        garbage_list.append(i)
    else:
        new_list.append(i)
print(new_list)
Output:
['December 31, 2020', 'testing']
A more robust solution would require more information about the possible values.
CodePudding user response:
import re
number_list = ["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]
cleaned_number_list = [(i, re.sub(r"[^\w\s.]", "", i)) for i in number_list] # remove special characters except (dot)
# output: [('December 31, 2020', 'December 31 2020'), ('10.00%', '10.00'), ('$50', '50'), ('1,452', '1452'), ('7', '7'), ('testing', 'testing'), ('(1)', '1'), ('(1000)', '1000')]
non_number_list = [i[0] for i in cleaned_number_list if not re.match(r"\d+\.?\d*", i[1])] # keep items whose cleaned form does not start with a number
print(non_number_list)
# output: ['December 31, 2020', 'testing']
It would be better if you could specify which string formats you accept and what exact outcome you want. The question is a little vague, but this should work for your specific example.
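If the numeric formats are limited to the ones in the example, a stricter variant is to match each whole string against a single pattern. This is only a sketch: the pattern NUMERIC and the helper is_numeric_like are hypothetical, and they assume that numbers may carry an optional "(", "$", thousands separators, a decimal part, "%", and ")".

```python
import re

# Hypothetical pattern: optional "(" and "$", digits with optional commas,
# an optional decimal part, then optional "%" and ")".
NUMERIC = re.compile(r"\(?\$?\d[\d,]*(\.\d+)?%?\)?")

def is_numeric_like(text):
    """Return True if the whole string looks like one of the assumed number formats."""
    # fullmatch requires the pattern to cover the entire string,
    # so "December 31, 2020" is not treated as a number.
    return NUMERIC.fullmatch(text) is not None

number_list = ["December 31, 2020", "10.00%", "$50", "1,452", "7", "testing", "(1)", "(1000)"]
non_number_list = [s for s in number_list if not is_numeric_like(s)]
print(non_number_list)
```

Unlike a prefix match, this only removes strings that are numeric from start to end, so the pattern has to be extended for any other formats you want to drop.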