I have a list called "list" that contains about 20K strings, and I need to remove every element that contains "text": "".
I'm building a new, clean list like this:
clean_list = []
for i in list:
    if '"text": ""' in i == False:
        clean_list.append(i)
        print(i)
But nothing gets appended and clean_list stays empty. What could be the problem? Something seems to be wrong with the loop.
How else can I remove certain elements from the list?
CodePudding user response:
This doesn't work because in and == are both comparison operators, so Python chains them instead of evaluating the in test first:
>>> '"text": ""' in "foo" == False
False
>>> ('"text": ""' in "foo") == False
True
Using in ... == False is awkward and un-Pythonic in any case; it's better to use the more natural not in ...:
>>> '"text": ""' not in "foo"
True
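Applied to the loop from the question, the fix could look like the sketch below. The sample strings and the name `items` are made up for illustration; they stand in for the real 20K-element list.

```python
# Hypothetical sample data standing in for the strings from the question.
items = [
    '{"id": 1, "text": ""}',
    '{"id": 2, "text": "hello"}',
    '{"id": 3, "text": "world"}',
]

clean_list = []
for item in items:
    # Keep only the strings that do NOT contain the empty-text marker.
    if '"text": ""' not in item:
        clean_list.append(item)

print(clean_list)
```

Only the second and third strings survive, because the first one contains the exact substring being filtered out.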
CodePudding user response:
First, you should not name variables after built-ins like list. It is not a reserved keyword, but assigning to it shadows the built-in type.
For your use case you could use a list comprehension:
clean_list = [string for string in list if '"text": ""' not in string]
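As a rough sketch, the same comprehension can be wrapped in a small helper so the list no longer has to be called `list`. The function name `remove_empty_text`, its `marker` parameter, and the sample data are all hypothetical.

```python
def remove_empty_text(strings, marker='"text": ""'):
    """Return a new list without the strings that contain `marker`."""
    return [s for s in strings if marker not in s]


# Hypothetical sample data for illustration only.
sample = [
    '{"id": 1, "text": ""}',
    '{"id": 2, "text": "hello"}',
]

clean_list = remove_empty_text(sample)
print(clean_list)
```

The comprehension builds a new list and leaves the original untouched, which avoids the pitfalls of removing elements from a list while iterating over it.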
CodePudding user response:
if '"text": ""' in i == False:
Don't use that syntax. The == False comparison is unnecessary (and looks awkward), and in this specific case it actually causes the problem you're having.
Use this syntax instead:
if '"text": ""' not in i:
If you want to know why this happens, keep reading.
This problem is due to comparison chaining.
When you have an expression that contains two (or more) comparison operators, such as this:
a < b < c
Python treats that expression as if you had typed this (except that b is evaluated only once):
a < b and b < c
In your example, in and == are both comparison operators, so Python treated your expression as though you had typed this:
if '"text": ""' in i and i == False:
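The first part is true only for strings that contain the pattern, and the second part, i == False, is never true for a string. So the expression as a whole is always false, and nothing is ever appended.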
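The equivalence can be checked directly. The following is a small sketch with made-up sample strings; it compares the chained form, its expanded `and` form, and the parenthesized form for one string with the pattern and one without.

```python
needle = '"text": ""'

# Hypothetical sample strings: one with the pattern, one without.
samples = ['x "text": "" y', 'foo']

results = []
for s in samples:
    chained = needle in s == False             # what the question wrote
    expanded = (needle in s) and (s == False)  # how Python evaluates it
    grouped = (needle in s) == False           # what was intended
    results.append((chained, expanded, grouped))

for s, row in zip(samples, results):
    print(repr(s), row)
```

The chained and expanded forms agree and are false for both strings, while the parenthesized form is true for the string that lacks the pattern.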
CodePudding user response:
You shouldn't use built-in names as variable names in Python.
s = 'This is a sample "text": "" and it should not have it.'
if r'"text": ""' in s:
    print("Found.")
The output will be `Found.`
Building on this, you can use:
clean_list = [i for i in list if r'"text": ""' not in i]
# This creates a new list containing only the items `i` that do not contain the pattern '"text": ""'.
# The r prefix marks a raw string, which only changes how backslashes are treated; this pattern has no backslashes, so the prefix is optional here.
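To show what the r prefix does and does not change, here is a small check. The variable names are made up for illustration.

```python
# Without backslashes, a raw string and a normal string are identical.
plain = '"text": ""'
raw = r'"text": ""'
print(plain == raw)

# With a backslash they differ: '\n' is one newline character,
# while r'\n' is a backslash followed by the letter n.
newline_len = len('\n')
raw_len = len(r'\n')
print(newline_len, raw_len)
```

For this particular pattern either spelling works, since the raw prefix only matters once backslashes are involved.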