So I have a list of words below:
list = ['cyber attacks, 28','cyber attacks. A', 'cyber attacks, intrusions', 'cyber attacks; 3','cyber attack. Our', 'cyber intrusions, data']
What I want to do is to remove the phrases in the list if the third word has more than three characters. So the final list would be:
new_list = ['cyber attacks, 28','cyber attacks. A', 'cyber attacks; 3','cyber attack. Our']
This is what I have so far, but it still includes phrases where the last word has more than three characters:
new_list = []
for phrase in list:
    max_three_char = re.match('cyber\s\w{1,}(\.|,|;|\)|\/|:|"|])\s\w{,3}', phrase)
    if max_three_char:
        new_list.append(phrase)
CodePudding user response:
You could use a list comprehension as in
import re
lst = ['cyber attacks, 28','cyber attacks. A', 'cyber attacks, intrusions', 'cyber attacks; 3','cyber attack. Our', 'cyber intrusions, data']
pattern = re.compile(r'\W+')  # split on runs of non-word characters (punctuation, spaces)
new_lst = [item
           for item in lst
           for splitted in [pattern.split(item)]
           if not (len(splitted) > 2 and len(splitted[2]) > 3)]
print(new_lst)
Which would yield
['cyber attacks, 28', 'cyber attacks. A', 'cyber attacks; 3', 'cyber attack. Our']
Don't name your variables after built-ins like list, etc.
CodePudding user response:
No need for regex, you can use str.split:
if len(my_phrase.split()[2]) <= 3:
    ...  # process my_phrase
This works since there are spaces between the words.
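As a rough sketch of that idea applied to the whole list: the names phrases and third_word_is_short are made up for illustration, and dropping phrases with fewer than three words is an added assumption, not something the question specifies.

```python
phrases = ['cyber attacks, 28', 'cyber attacks. A', 'cyber attacks, intrusions',
           'cyber attacks; 3', 'cyber attack. Our', 'cyber intrusions, data']


def third_word_is_short(phrase, limit=3):
    """Return True if the third whitespace-separated word has at most `limit` characters."""
    words = phrase.split()
    # Assumption: a phrase with fewer than three words is rejected
    # instead of raising IndexError on words[2].
    return len(words) >= 3 and len(words[2]) <= limit


short = [p for p in phrases if third_word_is_short(p)]
print(short)
```

Note that any punctuation stays attached to the second word ('attacks,'), so it never affects the length of the third one.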
CodePudding user response:
I would do:
import re
li = ['cyber attacks, 28','cyber attacks. A', 'cyber attacks, intrusions', 'cyber attacks; 3','cyber attack. Our', 'cyber intrusions, data']
>>> [s for s in li if re.search(r'(?<=\W)\w{1,3}$', s)]
['cyber attacks, 28', 'cyber attacks. A', 'cyber attacks; 3', 'cyber attack. Our']
Or, if you can count on a space delimiter:
>>> [s for s in li if len(s.split()[-1])<=3]
# same
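For what it's worth, the end anchor is the part that matters. re.match only anchors at the start of the string, so \w{,3} in the question's pattern is satisfied by the first three letters of a longer word. A minimal sketch of the question's own pattern using re.fullmatch instead, assuming every phrase has the shape "cyber <word><punctuation> <word>" (the punctuation set is copied from the question, and keep is a made-up name):

```python
import re

li = ['cyber attacks, 28', 'cyber attacks. A', 'cyber attacks, intrusions',
      'cyber attacks; 3', 'cyber attack. Our', 'cyber intrusions, data']

# Same structure as the pattern in the question, with the punctuation
# alternation written as a character class and \w{1,3} for the last word.
pattern = re.compile(r'cyber\s\w+[.,;)/:"\]]\s\w{1,3}')

# fullmatch requires the whole string to match, so a long last word
# cannot be matched by just its first three letters.
keep = [s for s in li if pattern.fullmatch(s)]
print(keep)
```

Appending $ to the pattern and keeping re.match would have the same effect for these inputs.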
CodePudding user response:
Since your separator is a space, you don't need regex; Python's built-in str.split() is enough.
ls = ['cyber attacks, 28', 'cyber attacks. A', 'cyber attacks, intrusions', 'cyber attacks; 3', 'cyber attack. Our',
      'cyber intrusions, data']
def my_filter(i) -> bool:
    sub_str = i.split(' ', 2)
    if len(sub_str) < 3:
        return False
    return len(sub_str[2]) <= 3
print([i for i in ls if my_filter(i)])