I read a .txt file with syslog entries such as
Oct 3 12:09:01 webv2 CRON[1903]: (root) CMD (sudo /usr/bin/python3 /var/www/security/py_scripts/security_stuff.py 01_report_connections 0 &)
Oct 3 12:09:01 webv2 CRON[1906]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
Oct 3 12:09:03 webv2 systemd[1]: Starting Clean php session files...
...
..
.
into a list named data (length 6800):
data = string.splitlines()
This list should then be filtered with an array of regexes:
regexArray = [
    ['CRON:', [
        'sec_stuff\.py report_cons'
        ,'\[ -x /usr/lib/php/sessionclean \] && if \[ ! -d /run/systemd/system \]; then /usr/lib/php/sessionclean; fi'
        ,'...'
        ,'..'
        ,'.'
        ]
    ],
    [...]
]
using a plain function:
def search_regexStuff(what, strings, regexString = ''):
    if what == 'allgemein':
        return re.findall(r"" + regexString + "", strings)
The problem is that it finds and deletes only part of the matching lines in the data list.
For example, for the regex
sec_stuff\.py report_cons
there are 2069 matching entries, but only 1181 of them get deleted from data. The same happens with the other regexes. For
\[ -x /usr/lib/php/sessionclean \] && if \[ ! -d /run/systemd/system \]; then /usr/lib/php/sessionclean; fi
it finds and deletes 59 of 68.
The goal: I want to shrink data on every pass with pop or del, so that the following searches have less to loop over. Whatever remains in data is written to another file. I can't find the mistake that keeps my code from working. Please help, thanks.
Code:
for b in regexArray:
    for c in b[1]:
        regex = '.*' + b[0][:-1] + '.*' + c + '.*'
        n = -1
        for a in data:
            n += 1
            findLINE = search_regexStuff('allgemein', a, regex)
            if len(findLINE) != 0: # found, returned array is not empty
                del data[n]
                n -= 1
o = ''
for i in data:
    o += i + '\n'
file = open('/folder/file_x.txt','w')
file.write(str(o))
file.close()
UPDATE (solution), thanks to @timus:
I defined an extra function that returns a new data list, which solves the problem:
def cleanMyDataArray(data, regex):
    new_data = []
    for a in data:
        findLINE = search_regexStuff('allgemein', a, regex)
        if len(findLINE) == 0: # not found, so keep the line
            new_data.append(a)
    return new_data
Calling code:
for b in regexArray:
    for c in b[1]:
        regex = '.*' + b[0][:-1] + '.*' + c + '.*'
        data = cleanMyDataArray(data, regex)
That's it.
CodePudding user response:
You're making a classic mistake: you remove items from a list while iterating over it. That tends to go south. Also, modifying a list with del is usually not very efficient, because every deletion has to shift all the items that follow the deleted one.
Example: a list numbers from which you want to remove the even numbers. Your method
numbers = [1, 2, 3, 4, 5]
n = -1
for a in numbers:
    n += 1
    if a % 2 == 0:
        del numbers[n]
        n -= 1
print(numbers)
results in [1, 4, 5], which is obviously wrong. Why does that happen?
- Step 1: The first number is odd, so nothing happens, and n becomes 0.
- Step 2: The next item, 2, is even, so the list gets modified to [1, 3, 4, 5], and n stays 0.
- Step 3: Now the iteration grabs the item with index 2, which is 4. Since it is even and n is 0 + 1 == 1, the item 3 gets removed, the list is now [1, 4, 5], and n stays 0.
- Step 4: The remaining list has length 3, so there is no item with index 3, and the iteration stops.
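The steps above can be replayed in code. The following is only a sketch: the helper name remove_evens_buggy and the trace list are made up for illustration, and the loop body is the same as in the example, applied to a copy of the input.

```python
def remove_evens_buggy(numbers):
    """Replay the faulty loop on a copy and record every deletion."""
    numbers = list(numbers)  # work on a copy so the caller's list is untouched
    trace = []
    n = -1
    for a in numbers:
        n += 1
        if a % 2 == 0:
            removed = numbers[n]  # the item that del actually removes
            del numbers[n]
            n -= 1
            trace.append((a, removed, list(numbers)))
    return numbers, trace


result, trace = remove_evens_buggy([1, 2, 3, 4, 5])
for seen, removed, state in trace:
    print(f"iterator saw {seen}, del removed {removed}, list is now {state}")
print(result)
```

The second recorded deletion shows the mismatch: the iterator is looking at 4, but the index n points at 3, so the wrong item is removed.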
How to fix this? Create a new list, ideally with a list comprehension:
numbers = [1, 2, 3, 4, 5]
numbers = [a for a in numbers if not a % 2 == 0]
print(numbers)
Result: [1, 3, 5]
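Applied to log lines, the same idea might look like the sketch below. The function name drop_matching and the pattern CRON.*sessionclean are assumptions chosen for illustration, and the sample lines are the ones quoted in the question.

```python
import re


def drop_matching(lines, pattern):
    """Return a new list with every line that matches pattern left out."""
    regex = re.compile(pattern)
    # re.search finds the pattern anywhere in the line, so no leading
    # or trailing '.*' is needed
    return [line for line in lines if not regex.search(line)]


sample = [
    "Oct 3 12:09:01 webv2 CRON[1903]: (root) CMD (sudo /usr/bin/python3 "
    "/var/www/security/py_scripts/security_stuff.py 01_report_connections 0 &)",
    "Oct 3 12:09:01 webv2 CRON[1906]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] "
    "&& if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)",
    "Oct 3 12:09:03 webv2 systemd[1]: Starting Clean php session files...",
]

kept = drop_matching(sample, r"CRON.*sessionclean")
print(len(kept))
```

If other names refer to the same list object and must see the change, assigning to a slice (data[:] = drop_matching(data, pattern)) replaces the contents in place without deleting during iteration.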
Beyond that: You could use the pattern
for b in regexArray:
    regex = b[0][:-1] + '.*(' + '|'.join(b[1]) + ').*'
instead of iterating over b[1]. But there are some odd parts in your use of regex:
- Why do you close your patterns with .*?
- Why do you use re.findall instead of re.search?
- Do you really use '...', '..', '.' instead of '\.\.\.' etc.? A pattern like CRON.*....* matches anything that has at least 3 characters after CRON. Is that what you want?
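Putting these points together, a combined filter could be sketched as follows. The names regex_array, build_pattern and filter_lines are hypothetical, and the two sub-patterns are adapted to the sample lines from the question (the question's own patterns appear to be shortened), so treat this as an illustration rather than a drop-in replacement.

```python
import re

# Assumed structure: [prefix, [alternatives]], mirroring regexArray
regex_array = [
    ["CRON:", [
        r"security_stuff\.py 01_report_connections",
        r"\[ -x /usr/lib/php/sessionclean \]",
    ]],
]


def build_pattern(prefix, alternatives):
    """Compile one pattern per group: prefix, anything, one of the alternatives."""
    # prefix[:-1] drops the trailing ':' as in the question's code
    return re.compile(prefix[:-1] + ".*(?:" + "|".join(alternatives) + ")")


def filter_lines(lines, groups):
    """Keep only the lines that match none of the group patterns."""
    patterns = [build_pattern(prefix, alts) for prefix, alts in groups]
    return [line for line in lines
            if not any(p.search(line) for p in patterns)]


sample = [
    "Oct 3 12:09:01 webv2 CRON[1903]: (root) CMD (sudo /usr/bin/python3 "
    "/var/www/security/py_scripts/security_stuff.py 01_report_connections 0 &)",
    "Oct 3 12:09:01 webv2 CRON[1906]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] "
    "&& if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)",
    "Oct 3 12:09:03 webv2 systemd[1]: Starting Clean php session files...",
]

kept = filter_lines(sample, regex_array)
print(kept)
```

Each group is compiled once and every line is scanned once per group, instead of once per sub-pattern, and re.search stops at the first match rather than collecting all of them.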