Python code doesn't delete all matching entries from a list when filtering with a regex search function


I read a text file containing syslog entries such as

Oct  3 12:09:01 webv2 CRON[1903]: (root) CMD (sudo /usr/bin/python3 /var/www/security/py_scripts/security_stuff.py 01_report_connections 0 &)
Oct  3 12:09:01 webv2 CRON[1906]: (root) CMD (  [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
Oct  3 12:09:03 webv2 systemd[1]: Starting Clean php session files...

into a list named data (initial length about 6800):

data = string.splitlines()

This list should be filtered using a list of regexes:

regexArray = [
  ['CRON:', [
     r'sec_stuff\.py report_cons'
    ,r'\[ -x /usr/lib/php/sessionclean \] && if \[ ! -d /run/systemd/system \]; then /usr/lib/php/sessionclean; fi'
    # ...
  ]],
]

The search itself goes through an ordinary function:

import re

def search_regexStuff(what, strings, regexString = ''):
  if what == 'allgemein':  # 'allgemein' is German for 'general'
    return re.findall(regexString, strings)

The problem is that it finds and deletes only some of the lines in the data list that match each regex.

For example, for the regex

sec_stuff\.py report_cons

there are 2069 matching entries, but only 1181 of them are deleted from the data list. The other regexes show the same problem. For

\[ -x /usr/lib/php/sessionclean \] && if \[ ! -d /run/systemd/system \]; then /usr/lib/php/sessionclean; fi

it finds and deletes 59 of 68.

The goal is to shrink the data list on every pass (with pop or del) so that the following searches have fewer lines to scan. Whatever remains in the data list is then written to another file. I can't find the mistake that keeps my code from working. Please help, thanks.


for b in regexArray:
  for c in b[1]:
    regex = '.*' + b[0][:-1] + '.*' + c + '.*'
    n = -1
    for a in data:
      n += 1
      findLINE = search_regexStuff('allgemein', a, regex)
      if len(findLINE) != 0: # returned list is not empty, so the line matched
        del data[n]
        n -= 1
o = ''
for i in data:
  o += i + '\n'
file = open('/folder/file_x.txt','w')

UPDATE (solution), thanks to @timus:

I defined an extra function that returns a new data list, which solves the problem:

def cleanMyDataArray(data, regex):
  new_data = []
  for a in data:
    findLINE = search_regexStuff('allgemein', a, regex)
    if len(findLINE) == 0: # not found, so keep the line
      new_data.append(a)
  return new_data


for b in regexArray:
  for c in b[1]:
    regex = '.*' + b[0][:-1] + '.*' + c + '.*'
    data = cleanMyDataArray( data, regex)

That's it.

Answer:

You're making a classic mistake: you remove items from a list while iterating over it. That tends to go south. Also, deleting from the middle of a list with del is usually not very efficient, because every item after the deleted one has to be shifted down.

Example: A list numbers from which you want to remove the even numbers. Your method

numbers = [1, 2, 3, 4, 5]
n = -1
for a in numbers:
    n += 1
    if a % 2 == 0:
        del numbers[n]
        n -= 1

results in [1, 4, 5], which is obviously wrong. Why does that happen:

  1. Step: The first number is odd, so nothing happens, and n becomes 0.
  2. Step: The next item 2 is even, so the list gets modified to [1, 3, 4, 5], and n stays 0.
  3. Step: Now the iteration grabs the item with index 2, which is 4 (the item 3 is skipped). Since it is even and n is 0 + 1 == 1, the item 3 gets removed, the list is [1, 4, 5] now, and n stays 0.
  4. Step: The remaining list has length 3, so no item with index 3, thus the iteration stops.
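To watch these steps happen, here is a small sketch that runs the same loop and records what the iteration actually sees. The `trace` list is not part of the original code; it is only added here for illustration.

```python
numbers = [1, 2, 3, 4, 5]
trace = []  # illustration only: (item seen by the loop, list after that step)
n = -1
for a in numbers:
    n += 1
    if a % 2 == 0:
        del numbers[n]
        n -= 1
    trace.append((a, list(numbers)))

for seen, state in trace:
    print(seen, state)
# 1 [1, 2, 3, 4, 5]
# 2 [1, 3, 4, 5]
# 4 [1, 4, 5]
```

The loop never looks at 3 at all: the iterator keeps its own internal position, which moves forward even though the list has shrunk underneath it.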

How to fix this? Create a new list, ideally with a list comprehension:

numbers = [1, 2, 3, 4, 5]
numbers = [a for a in numbers if not a % 2 == 0]

Result: [1, 3, 5]
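Applied to log lines, the same idea might look like the sketch below. The sample lines and the single pattern are made up for illustration, and `re.search` is used on the assumption that only a yes/no answer per line is needed.

```python
import re

# Hypothetical sample lines, standing in for the real syslog data.
data = [
    "Oct  3 12:09:01 webv2 CRON[1903]: (root) CMD (sec_stuff.py report_cons)",
    "Oct  3 12:09:01 webv2 CRON[1904]: (root) CMD (sec_stuff.py report_cons)",
    "Oct  3 12:09:03 webv2 systemd[1]: Starting Clean php session files...",
]

pattern = re.compile(r"CRON.*sec_stuff\.py report_cons")

# Keep only the lines that do NOT match. Two matching lines in a row,
# which the del-based loop would partly skip, are no problem here.
data = [line for line in data if not pattern.search(line)]

print(data)
```

If other parts of the program hold a reference to the same list object, `data[:] = [...]` replaces the contents in place instead of rebinding the name.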

Beyond that: You could use the pattern

for b in regexArray:
    regex = b[0][:-1] + '.*(' + '|'.join(b[1]) + ').*'

instead of iterating over b[1]. But there are some odd parts in your use of regex:

  • Why do you close your patterns with .*?
  • Why do you use re.findall instead of re.search?
  • Do you really use '...', '..', '.' instead of '\.\.\.' etc.? A pattern like CRON.*....* matches anything that has at least 3 characters after CRON -- is that what you want?
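A minimal sketch that puts the combined pattern and the points above together. The `regexArray` here is a shortened, hypothetical stand-in for the one in the question, the sample lines are invented, and the pattern is a slight variation: it uses a non-capturing group `(?:...)` and drops the trailing `.*`, since `re.search` already looks anywhere in the line.

```python
import re

# Shortened, hypothetical stand-in for the question's regexArray.
regexArray = [
    ['CRON:', [
        r'sec_stuff\.py report_cons',
        r'\[ -x /usr/lib/php/sessionclean \]',
    ]],
]

# Invented sample lines.
data = [
    "Oct  3 12:09:01 webv2 CRON[1903]: (root) CMD (sec_stuff.py report_cons)",
    "Oct  3 12:09:01 webv2 CRON[1906]: (root) CMD (  [ -x /usr/lib/php/sessionclean ] && true)",
    "Oct  3 12:09:03 webv2 systemd[1]: Starting Clean php session files...",
]

for b in regexArray:
    # One alternation per group, so data is scanned once per group
    # instead of once per pattern.
    pattern = re.compile(b[0][:-1] + '.*(?:' + '|'.join(b[1]) + ')')
    data = [line for line in data if not pattern.search(line)]

print(data)  # only the systemd line is left

# Unescaped dots match any character, escaped dots only literal dots.
cron_line = "Oct  3 12:09:01 webv2 CRON[1903]: (root) CMD (echo hi)"
loose = re.search(r"CRON.*...", cron_line) is not None
strict = re.search(r"CRON.*\.\.\.", cron_line) is not None
print(loose, strict)  # True False
```

`re.search` returns a match object or `None` and stops at the first hit, which is all a keep-or-drop decision needs; `re.findall` builds a list of every match first.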