Python code doesn't delete all matching entries from a list when filtering with a regex search function

Time:10-25

I read a text file containing syslog entries like these:

Oct  3 12:09:01 webv2 CRON[1903]: (root) CMD (sudo /usr/bin/python3 /var/www/security/py_scripts/security_stuff.py 01_report_connections 0 &)
Oct  3 12:09:01 webv2 CRON[1906]: (root) CMD (  [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
Oct  3 12:09:03 webv2 systemd[1]: Starting Clean php session files...
...
..
.

into a list named data (length 6800):

data = string.splitlines()

which should be filtered by a list of regexes:

regexArray = [
  ['CRON:', [
     r'sec_stuff\.py report_cons'
    ,r'\[ -x /usr/lib/php/sessionclean \] && if \[ ! -d /run/systemd/system \]; then /usr/lib/php/sessionclean; fi'
    ,'...'
    ,'..'
    ,'.'
    ]
  ],
  [...] 
]

using a plain helper function:

import re

def search_regexStuff(what, strings, regexString = ''):
  if what == 'allgemein':
    return re.findall(regexString, strings)

The problem is that, for each regex, it finds and deletes only some of the matching lines in the data list.

For example, for the regex:

sec_stuff\.py report_cons

I have 2069 matching entries, but only 1181 of them get deleted from the data list. The same problem occurs with the other regexes. For:

\[ -x /usr/lib/php/sessionclean \] && if \[ ! -d /run/systemd/system \]; then /usr/lib/php/sessionclean; fi

it finds and deletes 59 of 68.

The goal is to shrink the data list on each pass, using pop or del, to speed up the search loop. Whatever remains in the data list is then written to another file. I can't find the mistake that keeps my code from working. Please help, thanks.

code:

for b in regexArray:
  for c in b[1]:
    regex = '.*' + b[0][:-1] + '.*' + c + '.*'
    n = -1
    for a in data:
      n += 1
      findLINE = search_regexStuff('allgemein', a, regex)
      if len(findLINE) != 0: # returned list is not empty, so the line matched
        del data[n]
        n -= 1
o = ''
for i in data:
  o += i + '\n'
file = open('/folder/file_x.txt','w')
file.write(str(o))
file.close()

UPDATE (solution), thanks to @timus:

I defined an extra function that returns a new data list, which solves the problem:

def cleanMyDataArray( data, regex):
  new_data = []
  for a in data:
    findLINE = search_regexStuff('allgemein', a, regex)
    if len(findLINE) == 0: # not found, so keep the line
      new_data.append( a )
  return new_data

Calling code:

for b in regexArray:
  for c in b[1]:
    regex = '.*' + b[0][:-1] + '.*' + c + '.*'
    data = cleanMyDataArray( data, regex)

That's it.

CodePudding user response:

You're making a classic mistake: You remove items from a list while iterating over it. That tends to go south. Also, modifying a list with del is usually not very efficient.

Example: Take a list numbers from which you want to remove the even numbers. Your method

numbers = [1, 2, 3, 4, 5]
n = -1
for a in numbers:
    n += 1
    if a % 2 == 0:
        del numbers[n]
        n -= 1
print(numbers)

results in [1, 4, 5], which is obviously wrong. Why does that happen:

  1. Step: The first number is odd, so nothing happens, and n becomes 0.
  2. Step: The next item 2 is even and n is 1, so the item at index 1 gets deleted, the list becomes [1, 3, 4, 5], and n is set back to 0.
  3. Step: Now the iteration grabs the item with index 2, which is 4, so the item 3 is never looked at. Since 4 is even and n is 0 + 1 == 1, the item at index 1, which is 3, gets removed. The list is [1, 4, 5] now, and n goes back to 0.
  4. Step: The remaining list has length 3, so there is no item with index 3, and the iteration stops.
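
To make the skipped items visible, here is a small sketch that repeats the same loop but also records which items the for loop actually visits. The function name remove_evens_in_place and the list seen are only illustrative helpers, not part of your code.

```python
def remove_evens_in_place(numbers):
    """Run the buggy delete-while-iterating loop and record the visited items."""
    seen = []
    n = -1
    for a in numbers:
        n += 1
        seen.append(a)       # what the iterator handed us in this step
        if a % 2 == 0:
            del numbers[n]   # shrinks the list under the running iterator
            n -= 1
    return seen


numbers = [1, 2, 3, 4, 5]
seen = remove_evens_in_place(numbers)
print(seen)     # [1, 2, 4]
print(numbers)  # [1, 4, 5]
```

The items 3 and 5 are never visited. The same thing happens to your log lines: each deletion shifts the following items while the iterator keeps advancing, so lines get skipped or the wrong line gets deleted.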

How to fix this? Create a new list, ideally with a list comprehension:

numbers = [1, 2, 3, 4, 5]
numbers = [a for a in numbers if not a % 2 == 0]
print(numbers)

Result: [1, 3, 5]
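
Applied to log lines, the same idea might look like the following minimal sketch. The sample lines and the pattern CRON.*job_a are made up for illustration and only loosely modelled on the excerpt in the question.

```python
import re

# Hypothetical sample data, not the real log file.
data = [
    "Oct  3 12:09:01 webv2 CRON[1903]: (root) CMD (job_a)",
    "Oct  3 12:09:01 webv2 CRON[1906]: (root) CMD (job_a)",
    "Oct  3 12:09:03 webv2 systemd[1]: Starting Clean php session files...",
    "Oct  3 12:10:01 webv2 CRON[1910]: (root) CMD (job_b)",
]

# Compile once, then keep only the lines that do NOT match.
pattern = re.compile(r"CRON.*job_a")
data = [line for line in data if not pattern.search(line)]

print(len(data))  # 2
```

Because a new list is built instead of deleting from the old one, every line is checked exactly once.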

Beyond that: You could use the pattern

for b in regexArray:
    regex = b[0][:-1] + '.*(' + '|'.join(b[1]) + ').*'

instead of iterating over b[1]. But there are some odd parts in your use of regex:

  • Why do you close your patterns with .*?
  • Why do you use re.findall instead of re.search?
  • Do you really use '...', '..', '.' instead of '\.\.\.' etc.? A pattern like CRON.*....* matches anything that has at least 3 characters after CRON -- is that what you want?
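
Putting these points together, one possible sketch: one combined pattern per group, re.search instead of re.findall, and no leading or trailing .* (re.search scans the whole line anyway). The names regex_array, build_pattern and filter_lines, as well as the sample lines and patterns, are assumptions for illustration and not your real data.

```python
import re

# Hypothetical stand-in for regexArray, with the dots escaped.
regex_array = [
    ["CRON:", [r"job_a", r"\[ -x /usr/bin/job_b \]"]],
    ["systemd:", [r"Starting Clean php session files\.\.\."]],
]


def build_pattern(prefix, alternatives):
    # prefix[:-1] drops the trailing colon, like b[0][:-1] above.
    return re.compile(prefix[:-1] + ".*(" + "|".join(alternatives) + ")")


def filter_lines(lines, groups):
    patterns = [build_pattern(prefix, alts) for prefix, alts in groups]
    # Keep a line only if none of the combined patterns matches it.
    return [line for line in lines if not any(p.search(line) for p in patterns)]


lines = [
    "Oct  3 12:09:01 webv2 CRON[1903]: (root) CMD (job_a)",
    "Oct  3 12:09:01 webv2 CRON[1906]: (root) CMD (  [ -x /usr/bin/job_b ] && true)",
    "Oct  3 12:09:03 webv2 systemd[1]: Starting Clean php session files...",
    "Oct  3 12:09:05 webv2 sshd[2201]: Accepted publickey for admin",
]
kept = filter_lines(lines, regex_array)
print(kept)  # only the sshd line is left

# An unescaped dot matches any character, an escaped one only a literal dot.
print(bool(re.search(r"CRON.*....*", "CRON abc")))     # True
print(bool(re.search(r"CRON.*\.\.\..*", "CRON abc")))  # False
```

The last two lines show why the unescaped '...' pattern is risky: it matches almost every CRON line, so far more lines would be dropped than intended.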