Remove strings that contain more than 2 consecutive numbers

I have a list of numbers (in hex form) and I’m trying to remove the strings in which the same number is repeated more than twice in a row. For example:

200005
200108
2AFAFA
2BBB40
244422

The script would remove entries 1 and 5 (200005 and 244422), since each has three or more of the same digit in a row. Forgive me, I’m new to this and I’m probably using the wrong language when I’m searching, so bear with me. I created the list with NumPy and now I’m trying to figure out how to delete the strings from the list that have three or more identical digits in a row. I think I’m using the wrong language. Please help!

CodePudding user response：

You can try it with:

def check(strings: list):
    _strings = list(strings)  # Work on a copy so we never remove from the list being iterated
    for i in strings:
        val = ""  # Characters of i seen so far
        for p in i:
            # p is a digit and the two characters before it are the same digit
            if p.isdigit() and val.endswith(p) and val[:-1].endswith(p):
                _strings.remove(i)  # Remove the value from the copy
                break  # Stop scanning this string once it has been removed
            val += p
    return _strings
    
print(check(["200005", "200108", "2AFAFA", "2BBB40", "244422"]))

It will give you the output:

['200108', '2AFAFA', '2BBB40']
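
The same test (a decimal digit followed by two more copies of itself) can also be written as a regular expression. This is only a sketch of an alternative, not part of the answer above; the names `TRIPLE_DIGIT` and `has_triple_digit` are made up for illustration.

```python
import re

# Hypothetical helper: a digit 0-9 followed by two more copies of itself.
# Hex letters (A-F) are deliberately not matched here.
TRIPLE_DIGIT = re.compile(r"([0-9])\1\1")

def has_triple_digit(s: str) -> bool:
    return TRIPLE_DIGIT.search(s) is not None

strings = ["200005", "200108", "2AFAFA", "2BBB40", "244422"]
kept = [s for s in strings if not has_triple_digit(s)]
print(kept)  # ['200108', '2AFAFA', '2BBB40']
```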

CodePudding user response：

You can check whether a value contains three identical digits in a row, as follows:

def is_duplicated(val):
    for i in range(len(val)-2):
        if val[i].isdigit(): # check whether it is digit or not
            if (val[i] == val[i + 1]) and (val[i] == val[i + 2]):
                return True
    return False

values = ['200005', '200108', '2AFAFA', '2BBB40', '244422']

for value in values:
    if not is_duplicated(value):
        print(value)

#200108
#2AFAFA
#2BBB40
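
The loop above only prints the values that pass. If you want a new list with the offending strings removed, the same function can drive a list comprehension. A minimal sketch; `filtered` is just a name chosen here.

```python
def is_duplicated(val):
    # True if val contains three identical decimal digits in a row
    for i in range(len(val) - 2):
        if val[i].isdigit() and val[i] == val[i + 1] == val[i + 2]:
            return True
    return False

values = ['200005', '200108', '2AFAFA', '2BBB40', '244422']

# Build a new list instead of removing items from the one being looped over
filtered = [value for value in values if not is_duplicated(value)]
print(filtered)  # ['200108', '2AFAFA', '2BBB40']
```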

CodePudding user response：

Here is an example:

numbers = ["200005", "200108", "2AFAFA", "2BBB40", "244422"]
numbers_copy = list(numbers)  # a real copy, so removing from numbers does not disturb the loop
threshold = 3

for item in numbers_copy:
    count = 0
    current_char = ""
    for char in item:
        if current_char != char:
            current_char = char
            count = 1
        else:
            if char.isnumeric():
                count += 1

        if count >= threshold:
            break

    if count >= threshold:
        numbers.remove(item)

print(numbers)

This will give the result:

['200108', '2AFAFA', '2BBB40']

You can change the 'threshold' value at the top to set how many identical digits in a row cause a string to be removed.
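
To see what different threshold values do, the counting can be wrapped in a function. This is a rough sketch along the same lines, not the exact loop from the answer; `longest_digit_run` and `drop_runs` are hypothetical names.

```python
def longest_digit_run(s):
    """Length of the longest run of one repeated decimal digit in s."""
    longest = run = 0
    previous = ""
    for char in s:
        if not char.isdigit():
            run = 0          # letters never count towards a run
        elif char == previous:
            run += 1         # same digit as before, the run grows
        else:
            run = 1          # a different digit starts a new run
        previous = char
        longest = max(longest, run)
    return longest

def drop_runs(items, threshold=3):
    # Keep only the items whose longest digit run is below the threshold
    return [item for item in items if longest_digit_run(item) < threshold]

numbers = ["200005", "200108", "2AFAFA", "2BBB40", "244422"]
print(drop_runs(numbers))               # ['200108', '2AFAFA', '2BBB40']
print(drop_runs(numbers, threshold=2))  # ['2AFAFA', '2BBB40']
```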

CodePudding user response：

Since you want to evaluate these numbers as strings, the right tool for the job isn't NumPy.

The tool you are looking for in Python is called itertools.groupby().

Here is the documentation on it. Since the values are all in hex format, every character is a number (base 16), so the code below also drops strings with a repeated letter, such as 2BBB40.

import itertools as it
 
my_set = set(["200005", "200108", "2AFAFA", "2BBB40", "244422"])
drop_this = set()
for number in my_set:
    for key, group in it.groupby(number):
        if len(list(group)) >= 3:  #add "and key.isdigit()" if you did need actual number characters
            drop_this.add(number)
print(f"these have no groups of 3 {my_set-drop_this}")
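
A set does not keep the original order, so if order matters, the same groupby idea works on a plain list. A minimal sketch; the `has_run` helper and its `digits_only` flag are assumptions added here, with the flag playing the role of the `key.isdigit()` check mentioned in the comment above.

```python
from itertools import groupby

def has_run(s, length=3, digits_only=False):
    """True if s contains `length` or more identical characters in a row."""
    for key, group in groupby(s):
        if digits_only and not key.isdigit():
            continue  # skip runs of letters such as "BBB"
        if sum(1 for _ in group) >= length:
            return True
    return False

numbers = ["200005", "200108", "2AFAFA", "2BBB40", "244422"]

# Every hex character counts, so 2BBB40 is dropped as well
print([n for n in numbers if not has_run(n)])
# ['200108', '2AFAFA']

# Only 0-9 count, so 2BBB40 is kept
print([n for n in numbers if not has_run(n, digits_only=True)])
# ['200108', '2AFAFA', '2BBB40']
```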

Please don't say X is the wrong language for this. Anything with functions nowadays can be the right language, it just sounds whiny.