How to determine whether a value in a DataFrame contains a run of 7 or more repeated digits

Time:12-14

I have a DataFrame:

import pandas as pd

df1 = pd.DataFrame({'number': ['1111112357896', '45226212354444', '150000000064', '5485329999999', '4589622567431']})

Question: how can I flag the values in which the same digit is repeated 7 or more times in a row? Expected output:

number repeat
1111112357896 0
45226212354444 0
150000000064 1
5485329999999 1
4589622567431 0

CodePudding user response:

Use a regex with str.contains:

df1['repeat'] = df1['number'].str.contains(r'(\d)\1{6}').astype(int)

Regex:

(\d)     # match and capture a digit
\1{6}    # match the captured digit 6 more times

Output:


           number  repeat
0   1111112357896       0
1  45226212354444       0
2    150000000064       1
3   5485329999999       1
4   4589622567431       0
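
To sanity-check the pattern outside pandas, here is a minimal sketch that uses only the standard re module. The helper name has_run and its min_len parameter are made up for illustration and are not part of the answer above; the sketch assumes each value is a string of digits.

```python
import re

def has_run(value, min_len=7):
    """Return 1 if the same digit appears at least min_len times in a row, else 0."""
    # (\d) captures one digit; \1{min_len - 1} requires that same digit
    # min_len - 1 more times immediately afterwards.
    pattern = re.compile(r"(\d)\1{%d}" % (min_len - 1))
    return int(pattern.search(str(value)) is not None)

numbers = ['1111112357896', '45226212354444', '150000000064',
           '5485329999999', '4589622567431']
flags = [has_run(n) for n in numbers]
print(flags)  # [0, 0, 1, 1, 0]
```

If a different threshold is needed, the same helper could be applied per row, for example df1['number'].apply(has_run, min_len=5).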

CodePudding user response:

Here's an approach without a regex: walk through each string and count runs of consecutive identical characters.

def find_repeats(numbers, cutoff=7):
    repeated_numbers = []
    curr_n = None
    count = 0
    for n in str(numbers):
        if n == curr_n:
            count += 1
            continue
            
        if count >= cutoff:
            repeated_numbers.append(curr_n)
        curr_n = n
        count = 1

    # check the end of the string as well
    if count >= cutoff:
        repeated_numbers.append(curr_n)
        
    # number of runs of length >= cutoff (0 or 1 for the sample data)
    return len(repeated_numbers)

df1 = pd.DataFrame({'number': ['1111112357896', '45226212354444', '150000000064', '5485329999999', '4589622567431']})
df1['repeat'] = df1.number.apply(find_repeats)
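
The loop above is meant to count qualifying runs. If only the 0/1 flag from the question is needed, the same run-length idea can be sketched more compactly with itertools.groupby. This is an alternative technique, not the answer's own code, and the names longest_run and has_repeat are hypothetical.

```python
from itertools import groupby

def longest_run(value):
    """Length of the longest run of identical consecutive characters."""
    # groupby yields one group per stretch of equal adjacent characters;
    # default=0 covers the empty string.
    return max((sum(1 for _ in group) for _, group in groupby(str(value))),
               default=0)

def has_repeat(value, cutoff=7):
    """Return 1 if some character repeats at least cutoff times in a row, else 0."""
    return int(longest_run(value) >= cutoff)

numbers = ['1111112357896', '45226212354444', '150000000064',
           '5485329999999', '4589622567431']
result = [has_repeat(n) for n in numbers]
print(result)  # [0, 0, 1, 1, 0]
```

Unlike the regex, this version does not check that the characters are digits, so it assumes the column holds digit strings only. It could be used the same way as find_repeats, for example df1['number'].apply(has_repeat).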