I have dataframe:
df1 = pd.DataFrame({'number': ['1111112357896', '45226212354444', '150000000064', '5485329999999', '4589622567431']})
Question: To find values where value has recurrent sequence from 7 and above numbers
number | repeat |
---|---|
1111112357896 | 0 |
45226212354444 | 0 |
150000000064 | 1 |
5485329999999 | 1 |
4589622567431 | 0 |
CodePudding user response:
Use a regex with str.contains
:
df1['repeat'] = df1['number'].str.contains(r'(\d)\1{6}').astype(int)
Regex:
(\d) # match and capture a digit
\1{6} # match the captured digit 6 more times
Output:
number repeat
0 1111112357896 0
1 45226212354444 0
2 150000000064 1
3 5485329999999 1
4 4589622567431 0
CodePudding user response:
Here's an approach:
def find_repeats(numbers, cutoff=7):
repeated_numbers = []
curr_n = None
count = 0
for n in str(numbers):
if n == curr_n:
count = 1
continue
if count >= cutoff:
repeated_numbers.append(curr_n)
curr_n = n
count = 1
# check the end of the string as well
if count >= cutoff:
repeated_numbers.append(curr_n)
return len(repeated_numbers)
df1 = pd.DataFrame({'number': ['1111112357896', '45226212354444', '150000000064', '5485329999999', '4589622567431']})
df1['repeat'] = df1.number.apply(find_repeats)