Write a regex to get the string between the first letter and the last letter?

Time:09-12

I need to print the two missing strings, KMJD23KN0008393 and KMJD23KN0008394, but what I get is KMJD23KN8393 and KMJD23KN8394. I need the leading zeros to be kept in the list as well.

import re

ll = ['KMJD23KN0008391','KMJD23KN0008392','KMJD23KN0008395','KMJD23KN0008396']
missList = []
for i in ll:
    # Split each item into runs of letters and runs of digits
    reList = re.findall(r"[^\W\d_]+|\d+", i)
    print(reList)

CodePudding user response:

The issue can be decomposed into three parts:

  1. Extracting the trailing number string
  2. Interpreting that substring as the actual number, which should form a consecutive sequence
  3. Re-formatting the missing items based on the surrounding items.

These steps rest on several implicit assumptions. You need to be aware of them, and ideally make them explicit. In what follows, I’ve worked with these assumptions:

  • Anything that forms a number at the end of the string is considered. Everything before that is a prefix and is assumed to be identical throughout the series.
  • The trailing numbers in the series always have the same width.
  • The items in the list are sorted in ascending order by their trailing number.
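To make these assumptions explicit, they can be checked before any gaps are computed. The following is only a minimal sketch: the helper name check_series and its error messages are hypothetical and not part of the original answer.

```python
import re

def check_series(items):
    """Return (prefix, width) if the series satisfies the three assumptions."""
    prefixes, widths, numbers = set(), set(), []
    for item in items:
        m = re.search(r'\d+$', item)  # trailing run of digits
        if m is None:
            raise ValueError(f'{item!r} does not end in a number')
        prefixes.add(item[:m.start()])  # everything before the trailing number
        widths.add(len(m[0]))
        numbers.append(int(m[0]))
    if len(prefixes) != 1:
        raise ValueError('items do not share a single prefix')
    if len(widths) != 1:
        raise ValueError('trailing numbers differ in width')
    if numbers != sorted(numbers):
        raise ValueError('items are not sorted by trailing number')
    return prefixes.pop(), widths.pop()

ll = ['KMJD23KN0008391', 'KMJD23KN0008392', 'KMJD23KN0008395', 'KMJD23KN0008396']
print(check_series(ll))  # ('KMJD23KN', 7)
```

If any check fails, the gap-filling logic would need to be adapted, for example by sorting the list first.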

The following code implements these assumptions:

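import re

all_missing = []
# Start from the first item's number, so the first iteration finds no gap
last_num = int(re.search(r'\d+$', ll[0])[0])
# Greedy match up to the last non-digit: everything before the trailing number
prefix = re.match(r'.*\D', ll[0])[0]

for item in ll:
    num_str = re.search(r'\d+$', item)[0]
    num = int(num_str)
    num_width = len(num_str)
    # Every number strictly between the previous item and this one is missing
    for missing in range(last_num + 1, num):
        all_missing.append(f'{prefix}{missing:0{num_width}}')

    last_num = num

print(all_missing)  # ['KMJD23KN0008393', 'KMJD23KN0008394']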

Some notes here:

  • To extract the trailing number, a very simple regex is sufficient: \d+$. That is: one or more digits, until the end of the string.
  • Conversely, to extract the prefix, we search for any sequence of arbitrary characters where the last character is a non-digit. That is: .*\D.
  • To re-format the missing items, we concatenate the prefix with the missing number, and we pad the missing number with zeros (from left) until it is of the expected width. This is achieved by using Python’s f-strings with the format specifier '0{num_width}'.
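The three notes above can be tried in isolation on a single item. This is a small illustrative sketch; the sample item is taken from the question, and the variable names are chosen here for the example only.

```python
import re

item = 'KMJD23KN0008392'

# One or more digits anchored at the end of the string
num_str = re.search(r'\d+$', item)[0]
# Greedy: any characters, ending at the last non-digit character
prefix = re.match(r'.*\D', item)[0]
width = len(num_str)

# Zero-pad the next number to the same width and re-attach the prefix
rebuilt = f'{prefix}{int(num_str) + 1:0{width}}'

print(num_str)  # 0008392
print(prefix)   # KMJD23KN
print(rebuilt)  # KMJD23KN0008393
```

Converting with int() drops the leading zeros, which is why the width has to be captured from the original string before formatting.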