Extract the substrings under certain conditions

Time:06-06

I have the following string in Excel: C37S25FF0000TD. I need to extract 7 substrings into 7 different cells, according to these rules:

  • The first cell must contain the letters up to the first number. In this case it must contain only C.
  • The second cell must contain the first number that appears (this number can also have more than two digits). In this case it must contain only 37.
  • The third cell must contain the following letters, up to the next number. In this case only S.
  • The fourth cell must contain the next two digits (they are always two digits). In this case 25.
  • The fifth cell must contain the next 6 characters. In this case FF0000.
  • The sixth cell must contain the next character. In this case T.
  • The seventh cell must contain the last character. In this case D.

Can someone help me? Since substrings can have variable length I don't know how to use the function MID() correctly. Thanks in advance.

CodePudding user response:


Formula in B1:

=LET(A,A1,B,LEFT(A,LEN(A)-8),C,TEXTSPLIT(B,SEQUENCE(10,,0),,1),D,TEXTSPLIT(B,C,,1),E,MID(RIGHT(A,8),{1,7,8},{6,1,1}),HSTACK(TOROW(VSTACK(C,D),,1),E))
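
As a cross-check outside Excel, the seven rules from the question can also be sketched as a single regular expression. This is only an illustration in Python, not a translation of the formula above: the pattern and the helper name `split_code` are assumptions made here, and the letter groups are assumed to contain only A-Z or a-z.

```python
import re

# Pattern assumed from the question's seven rules (hypothetical, for illustration).
PATTERN = re.compile(
    r"([A-Za-z]+)"  # 1: letters up to the first digit
    r"(\d+)"        # 2: first number, any number of digits
    r"([A-Za-z]+)"  # 3: letters up to the next number
    r"(\d{2})"      # 4: exactly two digits
    r"(.{6})"       # 5: next six characters
    r"(.)"          # 6: next character
    r"(.)"          # 7: last character
)


def split_code(text):
    """Return the seven substrings, or raise ValueError if the layout differs."""
    match = PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unexpected layout: {text!r}")
    return list(match.groups())


print(split_code("C37S25FF0000TD"))
# ['C', '37', 'S', '25', 'FF0000', 'T', 'D']
```

Comparing this output with the row returned by the formula is a quick way to confirm that both agree on inputs with longer letter or number groups.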

CodePudding user response:

Edited to allow for leading zeros in the second substring

Here is a set of formulas that take into account the fact that some of the substrings may be of varying length.

Some of the formulas use functions only found in Office 365. However, substitutes for earlier versions are also available:

Letters up to first number: = LEFT(A1,MIN(FIND({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))-1)

First full number (variable number of digits):
    =LET(s,MID(A1,LEN(B1)+1,LEN(A1)),
       arr, MID(s,SEQUENCE(LEN(s)),1),
       len, MATCH(TRUE,ISERR(-arr),0),
       LEFT(s,len-1))

Subsequent letter(s):
    =LET(s,MID(A1,SUM(LEN(B1:B2))+1,LEN(A1)),
       pos,MIN(FIND({0,1,2,3,4,5,6,7,8,9},s&"0123456789")),
       LEFT(s,pos-1))

Next two characters: =MID(A1,SUM(LEN(B1:B3))+1,2)

For the subsequent substrings, we can count from the right:
    First 6 of the last 8 characters: =MID(A1,LEN(A1)-7,6)

    Penultimate character: =MID(A1,LEN(A1)-1,1)

    Last character:  =RIGHT(A1)
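
The logic of these formulas (scan from the left for the variable-length parts, count from the right for the fixed-length tail) can be sketched in Python for testing outside Excel. This is a minimal illustration under the same assumptions as the formulas; the function names are hypothetical, and `b1` to `b7` simply mirror cells B1 to B7.

```python
DIGITS = "0123456789"


def first_digit_pos(s):
    """Index of the first digit in s, or len(s) if there is none."""
    for i, ch in enumerate(s):
        if ch in DIGITS:
            return i
    return len(s)


def first_non_digit_pos(s):
    """Index of the first non-digit in s, or len(s) if there is none."""
    for i, ch in enumerate(s):
        if ch not in DIGITS:
            return i
    return len(s)


def extract_parts(text):
    """Split text into the seven substrings described in the question."""
    b1 = text[:first_digit_pos(text)]        # letters up to the first number
    rest = text[len(b1):]
    b2 = rest[:first_non_digit_pos(rest)]    # first full number, any length
    rest = rest[len(b2):]
    b3 = rest[:first_digit_pos(rest)]        # subsequent letter(s)
    start = len(b1) + len(b2) + len(b3)
    b4 = text[start:start + 2]               # next two characters
    b5 = text[-8:-2]                         # first 6 of the last 8 characters
    b6 = text[-2]                            # penultimate character
    b7 = text[-1]                            # last character
    return [b1, b2, b3, b4, b5, b6, b7]


print(extract_parts("C37S25FF0000TD"))
# ['C', '37', 'S', '25', 'FF0000', 'T', 'D']
```

Because the number is kept as text, leading zeros survive: `extract_parts("AB007XYZ25FF0000TD")` gives `'007'` as the second item.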


CodePudding user response:

So, based on the single example you gave and making no assumptions about changing positions:

[screenshot of the formulas]

You can adapt this to handle numbers that contain extra digits, etc., by extending the technique with LEN() and FIND() as necessary.
