Home > Software design >  Using T-SQL or C#, how can I determine the position and length of all substrings in a string?
Using T-SQL or C#, how can I determine the position and length of all substrings in a string?

Time:12-16

Given the following string (ignore double quotes):

"I101G2.2 OZ 001 0002200 L Y 0010000 "

I need to find the starting position of each substring and it's length. For example, using the string above, the first substring is I101G2.2 starts at position 0 and is 8 characters in length. Bear in mind that each space character (" ") should be considered a substring, always one character in length. So, the output should be something like:

**Sub-String Value**,**Starting Position**, **Length**
I101G2.2,0,8
{space},9,1
OZ,10,2
{space},12,1
{space},13,1
001,14,3
{space},17,1
0002200,18,7
{space},25,1
L,26,1
{space},27,1
{space},28,1
{space},29,1
{space},30,1
{space},31,1
{space},32,1
{space},33,1
{space},34,1
Y,35,1
{space},36,1
{space},37,1
{space},38,1
{space},39,1
{space},40,1
{space},41,1
{space},42,1
{space},43,1
0010000,44,7

etc...

We have a production system written in Cobol that outputs information in the above string format. I have a table that maps positions and string lengths to a column in another table. So, the idea is to get the position and length of the string and compare to the mapping table to determine what table column the belongs in. For instance 0002200 is the item class because it's at position 18, and is 7 characters in length.

Thanks in advance!

CodePudding user response:

I'm not sure of your exact requirements here, but this is a pretty standard string split with an ordinal type of situation.

Give this a whirl:

DECLARE @ThatsALongOne NVARCHAR(MAX) = 'I101G2.2,0,8
{space},9,1
OZ,10,2
{space},12,1
{space},13,1
001,14,3
{space},17,1
0002200,18,7
{space},25,1
L,26,1
{space},27,1
{space},28,1
{space},29,1
{space},30,1
{space},31,1
{space},32,1
{space},33,1
{space},34,1
Y,35,1
{space},36,1
{space},37,1
{space},38,1
{space},39,1
{space},40,1
{space},41,1
{space},42,1
{space},43,1
0010000,44,7';

DECLARE @WhaddaYaWant NVARCHAR(10) = '{space}';

;WITH RightRound AS (
SELECT @ThatsALongOne AS TheWholeShebang, SUBSTRING(@ThatsALongOne, 0, CHARINDEX(@WhaddaYaWant, @ThatsALongOne)) AS ThisUn, CHARINDEX(@WhaddaYaWant, @ThatsALongOne) AS Position, RIGHT(@ThatsALongOne,LEN(@ThatsALongOne)-CHARINDEX(@WhaddaYaWant,@ThatsALongOne)-LEN(@WhaddaYaWant)) AS WorkinOnIt, 1 AS Incidence, LEN(@ThatsALongOne) AS ThisLen
UNION ALL
SELECT TheWholeSheBang, SUBSTRING(WorkinOnIt, 0, CHARINDEX(@WhaddaYaWant, WorkinOnIt)), CHARINDEX(@WhaddaYaWant, WorkinOnIt), CASE WHEN CHARINDEX(@WhaddaYaWant,RightRound.WorkinOnIt) > 0 THEN RIGHT(WorkinOnIt,(LEN(WorkinOnIt)-CHARINDEX(@WhaddaYaWant,WorkinOnIt))-LEN(@WhaddaYaWant)) ELSE '' END, Incidence   1, LEN(WorkinOnIt) AS thisLen
  FROM RightRound
 WHERE CHARINDEX(@WhaddaYaWant, WorkinOnIt) <> 0
)

SELECT ThisUn, Position, Incidence, ThisLen
  FROM RightRound
 OPTION (MAXRECURSION 0)

ThisUn              Position    Incidence   ThisLen
I101G2.2,0,8        15          1           368
9,1 OZ,10,2         15          2           346
12,1                7           3           324
13,1001,14,3        17          4           310
17,10002200,18,7    21          5           286
25,1L,26,1          15          6           258
27,1                7           7           236
..

CodePudding user response:

I'm impressed you still got Cobol running, my mom used to code stuff in Cobol... In the 80s!

Here's my try (it handles all spaces even those at the end of the string):

IF object_id('tempdb..#t') IS NOT NULL
    DROP TABLE #t
GO
CREATE TABLE #t (str VARCHAR(100))

INSERT INTO #t
VALUES ('I101G2.2 OZ      001 0002200 L Y 0010000')

;WITH cte AS(
    SELECT  cast(CASE WHEN charindex(' ', t.str) = 1 THEN ' ' ELSE LEFT(str, charindex(' ', str)) END AS varchar(100)) str
    ,   0 AS startPosition
    ,   CASE WHEN charindex(' ', t.str) = 1 THEN 1 ELSE charindex(' ', t.str) END AS nextIndex
    ,   stuff(str, 1,charindex(' ', t.str)-1, '') AS strLeft 
    FROM    #t t
--  
    UNION ALL
    SELECT  cast(CASE WHEN charindex(' ', t.strLeft) = 1 THEN ' ' WHEN charindex(' ', t.strleft) = 0 THEN strLeft ELSE LEFT(strLeft, charindex(' ', strLeft)) END AS varchar(100))
    ,   startPosition   nextIndex
    ,   CASE WHEN charindex(' ', t.strLeft) = 1 THEN 1 ELSE charindex(' ', t.strLeft) END
    ,   stuff(strLeft, 1,charindex(' ', t.strLeft)   CASE WHEN charindex(' ', t.strleft) <> 1 THEN -1 ELSE 0 END, '')
    FROM    CTE t
    WHERE   datalength(strLeft) > 0
    )
SELECT  str, startPosition, datalength(str) AS length
FROM    cte
OPTION(maxrecursion 0);

CodePudding user response:

To do this in c#, you can simply walk the string in a loop. As we encounter non-space characters, we can append them to a StringBuilder. When we encounter a space, we would then first add the characters we've captured so far, along with their length and starting position, to a list of strings (each representing a substring), clear the StringBuilder for the next substring, and then add the space character and it's position and length. At the end of the loop, if our StringBuilder has a substring in it, add it to the list and then return the list.

One bug in your original example is that the first substring started at position 0 and had 8 characters. This means that the next substring should start at position 8, but your example shows it starting at 9.

Here's one way to do it:

public static List<string> GetParts(string input)
{
    if (input == null) return null;
    var result = new List<string>();
    if (input.Length == 0) return result;

    var currentPart = new StringBuilder();

    for (int i = 0; i < input.Length; i  )
    {
        if (input[i] == ' ')
        {
            if (currentPart.Length > 0)
            {
                var part = currentPart.ToString();
                result.Add($"{part},{i - part.Length},{part.Length}");
                currentPart.Clear();
            }

            result.Add($"{{space}},{i},1");
        }
        else
        {
            currentPart.Append(input[i]);
        }
    }

    if (currentPart.Length > 0)
    {
        var part = currentPart.ToString();
        result.Add($"{part},{input.Length - part.Length - 1},{part.Length}");
    }

    return result;
}

And an example calling it:

static void Main(string[] args)
{
    var input = 
        "I101G2.2 OZ  001 0002200 L       Y        0010000                     ";
        
    var parts = GetParts(input);
    parts.ForEach(Console.WriteLine);

    Console.Write("\n\nDone. Press any key to exit...");
    Console.ReadKey();
}
  • Related