Regex get value of a specific word

Time:12-21

I've got the following value:

--> Some comment

CREATE VIEW ABC
   AS SELECT
   Z.NUMBER                                       AS    ID,
   Z.LANGUAGE                                     AS    LNG,
   SUBSTR(Z.VALUE_01,01,02)                       AS    RUN_NUMB,
   SUBSTR(Z.TXT_VALUE_01,01,79)                   AS    TXT
   FROM
   MYTABLE Z
   WHERE     ID                  = '0033'
   AND       LNG                 = 'DE'

I want a regular expression to which I can pass the value (or part of the value) that precedes the AS, and which returns the value after the AS, e.g.

Z.NUMBER --> I'll receive ID

Z.LANGUAGE --> I'll receive LNG

Z.VALUE_01 --> I'll receive RUN_NUMB

Z.TXT_VALUE_01 --> I'll receive TXT

Currently I have something like this:

(?<=Z.NUMBER\sAS).+?(?=(,|FROM))

...but this doesn't work for my SUBSTR values

Edit: I'm using C# to execute the Regex:

string expr = @"--> Some comment ....."; //so the long text
string columnExprValue = "Z.LANGUAGE";
string asValue = Regex.Match(expr, @"(?<=" + columnExprValue + @"\sAS).+?(?=(,|FROM))")?.Value.Replace("AS", "").Trim() ?? ""; //Workaround to remove AS, because I don't know how to remove it in Regex

CodePudding user response:

Check this:

/^ \h*  (?:substr[(])?(?: Z.TXT_VALUE_01 )(?:,[^,]+,[^,]+[)])? \h* AS \h+ (\w+) \v* [,]? \v* $/gmxi
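
The pattern above is written in PCRE syntax. As far as I know, .NET does not accept \h, and it treats \v as a vertical tab only, so the pattern cannot be pasted into C# unchanged. Below is a minimal sketch of how the same line-anchored idea might look in C#. It assumes [ \t] as a stand-in for \h and an optional \r before the end of the line. The class and method names are made up for illustration and are not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineAnchoredAlias
{
    // Hypothetical helper: returns the alias after AS for the given column,
    // or an empty string when no line matches.
    public static string Find(string sql, string column)
    {
        // Optional "substr(" prefix, the escaped column, optional ",x,y)" suffix,
        // then AS and the alias, all on a single line.
        string pattern =
            @"^[ \t]*(?:substr\()?" + Regex.Escape(column) +
            @"(?:,[^,]+,[^,]+\))?[ \t]*AS[ \t]+(\w+)[ \t]*,?[ \t]*\r?$";

        // Multiline makes ^ and $ match at line boundaries; IgnoreCase mirrors the /i flag.
        Match m = Regex.Match(sql, pattern, RegexOptions.Multiline | RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        string sql = "   Z.NUMBER          AS    ID,\n" +
                     "   SUBSTR(Z.TXT_VALUE_01,01,79)    AS    TXT\n" +
                     "   FROM\n" +
                     "   MYTABLE Z";

        Console.WriteLine(Find(sql, "Z.NUMBER"));
        Console.WriteLine(Find(sql, "Z.TXT_VALUE_01"));
    }
}
```

With the shortened sample in Main, this should print ID and then TXT. Because the column is anchored directly after the optional substr( prefix, a partial column name will not match here.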

CodePudding user response:

This should work, but the implementation is "naive" in the sense that it always expects valid parameters that really exist; add whatever checks you need.

So the regex I'm going to use is .*Z\.VALUE_01.*\s+AS\s+(?<Alias>[^,\s]*), where Z\.VALUE_01 is the part I pass in as a parameter. See the regex tester: https://regex101.com/r/UJi8pY/1. The idea is that the group named "Alias" holds exactly the value you are looking for.

Then C# code will look like this:

public static string GetAlias(string input, string column)
{
    var regexPart = column.Replace(".","\\.");
    
    return Regex.Match(input, $".*{regexPart}.*\\s+AS\\s+(?<Alias>[^,\\s]*)").Groups["Alias"].ToString();
}
public static void Main()
{
    string val = @"--> Some comment
    CREATE VIEW ABC
    AS SELECT
    Z.NUMBER                                       AS    ID,
    Z.LANGUAGE                                     AS    LNG,
    SUBSTR(Z.VALUE_01,01,02)                       AS    RUN_NUMB,
    SUBSTR(Z.TXT_VALUE_01,01,79)                   AS    TXT
    FROM
    MYTABLE Z
    WHERE     ID                  = '0033'
    AND       LNG                 = 'DE'";
    
    Console.WriteLine(GetAlias(val, "Z.NUMBER"));
    Console.WriteLine(GetAlias(val, "Z.LANGUAGE"));
    Console.WriteLine(GetAlias(val, "Z.VALUE_01"));
    Console.WriteLine(GetAlias(val, "Z.TXT_VALUE_01"));
}

.NET Fiddle - https://dotnetfiddle.net/Z9kd8h

Good suggestion in another answer from @the-fourth-bird: use Regex.Escape instead of column.Replace(".","\\."), so that all regex metacharacters are escaped.
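
As a rough illustration of that suggestion, here is a sketch of the same lookup with Regex.Escape in place of the manual Replace, plus a check for the "no match" case mentioned at the top of this answer. The names AliasLookup and GetAliasEscaped are hypothetical, and returning null for a missing column is just one possible convention.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AliasLookup
{
    // Hypothetical variant of GetAlias: the column is escaped with Regex.Escape,
    // and null is returned when the column is not found.
    public static string GetAliasEscaped(string input, string column)
    {
        string pattern = ".*" + Regex.Escape(column) + @".*\s+AS\s+(?<Alias>[^,\s]*)";
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Groups["Alias"].Value : null;
    }

    public static void Main()
    {
        string line = "Z.LANGUAGE                 AS    LNG,";

        Console.WriteLine(GetAliasEscaped(line, "Z.LANGUAGE"));
        Console.WriteLine(GetAliasEscaped(line, "Z.UNKNOWN") == null);
    }
}
```

For the single sample line in Main, this should print LNG and then True.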

CodePudding user response:

Getting values out of SQL with a regex can be very brittle; this pattern is based on the example data.

To get the values only you might use lookarounds:

(?<=\bZ\.(?:LANGUAGE|NUMBER|(?:TXT_)?VALUE_01)\b.*?\sAS\s+)[^\s,]+(?=,|\s+FROM\b)

Explanation

  • (?<= Lookbehind assertion
    • \b A word boundary
    • Z\. Match Z.
    • (?:LANGUAGE|NUMBER|(?:TXT_)?VALUE_01)\b Match any of the alternatives followed by a word boundary (Or just match a single string like Z\.LANGUAGE)
    • .*? Match any characters except a newline, as few as possible
    • \sAS\s+ Match AS preceded by a whitespace char and followed by 1+ whitespace chars
  • ) Close the lookbehind
  • [^\s,]+ Match 1+ non-whitespace chars except for a comma
  • (?=,|\s+FROM\b) Positive lookahead, assert either , or 1+ whitespace chars followed by FROM to the right

See a .NET regex demo.

Or a capture group variant:

\bZ\.(?:LANGUAGE|NUMBER|(?:TXT_)?VALUE_01)\b.*?\sAS\s+([^\s,]+)(?:,|\s+FROM\b)

See another .NET regex demo.
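
With the capture group variant, the alias is read from group 1 instead of from the whole match. Here is a minimal sketch of how that might look in C#, with the pattern written out with explicit + quantifiers. The helper name FindAlias and the shortened sample string are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureGroupDemo
{
    // Hypothetical helper: returns the alias captured in group 1,
    // or an empty string when the column is not found.
    public static string FindAlias(string sql, string column)
    {
        string pattern = @"\b" + Regex.Escape(column) + @"\b.*?\sAS\s+([^\s,]+)(?:,|\s+FROM\b)";
        Match m = Regex.Match(sql, pattern);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        string sql = "SUBSTR(Z.VALUE_01,01,02)       AS    RUN_NUMB,\n" +
                     "SUBSTR(Z.TXT_VALUE_01,01,79)   AS    TXT\n" +
                     "FROM\n" +
                     "MYTABLE Z";

        Console.WriteLine(FindAlias(sql, "Z.VALUE_01"));
        Console.WriteLine(FindAlias(sql, "Z.TXT_VALUE_01"));
    }
}
```

For this sample, it should print RUN_NUMB and then TXT. Checking m.Success before reading the group avoids silently treating a missing column as an empty alias in code that needs to tell the two apart.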

If you want to make the pattern dynamic, you can make use of Regex.Escape to escape the meta characters like the dot to match it literally, or else it would match any character.

For example:

string input = @"--> Some comment

CREATE VIEW ABC
AS SELECT
Z.NUMBER                                       AS    ID,
Z.LANGUAGE                                     AS    LNG,
SUBSTR(Z.VALUE_01,01,02)                       AS    RUN_NUMB,
SUBSTR(Z.TXT_VALUE_01,01,79)                   AS    TXT
FROM
MYTABLE Z
WHERE     ID                  = '0033'
AND       LNG                 = 'DE'";           
string columnExprValue = Regex.Escape("Z.LANGUAGE");
string pattern = @"(?<=\b" + columnExprValue + @"\b.*?\sAS\s+)[^\s,]+(?=,|\s+FROM\b)";
string asValue = Regex.Match(input, pattern)?.Value ?? "";
Console.WriteLine(asValue);

Output

LNG