So I have a string that contains Snowflake column expressions, and I want to split it into one string per column. I'm trying to use Regex for this, because a plain string.Split(',') won't work here: some of the columns contain commas inside nvl(...). The pattern I have tried is:
string pattern = @"([^\s]*\s[^\s]*),"
However, this pattern splits after the second consecutive space, and I'm not sure how to make it split just after the alias. I am also using .NET Core 3.1. Any help would be appreciated.
Current Snowflake data column string:
string columns = "nvl(u.\"Country\",'#N/A') \"Country\",u.\"CreatedDate\" \"CreatedDate\",nvl(u.\"Email\",'#N/A') \"Email\",u.\"LastModifiedDate\" \"LastModifiedDate\",nvl(u.\"Name\",'#N/A') \"Name\"";
expected output:
nvl(u."Country",'#N/A') "Country"
u."CreatedDate" "CreatedDate"
nvl(u."Email",'#N/A') "Email"
u."LastModifiedDate" "LastModifiedDate"
nvl(u."Name",'#N/A') "Name"
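To illustrate why a plain string.Split(',') is not enough here, a minimal sketch (the class and method names are hypothetical) that splits the column string from the question on every comma. The commas inside nvl(...) are split on too, so each nvl column ends up in two pieces: 8 pieces instead of the 5 expected columns.

```csharp
using System;

static class NaiveSplitDemo
{
    // The column string from the question as a verbatim literal ("" is an escaped quote).
    public const string Columns = @"nvl(u.""Country"",'#N/A') ""Country"",u.""CreatedDate"" ""CreatedDate"",nvl(u.""Email"",'#N/A') ""Email"",u.""LastModifiedDate"" ""LastModifiedDate"",nvl(u.""Name"",'#N/A') ""Name""";

    // Splits on every comma, including the ones inside nvl(...).
    public static string[] SplitOnEveryComma(string columns) => columns.Split(',');

    static void Main()
    {
        // Prints 8 lines: each nvl(...) column is broken into two pieces.
        foreach (string part in SplitOnEveryComma(Columns))
        {
            Console.WriteLine(part);
        }
    }
}
```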
Answer:
You can split only on the commas that directly follow a quoted alias, using a lookbehind:
string[] result = Regex.Split(text, @"(?<=\s""\w+""),");
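A self-contained sketch of the Regex.Split approach, run against the column string from the question (the class and method names are hypothetical). The lookbehind assumes every alias is a double-quoted run of word characters preceded by whitespace; an alias containing spaces or other symbols would not be split on.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookbehindSplitDemo
{
    // The column string from the question as a verbatim literal ("" is an escaped quote).
    public const string Columns = @"nvl(u.""Country"",'#N/A') ""Country"",u.""CreatedDate"" ""CreatedDate"",nvl(u.""Email"",'#N/A') ""Email"",u.""LastModifiedDate"" ""LastModifiedDate"",nvl(u.""Name"",'#N/A') ""Name""";

    // Split on a comma only if it is preceded by: whitespace, a quote, word chars, a quote.
    // Commas inside nvl(...) follow u."Country", where the quote is preceded by '.',
    // not whitespace, so they are not split on.
    public static string[] SplitColumns(string columns) =>
        Regex.Split(columns, @"(?<=\s""\w+""),");

    static void Main()
    {
        foreach (string column in SplitColumns(Columns))
        {
            Console.WriteLine(column);
        }
    }
}
```

Because the lookbehind is zero-width and contains no capturing group, only the commas are consumed, so the array holds the five column expressions without the separators.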
Another idea is to extract the matches with
var result = Regex.Matches(text, @"\b(?:nvl\([^()]*\)|u\.""[^""]*"")\s+""[^""]*""")
.Cast<Match>()
.Select(x => x.Value);
Details:
- \b - word boundary
- (?:nvl\([^()]*\)|u\."[^"]*") - either nvl(...) or u."..."
- \s+ - one or more whitespace chars
- "[^"]*" - a double quote, zero or more chars other than a double quote, and a closing double quote
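The Regex.Matches idea as a complete program might look like the sketch below. It assumes the only expression shapes are nvl(...) without nested parentheses and u."...", as in the sample string; any other expression form would simply not be matched. The class and method names are hypothetical, and .ToArray() is added only to return a concrete array.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class MatchesDemo
{
    // The column string from the question as a verbatim literal ("" is an escaped quote).
    public const string Columns = @"nvl(u.""Country"",'#N/A') ""Country"",u.""CreatedDate"" ""CreatedDate"",nvl(u.""Email"",'#N/A') ""Email"",u.""LastModifiedDate"" ""LastModifiedDate"",nvl(u.""Name"",'#N/A') ""Name""";

    // Each match is either nvl(...) or u."...", then whitespace, then the quoted alias.
    // The separating commas are never part of a match, so they are dropped automatically.
    public static string[] ExtractColumns(string columns) =>
        Regex.Matches(columns, @"\b(?:nvl\([^()]*\)|u\.""[^""]*"")\s+""[^""]*""")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();

    static void Main()
    {
        foreach (string column in ExtractColumns(Columns))
        {
            Console.WriteLine(column);
        }
    }
}
```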
Answer:
You can use a capture group (group 1) and exclude the comma in the second part after matching the space. To match all parts, you can match either a comma or the end of the string at the end of the pattern.
The part [^\s]* can be written as \S*. Pattern:
(\S*\s[^\s,]*)(?:,|$)
- ( - Capture group 1
  - \S*\s[^\s,]* - Match optional non-whitespace chars, one whitespace char, and then optional chars that are neither whitespace nor a comma
- ) - Close group 1
- (?:,|$) - Match either a comma or assert the end of the string
For example
using System;
using System.Text.RegularExpressions;

string pattern = @"(\S*\s[^\s,]*)(?:,|$)";
string input = @"nvl(u.""Country"",'#N/A') ""Country"",u.""CreatedDate"" ""CreatedDate"",nvl(u.""Email"",'#N/A') ""Email"",u.""LastModifiedDate"" ""LastModifiedDate"",nvl(u.""Name"",'#N/A') ""Name""";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine(m.Groups[1].Value);
}
Output
nvl(u."Country",'#N/A') "Country"
u."CreatedDate" "CreatedDate"
nvl(u."Email",'#N/A') "Email"
u."LastModifiedDate" "LastModifiedDate"
nvl(u."Name",'#N/A') "Name"
A bit more specific pattern, using + to match 1 or more characters and matching only word characters between the double quotes:
(\S+\s"\w+")(?:,|$)
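A sketch of the stricter pattern in a full program, reading group 1 of each match so the trailing comma is left out (the class and method names are hypothetical). It assumes each alias consists of word characters only, and the \S+ part assumes the expression itself contains no whitespace: that holds for the sample string, but not for something like nvl(u."Name", '#N/A').

```csharp
using System;
using System.Text.RegularExpressions;

static class StrictPatternDemo
{
    // The column string from the question as a verbatim literal ("" is an escaped quote).
    public const string Columns = @"nvl(u.""Country"",'#N/A') ""Country"",u.""CreatedDate"" ""CreatedDate"",nvl(u.""Email"",'#N/A') ""Email"",u.""LastModifiedDate"" ""LastModifiedDate"",nvl(u.""Name"",'#N/A') ""Name""";

    // Group 1: one or more non-whitespace chars, one whitespace char, then a
    // double-quoted alias of word characters. (?:,|$) ends the match at a comma or the end.
    public static string[] ExtractColumns(string columns)
    {
        MatchCollection matches = Regex.Matches(columns, @"(\S+\s""\w+"")(?:,|$)");
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            // Group 1 excludes the comma consumed by (?:,|$).
            result[i] = matches[i].Groups[1].Value;
        }
        return result;
    }

    static void Main()
    {
        foreach (string column in ExtractColumns(Columns))
        {
            Console.WriteLine(column);
        }
    }
}
```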