I have the following string:
1, 20045, abc, "new york, some2", new york, your name
How do I split this string on commas when one of the values also contains a comma?
CodePudding user response:
As @jmcilhinney mentions in the comments, you should ideally be using a CSV parser here. If you want to stick with the splitting approach, I would suggest a regex find-all with the following pattern:
".*?"|[^\s,]+(?: [^\s,]+)*
This pattern says to match:
".*?"            first try to consume a double-quoted term, possibly containing commas
|                OR
[^\s,]+          match a term not including a comma
(?: [^\s,]+)*    possibly followed by a space and another term, zero or more times
This regex trick tries to match a double-quoted term first, and only when that fails does it treat the comma as a separator.
Sample script:
using System;
using System.Text.RegularExpressions;

string text = "1, 20045, abc, \"new york, some2\", new york, your name";
// Quoted term first; otherwise one or more space-separated words without commas.
string search = @""".*?""|[^\s,]+(?: [^\s,]+)*";
MatchCollection matches = Regex.Matches(text, search);
foreach (Match match in matches)
{
    GroupCollection groups = match.Groups;
    Console.WriteLine(groups[0].Value);
}
This prints:
1
20045
abc
"new york, some2"
new york
your name
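If the surrounding quotes are not wanted in the result, the matches can be post-processed. The following is only a minimal sketch built on the same pattern; the class name QuotedSplitter and the method name SplitFields are made up for illustration and are not part of the original answer. It does not handle escaped quotes inside a quoted term.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original answer.
public static class QuotedSplitter
{
    // Same pattern as above: a quoted term first, otherwise
    // space-separated words that contain no comma.
    private static readonly Regex Field =
        new Regex(@""".*?""|[^\s,]+(?: [^\s,]+)*");

    public static List<string> SplitFields(string text)
    {
        var fields = new List<string>();
        foreach (Match match in Field.Matches(text))
        {
            string value = match.Value;
            // Drop the surrounding quotes from a quoted term.
            if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
            {
                value = value.Substring(1, value.Length - 2);
            }
            fields.Add(value);
        }
        return fields;
    }
}
```

Calling QuotedSplitter.SplitFields on the sample string gives the same six terms as above, with the fourth one returned as new york, some2 without the quotes.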
CodePudding user response:
It is likely best to pick some library that can handle CSV files.
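As one possible illustration of the library route, here is a rough sketch using TextFieldParser from the Microsoft.VisualBasic.FileIO namespace. This choice is my assumption, since no particular library is named here, and it requires the Microsoft.VisualBasic assembly to be available in your project. The class name CsvLineReader is hypothetical.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical wrapper; assumes Microsoft.VisualBasic is referenced.
public static class CsvLineReader
{
    public static string[] ReadFields(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Treat "..." as a single field even if it contains the delimiter.
            parser.HasFieldsEnclosedInQuotes = true;
            // Strip leading and trailing whitespace from each field.
            parser.TrimWhiteSpace = true;
            // ReadFields returns null when there is no data left.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

With these settings the quoted value should come back as a single field, and a dedicated parser also covers cases such as escaped quotes that hand-written splitting tends to miss.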
Otherwise, this could work in cases like yours:
// Requires: using System.Collections.Generic;
// Note: fields keep their leading spaces and surrounding quotes; Trim them if needed.
public static string[] Split(string str)
{
    // Positions of the commas that are not inside a quoted section.
    var indices = new List<int>();
    var insideQuote = false;
    for (var i = 0; i < str.Length; i++)
    {
        switch (str[i])
        {
            case '"':
                insideQuote ^= true; // toggle on every quote character
                break;
            case ',':
                if (!insideQuote) { indices.Add(i); }
                break;
        }
    }
    if (indices.Count == 0)
    {
        return new[] { str, };
    }
    var arr = new string[indices.Count + 1];
    arr[0] = str.Substring(0, indices[0]);
    for (var i = 1; i < arr.Length - 1; i++)
    {
        arr[i] = str.Substring(indices[i - 1] + 1, indices[i] - indices[i - 1] - 1);
    }
    arr[arr.Length - 1] = str.Substring(indices[arr.Length - 2] + 1);
    return arr;
}