How to split a string after a particular word that starts with a symbol in C#

input - one two three #abc four five six #xyz
output1 - one two three #abc
output2 - four five six #xyz

actual output -
data1 = one two three
data2 = four five six

Here #abc and #xyz can be any word. I want to cut the sentence right after the first word that starts with #. The input contains only two merged parts, which need to be separated. Code:

{
    String input = "one two three #abc four five six #xyz";
    String data1 = "";
    String data2 = "";
    data1 = input.Substring(0, input.IndexOf('#')); // want: one two three #abc

    string[] digits = Regex.Split(input, @"(?<!\w)#\w+");
    Console.WriteLine(digits[0]);
    Console.WriteLine(digits[1]);
}

CodePudding user response：

You can use the following regex with Regex.Split:

string[] digits = Regex.Split(input, @"(?<=\B#\w+)\b\s*")
    .Where(x => !string.IsNullOrEmpty(x)).ToArray();

// digits[0] => one two three #abc
// digits[1] => four five six #xyz

Regex details:

(?<=\B#\w+) - a positive lookbehind that matches a location immediately preceded by a # (which is either at the start of the string or right after a non-word char) followed by one or more word chars
\b - a word boundary
\s* - zero or more whitespaces.

CodePudding user response：

String.Substring takes the length of the substring as the second parameter, so you'd have to determine the length of the hashtag (or whatever you'd like to call it) in order to split it after that.

var firstPart = string.Empty;
var secondPart = string.Empty;

var sharpIndex = input.IndexOf('#'); // -1 if there is no '#'
// finds the space after the first occurrence of '#'; skipped if no '#' was found
var spaceIndex = sharpIndex >= 0 ? input.IndexOf(' ', sharpIndex) : -1;
if (spaceIndex > 0) // is there a space after '#'?
{
    firstPart = input.Substring(0, spaceIndex);
    secondPart = input.Substring(spaceIndex); // keeps the leading space
}
else
{
    firstPart = input;
    secondPart = "";
}

Please note that this searches for the first "#abc" only. If you'd like to split it by more than that, you'll have to get more advanced.
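One way to extend the IndexOf approach to every hashtag is a simple loop. The following is only a sketch: the class and method names (HashtagSplitter, SplitAfterEachTag) are made up for illustration, and it assumes that parts are separated by a single space and that every '#' starts a hashtag.

```csharp
using System;
using System.Collections.Generic;

public static class HashtagSplitter
{
    // Hypothetical helper: cuts the input after every word that starts with '#'.
    // Assumes a single space between parts and treats every '#' as a tag start.
    public static List<string> SplitAfterEachTag(string input)
    {
        var parts = new List<string>();
        var start = 0;
        while (start < input.Length)
        {
            var sharpIndex = input.IndexOf('#', start);
            // No further '#': the remainder is the last part.
            if (sharpIndex < 0)
            {
                parts.Add(input.Substring(start));
                break;
            }

            var spaceIndex = input.IndexOf(' ', sharpIndex);
            // No space after the '#': the tag runs to the end of the input.
            if (spaceIndex < 0)
            {
                parts.Add(input.Substring(start));
                break;
            }

            parts.Add(input.Substring(start, spaceIndex - start));
            start = spaceIndex + 1; // skip the separating space
        }
        return parts;
    }

    public static void Main()
    {
        foreach (var part in SplitAfterEachTag("one two three #abc four five six #xyz"))
        {
            Console.WriteLine(part);
        }
    }
}
```

For the input from the question this yields the two parts "one two three #abc" and "four five six #xyz". Text after the last hashtag, if any, ends up as a final part of its own.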

You could use a regular expression, too, e.g.

^([a-zA-Z\s]*#[a-zA-Z]+)([a-zA-Z\s#]+)?$

The respective C# code could look like the following

var firstPart = string.Empty;
var secondPart = string.Empty;

// verbatim string (@"...") so that \s is passed to the regex engine unchanged
var match = Regex.Match(input, @"^([a-zA-Z\s]*#[a-zA-Z]+)([a-zA-Z\s#]+)?$");
if (match.Success)
{
    firstPart = match.Groups[1].Value;
    // Value is an empty string if the optional second group did not match
    secondPart = match.Groups[2].Value;
}

The first group ([a-zA-Z\s]*#[a-zA-Z]+) matches everything up to and including the #abc. The other group ([a-zA-Z\s#]+) matches the rest. If the text contains anything other than letters and whitespace (digits or punctuation, for example), you'll have to adapt the expression accordingly. Again, this splits the string after the first occurrence of any #abc, so "This is a #test my #test #abc" would be split to "This is a #test" and " my #test #abc".
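If the text may contain digits or punctuation, one possible adaptation is to stop restricting the surrounding characters and only require a # followed by word characters. This is a sketch under that assumption; the pattern ^(.*?#\w+)\s*(.*)$ and the names FirstTagSplitter and SplitAfterFirstTag are illustrative and not part of the answer above. Unlike the expression above, it also drops the whitespace between the two parts.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstTagSplitter
{
    // Hypothetical helper; the pattern is an assumption, not the expression from the answer.
    public static (string First, string Second) SplitAfterFirstTag(string input)
    {
        // Group 1: shortest prefix that ends in '#' plus one or more word characters.
        // Group 2: everything after the separating whitespace.
        var match = Regex.Match(input, @"^(.*?#\w+)\s*(.*)$");
        return match.Success
            ? (match.Groups[1].Value, match.Groups[2].Value)
            : (input, string.Empty); // no "#word" found: nothing to split
    }

    public static void Main()
    {
        var (first, second) = SplitAfterFirstTag("Item 1, part #a1 rest: 2 #b2");
        Console.WriteLine(first);
        Console.WriteLine(second);
    }
}
```

With the sample input in Main, the first part is "Item 1, part #a1" and the second is "rest: 2 #b2". Note that . does not match a newline by default, so multi-line input would need RegexOptions.Singleline.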