How to split string but ignore until found certain char?

Time:11-27

I've been tasked with creating an interpreter for C#.

Basically, I want to split this string:

"multiply (add 5 5) 5"

into an array of string:

["multiply", "(add 5 5)", "5"]

If I use string.Split with a space (" ") as the delimiter, the result is:

["multiply", "(add", "5", "5)", "5"], which is not what I expect.

Is this achievable in C#?

Edit: I also need to support nested expressions:

"multiply (add (add 2 5) 5) 5"

The example above needs to become:

["multiply", "(add (add 2 5) 5)", "5"]

CodePudding user response:

For the very basic use cases you have provided, you could achieve the 'interpretation' by looping through your source string, keeping track of which nested level you are currently in (if any), and using a StringBuilder to build your interpreted parts character by character.

You could implement a method containing the following logic:

  1. Create an empty StringBuilder object, which will hold the part of the source string you are currently interpreting
  2. Create a counter to keep track of which nested level you are currently on (starting at 0 -- no nested level)
  3. Loop through your source string, character by character. For each character:
    1. Check whether the character is a space, and whether you are currently inside a nested part or not.
      • If the character is a space and you are not currently inside a nested part, it means you have reached the end of the part that is currently being interpreted. Create a string from your StringBuilder object, return it and clear the StringBuilder object to prepare for the next part to be interpreted. Continue to the next character in the source string (i.e. skip the subsequent steps for the current character).
    2. Check whether the current character is an opening inner delimiter ( '(' ). If it is, it means that you are entering the next nested level. Update your nested level counter to reflect that.
    3. If the current character was not an opening inner delimiter; check whether it is a closing inner delimiter ( ')' ). If it is, update your nested level counter to reflect that.
    4. Append the current character to your StringBuilder object.
  4. After having looped through the whole source string, make sure to return the current content of your StringBuilder object as a string. (Unless your source string ends with a space character, the StringBuilder object will at this point contain the final interpreted part.)

Here is a possible implementation:

private static IEnumerable<string> GetInterpretation(string source)
{
    var delimiter = ' ';
    var innerDelimiter = (Opening: '(', Closing: ')');
    
    StringBuilder currentPart = new StringBuilder();
    
    var innerLevel = 0;
    
    Func<bool> isInnerPart = () => innerLevel > 0;
    
    foreach (var ch in source)
    {
        if (ch == delimiter && !isInnerPart())
        {
            yield return currentPart.ToString();
            
            currentPart.Clear();
            
            continue;
        }
        
        if (ch == innerDelimiter.Opening)
        {
            innerLevel++;
        }
        else if (ch == innerDelimiter.Closing)
        {
            innerLevel--;
        }
        
        currentPart.Append(ch);
    }
    
    if (currentPart.Length > 0)
    {
        yield return currentPart.ToString();
    }
}

If your input string variable is named source, you can use the method as follows:

var interpretation = GetInterpretation(source).ToArray();

For the given inputs, the resulting output is:

var source = "multiply (add 5 5) 5";

multiply
(add 5 5)
5

var source = "multiply (add (add 2 5) 5) 5";

multiply
(add (add 2 5) 5)
5



Note: As stated, this logic and implementation are naive and only suitable for the simple use case described.

It assumes:

  • you also want to split up words that are separated by only a space
    (as opposed to a space in combination with a parenthesis)
    --> "ABC DEF (GHI) JKL MNO" will be interpreted as "ABC", "DEF", "(GHI)", "JKL", "MNO"
  • nested parts (wrapped inside ( )) that are on the top 'nesting level' are always preceded and followed by a space
    --> "ABC DEF(GHI)JKL MNO" will hence be interpreted as "ABC", "DEF(GHI)JKL", "MNO"
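
The two assumptions above can be checked directly. The following is a minimal sketch that reuses the same character loop. The names ParenSplitter and SplitTopLevel are hypothetical, and the choices to skip empty parts (e.g. from doubled spaces) and to throw a FormatException on unbalanced parentheses are my own assumptions, not part of the original answer. It also assumes C# 9 or later for the top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Usage: the two assumption cases from the list above, plus a doubled space.
Console.WriteLine(string.Join(" | ", ParenSplitter.SplitTopLevel("ABC DEF (GHI) JKL MNO")));
// ABC | DEF | (GHI) | JKL | MNO
Console.WriteLine(string.Join(" | ", ParenSplitter.SplitTopLevel("ABC DEF(GHI)JKL MNO")));
// ABC | DEF(GHI)JKL | MNO
Console.WriteLine(string.Join(" | ", ParenSplitter.SplitTopLevel("multiply  (add 5 5) 5")));
// multiply | (add 5 5) | 5

// Hypothetical helper, not from the original answer.
public static class ParenSplitter
{
    public static List<string> SplitTopLevel(string source)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        var depth = 0; // current parenthesis nesting level

        foreach (var ch in source)
        {
            if (ch == ' ' && depth == 0)
            {
                // Top-level space: close the current part, skipping empty ones.
                if (current.Length > 0)
                {
                    parts.Add(current.ToString());
                    current.Clear();
                }
                continue;
            }

            if (ch == '(')
            {
                depth++;
            }
            else if (ch == ')' && --depth < 0)
            {
                // Assumption: a ')' without a matching '(' is an input error.
                throw new FormatException("Unmatched ')' in: " + source);
            }

            current.Append(ch);
        }

        if (depth != 0)
        {
            throw new FormatException("Unmatched '(' in: " + source);
        }

        if (current.Length > 0)
        {
            parts.Add(current.ToString());
        }

        return parts;
    }
}
```

Returning a List instead of using yield return means the input is validated as soon as the method is called, rather than when the result is first enumerated.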

CodePudding user response:

You can split the string on " (" and ") ". So that the parentheses themselves are not lost in the split, first replace each one with two characters:

 string s = "multiply (add 5 5) 5";
 string[] reslts = s.Replace("(","((").Replace(")", "))")
                  .Split(new string[] { " (", ") " },StringSplitOptions.RemoveEmptyEntries);

result:

["multiply", "(add 5 5)", "5"]

Edit:

string s = "multiply (add (add 2 5) 5) 5";
var start=s.Select((b, i) => b.Equals('(') ? i : -1).Where(i => i != -1).FirstOrDefault();
var end = s.Select((b, i) => b.Equals(')') ? i : -1).Where(i => i != -1).LastOrDefault();

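// After the first Insert every later index shifts by one, hence end + 2.
s = s.Insert(start, "#").Insert(end + 2, "#");
// Trim the spaces that remain next to the '#' markers.
string[] reslts = s.Split('#').Select(p => p.Trim()).ToArray();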

result:

["multiply", "(add (add 2 5) 5)", "5"]
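
The Insert approach marks only the first ( and the last ), so it relies on the input containing exactly one top-level parenthesized group. The small check below illustrates this. The helper name SplitOnOuterParens and the second input are made up for illustration, IndexOf/LastIndexOf are swapped in for the Select/Where lookups, and C# 9 or later is assumed for the top-level statements.

```csharp
using System;
using System.Linq;

// Same idea as above: wrap the span from the first '(' to the last ')' in '#' markers.
static string[] SplitOnOuterParens(string s)
{
    int start = s.IndexOf('(');
    int end = s.LastIndexOf(')');

    // Assumption: with no parenthesized group, fall back to a plain space split.
    if (start < 0 || end < start)
    {
        return s.Split(' ');
    }

    // After the first Insert every later index shifts by one, hence end + 2.
    s = s.Insert(start, "#").Insert(end + 2, "#");

    return s.Split('#')
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .ToArray();
}

Console.WriteLine(string.Join(" | ", SplitOnOuterParens("multiply (add (add 2 5) 5) 5")));
// multiply | (add (add 2 5) 5) | 5

Console.WriteLine(string.Join(" | ", SplitOnOuterParens("add (add 1 2) (add 3 4)")));
// add | (add 1 2) (add 3 4)
```

With two top-level groups, both end up in a single part, so inputs like the second one need a depth-tracking loop instead.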

