C#: efficient way to add an escape before some characters in a string

Time:06-14

I need to replace some characters in a string with a \ followed by the original character.

So, given this string and array:

string origin = @"words&sales -test\strange";
string[] specialChars = new string[] { "\\", "&", "-", "?", /* ...... */ };

I want to get:

@"words\&sales \-test\\strange"

Note that \ itself is one of the characters to find and escape.

Thanks!

Answer:

Generally speaking, the fastest way to build String values in C#/.NET is with a StringBuilder, even if you're transforming another String value.
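For comparison, here is a minimal sketch of the straightforward alternative: chaining String.Replace calls, one per escapable character. It is hard-coded to the four characters from the question, and the class and method names are hypothetical. The backslash has to be replaced first; otherwise the backslashes inserted by the later calls would be escaped a second time. Each call that finds a match also allocates a new intermediate string, which is why a single pass with a StringBuilder tends to win as the character set grows.

```csharp
static class ReplaceChainExample
{
    // Naive baseline (hypothetical helper): one String.Replace call per escapable character.
    // The backslash MUST be handled first; if it ran last, it would also double the
    // backslashes that the other Replace calls just inserted.
    public static string EscapeWithReplace(string input)
    {
        return input
            .Replace("\\", "\\\\")
            .Replace("&", "\\&")
            .Replace("-", "\\-")
            .Replace("?", "\\?");
    }
}
```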

The other question is how best to decide whether a given char needs escaping. If the set of escapable characters is fixed at compile time, use a switch statement: the compiler can lower it to a jump table or a short series of comparisons, which is typically faster than a runtime HashSet<Char> lookup for set membership:

e.g.:


using System;
using System.Text;

static String Escape( String input )
{
    StringBuilder sb = new StringBuilder( capacity: 5 * input.Length / 4 ); // Assuming 25% length increase due to escaping.

    foreach( Char c in input )
    {
        switch( c )
        {
        case '\\':
        case '&':
        case '-':
        case '?':
            _ = sb.Append( '\\' ).Append( c );
            break;
        default:
            _ = sb.Append( c );
            break;
        }
    }

    return sb.ToString();
}

If the set of escapable characters is only known at runtime, a HashSet<Char> is likely the best overall option. However, if you know you're only processing chars within a limited range of Unicode code points (say, ASCII-compatible chars in the range 0x00 to 0x7F), you could instead use a Boolean[128] array as the escape-flag lookup table.

Using a HashSet<Char>, it would be like this:

using System;
using System.Collections.Generic;
using System.Text;

static String Escape( String input, IEnumerable<Char> escapableChars )
{
    HashSet<Char> escapeThese = new HashSet<Char>( escapableChars );

    StringBuilder sb = new StringBuilder( capacity: 5 * input.Length / 4 ); // Assuming 25% length increase due to escaping.

    foreach( Char c in input )
    {
        if( escapeThese.Contains( c ) )
        {
            _ = sb.Append( '\\' ).Append( c );
        }
        else
        {
            _ = sb.Append( c );
        }
    }

    return sb.ToString();
}
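For the Boolean-array variant mentioned earlier, a minimal sketch might look like the following. It assumes every escapable character is ASCII (BuildTable throws otherwise), and it copies any input char at or above 0x80 through unescaped. The names AsciiEscaper, BuildTable, and Escape are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class AsciiEscaper
{
    // Builds a 128-entry flag table, one slot per ASCII code point 0x00..0x7F.
    // Assumption: every escapable character is ASCII; anything else is rejected.
    public static bool[] BuildTable(IEnumerable<char> escapableChars)
    {
        bool[] table = new bool[128];
        foreach (char c in escapableChars)
        {
            if (c >= 128)
                throw new ArgumentException($"'{c}' is not an ASCII character.", nameof(escapableChars));
            table[c] = true;
        }
        return table;
    }

    // Single pass over the input; chars outside the table's range (>= 128)
    // are never escaped, they are just copied to the output.
    public static string Escape(string input, bool[] table)
    {
        StringBuilder sb = new StringBuilder(capacity: 5 * input.Length / 4);
        foreach (char c in input)
        {
            if (c < table.Length && table[c])
                sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Unlike the HashSet<Char> version above, which rebuilds its set on every call, the table here is meant to be built once and reused across calls.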

The code above can be optimized further. Some suggestions:

  • First check whether input contains any escapable characters at all: if it doesn't, return input directly without creating a new StringBuilder.
  • Create an (on-demand) pool of StringBuilder instances instead of creating new instances on every call.
  • Accept a ReadOnlySpan<Char> instead of a String as input, and write the output to a Span<Char>. This needs an initial pass to calculate the minimum required size of the output Span<Char>, and that size has to be passed back to the caller.
    • The same minimum-size calculation can also give the StringBuilder an exact capacity: value instead of my (lazy) 25% estimate.
  • Add memoization: use a Bloom filter and output cache keyed by the input value.
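The early-exit check and the exact-size calculation can be combined roughly as follows. This is a sketch, not the answer's own code: it hard-codes the same four characters as the switch() example in a hypothetical Escapable array, uses String.IndexOfAny for the early exit, counts the escapable characters to get the exact output length, and then writes straight into the new string's buffer with string.Create (available on .NET Core 2.1 and later, not on .NET Framework), so no StringBuilder is needed at all.

```csharp
using System;

static class ExactSizeEscaper
{
    // Hypothetical fixed set of escapable characters, matching the switch() example.
    private static readonly char[] Escapable = { '\\', '&', '-', '?' };

    public static string Escape(string input)
    {
        // Early exit: if nothing needs escaping, return the original instance
        // without allocating anything.
        int first = input.IndexOfAny(Escapable);
        if (first < 0)
            return input;

        // Count the escapable chars (starting at the first hit) so the output
        // length is known exactly: one extra '\' per escapable char.
        int extra = 0;
        for (int i = first; i < input.Length; i++)
        {
            if (Array.IndexOf(Escapable, input[i]) >= 0)
                extra++;
        }

        // Write directly into the new string's buffer of exactly the right size.
        return string.Create(input.Length + extra, input, (span, src) =>
        {
            int pos = 0;
            foreach (char c in src)
            {
                if (Array.IndexOf(Escapable, c) >= 0)
                    span[pos++] = '\\';
                span[pos++] = c;
            }
        });
    }
}
```

Because the early exit returns the original instance, inputs with nothing to escape cost a single scan and no allocation. A ReadOnlySpan<Char>-based overload would follow the same three steps.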