How do I write a single backslash (\) in a string?

Time:02-22

I am trying to detect non-printable characters in a string ('\n', '\r', etc.) and insert a single backslash before them. For example, if I have the string "Hello\nWorld", I want it to become "Hello\\nWorld". I have a code example that should do this, but it inserts a double backslash ('\\'), so the result is "Hello\\\nWorld". Is there a way to insert a single backslash into a string?

expression = Regex.Replace(expression, @"\p{Cc}", m =>
            {
                int code = m.Value[0];

                return code < 32
                            ? @"\" + $"{Convert.ToChar(code)}"
                            : Convert.ToChar(code).ToString();
            });

Answer:

If you just want the TL;DR, skip to the end.

When you write this:

var s = "Hello\nWorld";

The compiler turns the \n into a newline character giving you:

 Hello
 World

When you write this:

var s = "Hello\\nWorld";

The compiler turns the \\ into a single backslash character, giving you:

Hello\nWorld

When you write this verbatim string:

var s = @"Hello\nWorld";

The leading @ turns off compiler conversions of any slashed characters so you get:

Hello\nWorld
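
A quick way to convince yourself of the three cases above is to compare lengths. This is only a sketch; the variable names are made up for illustration:

```csharp
using System;

var escaped  = "Hello\nWorld";    // compiler turns \n into one newline char
var doubled  = "Hello\\nWorld";   // compiler turns \\ into one backslash
var verbatim = @"Hello\nWorld";   // compiler leaves the backslash alone

Console.WriteLine(escaped.Length);        // 11
Console.WriteLine(doubled.Length);        // 12
Console.WriteLine(doubled == verbatim);   // True
Console.WriteLine(escaped.IndexOf('n'));  // -1, there is no letter n in it
```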

When you look at a string in the debugger tooltip or the Autos/Locals window, it shows you the non-verbatim form, i.e. the string you would have to paste into your source code to get the string you want output. A single real backslash is therefore displayed as two.

If you want to see how the string would actually appear if you, e.g., wrote it to a file and opened it in Notepad, click the magnifying glass next to the string value.


If you edit the value by writing into the tooltip or the Autos window, you can write a verbatim string by preceding it with an @. Remember that it will go back to being shown as a non-verbatim string the next time the debugger tooltip displays it.

For example, if you edit the value by typing a verbatim string that has 2 slashes, the tooltip then shows 4 slashes, because 2 real slashes double up to 4 source-code slashes. This is so that if you pasted it into code as a non-verbatim string, the compiler would convert those 4 slashes back down to 2 when compiling.


Hopefully you're now down with "compiler slashes". Here's the next thing to get on board with.

The regex engine is also a compiler of sorts, that also does these conversions.

When you have a regex of "a word character":

\w

You need to get past the C# compiler conversion first: the C# compiler conversion happens at compile time, whereas the regex engine's conversion happens at runtime.

If you just write this:

var r = new Regex("\w");

The compiler will try to convert that \w and choke on it (error CS1009, "Unrecognized escape sequence") because it doesn't have a slash conversion for \w like it does for \n (newline) or \t (tab).

This means to get the regex engine to see \w you need to do either:

var r = new Regex("\\w");
var r = new Regex(@"\w");

Both of these are turned into \w by the C# compiler, so that's what the regex engine sees.
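
If you want to check this yourself, Regex.ToString() returns the pattern the engine was actually given. A small sketch (variable names are just for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

var viaEscape   = new Regex("\\w");   // compiler turns \\ into a single \
var viaVerbatim = new Regex(@"\w");   // compiler leaves \w untouched

Console.WriteLine(viaEscape.ToString());                            // \w
Console.WriteLine(viaEscape.ToString() == viaVerbatim.ToString());  // True
Console.WriteLine(viaVerbatim.IsMatch("a"));                        // True
Console.WriteLine(viaVerbatim.IsMatch("!"));                        // False
```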


Some slashed-characters have meaning to both the compiler and the regex engine

The regex engine can understand either \n (2 chars: literally a slash followed by an n) or a newline (1 char, character 10 in the ASCII table), so to get Regex to hunt for a newline you could write any of:

var r1 = new Regex("\n");    //compiler converts to newline char
var r2 = new Regex(@"
");                          //source code literally contains a newline (whichever line ending the source file uses)
var r3 = new Regex(@"\n");   //compiler ignores, regex engine interprets \n as newline
var r4 = new Regex("\\n");   //compiler converts \\ to \, regex engine interprets \n as newline
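
As a rough check of the two-step conversion, the sketch below (names are illustrative only) shows that a 1-char pattern and a 2-char pattern both find the same newline:

```csharp
using System;
using System.Text.RegularExpressions;

string oneChar  = "\n";    // pattern is 1 char: the newline itself
string twoChars = @"\n";   // pattern is 2 chars: a backslash and an n

var text = "Hello\nWorld";

Console.WriteLine(oneChar.Length);                            // 1
Console.WriteLine(twoChars.Length);                           // 2
Console.WriteLine(Regex.IsMatch(text, oneChar));              // True
Console.WriteLine(Regex.IsMatch(text, twoChars));             // True
Console.WriteLine(Regex.IsMatch(@"Hello\nWorld", twoChars));  // False, that text has no newline char
```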

So bear in mind this two-step conversion. It's probably easiest to use @ strings to turn off compiler conversions; then your slashes get through to the regex engine as you wrote them in the source. If you need to get a " through to Regex inside an @ string, write "":

var r = new Regex(@"He said ""I don't know"" to me");

Also note that recent versions of Visual Studio give regex pattern strings extra syntax highlighting that reflects what the regex engine sees.


Now that we have all that out of the way, and you appreciate the multiple levels of conversion going on, hopefully you can appreciate that you can't do what you're asking with Regex. There isn't any notion that the following string:

Hello
World

Which, in source code would be either:

var s1 = "Hello\nWorld";
var s2 = @"Hello
World";

Could "have a slash placed in front of the newline" and pop back out as \n, because there isn't an n in the string. The string "Hello World" with some whitespace between the words doesn't contain an n at all, anywhere.

The compiler has essentially done:

code = code.Replace(@"\n", @"
");                                 //change slash-n to newline char 10

You cannot invert that with:

var x = code.IndexOf("
");                                 //find newline char
code = code.Insert(x, @"\");        //insert slash before newline

A string of "slash-newline" is not "slash-n".
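
To see what the insert really produces, here is a small sketch (variable names are only for illustration) that inspects the characters afterwards:

```csharp
using System;

var code = "Hello\nWorld";

// Insert a single real backslash in front of the newline char.
var inserted = code.Insert(code.IndexOf('\n'), @"\");

Console.WriteLine(inserted.Length);              // 12
Console.WriteLine(inserted[5] == '\\');          // True, the backslash we inserted
Console.WriteLine(inserted[6] == '\n');          // True, still a newline, not the letter n
Console.WriteLine(inserted == @"Hello\nWorld");  // False
```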

The only way to reverse it is:

code = code.Replace(@"
", @"\n");                          //replace newline char with slash-n

There aren't short slash codes for every control character you'll find. About the only thing I guess you could do with your current approach would be something like:

expression = Regex.Replace(expression, @"\p{Cc}", m => $@"\u{(int)m.Value[0]:x4}");

This will take some string like:

Hello
World

And turn it into

Hello\u000aWorld
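
As a runnable sketch of the same replacement, with a tab added to the sample input (my own addition) to show a second control character:

```csharp
using System;
using System.Text.RegularExpressions;

string expression = "Hello\nWorld\t!";

// Replace every control character (Unicode category Cc) with \u plus 4 hex digits.
expression = Regex.Replace(expression, @"\p{Cc}", m => $@"\u{(int)m.Value[0]:x4}");

Console.WriteLine(expression);   // Hello\u000aWorld\u0009!
```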

If you want it to be \n you'll have to code for it (and all the other slash-whatevers) specifically, by having a big table of replacements, such as the table of character escapes at https://www.tutorialspoint.com/csharp/csharp_character_escapes.htm
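
A minimal sketch of what such a table could look like in code. The helper name EscapeControlChars and the dictionary are my own invention, not anything built into .NET, and the table only lists the simple C# escapes; anything else in the control range falls back to the \uXXXX form:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical lookup table: control char -> its C# slash code.
var escapes = new Dictionary<char, string>
{
    ['\0'] = @"\0",
    ['\a'] = @"\a",
    ['\b'] = @"\b",
    ['\t'] = @"\t",
    ['\n'] = @"\n",
    ['\v'] = @"\v",
    ['\f'] = @"\f",
    ['\r'] = @"\r",
};

// Hypothetical helper: swaps control chars for their slash codes.
string EscapeControlChars(string input)
{
    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        if (escapes.TryGetValue(c, out var named))
            sb.Append(named);                 // has a short slash code
        else if (char.IsControl(c))
            sb.Append($@"\u{(int)c:x4}");     // no short code, fall back to \uXXXX
        else
            sb.Append(c);                     // ordinary character, keep as-is
    }
    return sb.ToString();
}

Console.WriteLine(EscapeControlChars("Hello\nWorld\t!"));   // Hello\nWorld\t!
```

Note that this leaves real backslashes already in the input alone, so the output can't be reliably turned back into the original string; if you need a round trip you'd want to escape \ as well.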
