Strings in Structs

Strings are reference types, so if I use them in structs, only the reference is stored on the stack. So why does this code give me a foo1.Bar that is different from foo2.Bar?

var foo1 = new Foo();
foo1.Bar = "test";

var foo2 = foo1;
foo2.Bar = "test2";

Console.WriteLine($"foo1 -> {foo1.Bar}");
Console.WriteLine($"foo2 -> {foo2.Bar}");

struct Foo
{
    public string Bar;
}

Shouldn't foo1.Bar and foo2.Bar store the same reference and in this case show the same result at the end?

I'm using .NET 6, C# 10.0

CodePudding user response：

Let's work through the first four lines one at a time:

var foo1 = new Foo();

This creates a new Foo instance in the variable foo1.

foo1.Bar = "test";

Now a string object is created in heap memory, and the reference is assigned to the Bar field of the foo1 variable. The C# compiler gives strings some special treatment so they have some value type semantics (i.e., you didn't have to write foo1.Bar = new string("test");), but they are actually reference types.

var foo2 = foo1;

This copies the Foo instance in foo1 to a new foo2 variable. Because this is a struct, and not a class, the contents of foo1 are copied to foo2, still presumably on the stack. If the Foo type were a class instead of a struct, then foo2 would only receive a reference to the same object as foo1, but as it is these two variables are now completely different objects. The .Bar reference is also copied, so you now have two different references that refer to the same "test" string object.

foo2.Bar = "test2";

A new "test2" string object is created in heap memory and assigned to the foo2.Bar field. This replaces the old reference. However, this reference is not the same as foo1.Bar (the two formerly referred to the same object, but were always separate references), and therefore foo1 is unchanged. Again, if Foo were a class instead of a struct, then the foo1 and foo2 variables at this point would hold reference values for the same object, in which case updating foo1.Bar would also update foo2.Bar, but as a struct we ended up with copies instead, and copies are free to diverge.
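To make the struct-versus-class contrast concrete, here is a minimal sketch. FooStruct and FooClass are hypothetical names made up for this illustration (they are not in the question's code); the only difference between them is the struct/class keyword.

```csharp
using System;

// Struct: assignment copies the whole value, including the Bar reference.
var s1 = new FooStruct { Bar = "test" };
var s2 = s1;
s2.Bar = "test2";            // re-points only the copy's field
Console.WriteLine(s1.Bar);   // test
Console.WriteLine(s2.Bar);   // test2

// Class: assignment copies only the reference to the one shared object.
var c1 = new FooClass { Bar = "test" };
var c2 = c1;
c2.Bar = "test2";            // c1 and c2 are the same object
Console.WriteLine(c1.Bar);   // test2
Console.WriteLine(c2.Bar);   // test2

// Hypothetical types for illustration only.
struct FooStruct { public string Bar; }
class FooClass { public string Bar; }
```

With nullable reference types enabled you may see a warning about the uninitialized Bar field on the class; it does not affect the behavior shown.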

Some additional reading in this area:

https://stackoverflow.com/a/52428042/3043
https://docs.microsoft.com/en-us/archive/blogs/ericlippert/the-truth-about-value-types

CodePudding user response：

When you create an instance of Foo you are (for all intents and purposes) creating an instance on the stack.

When you assign one Foo to another you are also creating a new instance of Foo because Foo is a struct.

Foo only contains a reference to a string. So when you assign one Foo to another you are making a copy of that string reference. When you then assign a different string to the copy of Foo you are only assigning the string to the copy. The original Foo is untouched. Hence the original Foo retains its original string reference.
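If you want to see the copied reference for yourself, something like the following sketch should do it. It reuses the Foo struct from the question and checks reference identity with object.ReferenceEquals before and after the reassignment.

```csharp
using System;

var foo1 = new Foo { Bar = "test" };
var foo2 = foo1;

// Right after the copy, both fields hold a reference to the same string object.
Console.WriteLine(object.ReferenceEquals(foo1.Bar, foo2.Bar)); // True

// Reassigning the copy's field re-points only that field.
foo2.Bar = "test2";

Console.WriteLine(object.ReferenceEquals(foo1.Bar, foo2.Bar)); // False
Console.WriteLine(foo1.Bar);                                   // test

struct Foo
{
    public string Bar;
}
```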

CodePudding user response：

After

foo2.Bar = "test2";

foo2.Bar points to a different string. The assignment changes which object the reference points to (as opposed to making changes to the object that the reference points to):

var foo1 = new Foo();
foo1.Bar = "test";
//
//                               "test" 
// foo1.Bar  ---------------------┘                  


var foo2 = foo1;
//
//                               "test" 
// foo1.Bar  ---------------------┘  |                
//                                   |
// foo2.Bar  ------------------------┘                  

foo2.Bar = "test2";

//
//                               "test" 
// foo1.Bar  ---------------------┘                 
//                                
//                               "test2"
// foo2.Bar  ---------------------┘

This is not specific to strings. Here's an example with a list (inspired by Value types (C# reference)):

A a1 = new A() { L = new List<string> {"1", "11" } };

A a2 = a1; // Shallow copy


Console.WriteLine(a1);  // [1,11]
Console.WriteLine(a2);  // [1,11]

a2.L.Add("X");

Console.WriteLine(a1);  // [1,11,X]
Console.WriteLine(a2);  // [1,11,X]

// this does not make changes to the object that a2.L points to, 
// it changes which object a2.L points to.
a2.L = new List<string> {"2", "22" };                                  

Console.WriteLine(a1); // [1,11,X]
Console.WriteLine(a2); // [2,22]

public struct A
{
    public List<string> L {get; set; }

    public override string ToString() => $"[{string.Join(",", L)}]";
}

Please note that your question is NOT about immutability of strings, because nowhere in your code are the strings modified.

From Strings and string literals:

Because a string "modification" is actually a new string creation, you must use caution when you create references to strings. If you create a reference to a string, and then "modify" the original string, the reference will continue to point to the original object instead of the new object that was created when the string was modified. The following code illustrates this behavior:
string str1 = "Hello ";
string str2 = str1;
str1 += "World";

System.Console.WriteLine(str2);
//Output: Hello

^ This is about changing strings and your example does not change strings.

Reference for value type assignment:

From Value types (C# reference):

By default, on assignment, passing an argument to a method, and returning a method result, variable values are copied.

and

If a value type contains a data member of a reference type, only the reference to the instance of the reference type is copied when a value-type instance is copied. Both the copy and original value-type instance have access to the same reference-type instance.
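The two quoted rules also apply when a struct is passed to a method. Below is a rough sketch of that case; Box, Items and the three helper methods are hypothetical names invented for this example, not something from the documentation or the question.

```csharp
using System;
using System.Collections.Generic;

var box = new Box { Items = new List<string> { "1" } };

Reassign(box);                       // the method re-points its own copy only
Console.WriteLine(box.Items.Count);  // 1

Mutate(box);                         // the copy shares the same list instance
Console.WriteLine(box.Items.Count);  // 2

ReassignByRef(ref box);              // ref passes the variable itself, not a copy
Console.WriteLine(box.Items.Count);  // 0

// Changes which list the copy's field points to; the caller's struct is untouched.
static void Reassign(Box copy) { copy.Items = new List<string>(); }

// Changes the list object that both the copy and the original point to.
static void Mutate(Box copy) { copy.Items.Add("X"); }

// Operates on the caller's variable, so the reassignment is visible outside.
static void ReassignByRef(ref Box original) { original.Items = new List<string>(); }

// Hypothetical struct for illustration only.
struct Box { public List<string> Items; }
```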

CodePudding user response：

Here is what is happening behind the scenes

 var foo1 = new Foo();

Allocate some memory and assign foo1 to point to it. Let's call that location A, so foo1 -> mem(a).

 foo1.Bar = "test";

Set the data at mem(a), offset Bar, to point at "test".

 var foo2 = foo1;

Assignment of a struct creates a copy of it. So a new memory location, mem(b), is created and foo2 is assigned to point to it.

 foo2.Bar = "test2";

Set the data at mem(b), offset Bar, to point at "test2".

OK, so when you print out foo1 and foo2, they still point to mem(a) and mem(b) respectively.

NOTE! This is very different from C and C++, where you actually change the pointers themselves. In those languages (which are not memory managed) you can have two variables, or structs with fields, that point to the same location.

CodePudding user response：

This is really an interesting phenomenon. The = operator does a shallow copy of a struct, and your expectation that the reference to the string should have been maintained is understandable. (Structs have no built-in Clone() method; a deep copy has to be written by hand.)

Remember, String derives from System.Object and not from System.ValueType, so it is a reference type. System.String is one of the few classes in the .NET Framework Base Class Library that is given special treatment by the CLR. In some respects it behaves much like a value type: strings are immutable, and comparing strings with == compares their contents rather than their references.

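Still, when you assign a struct to another with the = operator, the shallow copy copies the string reference, not the string itself. Both structs refer to the same string in memory until one of the fields is assigned a different string, and that assignment changes only the struct it is made on.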

The CLR optimizes string management by maintaining an internal pool of string values known as an intern pool for each .NET application. If the value being assigned to a string variable is the same as one of the strings already in the intern pool, no new string is created and the variable points to the address of the string in the pool. The compiler is capable of using the intern pool to optimize string initialization and have two string variables pointing to the same String object in memory. This optimization step is done only at compile time and isn’t performed at run time, though, because the search in the pool takes time and can even fail, adding overhead to the application.

Source: https://thedeveloperspace.com/back-to-basics-string-type-in-net/
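The intern pool behavior described above can be observed with a small sketch like this one. It only uses object.ReferenceEquals and string.Intern from the base class library; the variable names are just for illustration.

```csharp
using System;

string a = "test";
string b = "test";                                    // same literal, same pooled object
string c = new string(new[] { 't', 'e', 's', 't' });  // built at run time, a separate object

Console.WriteLine(object.ReferenceEquals(a, b));                 // True
Console.WriteLine(object.ReferenceEquals(a, c));                 // False
Console.WriteLine(a == c);                                       // True (== compares contents)
Console.WriteLine(object.ReferenceEquals(a, string.Intern(c)));  // True
```

Note that none of this changes the outcome in the question: whether or not two fields happen to share a pooled string, assigning a new string to foo2.Bar only re-points foo2.Bar.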