How pass by reference works in C#

I am trying to write some code to create a linked list, but I am confused about how pass by reference works in C#. Below is my code for the AddNodeToEnd method, which takes as input the head of the linked list and the data element to add.

    public LinkedList AddNodeToEnd(LinkedList head, string data)
    {
        var node = new LinkedList() { Data = data };
        if (head == null)
            return node;

        while (head.Next != null)
        {
            head = head.Next;
        }
        head.Next = node;

        return head;
    }

Below is my code for adding elements to the list.

    var linkedList = new LinkedListDriver();
    var head = linkedList.AddNodeToEnd(null, "1");
    linkedList.AddNodeToEnd(head, "2");
    linkedList.AddNodeToEnd(head, "3");
    linkedList.AddNodeToEnd(head, "4");
    linkedList.AddNodeToEnd(head, "5");
    Console.Write(linkedList.PrintList(head));

This prints 1 => 2 => 3 => 4 => 5 (as expected).

My question is about what happens to head inside the AddNodeToEnd method. New nodes are added to the linked list correctly, but in that method I traverse the list using the head parameter itself, without assigning it to a different/temporary variable, and yet in my Main method head still points to 1. Why? Based on the above code, I was expecting head to move to 5 because of head = head.Next;, since head is passed by reference (by default in C#).

I would appreciate an explanation of this behavior in C#.

    public class LinkedList
    {
        public string Data { get; set; }
        public LinkedList Next { get; set; }
        public LinkedList Previous { get; set; }
    }

CodePudding user response:

When you pass head to AddNodeToEnd, you are passing a copy of the reference. The copy initially points to the head node, but inside AddNodeToEnd you change which node the copy points to. You are not changing the original head variable.

Had the signature of AddNodeToEnd been public LinkedList AddNodeToEnd(ref LinkedList head, string data) (with the ref keyword also written at the call site), then head = head.Next would have moved the caller's head, which is the behavior you expected. In that case you would not have passed a copy of the reference; the parameter would have been an alias for the caller's head variable itself.
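
To make the difference concrete, here is a minimal sketch. The Node class and the two Append methods are hypothetical stand-ins for the LinkedList class and AddNodeToEnd from the question, not the asker's actual code. Note that with ref the caller's variable ends up on the old tail (the node where the loop stopped), not on the newly added node.

```csharp
using System;

// Hypothetical stand-in for the LinkedList class in the question.
public class Node
{
    public string Data { get; set; }
    public Node Next { get; set; }
}

public static class RefDemo
{
    // By value: 'head' is a copy of the caller's reference.
    public static void AppendByValue(Node head, string data)
    {
        while (head.Next != null)
            head = head.Next;          // moves only the local copy
        head.Next = new Node { Data = data };
    }

    // By reference: 'head' is an alias for the caller's variable.
    public static void AppendByRef(ref Node head, string data)
    {
        while (head.Next != null)
            head = head.Next;          // moves the caller's variable too
        head.Next = new Node { Data = data };
    }

    public static void Main()
    {
        var first = new Node { Data = "1" };

        var head = first;
        AppendByValue(head, "2");
        AppendByValue(head, "3");
        Console.WriteLine(head.Data);  // 1: the caller's variable did not move

        AppendByRef(ref head, "4");
        Console.WriteLine(head.Data);  // 3: the caller's variable now sits on the old tail
        Console.WriteLine(object.ReferenceEquals(head, first)); // False
    }
}
```

In both versions the new node is linked into the list; the only difference is whether the caller's own variable is dragged along during the traversal.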

CodePudding user response:

When you pass a reference type (a class, as opposed to a value type such as int or a struct) as an argument to a function, the function receives a reference (in effect, a pointer) to an existing object in memory. Unless you pass the argument with the ref keyword, you cannot make the caller's variable refer to a different object, but you can modify the object's internal state.

In this case something interesting is happening: by passing head you give the method a reference to an existing object, which it can modify. Thus, head.Next = node modifies the existing object. However, when you assign to head itself (head = head.Next), you only replace the method's local copy of the reference. That doesn't modify the existing object or the caller's variable.
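
The distinction between modifying the object and reassigning the parameter can be sketched as follows. The Box class and both methods are hypothetical names invented for this illustration.

```csharp
using System;

// Hypothetical class used only for this illustration.
public class Box
{
    public int Value { get; set; }
}

public static class MutateVsReassign
{
    // Changes the object that both the caller and the parameter refer to.
    public static void Mutate(Box box)
    {
        box.Value = 42;
    }

    // Points the local parameter at a new object; the caller's variable is unaffected.
    public static void Reassign(Box box)
    {
        box = new Box { Value = 99 };
    }

    public static void Main()
    {
        var b = new Box { Value = 1 };

        Mutate(b);
        Console.WriteLine(b.Value);   // 42

        Reassign(b);
        Console.WriteLine(b.Value);   // still 42
    }
}
```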

CodePudding user response:

You are conflating two distinct concepts:

- the difference between value types and reference types
- the difference between passing arguments by value and passing arguments by reference

Let's start with value types versus reference types.

Value types are types whose value is the data itself. When I say "the data itself", I mean an actual instance of the value type. An example of a value type is the struct System.Int32 (the C# keyword int), which is a 32-bit signed integer. Consider the following variable declaration:

int number = 13;

In this case the number variable contains the actual value 13, which is an instance of the System.Int32 type. Put another way, the memory location in your computer memory you access via the number identifier directly contains the integer number 13.

Based on what I explained above, the following lines of code create a copy of the value contained in the variable a and assign the copy to the variable b. After the code executes, the two variables contain independent copies of the same integer value (13). Put another way, there are two independent memory locations, each holding its own copy of the integer 13:

int a = 13;
int b = a;
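
One way to check that the two copies really are independent is to change b afterwards and look at a again. This is only a small sketch; the ValueCopyDemo class and its Run method are names made up for the example.

```csharp
using System;

public static class ValueCopyDemo
{
    // Returns the pair (a, b) after b has been changed.
    public static (int A, int B) Run()
    {
        int a = 13;
        int b = a;   // b receives its own copy of 13
        b = 99;      // changing b does not touch a
        return (a, b);
    }

    public static void Main()
    {
        var (a, b) = Run();
        Console.WriteLine(a);  // 13
        Console.WriteLine(b);  // 99
    }
}
```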

Reference types are types whose value is a reference to the data itself. When I say "a reference to the data itself", I mean a reference to an instance of the type. An example of a reference type is the class System.String, which is used to represent strings of characters. Instances of reference types are stored in a portion of memory called the managed heap. This portion of memory is handled by the garbage collector, which is in charge of deallocating the memory occupied by instances of reference types (this is done when these instances are no longer referenced by anything, so they can safely be removed from memory). Consider the following line of code:

string name = "Enrico";

Here the variable name does not contain the string "Enrico"; instead it contains a reference to the string "Enrico". This means that somewhere in the managed heap there is some memory containing the actual data (the sequence of characters composing the string "Enrico") and that the memory location you access via the name identifier contains a reference to the memory location containing the actual string data. You can imagine the thing I'm calling a reference as a fictitious arrow (a pointer) which points to another memory location, which actually contains the sequence of characters composing the string "Enrico".

Consider the following code:

string a = "Hello";
string b = a;

This is what happens here:

1. Some memory is allocated in the managed heap to contain the sequence of characters composing the string "Hello". At this memory location there is the real data, the string itself, the actual instance of the System.String type.
2. The variable a contains a pointer which points to the real data, that is, a pointer to the memory location described at step 1.
3. The variable b contains a copy of the content of variable a. This means that the variable b contains a pointer pointing to the memory location described at step 1. There are now 2 independent pointers to the same memory location, which contains the actual data, that is, the sequence of characters composing the string "Hello".

Notice that, at this point, you can access the same object (the string "Hello", which is an instance of the System.String type), by using two different pointers: both the a and b variables are referencing the string data stored somewhere in the managed heap. The very important part here is that there is only one string instance in memory.
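
A short sketch can confirm that both variables refer to the same instance. It uses object.ReferenceEquals, which returns true only when its two arguments are the same object; the SharedReferenceDemo class name is made up for this example.

```csharp
using System;

public static class SharedReferenceDemo
{
    // True when a and b refer to the very same string instance.
    public static bool SameInstance()
    {
        string a = "Hello";
        string b = a;   // copies the reference, not the characters
        return object.ReferenceEquals(a, b);
    }

    public static void Main()
    {
        Console.WriteLine(SameInstance());  // True

        string a = "Hello";
        string b = a;
        b = "World";           // b now refers to a different string
        Console.WriteLine(a);  // Hello: a still refers to the original string
    }
}
```

Reassigning b only redirects b; it has no effect on a, because the two variables are independent even though they started out pointing at the same object.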

We can now talk about pass by value and pass by reference. Simply put, by default in C# all method arguments are passed by value. This is the most important part of my answer, and I have often seen confusion about it. But it's really that simple: unless you specify that you want pass-by-reference behavior, C# passes method arguments by value. You can opt out of this default by using the ref or the out keyword: this way you explicitly choose pass-by-reference behavior when you pass arguments to a method.

Passing by value means passing a copy of the value to the method as an argument.

What is really important to understand is what "a copy of the value" actually means. But you already know the answer:

- A copy of the value for a value type means a copy of the actual data (a physical copy of the real value).
- A copy of the value for a reference type means a copy of a reference to the actual data. You are not creating a copy of the actual object stored in memory; you are creating a copy of a pointer to that object.
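
The two cases above can be compared side by side with a sketch like the following. PointStruct, PointClass, and the Change methods are hypothetical names introduced only for this illustration.

```csharp
using System;

// Hypothetical types used only for this illustration.
public struct PointStruct { public int X; }
public class PointClass { public int X; }

public static class CopyOfValueDemo
{
    // Receives a copy of the whole struct; the caller's struct is untouched.
    public static void Change(PointStruct p) { p.X = 100; }

    // Receives a copy of the reference; both references lead to the same object.
    public static void Change(PointClass p) { p.X = 100; }

    public static void Main()
    {
        var s = new PointStruct { X = 1 };
        var c = new PointClass { X = 1 };

        Change(s);
        Change(c);

        Console.WriteLine(s.X);  // 1
        Console.WriteLine(c.X);  // 100
    }
}
```

Both arguments are passed by value here; the results differ only because what gets copied is the data in one case and a reference in the other.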

Now we can consider the final example, that I hope will clarify your doubt. I need a class (which is a reference type) and a couple of methods.

using System;

public class Person
{
  public string Name { get; set; }
}

public static class Program
{
  // The parameter person receives a copy of the caller's reference.
  public static void DoSomething(Person person)
  {
    // Reassigning the parameter only changes the local copy.
    person = new Person
    {
      Name = "Bob"
    };

    Console.WriteLine(person.Name); // this prints Bob
  }

  public static void Main(string[] args)
  {
    Person alice = new Person
    {
      Name = "Alice"
    };

    DoSomething(alice);

    Console.WriteLine(alice.Name); // this prints Alice
  }
}

Here is what happens:

1. The Main method creates an instance of the Person class in the managed heap, whose name is "Alice". A variable named alice is assigned to that instance, so the variable alice contains a pointer to the Person class instance. There is 1 variable and 1 object in memory. The variable points to the object.
2. The DoSomething method is invoked and the variable alice is passed by value as the argument for the person parameter. The parameter person is a copy of the variable alice: the two copies are independent, and both of them point to the same memory location, which contains the object created at step 1 (the Person class instance whose name is "Alice").
3. Inside the method DoSomething a new object is created in memory; this object is an instance of the Person class having the name "Bob". The method parameter, person, is assigned the newly created object. There are now two objects in memory, both of them instances of the Person class. The parameter person contains a reference to one of these objects (the one having name "Bob"), while the variable alice of the Main method contains a reference to the other object (the one having name "Alice"). This is perfectly fine because there is no link between the parameter person and the variable alice: they are totally independent and they are free to reference different objects.
4. When the execution of DoSomething ends, the method parameter person goes out of scope and is no longer accessible via code. We are back in the Main method, and the variable alice is still in scope and accessible via code. This variable is untouched by the execution of DoSomething and keeps pointing to the instance of the Person class created at step 1 (the one having name "Alice").
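
For contrast, here is a sketch of what changes if the parameter is declared with ref. DoSomethingByRef is a hypothetical variant of DoSomething written for this comparison; it is not part of the example above.

```csharp
using System;

public class Person
{
    public string Name { get; set; }
}

public static class RefPersonDemo
{
    // Hypothetical variant of DoSomething that takes its parameter by reference.
    public static void DoSomethingByRef(ref Person person)
    {
        // person is an alias for the caller's variable, so this reassigns that variable.
        person = new Person { Name = "Bob" };
    }

    public static void Main()
    {
        Person alice = new Person { Name = "Alice" };

        DoSomethingByRef(ref alice);

        Console.WriteLine(alice.Name);  // Bob
    }
}
```

With ref there is no independent copy of the reference: the parameter and the variable alice are the same storage location, so after the call alice refers to the "Bob" object and the "Alice" object is no longer referenced.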