C++ Pass by Ref vs Pointer

I am new to C++. My background is mostly in Java/C#. While learning C++ I am having a hard time wrapping my head around when to use pointers vs. references. My question is: in Java/C# you can pass an object as an argument to a function and then assign this argument to an internal class variable so that you can use it outside the scope of the method later on. However, in C++ I am not sure how to achieve the same thing. If I pass by reference, I can only use it within the scope of that method. If I assign the reference to an internal variable of the same type, changes to one don't affect the other. I cannot declare an uninitialized reference either (except maybe by initializing it in the constructor). The only solutions I have found are to either pass in the reference every time I need to work with it (as an argument), or pass in a pointer once (e.g. through a constructor) and convert it to a reference every time I need to use it outside the scope.

Here is an example method I have for testing this:

Initially, the value read and written by getValue and setValue is zero.
I call Controller2.initialize(Controller &controller, Controller *controllerPtr)
I call Controller2::process(Controller &controller)

The output is shown after the code blocks below

#include "Controller2.h"

Controller2::Controller2()
{
}

void Controller2::initialize(Controller &controller, Controller *controllerPtr)
{
    _controller = controller;
    _controllerPtr = controllerPtr;

    Controller &controllerRef = *_controllerPtr;

    controller.setValue(5);
    Serial.println("");
    Serial.print("_Controller in initialize(): ");
    Serial.print(_controller.getValue());
    Serial.print("  Controller in initialize(): ");
    Serial.print(controller.getValue());

    Serial.print("  Controller Ptr in initialize(): ");
    Serial.print(controllerRef.getValue());

    Serial.println();
}

void Controller2::process(Controller &controller)
{
    Serial.println("");
    Serial.print("_Controller in process(): ");
    Serial.print(_controller.getValue());
    Serial.print("  Controller in process(): ");
    Serial.print(controller.getValue());

    Controller &controllerRef = *_controllerPtr;
    Serial.print("  Controller Ptr in process(): ");
    Serial.print(controllerRef.getValue());

    Serial.println();
}

Controller2.h:

#include "Arduino.h"
#include "Controller.h"

#ifndef Controller2_h
#define Controller2_h

class Controller2
{
public:
    Controller2();
    void initialize(Controller &controller, Controller* controllerPtr);
    void manage();
    void process(Controller &controller);

private:
    Controller _controller;
    Controller* _controllerPtr;
};

#endif

Controller Class:

#include "Controller.h"

Controller::Controller()
{
}

void Controller::initialize()
{
}

void Controller::setValue(int val)
{
    value = val;
}

int Controller::getValue()
{
    return value;
}

Controller.h:

#include "Arduino.h"

#ifndef Controller_h
#define Controller_h

class Controller
{
public:
    Controller();
    void initialize();
    void manage();
    void setValue(int val);
    int getValue();

private:
    int value = 0;
};

#endif

And the main class:

#include <Arduino.h>
#include <Controller.h>
#include <Controller2.h>

Controller controller;
Controller2 controller2;

void setup()
{
  Serial.begin(115200);

  Serial.println("");
  Serial.print("Controller initial: ");
  Serial.print(controller.getValue());
  Serial.println();

  controller2.initialize(controller, &controller);
  controller2.process(controller);
}

void loop()
{
}

The output results in:

Controller initial: 0

_Controller in initialize(): 0  Controller in initialize(): 5  Controller Ptr in initialize(): 5

_Controller in process(): 0  Controller in process(): 5  Controller Ptr in process(): 5

Is this correct or am I missing something here?

I would appreciate any help!

Thanks

CodePudding user response：

The truth is that C++ references behave in many ways like pointers, without the pointer-specific syntax. For both pointers and references you have the pointer/reference-to-an-object and then you have the object itself, and the lifetime of the pointer/reference can differ from the lifetime of the object that it points to or references. In cases where the pointer/reference outlives the object, you have to be very careful not to dereference the pointer/reference after the object has been destroyed, or else you'll invoke undefined behavior and your program won't behave well.

So for example this is valid:

class Controller2
{
public:
   Controller2(Controller & controllerRef) 
      : _controllerRef(controllerRef)
   {/*empty*/}

private:
   Controller & _controllerRef;
};

... and behaves much the same as the pointer-based implementation:

class Controller2
{
public:
   Controller2(Controller * controllerPtr) 
      : _controllerPtr(controllerPtr)
   {/*empty*/}

private:
   Controller * _controllerPtr;
};

... the main difference being that in the reference-based implementation, there is no (legal) way for the user to pass in a NULL reference to the Controller2 constructor, therefore your code doesn't have to worry about checking _controllerRef to see if it's NULL, since the language guarantees it won't be NULL (or to be more specific, it says that if the reference is NULL, then the program is already broken beyond repair, so you can assume it isn't NULL).

In both cases, passing a raw-pointer/reference to an external object is a bit risky, so unless you have some other way to guarantee that the pointed-to/referenced object will outlive any possible dereferencing of _controllerRef, you might be better off either making a private copy of the object, or if that isn't practical, using something like a shared_ptr or unique_ptr instead to guarantee that the referenced object won't be destroyed until after you've stopped holding any references to it.
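To make the comparison concrete, here is a minimal sketch of the two variants side by side. It assumes a simplified stand-in for the question's Controller class, and the names RefHolder, PtrHolder, and holdersSeeExternalChange are hypothetical, chosen only for illustration. It also assumes the Controller object outlives both holders, which is exactly the guarantee discussed above.

```cpp
#include <cassert>

// Simplified stand-in for the question's Controller class (an assumption).
class Controller {
public:
    void setValue(int val) { value = val; }
    int getValue() const { return value; }
private:
    int value = 0;
};

// Holds a reference: it must be bound in the constructor and cannot be null.
class RefHolder {
public:
    explicit RefHolder(Controller& c) : _controllerRef(c) {}
    int read() const { return _controllerRef.getValue(); }
private:
    Controller& _controllerRef;
};

// Holds a pointer: it may be null, so each use needs a check.
class PtrHolder {
public:
    explicit PtrHolder(Controller* c) : _controllerPtr(c) {}
    // Returns -1 as an illustrative sentinel when no Controller is attached.
    int read() const { return _controllerPtr ? _controllerPtr->getValue() : -1; }
private:
    Controller* _controllerPtr;
};

// Returns true when both holders observe a change made to the original object.
bool holdersSeeExternalChange() {
    Controller controller;          // outlives both holders in this scope
    RefHolder byRef(controller);
    PtrHolder byPtr(&controller);
    controller.setValue(5);         // modify the original after construction
    assert(byRef.read() == byPtr.read());
    return byRef.read() == 5 && byPtr.read() == 5;
}
```

Neither holder copies the Controller, so both see the value set after they were constructed. The pointer version needs the null check that the reference version can skip.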

CodePudding user response：

Try to start thinking of things in terms of ownership of the memory. C++ doesn't have a garbage collector, so it's your job to manage who owns what memory.

Normally, the owner of the memory maintains the actual memory itself, or in the case of large (or virtual) data, maintains a std::unique_ptr to it. std::unique_ptr is just like a raw pointer except that it enforces (a) the memory is cleaned up when you're done with it, and (b) there's only one unique pointer to the data at a given moment.
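As a rough sketch of single ownership, the snippet below moves a std::unique_ptr into an owning class. Sensor, Owner, and uniqueOwnershipDemo are hypothetical names, and the code assumes a C++14 toolchain with the standard library available (some Arduino cores may not ship one).

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Sensor { int reading = 0; };   // hypothetical payload type

// Owner keeps the only unique_ptr, so the Sensor is freed when Owner is destroyed.
class Owner {
public:
    explicit Owner(std::unique_ptr<Sensor> s) : _sensor(std::move(s)) {}
    int reading() const { return _sensor->reading; }
private:
    std::unique_ptr<Sensor> _sensor;
};

int uniqueOwnershipDemo() {
    auto sensor = std::make_unique<Sensor>();
    sensor->reading = 42;
    Owner owner(std::move(sensor));   // ownership is transferred into owner
    assert(sensor == nullptr);        // a moved-from unique_ptr is empty
    return owner.reading();
}
```

There is no delete anywhere: the Sensor is released automatically when owner goes out of scope.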

If you need to let someone borrow the data (i.e. let a function do something with it), then you pass a reference. MyClass& is a type that looks at someone else's MyClass instance and might modify it. const MyClass& is a type that looks at someone else's MyClass and doesn't modify it (for obvious reasons, you should default to the latter unless mutability is required).
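A small sketch of the two kinds of borrowing, with hypothetical names (Counter, increment, peek, borrowDemo):

```cpp
#include <cassert>

struct Counter { int count = 0; };   // hypothetical type owned by the caller

// Mutable borrow: the function may change the caller's object.
void increment(Counter& c) { ++c.count; }

// Read-only borrow: the function can look but not modify.
int peek(const Counter& c) { return c.count; }

int borrowDemo() {
    Counter counter;                 // the caller owns the object
    assert(peek(counter) == 0);
    increment(counter);
    increment(counter);
    return peek(counter);
}
```

In both cases the callee never owns the Counter. It only uses it for the duration of the call.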

If you need a value to have multiple owners, then first think about whether you really need it. Lots of things fit into the single-ownership model, and generally with some minor restructuring you can get by just fine with references and unique pointers. But if you really need multi-ownership, you can use std::shared_ptr and std::weak_ptr to get that behavior (strong vs. weak references, respectively). But this should be the exception rather than the norm.
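For the multi-ownership case, the following sketch shows how strong and weak references interact. Config and weakObserverTracksLifetime are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <memory>

struct Config { int level = 3; };   // hypothetical shared data

// Returns true if the object stays alive while any owner remains
// and is gone once the last owner releases it.
bool weakObserverTracksLifetime() {
    auto first = std::make_shared<Config>();
    std::shared_ptr<Config> second = first;   // second strong owner
    std::weak_ptr<Config> observer = first;   // observes without owning
    assert(first.use_count() == 2);           // weak_ptr is not counted

    first.reset();                            // one owner left
    bool aliveWithOneOwner = false;
    if (auto locked = observer.lock()) {      // temporary strong reference
        aliveWithOneOwner = (locked->level == 3);
    }

    second.reset();                           // last owner gone, Config destroyed
    bool goneWithNoOwners = observer.expired();
    return aliveWithOneOwner && goneWithNoOwners;
}
```

The weak_ptr never keeps the Config alive on its own, so it has to be converted with lock() before each use.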

You should use raw pointers (i.e. MyClass*) almost never. In fact, as a C++ beginner, you should use raw pointers literally never. For collections, use std::vector, not a raw array that decays to a pointer (there are reasons to use the latter, but those are technical and only serve to confuse when just starting out). For individual data, use single- or multi-ownership as discussed above. If you write MyClass* somewhere, you should be able to replace it with a reference or a smart pointer (unique, shared, or weak). If you write new or delete somewhere, you should be able to replace it with a smart pointer constructor or simple value allocation (lots of Java devs coming to C++ find themselves writing MyClass* x = new MyClass() when MyClass x{} will do just fine).
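As an illustration of value allocation and std::vector standing in for new and raw arrays, consider this sketch. Point and valueAndVectorDemo are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct Point { int x; int y; };   // hypothetical value type

int valueAndVectorDemo() {
    Point p{};                      // plain value, zero-initialized, no new/delete
    p.x = 3;

    std::vector<Point> points;      // the vector owns its elements
    points.push_back(p);            // stores a copy of p
    points.push_back(Point{1, 2});

    p.x = 99;                       // does not affect the copy in the vector
    assert(points[0].x == 3);

    int sum = 0;
    for (const Point& q : points) { // read-only borrow of each element
        sum += q.x + q.y;
    }
    return sum;
}
```

Everything here is cleaned up automatically at the end of the function, with no pointer in sight.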

My last major C++ project was a 13kloc programming language interpreter, and I used exactly one raw pointer in it (I remember specifically when I made this concession), to implement an obscure optimization trick in a critical path. There is a paragraph and a half of comments around that explaining why I had to do it and who actually owns the memory, since the type no longer communicates that information. Everything else was references and smart pointers. When you get the hang of it, you almost never need actual raw pointers.

Finally, a couple of pieces of advice.

Java and C# rely heavily on null. Try to avoid nullptr / NULL in C++. If you want a value that may or may not exist, std::optional is the idiomatic way to do it (if you don't have access to C++17, Boost has a header-only library to do the same). In particular, you should never make a value into a pointer for the sole purpose of allowing nullptr; that's what std::optional was designed specifically to do.
You may run into some really old code or tutorials online that use std::auto_ptr. Do not use this type. std::unique_ptr does everything it does better. std::auto_ptr is a broken implementation and should never be used.
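To sketch the std::optional advice, here is a hypothetical parsePort helper (not from any library) that returns "no value" rather than a null pointer when the input is not a valid port number. It assumes a C++17 compiler.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: parses a decimal port number in the range 0..65535.
// Returns std::nullopt instead of a null pointer when the text is not valid.
std::optional<int> parsePort(const std::string& text) {
    if (text.empty()) return std::nullopt;
    int port = 0;
    for (char ch : text) {
        if (ch < '0' || ch > '9') return std::nullopt;   // not a digit
        port = port * 10 + (ch - '0');
        if (port > 65535) return std::nullopt;           // out of range
    }
    return port;
}

// Falls back to a default when no valid port was supplied.
int portOrDefault(const std::string& text) {
    std::optional<int> port = parsePort(text);
    assert(!port.has_value() || *port >= 0);
    return port.value_or(80);
}
```

The caller has to decide what to do about the missing value, here via value_or, instead of risking a null dereference.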

CodePudding user response：

The other answers give the C++ perspective. To give you a starting point, as you come from a Java/C# perspective:

A variable that holds an object in Java (or C#) corresponds roughly to a variable that holds a shared_ptr in C++. There are differences, so don't expect that you should (or even can) simply "think Java" and use a shared_ptr. (You should "think C++" when coding in C++, just like you should "think Java" when coding in Java.) However, when comparing Java code and C++ code, thinking of Java objects as C++ shared pointers covers the 80% case. This should help you catch yourself when you try to apply semantics from another language (Java or C#) to C++ code.
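A rough sketch of that analogy, assuming a simplified stand-in for the question's Controller class (aliasingDemo is a hypothetical name): copying a shared_ptr behaves like assigning one Java object variable to another, in that both names end up referring to the same object.

```cpp
#include <cassert>
#include <memory>

// Simplified stand-in for the question's Controller class (an assumption).
class Controller {
public:
    void setValue(int val) { value = val; }
    int getValue() const { return value; }
private:
    int value = 0;
};

int aliasingDemo() {
    auto first = std::make_shared<Controller>();
    std::shared_ptr<Controller> second = first;   // copies the pointer, not the object
    second->setValue(5);                          // change made through one name
    assert(first.get() == second.get());          // both refer to the same object
    return first->getValue();                     // the change is visible through the other
}
```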

Even though you asked about "pass by" (implying function parameters), the real issue behind your symptom appears to be the _controller member. This is neither a reference nor a pointer; it is an object. Each Controller2 object has a Controller object as a member. The number of Controller objects will be at least the number of Controller2 objects. These are not shared. The line _controller = controller; does not make _controller refer to the same object as controller. No, it copies values to _controller from controller, the same way that assigning int variables copies a value.

One way to see this is to add a line to your code:

    controller.setValue(5);   // existing line, affects parameter
    _controller.setValue(20); // new line, affects member

This gives your two Controller objects values that are distinct from each other, and distinct from the initial value. (The two Controller objects are the global variable named controller and the _controller member of the global variable named controller2.)
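The same experiment can be sketched outside of Arduino. The classes below are trimmed-down stand-ins for the question's Controller and Controller2 (the memberValue accessor is an addition for illustration), with Serial output left out so that the values can be checked directly.

```cpp
#include <cassert>

// Trimmed-down stand-in for the question's Controller class.
class Controller {
public:
    void setValue(int val) { value = val; }
    int getValue() const { return value; }
private:
    int value = 0;
};

// Trimmed-down stand-in for Controller2, keeping only the value member.
class Controller2 {
public:
    void initialize(Controller& controller) {
        _controller = controller;   // copies the object; no link is kept
        controller.setValue(5);     // existing line, affects parameter
        _controller.setValue(20);   // new line, affects member
        assert(&_controller != &controller);   // two distinct objects
    }
    int memberValue() const { return _controller.getValue(); }   // hypothetical accessor
private:
    Controller _controller;         // an object, not a reference or pointer
};
```

After initialize runs, the caller's Controller holds 5 and the member holds 20, which shows that the two are separate objects.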

Some additional reading:

Should I prefer pointers or references in member data?