Data structures in memory

Storage classes: storage duration and scope

All variables are characterised by a storage class, which describes a variable's lifetime (through storage duration, below), linkage (visibility inside or outside the file) and memory location (CPU register, stack or heap).

All variables have a storage duration:

Automatic storage duration
Static storage duration
Dynamic storage duration

The part of the application over which a variable name is valid is known as its scope. A variable is said to be in scope wherever a reference to it is legal, at which point the variable can be used in an expression.

Memory addresses are zero-based. A program's memory is partitioned into code, static data, stack (made up of stack frames, or activation records) and heap (free store) sections.

Variables declared in a block (of curly braces) are automatic variables (have automatic storage duration) and are in scope from the point at which they are declared up to the closing brace. Such variables are stored on the stack. Systems are given a default stack size, typically of the order of megabytes. Variables are automatically removed from the stack when they go out of scope: using their name afterwards is a compile-time error, and using a previously saved pointer to them is undefined behaviour. In C, an automatic variable can be declared with the keyword auto, but this is not necessary as it is implied (since C++11, auto requests type deduction instead).

Each function call is allocated its own portion of the stack (a stack frame) on a LIFO basis. The term stack relates to how functions stack on top of each other as they are called. When a function returns, its stack frame is released automatically.

Variables declared outside any function, including main(), have global scope (or file scope) and are classed as static variables (have static storage duration). Such variables are placed in a static data area of memory rather than on the stack; variables declared inside main() are ordinary automatic variables. It is also possible to declare a local variable as static with the keyword static, which means the variable will exist for the entire duration of the application, although its name remains in scope only within its block. Static variables will always be initialised to a value (the type’s equivalent of 0) if none is provided.

Data whose size is not known until run time (e.g. arrays sized from user input) is handled as dynamic variables, which have dynamic storage duration. For many applications, having control over memory usage is preferred. Dynamic variables are located on the heap and referred to by pointers, which themselves often reside on the stack. The heap is a portion of memory from which blocks are allocated on request, in no particular order. Data which utilises the heap must be allocated and de-allocated explicitly (to prevent memory leaks).
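As a minimal sketch of the three storage durations side by side, the following uses one variable of each kind. The names callCount and sumOfThree are hypothetical and chosen only for illustration.

```cpp
// static storage duration: exists for the whole program and is
// zero-initialised because no initial value is given
int callCount;

// Hypothetical helper: combines one automatic, one dynamic and one
// static variable and returns their sum.
int sumOfThree(){
  int local = 1;             // automatic: released at the closing brace
  int *onHeap = new int(2);  // dynamic: lives until delete is called

  callCount++;               // static: keeps its value between calls

  int total = local + *onHeap + callCount;

  delete onHeap;             // without this the int would be leaked
  return total;
}
```

Calling sumOfThree() twice gives different results only because callCount survives between the calls; local and the heap int are created afresh each time.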

Pointers, Arrays and Pointer Arithmetic

A pointer that holds the address of a variable can be used to update that variable. Some conventions prefix a pointer name with p.

long someLong = 20L;
// general form of declaring pointers is: type *identifier
long *pSomeLong = &someLong;

// assign a value by dereferencing
long anotherLong = *pSomeLong;

// update the assigned value, to 30 then 31; the parentheses are needed
// because *pSomeLong++ would increment the pointer, not the value
*pSomeLong = 30L;
(*pSomeLong)++;

Assuming the above are declared inside a block such as the body of main() (and so have automatic storage duration), then both the variable and pointer would reside on the stack.

An array of pointers is declared in the form below, shown alongside a plain array for comparison:

 // array of pointers to integers (read the expression from right to left)
 int *ptr[arraySize];

 // for comparison, an array of integers
 int someArray[arraySize];

In most expressions, the array name is converted to a pointer to the first element. When written to the right of an assignment operator, the expression *ptr dereferences the pointer ptr i.e. refers to the value of the first element of the array and so is equivalent to ptr[0]. Under the same circumstances, the expression *(ptr + 1) refers to the value of the second element, and is equivalent to ptr[1] (the indirection operator * takes precedence over +, so parentheses are used to override this precedence).

If ptr[1] is written to the left of an assignment operator, then the element is assigned the value, whereas if ptr[1] is written to the right, then its value is returned. The same can be said for pointer notation:

int ptr[6] = {0};

// array names can be used as pointers, so these are equivalent
ptr[0] = 8;
*ptr = 8;

// these would be invalid: an array name cannot be reassigned
// ptr = 8;
// ptr = 0;

// a pointer variable, by contrast, is nullified when assigned zero without
// the indirection operator (no variable lives at address zero, so it means null)
int *nullable = ptr;
nullable = 0;
nullable = NULL;

// assign the second element a value 5
ptr[1] = 5;
*(ptr + 1) = 5;

// assign the value of the third element to the second element
ptr[1] = ptr[2];
*(ptr + 1) = *(ptr + 2);

// this assigns the address of the third element of ptr to a 
// new pointer, otherPtr
int *otherPtr;
otherPtr = &ptr[2];

// the post-increment operator advances otherPtr by sizeof(int) bytes
// (commonly 4), i.e. to the next int element
otherPtr++;
int fourthElement = *otherPtr;

In general, *(ptr + n) is equivalent to ptr[n]; both are examples of ‘pointer arithmetic’. This is revisited later re. pointers to strings.
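The equivalence of *(ptr + n) and ptr[n] can be sketched with a pair of small helpers. The function names sumWithPointers and elementsBetween are hypothetical; the second relies on the fact that subtracting two pointers into the same array yields a count of elements rather than bytes.

```cpp
#include <cstddef>

// Hypothetical helper: sums n ints using pointer notation only.
int sumWithPointers(const int *first, std::size_t n){
  int total = 0;
  for (std::size_t i = 0; i < n; i++){
    total += *(first + i);   // the same element as first[i]
  }
  return total;
}

// Hypothetical helper: number of elements separating two pointers that
// point into the same array.
std::ptrdiff_t elementsBetween(const int *from, const int *to){
  return to - from;
}
```

For an array int v[4], the call sumWithPointers(v, 4) receives the address of v[0], and elementsBetween(&v[0], &v[3]) is 3 regardless of the size of an int.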

In the context of main():

int main(){
  int a = 10;
  int *p; // uninitialised (not NULL): holds an indeterminate address, assign ASAP
  p = &a; // pointer p references int a (both in the stack)
  
  // dereferences p then assigns the value (not the address)
  int t = *p; 

  // since pointer p now stores the address of a, any
  // write through *p also changes the value of a;
  // the following changes the value of "a" through its pointer
  *p = 20;
}

C and C++ pointers and the heap

Pointers are required in order to manage variables on the heap (free store). In short:

int main(){
  int *p;
  p = new int[5];  // this allocates 5 * sizeof(int) bytes in the heap, somewhere

  // the C based equivalent of new (sizeof avoids assuming the size of an int)
  //p = (int *)malloc(5 * sizeof(int));
  
  // do stuff to p...

  delete[] p;      // clear the heap (do this before resetting the pointer!)
  p = NULL;        // reset the pointer
}

Pointers grant access to stack and heap resources by address. Each function that is called is given a frame on the stack, which holds its local variables.

Storing data on the heap in C++ with `new` and `delete`

In C, data is stored (dynamically) in the heap using malloc().

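#include <stdlib.h>

int main(){
  // allocates space in the heap for five ints and keeps the returned
  // address (calling malloc without storing the result leaks the block)
  int *p = (int*)(malloc(5*sizeof(int)));

  // release the block once it is no longer needed
  free(p);
}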

In C++, use the keyword new.

// pointer p resides on the stack, and points to an array (of int)
// that resides in the heap
int *p = new int[5];

The next example demonstrates pointer arithmetic and how to grow and free arrays in the heap.

 #include <iostream>

 using std::cin;
 using std::cout;
 using std::endl;

 int main(){

  int arraySize = 5;

  // assign the int pointer (which resides on the stack) 
  // to a new array of int (which resides in the heap);
  // note that both int declarations must match
  int *arrayOne = new int[arraySize];
  int *tempArray;

  int added = 0;
  int heapIndex = 0;
  int answer = 0;

  // assume that at least one value is entered
  while (true){
   cout << "Enter a non-zero integer value (enter 0 to quit): ";

   // read the next integer from standard input
   cin >> answer;

   if (answer == 0){
    cout << "Terminating input..." << endl;
    break;
   }

   added++;
   arrayOne[heapIndex++] = answer;
   // or similarly, use *(arrayOne + heapIndex++) = answer;

   // if the last element was assigned,
   if (heapIndex == arraySize){
    cout << "Increasing the array size..." << endl;

    arraySize += 5;
    cout << "New array size: " << arraySize << endl;

    // tempArray points to a new larger array (of int)
    tempArray = new int[arraySize];

    for (int j = 0; j < added; j++){
      // get everything from the smaller array to the newer, larger array 
      tempArray[j] = arrayOne[j];
    }

    // clear the heap of the smaller array's elements; note that arrayOne 
    // is itself a pointer (to an int) that resides on the stack: this 
    // operation does not remove arrayOne
    delete[] arrayOne;

    // pointers (unlike C++ references) are mutable so reassign arrayOne
    arrayOne = tempArray;

    // arrayOne refers to a larger array, so reset the address stored
    // by tempArray
    tempArray = 0;
   }
  }

  cout << "You entered a total of " << heapIndex << " value(s)" << endl;

  int sum = 0;
  bool groupDoneYet = false;
  int i = 0;

  // only process as many elements that were added
  for (; i < added; i++){
   groupDoneYet = false;
   cout << arrayOne[i] << " ";
   sum += arrayOne[i];

   // process the sum and average of each group of five
   if ((i + 1) % 5 == 0){
    cout << "sum: " << sum << " and average: " <<  
       static_cast<double>(sum) / 5 << endl;
    sum = 0;
    groupDoneYet = true;
   }
  }

  // handle the remaining elements (not part of a complete group of five);
  // the i % 5 check also avoids dividing by zero when nothing was entered
  if (!groupDoneYet && i % 5 != 0)
   cout << "sum: " << sum << " and average: " <<
       static_cast<double>(sum) / (i % 5) << endl;

  cout << "Clearing up..." << endl;

  // release all heap memory
  delete[] arrayOne;

  // remove the address (set arrayOne to NULL)
  arrayOne = 0;

  return 0;
 }

Pointers to char

A pointer to char can point to a single char, but it is most often used to point to the first element of an array of characters, i.e. a string. One can initialise the pointer directly with a string literal:

// someString points to the first character of an array of characters;
// const is required in C++ because the literal must not be modified
const char *someString = "Welcome to the world of C++ pointers";

// compare this to pointers to other types
double someDouble = 2.2;
double *ptrDouble = &someDouble;

*ptrDouble = 5.5;

A string is a sequence of characters with a terminating null character '\0'. A string literal is read-only (attempting to modify it is undefined behaviour), though a pointer to it can be re-assigned. To effectively change the string one would re-assign the pointer to a new string literal, or copy the literal into a char array, which is modifiable. (Variables of type char do not have an additional terminating null character.)
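A short sketch of these two points follows: walking a string up to its terminating null character, and editing a char array initialised from a literal (the array holds its own copy, so this is allowed). The names lengthOf and firstAfterEdit are hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: counts the characters before the terminating '\0'.
std::size_t lengthOf(const char *text){
  std::size_t n = 0;
  while (*(text + n) != '\0'){
    n++;
  }
  return n;
}

// Hypothetical helper: edits a char array that was initialised from a
// string literal and returns its first character.
char firstAfterEdit(){
  char editable[] = "hello";  // the array owns a modifiable copy
  editable[0] = 'J';          // fine: this changes the copy, not the literal
  return editable[0];
}
```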

To get the first character of a string through a pointer to char, one would use the indirection operator once:

// the string literal resides in static (typically read-only) storage
const char *someString = "Welcome to the world of C++ pointers";
// the array holds its own modifiable copy of the literal
char someStringAgain[] = "Welcome to the world of C++ pointers again";

char firstChar = *someString;
char firstCharAgain = someStringAgain[0];

char secondChar = *(someString + 1);
char secondCharAgain = someStringAgain[1];

Arrays of pointers to char

Since a pointer to char is most often used to point to a string, an array of pointers to char is, in practice, an array of pointers to strings.

const char *someArrayOfStrings[2];

// assign the first element
*someArrayOfStrings = "First element";

// assign the second element (uses pointer arithmetic)
*(someArrayOfStrings + 1) = "Second element";

// someArrayOfStrings[0] and *someArrayOfStrings represent the first element
const char *aString = *someArrayOfStrings;

// someArrayOfStrings[1] and *(someArrayOfStrings + 1) represent the second element
const char *a2String = *(someArrayOfStrings + 1);

// someArrayOfStrings[1][0] and *(*(someArrayOfStrings + 1)) represent the first char of the second element
char aChar = *(*(someArrayOfStrings + 1));

// someArrayOfStrings[1][3] and *(*(someArrayOfStrings + 1) + 3) represent the fourth char of the second element
char anotherChar = *(*(someArrayOfStrings + 1) + 3);

Generally, anArray[i][j] is equivalent to *(*(anArray + i) + j). One could combine array and pointer notation but this then gets a bit unwieldy.
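The general rule can be checked with a small helper that uses pointer notation only. The name charAt is hypothetical, and the sketch assumes the caller supplies valid indices.

```cpp
// Hypothetical helper: returns character j of string i, written as
// *(*(strings + i) + j) instead of strings[i][j].
char charAt(const char *const strings[], int i, int j){
  return *(*(strings + i) + j);
}
```

With const char *words[] = {"First", "Second"}, both charAt(words, 1, 3) and words[1][3] refer to the fourth character of the second string.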

References (C++)

References are exclusive to C++ (not C) and can be viewed (syntactically) as aliases to variables.

Passing a variable to a function by reference does not create a copy of that variable. Unlike pointers, references cannot be re-seated: a reference must be initialised as an alias to one variable and remains bound to it for its entire lifetime.

A function that takes reference parameters still gets its own stack frame; the references simply refer to variables that live in the caller's frame (for example, that of main()).

// general form: type &identifier = variable;
 int main(){
  int aValue = 10;
  int &rAValue = aValue;

  // update the value of aValue via rAValue
  rAValue = 12;
 }

References do not need dereferencing: the identifier of a reference (say r, bound to a variable a) automatically provides the value it refers to. The address the reference refers to is given by &r.

The addresses of a and r are the same, so any operation on r is an operation on a. So r++ is equivalent to a++. In a declaration, the tokens &r to the left of the assignment operator bind a reference r to some variable.

In an expression, the tokens &r to the right of the assignment operator return the address the reference refers to. For example:

  int* intPtr = &r;

All references MUST be initialised to some variable when they are declared.
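The following sketch illustrates that a reference r and the variable a it is bound to share one address, and that r++ acts on a. The function name incrementViaReference is hypothetical.

```cpp
// Hypothetical helper: increments a variable through a reference and
// reports whether the reference and the variable share one address.
bool incrementViaReference(int &a){
  int &r = a;        // r is an alias of a and cannot be re-bound later
  r++;               // equivalent to a++
  int *intPtr = &r;  // the address of r is the address of a
  return intPtr == &a;
}
```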

Pointers and Structures

Pointers can hold the address of a structure, e.g. that of a Rectangle variable r:

 struct Rectangle *p = &r;

 // the dereferencing operator * has lower precedence than the 
 // member access operator . hence the parentheses, which
 // dereference the structure pointer p first before 
 // accessing the member
 (*p).length = 20;

Alternatively, one can use the indirect member operator (->) instead of (*p). The indirect member selection is also used with C++ classes.

 p->length = 20;
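For a self-contained illustration, the sketch below assumes that Rectangle has two int members, length and breadth (the definition is not given in these notes, so this layout is an assumption). The function name stretch is hypothetical.

```cpp
#include <cstddef>

// Assumed layout of Rectangle, for illustration only.
struct Rectangle {
  int length;
  int breadth;
};

// Hypothetical helper: grows both sides of the rectangle that p points to,
// using each member access notation once.
void stretch(Rectangle *p, int extra){
  if (p == NULL){
    return;                // nothing to update
  }
  (*p).length += extra;    // dereference first, then select the member
  p->breadth += extra;     // the same idea with the indirect member operator
}
```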

Functions and parameter passing

Passing by value initialises a variable, local to the function, with a copy of the argument. The variable passed as the argument resides in a different part of memory. Only the copy is handled by the function: hence the original variable cannot be changed. To allow a function to change a variable passed to it, pass the variable by reference or pass its address as a pointer.
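The difference can be sketched with two hypothetical functions, doubleByValue and doubleByPointer, that attempt the same update.

```cpp
#include <cstddef>

// receives a copy: the caller's variable is left untouched
void doubleByValue(int n){
  n *= 2;
}

// receives the address: the caller's variable is updated
void doubleByPointer(int *n){
  if (n != NULL){
    *n *= 2;
  }
}
```

After int v = 4; doubleByValue(v); the value of v is still 4, whereas doubleByPointer(&v) changes it to 8.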

Passing pointers and function overloading

From the given block (quite often main()), the address of parameter(s) can be sent to a function so as to allow the parameter values to change.

#include<iostream>

void swap(int *pIntA, int *pIntB){
	if (pIntA != NULL && pIntB != NULL){
		int tempInt = *pIntA;
		
		*pIntA = *pIntB;
		*pIntB = tempInt;
		std::cout << "Integers swapped..." << std::endl;
		return;
	}

	std::cout << "Looks like a null pointer was passed" << std::endl;
}

int main(){
	int intA = 12;
	int intB = 15;

	int *pIntA = &intA;
	int *pIntB = &intB;

	std::cout << "Integer A is " << intA << std::endl;
	std::cout << "Integer B is " << intB << std::endl;

	swap(pIntA, pIntB);

	std::cout << "Integer A is now " << intA << std::endl;
	std::cout << "Integer B is now " << intB << std::endl;

	// swap them back again but without pointer variables
	swap(&intA, &intB);

	std::cout << "Integer A is back to " << intA << std::endl;
	std::cout << "Integer B is back to " << intB << std::endl;
}

The above case compiles because swap() is defined before it is called, so the compiler is aware of it. The function header (the expression void swap(int *pIntA, int *pIntB)) is accompanied by the function body (the function definition enclosed in curly braces).

In most real-world programs, it is generally best to first declare to the compiler which functions it can expect to be called. This is achieved through function prototyping. In this case, the return type, function name and parameter list are listed near the top of the source file, before main().

#include<iostream>

// function prototype (since swap() is defined after it is 
// called [by main()] it must be represented by a prototype before the call)
void swap(int *pIntA, int *pIntB);

int main(){
	int intA = 12;
	int intB = 15;

	int *pIntA = &intA;
	int *pIntB = &intB;

	std::cout << "Integer A is " << intA << std::endl;
	std::cout << "Integer B is " << intB << std::endl;

	swap(pIntA, pIntB);

	std::cout << "Integer A is now " << intA << std::endl;
	std::cout << "Integer B is now " << intB << std::endl;

	// swap them back again but without pointer variables
	swap(&intA, &intB);

	std::cout << "Integer A is back to " << intA << std::endl;
	std::cout << "Integer B is back to " << intB << std::endl;
}

void swap(int *pIntA, int *pIntB){
	if (pIntA != NULL && pIntB != NULL){
		int tempInt = *pIntA;
		
		*pIntA = *pIntB;
		*pIntB = tempInt;
		std::cout << "Integers swapped..." << std::endl;
		return;
	}

	std::cout << "Looks like a null pointer was passed" << std::endl;
}

The parameter names declared in the function prototype and function header need not be identical. It is also possible to omit names in the function prototype if preferred.

void swap(int*, int*);

Functions with the same name but different parameter list are examples of function overloading.

 void swap(int *x, int *y);

 // provide function overloading capability (this will require 
 // a separate function definition)
 void swap(double *x, double *y);

Function overloading permits a more readable code base, particularly when the only difference between all functions is the parameter list. If function overloading becomes excessive then consider using function templates instead.

Function templates

Function templates are somewhat similar syntactically to Java generics. A general identifier T is used to represent a variable of unknown type and is found throughout the template definition.

#include<iostream>

// template function prototype (could also bring the template 
// definition up here before it is called [by main()])
template<typename T> void swap(T *pVarA, T *pVarB);

int main(){
	int intA = 12;
	int intB = 15;

	int *pIntA = &intA;
	int *pIntB = &intB;

	std::cout << "Integer A is " << intA << std::endl;
	std::cout << "Integer B is " << intB << std::endl;

	swap(pIntA, pIntB);

	std::cout << "Integer A is now " << intA << std::endl;
	std::cout << "Integer B is now " << intB << std::endl;
	std::cout << std::endl;

	double dblA = 12.2;
	double dblB = 13.3;

	std::cout << "Double A is " << dblA << std::endl;
	std::cout << "Double B is " << dblB << std::endl;

	swap(&dblA, &dblB);

	std::cout << "Double A is now " << dblA << std::endl;
	std::cout << "Double B is now " << dblB << std::endl;
}

// could also write template<class T> but some programmers prefer
// not to confuse this notation with real C++ classes; typename is
// more neutral
template<typename T> void swap(T *pVarA, T *pVarB){
	if (pVarA != NULL && pVarB != NULL){
		T tempVar = *pVarA;
		
		*pVarA = *pVarB;
		*pVarB = tempVar;
		std::cout << "Variables swapped..." << std::endl;
		return;
	}

	std::cout << "Looks like a null pointer was passed" << std::endl;
}

The main() function

The main() function can have parameters defined which represent command-line arguments.

 int main(int argc, char* argv[]){
  // do stuff...
 }

The first argument argc represents the number of arguments and the second argument argv is an array of pointers to strings (recall above ideas). Each character pointer points to the first character of a string. By convention, the first element of argv is the program name and so argc is normally at least 1.
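As a sketch of how argc and argv might be walked, the hypothetical helper countFlag below counts how many arguments after the program name match a given string. It would typically be called from main() as countFlag(argc, argv, "-v"), where "-v" is just an example flag.

```cpp
#include <cstring>

// Hypothetical helper: counts the arguments, excluding argv[0], that are
// equal to flag.
int countFlag(int argc, char *argv[], const char *flag){
  int matches = 0;
  for (int i = 1; i < argc; i++){    // start at 1 to skip the program name
    if (std::strcmp(argv[i], flag) == 0){
      matches++;
    }
  }
  return matches;
}
```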

Passing by reference (C++ only)

Instead of passing by value or pointer, one can declare the function parameters as references.

 void swap(int &x, int &y);

With the above prototype, the function call swap(a, b) binds x and y as references to the caller's variables a and b. Thus, the function can change the values of the actual parameters. This provides an alternative to the pointer approach above, without the need for the address-of or indirection operators.
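A minimal sketch of a reference-based swap is given below. It is named swapByReference (a hypothetical name) so that it can sit alongside the pointer versions shown earlier.

```cpp
// x and y are aliases of the caller's variables, so no address-of or
// indirection operators are needed
void swapByReference(int &x, int &y){
  int temp = x;
  x = y;
  y = temp;
}
```

The call site is simply swapByReference(a, b); there are no null checks because a reference must always be bound to a variable.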

Passing arrays

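Recall that an array name can be thought of as a pointer to the first element of the array. The type of the array and number of elements present indicate how much storage space is required. Inside the called function only the pointer is available, so the number of elements is passed separately.

Passing arrays is equivalent to passing pointers.

 void fun(int A[], int n);

The above declaration is equivalent to the following, which makes it explicit that the parameter is a pointer and is not specific to arrays:

 void fun(int *A, int n);

Parameter names are free to differ, so the function prototype could equally be something like:

 void fun(int someArray[], int someInt);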
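Because the array parameter is really a pointer, the function works on the caller's elements rather than on a copy. The hypothetical helper addToAll sketches this; the element count is passed separately because the function cannot recover it from the pointer.

```cpp
#include <cstddef>

// Hypothetical helper: adds delta to each of the n elements of the
// caller's array.
void addToAll(int values[], std::size_t n, int delta){
  for (std::size_t i = 0; i < n; i++){
    values[i] += delta;   // modifies the caller's array directly
  }
}
```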

Passing read-only addresses

It is possible to have the compiler prevent a function from modifying the variable whose address it receives, by using the const keyword in the function prototype:

 void swap(const int &x, int &y);

In this case, the variable referred to by x cannot be changed by the function; however, that referred to by y can.
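A small sketch of a read-only parameter next to a writable one follows. The function name copyInto is hypothetical; the commented-out line shows the kind of statement the compiler would reject.

```cpp
// Hypothetical helper: copies the value referred to by from into the
// variable referred to by to.
void copyInto(const int &from, int &to){
  to = from;
  // from = 0;   // would not compile: from refers to a const int
}
```

A const reference can also bind to a temporary, so copyInto(42, b) is a valid call, whereas a literal could not be passed as the second argument.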

Passing structures

One can pass structures as values, references and pointers. Here is an example of a function definition involving references and pointers to structures. Both approaches allow one to change the actual-parameter passed.

 int area(struct Rectangle &z){
  z.length++;
  return z.length * z.breadth;
 }


 int area2(struct Rectangle *p){
  (*p).length++;
  return p->length* p->breadth;
 }

Static variables

The keyword static, applied to a local variable, declares a variable which is initialised once (the initialisation is skipped on all subsequent calls) and retains its value between function calls for the duration of the program. The keyword can also be applied to objects and functions.
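The behaviour of a static local variable can be sketched with a simple counter. The function name nextTicket is hypothetical.

```cpp
// Hypothetical helper: returns 1 on the first call, 2 on the second, and
// so on.
int nextTicket(){
  static int issued = 0;  // initialised on the first call only
  issued++;               // the value persists between calls
  return issued;
}
```

Had issued been an ordinary automatic variable, every call would have returned 1.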

Returning pointers and references

In all cases, do not return the address (by pointer or reference) of a variable local to the function. This is because the local variable is freed once the function returns and so the pointer or reference is left dangling; using it is undefined behaviour. Instead, allocate the value on the heap with new and return the pointer; the caller is then responsible for releasing it with delete.

The first snippet is broken but the second works:

 double* someFunc(double data){
  double someData = 3*data;
  // not good: someData is on the stack and is destroyed when the function returns
  return &someData;
 }

 double* someFunc2(double data){
  // okay: the heap holds a new double, which outlives the function call
  // (the caller must eventually delete it)
  double* something = new double(3*data);
  return something;
 }

Returning references is also similarly fraught with errors. Additionally, avoid the mistake of returning pointers when references are called for:

 double& someFunc3(double someArray[]){
  // do stuff...
  return someArray[2];
 }

Here the function is defined such that it returns a reference to the third element of someArray. Note that using &someArray[2] would return the address (pointer) of the result, which is not an alias to the result.

Returning the reference is safe here because someArray was created by the caller before someFunc3 was called, and so it outlives the call.
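One use of returning a reference is that the call itself can appear on the left of an assignment. The hypothetical helper largest below sketches this; it assumes the array has at least one element.

```cpp
#include <cstddef>

// Hypothetical helper: returns a reference (an alias) to the largest of
// the n elements; n is assumed to be at least 1.
double& largest(double values[], std::size_t n){
  std::size_t best = 0;
  for (std::size_t i = 1; i < n; i++){
    if (values[i] > values[best]){
      best = i;
    }
  }
  return values[best];   // refers to the caller's element, not a copy
}
```

Writing largest(data, 3) = 0.0; overwrites the largest element in the caller's array, which would not be possible had the function returned a double by value.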

Pointers to functions

In relation to the syntax for returning pointers, it is also possible to build pointers to functions.

 // for comparison, a function that returns a pointer to a double
 // (this is not a pointer to a function)
 double* someFunc(double data);

 // standard function declarations
 double specificFunc(double Num1, char* charPointer);
 double specificFunc2(double NumA, char* charPointerB);

 // declares a pointer to a function that takes a double and a char*,
 // and returns a double
 double (*pSomeFunc)(double, char*);

 // assign the pointer to a function with same signature;
 // note the address-of operator is optional here: a function name converts
 // implicitly to a pointer to the function (&specificFunc is also valid)
 pSomeFunc = specificFunc;

 // pointers are mutable so one can re-assign (sometimes useful to 
 // call different functions with the same pointer in a block; see next section)
 pSomeFunc = specificFunc2;

 // or declare and assign in one go...
 double (*pSomeOtherFunc)(double, char*) = specificFunc2;

In the above case, the pointer name is pSomeFunc and the functions it can point to take two arguments, one of type double and the second of type pointer to char, and return a double. The pointer pSomeFunc can be assigned to any function with the same signature.

The pointer can then be used in place of the function.

// continuing from the above...

double someDouble = 3.3;

char someChar = 'E';
char* charPointer = &someChar;

pSomeFunc(someDouble, charPointer);
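Since the snippets above only declare specificFunc and specificFunc2, the following self-contained sketch defines two hypothetical functions with the same signature (addCode and subtractCode) and picks between them at run time through one pointer.

```cpp
// two hypothetical functions sharing the signature double(double, char*)
double addCode(double num, char *c){
  return num + *c;   // adds the character's numeric code
}

double subtractCode(double num, char *c){
  return num - *c;   // subtracts the character's numeric code
}

// Hypothetical helper: selects one of the two functions and calls it via
// the pointer.
double applyChosen(bool add, double num, char *c){
  double (*pSomeFunc)(double, char*) = add ? addCode : subtractCode;
  return pSomeFunc(num, c);
}
```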

Functions as arguments of other functions

With the pointer to a function, one can define a function argument list where at least one argument is a pointer to a function. This provides a way for the calling function to invoke other functions via the pointer.

#include <iostream>

// these would normally be defined after main()
double callingFunc(double anArray[], double (*pSomeFunc)(int));
double callingFunc2(double anArray[], double (*pSomeFunc)(int));
double intToDouble(int value);

int main(){
  double array[] = {1.1, 2.2};

  // intToDouble is passed by name, without an argument list, so it is not
  // executed here; callingFunc and callingFunc2 each invoke it later
  // through the pointer parameter
  double newDouble =  callingFunc(array, intToDouble);
  double newDouble2 =  callingFunc2(array, intToDouble);

  // prints 5
  std::cout << "newDouble returned: " << newDouble << " " << std::endl;

  // prints 10
  std::cout << "newDouble2 returned: " << newDouble2 << " " << std::endl;
}

double intToDouble(int value){
  std::cout << "intToDouble: " << static_cast<double>(value) << std::endl;
  return static_cast<double>(value);
}

double callingFunc(double anArray[], double (*pSomeFunc)(int)){
  if (anArray[1] == 2.2){
    return pSomeFunc(5);
  }
  return 0.5;
}

double callingFunc2(double anArray[], double (*pSomeFunc)(int)){
  if (anArray[0] == 1.1){
    return pSomeFunc(10);
  }
  return 0.5;
}

Note that intToDouble is passed without an argument list, so that pSomeFunc points to it and it is called within the calling function, i.e. not

double newDouble =  callingFunc(array, intToDouble(5));

Here intToDouble(5) would be executed before callingFunc() and its double result passed in place of a function pointer, which does not compile. Instead, one assigns pSomeFunc to intToDouble by passing intToDouble as a parameter and lets callingFunc use the pointer.

Array of pointers to functions

One can also declare an array of pointers to functions and call a specific element (function) with an index.

// two functions with the same signature
char charFunc(char);
char charFuncAgain(char);

char (*pArrayFunc[2])(char) = {charFunc, charFuncAgain};

// call the second function element (the argument list makes it a call)
char someChar = pArrayFunc[1]('a');
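A complete sketch of the same idea is shown below, with two hypothetical functions (toUpperAscii and toLowerAscii) stored in the array and selected by index. The case conversion assumes an ASCII-compatible character set, in which each range of letters is contiguous.

```cpp
// Hypothetical function: upper-cases a lower-case ASCII letter.
char toUpperAscii(char c){
  return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
}

// Hypothetical function: lower-cases an upper-case ASCII letter.
char toLowerAscii(char c){
  return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Hypothetical helper: calls the function stored at index, or returns c
// unchanged if the index is out of range.
char applyByIndex(int index, char c){
  char (*pArrayFunc[2])(char) = {toUpperAscii, toLowerAscii};
  if (index < 0 || index > 1){
    return c;
  }
  return pArrayFunc[index](c);
}
```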

Default arguments

Default arguments are values which are assumed if the function call omits the corresponding parameter. A default argument is specified once, normally in the function prototype, and is not repeated in the function definition.

#include <iostream>

void printMe(const char message[] = "Default message");

int main(){
  const char alternativeMessage[] = "Something odd going on here";

  // use the default
  printMe();

  // use the alternative
  printMe(alternativeMessage);
}

void printMe(const char message[]){
  std::cout << message << std::endl;
}