Arrays in C++
Integers and characters are simple variables or scalar variables.
Arrays are collections, described as vector variables.
Array initialisation need not assign all elements. All non-assigned elements are padded with zeros.
int A[5] = {2,4}; //this would give {2,4,0,0,0}
Array size is allocated explicitly (A[5]) or by passing values (A[] = {2,3,4,5,6,7,8}).
Arrays can be accessed using A[i] or i[A] or with a pointer *(A+2). The first two calls access the i-th element, the third call access the third element.
The address of a given element is achieved with &A[2]. The difference in the address given by, for example, &A[2] and &A[3] indicates the size of each element (compiler dependent).
Static and dynamic arrays
Static arrays are fixed in size, dynamic arrays are allocated memory on run-time. Both arrays would reside in the stack.
To declare an array on the heap, one would use pointers:
int *p;
p = new int[5];
...
delete []p;
Above, the array is in the heap whereas the pointer is in the stack. To simulate a truly dynamic array one would need to declare and copy an array to another array of greater capacity.
int *p = new int[5];
int *q = new int[10];
for(int i = 0; i < 5; i++){
q[i] = p[i];
}
delete []p;
p = q;
q = NULL;
Arrays are contiguous structures and therefore the last element will likely neighbour some other unrelated variable in the heap, temporarily limiting the present array length.
Multi-dimensional arrays
Declare as int A[m][n], where m is the number of rows and n is the number of columns, very much like conventional matrices.
In memory, multidimensional arrays are stored linearly:
A = { row1, row2, row3, ..., rowN };
We can declare 2D arrays with pointers, this time the pointer is of type array[no_of_arrays]. For example, int *A[3] is a pointer to the first of three arrays A. The address of the first array is given by &A[1], where the index starts at one. The pointer *A[1] resides in the stack.
When one initialises the elements in each array, one uses zero-based indices:
A[0] = new int[4];
A[1] = new int[4];
A[2] = new int[4];
A[1][2] = 5; //etc...
One can also set a pointer to a +pointer to an +array, where +pointer resides in the heap.
int **A; //in the stack
A = new int*[3]; //the pointer to the first array of three resides in the heap
//these arrays are also on the heap
A[0] = new int[4];
A[1] = new int[4];
A[2] = new int[4];
One accesses each element in the array of arrays using nested for loops by accessing the +pointer first and then processing each element in the pointed array.
In C, the above pointer to a pointer of arrays is achieved using:
int **C; //in the stack
C = (int **) malloc(3*sizeof(int*));
C[0] = (int*) malloc(4*sizeof(int));
C[1] = (int*) malloc(4*sizeof(int));
C[2] = (int*) malloc(4*sizeof(int));
//then the usual assignments:
C[1][0] = ...;
C[2][3] = ...;
free C[0];
free C[1];
free C[2];
free C;
Array representation in memory
The memory address of an array is only known at run-time. The compiler obtains the address of an element by starting from the first element: location A[0] + i*sizeof(element), where i is the index. That is, the first element has an index i = 0, which is simply given by the address location A[0].
The location of A[0] is updated at run-time. Arrays are zero-based because the allocation of address involves minimal operations. Compare the above convention to arrays which are one-based: location B[i] + (i-1)sizeof(element). The operation i-1 must be carried out for all arrays in the program.
The address of elements of multidimensional arrays is deduced by using either:
- row-major format mapping
- column-major format mapping
Row-major mapping
Rows, denoted by the first token [m] of A[m][n] are traversed left-to-right, i.e. row index [m] first, then column index [n]. Here is the sequential representation of a 2D array:
int A[3][4];
//initialise as needed
We use the variables i and j to denote a specific position in the zero-based matrix. The tokens m and n are used to denote the dimensions (number of elements) of the matrix; both are non-zero. The last element of rows 0 and 1 precede the first elements of rows 1 and 2, respectively. There are three rows (m = 3) and four elements (n = 4) per row. How does one deduce the address of A[1][2]?
- Take the row index (i.e. i = 1) and skip ahead to the second row (row i = 1) i.e.
location(A[0][0] + 1*n*sizeof(element)). In this case, n = 4 but writing it as such then presents a general expression for any dimensionn. - Then we home in on the third element of row 1:
location(A[0][0] + 1*n*sizeof(element) + 3*sizeof(element)). We do not needmhere since we known exactly how many elements to traverse. This simplifies tolocation(A[0][0] + (1*n + 3)*sizeof(element)). - We can generalise here. The location of element
jin rowi(soA[i][j]) is given by:location(A[0][0] + (i*n + j)*sizeof(element)). Note that the dimensionm(row number) is not mentioned in this formula.
The zero-based nature of arrays minimised the number of operations for all multidimensional arrays. For 2D arrays, we reduce each address query by two operations when seeking a particular element.
Column-major mapping
In row-major mapping, row indices are traversed before column indices. For column-major mapping, column indices are traversed first. In the notation A[m][n], this is right-to-left.
There are three rows (m = 3) and three elements (n = 3) per row. How does one deduce the address of A[1][2] or element a(12) above? In memory, a(20) resides immediately before a(01) etc.
- Take the column index (i.e. j = 2) and skip ahead to the third column (column 2) i.e.
location(A[0][0] + 2*m*sizeof(element)). The generalisation ofmallows us to accommodate any dimensionm. - Then we home in on the second row (i = 1):
location(A[0][0] + 2*m*sizeof(element) + 1*sizeof(element)). This simplifies tolocation(A[0][0] + (2*m + 1)*sizeof(element)) - We can generalise here. The location of element
jin rowi(soA[i][j]) is given by:location(A[0][0] + (j*m + i)*sizeof(element)). There is no mention of the dimensionn.
The difference is that the indices i and j are swapped.
Both row-major and column-major mappings are of equal time and storage complexity. C and C++ compilers tend to use row-major mappings.