Arrays in C++
Integers and characters are simple variables or scalar variables.
Arrays are collections, described as vector variables.
Array initialisation need not assign all elements. All non-assigned elements are padded with zeros.
int A[5] = {2,4}; //this would give {2,4,0,0,0}
Array size is allocated explicitly (A[5]
) or by passing values (A[] = {2,3,4,5,6,7,8}
).
Arrays can be accessed using A[i]
or i[A]
or with a pointer *(A+2)
. The first two calls access the i-th element, the third call access the third element.
The address of a given element is achieved with &A[2]
. The difference in the address given by, for example, &A[2]
and &A[3]
indicates the size of each element (compiler dependent).
Static and dynamic arrays
Static arrays are fixed in size, dynamic arrays are allocated memory on run-time. Both arrays would reside in the stack.
To declare an array on the heap, one would use pointers:
int *p;
p = new int[5];
...
delete []p;
Above, the array is in the heap whereas the pointer is in the stack. To simulate a truly dynamic array one would need to declare and copy an array to another array of greater capacity.
int *p = new int[5];
int *q = new int[10];
for(int i = 0; i < 5; i++){
q[i] = p[i];
}
delete []p;
p = q;
q = NULL;
Arrays are contiguous structures and therefore the last element will likely neighbour some other unrelated variable in the heap, temporarily limiting the present array length.
Multi-dimensional arrays
Declare as int A[m][n]
, where m is the number of rows and n is the number of columns, very much like conventional matrices.
In memory, multidimensional arrays are stored linearly:
A = { row1, row2, row3, ..., rowN };
We can declare 2D arrays with pointers, this time the pointer is of type array[no_of_arrays]. For example, int *A[3]
is a pointer to the first of three arrays A
. The address of the first array is given by &A[1]
, where the index starts at one. The pointer *A[1] resides in the stack.
When one initialises the elements in each array, one uses zero-based indices:
A[0] = new int[4];
A[1] = new int[4];
A[2] = new int[4];
A[1][2] = 5; //etc...
One can also set a pointer to a +pointer to an +array, where +pointer resides in the heap.
int **A; //in the stack
A = new int*[3]; //the pointer to the first array of three resides in the heap
//these arrays are also on the heap
A[0] = new int[4];
A[1] = new int[4];
A[2] = new int[4];
One accesses each element in the array of arrays using nested for loops by accessing the +pointer first and then processing each element in the pointed array.
In C, the above pointer to a pointer of arrays is achieved using:
int **C; //in the stack
C = (int **) malloc(3*sizeof(int*));
C[0] = (int*) malloc(4*sizeof(int));
C[1] = (int*) malloc(4*sizeof(int));
C[2] = (int*) malloc(4*sizeof(int));
//then the usual assignments:
C[1][0] = ...;
C[2][3] = ...;
free C[0];
free C[1];
free C[2];
free C;
Array representation in memory
The memory address of an array is only known at run-time. The compiler obtains the address of an element by starting from the first element: location A[0] + i*sizeof(element)
, where i is the index. That is, the first element has an index i = 0, which is simply given by the address location A[0]
.
The location of A[0] is updated at run-time. Arrays are zero-based because the allocation of address involves minimal operations. Compare the above convention to arrays which are one-based: location B[i] + (i-1)sizeof(element)
. The operation i-1
must be carried out for all arrays in the program.
The address of elements of multidimensional arrays is deduced by using either:
- row-major format mapping
- column-major format mapping
Row-major mapping
Rows, denoted by the first token [m]
of A[m][n]
are traversed left-to-right, i.e. row index [m]
first, then column index [n]
. Here is the sequential representation of a 2D array:
int A[3][4];
//initialise as needed
We use the variables i
and j
to denote a specific position in the zero-based matrix. The tokens m
and n
are used to denote the dimensions (number of elements) of the matrix; both are non-zero. The last element of rows 0 and 1 precede the first elements of rows 1 and 2, respectively. There are three rows (m = 3) and four elements (n = 4) per row. How does one deduce the address of A[1][2]
?
- Take the row index (i.e. i = 1) and skip ahead to the second row (row i = 1) i.e.
location(A[0][0] + 1*n*sizeof(element))
. In this case, n = 4 but writing it as such then presents a general expression for any dimensionn
. - Then we home in on the third element of row 1:
location(A[0][0] + 1*n*sizeof(element) + 3*sizeof(element))
. We do not needm
here since we known exactly how many elements to traverse. This simplifies tolocation(A[0][0] + (1*n + 3)*sizeof(element))
. - We can generalise here. The location of element
j
in rowi
(soA[i][j]
) is given by:location(A[0][0] + (i*n + j)*sizeof(element))
. Note that the dimensionm
(row number) is not mentioned in this formula.
The zero-based nature of arrays minimised the number of operations for all multidimensional arrays. For 2D arrays, we reduce each address query by two operations when seeking a particular element.
Column-major mapping
In row-major mapping, row indices are traversed before column indices. For column-major mapping, column indices are traversed first. In the notation A[m][n]
, this is right-to-left.
There are three rows (m = 3) and three elements (n = 3) per row. How does one deduce the address of A[1][2]
or element a(12) above? In memory, a(20) resides immediately before a(01) etc.
- Take the column index (i.e. j = 2) and skip ahead to the third column (column 2) i.e.
location(A[0][0] + 2*m*sizeof(element))
. The generalisation ofm
allows us to accommodate any dimensionm
. - Then we home in on the second row (i = 1):
location(A[0][0] + 2*m*sizeof(element) + 1*sizeof(element))
. This simplifies tolocation(A[0][0] + (2*m + 1)*sizeof(element))
- We can generalise here. The location of element
j
in rowi
(soA[i][j]
) is given by:location(A[0][0] + (j*m + i)*sizeof(element))
. There is no mention of the dimensionn
.
The difference is that the indices i
and j
are swapped.
Both row-major and column-major mappings are of equal time and storage complexity. C and C++ compilers tend to use row-major mappings.