int* p; | Pointers to data |
int[3] s; | Static arrays |
int[] a; | Dynamic arrays |
int[char[]] x; | Associative arrays |
    int* p;

These are simple pointers to data, analogous to C pointers. Pointers are provided for interfacing with C and for specialized systems work. A pointer has no length associated with it, so there is no way for the compiler or runtime to do bounds checking on it. Most conventional uses for pointers can be replaced with dynamic arrays, out and inout parameters, and reference types.

    int[3] s;

These are analogous to C arrays. Static arrays are distinguished by having a length fixed at compile time.

    int[] a;

Dynamic arrays consist of a length and a pointer to the array data. Multiple dynamic arrays can share all or parts of the array data.
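A minimal sketch of what sharing array data means in practice. The helper name sharesStorage is hypothetical and exists only for this illustration; it relies on the slice syntax a[n .. m] covered later in this section.

```d
// Hypothetical helper, not part of the original text: returns true when a
// single write through one dynamic array is visible through two others.
bool sharesStorage()
{
    int[] a = new int[4];      // four ints, each default-initialized to 0
    int[] whole = a;           // refers to all of a's data
    int[] tail = a[2 .. 4];    // refers to only the last two elements

    a[3] = 7;                  // one write through a

    // The same element is reachable through all three arrays.
    return whole[3] == 7 && tail[1] == 7;
}

void main()
{
    assert(sharesStorage());
}
```

No element is copied at any point here; whole and tail each hold only their own length and pointer.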
    int[] a;        // dynamic array of ints
    int[4][3] b;    // array of 3 arrays of 4 ints each
    int[][5] c;     // array of 5 dynamic arrays of ints
    int*[]*[3] d;   // array of 3 pointers to dynamic arrays of pointers to ints
    int[]* e;       // pointer to dynamic array of ints
    // dynamic array of ints
    int[] a;
    int a[];

    // array of 3 arrays of 4 ints each
    int[4][3] b;
    int[4] b[3];
    int b[3][4];

    // array of 5 dynamic arrays of ints
    int[][5] c;
    int[] c[5];
    int c[5][];

    // array of 3 pointers to dynamic arrays of pointers to ints
    int*[]*[3] d;
    int*[]* d[3];
    int* (*d[3])[];

    // pointer to dynamic array of ints
    int[]* e;
    int (*e)[];

Rationale: The postfix form matches the way arrays are declared in C and C++, and supporting this form provides an easy migration path for programmers used to it.
The handle to an array is specified by naming the array, as in p, s or a:
    int* p;
    int[3] s;
    int[] a;

    int* q;
    int[3] t;
    int[] b;

    p = q;      // p points to the same thing q does
    p = s;      // p points to the first element of the array s
    p = a;      // p points to the first element of the array a

    s = ...;    // error, since s is a compiled-in static reference to an array

    a = p;      // error, since the length of the array pointed to by p is unknown
    a = s;      // a is initialized to point to the s array
    a = b;      // a points to the same array as b does
    int[10] a;      // declare array of 10 ints
    int[] b;

    b = a[1..3];    // a[1..3] is a 2 element array consisting of
                    // a[1] and a[2]
    foo(b[1]);      // equivalent to foo(0)
    a[2] = 3;
    foo(b[1]);      // equivalent to foo(3)

The [] is shorthand for a slice of the entire array. For example, the assignments to b:

    int[10] a;
    int[] b;

    b = a;
    b = a[];
    b = a[0 .. a.length];

are all semantically equivalent.
Slicing is not only handy for referring to parts of other arrays, but for converting pointers into bounds-checked arrays:
    int* p;
    int[] b = p[0..8];

Slicing for bit arrays is only allowed if the slice's lower bound falls on a byte boundary:

    bit[] b;
    ...
    b[0..8];    // ok
    b[8..16];   // ok
    b[8..17];   // ok
    b[1..16];   // error, lower bound is not on a byte boundary

Misaligned bit array slices will cause an ArrayBoundsError exception to be thrown at runtime.
    int[3] s;
    int[3] t;

    s[] = t;            // the 3 elements of t[3] are copied into s[3]
    s[] = t[];          // the 3 elements of t[3] are copied into s[3]
    s[1..2] = t[0..1];  // same as s[1] = t[0]
    s[0..2] = t[1..3];  // same as s[0] = t[1], s[1] = t[2]
    s[0..4] = t[0..4];  // error, only 3 elements in s
    s[0..2] = t;        // error, different lengths for lvalue and rvalue

Overlapping copies are an error:

    s[0..2] = s[1..3];  // error, overlapping copy
    s[1..3] = s[0..2];  // error, overlapping copy

Disallowing overlapping copies permits more aggressive parallel code optimizations than are possible with the serial semantics of C.
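As a hedged illustration of the copy semantics above, the sketch below shows that a slice on the left-hand side copies elements rather than rebinding the array. The helper name copyElements is hypothetical; it returns a dynamic copy so the result can be inspected by the caller.

```d
// Hypothetical helper, not part of the original text.
int[] copyElements()
{
    int[3] s;
    int[3] t;
    t[0] = 1; t[1] = 2; t[2] = 3;

    s[] = t[];              // copies the 3 elements of t into s
    t[0] = 99;              // s is unaffected: it holds its own elements
    s[1 .. 2] = t[0 .. 1];  // same as s[1] = t[0]

    return s[].dup;         // s now holds 1, 99, 3
}

void main()
{
    int[] r = copyElements();
    assert(r[0] == 1 && r[1] == 99 && r[2] == 3);
}
```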
    int[3] s;
    int* p;

    s[] = 3;        // same as s[0] = 3, s[1] = 3, s[2] = 3
    p[0..2] = 3;    // same as p[0] = 3, p[1] = 3
    int[] a;
    int[] b;
    int[] c;

    a = b ~ c;  // create an array from the concatenation of the b and c arrays

Many languages overload the + operator to mean concatenation. This leads to confusion: does

    "10" + 3

produce the number 13 or the string "103" as the result? It isn't obvious, and the language designers wind up carefully writing rules to disambiguate it - rules that get incorrectly implemented, overlooked, forgotten, and ignored. It's much better to have + mean addition, and a separate operator for array concatenation.

Similarly, the ~= operator means append, as in:

    a ~= b;     // a becomes the concatenation of a and b

Concatenation always creates a copy of its operands, even if one of the operands is a 0 length array, so:

    a = b;              // a refers to b
    a = b ~ c[0..0];    // a refers to a copy of b
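The difference between assignment and concatenation with an empty slice can be checked by comparing data pointers. This is a sketch under the rule stated above; the helper name concatCopies is hypothetical, and the .ptr property it uses is described in the properties tables of this section.

```d
// Hypothetical helper, not part of the original text.
bool concatCopies()
{
    int[] b = new int[3];
    int[] c = new int[2];
    int[] a;

    a = b;                          // a refers to b's data
    bool aliased = (a.ptr == b.ptr);

    a = b ~ c[0 .. 0];              // a refers to a copy of b
    bool copied = (a.ptr != b.ptr) && (a.length == b.length);

    a[0] = 5;                       // modifies the copy only
    return aliased && copied && b[0] == 0;
}

void main()
{
    assert(concatCopies());
}
```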
In general, (a[n..m] op e) is defined as:
    for (i = n; i < m; i++)
        a[i] op e;

So, for the expression:

    a[] = b[] + 3;

the result is equivalent to:

    for (i = 0; i < a.length; i++)
        a[i] = b[i] + 3;

When more than one [] operator appears in an expression, the range represented by all must match.

    a[1..3] = b[] + 3;  // error, 2 elements not same as 3 elements
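A small sketch of the array operation above, assuming a compiler that implements array operations as described. The helper name addThree is hypothetical; it allocates the destination with the same length as the source so that the two ranges match.

```d
// Hypothetical helper, not part of the original text.
int[] addThree(int[] b)
{
    int[] a = new int[b.length];    // ranges on both sides must match
    a[] = b[] + 3;                  // element-wise: a[i] = b[i] + 3
    return a;
}

void main()
{
    int[] b = new int[3];
    b[0] = 1; b[1] = 2; b[2] = 3;

    int[] a = addThree(b);

    // The result agrees with the explicit loop form.
    for (size_t i = 0; i < a.length; i++)
        assert(a[i] == b[i] + 3);
}
```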
    int[3] abc;                 // static array of 3 ints
    int[] def = [ 1, 2, 3 ];    // dynamic array of 3 ints

    void dibb(int* array)
    {
        array[2];       // means same thing as *(array + 2)
        *(array + 2);   // get 3rd element (index 2)
    }

    void diss(int[] array)
    {
        array[2];       // ok
        *(array + 2);   // error, array is not a pointer
    }

    void ditt(int[3] array)
    {
        array[2];       // ok
        *(array + 2);   // error, array is not a pointer
    }
    double[][] matrix;

declares matrix as an array of pointers to arrays. (Dynamic arrays are implemented as pointers to the array data.) Since the arrays can have varying sizes (being dynamically sized), these are sometimes called "jagged" arrays. Even worse for optimizing the code, the array rows can sometimes point to each other! Fortunately, D static arrays, while using the same syntax, are implemented as a fixed rectangular layout:

    double[3][3] matrix;

declares a rectangular matrix with 3 rows and 3 columns, all contiguously in memory. In other languages, this would be called a multidimensional array and be declared as:

    double matrix[3,3];
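The layout difference can be observed directly. The sketch below is illustrative only; the helper name layoutCheck and the row lengths chosen for the jagged array are assumptions made for the example.

```d
// Hypothetical helper, not part of the original text.
bool layoutCheck()
{
    double[3][3] matrix;    // rectangular: one contiguous block of 9 doubles

    // Row 1 begins exactly 3 doubles after the start of row 0.
    bool contiguous = (&matrix[1][0] == &matrix[0][0] + 3)
                   && (matrix.sizeof == 9 * double.sizeof);

    double[][] jagged;      // each row is its own dynamic array
    jagged.length = 2;
    jagged[0] = new double[1];
    jagged[1] = new double[4];

    // Rows of a jagged array may have different lengths.
    bool varying = (jagged[0].length != jagged[1].length);

    return contiguous && varying;
}

void main()
{
    assert(layoutCheck());
}
```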
    int[4] foo;
    int[] bar = foo;
    int* p = &foo[0];

    // These expressions are equivalent:
    bar[]
    bar[0 .. 4]
    bar[0 .. length]
    bar[0 .. bar.length]

    p[0 .. length]      // 'length' is not defined, since p is not an array
    bar[0]+length       // 'length' is not defined, out of scope of [ ]

    bar[length-1]       // retrieves last element of the array
Static array properties are:
.sizeof | Returns the array length multiplied by the number of bytes per array element. |
.length | Returns the number of elements in the array. This is a fixed quantity for static arrays. |
.ptr | Returns a pointer to the first element of the array. |
.dup | Create a dynamic array of the same size and copy the contents of the array into it. |
.reverse | Reverses in place the order of the elements in the array. Returns the array. |
.sort | Sorts in place the order of the elements in the array. Returns the array. |
Dynamic array properties are:
.sizeof | Returns the size of the dynamic array reference, which is 8 on 32 bit machines. |
.length | Get/set number of elements in the array. |
.ptr | Returns a pointer to the first element of the array. |
.dup | Create a dynamic array of the same size and copy the contents of the array into it. |
.reverse | Reverses in place the order of the elements in the array. Returns the array. |
.sort | Sorts in place the order of the elements in the array. Returns the array. |
Examples:
    p.length    // error, length not known for pointer
    s.length    // compile time constant 3
    a.length    // runtime value

    p.dup       // error, length not known
    s.dup       // creates an array of 3 elements, copies elements of s into it
    a.dup       // creates an array of a.length elements, copies elements of a into it
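A hedged sketch exercising the .length, .sizeof, .ptr and .dup properties from the tables above. The helper name propertiesHold is hypothetical, and .reverse and .sort are deliberately left out of the example.

```d
// Hypothetical helper, not part of the original text.
bool propertiesHold()
{
    int[3] s;                   // static array
    int[] a = new int[5];       // dynamic array, elements start at 0

    int[] copy = a.dup;         // independent copy of a's contents
    bool distinct = (copy.ptr != a.ptr);
    copy[0] = 1;                // does not change a[0]

    a.length = 7;               // new elements get the default initializer

    return s.length == 3
        && s.sizeof == 3 * int.sizeof
        && s.dup.length == 3
        && distinct
        && copy.length == 5
        && a[0] == 0
        && a.length == 7
        && a[6] == 0;
}

void main()
{
    assert(propertiesHold());
}
```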
    array.length = 7;

This causes the array to be reallocated in place, and the existing contents copied over to the new array. If the new array length is shorter, only enough elements are copied to fill the new array. If the new array length is longer, the remainder is filled out with the default initializer.
To maximize efficiency, the runtime always tries to resize the array in place to avoid extra copying. It will always do a copy if the new size is larger and the array was not allocated via the new operator or a previous resize operation.
This means that if there is an array slice immediately following the array being resized, the resized array could overlap the slice; i.e.:
    char[] a = new char[20];
    char[] b = a[0..10];
    char[] c = a[10..20];

    b.length = 15;  // always resized in place because it is sliced
                    // from a[] which has enough memory for 15 chars
    b[11] = 'x';    // a[11] and c[1] are also affected

    a.length = 1;
    a.length = 20;  // no net change to memory layout

    c.length = 12;  // always does a copy because c[] is not at the
                    // start of a gc allocation block
    c[5] = 'y';     // does not affect contents of a[] or b[]

    a.length = 25;  // may or may not do a copy
    a[3] = 'z';     // may or may not affect b[3] which still overlaps
                    // the old a[3]

To guarantee copying behavior, use the .dup property to ensure a unique array that can be resized.
These issues also apply to concatenating arrays with the ~ and ~= operators.
Resizing a dynamic array is a relatively expensive operation. So, while the following method of filling an array:
    int[] array;
    int c;

    while (1)
    {
        c = getinput();
        if (!c)
            break;
        array.length = array.length + 1;
        array[array.length - 1] = c;
    }

will work, it will be inefficient. A more practical approach would be to minimize the number of resizes:

    int[] array;
    int c;
    int i;

    array.length = 100;     // guess
    for (i = 0; 1; i++)
    {
        c = getinput();
        if (!c)
            break;
        if (i == array.length)
            array.length = array.length * 2;
        array[i] = c;
    }
    array.length = i;       // trim to the number of elements actually stored

Picking a good initial guess is an art, but you usually can pick a value covering 99% of the cases. For example, when gathering user input from the console, a line is unlikely to be longer than 80 characters.
    try
    {
        for (i = 0; ; i++)
        {
            array[i] = 5;
        }
    }
    catch (ArrayBoundsError)
    {
        // terminate loop
    }

The loop is correctly written:

    for (i = 0; i < array.length; i++)
    {
        array[i] = 5;
    }

Implementation Note: Compilers should attempt to detect array bounds errors at compile time, for example:

    int[3] foo;
    int x = foo[3];     // error, out of bounds

Insertion of array bounds checking code at runtime should be turned on and off with a compile time switch.
    int[3] a = [ 1:2, 3 ];  // a[0] = 0, a[1] = 2, a[2] = 3

This is most handy when the array indices are given by enums:

    enum Color { red, blue, green };

    // Color.max is green (2), so the array needs Color.max + 1 elements
    int value[Color.max + 1] = [ Color.blue:6, Color.green:2, Color.red:5 ];

If any members of an array are initialized, they all must be. This is to catch common errors where another element is added to an enum, but one of the static instances of arrays of that enum was overlooked in updating the initializer list.
    bit[10] x;      // array of 10 bits

The amount of storage used up is implementation dependent. Implementation Note: on Intel CPUs it would be rounded up to the next 32 bit size.

    x.length    // 10, number of bits
    x.size      // 4, bytes of storage

So, the size per element is not (x.size / x.length).
Dynamic arrays in D suggest the obvious solution - a string is just a dynamic array of characters. String literals become just an easy way to write character arrays.
    char[] str;
    char[] str1 = "abc";

char[] strings are in UTF-8 format. wchar[] strings are in UTF-16 format. dchar[] strings are in UTF-32 format.
Strings can be copied, compared, concatenated, and appended:
    str1 = str2;
    if (str1 < str3) ...
    func(str3 ~ str4);
    str4 ~= str1;

with the obvious semantics. Any generated temporaries get cleaned up by the garbage collector (or by using alloca()). Not only that, this works with any array, not just a special String array.
A pointer to a char can be generated:
    char* p = &str[3];  // pointer to 4th element
    char* q = str;      // pointer to 1st element

Since strings, however, are not 0 terminated in D, when transferring a pointer to a string to C, add a terminating 0:

    str ~= "\0";

The type of a string is determined by the semantic phase of compilation. The type is one of: char[], wchar[], dchar[], and is determined by implicit conversion rules. If there are two equally applicable implicit conversions, the result is an error. To disambiguate these cases, a cast is appropriate:

    cast(wchar [])"abc"     // this is an array of wchar characters

String literals are implicitly converted between chars, wchars, and dchars as necessary.
Strings a single character in length can also be exactly converted to a char, wchar or dchar constant:
    char c;
    wchar w;
    dchar d;

    c = 'b';        // c is assigned the character 'b'
    w = 'b';        // w is assigned the wchar character 'b'
    w = 'bc';       // error - only one wchar character at a time
    w = "b"[0];     // w is assigned the wchar character 'b'
    w = \r;         // w is assigned the carriage return wchar character
    d = 'd';        // d is assigned the character 'd'
    str ~= "\0";
    printf("the string is '%s'\n", cast(char*)str);

The second way is to use the precision specifier. The way D arrays are laid out, the length comes first, so the following works:

    printf("the string is '%.*s'\n", str);

In the future, it may be necessary to just add a new format specifier to printf() instead of relying on an implementation dependent detail.
Associative arrays are declared by placing the KeyType within the [] of an array declaration:
    int[char[]] b;      // associative array b of ints that are
                        // indexed by an array of characters.
                        // The KeyType is char[]
    b["hello"] = 3;     // set value associated with key "hello" to 3
    func(b["hello"]);   // pass 3 as parameter to func()

Particular keys in an associative array can be removed with the delete operator:

    delete b["hello"];

This confusingly appears to delete the value of b["hello"], but it does not; it removes the key "hello" from the associative array.
The InExpression yields a pointer to the value if the key is in the associative array, or null if not:
    int* p;
    p = ("hello" in b);
    if (p != null)
        ...

KeyTypes cannot be functions or voids.
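Because the InExpression yields a pointer, one lookup can both test for a key and fetch its value. The following is a sketch, not a prescribed idiom: the helper name lookupOrDefault is hypothetical, and the key type is written with the string alias (an assumption; the text above uses char[] as the KeyType).

```d
// Hypothetical helper, not part of the original text: returns the value
// stored under key, or fallback when the key is absent.
int lookupOrDefault(int[string] table, string key, int fallback)
{
    int* p = key in table;      // pointer to the value, or null
    if (p !is null)
        return *p;
    return fallback;
}

void main()
{
    int[string] b;
    b["hello"] = 3;

    assert(lookupOrDefault(b, "hello", -1) == 3);
    assert(lookupOrDefault(b, "world", -1) == -1);
}
```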
If the KeyType is a struct type, a default mechanism is used to compute the hash and comparisons of it based on the binary data within the struct value. A custom mechanism can be used by providing the following functions as struct members:
    uint toHash();
    int opCmp(KeyType* s);

For example:

    import std.string;

    struct MyString
    {
        char[] str;

        uint toHash()
        {
            uint hash;
            foreach (char c; str)   // hash every character of the member str
                hash = (hash * 9) + c;
            return hash;
        }

        int opCmp(MyString* s)
        {
            return std.string.cmp(this.str, s.str);
        }
    }
Associative array properties are:
.size | Returns the size of the reference to the associative array; it is typically 8. |
.length | Returns number of values in the associative array. Unlike for dynamic arrays, it is read-only. |
.keys | Returns dynamic array, the elements of which are the keys in the associative array. |
.values | Returns dynamic array, the elements of which are the values in the associative array. |
.rehash | Reorganizes the associative array in place so that lookups are more efficient. rehash is effective when, for example, the program is done loading up a symbol table and now needs fast lookups in it. Returns a reference to the reorganized array. |
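A brief sketch of the .length, .keys, .values and .rehash properties in use. The helper name sumOfValues is hypothetical, and the string alias is again assumed for the key type.

```d
// Hypothetical helper, not part of the original text: adds up every value
// stored in the associative array.
int sumOfValues(int[string] table)
{
    int total = 0;
    foreach (int v; table.values)   // .values yields a dynamic array
        total += v;
    return total;
}

void main()
{
    int[string] counts;
    counts["a"] = 1;
    counts["b"] = 2;

    assert(counts.length == 2);         // number of values stored
    assert(counts.keys.length == 2);    // one key per value
    assert(sumOfValues(counts) == 3);

    counts.rehash;                      // reorganize once loading is done
    assert(sumOfValues(counts) == 3);   // contents are unchanged
}
```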
    import std.file;    // D file I/O

    int main (char[][] args)
    {
        int word_total;
        int line_total;
        int char_total;
        int[char[]] dictionary;

        printf(" lines words bytes file\n");
        for (int i = 1; i < args.length; ++i)   // program arguments
        {
            char[] input;               // input buffer
            int w_cnt, l_cnt, c_cnt;    // word, line, char counts
            int inword;
            int wstart;

            // read file into input[]
            input = cast(char[])std.file.read(args[i]);

            foreach (int j, char c; input)  // j is the index of c in input[]
            {
                if (c == '\n')
                    ++l_cnt;
                if (c >= '0' && c <= '9')
                {
                }
                else if (c >= 'a' && c <= 'z' ||
                         c >= 'A' && c <= 'Z')
                {
                    if (!inword)
                    {
                        wstart = j;
                        inword = 1;
                        ++w_cnt;
                    }
                }
                else if (inword)
                {
                    char[] word = input[wstart .. j];
                    dictionary[word]++;     // increment count for word
                    inword = 0;
                }
                ++c_cnt;
            }
            if (inword)
            {
                char[] word = input[wstart .. input.length];
                dictionary[word]++;
            }
            printf("%8d%8d%8d %.*s\n", l_cnt, w_cnt, c_cnt, args[i]);
            line_total += l_cnt;
            word_total += w_cnt;
            char_total += c_cnt;
        }

        if (args.length > 2)
        {
            printf("-------------------------------------\n%8d%8d%8d total\n",
                line_total, word_total, char_total);
        }

        printf("-------------------------------------\n");
        char[][] keys = dictionary.keys;    // find all words in dictionary[]
        for (int i = 0; i < keys.length; i++)
        {
            char[] word;

            word = keys[i];
            printf("%3d %.*s\n", dictionary[word], word);
        }
        return 0;
    }