Obscure C99 Array Features
Paul J. Lucas
Posted on January 14, 2022
Introduction
C99 introduced a number of new features for arrays. Even though C99 is over 20 years old, you seldom see these new features used in the wild. Because of that, you’re less likely to be familiar with them and so less likely to use them in your own code (but maybe you shouldn’t!). So here’s a tour of those features.
Flexible Array Members
Introduced in C99, the last member of a struct with more than one named member may be a flexible array member, that is, an array of an unspecified size:
struct s {
  size_t n;
  double d[]; // Flexible Array Member
};
Typically, such a struct serves as a “header” for a larger region of memory, perhaps containing a binary file read from disk.
Note that it’s up to your code to somehow remember how big the array is. (This can, of course, be stored in a member that precedes the array in the struct.)
When sizeof is applied to such a struct, it’s as if the array isn’t there, except that there may be some additional padding.
While you can have such a struct on the stack, it’s not useful since no space is set aside for the array (hence, accessing the array is undefined behavior). To be useful, such a struct has to be allocated on the heap, and it’s at allocation time that the array’s size is chosen:
struct s *ps = malloc( sizeof(struct s) + sizeof(double[n]) );
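As a minimal sketch of the whole pattern, allocation and access might look like the following. The helper names s_new and s_get are hypothetical (they aren’t from the article or any library), and the overflow check is an extra precaution rather than something flexible array members require:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct s {
  size_t n;
  double d[];                       // flexible array member
};

// Hypothetical helper: allocates a struct s with room for n doubles, all
// set to 0. Returns NULL if the size would overflow or allocation fails.
struct s* s_new( size_t n ) {
  if ( n > (SIZE_MAX - sizeof(struct s)) / sizeof(double) )
    return NULL;
  struct s *const ps = malloc( sizeof(struct s) + n * sizeof(double) );
  if ( ps == NULL )
    return NULL;
  ps->n = n;                        // we must remember the length ourselves
  for ( size_t i = 0; i < n; ++i )
    ps->d[i] = 0.0;
  return ps;
}

// Hypothetical accessor: the length lives in ps->n, not in the type, so
// any bounds checking has to be done by hand.
double s_get( struct s const *ps, size_t i ) {
  assert( ps != NULL && i < ps->n );
  return ps->d[i];
}
```

The caller releases the whole thing, header and array together, with a single free().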
Lastly, note that assignments among such structs do not copy the array (because the compiler has no idea how big it is):
struct s *ps1, *ps2;
// ...
*ps1 = *ps2; // copies only 'n' member
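If you do want the array copied, you have to copy the bytes yourself using the length you stored. A minimal sketch, assuming the n member always holds the true number of elements (s_dup is a hypothetical helper name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct s {
  size_t n;
  double d[];                       // flexible array member
};

// Hypothetical helper: returns a heap-allocated copy of *src, including
// its src->n array elements, or NULL if allocation fails.
struct s* s_dup( struct s const *src ) {
  assert( src != NULL );
  size_t const size = sizeof(struct s) + src->n * sizeof(double);
  struct s *const dst = malloc( size );
  if ( dst != NULL )
    memcpy( dst, src, size );       // copies 'n' and all of d[0..n-1]
  return dst;
}
```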
Incidentally, C++ never adopted flexible array members from C99.
Variable Length Arrays
Prior to C99, all arrays had to be declared to be of a fixed length (known at compile-time). Introduced in C99, variable length arrays (VLAs, not to be confused with flexible array members) can be declared to be of a variable length (not known until run-time). For example:
void f( size_t n ) {
  int a[n]; // VLA: length not known until run-time
  // ...
}
Not only that, but the sizeof operator, historically a compile-time operator, is now sometimes a run-time operator, namely when its argument is a VLA:
size_t sz = sizeof(a) / sizeof(a[0]); // sz = n
(The first sizeof is evaluated at run-time because its argument is a VLA; the second sizeof is still evaluated at compile-time.)
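To see the run-time sizeof in action, here is a minimal sketch. The function name vla_len and the 1024 limit are assumptions made for illustration; the limit only keeps the example from risking the stack:

```c
#include <assert.h>
#include <stddef.h>

// Returns the element count of a local VLA as computed by sizeof.
// Assumes 0 < n <= 1024 (a zero-length VLA is undefined behavior).
size_t vla_len( size_t n ) {
  assert( n > 0 && n <= 1024 );
  int a[n];                         // VLA
  return sizeof(a) / sizeof(a[0]);  // first sizeof is evaluated at run-time
}
```

Calling vla_len(10) and vla_len(3) from the same compiled code yields different results, which couldn’t happen if sizeof(a) were a compile-time constant.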
Variable Length Arrays Caveat
One serious caveat to VLAs is that, if the length is too big, the array will silently overflow the stack. Additionally, unlike malloc() returning NULL upon failure, there’s no way to detect when a VLA overflows the stack. Hence, if the size can be “too big,” your code has to guard against it:
enum { A_LEN_MAX = 1024 }; // a true constant (in C, a const variable isn't one)
// ...
if ( n > A_LEN_MAX ) {
  // Do something else, e.g., return early?
}
int a[n];
However, if you know that A_LEN_MAX is the maximum safe size, then you might as well just declare a to be of that size and not use a VLA.
Incidentally, you could do a small-size optimization:
void f( size_t n ) {
  int a[A_LEN_MAX];
  int *const p = n <= A_LEN_MAX ? a : malloc( n * sizeof(*p) );
  // use only 'p' to access array
  if ( p != a )
    free( p );
}
That is, if n isn’t too big, use the (fixed-size) array on the stack; otherwise, use a dynamically sized array on the heap. This has the advantage of saving on the calls to malloc() and free() for “small” n yet still works for “large” n. But notice that a VLA is not being used.
Hence, the moral is: use VLAs only when you don’t know the size at compile time but can guarantee that it won’t be “too big.” However, this is pretty much never true.
In hindsight, VLAs, though they seem convenient at times, are problematic, so much so that C11 made VLAs an optional feature. Incidentally, C++ never adopted VLAs from C99.
Array Syntax for Parameters
As you should be aware, array syntax can be used to declare function parameters, but, as you should also be aware, it’s just syntactic sugar since the compiler rewrites such parameters as pointers:
void f( int a[] ); // int *a
Note that I’m intentionally writing “array syntax for parameters” and not “array parameters” because “array parameters,” despite appearances, simply don’t exist in C.
The only potential benefit of using array syntax for parameters is that it conveys to the human reader that a is presumed to be a pointer to at least one int rather than to exactly one int. However, it’s only a presumption and not a guarantee since you can call such a function with a null pointer:
f( NULL ); // f’s 'a' will be NULL
Note that adding a size doesn’t help:
void f( int a[10] ); // int *a
While again this might convey to a human reader that a is presumed to be an array of 10 ints, the compiler ignores the size.
Array syntax for parameters in C is a remnant of how pointers are declared in New B (the precursor to C). See The Development of the C Language, Dennis M. Ritchie, April, 1993.
Non-Null Array Syntax Pointers for Parameters
One of the features added in C99 was the ability to declare an “array” function parameter that must not be null and be of a minimum size:
void f( int a[static 10] );
If you pass either NULL or an array that has fewer than 10 ints, the behavior is undefined, and a compiler that can see the problem at compile time (clang, for example) will typically warn you.
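As a small illustration, here is a sketch with a minimum of 4 rather than 10 to keep it short. The names sum4 and sum4_demo are hypothetical, and whether the commented-out calls are diagnosed depends on your compiler and its warning settings, not on the standard:

```c
#include <assert.h>
#include <stddef.h>

// The caller promises 'a' is non-null and points to at least 4 ints.
int sum4( int const a[static 4] ) {
  int sum = 0;
  for ( size_t i = 0; i < 4; ++i )
    sum += a[i];
  return sum;
}

int sum4_demo( void ) {
  int big[6] = { 1, 2, 3, 4, 5, 6 };
  // int small[2] = { 1, 2 };
  // sum4( small );                 // too few elements: may draw a warning
  // sum4( NULL );                  // null pointer: may draw a warning
  assert( sum4( big ) == 10 );      // fine: "at least 4" allows more
  return sum4( big );
}
```

Note that the size is only a minimum: passing a larger array is fine, and the function still has no way to find out how many elements there really are.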
This marks yet another overloading of the static keyword in C since this static has nothing to do with either linkage or duration. Incidentally, C++ never adopted this syntax.
Also incidentally, C99 did not introduce a parallel way to specify that a pointer parameter must not be null.
Array Syntax for Parameters Caveat
Using array syntax for function parameters can also be dangerous:
void f( int a[10] ) { // int *a
  for ( size_t i = 0; i < sizeof(a)/sizeof(*a); ++i ) { // wrong: sizeof(a) is sizeof(int*)
    // ...
  }
}
The intention is to iterate over all the elements of the array, but, despite the sizeof expression being correct for an array, a is, again, not an array, but a pointer; so you’ll get the size of a pointer divided by the size of an int. Fortunately, gcc will warn about this.
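One common way to sidestep this trap is to compute the length where the real array is still visible and pass it along explicitly. A minimal sketch, where ARRAY_LEN, sum, and sum_demo are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

// Element count of a real array; wrong if applied to a pointer.
#define ARRAY_LEN(A)  (sizeof(A) / sizeof((A)[0]))

// The length travels as an explicit parameter instead of being
// "recovered" from the pointer.
long sum( int const a[], size_t len ) {
  long total = 0;
  for ( size_t i = 0; i < len; ++i )
    total += a[i];
  return total;
}

long sum_demo( void ) {
  int ra[10] = { 1, 2, 3 };         // remaining elements are 0
  assert( ARRAY_LEN( ra ) == 10 );  // 'ra' is a real array here
  return sum( ra, ARRAY_LEN( ra ) );
}
```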
Qualified Array Syntax for Parameters
To drive home that parameters declared with array syntax really are pointers, you can modify them, which you can’t do to a real array:
int ra[10]; // real array
void f( int pa[] ) { // int *pa
  ++ra; // error (as expected)
  ++pa; // OK (surprisingly)
}
C99 also added the ability to qualify the rewritten pointer:
void f( int pa[const] ) { // int *const pa
  ++pa; // error now
}
In addition to const, you can also qualify the pointer with volatile and restrict (CVR).
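For instance, restrict inside the brackets lets a function promise that its “array” parameters don’t overlap. A minimal sketch, where copy_ints and copy_ints_demo are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

// Rewritten by the compiler as:
//   int *restrict dst, int const *restrict src
// The caller promises that the two regions don't overlap.
void copy_ints( size_t n, int dst[restrict], int const src[restrict] ) {
  for ( size_t i = 0; i < n; ++i )
    dst[i] = src[i];
}

void copy_ints_demo( void ) {
  int const from[3] = { 7, 8, 9 };
  int to[3] = { 0 };
  copy_ints( 3, to, from );
  assert( to[0] == 7 && to[1] == 8 && to[2] == 9 );
}
```

Here the const on src is outside the brackets, so it applies to the ints being read, while each restrict is inside the brackets, so it applies to the pointer itself.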
Note that neither of these:
void f( int const pa[] ); // pointer to const int
void f( const int pa[] ); // same as above
is the same thing as int pa[const]: a const outside the [] refers to the int and not to pa.
Why doesn’t the compiler convert parameters with array syntax to const pointers? Because const wasn’t a part of C when Ritchie invented it.
Incidentally, C++ never adopted this syntax.
Variable Length Array Syntax for Parameters
C99 also added the ability to use VLAs for function parameters:
void f( size_t n, int a[n] ) { // int *a
That is, the size of the “array” is given by an integral parameter that precedes it. Note, however, that a is still a pointer. Despite having the size information at run-time, sizeof(a) will still return the size of the pointer. Hence, this “feature” serves only to convey to the human reader that n is the presumed size of the “array.”
However, this “feature” is actually useful for multidimensional arrays. But before we get to that, a quick refresher on multidimensional array syntax for function parameters.
Multidimensional Array Syntax for Parameters
As you should be aware, array syntax can also be used to declare function parameters for multidimensional “arrays”:
void f( int a[10][20] ); // int (*a)[20]
The rule that the compiler converts array syntax for a function parameter into a pointer happens only for the first (left-most) dimension; the remaining dimension(s) keep their “array-ness.” Hence, a is a pointer to a real array of 20 ints.
Note that the parentheses are necessary: without them, it would be an array of 20 pointers to int. FYI: to help decipher cryptic C declarations, you can use cdecl.
Pointers to arrays don’t often occur in C programs since the name of an array “decays” into a pointer to its first element. In most cases, this is good enough even though the size information is lost. However, a pointer to an array retains the array’s size as part of the type, so assignments between pointers to arrays of different sizes draw a warning:
int (*p3)[3]; // pointer to array 3 of int
int (*p5)[5]; // pointer to array 5 of int
p5 = p3; // warning: incompatible pointers
In particular, given:
int a[10];
int *pi = a; // pointer to int (via decay)
int (*pa)[10] = &a; // pointer to array 10 of int
both pi and pa point to the same location in memory (here, &a[0]), but a “pointer to array” is an entirely different thing from a pointer that results from array decay. For pi, the compiler “forgets” the size of the array to which it points; for pa, it “remembers” the size.
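The difference shows up in both sizeof and pointer arithmetic. A minimal sketch that checks this with assertions (ptr_demo is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>

// Returns how many ints the pointed-to array holds, computed only from
// the two pointer types.
size_t ptr_demo( void ) {
  int a[10];
  int *pi = a;                      // pointer to int (via decay)
  int (*pa)[10] = &a;               // pointer to array 10 of int

  assert( (void*)pi == (void*)pa );             // same address
  assert( (void*)(pi + 1) == (void*)&a[1] );    // steps by one int
  assert( (void*)(pa + 1) == (void*)(&a + 1) ); // steps by the whole array
  return sizeof(*pa) / sizeof(*pi);             // array size / int size
}
```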
Part of the reason pointers to arrays aren’t used much is that it’s clunky to access array elements since you have to dereference the pointer first:
int e1 = (*p3)[1]; // must dereference p3 first
However, you can dereference the pointer once into another pointer then use that pointer:
int *p = *p3;
int e1 = p[1]; // same as: (*p3)[1]
Multidimensional VLA Parameters
Multidimensional array syntax for parameters can be used for VLAs:
void f( size_t m, size_t n, int a[m][n] ) {
Since the compiler rewrites only the first array dimension as a pointer, the above is really:
void f( size_t m, size_t n, int (*a)[n] ) {
that is, a is a pointer to a VLA of n ints. In this case, the VLA is actually a useful feature since the n allows the compiler to know the length of each row of the array. Additionally, sizeof (the first one below) once again becomes a run-time operator:
size_t sz = sizeof(*a) / sizeof(**a); // sz = n
Unlike VLAs in general, VLAs used for function parameters are safe since the actual arrays passed to the function can be (and often are) normal arrays:
void f( size_t m, size_t n, int a[m][n] ) {
  // ...
}
void g( void ) {
  int a[10][20];
  f( 10, 20, a );
}
There’s no new VLA being created at run-time here, so it can’t overflow the stack.
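To make the benefit concrete, here is a minimal sketch of one function that handles ordinary two-dimensional arrays of different shapes (matrix_sum and matrix_sum_demo are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>

// Sums an m-by-n matrix; 'a' is really: int (*a)[n]
long matrix_sum( size_t m, size_t n, int a[m][n] ) {
  long total = 0;
  for ( size_t i = 0; i < m; ++i )
    for ( size_t j = 0; j < n; ++j )
      total += a[i][j];             // row length n is known only at run-time
  return total;
}

long matrix_sum_demo( void ) {
  int a23[2][3] = { { 1, 2, 3 }, { 4, 5, 6 } };
  int a31[3][1] = { { 10 }, { 20 }, { 30 } };
  // The same function handles both shapes.
  assert( matrix_sum( 2, 3, a23 ) == 21 );
  assert( matrix_sum( 3, 1, a31 ) == 60 );
  return matrix_sum( 2, 3, a23 ) + matrix_sum( 3, 1, a31 );
}
```

Without the VLA parameter, you’d either have to fix the second dimension at compile-time or pass a plain int* and compute i * n + j by hand.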
Multidimensional VLA Parameter Declarations
When declaring (as opposed to defining) functions, C allows you to omit the parameter names; however, if you do that, then there’s no name to specify the size of a VLA; but C99 added a new syntax for this case:
void f( size_t, size_t, int[][] ); // error
void f( size_t, size_t, int[*][*] ); // OK
That is, you use * to denote a VLA of an unnamed size. Note that since the first array dimension is always converted to a pointer, the * is needed only starting with the second dimension:
void f( size_t, size_t, int[][*] ); // same as previous
Hence, you never need * when using single-dimension array syntax.
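A declaration written with * is compatible with a later definition that names the sizes. A minimal sketch of how the two might be paired, as in a header and its source file (corner is a hypothetical function name):

```c
#include <assert.h>
#include <stddef.h>

// Declaration (e.g., in a header): no parameter names, so use [*].
int corner( size_t, size_t, int[][*] );

// Definition: the names are available, so the sizes can be spelled out.
int corner( size_t m, size_t n, int a[m][n] ) {
  assert( m > 0 && n > 0 );
  return a[m - 1][n - 1];           // last element of the last row
}
```

The * form is allowed only in declarations that aren’t definitions; the definition itself has to name the parameters that give the sizes.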
Conclusion
C99 added the new array features of:
- Flexible Array Members.
- VLAs (which are unsafe, so you probably shouldn’t use them).
- VLAs for function parameters (which are safe, but really only useful for multidimensional arrays).
- static that requires non-null, minimum-sized arrays be passed for parameters.
- The ability to const, volatile, or restrict qualify the rewritten pointers that decayed from arrays for function arguments.