Set values

Most languages supporting sets allow set values to be denoted by explicit enumerations such as {2, 3, 5, 7}.

Very few languages, however, allow a version of the most general notation for sets: {x | φ(x)} where φ(x) is some statement about x. Since the general notation is very expressive, the expressive power of a language will be greatly enhanced if it provides a version of this notation.

When devising a version of the general notation that is suitable for a programming language, the first thing to note is that it is not practical to always interpret x as potentially being any member of the universal domain of discourse. Thus, the programmer should be required to explicitly indicate the set of values that x could possibly be. For example:

{x {2...9} | Is-a-prime(x)}

It is also useful to allow x to be an expression, possibly containing more than one local variable. For example

{a+b, a {`a',`b'}, b {`1',`2'}}
which is equivalent to
{`a1', `a2', `b1', `b2'}

It is not difficult to generate the potential members of sets specified in the general format. Nor is it difficult to decide if a potential member qualifies for membership. However, a simplistic compiler faced with

{x Integer | Is-a-prime(x) and x < 10}
is likely to generate code that will consider all integers from min(Integer) to max(Integer). To do otherwise, the compiler will have to analyze the selection condition. Such an analysis is not trivial for arbitrary conditions. Moreover, in the example, Is-a-prime may well have been defined in such a way that that even a sophisticated analysis algorithm will fail to determine that it will return False for every argument value less than 2.

The potential generation of atrocious code for examples such as the one above is probably the reason why so few languages allow set values to be specified using the {x | φ(x)} notation. However, potential inefficiency is not a good enough reason to eliminate an otherwise useful language construct. After all, even a simplistic compiler is likely to produce acceptable code for

{x {2...9} | Is-a-prime(x)}

Moreover, in a case where the set cannot be as easily specified by an explicit enumeration of values, for example

{x {1...1000} | A-complicated-rule(x)}
an optimizing compiler may well compute the members of this set during compilation, whereas a programmer denied the use of the general notation would probably have used a run-time routine to compute the set, which would be much more difficult to optimize away.

The following syntax is proposed for set-values:

set-value =
`{' {expression [`...' expression] [`,']} `}' |
`{' all [expression] [where]
{identifier `' expression}
[`|' expression] `}'
Note that the noise words all and where are needed to make it clear that
{all a+b where a {`a',`b'}, b {`1',`2'}}
is not a set of three elements, of which the last two are boolean values. Such an interpretation is possible when given only
{a+b, a {`a',`b'}, b {`1',`2'}}


Prof Herman Venter
Mon Apr 15 13:55:08 GMT 1996