Data structures kept in shared memory areas have long been the principal concept used to achieve co-operation among different processes. However, along with the obvious advantages, allowing processes to share memory has certain disadvantages:
It is possible for an operating system designer to accept these disadvantages as unavoidable and to provide generalized mechanisms that allow arbitrary processes to share memory (as is done in UNIX System V [6]). However, since non-shared memory multi-processors (for example, hundreds of workstations inter-connected by means of a high-speed local-area network) are likely to become more common, if not prevalent, it makes sense to avoid memory sharing among processes. It therefore also makes sense to ask whether it is necessary to provide complex, generalized mechanisms to allow arbitrary processes to share memory.
Memory sharing only makes sense if the sharing processes are
a) executed by a single physical processor, or
b) executed by separate physical processors that share access to a common physical memory.
a) Assuming that the processes co-operating via memory sharing are all executed by the same physical processor, the question arises of why they should not be consolidated into a single process. The multiple processes cannot together execute faster than a single process, since the available real processor cycles remain fixed. In fact, the overhead incurred by extra context switches and process synchronization makes such multi-process formulations slower than single-process formulations.
The main argument for having multiple co-operating processes on the same physical processor is that multiple-process solutions can sometimes be simpler to program than single-process solutions. A typical case is a server process that may receive requests from different clients in quick succession, each of which may involve waiting for some event to happen in the environment of the server (for example, the completion of a disk access). Unless a single-process server explicitly breaks up each request into multiple sub-requests, none of which requires waiting, and interleaves the execution of different requests on this complicated basis, it will be slower than a multi-process server, since context switching among the various co-operating processes automatically achieves the desired interleaving.
However, it is straightforward to incorporate a local scheduler into the code of a single-process server. This makes it possible to formulate the server as a set of memory-sharing, co-operating ``light-weight'' processes, while actually executing the server as a single operating system process. Since the ``light-weight'' scheduler is ``user code'', it can dispense with many of the precautions that the operating system ``heavy-weight'' scheduler must take. A set of memory-sharing, co-operating processes can therefore be constructed much more efficiently without the aid of expensive operating system mechanisms.
There will, of course, be cases where memory-sharing, co-operating processes need the separate identities and relative isolation available only to operating system ``heavy-weight'' processes. However, these cases should be comparatively rare, and confined mainly to ``systems programming'' situations. It thus seems reasonable to require these processes to be declared as ``privileged and trusted'' (by suitably authorized users) and to be explicitly tied to specific physical processors. It is then possible to provide a simple mechanism that dispenses with most of the complications of generality and security checking to enable such privileged processes to obtain access to shared memory areas.
Another use for memory sharing among processes is to allow them to access common blocks of code, such as language-provided run-time systems. However, such code sharing need not affect the logic of a process, and the issue can thus be relegated to a convention between the linker and loader, rather than be treated as part of the definition of a process. In some systems, memory sharing is also used by debuggers. However, such debuggers can be accommodated by providing ``privileged and trusted'' server processes that allow them to achieve the same ends.
b) Assuming now that there are multiple physical processors that have access to a shared physical memory, and therefore that the memory-sharing processes can be executed in true parallel, the ``light-weight'' process solution is no longer applicable, and the ``privileged and trusted'' solution is too primitive and restrictive. Furthermore, the existence of ``teams'' of closely co-operating processes executing in true parallel opens up new opportunities and complications, such as ``co-scheduling'' [3] and non-blocking (``spin-lock'') forms of synchronization.
It is, however, possible to accommodate such hardware without resorting to generalized mechanisms by extending the ``light-weight'' process solution. On non-shared memory systems, the routines for implementing a ``light-weight'' scheduler will be provided as a standard library, to be linked into the code of the ``heavy-weight'' process hosting the collection of ``light-weight'' processes. However, on shared-memory multi-processors these routines will simply invoke ``hidden'' system calls, which see to it that the ``light-weight'' processes are executed on different physical processors, while otherwise being part of a single ``heavy-weight'' process.
The discussion on shared memory can be summarized as follows:
It is easier and more efficient to implement a process concept that does not allow processes to share memory. Furthermore, an application formulated as a set of processes that do not share memory can be executed on a wider range of computer systems, which is highly desirable.
However, when it is essential to exploit a shared-memory multi-processor, this can be achieved by introducing ``light-weight'' processes that are ``internal'' to normal ``heavy-weight'' processes. These ``light-weight'' processes can also be supported on a single processor by incorporating an internal ``user-code'' scheduler into the code of a single ``heavy-weight'' process. Thus, applications that explicitly exploit shared-memory multi-processors can still be ported to other kinds of computer systems, albeit with a performance penalty in some cases.
It should be noted that the concept of a ``light-weight'' process is independent from the concept of a ``heavy-weight'' process. The rest of this paper is exclusively concerned with ``heavy-weight'' processes.