TIME-OUTS

What happens to a process between blocking and unblocking is not strictly a concern of the multi-programming mechanism. However, in the system under discussion, processes mostly block subject to time-outs. That is, a process unblocks when the event it is waiting for occurs, or when an associated time-interval expires.

Thus, blocking on a time-out happens almost as frequently as context switches. Furthermore, the time-out mechanism interacts closely with the multi-programming mechanism, and also deals with hardware timers and interrupts. Consequently, it is convenient to treat the time-out mechanism as part of the multi-programming mechanism.

Thus, the multi-programming mechanism is extended with the following two calls:

  Block_with_time_out (time_out)
  Handle_time_out

Additionally, Make_ready and Execute_next are extended to appropriately update the time-out data structure when a process becomes ready before its time-out expires. That is, the rest of the operating system can always make processes ready by using these calls, regardless of whether the processes have pending time-outs.

The following high level description of the time-out mechanism suffices to specify what is required of the mechanism:

Let $C$ ₁, C₂, ..., C_n be a set of (non-negative) integer counters and associate a counter with each process in the system. Initially, all counters are set to infinity.

Let $E$ ₁, E₂, ..., E_m be a sequence of events, where each event causes either:

A) All counters to be decremented by one.

B) A given counter to be set to a given value.

Let furthermore, the following conventions apply:

When a process blocks, pending a time-out, its counter is set to the duration of the time-out.
When a process unblocks, before it times out, its counter is set to infinity.
When a counter reaches zero, it is set to infinity and its associated process is unblocked on a time-out.

Events of type A correspond to ``timer ticks'' and events of type B correspond to processes being added to or being removed from the list of processes waiting on time-outs. It thus follows that the above description describes exactly what the time-out mechanism must accomplish.

Clearly, the repeated action of decrementing all the counters will represent the bulk of the cost of a mechanism implemented exactly as outlined above. This cost can be reduced by various techniques, as described below. However, the minimal cost of the mechanism will increase as the length of the time-out list increases. This unsatisfactory state of affairs can only be rectified by hardware means. That is, by providing as many timers as there are processes.

The above description of the time-out mechanism fails to exploit three properties which allow cost reduction. The first such property is that it is not necessary to decrement counters with infinite values. The second is that a counter with a non-infinite value can only be decremented or set to infinity (because a process cannot block while it is blocked). The third property is that the sequence of events can be expected to have long runs of events of type A (timer ticks).

The first property can be exploited by maintaining an actual list of the PCBs of processes waiting on time-outs, and decrementing only their counters. However, care must be taken that the cost of maintaining such a list does not exceed the cost of the redundant decrements (assuming that the number of counters will have a small upper bound). The second property makes it easier to maintain the list, since it ensures that a PCB will never be added to the list if it is already in it.

The third property can be exploited by collapsing runs of type A. That is, by decrementing the counters with the run length at the end of a run, rather than by one for every event. The principal problem with this technique is that counters may reach zero during a run. That is, the tests for zero cannot be reduced to the same extent as the decrementing of the counters.

The problem can be solved by refining the technique to end a run as soon as the run length equals the minimum counter value. If, as can be expected, time-outs (counters reaching zero) occur rarely, few runs will actually be broken up.

The refinement introduces the need to find the minimum counter value. This can be done by using the loop which decrements the counters to also find the resulting minimum counter value. Thus, the minimum value at the end of a run can be found at a modest cost. Moreover, it is straightforward to update this minimum for each event of type B, making it possible to have the minimum available at the start of the next run of events of type A.

Collapsing runs should save effort when most events of type A come in runs of about three or more. And since a type A event corresponds to a timer tick every millisecond, it is very likely that this will be the case.

It should be clear by now that a hardware timer can be used to process type A events. That is, when a process is added to or removed from the time-out list (type B events), the following must be done:

  add:
    stop the hardware timer
    subtract the elapsed time from all counters and ready all 
      processes with counters <= 0
    add the PCB
    start the hardware timer with the new minimum

  remove:
    remove the PCB
    stop the hardware timer
    subtract the elapsed time from all counters and ready all 
      processes with counters <= 0
    start the hardware timer with the new minimum

If the hardware timer counts down to zero, its interrupt is processed similarly, omitting the add/remove step.

Prof Herman Venter
Fri Apr 26 09:15:11 GMT 1996