How to Win the CPU Battle with Class Scheduling for OpenVMS

This article is based on session VS059 presented at DECUS '95 in San
Francisco and has also been published in the Canadian DECUS magazine,
DECUScope.

Abstract

CPU class scheduling has long been available on IBM operating systems. This
capability was implemented as of OpenVMS V6.0 within the OpenVMS scheduler
although it has not received the amount of attention that it deserves. Hooks
are available for customers to write their own CPU class scheduling policies. 
This article discusses some potential uses of class scheduling and the callouts
available to system programmers for writing a class scheduler policy.

About the author

Curtis Rempel is an OpenVMS systems programmer.  He has managed OpenVMS systems
for over 12 years and specializes in OpenVMS internals and system performance
issues.


                                     ---


Imagine for a moment the following scenario. You arrive at the office and your
phone is already ringing. You answer it and discover that the caller is another
user calling to complain about poor response time.   The boss has been on your
case to "fix it" for what seems like an eternity and now he is standing in
front of your desk leading an angry mob of users.  You've got a problem.  What
do you do?  Tempting though it might be, hiding under your desk and calling 911
is not the solution.  Then you suddenly remember reading something somewhere
about CPU class scheduling and wonder if it might be able to calm the natives
and reduce the amount of coffee you've been drinking lately.

Fact or fiction?  Perhaps you have actually experienced this situation
yourself. Is your system response time less than adequate?  Are your users
constantly complaining that the system is slow?  Is productivity going down the
drain?  If you answered "yes" to any of these questions, read on.  Help may be
available.

First, let's take a look at some of the common potential causes of poor system
response time.

Incompatible job mix:

The jobs you run on your system may be competing against each other for
resources.  One classic example of an incompatible job mix is production versus
development work.  Production jobs need to be completed within a specified
time.  Meanwhile, your developers are banging the heck out of the CPU debugging
their code.  Development work typically starts with "make it work" and then
"make it efficient".  It's the "make it work" part, consisting of repeated
compilation and unoptimized code, that eats away at your CPU resources in a
hurry. Obviously this behavior is going to have an effect on your production
work.

Another example of an incompatible job mix is keystroke sensitive applications
versus compute bound jobs. Are you running timing sensitive applications like
editors and data entry/update forms which require quick cursor response while
at the same time you have number crunchers, report generators, and other
compute bound jobs running?  Setting different base priorities can help to a
degree but as we'll see later, this doesn't always work.  You can sometimes
fool users into perceiving snappier keyboard response by playing around with
quantum but there are tradeoffs and we'll discuss this later as well.

Insufficient CPU capacity:

Your workload may simply be too much for the CPU to properly handle.  Have you
outgrown your CPU capacity?  If you have, class scheduling probably will not be
of any value to you.  Useful tools like Digital's POLYCENTER Performance
Advisor can help you determine if you are in fact out of capacity.


Now let's take a look at some possible solutions to the problem.


Priority Based Scheduling

The most common method of priority based scheduling used on OpenVMS systems is
"interactive" and "batch" priority.  Interactive priority is typically set at
the default base value of 4 while batch priority is lower, and varies in value
depending on the value of PRIORITY_OFFSET in SYSGEN and the number and types of
batch queues that you have.

With priority based scheduling, higher priority jobs will pre-empt lower
priority jobs. While this may seem to be adequate, batch jobs tend to take up
the "slack" CPU time, which can be as much as 90% or more if you have only a
few low-activity interactive keystroke-type jobs.  The problem here is that
although the interactive jobs will pre-empt the lower priority batch jobs, the
system is perceived by the users to be sluggish.  Adjusting quantum to favor
interactive users may be all that is required to satisfy your users. There are
caveats however.  More on this later.  


Job Mix Realignment

Job mix realignment involves separating jobs onto separate CPUs (i.e.
development machines and production machines). Note that there are some
potential costs involved, including additional hardware, licenses,
administration, and support.   This solution is optimal if you can afford the
extra costs involved.


Working Hours Realignment

Altering the working hours of your employees is another possible solution;
however, you can count on it being met with varying degrees of opposition. What
impact would a different work schedule have on your customers?  Could they get
the information/goods they need when they need it?  Other costs involved with
changing the work schedule could include building expenses (utility costs),
employee productivity and morale, and of course, maintaining contact with the
other vendors that you may depend upon.  If your particular application is not
affected by these variables, simply distributing your workload throughout the
day might be all that is required to solve the problem.


Adjusting CPU Quantum

Quantum, simply stated, is the CPU "time slice" that you are allocated before
you must give up control of the CPU.  The default value for quantum is normally
set at 200 ms.  Jobs may voluntarily or involuntarily give up control of the
CPU before quantum end due to such events as waiting for I/O to complete or
because a higher priority process has become computable.

Process quantum is controlled by the SYSGEN parameter QUANTUM. A short quantum
favors interactive response time while long quantum favors batch job
throughput. 

How can you compromise on the value of quantum? One way is to have two values
for quantum, a short quantum for prime time jobs (interactive users) and a long
quantum for non-prime time jobs (batch jobs).  A recurring batch job could be
used to lower and raise quantum automatically since it is a dynamic parameter
(it does not require a system reboot to take effect).  Note that only processes
created after quantum is changed will receive the new quantum value.

There is of course a trade-off with lowering quantum.  Jobs execute for shorter
periods of time before being pre-empted by quantum end processing which adds a
certain degree of overhead.  You end up with extra overhead (primarily observed
through a higher level of interrupt stack activity) but the system appears to
be snappier to interactive keystroke applications although you are really not
getting any more work done.
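
To get a feel for this trade-off, here is a back-of-the-envelope model in
Python.  It is only a sketch: it assumes a compute bound job that always runs
to quantum end, and the cost per quantum end (ASSUMED_SWITCH_COST_MS) is a
made-up number, not a measurement.  The function name is also just for
illustration.

```python
def quantum_overhead_fraction(quantum_ms: float, switch_cost_ms: float) -> float:
    """Fraction of CPU time spent on quantum-end processing in a simple model.

    Assumes a compute bound process that always runs to quantum end and a
    fixed, hypothetical cost for each quantum-end event.
    """
    if quantum_ms <= 0 or switch_cost_ms < 0:
        raise ValueError("quantum must be positive and cost must not be negative")
    return switch_cost_ms / (quantum_ms + switch_cost_ms)


# Hypothetical cost per quantum end; measure your own system for a real figure.
ASSUMED_SWITCH_COST_MS = 0.5

for quantum_ms in (200, 100, 50):
    lost = quantum_overhead_fraction(quantum_ms, ASSUMED_SWITCH_COST_MS)
    print(f"QUANTUM = {quantum_ms:3d} ms -> about {lost:.2%} overhead")
```

Whatever the real cost is on your system, the shape of the result is the same:
each time you halve quantum, the overhead fraction roughly doubles.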


Finally, let's look at how CPU class scheduling may be of help to us.

With class scheduling, CPU time is allocated to job "classes" at regular
intervals instead of individual processes. Processes in the same class share
the CPU time allocated to the class, which means that CPU bound jobs can impact
"normal" jobs if grouped into the same class.  Thus, it is important to give
some thought to how you want to group jobs into scheduling classes. Each class
cannot exceed the CPU time allocated to that class by the scheduler policy.  Be
careful not to confuse this CPU time limit with the AUTHORIZE /CPUTIME
qualifier, which limits the amount of process CPU time per session.
 
Implementing class scheduling typically costs next to nothing (some programming
is required, your mileage may vary). It is important to remember that class
scheduling does NOT give you increased capacity but rather it can help you to
better manage and control your CPU resources and possibly prevent workloads
from interfering with each other. In the course of implementation, you may find
that the productivity of some workloads is actually lowered by class
scheduling. Thus, the actual implementation becomes a process of making an
initial estimate of job classes and assigning processes to those classes,
measuring and evaluating the results, and repeating the procedure again until
you are satisfied.


Defining scheduling classes

The algorithm for grouping jobs into scheduling classes is up to you. You may
wish to consider grouping jobs by workload (developers versus production), by
work group (sales, accounting, etc.), or by application type. Any other way you
can think of to distinguish separate job classes according to your particular
requirements is your decision.  However, be sure to assign similar jobs to the
same scheduling class to prevent one or more jobs from monopolizing the class
CPU time.

Some sample job class definitions are shown in Figures 1, 2, and 3.


Implementation

Class scheduling is implemented under OpenVMS through the use of an executive
loadable image.

By default, this image is not loaded when OpenVMS boots.  To make it available,
use SYSMAN and then reboot:

  $ MCR SYSMAN SYS_LOADABLE ADD OPENVMS CLASS_SCHEDULER
  $ @SYS$UPDATE:VMS$SYSTEM_IMAGES
  $ @SYS$SYSTEM:SHUTDOWN

Once you have the class scheduler execlet loaded, you need to write a program
which calls the SYS$SCHED system service.


Overview of SYS$SCHED

Format: SYS$SCHED (func,p1,p2,p3)

    func = function code specifying action to perform
    p1, p2, p3 = arguments depending on the function code specified

Privileges required:

  ALTPRI         required to affect processes
  GROUP or WORLD required to affect processes in the same UIC group or in 
                 different UIC groups
  SYSPRV         required to set the timeout value


SYS$SCHED Function Codes

CSH$_READ_ALL - Returns a buffer containing information for all processes

CSH$_READ_NEW - Returns a buffer containing information for all processes that
have not been placed into a scheduling class

CSH$_READ_QUANT - Returns a buffer containing information about how many ticks
are left for each class

CSH$_SET_CLASS - Places processes in classes with or without windfall
capability

CSH$_SET_NEW - Indicates to the class scheduler that the next READ_NEW will
return information about the calling process

CSH$_SET_QUANT - Establishes class quantum and enables class scheduling

CSH$_SET_TIMEOUT - Establishes a nonstandard timeout
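
Several of these function codes deal in ticks rather than percentages.  As a
rough illustration only, here is how a policy program might turn a class's CPU
percentage into a tick budget.  This is a Python model of the arithmetic, not a
call to SYS$SCHED; the 10 ms tick length, the 100-tick interval, and the
function name class_tick_budget are all assumptions made for the example.

```python
ASSUMED_TICK_MS = 10          # assumption: one scheduler tick lasts 10 ms
ASSUMED_INTERVAL_TICKS = 100  # hypothetical allocation interval (1 second)


def class_tick_budget(cpu_percent: int,
                      interval_ticks: int = ASSUMED_INTERVAL_TICKS) -> int:
    """Ticks a class may consume per interval for a given CPU percentage."""
    if not 0 < cpu_percent <= 100:
        raise ValueError("cpu_percent must be in the range 1..100")
    if interval_ticks <= 0:
        raise ValueError("interval_ticks must be positive")
    # Integer division rounds down, so a class never gets more than its share.
    return interval_ticks * cpu_percent // 100


budget = class_tick_budget(20)
print(f"20% class: {budget} ticks ({budget * ASSUMED_TICK_MS} ms) per interval")
```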


A Real Life Situation ...

To cut costs, management insists that developers and production users share the
same machine (a recipe for disaster).  Thus, two different workloads are
competing for CPU time and interfering with each other.  An upgrade to Alpha in
the next budget year is planned.  Until then a solution must be found.

Goal #1: Minimize the impact on production users
Goal #2: Solution must be quick and cost little or nothing 
Goal #3: Get the users and the boss off your back (you might want to make
         this goal #1).


How was this problem solved? 

Ten job classes were defined, CLASS_0 through CLASS_9, where CLASS_1 indicated
a class with a maximum of 10% CPU time, and so on up to CLASS_9 with a 90% CPU
time limit.  CLASS_0 was reserved for jobs that should not be class scheduled.

The "account" field in SYSUAF.DAT was used to group users into two of these
classes.  For production users, the production job UAF records were modified to
have an account field of CLASS_0 (not class scheduled).  For developers,
development job UAF records were modified to have an account field of CLASS_2
(20% CPU). Using the account field is an easy method of grouping jobs into
classes since the account field is one of the items returned by SYS$SCHED using
the CSH$_READ_ALL and CSH$_READ_NEW function codes.  By using an account field
of CLASS_0 and CLASS_2, the scheduler policy code is able to distinguish
between the two different job classes.
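
The mapping from account field to CPU limit is simple enough to sketch.  The
Python below is not the FORTRAN policy program, just an illustration of the
naming convention described above.  The function name is made up, and the
handling of blank padding and of accounts that do not follow the CLASS_n
pattern (treated here as "not class scheduled") is an assumption.

```python
from typing import Optional


def cpu_limit_for_account(account: str) -> Optional[int]:
    """Return the CPU percentage limit implied by a CLASS_n account field.

    Returns None for CLASS_0 (not class scheduled).  Treating any other
    unrecognized account the same way is an assumption of this sketch.
    """
    name = account.strip().upper()   # assume the field may be blank padded
    prefix = "CLASS_"
    if not name.startswith(prefix):
        return None
    suffix = name[len(prefix):]
    if len(suffix) != 1 or suffix not in "0123456789":
        return None
    class_number = int(suffix)
    if class_number == 0:
        return None                  # CLASS_0: leave the process alone
    return class_number * 10         # CLASS_1 -> 10%, ..., CLASS_9 -> 90%


for account in ("CLASS_0", "CLASS_2", "CLASS_9"):
    print(account, "->", cpu_limit_for_account(account))
```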

The scheduling interval was then defined: 0800 - 1700 M-F (prime).
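
If you would rather have the policy program check the window itself, the test
is only a few lines.  This Python sketch is an illustration, not part of the
original program; the names are made up, and treating 1700 as the first moment
outside the window is an assumption.

```python
from datetime import datetime, time

PRIME_START = time(8, 0)    # 0800
PRIME_END = time(17, 0)     # 1700 (assumed to be outside the window)


def in_prime_time(now: datetime) -> bool:
    """True when 'now' falls inside 0800 - 1700, Monday through Friday."""
    is_weekday = now.weekday() < 5          # Monday is 0, Friday is 4
    return is_weekday and PRIME_START <= now.time() < PRIME_END


print(in_prime_time(datetime.now()))
```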

The class scheduler policy program was then written in FORTRAN; however, it
could have been written in any high level language, such as C, the language of
the example (SYS$EXAMPLES:CLASS.C) provided by Digital. FORTRAN was readily
available, and some shops with "number crunchers" also have FORTRAN compilers,
so this code should be easy for them to implement.

The class scheduler was started at the beginning of the scheduling interval and
stopped at the end of the scheduling interval using POLYCENTER Scheduler (a
recurring batch job would have worked just as well).  In this case, we weren't
concerned about non-prime response time (no class scheduling).


Test Results

To demonstrate the effectiveness of the class scheduler, the following test
environment was configured on an idle system: A user session (BIGJOB1) was
created to run a compute bound job in the 20% CPU class:

   10     GO TO 10
    
          END
                                                       
A second session was used to run 

   $ MONITOR PROCESS /TOPCPU

A third session was used to run the class scheduler program.

The compute bound job was run with class scheduling first disabled (figure 4)
and as expected, it proceeded to consume all available CPU time.

Next, the class scheduler program was activated (figure 5) and it was observed
that the total CPU consumption by BIGJOB1 was reduced to 20% despite the fact
that no other jobs were executing, thus demonstrating that the 20% CPU class
was in effect.

A second compute bound job (BIGJOB2) was then run along with the first job in
the 20% CPU class (figure 6) to demonstrate the sharing of CPU time within the
class.  BIGJOB1 and BIGJOB2 each received 10% of the total CPU time.
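
The same behavior can be reproduced with a toy model.  The Python sketch below
is a simulation, not the OpenVMS scheduler: it assumes a class gets a fixed
tick budget each interval, that compute bound jobs in the class take turns
spending it, and that the CPU simply idles once the budget is gone (no
windfall).  The interval length and all names are made up for the example.

```python
def simulate_class(cpu_percent, job_names, interval_ticks=100, intervals=10):
    """Return each job's share of total CPU time (in percent) in a toy model.

    Each interval the class receives a tick budget.  Compute bound jobs take
    turns using one tick at a time until the budget runs out; the remaining
    ticks in the interval go unused, even on an otherwise idle system.
    """
    if not job_names:
        return {}
    budget_per_interval = interval_ticks * cpu_percent // 100
    used = {name: 0 for name in job_names}
    turn = 0
    for _ in range(intervals):
        for _ in range(budget_per_interval):
            name = job_names[turn % len(job_names)]
            used[name] += 1
            turn += 1
    total_ticks = interval_ticks * intervals
    return {name: 100 * ticks / total_ticks for name, ticks in used.items()}


print(simulate_class(20, ["BIGJOB1"]))
print(simulate_class(20, ["BIGJOB1", "BIGJOB2"]))
```

In this model one job in the 20% class is held to 20%, and two jobs in the
same class split that limit evenly, which is the pattern the MONITOR displays
below show.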


Figure 4 - Class Scheduling Disabled - 1 Job

                            OpenVMS Monitor Utility 
                             TOP CPU TIME PROCESSES 
                                 on node PROD1 
                            27-OCT-1995 16:22:53.01 
 
                                     0         25        50        75       100 
                                     + - - - - + - - - - + - - - - + - - - - -+ 
 23C000E8  BIGJOB1                99  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 
                                     |         |         |         |          | 
 23C0006F  DECW$TE_006F            1 
                                     |         |         |         |          | 
 
                                     |         |         |         |          | 
 
                                     |         |         |         |          | 
 
                                     |         |         |         |          | 
 
                                     |         |         |         |          | 
 
                                     |         |         |         |          | 
 
                                     + - - - - + - - - - + - - - - + - - - - -+ 


Figure 5 - Class Scheduling Enabled - 1 Job

                            OpenVMS Monitor Utility 
                             TOP CPU TIME PROCESSES 
                                 on node PROD1 
                            27-OCT-1995 16:24:02.02 
 
                                     0         25        50        75       100 
                                     + - - - - + - - - - + - - - - + - - - - -+ 
 23C000E8  BIGJOB1                20  XXXXXXXX 
                                     |         |         |         |          | 
 23C0006F  DECW$TE_006F            1 
                                     |         |         |         |          | 
              
                                     |         |         |         |          | 
 
                                     |         |         |         |          | 
 
                                     |         |         |         |          | 
 
                                     |         |         |         |          | 
 
                                     |         |         |         |          | 
 
                                     + - - - - + - - - - + - - - - + - - - - -+


Figure 6 - Class Scheduling Enabled - 2 Jobs

                            OpenVMS Monitor Utility 
                             TOP CPU TIME PROCESSES 
                                 on node PROD1 
                            27-OCT-1995 16:26:37.76 
 
                                     0         25        50        75       100 
                                     + - - - - + - - - - + - - - - + - - - - -+ 
 23C000E8  BIGJOB1                10  XXXX 
                                     |         |         |         |          | 
 23C00060  BIGJOB2                10  XXXX
                                     |         |         |         |          | 
 23C0006F  DECW$TE_006F            1 
                                     |         |         |         |          | 
 
                                     |         |         |         |          | 
 
                                     |         |         |         |          | 
 
                                     |         |         |         |          | 
 
                                     |         |         |         |          | 
 
                                     + - - - - + - - - - + - - - - + - - - - -+


Conclusion

CPU class scheduling can help you better manage your CPU resources and possibly
allow you to defer a CPU upgrade but it will NOT give you more capacity.
Implementation is relatively easy and cost effective. With class scheduling,
your phone may finally stop ringing and the boss will quit hassling you (ya,
right).


Questions?

Email: vmsguy@home.com
       curtis.rempel@shaw.ca