Introduction to Datatrieve

                                 Application Design

                                       Outline

          1.  System Analysis / Decision Phase 

              1.  Initial considerations 

              2.  Record Layout 

              3.  Data  Requirements  (save  records,  delete  old records, move
                  old records to history file, etc.) 

              4.  Report Requirements 


          2.  System Foundations 

              1.  Define Record Description 

              2.  Define File (Datatrieve, EDIT/FSL, RMSDEF) 


          3.  Create Update Routines 

          4.  Create Reports 


Introduction to Datatrieve                                               Page  2
Application Design


          The  degree  of success with which any application is implemented with
     Datatrieve (or any other product)  depends  to  a  large  extent  upon  the
     planning  which goes into the implementation.  This session will cover some
     of the steps involved with emphasis on their  relationship  to  Datatrieve.
     Because  every  application  is  different,  and because some decisions may
     depend upon subjective judgments, or company standards,  or  even  personal
     preference,  it is difficult to give more than general guidelines, and some
     examples showing the relative merits of some different options.  

          Before  anything  is  actually done in Datatrieve, some time should be
     given to planning the application design.  In all cases, the designer  must
     consider  both  the  information  source  and the final result together, to
     produce the best design.  If there is any fixed rule which  can  be  stated
     for  all  applications,  it  is this:  that the end result should always be
     kept in view during all stages of application development.  It is  all  too
     easy  to lose track of the original application and the persons for whom it
     was supposed to be designed.  Some common pitfalls are:  

          1.  The  designer  assumed  that  the  user  would  be  familiar  with
              certain data, or procedures, or other background information.  

              Often  the  user knows less than the designer has assumed would be
              known;  this can be such simple things as assuming that  the  user
              will  know  (without  being  reminded)  that  department codes are
              always 4 characters long, or that his department's purchase orders
              always  begin  with the letter "W" and have a hyphen after the 3rd
              digit, etc.;  but it can also involve such basic functions of  the
              computer  system,  such  as  not  realizing  that when you erase a
              record  it  is  gone  forever  (some  people   might   assume   it
              automatically goes into some sort of backup or recovery area) 

          2.  The  designer  puts  in  extra features because they were "easy to
              add", or "look nice".  

              Sometimes  this  leads  to an application which is too complicated
              for the intended user to understand.  If the application is  in  a
              new  area  or for users who are new to computing, it may be better
              to do only the most simple application first, and wait to add  the
              additional  features  after the users have adjusted to the system,
              even if this means more application work.  (Of course, it  usually
              doesn't  hurt to do some advance work, such as including fields in
              the record definition which you know will be needed in the future:
              they can be "hidden" with a redefines until needed).  

          3.  The  application  makes  no  provision  for occasions where a data
              item is missing or incorrect.  

              Good  designs  should  make  provisions  for  exception  handling,
              correction of data on entry, validation of data on entry, or  some
              combination  of  the three.  The classic example is not making any
              provision for a customer who overpays a bill  or  invoice:   other

Introduction to Datatrieve                                               Page  3
Application Design


              examples  might  be such things as looking up purchase orders on a
              partial order number when the user has lost or  torn  the  invoice
              and  does  not  have  a  complete number, or being able to enter a
              telephone number without an area code when someone forgot  to  ask
              for  it or write it down, and be able to add it later.  Similarly,
              if the application takes data from one file (or domain)  and  uses
              it to access data in another file (domain), there needs to be some
              way to escape the procedure gracefully if the first file  contains
              a data item which is incorrect.  

          4.  The application has nothing to do with the needs of the user. 

              Sometimes  the  application  was incorrectly described;  sometimes
              the user doesn't know what can be done and so  asks  for  what  he
              thinks  is  what  he wants;  sometimes the person implementing the
              application doesn't understand it (or perceives it to be something
              it is not);  and sometimes the designer just goes off on a tangent
              and comes up with the wrong application altogether.  This is  best
              avoided  by having the designer and the user work together through
              the application, and not just for a brief meeting to describe  the
              application  at  the  beginning,  and complain about it when it is
              finished.  One of the great advantages of Datatrieve is that it is
              simple  enough  to  learn  that  the  users  can develop their own
              applications, and this is one of  the  best  ways  to  insure  the
              application fits the needs of the user.  


          Once  the  purpose  of  the  application is clear, the next step is to
     define all of the items of data which will be required by the  application,
     so that the record can be defined.  Some possible sources are:  


          1.  Blank  Forms.   If  an  existing  procedure  using  printed  paper
              forms is to be moved to Datatrieve, the form will have the desired
              data items marked on it.  

          2.  Existing  Reports.   There  may  be  existing  reports produced by
              hand or other means, which are to be produced by Datatrieve.   The
              data  items  in  the report will indicate the data required in the
              application.  

          3.  Other  Documents.   You  may  be  required  to supply documents to
              other  personnel,  accounting  departments,   or   to   government
              agencies:   you  can  examine  past  documents  to  determine what
              information is involved, and base your design on that.  This might
              include inventory reports, employment records, sales receipts, tax
              records, etc.  

          4.  Copy  an  existing  application.   Very often a new application is
              similar to an existing one, and only needs the addition of  a  few

Introduction to Datatrieve                                               Page  4
Application Design


              new  data items, or the re-naming of some data items, and this can
              be much faster than designing an application from scratch.  

          5.  The  data  already  exists.   There  are situations where the data
              has already been placed in a file (by other  programs  written  in
              FORTRAN,  BASIC,  COBOL, etc., or by transfer from another system)
              and it is now desired to access it with Datatrieve to  obtain  new
              reports,  to  do  on-line inquiry, etc.  In this case, you have to
              create a  record  definition  which  matches  the  layout  of  the
              existing record:  this layout has to be obtained from whomever set
              up the existing data, or is responsible for maintaining  it.   You
              may  still  have  choices to make on data names, redefines fields,
              and on procedures and report formatting, however.  

          6.  Hard  Work.   It  may  be  that  an entirely new application is to
              be designed, or a hand operation for which  there  is  no  current
              document  defining the operation is to be moved to Datatrieve.  In
              this case, it may be necessary to follow the procedure (or  follow
              a  person  who is going through the procedure) and write down what
              the steps are, to  determine  the  information  used.   This  must
              generally  be  done  more than once, to be certain of catching all
              the possible variations, and do not neglect the exception cases.  


          Each data item should be examined to determine if it is really needed,
     that it isn't duplicated elsewhere, and that it is of the proper type  (for
     example, that fields holding names and addresses will be long enough).  Now
     is also the time to determine if there is any other data item which  should
     be  added:   sometimes  an existing application may have data items missing
     because no-one has gotten around to having new forms printed, or there  has
     been  a recent change in procedure, and a good time to add these data items
     is when the application is developed.  

          Once the data items have been determined, a record layout can be made.
     The primary goal here is to group related items together, and label them in
     a meaningful way.  Consider the following two record definitions.  

     01 FIRST_RECORD.
         10 NAME.
             20 FIRST PIC X(12).
             20 MI PIC X.
             20 LAST PIC X(14).
         10 HOME_ADDRESS.
             20 STREET PIC X(20).
             20 CITY PIC X(12).
             20 STATE PIC XX.
             20 ZIP PIC 9(5).
         .
         .
         .
     ;

Introduction to Datatrieve                                               Page  5
Application Design


     01 SECOND_RECORD.
         10 FIRST PIC X(12).
         10 ZIP PIC 9(5).
         10 LAST PIC X(14).
         10 CITY PIC X(12).
         10 STREET PIC X(20).
         10 MI PIC X.
         10 STATE PIC XX.
         .
         .
         .
     ;


     I  hope  most  people  would  realize  that  the first definition is better
     organized than the second, with related data items  places  together.   The
     group  headings (NAME, ADDRESS) are not essential, but do serve to organize
     the data, and to make it easier to retrieve related  items  together:   for
     example,  on a report one could say PRINT ADDRESS rather than PRINT STREET,
     CITY, STATE, ZIP (though you can still retrieve individual fields when  you
     wish).  They also help to document the use of the data items, which is very
     useful when the application requires modification, or  new  personnel  must
     familiarize themselves with existing applications.  

          It  is  also  useful  to  consider  the  final report while the record
     definition is being made:  if one  particular  report  is  used  more  than
     others,  then it will be helpful to arrange the fields in the record in the
     same order they will appear in the report, so that one can say PRINT record
     rather  than having to explicitly state all fields to be printed.  If there
     are multiple groups in a record, one can at least arrange the items  within
     a  group  in  the  order  they will most often appear, and then specify the
     group names as they are needed.  Even if more than one  report  is  needed,
     there  will usually be groups of fields which are often associated (such as
     the fields which make up an address), and so should  be  grouped  together.
     This  point  can  not be overstated:  whenever any application is designed,
     one should always keep the final result in mind in order to keep the design
     heading in the right direction.  

          The  ordering  of data also needs to be considered.  Some domains will
     have one data entity per record, others will have more than one data entity
     per  record.   Consider two telephone directory records:  the first will be
     ordered in the familiar way.  


     01 FIRST_TELEPHONE.
         10 NAME.
             20 FIRST PIC X(12).
             20 MI PIC X.
             20 LAST PIC X(14).

Introduction to Datatrieve                                               Page  6
Application Design


         10 NUMBER PIC 9(7).
         10 LOCATION.
             20 BUILDING PIC X(12).
             20 FLOOR PIC 99.
     ;


     This  record definition has one entry per person, giving their location and
     telephone number, and is very  straight-forward,  but  consider  an  almost
     identical  application  where  several  persons might share a telephone, as
     happens in some offices.  


     01 SECOND_TELEPHONE.
         10 NUMBER PIC 9(7).
         10 LOCATION.
             20 BUILDING PIC X(12).
             20 FLOOR PIC 99.
             20 POINTS PIC X(6).
         10 NUMBER_INSTRUMENTS PIC 99.
         10 EXTENSIONS OCCURS 1 TO 10 TIMES DEPENDING ON NUMBER_INSTRUMENTS.
             20 NAME.
                 30 FIRST PIC X(12).
                 30 MI PIC X.
                 30 LAST PIC X(14).
                 30 DESK PIC 9(4).
     ;


     Which  of these two records is applicable depends on the final report:  the
     first one organizes data with the user being the most important  item,  and
     would  be well suited for printing telephone directories, or for an on-line
     inquiry system, or for people to find someone's telephone number  by  name;
     though  it  could  also be used to find telephones by building, or users by
     telephone  number,  etc.   The  second  definition  definitely  places  the
     telephone  first,  and then lists the users as subordinate items, and would
     be of greatest use to the person who has to maintain the telephone  system:
     reporting   this   domain  by  user  would  be  very  difficult.   If  both
     applications need to be met, than another definition  altogether  might  be
     needed.  


Introduction to Datatrieve                                               Page  7
Application Design


     01 USER_REC.
         10 NUMBER PIC 9(7).
         10 NAME.
             20 FIRST PIC X(12).
             20 MI PIC .
             20 LAST PIC X(14).
     ;

     01 TELEPHONE_REC.
         10 NUMBER PIC 9(7).
         10 LOCATION.
             20 BUILDING X(20).
             20 FLOOR PIC 99.
         10 CONNECTION.
             20 MDF PIC X(10).
             20 DESK PIC 9(4).
     ;


     These  two  domains would then be accessed separately when one contains the
     relevant information, or they could be joined  together  with  a  VIEW  (or
     JOIN) when information from both is needed at the same time.  

          One  other  factor to consider in the first phase of the design is how
     the information will change with  time.   In  the  case  of  the  telephone
     directory,  any  updates  will  probably  just  replace  the old data:  for
     example, if a person moves, then the record with that persons name will  be
     modified  to show the new telephone number.  This simple replacement is not
     the only possibility, however.  In the case of a domain which stores  sales
     receipts,  for  example, new data will be added for each sale, but old data
     may not be deleted.  Since this would eventually result in an immense file,
     there  may  have  to be some provision to remove old data:  this might be a
     separate procedure to simply delete records of more than a certain age,  or
     the old records may be moved to a separate historic file, or there may be a
     procedure where, when payment is received, then the corresponding  sale  is
     removed  from  the  sales  domain and the transaction recorded in a payment
     received domain, which may in turn be purged of  records  of  more  than  a
     given  age.   Alternatively,  the  records  may all have to be kept for one
     fiscal year for accounting purposes, in which case the size of the file may
     have  to  be  predicted,  to  allow  enough  data to be stored.  To prevent
     unnecessary work, and to avoid mistakes, it is better to develop procedures
     which will automatically transfer data, rather than having a person look up
     the record to be moved, move the data items by hand,  and  delete  the  old
     record.  

          Once  the  data  items have been determined, and the basic form of the
     record  has  been  chosen,  the  record  can  be  defined.   Normally,  the
     definition  follows directly from the data, but there are some data type to
     consider.  


Introduction to Datatrieve                                               Page  8
Application Design


          In  the  case of text, the choices are usually just to make the string
     long enough to hold the necessary data, and whether to use a T edit-string,
     but there are sometimes questions as to how the data should be broken down.
     In the examples given before, all names were defined liked this:  

         10 NAME.
             20 FIRST PIC X(12).
             20 MI PIC X.
             20 LAST PIC X(14).

     but they could just as easily be defined like this:  

     10 NAME PIC X(27).  

     The  only  difference  is in how the data can be retrieved:  in the case of
     the former, the names may be sorted in alphabetical order by last name.  In
     the latter case, this can be done if the names are entered in the form 

     Lederman, Bart Z.  

     but not if the data is in the form 

     Bart Z.  Lederman 

     and  if,  for  example,  the  data is to be used to prepare mailing labels,
     where the name should be in the normal order of first name first, and  also
     to  be  used  for  a  telephone  directory where it has to appear last name
     first,  then  the  first  definition  which  allows  access  to  the  names
     separately  is  the  one  to use.  If the name were just used to record the
     name on a sales slip and did not have to be sorted by last name,  then  the
     second definition is adequate, but consider the problem also of normalizing
     data.  If the first definition is used, then the names will be entered in a
     fixed  format:   the  first  names,  middle initials, and last names of all
     records will be in fixed positions, like this:  

     Burt         A Roseberry
     David          Saad
     Diana          Washburn
     Harold       T Glaser
     Doug           Morelly
     Cynthia        Mealy
     Bart         Z Lederman

     In  the  case  where  the  name is just one field, the data could look like
     that, but there is nothing to stop it from looking like this:  


Introduction to Datatrieve                                               Page  9
Application Design


     Burt A. Roseberry
     Saad, David
     Diana          Washburn
     Glaser, Harold T
     Morelly, Douglas (Doug)
     C Mealy
     B. Z. Lederman

     While  this  is  just  an  example  using  names, where order is not vital,
     consider a parts inventory  application,  were  different  people  will  be
     entering  data.   If you do not want 20 different people to enter the names
     of the parts inventoried in 20 different ways, you may want to define  some
     fixed  subdivisions, to force the data to be normalized.  You may also want
     validation tables to insure the same name is always used for a  given  part
     (or the equivalent in your application).  

     In  the case of numbers, again the size is usually known, and the choice is
     for a data type.  Display (PIC 9 without a qualifier) is easy to  use,  and
     can  be transferred to other languages, but is not efficient in space used.
     INTEGER or COMP takes up less space, and will always come out to be a whole
     number,  but  has  the  drawback  that  it will not always work with tables
     (particularly on PDP-11s).  Consider this application, where  you  want  to
     store  the department as a number (your departments may be numbered, or you
     might want to save storage space, or you may want to be able to  limit  the
     field to certain valid entries only), and then use a table to print out the
     department name:  

     01 : "Personnel",
     02 : "Employment",
     03 : "Accounting",
     04 : "Engineering",
     05 : "Shipping"
         .
         .
         .

     In  this  case the department number stored in the record should be DISPLAY
     (the default) and not INTEGER, even though the numbers  in  the  table  are
     integers,  as  the  table look up works on characters even if the left side
     looks like a number.  If the department names  are  put  in  a  domain  and
     linked  with a view, or are a domain table (VAX only), the the number could
     be INTEGER, but will probably not be an advantage if keys are used properly
     (covered later).  

          REAL  or  Floating Point numbers (COMP-1 and COMP-2) look inviting, as
     they can store a very wide range of numbers in  a  compact  space,  and  it
     isn't  always  necessary  to know the exact range when defining the record.
     They are useful for scientific data and some other  applications,  but  all
     floating  point  numbers  (and not just in Datatrieve, or on DEC computers,
     but everywhere) have the disadvantage of something called  rounding  error.
     The  result of this is that you can enter a whole number such as 12, and it

Introduction to Datatrieve                                              Page  10
Application Design


     may convert to 11.9999999:  to a scientist or  mathematician  this  may  be
     close  enough, but you will find if you are dealing with sums of money that
     some accountants don't think $11,999.99 is the same as  $12,000.00  and  if
     you  are  dealing  with inventory, your boss might like to know how you can
     have 11.9 widgets in stock.  For these applications, you may want to  avoid
     REAL numbers.  

          The  other  number  types  (COMP-3,  COMP-5,  and  COMP-6)  are, in my
     opinion,  only  needed  if  you  have  to  interface  to   existing   COBOL
     applications,  where  the  numbers have already been stored in one of these
     forms.  For new applications, they really have nothing to  recommend  them,
     as INTEGER is more compact and the math is faster, and you still have to go
     to DISPLAY to use tables.  Also,  there  is  an  advantage  if  DISPLAY  or
     INTEGER  is  used and the field can be a key.  Since COBOL can read DISPLAY
     and INTEGER types, there is no problem going back from Datatrieve to  COBOL
     should  it  be  necessary,  and  since  BASIC and FORTRAN often cannot read
     COMP-3, COMP-5 and COMP-6, there is a real handicap if these types are used
     and the data then has to be accessed by other programs.  

          The  data  type  that seems to trip people up most often is DATE.  The
     trick here is that DATE looks like a character string when  it  prints  out
     something  like  "13-Apr-84",  but  it  is  in  fact stored internally as a
     number.  Many problems with DATE can be avoided in the following manner:  

     rather than doing this:  

     PRINT domain WITH TRANSACTION_DATE = *."date of transaction" 

     it is better to to this:  

     DECLARE WHEN USAGE DATE.
     WHEN = *."date of transaction"
     PRINT domain WITH TRANSACTION_DATE = WHEN

     If  this  is  done,  then  the  user  can  enter  the date when prompted as
     "April 13, 1984"  or  "13/04/84"  or  "13-APR-84",  or  any   other   valid
     Datatrieve date format, and Datatrieve will convert it to a number and do a
     correct numeric comparison on the date stored in the domain:  this may  not
     happen  if it is prompted for as part of the RSE.  Datatrieve dates do have
     two potential drawbacks, however.  The  first  is  that  it  is  an  8-byte
     integer,  and  some other languages like FORTRAN can't handle them directly
     (though routines which convert dates have been  published  and  are  fairly
     easy  to  use),  and  the conversion from the numeric date in clunks to the
     displayable date is cumbersome at best.   If  transfer  of  data  to  other
     languages  or systems is important, than you may have to store your date as
     a character string, or perhaps a three numbers for  month,  day  and  year.
     Also,  (on  the  PDP-11 and PRO-350 at least) an 8-byte integer cannot be a
     key in a file, so the DATE cannot be keyed (more on keys later),  so  again
     the  date  might  be  stored  as  a character string or 3 numbers.  Another
     problem, but one which can be overcome, is that it is a bit  cumbersome  to
     look  up a particular month with the Datatrieve data type.  If you want all

Introduction to Datatrieve                                              Page  11
Application Design


     of the records for March, you have to do something like this:  

     DECLARE FIRST_DATE USAGE DATE.
     DECLARE LAST_DATE USAGE DATE.
     FIRST_DATE = 01-MAR-84
     LAST_DATE = 31-MAR-84
     PRINT domain WITH TRANSACTION_DATE BETWEEN FIRST_DATE, LAST_DATE

     What this won't solve is the case where the domain may cover several years,
     and you want all March records from any year  (to  compare  one  year  with
     another,  etc.).   In  this case also, you will have to store the date as a
     character string or separate numbers.  On the other  hand,  the  Datatrieve
     date  can do some things well:  as noted before, if prompted for separately
     it will accept the date in several different forms, and will also print  it
     out in many different formats simply by changing the edit string (including
     some nice extras, like the day of the week).  Times may also be  stored  as
     part  of  the  date,  though  retrieving  this  information  is still a bit
     cumbersome.  Another type of transaction the DATE  type  does  well  is  in
     computing  future  dates, such as the case where you want to find the bills
     due this week.  

     DECLARE FIRST_DATE USAGE DATE.
     FIRST_DATE = "TODAY"
     DECLARE ONE_DAY USAGE COMP PIC 9(18).
     ONE_DAY = 864000000000
     DECLARE LAST_DATE USAGE DATE.
     LAST_DATE = FIRST_DATE + ( 7 * ONE_DAY)
     PRINT domain WITH DUE_DATE BETWEEN FIRST_DATE, LAST_DATE

     A  procedure like this will always pick up today's date, and add seven days
     to it without user intervention, and will be much easier to use than if the
     date  was  stored as a character string or as separate year, month, and day
     values as you would have to do a lot of  math  to  check  for  weeks  which
     overlapped  the  start of a new month, check for leap years, and other such
     special conditions.  By the way:  if you want to find all of the bills  due
     last week, do not subtract 7 times ONE_DAY:  make ONE_DAY negative, and add
     negative 7 days thus:  

     FIRST_DATE = LAST_DATE + ( -7 * ONE_DAY) 

     Datatrieve  sometimes has trouble subtracting dates, but will add correctly
     even if one  number  is  negative.   Also  remember  that  with  a  BETWEEN
     qualifier,  the  first value must be less than the second value to retrieve
     the data properly.  

          Once  the data types have been defined, there is still the question of
     what to call each field.  This is not always as easy as it may  seem.   For
     the  sake  of  consistency, some organizations might insist on this type of
     definition 


Introduction to Datatrieve                                              Page  12
Application Design


     01 INVENTORY_REC.
         10 INVENTORY_PART.
             20 INVENTORY_PART_NAME PIC X(20)
                 QUERY_HEADER "Part"
                 QUERY_NAME "PART".
             20 INVENTORY_PART_QUANTITY PIC 9(5)
                 EDIT_STRING ZZ,ZZ9
                 QUERY_HEADER "Quantity"
                 QUERY_NAME "NUMBER".
         .
         .
         .
     ;

     I  don't  want  anyone  to  think  that this is wrong, as it does certainly
     organize the  data  well,  but  in  some  applications  it  may  have  some
     drawbacks.   First:   if  this  domain is only used for printed reports the
     long names are no problem, but if it used  interactively,  there  would  be
     problems  as  the  names are too long to type in comfortably, and the query
     names don't always match the  query  headers.   Second:   the  extra  query
     headers  and  names take up working space and take longer to search though,
     and require the computer to do more work.  (On the PDP-11 or PRO-350,  they
     may   even  take  up  enough  extra  pool  to  make  the  application  fail
     completely.) My approach in this application would  be  to  call  the  part
     "PART" and the quantity "QUANTITY" or "NUMBER", and let the record name and
     group header identify them as being relevant to inventory.   Just  to  give
     both sides, consider this:  

     01 INVENTORY_RECORD.
         10 PART PIC X(20).
         10 STOCK.
             20 QUANTITY PIC 9(5) EDIT_STRING ZZ,ZZ9.
             20 LOCATION PIC 999.
         10 BACK_ORDER.
             20 QUANTITY PIC 9(5) EDIT_STRING ZZ,ZZ9.
             20 SOURCE PIC 99.
         .
         .
         .
     ;

     It  is  possible  to  get  to  both  quantities  by  using  the group names
     (STOCK.QUANTITY and BACK_ORDER.QUANTITY), but since the  group  name  would
     always  have  to be used to get to the back order quantity, why not call it
     BACK_QUANTITY in the first place.  

          A   related   matter   which  will  never  be  settled  to  everyone's
     satisfaction is how much to abbreviate names.  An abbreviated name is handy
     if  you  are using the data interactively and don't want to type a lot, but
     if it is too short it won't adequately describe the data.  For example, the
     inventory record could be reduced to:  

Introduction to Datatrieve                                              Page  13
Application Design


     01 INVENTORY_REC.
         10 IN PIC X(20) QUERY_HEADER "Part".
         10 IQ PIC 9(5) EDIT_STRING ZZ,ZZ9 QUERY_HEADER "Quantity".
         .
         .
     ;

     While  someone very familiar with the application, and who knows that IN is
     the inventory number, could type in the field names  quickly,  anyone  else
     using the data would have a difficult time trying to figure out what any of
     the fields actually are.  One way to test your field names is to write them
     out  on a piece of paper and then give it to a person who knows the general
     area of the application but not your design (and doesn't have to know about
     computers  or  Datatrieve)  and  ask them to identify what each name stands
     for:  if they cannot do it  reading  directly  from  the  paper,  you  have
     probably  made  your abbreviations too short.  Remember also that the names
     serve to document the data items, and if you have to go back and work  with
     the  domain after a six month absence, you may then wish you made the names
     a little more descriptive.  

          Once the record is finally defined, then its time to define the domain
     (which simply ties together the names of the record, domain and file),  and
     define  the  file.   The  big  choice in defining the file is the choice of
     keys, if any.  A KEYED (synonym INDEXED) file is one which, in addition  to
     storing the data, also stores an index which provides a short-cut to access
     the data.  An index is very much like the index of  a  book:   rather  than
     having to search each page to find an item of interest, the index tells you
     which page to go to directly to find that item.  Also, like an index  which
     is  in  alphabetical order to make finding your item easier, the file index
     is kept in order (alphabetical or numeric) to make searches faster.  Unlike
     a  book  index, a file index generally has an entry for every record in the
     file, so it is a bit larger in proportion to the data than  a  book  index,
     and it may be configured to index more than one field in the record.  There
     are basically four reasons for using a key:  

          1.  Faster access to the data.  

              If you will be retrieving certain data items, and the criteria for
              retrieval is the contents of a  particular  field,  you  can  make
              retrieval  faster by making that field a key field:  that way, the
              system can search the index rather than the entire  file  to  find
              the data.  Note that this works only for some types of retrievals,
              notably EQUALS (numeric and alphabetic), and over a  range.   Note
              particularly  that  CONTAINING  is  NEVER  keyed:   it  is  always
              necessary to read through the entire domain.  For example, in  the
              telephone lists previously described, if you are going to retrieve
              entries with the command NAME = "---"  a  key  will  help,  but  a
              retrieval  by NAME CONTAINING "---" will not be helped with a key.
              Keys are especially important if the domain will be part of a VIEW
              or  a CROSS, but less important in domains which use a single file

Introduction to Datatrieve                                              Page  14
Application Design


              and are nearly always accessed as a whole.   Using  the  telephone
              file  again  as  an  example:   if  it  was primarily to act as an
              on-line inquiry system, having the most accessed field  as  a  key
              would  greatly  increase  the speed of information retrieval.  If,
              however, it was used mostly to print a telephone directory,  where
              the  entire  file  (domain)  is  printed  at once, a key might not
              increase the speed of  the  application  any  (but  see  the  next
              paragraph).  

          2.  Imposing order on the data.  

              The  data  in an indexed file is ALWAYS sorted by the primary key,
              and if you access the  data  without  saying  SORTED BY,  it  will
              automatically  come back sorted by the primary key.  Since sorting
              always takes time, if  you  know  the  domain  will  primarily  be
              reported  in  a certain order, you can make that field the primary
              key, and save a lot of time by not having  to  re-sort  the  data.
              The  more items there are in the file, the more time you will save
              by not re-sorting them.  In the case of the  telephone  directory:
              having   the   name   as   a   key  would  keep  the  file  sorted
              alphabetically, so it would take less time to generate a telephone
              directory ordered by name.  

          3.  Preventing duplicates.  

              There  is  an  optional  qualifier  on  all  keys which permits or
              prevents duplicates (two or more records with the same key).  This
              has  such  uses  as preventing the telephone directory from having
              two  or  more  people  with  the  same  telephone  number,  or  on
              applications  such  as employment records, you may want to prevent
              the accidental entry of two employees with the same identification
              number.   By  making this number a key (primary or alternate), and
              specifying NO DUP, it will be impossible for  anyone  to  enter  a
              duplicate number, even deliberately:  RMS simply won't permit it. 

          4.  Preventing changes.  

              There  is  another attribute to permit or prevent a keyed field to
              be changed (using the MODIFY command).  This can be handy for many
              applications:    consider  a  domain  which  keeps  track  of  job
              assignments, where the original due date is stored by the manager,
              and  the  person  assigned  the  job  will  update the record with
              relevant information.  This means the user must have MODIFY access
              to  the  domain,  but  this  will  give  them modify access to all
              fields, and you may not want them to change the due date.  If  the
              due  date  is made a key (primary or alternate) with the NO CHANGE
              attribute, the user will then be able to modify  other  fields  to
              update  them,  but  not  the  due  date:   no  matter  what access
              Datatrieve or CDD allows, RMS will not permit it.  Two  things  to
              remember:   first, the primary key can NEVER be changed.  If it is
              wrong, you have to delete the record, and store it all over  again

Introduction to Datatrieve                                              Page  15
Application Design


              with the correct key.  This is an inherent factor in the operation
              of RMS, and cannot be altered.  The other thing is that  NO CHANGE
              on  a  key does not mean no delete, so if you don't want your user
              simply to delete the record and put in another, give  them  modify
              access but not erase access with Datatrieve or the CDD.  


          With these factors in mind, the appropriate fields can be selected for
     keying.  For example, the telephone records shown  earlier  would  probably
     have  the  telephone  number  as  the  key,  as the data will most often be
     retrieved by saying NUMBER = ----- or some  similar  command.   Keying  the
     name    might    not    help    if    retrievals    are    often   of   the
     NAME CONTAINING "--------" type, but may  keep  the  file  sorted  in  name
     order,  which  would be helpful if it is most often reported in name order.
     In the last telephone  example,  where  the  data  was  split  between  two
     domains,  the  number  should  definitely be the key for both files, as the
     field which joins domains together in a VIEW  or  CROSS  should  ALWAYS  be
     keyed for fast access.  

          Keep  in  mind  that  keying  a  field  means  that more space will be
     occupied by that file (on disk) than if it were not keyed, so  do  not  key
     everything:   you  should show some discrimination, and key only the fields
     which will result in a benefit to your application.  Occasionally, this may
     mean  bringing  the application up with few fields keyed, and then using it
     for a while to see which fields are most  often  accessed,  and  which  are
     accessed  in  a  way  in  which  keys would be beneficial.  The various RMS
     utilities can be used to move the data from one file  to  another  with  an
     updated  set of keys to provide faster access to the data:  you do not have
     to make any changes in Datatrieve when a field changes from  not  keyed  to
     keyed,  and vice versa, provided you do not change the actual layout of the
     record:  Datatrieve is "told" by RMS which fields are keyed when the domain
     is  readied.  (If you change the name of the file, you will have to correct
     the domain definition.) 

          Unless  this  is  an  application where the data file has already been
     created elsewhere for you (in which case data and the selection of keys has
     already been done), you must now create a file into which the data will go.
     The easiest way is to use the Datatrieve  DEFINE FILE  command.   The  file
     created  by  this  command  may  not  be  the  best possible file for every
     application, but it will work, and should at least be a  reasonable  design
     if  you  have done a decent job of selecting record organization and choice
     of keys.  Use of the keywords available in Datatrieve can help,  especially
     the  ALLOCATION keyword.  If you can make a reasonable guess at the size of
     the file, ALLOCATE space for it  when  you  define  it.   This  most  often
     happens  when  you  are creating a new domain from an old one, and know the
     size of the old file.  If you have some experience  with  RMS  and  a  good
     knowledge  of  the  application, you may be able to increase performance by
     defining a different file structure with one of  the  RMS  utilities  (DES,
     DEF,  EDIT/FSL,  etc.),  but even in this case, it is best to define a file
     with Datatrieve first, as this will get the position and type of all fields
     correct,  and  then modify it where a change is indicated.  If you have not

Introduction to Datatrieve                                              Page  16
Application Design


     learned much about RMS, use the file Datatrieve  gives  you:   it  will  be
     better  than  the  file a person defines without a fairly good knowledge of
     file design and the application involved.  The one exception is if you  put
     your  data  into  the file with Datatrieve first:  once the file is filled,
     the various utilities can measure how much space will be  needed  to  store
     your  data  and  how  the keys are distributed, and can do a fair amount of
     optimization when creating a new file to hold that data without your having
     to  know  a lot about file optimization.  One minor point:  when Datatrieve
     creates the file, it does not store the  name  of  the  field  in  the  key
     descriptor  (at  least on PDP-11 and PRO-350), so you might want to use one
     of the file utilities just to go in and put the names of the  corresponding
     fields  on  the  keys.  That way, if someone looks at the file later (with,
     for example, DES or DISPLAY/ATTRIBUTES) it will be easier to determine what
     is in the file, and to match the file with the corresponding application.  

          Now  that  the record, domain and file are defined, it is necessary to
     load in or modify the data.  You can teach everyone  basic  Datatrieve  and
     let  them  use  the STORE or MODIFY commands, but it is often better to use
     pre-defined routines.  First, a general rule to remember is  the  less  the
     user  has to do, the fewer mistakes will be made.  Second, the users may be
     in positions where they don't have time to learn the computer, or  you  may
     not  want  them to learn to use the computer:  if your sales staff is going
     to enter sales receipts, you want them to spend as much  time  as  possible
     selling,  and  as little time as possible doing computer and paper work.  A
     pre-defined procedure will help everyone, and may also help keep  the  data
     in  a  clean  form  (like  the  jumbled  names  demonstrated earlier).  The
     procedure may be very simple:  


Introduction to Datatrieve                                              Page  17
Application Design


     DEFINE PROCEDURE STORE_SALES
     READY SALES WRITE
     DECLARE SALE_DATE USAGE DATE.
     DECLARE YOUR_CODE PIC 99.
     DECLARE ANOTHER PIC X.
     DECLARE TOTAL_SALE PIC 9(5)V99 EDIT_STRING $$,$$$.Z9.
     DECLARE TEMP_COST PIC 9(3)V99 EDIT_STRING $$$.Z9.
     DECLARE TEMP_ITEM PIC XX.
     DECLARE CORRECT PIC X.
     DECLARE ALWAYS PIC X.
     ALWAYS = "Y"
     WHILE ALWAYS EQ "Y" BEGIN
          YOUR_CODE = 0
          WHILE YOUR_CODE NOT IN SALESMEN_TABLE BEGIN
               YOUR_CODE = *."your code number"
          END
          SALE_DATE = "TODAY"
          TOTAL_SALE = 0
          ANOTHER = "Y"
          WHILE ANOTHER EQ "Y", "y" BEGIN
               TEMP_ITEM = *.ITEM
               TEMP_COST = 0
               CORRECT = "N"
               WHILE CORRECT NE "Y" BEGIN
                    TEMP_COST = *.COST
                    IF (TEMP_COST BETWEEN 0.01 AND 99.99) THEN CORRECT = "Y"
                    IF CORRECT NE "Y" THEN PRINT "Please re-enter cost."
               END
               STORE SALES USING BEGIN
                    SALESMEN = YOUR_CODE
                    ITEM = TEMP_ITEM
                    COST = TEMP_COST
               END
               TOTAL_SALE = TOTAL_SALE + TEMP_COST
               ANOTHER = *."Y for more items"
          END
          PRINT "Total for this sale was ", SPACE 1, TOTAL_SALE
     END
     END_PROCEDURE

       salesmen_table

     01 : "J. Smith",
     02 : "B. Brown",
     03 : "T. Jones"
     END_TABLE

     I  am  not  claiming this is an ideal procedure:  it is just to demonstrate
     some possible features.  The first WHILE is used to keep this one procedure
     on  the  screen,  and  would be used if the application were started in the
     morning and kept running on a terminal all day.  The purpose  of  YOUR_CODE

Introduction to Datatrieve                                              Page  18
Application Design


     is  to  allow  the  user  to enter a number and have it validated against a
     table BEFORE storing it in the record (in this case, there must not  be  an
     entry  for  0  in  the table, or an ELSE at the end, or every entry will be
     valid).  If a VALID_IF clause were put in the record definition,  it  would
     abort  the transaction AFTER all of the data is typed in, which would annoy
     the users, and waste time.  Similarly, TEMP_COST is also used to check  for
     valid  data  before  storing  (and  to calculate TOTAL_COST), and TEMP_ITEM
     could also be validated against  a  table  or  domain.   The  SALE_DATE  is
     automatically  set  to TODAY, which saves the user from having to type in a
     field which can be obtained automatically, and which eliminates a  possible
     source of user error.  The last loop is used to store each individual item,
     and keep track of the total (this implies an OCCURS clause  in  the  record
     definition   with   ITEM  and  COST  in  the  variable  part).   Additional
     enhancements could be for the procedure to calculate sales tax,  or  modify
     the  inventory  data  base  to reflect the items sold, or the total cash on
     hand in the store, and so on.  There is obviously a  good  deal  more  that
     this procedure could do:  its purpose is to be an short example of the type
     of validation processing which can be done to insure that correct  data  is
     entered,  and  to  give  the  user some useful messages when something goes
     wrong.  

          Once  the  domain  has  been created and populated, the data is almost
     always desired in one or more reports.  If the goal of the application  was
     well  defined  at  the  beginning, producing the report should follow quite
     easily from the record layout.  Just as with the selection of  data  items,
     the report layout may come from several sources:  

          1.  Blank Forms.  

              If  you  are  converting  a paper operation to Datatrieve, you may
              want your new form to match  the  old  form  (you  might  even  be
              putting  pre-printed  forms into your hard copy printer).  In this
              case, you can just copy the layout of  the  form  in  your  report
              procedure.  

          2.  Existing Reports.  

              If  you are developing an application to match an existing report,
              it can serve as the template for your report procedure.  

          3.  Other Documents.  

              In  this  case, the format of the data might be less well defined,
              but  may  still  serve  as  a  template.   For  example,  if  your
              application  is  based on a document which has the department name
              in the upper left corner, then the sales for the day by  salesman,
              then  the  sales  for the month at the bottom of the document, you
              may wish to follow this general format in your report so the users
              will have less of a difference to adjust to when converting.  


Introduction to Datatrieve                                              Page  19
Application Design


          4.  Existing data.  

              This will probably also fall under the category of "Hard Work", as
              in most cases the report desired may not be defined, though it may
              be that the reason the application was designed in the first place
              was to derive certain information from an existing data base.  

          5.  Hard Work.  

              It may be that the report will have to be designed from scratch.  

     The  subject  of  actually  using  the report writer is one that requires a
     dedicated session, and will not be further discussed here.