Introduction to Datatrieve Application Design Outline 1. System Analysis / Decision Phase 1. Initial considerations 2. Record Layout 3. Data Requirements (save records, delete old records, move old records to history file, etc.) 4. Report Requirements 2. System Foundations 1. Define Record Description 2. Define File (Datatrieve, EDIT/FSL, RMSDEF) 3. Create Update Routines 4. Create Reports Introduction to Datatrieve Page 2 Application Design The degree of success with which any application is implemented with Datatrieve (or any other product) depends to a large extent upon the planning which goes into the implementation. This session will cover some of the steps involved with emphasis on their relationship to Datatrieve. Because every application is different, and because some decisions may depend upon subjective judgments, or company standards, or even personal preference, it is difficult to give more than general guidelines, and some examples showing the relative merits of some different options. Before anything is actually done in Datatrieve, some time should be given to planning the application design. In all cases, the designer must consider both the information source and the final result together, to produce the best design. If there is any fixed rule which can be stated for all applications, it is this: that the end result should always be kept in view during all stages of application development. It is all too easy to lose track of the original application and the persons for whom it was supposed to be designed. Some common pitfalls are: 1. The designer assumed that the user would be familiar with certain data, or procedures, or other background information. Often the user knows less than the designer has assumed would be known; this can be such simple things as assuming that the user will know (without being reminded) that department codes are always 4 characters long, or that his department's purchase orders always begin with the letter "W" and have a hyphen after the 3rd digit, etc.; but it can also involve such basic functions of the computer system, such as not realizing that when you erase a record it is gone forever (some people might assume it automatically goes into some sort of backup or recovery area) 2. The designer puts in extra features because they were "easy to add", or "look nice". Sometimes this leads to an application which is too complicated for the intended user to understand. If the application is in a new area or for users who are new to computing, it may be better to do only the most simple application first, and wait to add the additional features after the users have adjusted to the system, even if this means more application work. (Of course, it usually doesn't hurt to do some advance work, such as including fields in the record definition which you know will be needed in the future: they can be "hidden" with a redefines until needed). 3. The application makes no provision for occasions where a data item is missing or incorrect. Good designs should make provisions for exception handling, correction of data on entry, validation of data on entry, or some combination of the three. The classic example is not making any provision for a customer who overpays a bill or invoice: other Introduction to Datatrieve Page 3 Application Design examples might be such things as looking up purchase orders on a partial order number when the user has lost or torn the invoice and does not have a complete number, or being able to enter a telephone number without an area code when someone forgot to ask for it or write it down, and be able to add it later. Similarly, if the application takes data from one file (or domain) and uses it to access data in another file (domain), there needs to be some way to escape the procedure gracefully if the first file contains a data item which is incorrect. 4. The application has nothing to do with the needs of the user. Sometimes the application was incorrectly described; sometimes the user doesn't know what can be done and so asks for what he thinks is what he wants; sometimes the person implementing the application doesn't understand it (or perceives it to be something it is not); and sometimes the designer just goes off on a tangent and comes up with the wrong application altogether. This is best avoided by having the designer and the user work together through the application, and not just for a brief meeting to describe the application at the beginning, and complain about it when it is finished. One of the great advantages of Datatrieve is that it is simple enough to learn that the users can develop their own applications, and this is one of the best ways to insure the application fits the needs of the user. Once the purpose of the application is clear, the next step is to define all of the items of data which will be required by the application, so that the record can be defined. Some possible sources are: 1. Blank Forms. If an existing procedure using printed paper forms is to be moved to Datatrieve, the form will have the desired data items marked on it. 2. Existing Reports. There may be existing reports produced by hand or other means, which are to be produced by Datatrieve. The data items in the report will indicate the data required in the application. 3. Other Documents. You may be required to supply documents to other personnel, accounting departments, or to government agencies: you can examine past documents to determine what information is involved, and base your design on that. This might include inventory reports, employment records, sales receipts, tax records, etc. 4. Copy an existing application. Very often a new application is similar to an existing one, and only needs the addition of a few Introduction to Datatrieve Page 4 Application Design new data items, or the re-naming of some data items, and this can be much faster than designing an application from scratch. 5. The data already exists. There are situations where the data has already been placed in a file (by other programs written in FORTRAN, BASIC, COBOL, etc., or by transfer from another system) and it is now desired to access it with Datatrieve to obtain new reports, to do on-line inquiry, etc. In this case, you have to create a record definition which matches the layout of the existing record: this layout has to be obtained from whomever set up the existing data, or is responsible for maintaining it. You may still have choices to make on data names, redefines fields, and on procedures and report formatting, however. 6. Hard Work. It may be that an entirely new application is to be designed, or a hand operation for which there is no current document defining the operation is to be moved to Datatrieve. In this case, it may be necessary to follow the procedure (or follow a person who is going through the procedure) and write down what the steps are, to determine the information used. This must generally be done more than once, to be certain of catching all the possible variations, and do not neglect the exception cases. Each data item should be examined to determine if it is really needed, that it isn't duplicated elsewhere, and that it is of the proper type (for example, that fields holding names and addresses will be long enough). Now is also the time to determine if there is any other data item which should be added: sometimes an existing application may have data items missing because no-one has gotten around to having new forms printed, or there has been a recent change in procedure, and a good time to add these data items is when the application is developed. Once the data items have been determined, a record layout can be made. The primary goal here is to group related items together, and label them in a meaningful way. Consider the following two record definitions. 01 FIRST_RECORD. 10 NAME. 20 FIRST PIC X(12). 20 MI PIC X. 20 LAST PIC X(14). 10 HOME_ADDRESS. 20 STREET PIC X(20). 20 CITY PIC X(12). 20 STATE PIC XX. 20 ZIP PIC 9(5). . . . ; Introduction to Datatrieve Page 5 Application Design 01 SECOND_RECORD. 10 FIRST PIC X(12). 10 ZIP PIC 9(5). 10 LAST PIC X(14). 10 CITY PIC X(12). 10 STREET PIC X(20). 10 MI PIC X. 10 STATE PIC XX. . . . ; I hope most people would realize that the first definition is better organized than the second, with related data items places together. The group headings (NAME, ADDRESS) are not essential, but do serve to organize the data, and to make it easier to retrieve related items together: for example, on a report one could say PRINT ADDRESS rather than PRINT STREET, CITY, STATE, ZIP (though you can still retrieve individual fields when you wish). They also help to document the use of the data items, which is very useful when the application requires modification, or new personnel must familiarize themselves with existing applications. It is also useful to consider the final report while the record definition is being made: if one particular report is used more than others, then it will be helpful to arrange the fields in the record in the same order they will appear in the report, so that one can say PRINT record rather than having to explicitly state all fields to be printed. If there are multiple groups in a record, one can at least arrange the items within a group in the order they will most often appear, and then specify the group names as they are needed. Even if more than one report is needed, there will usually be groups of fields which are often associated (such as the fields which make up an address), and so should be grouped together. This point can not be overstated: whenever any application is designed, one should always keep the final result in mind in order to keep the design heading in the right direction. The ordering of data also needs to be considered. Some domains will have one data entity per record, others will have more than one data entity per record. Consider two telephone directory records: the first will be ordered in the familiar way. 01 FIRST_TELEPHONE. 10 NAME. 20 FIRST PIC X(12). 20 MI PIC X. 20 LAST PIC X(14). Introduction to Datatrieve Page 6 Application Design 10 NUMBER PIC 9(7). 10 LOCATION. 20 BUILDING PIC X(12). 20 FLOOR PIC 99. ; This record definition has one entry per person, giving their location and telephone number, and is very straight-forward, but consider an almost identical application where several persons might share a telephone, as happens in some offices. 01 SECOND_TELEPHONE. 10 NUMBER PIC 9(7). 10 LOCATION. 20 BUILDING PIC X(12). 20 FLOOR PIC 99. 20 POINTS PIC X(6). 10 NUMBER_INSTRUMENTS PIC 99. 10 EXTENSIONS OCCURS 1 TO 10 TIMES DEPENDING ON NUMBER_INSTRUMENTS. 20 NAME. 30 FIRST PIC X(12). 30 MI PIC X. 30 LAST PIC X(14). 30 DESK PIC 9(4). ; Which of these two records is applicable depends on the final report: the first one organizes data with the user being the most important item, and would be well suited for printing telephone directories, or for an on-line inquiry system, or for people to find someone's telephone number by name; though it could also be used to find telephones by building, or users by telephone number, etc. The second definition definitely places the telephone first, and then lists the users as subordinate items, and would be of greatest use to the person who has to maintain the telephone system: reporting this domain by user would be very difficult. If both applications need to be met, than another definition altogether might be needed. Introduction to Datatrieve Page 7 Application Design 01 USER_REC. 10 NUMBER PIC 9(7). 10 NAME. 20 FIRST PIC X(12). 20 MI PIC . 20 LAST PIC X(14). ; 01 TELEPHONE_REC. 10 NUMBER PIC 9(7). 10 LOCATION. 20 BUILDING X(20). 20 FLOOR PIC 99. 10 CONNECTION. 20 MDF PIC X(10). 20 DESK PIC 9(4). ; These two domains would then be accessed separately when one contains the relevant information, or they could be joined together with a VIEW (or JOIN) when information from both is needed at the same time. One other factor to consider in the first phase of the design is how the information will change with time. In the case of the telephone directory, any updates will probably just replace the old data: for example, if a person moves, then the record with that persons name will be modified to show the new telephone number. This simple replacement is not the only possibility, however. In the case of a domain which stores sales receipts, for example, new data will be added for each sale, but old data may not be deleted. Since this would eventually result in an immense file, there may have to be some provision to remove old data: this might be a separate procedure to simply delete records of more than a certain age, or the old records may be moved to a separate historic file, or there may be a procedure where, when payment is received, then the corresponding sale is removed from the sales domain and the transaction recorded in a payment received domain, which may in turn be purged of records of more than a given age. Alternatively, the records may all have to be kept for one fiscal year for accounting purposes, in which case the size of the file may have to be predicted, to allow enough data to be stored. To prevent unnecessary work, and to avoid mistakes, it is better to develop procedures which will automatically transfer data, rather than having a person look up the record to be moved, move the data items by hand, and delete the old record. Once the data items have been determined, and the basic form of the record has been chosen, the record can be defined. Normally, the definition follows directly from the data, but there are some data type to consider. Introduction to Datatrieve Page 8 Application Design In the case of text, the choices are usually just to make the string long enough to hold the necessary data, and whether to use a T edit-string, but there are sometimes questions as to how the data should be broken down. In the examples given before, all names were defined liked this: 10 NAME. 20 FIRST PIC X(12). 20 MI PIC X. 20 LAST PIC X(14). but they could just as easily be defined like this: 10 NAME PIC X(27). The only difference is in how the data can be retrieved: in the case of the former, the names may be sorted in alphabetical order by last name. In the latter case, this can be done if the names are entered in the form Lederman, Bart Z. but not if the data is in the form Bart Z. Lederman and if, for example, the data is to be used to prepare mailing labels, where the name should be in the normal order of first name first, and also to be used for a telephone directory where it has to appear last name first, then the first definition which allows access to the names separately is the one to use. If the name were just used to record the name on a sales slip and did not have to be sorted by last name, then the second definition is adequate, but consider the problem also of normalizing data. If the first definition is used, then the names will be entered in a fixed format: the first names, middle initials, and last names of all records will be in fixed positions, like this: Burt A Roseberry David Saad Diana Washburn Harold T Glaser Doug Morelly Cynthia Mealy Bart Z Lederman In the case where the name is just one field, the data could look like that, but there is nothing to stop it from looking like this: Introduction to Datatrieve Page 9 Application Design Burt A. Roseberry Saad, David Diana Washburn Glaser, Harold T Morelly, Douglas (Doug) C Mealy B. Z. Lederman While this is just an example using names, where order is not vital, consider a parts inventory application, were different people will be entering data. If you do not want 20 different people to enter the names of the parts inventoried in 20 different ways, you may want to define some fixed subdivisions, to force the data to be normalized. You may also want validation tables to insure the same name is always used for a given part (or the equivalent in your application). In the case of numbers, again the size is usually known, and the choice is for a data type. Display (PIC 9 without a qualifier) is easy to use, and can be transferred to other languages, but is not efficient in space used. INTEGER or COMP takes up less space, and will always come out to be a whole number, but has the drawback that it will not always work with tables (particularly on PDP-11s). Consider this application, where you want to store the department as a number (your departments may be numbered, or you might want to save storage space, or you may want to be able to limit the field to certain valid entries only), and then use a table to print out the department name: 01 : "Personnel", 02 : "Employment", 03 : "Accounting", 04 : "Engineering", 05 : "Shipping" . . . In this case the department number stored in the record should be DISPLAY (the default) and not INTEGER, even though the numbers in the table are integers, as the table look up works on characters even if the left side looks like a number. If the department names are put in a domain and linked with a view, or are a domain table (VAX only), the the number could be INTEGER, but will probably not be an advantage if keys are used properly (covered later). REAL or Floating Point numbers (COMP-1 and COMP-2) look inviting, as they can store a very wide range of numbers in a compact space, and it isn't always necessary to know the exact range when defining the record. They are useful for scientific data and some other applications, but all floating point numbers (and not just in Datatrieve, or on DEC computers, but everywhere) have the disadvantage of something called rounding error. The result of this is that you can enter a whole number such as 12, and it Introduction to Datatrieve Page 10 Application Design may convert to 11.9999999: to a scientist or mathematician this may be close enough, but you will find if you are dealing with sums of money that some accountants don't think $11,999.99 is the same as $12,000.00 and if you are dealing with inventory, your boss might like to know how you can have 11.9 widgets in stock. For these applications, you may want to avoid REAL numbers. The other number types (COMP-3, COMP-5, and COMP-6) are, in my opinion, only needed if you have to interface to existing COBOL applications, where the numbers have already been stored in one of these forms. For new applications, they really have nothing to recommend them, as INTEGER is more compact and the math is faster, and you still have to go to DISPLAY to use tables. Also, there is an advantage if DISPLAY or INTEGER is used and the field can be a key. Since COBOL can read DISPLAY and INTEGER types, there is no problem going back from Datatrieve to COBOL should it be necessary, and since BASIC and FORTRAN often cannot read COMP-3, COMP-5 and COMP-6, there is a real handicap if these types are used and the data then has to be accessed by other programs. The data type that seems to trip people up most often is DATE. The trick here is that DATE looks like a character string when it prints out something like "13-Apr-84", but it is in fact stored internally as a number. Many problems with DATE can be avoided in the following manner: rather than doing this: PRINT domain WITH TRANSACTION_DATE = *."date of transaction" it is better to to this: DECLARE WHEN USAGE DATE. WHEN = *."date of transaction" PRINT domain WITH TRANSACTION_DATE = WHEN If this is done, then the user can enter the date when prompted as "April 13, 1984" or "13/04/84" or "13-APR-84", or any other valid Datatrieve date format, and Datatrieve will convert it to a number and do a correct numeric comparison on the date stored in the domain: this may not happen if it is prompted for as part of the RSE. Datatrieve dates do have two potential drawbacks, however. The first is that it is an 8-byte integer, and some other languages like FORTRAN can't handle them directly (though routines which convert dates have been published and are fairly easy to use), and the conversion from the numeric date in clunks to the displayable date is cumbersome at best. If transfer of data to other languages or systems is important, than you may have to store your date as a character string, or perhaps a three numbers for month, day and year. Also, (on the PDP-11 and PRO-350 at least) an 8-byte integer cannot be a key in a file, so the DATE cannot be keyed (more on keys later), so again the date might be stored as a character string or 3 numbers. Another problem, but one which can be overcome, is that it is a bit cumbersome to look up a particular month with the Datatrieve data type. If you want all Introduction to Datatrieve Page 11 Application Design of the records for March, you have to do something like this: DECLARE FIRST_DATE USAGE DATE. DECLARE LAST_DATE USAGE DATE. FIRST_DATE = 01-MAR-84 LAST_DATE = 31-MAR-84 PRINT domain WITH TRANSACTION_DATE BETWEEN FIRST_DATE, LAST_DATE What this won't solve is the case where the domain may cover several years, and you want all March records from any year (to compare one year with another, etc.). In this case also, you will have to store the date as a character string or separate numbers. On the other hand, the Datatrieve date can do some things well: as noted before, if prompted for separately it will accept the date in several different forms, and will also print it out in many different formats simply by changing the edit string (including some nice extras, like the day of the week). Times may also be stored as part of the date, though retrieving this information is still a bit cumbersome. Another type of transaction the DATE type does well is in computing future dates, such as the case where you want to find the bills due this week. DECLARE FIRST_DATE USAGE DATE. FIRST_DATE = "TODAY" DECLARE ONE_DAY USAGE COMP PIC 9(18). ONE_DAY = 864000000000 DECLARE LAST_DATE USAGE DATE. LAST_DATE = FIRST_DATE + ( 7 * ONE_DAY) PRINT domain WITH DUE_DATE BETWEEN FIRST_DATE, LAST_DATE A procedure like this will always pick up today's date, and add seven days to it without user intervention, and will be much easier to use than if the date was stored as a character string or as separate year, month, and day values as you would have to do a lot of math to check for weeks which overlapped the start of a new month, check for leap years, and other such special conditions. By the way: if you want to find all of the bills due last week, do not subtract 7 times ONE_DAY: make ONE_DAY negative, and add negative 7 days thus: FIRST_DATE = LAST_DATE + ( -7 * ONE_DAY) Datatrieve sometimes has trouble subtracting dates, but will add correctly even if one number is negative. Also remember that with a BETWEEN qualifier, the first value must be less than the second value to retrieve the data properly. Once the data types have been defined, there is still the question of what to call each field. This is not always as easy as it may seem. For the sake of consistency, some organizations might insist on this type of definition Introduction to Datatrieve Page 12 Application Design 01 INVENTORY_REC. 10 INVENTORY_PART. 20 INVENTORY_PART_NAME PIC X(20) QUERY_HEADER "Part" QUERY_NAME "PART". 20 INVENTORY_PART_QUANTITY PIC 9(5) EDIT_STRING ZZ,ZZ9 QUERY_HEADER "Quantity" QUERY_NAME "NUMBER". . . . ; I don't want anyone to think that this is wrong, as it does certainly organize the data well, but in some applications it may have some drawbacks. First: if this domain is only used for printed reports the long names are no problem, but if it used interactively, there would be problems as the names are too long to type in comfortably, and the query names don't always match the query headers. Second: the extra query headers and names take up working space and take longer to search though, and require the computer to do more work. (On the PDP-11 or PRO-350, they may even take up enough extra pool to make the application fail completely.) My approach in this application would be to call the part "PART" and the quantity "QUANTITY" or "NUMBER", and let the record name and group header identify them as being relevant to inventory. Just to give both sides, consider this: 01 INVENTORY_RECORD. 10 PART PIC X(20). 10 STOCK. 20 QUANTITY PIC 9(5) EDIT_STRING ZZ,ZZ9. 20 LOCATION PIC 999. 10 BACK_ORDER. 20 QUANTITY PIC 9(5) EDIT_STRING ZZ,ZZ9. 20 SOURCE PIC 99. . . . ; It is possible to get to both quantities by using the group names (STOCK.QUANTITY and BACK_ORDER.QUANTITY), but since the group name would always have to be used to get to the back order quantity, why not call it BACK_QUANTITY in the first place. A related matter which will never be settled to everyone's satisfaction is how much to abbreviate names. An abbreviated name is handy if you are using the data interactively and don't want to type a lot, but if it is too short it won't adequately describe the data. For example, the inventory record could be reduced to: Introduction to Datatrieve Page 13 Application Design 01 INVENTORY_REC. 10 IN PIC X(20) QUERY_HEADER "Part". 10 IQ PIC 9(5) EDIT_STRING ZZ,ZZ9 QUERY_HEADER "Quantity". . . ; While someone very familiar with the application, and who knows that IN is the inventory number, could type in the field names quickly, anyone else using the data would have a difficult time trying to figure out what any of the fields actually are. One way to test your field names is to write them out on a piece of paper and then give it to a person who knows the general area of the application but not your design (and doesn't have to know about computers or Datatrieve) and ask them to identify what each name stands for: if they cannot do it reading directly from the paper, you have probably made your abbreviations too short. Remember also that the names serve to document the data items, and if you have to go back and work with the domain after a six month absence, you may then wish you made the names a little more descriptive. Once the record is finally defined, then its time to define the domain (which simply ties together the names of the record, domain and file), and define the file. The big choice in defining the file is the choice of keys, if any. A KEYED (synonym INDEXED) file is one which, in addition to storing the data, also stores an index which provides a short-cut to access the data. An index is very much like the index of a book: rather than having to search each page to find an item of interest, the index tells you which page to go to directly to find that item. Also, like an index which is in alphabetical order to make finding your item easier, the file index is kept in order (alphabetical or numeric) to make searches faster. Unlike a book index, a file index generally has an entry for every record in the file, so it is a bit larger in proportion to the data than a book index, and it may be configured to index more than one field in the record. There are basically four reasons for using a key: 1. Faster access to the data. If you will be retrieving certain data items, and the criteria for retrieval is the contents of a particular field, you can make retrieval faster by making that field a key field: that way, the system can search the index rather than the entire file to find the data. Note that this works only for some types of retrievals, notably EQUALS (numeric and alphabetic), and over a range. Note particularly that CONTAINING is NEVER keyed: it is always necessary to read through the entire domain. For example, in the telephone lists previously described, if you are going to retrieve entries with the command NAME = "---" a key will help, but a retrieval by NAME CONTAINING "---" will not be helped with a key. Keys are especially important if the domain will be part of a VIEW or a CROSS, but less important in domains which use a single file Introduction to Datatrieve Page 14 Application Design and are nearly always accessed as a whole. Using the telephone file again as an example: if it was primarily to act as an on-line inquiry system, having the most accessed field as a key would greatly increase the speed of information retrieval. If, however, it was used mostly to print a telephone directory, where the entire file (domain) is printed at once, a key might not increase the speed of the application any (but see the next paragraph). 2. Imposing order on the data. The data in an indexed file is ALWAYS sorted by the primary key, and if you access the data without saying SORTED BY, it will automatically come back sorted by the primary key. Since sorting always takes time, if you know the domain will primarily be reported in a certain order, you can make that field the primary key, and save a lot of time by not having to re-sort the data. The more items there are in the file, the more time you will save by not re-sorting them. In the case of the telephone directory: having the name as a key would keep the file sorted alphabetically, so it would take less time to generate a telephone directory ordered by name. 3. Preventing duplicates. There is an optional qualifier on all keys which permits or prevents duplicates (two or more records with the same key). This has such uses as preventing the telephone directory from having two or more people with the same telephone number, or on applications such as employment records, you may want to prevent the accidental entry of two employees with the same identification number. By making this number a key (primary or alternate), and specifying NO DUP, it will be impossible for anyone to enter a duplicate number, even deliberately: RMS simply won't permit it. 4. Preventing changes. There is another attribute to permit or prevent a keyed field to be changed (using the MODIFY command). This can be handy for many applications: consider a domain which keeps track of job assignments, where the original due date is stored by the manager, and the person assigned the job will update the record with relevant information. This means the user must have MODIFY access to the domain, but this will give them modify access to all fields, and you may not want them to change the due date. If the due date is made a key (primary or alternate) with the NO CHANGE attribute, the user will then be able to modify other fields to update them, but not the due date: no matter what access Datatrieve or CDD allows, RMS will not permit it. Two things to remember: first, the primary key can NEVER be changed. If it is wrong, you have to delete the record, and store it all over again Introduction to Datatrieve Page 15 Application Design with the correct key. This is an inherent factor in the operation of RMS, and cannot be altered. The other thing is that NO CHANGE on a key does not mean no delete, so if you don't want your user simply to delete the record and put in another, give them modify access but not erase access with Datatrieve or the CDD. With these factors in mind, the appropriate fields can be selected for keying. For example, the telephone records shown earlier would probably have the telephone number as the key, as the data will most often be retrieved by saying NUMBER = ----- or some similar command. Keying the name might not help if retrievals are often of the NAME CONTAINING "--------" type, but may keep the file sorted in name order, which would be helpful if it is most often reported in name order. In the last telephone example, where the data was split between two domains, the number should definitely be the key for both files, as the field which joins domains together in a VIEW or CROSS should ALWAYS be keyed for fast access. Keep in mind that keying a field means that more space will be occupied by that file (on disk) than if it were not keyed, so do not key everything: you should show some discrimination, and key only the fields which will result in a benefit to your application. Occasionally, this may mean bringing the application up with few fields keyed, and then using it for a while to see which fields are most often accessed, and which are accessed in a way in which keys would be beneficial. The various RMS utilities can be used to move the data from one file to another with an updated set of keys to provide faster access to the data: you do not have to make any changes in Datatrieve when a field changes from not keyed to keyed, and vice versa, provided you do not change the actual layout of the record: Datatrieve is "told" by RMS which fields are keyed when the domain is readied. (If you change the name of the file, you will have to correct the domain definition.) Unless this is an application where the data file has already been created elsewhere for you (in which case data and the selection of keys has already been done), you must now create a file into which the data will go. The easiest way is to use the Datatrieve DEFINE FILE command. The file created by this command may not be the best possible file for every application, but it will work, and should at least be a reasonable design if you have done a decent job of selecting record organization and choice of keys. Use of the keywords available in Datatrieve can help, especially the ALLOCATION keyword. If you can make a reasonable guess at the size of the file, ALLOCATE space for it when you define it. This most often happens when you are creating a new domain from an old one, and know the size of the old file. If you have some experience with RMS and a good knowledge of the application, you may be able to increase performance by defining a different file structure with one of the RMS utilities (DES, DEF, EDIT/FSL, etc.), but even in this case, it is best to define a file with Datatrieve first, as this will get the position and type of all fields correct, and then modify it where a change is indicated. If you have not Introduction to Datatrieve Page 16 Application Design learned much about RMS, use the file Datatrieve gives you: it will be better than the file a person defines without a fairly good knowledge of file design and the application involved. The one exception is if you put your data into the file with Datatrieve first: once the file is filled, the various utilities can measure how much space will be needed to store your data and how the keys are distributed, and can do a fair amount of optimization when creating a new file to hold that data without your having to know a lot about file optimization. One minor point: when Datatrieve creates the file, it does not store the name of the field in the key descriptor (at least on PDP-11 and PRO-350), so you might want to use one of the file utilities just to go in and put the names of the corresponding fields on the keys. That way, if someone looks at the file later (with, for example, DES or DISPLAY/ATTRIBUTES) it will be easier to determine what is in the file, and to match the file with the corresponding application. Now that the record, domain and file are defined, it is necessary to load in or modify the data. You can teach everyone basic Datatrieve and let them use the STORE or MODIFY commands, but it is often better to use pre-defined routines. First, a general rule to remember is the less the user has to do, the fewer mistakes will be made. Second, the users may be in positions where they don't have time to learn the computer, or you may not want them to learn to use the computer: if your sales staff is going to enter sales receipts, you want them to spend as much time as possible selling, and as little time as possible doing computer and paper work. A pre-defined procedure will help everyone, and may also help keep the data in a clean form (like the jumbled names demonstrated earlier). The procedure may be very simple: Introduction to Datatrieve Page 17 Application Design DEFINE PROCEDURE STORE_SALES READY SALES WRITE DECLARE SALE_DATE USAGE DATE. DECLARE YOUR_CODE PIC 99. DECLARE ANOTHER PIC X. DECLARE TOTAL_SALE PIC 9(5)V99 EDIT_STRING $$,$$$.Z9. DECLARE TEMP_COST PIC 9(3)V99 EDIT_STRING $$$.Z9. DECLARE TEMP_ITEM PIC XX. DECLARE CORRECT PIC X. DECLARE ALWAYS PIC X. ALWAYS = "Y" WHILE ALWAYS EQ "Y" BEGIN YOUR_CODE = 0 WHILE YOUR_CODE NOT IN SALESMEN_TABLE BEGIN YOUR_CODE = *."your code number" END SALE_DATE = "TODAY" TOTAL_SALE = 0 ANOTHER = "Y" WHILE ANOTHER EQ "Y", "y" BEGIN TEMP_ITEM = *.ITEM TEMP_COST = 0 CORRECT = "N" WHILE CORRECT NE "Y" BEGIN TEMP_COST = *.COST IF (TEMP_COST BETWEEN 0.01 AND 99.99) THEN CORRECT = "Y" IF CORRECT NE "Y" THEN PRINT "Please re-enter cost." END STORE SALES USING BEGIN SALESMEN = YOUR_CODE ITEM = TEMP_ITEM COST = TEMP_COST END TOTAL_SALE = TOTAL_SALE + TEMP_COST ANOTHER = *."Y for more items" END PRINT "Total for this sale was ", SPACE 1, TOTAL_SALE END END_PROCEDURE salesmen_table 01 : "J. Smith", 02 : "B. Brown", 03 : "T. Jones" END_TABLE I am not claiming this is an ideal procedure: it is just to demonstrate some possible features. The first WHILE is used to keep this one procedure on the screen, and would be used if the application were started in the morning and kept running on a terminal all day. The purpose of YOUR_CODE Introduction to Datatrieve Page 18 Application Design is to allow the user to enter a number and have it validated against a table BEFORE storing it in the record (in this case, there must not be an entry for 0 in the table, or an ELSE at the end, or every entry will be valid). If a VALID_IF clause were put in the record definition, it would abort the transaction AFTER all of the data is typed in, which would annoy the users, and waste time. Similarly, TEMP_COST is also used to check for valid data before storing (and to calculate TOTAL_COST), and TEMP_ITEM could also be validated against a table or domain. The SALE_DATE is automatically set to TODAY, which saves the user from having to type in a field which can be obtained automatically, and which eliminates a possible source of user error. The last loop is used to store each individual item, and keep track of the total (this implies an OCCURS clause in the record definition with ITEM and COST in the variable part). Additional enhancements could be for the procedure to calculate sales tax, or modify the inventory data base to reflect the items sold, or the total cash on hand in the store, and so on. There is obviously a good deal more that this procedure could do: its purpose is to be an short example of the type of validation processing which can be done to insure that correct data is entered, and to give the user some useful messages when something goes wrong. Once the domain has been created and populated, the data is almost always desired in one or more reports. If the goal of the application was well defined at the beginning, producing the report should follow quite easily from the record layout. Just as with the selection of data items, the report layout may come from several sources: 1. Blank Forms. If you are converting a paper operation to Datatrieve, you may want your new form to match the old form (you might even be putting pre-printed forms into your hard copy printer). In this case, you can just copy the layout of the form in your report procedure. 2. Existing Reports. If you are developing an application to match an existing report, it can serve as the template for your report procedure. 3. Other Documents. In this case, the format of the data might be less well defined, but may still serve as a template. For example, if your application is based on a document which has the department name in the upper left corner, then the sales for the day by salesman, then the sales for the month at the bottom of the document, you may wish to follow this general format in your report so the users will have less of a difference to adjust to when converting. Introduction to Datatrieve Page 19 Application Design 4. Existing data. This will probably also fall under the category of "Hard Work", as in most cases the report desired may not be defined, though it may be that the reason the application was designed in the first place was to derive certain information from an existing data base. 5. Hard Work. It may be that the report will have to be designed from scratch. The subject of actually using the report writer is one that requires a dedicated session, and will not be further discussed here.