From: CRDGW2::CRDGW2::MRGATE::"SMTP::CRVAX.SRI.COM::RELAY-INFO-VAX" 17-APR-1989 10:48
To: MRGATE::"ARISIA::EVERHART"
Subj: Thoughts On The Application Of The DNS/DFS Products...

Received: From KL.SRI.COM by CRVAX.SRI.COM with TCP; Mon, 17 APR 89 06:41:19 PDT
Received: from remote.dccs.upenn.edu by KL.SRI.COM with TCP; Mon, 17 Apr 89 06:17:53 PDT
Received: from XRT.UPENN.EDU by remote.dccs.upenn.edu id AA08278; Fri, 14 Apr 89 13:46:02 EDT
Message-Id: <8904141746.AA08278@remote.dccs.upenn.edu>
Date: Fri, 14 Apr 89 13:47 EDT
From: "Clayton, Paul D."
Subject: Thoughts On The Application Of The DNS/DFS Products...
To: INFO-VAX@KL.SRI.COM
X-Vms-To: @INFOVAX,CLAYTON

There are two products, made and sold by Digital Equipment Corporation, that I am using which have been the target of recent postings. They are both saving me considerable time and effort. They are the DNS and DFS products. The problem I am having deals with disk access from remote systems, as described below.

-----------------

The disk problem is complicated, but it will be greatly simplified here to show what is going on. There will be up to eighteen (18) MICs (Mixed Interconnect Clusters), with up to forty-two (42) nodes each, communicating with dual HSCs that have disks and tapes attached.

Add into this hardware mixture the following facts and constraints.

1. Most users have a workstation, of one type or another, which is a member of one of these MI clusters. Movement of workstations from one cluster to another will ONLY be done for those people changing work areas.

2. Considerable amounts of software, both DEC and third party, are licensed for certain systems and not all of them. In some cases, software is licensed for an 8550 boot node member, but not for the workstations running off the same MIC.

3. Some software is licensed for one MIC, but due to the 42-node limit and other considerations, the users of the product may not be on the MIC that is licensed to run it. These same users must still be able to get to the software in order to perform their job function.

4. In some cases, user groups (those doing the same type of function, be it development of a sub-system or something else) may or may not be on the same cluster, due to disk storage constraints, the 42-node limit or group breakdowns.

5. All systems are on the same Ethernet, and are therefore able to talk with one another freely.

6. Certain application packages do not allow node names to be specified in a file specification, which would have allowed 'normal' DECnet access to files on other MICs. (A sketch of the difference follows this list.)

7. 'Normal' DECnet file transfers are not very fast, due to the various layers of DECnet and associated software. This is in reference to throughput versus time versus record size.

8. Having multiple usernames on different MICs is not attractive, because the same user's work then ends up spread across different disks. The case always comes up where a user wants to take a file from one application and use it with another application, and due to the license problems those applications may be on different MICs.
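To make item 6 concrete, here is a rough sketch of the difference; the cluster alias, disk, directory and file names are made up for illustration. Plain DECnet access needs the remote node (or cluster alias) embedded in the file specification, while a DFS-mounted disk is referenced like any other local device:

   $ TYPE ACLUS::AUSER1:[SMITH.DATA]RESULTS.DAT   ! 'normal' DECnet access - node name required
   $ TYPE AUSER1:[SMITH.DATA]RESULTS.DAT          ! same file via a DFS mount - no node name

A package that builds its own file specifications and cannot cope with the '::' form can live with the second style once DFS is in place.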
With the above items known, the following list of agreed-upon issues was determined.

1. The license issue is based on which processor the program is 'executed' on. Many packages here have keys based on the system ID register (SID) or the Ethernet address of the processor that is legally allowed to run the program(s). No mention is made of where the input and output data files are located. The bottom line is that if the node name problem could be worked around, users with accounts on MIC 'C' could log into MIC 'A' and run some program there, with the files used being those of the user's account on MIC 'C'.

2. Having the same username on multiple MICs, with ALL UAF fields except the device and directory specification being the same, is allowed for. The extra management overhead will have to be absorbed with manpower. (A sketch of such an account entry follows this list.)
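To illustrate issue 2 (the username, UIC, password and device names below are hypothetical), the matching account on a second MIC would be created with AUTHORIZE, with only the default device and directory differing from the entry on the user's 'home' MIC:

   $ MCR AUTHORIZE
   UAF> ADD SMITH/UIC=[200,14]/PASSWORD=CHANGEME/DEVICE=CUSER1/DIRECTORY=[SMITH]
   UAF> EXIT

Every other UAF field (privileges, quotas, flags and so on) is kept identical to the 'home' entry, so only the location of the user's files changes from cluster to cluster.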
Considering all of the above, there are two products that work with each other to provide the needed abilities. They are DNS (Distributed Name Service) and DFS (Distributed File Service). Having DNS is a requirement of DFS. A brief description of the two products follows.

DNS: This product 'lives' in the network and provides a means to store and retrieve common information from any node in the network. The information is stored in a 'name space' and accessed through a 'clearing house'. The name space can be thought of as a group of names and associated equivalence strings. The clearing house is the processor which is executing the DNS software to maintain a specific name space. In order to survive a single node failure, DNS is set up to have another node in the network be a clearing house for the same name space, in a read-only mode. This method of surviving node failure is drastically different from the 'standard' cluster rule of having the same software and information available on another node in the same VAXcluster. It is recommended that the read-only clearing house for DNS be a processor that is not in the same cluster. It should also be pointed out that the nodes for both clearing houses do NOT have to be cluster members; it just happens that I have nothing BUT cluster members to work with. Various applications, like DFS, can query the clearing house with a certain name, and DNS will return the equivalence string.

DFS: This product 'lives' on each processor that needs access to disks other than the ones in its own cluster or directly attached. There are two versions of DFS. The first is the CLIENT kit, which allows a node to gain access to disks on another node as if they were locally attached. The second is the SERVER kit, which makes local disks available to nodes elsewhere on the network that could not gain access to them through cluster membership or local attachment.

The specific method of setting up the MICs was to ensure that each cluster followed certain rules. The following rules were defined by myself and are not a requirement of DFS or DNS. Most of the rules were defined to reduce the amount of confusion this problem creates.

1. Only the CI boot node processors will have a DECnet cluster alias, defined as 'XCLUS', where 'X' is a letter of the alphabet starting with 'A' and denotes a specific cluster.

2. The disks will have volume labels of the form 'XUSERN', where 'X' is the cluster identifier and 'N' is the logical unit number for the disk. Logical names are defined at system level to mirror the volume labels. The system disk label is left as 'VMSRL5', but a logical name of 'XUSER0' is defined during the boot process.

3. On each boot node only, an 'access point' is defined to DNS for each of the available disks on that cluster. An access point is a disk and directory specification that DFS on another node can reference in a MOUNT command. The access points are broken down by cluster in the following format (a worked example of rules 3 and 4 follows this list).

   $DFSCP ADD ACCESS_POINT XCLUS.XUSERN XUSERN:[000000]/CLUSTER

where DFSCP is the DFS application control program, much as NCP is for DECnet. The 'X' variable is the cluster designator and 'N' is the volume number. This allows DFS on other nodes to mount the access point and gain access to the disk as if the entire disk structure were directly available through MSCP. Note that there is nothing stopping you from setting an access point to be a user directory and NOT the MFD; the knowledge of what each access point refers to is assumed to be known and understood by anyone who references it. The '/CLUSTER' switch allows for node failure of the boot nodes: any disks mounted through DFS will enter mount verification and continue on a surviving node, if one with the same cluster alias is available. Only the boot nodes have the cluster alias defined, to prevent satellites from getting DFS connections from other nodes.

4. On any node (this includes boot nodes and satellites) wanting access to another cluster's disks, the following DFS command 'mounts' the other node's disk as a local disk.

   $DFSCP MOUNT/SYSTEM XCLUS.XUSERN XUSERN/VOLUME=XUSERN

The result is a disk for which a 'SHOW DEVICE/MOUNTED XUSERN' command would yield the following output.

   Device                  Device   Error  Volume   Free    Trans  Mnt
    Name                   Status   Count  Label    Blocks  Count  Cnt
   DFSC1001: (node_name)   Mounted      0  XUSERN   ******      2    1

If a 'SHOW DEVICE/FULL DFSC1001' command is done, the listing shows it to be an RA-82 device, on line and mounted, shareable and served to the cluster via MSCP. The specific line-item information is that for a 'normal' RA-82 disk. It is my belief that the device information is that associated with the particular device being served, and would therefore change depending on the device type; I have not proven this, as I have nothing but RA-82 disks. The misleading entry here is that the device is NOT actually served to the cluster via MSCP, and in fact cannot be 'SET SERVED'. The result is a need to 'mount' the same access point on every node that needs access to a particular remote disk. The mount can be done at SYSTEM level, and is therefore available to anyone on that specific processor, or at PROCESS or GROUP level for restricted access. Since the access points, for my purposes, are made at the MFD level, 'normal' directory specifications such as XUSERN:[USER.FOOBAR] work.
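As a worked example of rules 3 and 4 (the cluster letters and the unit number are hypothetical), making user disk 1 of cluster 'A' available to a node in cluster 'B' would look roughly like this.

   $! On an 'A' cluster boot node (the DFS server), define the access point:
   $DFSCP ADD ACCESS_POINT ACLUS.AUSER1 AUSER1:[000000]/CLUSTER

   $! On any node of cluster 'B' (a DFS client), mount it and use it like a local disk:
   $DFSCP MOUNT/SYSTEM ACLUS.AUSER1 AUSER1/VOLUME=AUSER1
   $ DIRECTORY AUSER1:[USER.FOOBAR]

From then on, AUSER1 on the 'B' node refers to a DFSCn: device like the one shown above, subject to the I/O restrictions covered in the notes below.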
Items to note.

1. The speed of the access is VERY good when compared to normal DECnet transfers. Looking at VS3100 screens, it can be difficult to tell the difference between normal LAVc disk speed and DFS, even though DECnet links are used by DFS to pass data. It should be noted that the DFS installation kit patches a number of system images and requires a reboot in order to function; one can only assume that the patches provide backdoor, and much faster, paths through DECnet and the rest of the I/O path. The speed does depend on the load and size of the processor serving as the DFS server node. If a maxed-out 8250 is the DFS server node, then the transfers are not going to be that good, but then neither is the LAVc responsiveness.

2. The ability to have a SINGLE, shared SYSUAF file was attempted, to help eliminate the management headaches. It did not, and will not, work.

3. Control of file access over DFS is handled through the DECnet proxy mechanism. This means that appropriate entries have to be made in the proxy database, using the AUTHORIZE command, to allow a user on one system to access files on another using DFS. This can also add to the management headache, depending on the network environment.

4. There are restrictions on the types of I/O operations that can be performed over DFS. It boils down to no write sharing of files, and only virtual I/O operations; logical and physical I/O operations are not allowed. You can use DFS with the BACKUP command to store save sets on remote disks, as well as use a DFS device in a BACKUP output specification in order to copy directory structures from one node to another. Without DFS, you had to write a save set to the remote system and then 'unpack' it there. You can also create directories, given the proper protections and privileges.

5. There is a single DECnet link between the client system, which has the disk mounted, and the server system, which has the disk directly attached by HSC or direct connect. If multiple disks are being served by the same server to the same client, only one DECnet link is used. Even so, these links can add up to substantial numbers. It is incumbent on the system manager to manage the DECnet executor database for the number of allowable inbound links to the server nodes. Of particular interest is ensuring that sufficient resources are made available for both the cluster alias and the processor-specific connections. This is done through the following two commands.

   $MCR NCP (DEFINE/SET) EXECUTOR ALIAS MAXIMUM LINKS nnn
   $MCR NCP (DEFINE/SET) EXECUTOR MAXIMUM LINKS yyy

The number used for the cluster alias command should allow for all of the links to reside on a single processor. This allows for node failure in the cluster without link rejects occurring.

6. There are control mechanisms in the DFS control program (DFSCP) for such things as the number of buffers to make available, how many requests can be outstanding at any one time, how many 'connections' will be served at one time, etc. These values will require monitoring and modification as the user environment changes.

Hope this helps give some insight into how the product set works.

pdc

Paul D. Clayton
Address - CLAYTON%XRT@RELAY.UPENN.EDU

Disclaimer: All thoughts and statements here are my own and NOT those of my employer, and they are also not based on, nor do they contain, restricted information.