From: Bill Todd [billtodd@foo.mv.com]
Sent: Wednesday, August 16, 2000 12:03 PM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: RMS Index Problem

David A Froble wrote in message news:3999C469.EB8D2633@tsoft-inc.com...

...

> You've gotten some other responses, and they are pretty much on target.
> RMS secondary keys pretty much suck, and when you have lots of duplicates,
> they're downright unbearable.

While the latter half of the statement is certainly true for the number of duplicates being discussed here, as long as the number of duplicates is small enough for all the RFA pointers for a given key value to fit in a single bucket (I'd hope that today's RMS is suave enough never to split up such an RFA list when other splitting options exist), performance is just fine (relative to theoretical optimal performance), as it also is with non-duplicated secondary keys.

That reflects a fundamental performance trade-off RMS made from the beginning, compared with other multi-key access methods such as Tandem's, which use primary key values rather than RFAs as pointers in the secondary indexes (thus requiring *2* full index traversals to obtain a record by secondary key). Supporting permanent-RFA-style access costs performance on record insertions (save for initial population in primary-key order) even when no secondary indexes exist, but it makes secondary-key access (when they do exist) - both random and sequential - as fast as it's possible to achieve in an architecture that supports primary-key access at all (which requires that records be able to move in order to remain in primary-key sequence).

Some architectures (e.g., Prime's KSAM, IIRC) avoided the issue by not supporting a primary key at all: records never had to move, RFAs existed without the need for internal RRVs requiring updating on user-data-bucket splits, and secondary indexes could always point directly to the actual record using a physical address. However, in such an architecture there's no way to get good *sequential* performance on *any* key (since each record usually requires a separate disk access). One can work around this in special cases by populating the file in a desired access order and then processing it in that physical order, but as soon as records start getting deleted the holes reduce access efficiency - whereas deletions in a b-tree primary index allow the possibility of 'coalescing' to maintain record groupings and sequential access efficiency (though whether RMS supports this dynamically I don't know).

So, except in cases of large numbers of duplicates for individual key values, RMS secondary indexes perform as efficiently as is possible and better than most competing implementations (leaving aside non-structural issues such as the degree to which write-back caching coupled with transactional logging might be able to improve RMS's general performance without sacrificing integrity). The only question is whether this performance may have been obtained at too great a sacrifice in other areas: maintaining permanent RFAs for records that move around to maintain primary-key order is expensive.
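To make that pointer trade-off concrete, here's a toy sketch in C - nothing resembling actual RMS or Tandem internals; the structures and names are invented, an array stands in for the file, and a linear scan stands in for a full b-tree traversal. It just shows that an RFA-style secondary entry leads straight to the record, while a primary-key-style entry pays for a second index traversal:

/* Toy illustration only -- invented names, not RMS internals. */
#include <stdio.h>

struct record {
    int  prim_key;           /* primary key value   */
    char name[16];           /* secondary key value */
};

/* An "RFA" here is just an index into the array, standing in for the
 * (bucket VBN, record id) pair in a real indexed file. */
typedef int rfa_t;

struct sec_entry_rfa  { char name[16]; rfa_t rfa;      };  /* RMS-style    */
struct sec_entry_pkey { char name[16]; int   prim_key; };  /* Tandem-style */

static struct record data[] = {
    { 100, "ABLE"    },
    { 200, "BAKER"   },
    { 300, "CHARLIE" },
};

/* Stand-in for one full primary-index (b-tree) traversal. */
static struct record *primary_lookup(int key)
{
    int i;
    for (i = 0; i < (int)(sizeof data / sizeof data[0]); i++)
        if (data[i].prim_key == key)
            return &data[i];
    return 0;
}

int main(void)
{
    struct sec_entry_rfa  by_rfa  = { "BAKER", 1 };   /* points straight at data[1] */
    struct sec_entry_pkey by_pkey = { "BAKER", 200 }; /* points via the primary key */

    /* RMS style: one secondary-index traversal, then a direct fetch. */
    struct record *r1 = &data[by_rfa.rfa];

    /* Tandem style: one secondary-index traversal, then a *second* full
     * traversal of the primary index. */
    struct record *r2 = primary_lookup(by_pkey.prim_key);

    printf("via RFA        : %d %s\n", r1->prim_key, r1->name);
    printf("via primary key: %d %s\n", r2->prim_key, r2->name);
    return 0;
}

The point isn't the code, of course: it's that the second path pays for an extra traversal on every secondary-key fetch, which is what the RFA scheme buys back at the cost of maintaining RRVs when records move.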
A possible alternative would be the ability to define a primary index in which new records were automatically assigned monotonically-increasing key values, which could be used in the secondary indexes as Tandem does but would be highly compressible (making the primary-index traversal not that much less efficient than RFA access): this would give very good sequential access by primary key (though in an order that might not be useful in other than whole-file scans), good access by an RFA-like ID (the primary key), and pretty good secondary-key access (though not always as good as it is in RMS today).

The bottom line is that supporting multi-key access requires some trade-offs. RMS's weren't bad. The issue of having large numbers of duplicates for a given key value is fairly orthogonal to secondary indexing per se (the same problem occurs with large numbers of primary-key dupes): Chris Saether and I explored various ways to improve insertion performance back in 1978 (there's no obvious way to avoid a full dupe scan on alternate-key deletions), but AFAIK nothing ever got implemented.

- bill
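P.S. In case the 'highly compressible' remark above isn't obvious, here's an equally artificial sketch of why monotonically-assigned surrogate keys pack so densely in an index bucket: store one base value per bucket plus a small delta per entry instead of the full key each time. Again, the layout is invented for illustration, not anything RMS or Tandem actually does.

#include <stdio.h>

#define NENT 8   /* entries in one (tiny) index bucket */

/* Bytes needed to hold an unsigned delta. */
static int delta_bytes(unsigned long d)
{
    int n = 1;
    while (d > 0xFF) { d >>= 8; n++; }
    return n;
}

int main(void)
{
    unsigned long keys[NENT];
    unsigned long base;
    int i, full = 0, packed = 0;

    /* Surrogate primary keys handed out in arrival order. */
    for (i = 0; i < NENT; i++)
        keys[i] = 5000000UL + i;

    /* "Compressed" bucket: the base once, then one small delta per entry. */
    base = keys[0];
    packed = 8;                       /* the base, stored once            */
    for (i = 0; i < NENT; i++) {
        full   += 8;                  /* a full 8-byte key per entry      */
        packed += delta_bytes(keys[i] - base);
    }

    printf("full keys : %d bytes\n", full);
    printf("base+delta: %d bytes\n", packed);
    return 0;
}

With dense monotone keys the deltas stay tiny, so both the primary index and the secondary-index pointer lists hold several times as many entries per bucket as they would with arbitrary key values - which is what would keep the extra primary-index traversal reasonably cheap.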