From: Bill Todd [billtodd@foo.mv.com]
Sent: Wednesday, August 16, 2000 12:03 PM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: RMS Index Problem

David A Froble wrote in message news:3999C469.EB8D2633@tsoft-inc.com...

...

> You've gotten some other responses, and they are pretty much on target.
> RMS secondary keys pretty much suck, and when you have lots of duplicates,
> they're downright unbearable.

While the latter half of the statement is certainly true for the number of duplicates being discussed here, as long as the number of duplicates is small enough for all the RFA pointers for a given key value to fit in a single bucket (I'd hope that today's RMS is suave enough never to split up such an RFA list when other splitting options exist), performance is just fine (relative to theoretical optimal performance), as it also is with non-duplicated secondary keys.

That reflects a fundamental performance trade-off RMS made from the beginning, compared with other multi-key access methods such as Tandem's, which use primary key values rather than RFAs as pointers in the secondary indexes (thus requiring *2* full index traversals to obtain a record by secondary key). Supporting permanent-RFA-style access costs performance on record insertions (save for initial population in primary-key order) even when no secondary indexes exist, but it makes secondary-key access (when they do exist) - both random and sequential - as fast as it's possible to achieve in an architecture that supports primary-key access at all (which requires that records be able to move in order to remain in primary-key sequence).

Some architectures (e.g., Prime's KSAM, IIRC) avoided the issue by not supporting a primary key at all: records never had to move, RFAs existed without the need for internal RRVs requiring updating on user-data-bucket splits, and secondary indexes could always point directly to the actual record using a physical address. However, in such an architecture there's no way to get good *sequential* performance on *any* key (since each record usually requires a separate disk access). One can work around this in special cases by populating the file in a desired access order and then processing it in that physical order, but as soon as records start getting deleted the holes reduce access efficiency - whereas deletions in a b-tree primary index allow the possibility of 'coalescing' to maintain record groupings and sequential access efficiency (though whether RMS supports this dynamically I don't know).

So, except in cases of large numbers of duplicates for individual key values, RMS secondary indexes perform as efficiently as is possible and better than most competing implementations (leaving aside non-structural issues such as the degree to which write-back caching coupled with transactional logging might be able to improve RMS's general performance without sacrificing integrity). The only question is whether this performance may have been obtained at too great a sacrifice in other areas: maintaining permanent RFAs for records that move around to maintain primary-key order is expensive.
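To make that pointer trade-off concrete, here's a toy sketch in C - nothing resembling actual RMS or Tandem internals; the structures and names are invented, an array stands in for the file, and a linear scan stands in for a full b-tree traversal. It just shows that an RFA-style secondary entry leads straight to the record, while a primary-key-style entry pays for a second index traversal:

/* Toy illustration only -- invented names, not RMS internals. */
#include <stdio.h>

struct record {
    int  prim_key;           /* primary key value   */
    char name[16];           /* secondary key value */
};

/* An "RFA" here is just an index into the array, standing in for the
 * (bucket VBN, record id) pair in a real indexed file. */
typedef int rfa_t;

struct sec_entry_rfa  { char name[16]; rfa_t rfa;      };  /* RMS-style    */
struct sec_entry_pkey { char name[16]; int   prim_key; };  /* Tandem-style */

static struct record data[] = {
    { 100, "ABLE"    },
    { 200, "BAKER"   },
    { 300, "CHARLIE" },
};

/* Stand-in for one full primary-index (b-tree) traversal. */
static struct record *primary_lookup(int key)
{
    int i;
    for (i = 0; i < (int)(sizeof data / sizeof data[0]); i++)
        if (data[i].prim_key == key)
            return &data[i];
    return 0;
}

int main(void)
{
    struct sec_entry_rfa  by_rfa  = { "BAKER", 1 };   /* points straight at data[1] */
    struct sec_entry_pkey by_pkey = { "BAKER", 200 }; /* points via the primary key */

    /* RMS style: one secondary-index traversal, then a direct fetch. */
    struct record *r1 = &data[by_rfa.rfa];

    /* Tandem style: one secondary-index traversal, then a *second* full
     * traversal of the primary index. */
    struct record *r2 = primary_lookup(by_pkey.prim_key);

    printf("via RFA        : %d %s\n", r1->prim_key, r1->name);
    printf("via primary key: %d %s\n", r2->prim_key, r2->name);
    return 0;
}

The point isn't the code, of course: it's that the second path pays for an extra traversal on every secondary-key fetch, which is what the RFA scheme buys back at the cost of maintaining RRVs when records move.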
A possible alternative would be the ability to define a primary index in which new records were automatically assigned monotonically-increasing key values, which could be used in the secondary indexes as Tandem does but would be highly compressible (making the primary-index traversal not that much less efficient than RFA access): this would give very good sequential access by primary key (though in an order that might not be useful in other than whole-file scans), good access by an RFA-like ID (the primary key), and pretty good secondary-key access (though not always as good as it is in RMS today).

The bottom line is that supporting multi-key access requires some trade-offs. RMS's weren't bad. The issue of having large numbers of duplicates for a given key value is fairly orthogonal to secondary indexing per se (the same problem occurs with large numbers of primary-key dupes): Chris Saether and I explored various ways to improve insertion performance back in 1978 (there's no obvious way to avoid a full dupe scan on alternate-key deletions), but AFAIK nothing ever got implemented.

- bill
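P.S. In case the 'highly compressible' remark above isn't obvious, here's an equally artificial sketch of why monotonically-assigned surrogate keys pack so densely in an index bucket: store one base value per bucket plus a small delta per entry instead of the full key each time. Again, the layout is invented for illustration, not anything RMS or Tandem actually does.

#include <stdio.h>

#define NENT 8   /* entries in one (tiny) index bucket */

/* Bytes needed to hold an unsigned delta. */
static int delta_bytes(unsigned long d)
{
    int n = 1;
    while (d > 0xFF) { d >>= 8; n++; }
    return n;
}

int main(void)
{
    unsigned long keys[NENT];
    unsigned long base;
    int i, full = 0, packed = 0;

    /* Surrogate primary keys handed out in arrival order. */
    for (i = 0; i < NENT; i++)
        keys[i] = 5000000UL + i;

    /* "Compressed" bucket: the base once, then one small delta per entry. */
    base = keys[0];
    packed = 8;                       /* the base, stored once            */
    for (i = 0; i < NENT; i++) {
        full   += 8;                  /* a full 8-byte key per entry      */
        packed += delta_bytes(keys[i] - base);
    }

    printf("full keys : %d bytes\n", full);
    printf("base+delta: %d bytes\n", packed);
    return 0;
}

With dense monotone keys the deltas stay tiny, so both the primary index and the secondary-index pointer lists hold several times as many entries per bucket as they would with arbitrary key values - which is what would keep the extra primary-index traversal reasonably cheap.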