From: Robert Young [YOUNG_R@Encompasserve.org]
Sent: Wednesday, April 11, 2001 9:44 AM
To: GlennEverhart@FirstUSA.com
Subject: Re: Solaris tmpfs (was Re: VMS-Related: Affordable)

Glenn,

On Tuesday John AtoZ (his last name is about 12-15 characters long, starts with A and ends with Z, and his actual email address at Compaq is John.AtoZ@compaq.com) gave a talk entitled:

Shadowing Mini Copy using Write Bitmap

It was a good presentation. The presentations are zipped and reside here:

http://encompasserve.org/lugs/esilug/ovmstud/

Rob

No, I don't know anyone by that name either. Go ahead and forward this to him if you like.

If I were still fiddling with this, I would use the shadow update logic I built into one of my other drivers for shadowing, so that startup would not be delayed by the copy operations I mentioned, and I would fix the thing to keep allowing read operations during the periodic updates mentioned. Note too that the local operation needs to be delayed only long enough to record the local journal. The remote update can be done from the local journal to the remote "shadow", so a slow network link really doesn't need to delay anything. My code did not do that because it ran well enough as it was, and it was sort of a quick hack to satisfy the security worry. I would certainly do that, though.

The bitmap limited the thing to the size of an RK05 the way I built it, but it could have been larger. That was partly a memory issue, given the amount of memory on the boxes in question. Also, because the boxes got rebooted every time, I could cheat on the detection with DCL and apply the necessary smoke and mirrors. It would have been better to put more intelligence in the host process to restore orderly operation and not depend on external setup routines. Still, the code is out there for anyone to fiddle with as pleases the fiddler...
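The journal-first write path Glenn describes (delay the local write only until the journal record is stored, and feed the remote shadow from that journal afterwards) can be sketched roughly as below. This is a hypothetical illustration, not code from his drivers: the class name, the in-memory lists standing in for the local copy and the remote shadow, and the explicit `replay()` call (which a real implementation would run asynchronously) are all assumptions.

```python
BLOCK_SIZE = 512  # bytes per disk block (LBN)


class JournaledShadowDisk:
    """Hypothetical sketch, not Glenn's driver code.

    write() completes as soon as the journal record is stored and the
    local copy is updated; replay() later pushes journaled blocks to the
    remote shadow, so a slow link never delays the writer.
    """

    def __init__(self, nblocks):
        zero = bytes(BLOCK_SIZE)
        self.local = [zero] * nblocks   # local (memory) copy of the disk
        self.remote = [zero] * nblocks  # stand-in for the remote shadow
        self.journal = []               # (lbn, data) records not yet replayed

    def write(self, lbn, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes are whole blocks")
        self.journal.append((lbn, data))  # 1. record the local journal entry
        self.local[lbn] = data            # 2. update the local copy; caller proceeds

    def read(self, lbn):
        return self.local[lbn]            # reads never wait on the remote side

    def replay(self):
        """Apply journal records to the remote shadow in write order.

        Returns the number of records applied."""
        applied = 0
        for lbn, data in self.journal:
            self.remote[lbn] = data
            applied += 1
        self.journal.clear()
        return applied
```

Replaying in write order matters: if the same LBN is written twice before a replay, the remote copy ends up with the later data, just as the local copy does.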
Glenn Everhart

-----Original Message-----
From: Robert Young [mailto:YOUNG_R@Encompasserve.org]
Sent: Wednesday, April 11, 2001 8:36 AM
To: GlennEverhart@FirstUSA.com
Subject: Re: Solaris tmpfs (was Re: VMS-Related: Affordable)

Glenn,

Have you shared your thoughts with John AtoZ? I have found him to be very approachable. I have exchanged email with him on occasion and spoke with him down at College Park a few weeks back. Not too much sucking up on my part, but I did mention that I thought the in-memory write bitmap was a neat idea. Also, up to 6 bitmaps for a device can be active at one time, and a brief conversation helped to firm up the disaster-tolerant aspects of the bitmap. This seems to be a follow-on to your work, or very similar to it.

Rob

One of the clients for my zrdriver is in effect a memory disk. However, the memory is process memory, a big array in the host process, which means it gets paged, and the amount of physical memory actually used depends on the working set that process gets. The code is not the very fastest available, but it works, it is free, and it isn't too bad; you just have to get it set up. It was convenient to test zrdriver (aka fddriver) that way, since initially I didn't want to risk anything disk resident while debugging the driver. A process-based array, however, can be pretty well as big as you want.

I have also written versions that would shadow the memory array to a disk file: they wrote to both and read from the memory array. That worked well. So did a variant that updated a shadow file every 15 minutes or so, keeping an internal bitmap of altered blocks so it could update the shadow copy over DECnet by writing only the altered blocks. It did this in two phases: first, write the altered blocks and their LBNs locally; second, update the remote shadow and then delete the local copy. That way, even if something crashed, you could get a valid snapshot.
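The bitmap-driven periodic update, including the crash-recovery rule that an unfinished local journal can simply be replayed against the remote copy, might look something like the sketch below. It is a hypothetical illustration under stated assumptions: the class and method names are invented, the local journal is a Python list rather than a file on disk, and the remote shadow is a list rather than a DECnet target.

```python
BLOCK_SIZE = 512  # bytes per disk block (LBN)


class BitmapShadow:
    """Hypothetical sketch of a bitmap-driven periodic shadow update.

    Every write sets one bit for its LBN. sync() runs in two phases:
      1. copy each dirty block, with its LBN, into a local journal;
      2. apply the journal to the remote copy, then delete the journal.
    A block rewritten many times between syncs is shipped only once.
    """

    def __init__(self, nblocks):
        zero = bytes(BLOCK_SIZE)
        self.nblocks = nblocks
        self.memory = [zero] * nblocks              # the in-memory disk
        self.remote = [zero] * nblocks              # stand-in for the remote shadow
        self.dirty = bytearray((nblocks + 7) // 8)  # one bit per LBN
        self.journal = None                         # None = no journal left behind

    def write(self, lbn, data):
        self.memory[lbn] = data
        self.dirty[lbn >> 3] |= 1 << (lbn & 7)      # mark the LBN as altered

    def dirty_lbns(self):
        return [lbn for lbn in range(self.nblocks)
                if self.dirty[lbn >> 3] & (1 << (lbn & 7))]

    def journal_dirty_blocks(self):
        """Phase 1: save (LBN, data) for every altered block, then clear the bitmap."""
        self.journal = [(lbn, self.memory[lbn]) for lbn in self.dirty_lbns()]
        self.dirty = bytearray(len(self.dirty))

    def apply_journal(self):
        """Phase 2, and also crash recovery: if a journal exists, replaying it
        makes the remote copy a valid snapshot. Returns blocks written."""
        if self.journal is None:
            return 0
        for lbn, data in self.journal:
            self.remote[lbn] = data
        count = len(self.journal)
        self.journal = None                         # delete the local journal last
        return count

    def sync(self):
        self.journal_dirty_blocks()
        return self.apply_journal()
```

Because the journal is deleted only after the remote copy has been updated, a crash at any point leaves either an intact remote snapshot with no journal, or a journal whose replay produces one. A bitmap of this kind is also the general idea behind the write-bitmap mini copy Rob mentions, though this sketch makes no claim about how that product implements it.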
Either there was a valid snapshot at the remote copy, or you could create one by taking the local journal of modified blocks and applying it to the remote copy. You just had to notice, after boot, whether the local copy had been properly closed. I was pleased that any LBN modified multiple times during a 15-20 minute period would, of course, be updated remotely only once.

The reason for this was that we had some VAXstations that were used for classified work. Management wanted faster updates, but the machines had to be serially reusable by others. Keeping the disk file in virtual storage backed by the pagefile ensured, well enough for the users, that a new user could not readily scavenge the disk left by a previous one. At login a user would get a copy of the remote shadow disk, which the host process copied into the memory area, so both started out the same. Every so often the system would pause for the journal update, but most of the time disk updates went to local storage, so DECnet (remember, 10 Mbit/s Ethernet) didn't get totally bogged down.

The code still exists, is on the sigtapes, is free and in source form, and can of course be adapted if anyone cares to.

-----Original Message-----
From: young_r@encompasserve.org [mailto:young_r@encompasserve.org]
Sent: Tuesday, April 10, 2001 11:24 AM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: Solaris tmpfs (was Re: VMS-Related: Affordable)

In article <3AD30BCD.214B91D3@uk.sun.com>, andrew harrison writes:
> Hoff Hoffman wrote:
>>
>> In article <3AD1F484.9C8E4598@uk.sun.com>, andrew harrison writes:
>>
>> :Sort of. Solaris has an in-memory filesystem called tmpfs.
>> :
>> :/tmp is normally mounted as a tmpfs volume but some people
>> :also use it for things like Sybase TMPDB. It never gets
>> :flushed to disk unless you run out of memory and it
>> :gets paged.
>>
>> A good idea, but certainly not the implementation that I would have
>> chosen. I'd have gone the route of a host-based RAM disk with pageable
>> memory, and the same file system everything else uses.
>> (Why? A disk device is a whole lot less involved than a file system.
>> I'd prefer to avoid maintaining yet another file system.)
>>
>
> You can also do that. There are vendors who provide
> SBus or PCI NVRAM RAM disks that then have a UFS-type
> filesystem on them. One I have seen also has a
> disk attached that is used to flush the data
> to disk if there is a power failure. The disk is
> obviously battery backed as well.
>
> But people tend to use these devices where they
> need very high performance persistent storage.
>

Going forward (12 months?), this is not a problem. The RamDisk will be one member of a shadowset. Reads go to the RamDisk; writes go to both. The disk-based member of course has write-back caching turned on, so it catches the writes and flushes them without a problem, as there are no reads to contend with. John AtoZ has shadowing RamDisks as a bullet on his slides.

>
> tmpfs is used where people need very high performance
> but non-persistent storage.
>

You do the best you can with what you have to work with.

Rob
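The read/write split Rob describes for a RamDisk shadowset member (and Glenn's earlier memory-array-plus-disk-file variant) comes down to "reads from RAM, writes to both". A minimal sketch follows, assuming a two-member set where the disk member is modeled as a plain Python list; the class name is hypothetical, and the write-back cache on the disk member is not modeled at all.

```python
BLOCK_SIZE = 512  # bytes per disk block (LBN)


class RamShadowSet:
    """Hypothetical two-member shadow set: a RAM disk plus a disk member.

    Reads are satisfied from the RAM member only; writes go to both
    members, so the disk member always holds a complete, current copy.
    """

    def __init__(self, disk_member):
        self.disk = disk_member          # any mutable sequence of blocks
        self.ram = list(disk_member)     # start out identical to the disk copy

    def read(self, lbn):
        return self.ram[lbn]             # the disk member never sees reads

    def write(self, lbn, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes are whole blocks")
        self.ram[lbn] = data
        self.disk[lbn] = data            # write-through to the persistent member
```

Since the disk member only ever receives writes, nothing competes with its cache flushing, which is the point Rob makes about write-back caching on that member.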