Everhart, Glenn

From: Steve Spires [sspires1@ford.com]
Sent: Friday, July 10, 1998 7:09 AM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: NT Meltdowns/Disasters; Documented Examples needed (Long...)

Not quite what you want, but interesting nonetheless:

The problems with NT faced by NatWest are no surprise to those in the industry. Danny Bradbury explains.

If the mainframe were an animal, it would be a tortoise: strong, reliable, plods along for ages and never falls over. Some users argue for the client/server hare - a smaller, faster animal that seems to be everywhere at once. But what happens when it finds itself caught in the headlamps of pressure? It freezes.

Gartner analyst David Norris presented a report at the UK Computer Measurement Group earlier this year highlighting statistics on the cost of mainframe millions of instructions per second (mips) and on the performance and availability of single-server symmetric multiprocessor NT installations. Norris argued that the average selling price per mips for mainframes would decline by between 25% and 35% a year for the next two or three years, pushed down by competition between mainframe suppliers and by increased pressure on the mainframe market from the Unix vendors. This is not that surprising, as mainframe hardware costs are always shrinking.

The part of the report that will strike fear into the hearts of some NT users, however, is the downtime analysis. According to Norris, at the end of last year NT availability ran at 99%, compared with 99.6% for Unix and 99.975% for the mainframe. By 1999, NT would reach 99.6%, Unix 99.8% and the mainframe 99.99%. By 2002, NT would reach 99.9% availability, putting it level with Unix but still far behind the mainframe, which would have attained 99.9999%.

This may not seem like much of a difference, but at nearly a whole percentage point behind the mainframe, NT's availability record translates, according to Gartner, into unattractive real-world downtime figures: NT downtime equates to roughly 1.68 hours per week, against the mainframe's 0.042 hours. In other words, NT users can expect their systems to be unavailable for about 101 minutes every week, compared with the mainframe's humble 2.5 minutes.
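The conversion from an availability percentage to weekly downtime is simple arithmetic over a 168-hour week. A minimal sketch of the calculation in Python (the helper name is mine; the percentages are Gartner's end-of-1997 figures from above):

    # Convert an availability percentage into expected weekly downtime.
    # A week is 168 hours; the unavailable fraction is (100 - pct) / 100.
    def weekly_downtime_minutes(availability_pct):
        return (100.0 - availability_pct) / 100.0 * 168.0 * 60.0

    for name, pct in [("NT", 99.0), ("Unix", 99.6), ("mainframe", 99.975)]:
        print("%-9s %7.3f%% -> %5.1f min/week"
              % (name, pct, weekly_downtime_minutes(pct)))

    # Prints roughly 100.8 min/week for NT (the "101 minutes" above),
    # 40.3 for Unix and 2.5 for the mainframe.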
"Maybe air traffic control systems - things that require high volume and a real time of one- or two-second response supported by a single application. NT might have this capability eventually but I wouldn't suggest that it has at the moment," Scutchings says. ICL has recently put together a demonstration system following a strategic tie-up with Microsoft, stringing eight servers together at the back end to handle 50,000 concurrent users on an Exchange system. Software marketing manager Val Newman is proud of the system, but admits that the installation is asynchronous - the system could hardly be called a mission-critical, transactional system. ICL will work on a transactional system using Microsoft's Terminal Server, but this challenge has not yet been met. Andrew Gutteridge, UK marketing director at middleware supplier Neon, agrees that NT's availability leaves something to be desired. In particular, he says NT is not suitable for large-scale mission-critical systems. "These are systems that deal with the processing of orders, for example. Applications where you have to take the order from the customer, process it and fulfil that order are the most mission critical," he says. Johann Edward, technology marketing manager at Data General, says NT is suitable for most people's availability needs. He adds that availability issues can be addressed by including enhancements to the server, and cites parity checking on the memory and PCI buses. Data General's servers spot soft errors and dial a support database to fix the problem automatically. He says the company's service manager is happy to contract to 99.95% availability for a clustered system using its Clariion technology. Nevertheless, from an availability perspective, the mainframe beats NT because it has a range of features that were built in from the start 30 years ago, explains US-based John Phelps, research director with Gartner's Enterprise Systems and Central Operations. Taking IBM's System 390 operating system as an example, Phelps cites functional recovery routines for each major component of the operating system. Problems are isolated to their smallest impact, so that when a module has a problem, it passes control to a routine written to analyse and solve it. If that routine cannot handle the problem, it passes it along the chain to the functional recovery module above it. On the hardware side, many mainframe servers have gone beyond parity checking into error correction, handling multibit rather than single bit errors, says Phelps. Part of the problem is that all these reliability enhancements carry a certain amount of operating system overhead, he says, adding that newer systems that do not have them built in will be slower if they are included. Meanwhile, systems such as NT are constantly enhanced with new functions, which can make it hard to stabilise the system to the point where downtime can be decreased, he says. "I see Unix moving into the enterprise arena and displacing some mainframe activity, and NT moving into the low-end Unix arena. NT is squeezing Unix at the bottom-end and mainframe is squeezing Unix at top-end," contends Phelps. He adds that NT's work sharing capabilities are restrictive. The system's clustering technology allows for two nodes, joined together in a fail-over configuration, but there is no load sharing clustering capability yet. 
On the hardware side, many mainframe servers have gone beyond parity checking into error correction, handling multibit rather than single-bit errors, says Phelps. Part of the problem is that all these reliability enhancements carry a certain amount of operating system overhead, he says, adding that newer systems which do not have them built in will be slower if they are added. Meanwhile, systems such as NT are constantly enhanced with new functions, which can make it hard to stabilise the system to the point where downtime can be decreased.

"I see Unix moving into the enterprise arena and displacing some mainframe activity, and NT moving into the low-end Unix arena. NT is squeezing Unix at the bottom end and the mainframe is squeezing Unix at the top end," contends Phelps. He adds that NT's work-sharing capabilities are restrictive: the system's clustering technology allows two nodes joined in a fail-over configuration, but there is no load-sharing clustering capability yet.

Symmetric multiprocessing systems only work up to a point because of bus overload issues, while Numa also has problems because it suits only certain types of application, says Unisys marketing manager Ian Benn. He touts the company's Cellular Multiprocessing technology as the answer: all communications are handled in memory and there is no bus.

Some things are best left to the mainframe, admits Benn. He mentions a Web site designed around NT to provide race results for a sponsored Nascar racing team in the US. "It looked tremendous and worked like a dream for the first 20 minutes," he says. "We were involved in a project to rewrite it to make it stay up. We moved all the transactions back onto the mainframe and just used the NT system to handle the messaging, drawing the data off the mainframe and putting it into hypertext markup language."

Benn also has reservations about NT's failover clustering, arguing that it has to be used with care. One of the biggest problems is that NT takes roughly 30 seconds to fail over from one server to the other, so anyone running a transactional system has to be sure that transactions lost in that window can be rebuilt.

Other issues surrounding the availability and scalability of NT systems include systems management and architecture, according to Adam Jollans, IBM's European marketing manager for NT software. Systems management in a distributed system is inherently more difficult than in a centralised host-based architecture, he believes. Jollans cites two levels of systems management on NT. The first is local area network management, with which he equates Intel's Landesk and Microsoft's Systems Management Server. At the enterprise level, he cites tools such as Computer Associates' Unicenter and Tivoli's TME and IT Director products. "You can't just take mainframe systems management strategies and apply them to NT, because you have moved from centralised to distributed systems," he says. "That needs a different kind of systems management tool. You need one that is built around a distributed environment."

Many companies with large, mission-critical applications may still prefer to view NT as a middle-tier rather than a top-tier solution. Pulling some of your application logic into an NT-based middle tier gives you easier, faster application development using object-oriented technology and the ability to introduce user-friendly graphical user interface technology. NT's efficiency as a Web server makes it an ideal platform for providing access to highly scalable mainframe transactional systems, interfacing with them on the end-user's behalf.
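The arrangement Benn describes for the Nascar site - transactions on the mainframe, NT formatting the results as HTML - is easy to picture as a thin Web tier in front of a back-end query. A minimal sketch in Python, with the mainframe reduced to a stub (fetch_race_results and the sample data are invented for illustration):

    # Thin front tier: serve HTML, leave the transactions to the back end.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def fetch_race_results():
        # Stand-in for a query against the back-end transactional system;
        # in Benn's example this data stayed on the mainframe.
        return [("Car 24", "1st"), ("Car 3", "2nd")]

    class ResultsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # The front tier only formats data; it owns no transactions.
            rows = "".join("<tr><td>%s</td><td>%s</td></tr>" % row
                           for row in fetch_race_results())
            body = ("<html><body><table>%s</table></body></html>" % rows).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8000), ResultsHandler).serve_forever()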
If you want to implement a large-scale transactional system using NT, then bringing big-systems skills to bear will go some way towards solving the operating system's availability issues, says Benn. Approaching NT as a system that still needs to be set up and managed properly will stand IT departments in good stead, but many firms may be in danger of approaching it as a no-brainer PC system that does a lot of work for the end-user.

The lack of attention to good systems design across the board is illustrated by Find/SVP's 1997 survey, which covered all companies, not just those running NT installations. The statistics indicate that unplanned IT downtime is often long: 20% of respondents reported downtime of between five hours and a day, 23% reported outages of one to two days, and 8% admitted to longer downtimes, which could drastically affect a business.

Unless users base their systems on a robust design, they will experience problems in the future. While NT's availability may fall far short of mainframe levels, says Gartner, a little forethought about system workload and future expansion will help end-users make the best of it.

From Steve Spires

Larry D Bohan, Jr wrote:
>
> Title says it all;
>
> Could *really* use some good examples of sites
> where NT caused serious problems for the business.
>
> Especially pointers to URLs and trade press articles,
> but even anecdotes, rumors, war stories, and pointers
> to Usenet ramblings would help; possibly they could be tracked
> down to something authoritative. Stories of so-and-so
> site tried this on NT, but had to give up and reimplement on
> <..insert name of your favorite ...> OS.
>
> I expect this sort of thing to be hard to find, as most
> companies and people who survived IS disasters
> (or got canned...) tend to be real closed-mouthed about it,
> if not actually contractually forbidden to say anything.
> i.e., it's not a thing to be shouted from the roof-tops.
>
> It'd be a fine thing if there were a Web site somewhere
> dedicated to this topic, presumably outside the
> reach of Billy Gates' many lawyers...
>
> lbohan (at) DBC dot Com, voice/fax: 719.488.1652, in Colorado Springs
> Larry D Bohan, c/o Data Broadcasting Corp
> 1900 S Norfolk, Ste #150, San Mateo CA 94403-1151