Everhart, Glenn

From: Steve Spires [sspires1@ford.com]
Sent: Friday, July 10, 1998 7:09 AM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: NT Meltdowns/Disasters; Documented Examples needed (Long...)

Not quite what you want, but interesting nonetheless:

The problems with NT faced by NatWest are no surprise to those in the industry. Danny Bradbury explains.

If the mainframe were an animal, it would be a tortoise: strong, reliable, plods along for ages and never falls over. Some users argue for the client/server hare - a smaller, faster animal that seems to be everywhere at once. But what happens when it finds itself caught in the headlamps of pressure? It freezes.

Gartner analyst David Norris presented a report at the UK Computer Measurement Group earlier this year highlighting statistics on the cost of mainframe millions of instructions per second (mips) and on the performance and availability of single-server symmetric multiprocessor NT installations. Norris argued that the average selling price per mips for mainframes would decline by between 25% and 35% a year for the next two or three years, pushed down by competition between mainframe suppliers and by increased pressure on the mainframe market from the Unix vendors. This is not that surprising, as mainframe hardware costs are always shrinking.

The part of the report that will strike fear into the hearts of some NT users, however, is the downtime analysis. According to Norris, at the end of last year NT availability ran at 99%, compared with 99.6% for Unix and 99.975% for the mainframe. By 1999, NT would reach 99.6%, Unix 99.8% and the mainframe 99.99%. By 2002, NT would reach 99.9% availability, putting it level with Unix but still far behind the mainframe, which would have attained 99.9999%.

This may not seem like much of a difference, but at nearly a whole percentage point behind the mainframe, NT's availability record translates, according to Gartner, into unattractive real-world downtime figures: NT downtime equates to roughly 1.68 hours per week, against the mainframe's 0.042 hours. In other words, NT users can expect their systems to be unavailable for about 101 minutes every week, compared with the mainframe's humble 2.5 minutes.
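The conversion from an availability percentage to weekly downtime is simple arithmetic over a 168-hour week. A minimal sketch of the calculation in Python (the helper name is mine; the percentages are Gartner's end-of-1997 figures from above):

    # Convert an availability percentage into expected weekly downtime.
    # A week is 168 hours; the unavailable fraction is (100 - pct) / 100.
    def weekly_downtime_minutes(availability_pct):
        return (100.0 - availability_pct) / 100.0 * 168.0 * 60.0

    for name, pct in [("NT", 99.0), ("Unix", 99.6), ("mainframe", 99.975)]:
        print("%-9s %7.3f%% -> %5.1f min/week"
              % (name, pct, weekly_downtime_minutes(pct)))

    # Prints roughly 100.8 min/week for NT (the "101 minutes" above),
    # 40.3 for Unix and 2.5 for the mainframe.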
"Maybe air traffic control systems - things that require high volume and a real time of one- or two-second response supported by a single application. NT might have this capability eventually but I wouldn't suggest that it has at the moment," Scutchings says. ICL has recently put together a demonstration system following a strategic tie-up with Microsoft, stringing eight servers together at the back end to handle 50,000 concurrent users on an Exchange system. Software marketing manager Val Newman is proud of the system, but admits that the installation is asynchronous - the system could hardly be called a mission-critical, transactional system. ICL will work on a transactional system using Microsoft's Terminal Server, but this challenge has not yet been met. Andrew Gutteridge, UK marketing director at middleware supplier Neon, agrees that NT's availability leaves something to be desired. In particular, he says NT is not suitable for large-scale mission-critical systems. "These are systems that deal with the processing of orders, for example. Applications where you have to take the order from the customer, process it and fulfil that order are the most mission critical," he says. Johann Edward, technology marketing manager at Data General, says NT is suitable for most people's availability needs. He adds that availability issues can be addressed by including enhancements to the server, and cites parity checking on the memory and PCI buses. Data General's servers spot soft errors and dial a support database to fix the problem automatically. He says the company's service manager is happy to contract to 99.95% availability for a clustered system using its Clariion technology. Nevertheless, from an availability perspective, the mainframe beats NT because it has a range of features that were built in from the start 30 years ago, explains US-based John Phelps, research director with Gartner's Enterprise Systems and Central Operations. Taking IBM's System 390 operating system as an example, Phelps cites functional recovery routines for each major component of the operating system. Problems are isolated to their smallest impact, so that when a module has a problem, it passes control to a routine written to analyse and solve it. If that routine cannot handle the problem, it passes it along the chain to the functional recovery module above it. On the hardware side, many mainframe servers have gone beyond parity checking into error correction, handling multibit rather than single bit errors, says Phelps. Part of the problem is that all these reliability enhancements carry a certain amount of operating system overhead, he says, adding that newer systems that do not have them built in will be slower if they are included. Meanwhile, systems such as NT are constantly enhanced with new functions, which can make it hard to stabilise the system to the point where downtime can be decreased, he says. "I see Unix moving into the enterprise arena and displacing some mainframe activity, and NT moving into the low-end Unix arena. NT is squeezing Unix at the bottom-end and mainframe is squeezing Unix at top-end," contends Phelps. He adds that NT's work sharing capabilities are restrictive. The system's clustering technology allows for two nodes, joined together in a fail-over configuration, but there is no load sharing clustering capability yet. 
On the hardware side, many mainframe servers have gone beyond parity checking into error correction, handling multibit rather than single-bit errors, says Phelps. Part of the problem is that all these reliability enhancements carry a certain amount of operating system overhead, he says, adding that newer systems which do not have them built in will be slower if they are added. Meanwhile, systems such as NT are constantly enhanced with new functions, which can make it hard to stabilise the system to the point where downtime can be decreased.

"I see Unix moving into the enterprise arena and displacing some mainframe activity, and NT moving into the low-end Unix arena. NT is squeezing Unix at the bottom end and the mainframe is squeezing Unix at the top end," contends Phelps. He adds that NT's work-sharing capabilities are restrictive: the system's clustering technology allows two nodes joined in a fail-over configuration, but there is no load-sharing clustering capability yet.

Symmetric multiprocessing systems only work up to a point because of bus overload issues, while Numa also has problems because it suits only certain types of application, says Unisys marketing manager Ian Benn. He touts the company's Cellular Multiprocessing technology as the answer: all communications are handled in memory and there is no bus.

Some things are best left to the mainframe, admits Benn. He mentions a Web site designed around NT to provide race results for a sponsored Nascar racing team in the US. "It looked tremendous and worked like a dream for the first 20 minutes," he says. "We were involved in a project to rewrite it to make it stay up. We moved all the transactions back onto the mainframe and just used the NT system to handle the messaging, drawing the data off the mainframe and putting it into hypertext markup language."

Benn also has reservations about NT's failover clustering, arguing that it has to be used with care. One of the biggest problems is that NT takes roughly 30 seconds to fail over from one server to the other, so anyone running a transactional system has to be sure that transactions lost in that window can be rebuilt.

Other issues surrounding the availability and scalability of NT systems include systems management and architecture, according to Adam Jollans, IBM's European marketing manager for NT software. Systems management in a distributed system is inherently more difficult than in a centralised host-based architecture, he believes. Jollans cites two levels of systems management on NT. The first is local area network management, with which he equates Intel's Landesk and Microsoft's Systems Management Server. At the enterprise level, he cites tools such as Computer Associates' Unicenter and Tivoli's TME and IT Director products. "You can't just take mainframe systems management strategies and apply them to NT, because you have moved from centralised to distributed systems," he says. "That needs a different kind of systems management tool. You need one that is built around a distributed environment."

Many companies with large, mission-critical applications may still prefer to view NT as a middle-tier rather than a top-tier solution. Pulling some of your application logic into an NT-based middle tier gives you easier, faster application development using object-oriented technology and the ability to introduce user-friendly graphical user interface technology. NT's efficiency as a Web server makes it an ideal platform for providing access to highly scalable mainframe transactional systems, interfacing with them on the end-user's behalf.
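The arrangement Benn describes for the Nascar site - transactions on the mainframe, NT formatting the results as HTML - is easy to picture as a thin Web tier in front of a back-end query. A minimal sketch in Python, with the mainframe reduced to a stub (fetch_race_results and the sample data are invented for illustration):

    # Thin front tier: serve HTML, leave the transactions to the back end.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def fetch_race_results():
        # Stand-in for a query against the back-end transactional system;
        # in Benn's example this data stayed on the mainframe.
        return [("Car 24", "1st"), ("Car 3", "2nd")]

    class ResultsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # The front tier only formats data; it owns no transactions.
            rows = "".join("<tr><td>%s</td><td>%s</td></tr>" % row
                           for row in fetch_race_results())
            body = ("<html><body><table>%s</table></body></html>" % rows).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8000), ResultsHandler).serve_forever()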
If you want to implement a large-scale transactional system using NT, then bringing big-systems skills to bear will go some way towards solving the operating system's availability issues, says Benn. Approaching NT as a system that still needs to be set up and managed properly will stand IT departments in good stead, but many firms may be in danger of approaching it as a no-brainer PC system that does a lot of work for the end-user.

The lack of attention to good systems design across the board is illustrated by Find/SVP's 1997 survey, which covered all companies, not just those running NT installations. The statistics indicate that unplanned IT downtime is often long: 20% of respondents reported downtime of between five hours and a day, 23% reported outages of one to two days, and 8% admitted to longer downtimes, which could drastically affect a business.

Unless users base their systems on a robust design, they will experience problems in the future. While NT's availability may fall far short of mainframe levels, says Gartner, a little forethought about system workload and future expansion will help end-users make the best of it.

From Steve Spires

Larry D Bohan, Jr wrote:
>
> Title says it all;
>
> Could *really* use some good examples of sites
> where NT caused serious problems for the business.
>
> Especially pointers to URLs and trade press articles,
> but even anecdotes, rumors, war stories, and pointers
> to Usenet ramblings would help; possibly they could be tracked
> down to something authoritative. Stories of so-and-so
> site tried this on NT, but had to give up and reimplement on
> <..insert name of your favorite ...> OS.
>
> I expect this sort of thing to be hard to find, as most
> companies and people who survived IS disasters
> (or got canned...) tend to be real closed-mouthed about it,
> if not actually contractually forbidden to say anything.
> i.e., it's not a thing to be shouted from the roof-tops.
>
> It'd be a fine thing if there were a Web site somewhere
> dedicated to this topic, presumably outside the
> reach of Billy Gates' many lawyers...
>
> lbohan (at) DBC dot Com, voice/fax: 719.488.1652, in Colorado Springs
> Larry D Bohan, c/o Data Broadcasting Corp
> 1900 S Norfolk, Ste #150, San Mateo CA 94403-1151