While disk problems are a big culprit in storage subsystem failures, enterprises might want to begin eying physical interconnects, since they're just as often to blame.
That's according to researchers at the University of Illinois at Urbana-Champaign and Network Appliance. The team -- Weihang Jiang, Chongfeng Hu and Yuanyuan Zhou of the university, and NetApp's Arkady Kanevsky -- concluded in a recent study that disks were responsible for 20 to 55 percent of storage subsystem failures.
But they also found that physical interconnects including shelf enclosures could claim even higher failure rates: 27 to 68 percent.
"Disks are not the only component in storage systems," wrote the study's authors. "To connect and access disks, modern storage systems also contain many other components, including shelf enclosures, cables and host adapters, and complex software protocol stacks ... Failures in these components can lead to downtime and/or data loss of the storage system."
"Hence, in complex storage systems, component failures are very common and critical to storage system reliability," they said.
Their findings, available in PDF format, are slated to be presented at this week's 6th USENIX Conference on File and Storage Technologies (FAST).
The study's authors analyzed almost five years' worth of storage logs from 39,000 systems deployed at NetApp customer sites. Those systems include approximately 1.8 million disks, across 155,000 high-end, mid-range, low-end and backup shelf enclosures.
In addition to new statistics on the role of physical interconnects in failures, the researchers also found that protocol stacks were responsible for 5 to 10 percent of failures.
Fortunately for IT admins, the report also suggested some ways to help beat the odds.
For instance, storage subsystems tied together with redundant interconnects experienced 30 to 40 percent lower failure rates than those with a single interconnect, it said.
Additionally, spanning disks of a RAID group across multiple shelves in a system makes for a "more resilient" approach than using a single shelf, the study stated.
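To make the shelf-spanning point concrete, here is a minimal sketch of the reasoning. It is an illustration, not the study's method: the round-robin placement policy, the function names, the 8-disk group and the two-parity-disk assumption are all hypothetical. It simply counts how many disks of one RAID group become unreachable if a single shelf enclosure fails.

```python
from collections import Counter


def assign_round_robin(num_disks, num_shelves):
    """Place disk i of a RAID group on shelf i % num_shelves (hypothetical policy)."""
    return [i % num_shelves for i in range(num_disks)]


def worst_case_loss(shelf_of_disk):
    """Largest number of the group's disks that one shelf failure makes unreachable."""
    return max(Counter(shelf_of_disk).values())


def survives_shelf_failure(shelf_of_disk, parity_disks):
    """True if the group tolerates losing any single shelf, given its parity count."""
    return worst_case_loss(shelf_of_disk) <= parity_disks


# Assumed example: an 8-disk group that can tolerate two missing disks.
single_shelf = assign_round_robin(8, 1)  # all 8 disks behind one enclosure
spanned = assign_round_robin(8, 4)       # 2 disks on each of 4 shelves

print(worst_case_loss(single_shelf), survives_shelf_failure(single_shelf, 2))
print(worst_case_loss(spanned), survives_shelf_failure(spanned, 2))
```

Under these assumptions, the single-shelf layout loses all eight disks at once, while the spanned layout loses only two and stays within what the group's parity can absorb.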
Other design considerations could play a role in further reducing problems.
"Storage system designers should also think about using smaller shelves, with fewer disks per shelf, but with more shelves in the system," the report said.
The research takes a somewhat wider view of the storage problems plaguing enterprise datacenters than a good deal of recent, high-profile research on storage failures, which has focused primarily on disk problems.
For instance, last year at FAST '07, Google presented its own study on failure rates (available in PDF format) based on experiences with 100,000 of its own PATA and SATA disk drives.
The Google study found that drives one year old or less had an annual failure rate of 6 percent and were at risk from colder temperatures, while high temperatures could lead to excessive failures in older drives.
That study also focused on the drives' Self-Monitoring, Analysis, and Reporting Technology (SMART) and concluded that the feature -- found in most drives used today -- may not be up to snuff in accurately predicting disk failure. The Google research found that in 36 percent of failed drives, SMART did not flag any problems.
The authors of this year's joint Illinois-NetApp study warned that focusing on drive-related problems can encourage enterprises to undertake unnecessary disk replacements to combat crashes, when failures can just as often be caused by other factors.
Similarly, the study also noted that low disk failure rates do not necessarily translate to a more reliable system.