How To Calculate A Rackmount Computer Failure Rate
by Michael Bowling, on Jan 3, 2018 10:51:38 AM
When you are sourcing a rackmount computer supplier for your embedded computing components, the failure rate of those products is an important consideration, especially when quality, longevity and the assurance that the components were designed, built, integrated and supported in the USA matter to you.
Sure, a supplier may tout a low Mean Time to Repair (MTTR) or a high Mean Time Between Failures (MTBF), but those figures by themselves paint an incomplete picture of the quality of the components being deployed into mission-critical applications.
Ideally, a rugged computer system would never fail or need repair, but entropy dictates otherwise and truly mission-critical systems can’t suffer even a near-instantaneous MTTR. What you need to know to paint a complete picture is the failure rate.
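For reference, an MTBF figure can be converted into an approximate annual failure rate, but only under assumptions a supplier may not state: a constant failure rate (exponentially distributed lifetimes) and continuous 24/7 operation. The sketch below is illustrative only. The function names, the 100,000-hour MTBF and the 4-hour MTTR are made-up example values, not figures from any supplier.

```python
import math

HOURS_PER_YEAR = 8760  # assumption: the system runs 24/7


def annual_failure_rate(mtbf_hours, hours_per_year=HOURS_PER_YEAR):
    """Probability that a unit fails within one year, assuming a constant failure rate."""
    return 1.0 - math.exp(-hours_per_year / mtbf_hours)


def steady_state_availability(mtbf_hours, mttr_hours):
    """Long-run fraction of time a repairable unit is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical example values, chosen only for illustration.
afr = annual_failure_rate(100_000)
availability = steady_state_availability(100_000, 4)

print(f"Annual failure rate: {afr:.1%}")
print(f"Availability: {availability:.4%}")
```

Under these assumptions, even a six-figure MTBF works out to roughly 8 percent of units failing per year, which is why an MTBF quoted on its own says little about how many systems you should expect to fail.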
“The median brand-failure rate by the third year of ownership is 18 percent.” – Consumer Reports
Just from this one summary sentence from Consumer Reports you can see that getting to an actual failure rate is not as simple as it may sound. You are probably saying to yourself, “If I buy 100 computers how many of them should I expect to fail?” You may also be saying to yourself, “18%! That’s very high!”
You will also see computer manufacturers advertise a failure rate that feels really low compared to your personal experience. Maybe you’ve been buying from a supplier and you inherently know that there is no way they have a 1.99% failure rate. There are many ways rackmount computer manufacturers can manipulate the numbers in their favor (“numbers don’t lie, but liars use numbers”).
Here are just a few things to look out for:
- The sum is greater than the whole
“Our motherboards have a 1.99% failure rate.” This may be true, but dig a little deeper. The power supplies have a 1.45% failure rate, the memory/RAM has a 0.81% failure rate, the hard drives have a 1.53% failure rate, etc. Add up every component in the system and you eventually arrive at something like that 18% median failure rate.
- Exclude anything that is potentially not their fault
If you return a system because a component (e.g. a capacitor or hard drive) failed…is that the component supplier’s fault or your computer supplier’s fault? You probably don’t care whose fault it is…you would count it as a failure. Believe it or not, many suppliers exclude component failures like these from their statistics because they consider them out of their control.
- Highlight one product
“Our shiny, new product has only had 1 failure.” If only 100 units have shipped so far, that is an impressive-sounding 1% failure rate. Given time and more shipments, that 1% failure rate will likely increase.
- Rolling averages
Failures can take time to show up, and computer suppliers know that. They may count only the failures that have been reported in the last month. Maybe they shipped a large quantity of computers for Christmas and those systems have not had time to fail and be returned yet.
- Customers give up and just don’t take the time to report
By making tech support as slow and painful as possible, a supplier ensures that customers don’t want to waste their time reporting every failure. I recently bought the laptop that I’m typing on right now from a major computer manufacturer, and right out of the box there were several small problems that I don’t want to waste my time reporting because I know the supplier will likely not be very responsive.
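Two of the items above can be checked with quick arithmetic. The sketch below first combines the four component rates quoted in the first item, assuming failures are independent and that any single failed component counts as a system failure (both are simplifying assumptions). It then puts a 95% Wilson score interval around the 1-failure-in-100-units example to show how little a small sample pins down. The function names are illustrative.

```python
import math


def system_failure_rate(component_rates):
    """Chance that at least one component fails, assuming independent failures."""
    survive = 1.0
    for rate in component_rates:
        survive *= 1.0 - rate
    return 1.0 - survive


def wilson_interval(failures, units, z=1.96):
    """Wilson score interval for a failure proportion (z=1.96 is roughly 95%)."""
    p = failures / units
    denom = 1.0 + z * z / units
    center = (p + z * z / (2 * units)) / denom
    half = z * math.sqrt(p * (1 - p) / units + z * z / (4 * units * units)) / denom
    return center - half, center + half


# Rates quoted above: motherboard, power supply, memory/RAM, hard drive.
quoted = [0.0199, 0.0145, 0.0081, 0.0153]
combined = system_failure_rate(quoted)
low, high = wilson_interval(failures=1, units=100)

print(f"Combined rate for four components: {combined:.2%}")
print(f"95% interval for 1 failure in 100 units: {low:.2%} to {high:.2%}")
```

With only these four components, the combined rate is already close to three times the headline motherboard figure, and the interval for the new product stretches from well under 1% to more than 5%.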
There will always be more innovative ways to manipulate the numbers. The bottom line is that an 18% median failure rate is simply unacceptable.
So, what can anyone do about the problem?
1. If you have quality issues with your supplier, see how they react to your demands. Ask for monthly reports. Ask to work with their engineers directly. Talk directly with their Quality Manager or their Operations Manager. Sometimes a supplier honestly makes a mistake, but they should be proactive in resolving the issue if they take your business seriously. The supplier’s goal should be to turn you into an advocate for their brand, and how they resolve problems should go a long way in earning your trust!
2. Ask for a single point of contact. If they value your business you should not just be placed next in the queue. This provides a more holistic, long-term, relational partnership with your supplier.
3. Ask to see a copy of their ISO 9001 quality certificate and make sure it’s up to date. This ensures that they have a documented procedure for reporting failures that could turn into trends, increasing the likelihood that a product or procedural deficiency will be discovered and rectified.
4. Don’t simply ask your computer supplier what their average failure rate is…ask them how they calculate that number.
5. At the end of the day you should vote with your pocketbook. There are computer suppliers out there that take pride in their work and value quality over quantity. These companies tend to be more focused on engineering and production than on their board members and stock prices.
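When a supplier explains how the number is calculated (step 4), one thing to listen for is whether failures are counted against units that have been in the field long enough to fail, or only against a recent reporting window. The sketch below contrasts the two on made-up data. The `Unit` record, its field names, the dates and the quantities are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Unit:
    shipped: date
    failed: Optional[date] = None  # None means no failure reported so far


def cohort_failure_rate(units, as_of, min_age_days=365):
    """Share of units fielded at least min_age_days that failed within that many days of shipping."""
    mature = [u for u in units if (as_of - u.shipped).days >= min_age_days]
    if not mature:
        raise ValueError("no units have been in the field long enough")
    failed = [
        u for u in mature
        if u.failed is not None and (u.failed - u.shipped).days <= min_age_days
    ]
    return len(failed) / len(mature)


def recent_window_rate(units, as_of, window_days=30):
    """Failures reported in the last window_days divided by every unit ever shipped."""
    recent = [
        u for u in units
        if u.failed is not None and 0 <= (as_of - u.failed).days <= window_days
    ]
    return len(recent) / len(units)


# Made-up fleet: an older batch with time to fail, plus a large recent batch.
old_batch = [Unit(date(2016, 6, 1), date(2017, 3, 1)) for _ in range(10)]
old_batch += [Unit(date(2016, 6, 1)) for _ in range(90)]
new_batch = [Unit(date(2017, 12, 1), date(2017, 12, 20)) for _ in range(2)]
new_batch += [Unit(date(2017, 12, 1)) for _ in range(398)]
fleet = old_batch + new_batch
as_of = date(2018, 1, 3)

cohort = cohort_failure_rate(fleet, as_of)
recent = recent_window_rate(fleet, as_of)
print(f"First-year rate for mature units: {cohort:.1%}")
print(f"Last-30-days rate over all units: {recent:.1%}")
```

On this made-up fleet the cohort view reports 10% while the last-30-days view reports 0.4%, even though both are computed from the same records.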
When deciding who to partner with as an industrial computer supplier for your mission-critical embedded computing needs, remember these steps to ensure the lowest failure rate possible. Failures will occur in electrical components; it’s inevitable. To minimize those failures, however, you should ensure you are getting the most accurate data about particular component failure rates, in addition to adherence to the strictest design and manufacturing quality standards possible. This will allow you to effectively manage your lifecycle/spares/obsolescence programs and ensure your embedded computing project is properly funded and reaches full maturity without issue.