MTBF, Reliability, and Availability Prediction Workbooks for Redundant Systems
Larry George, Problem Solving Tools, Oct. 22, 2102
The traditional parts-count MTBF (Mean Time Between Failures) prediction is
MTBF = 1 / Σ (part failure rate) × (fudge factors),
where the sum is over all parts [MIL-HDBK-217F Appendix A, Bellcore, Telcordia, and others]. Such MTBF predictions tacitly assume:
Series systems: failure of any part causes system failure
Independent lives: P[Series system life > t] = Π P[Life of part j > t], where the product is over all parts j
Constant failure rates: failure rate = 1/MTBF, or P[Life > t] = Exp[-t/MTBF]
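The parts-count calculation and its exponential reliability function can be sketched in a few lines of Python. This is an illustration only, not the workbook code; the function names and the three FIT values are hypothetical, and failure rates are assumed to be entered in FITs (failures per 1E9 hours).

```python
import math

def parts_count_mtbf(fits, factors=None):
    """Parts-count MTBF in hours.

    fits    -- part failure rates in FITs (failures per 1e9 hours)
    factors -- optional multipliers ("fudge factors"), one per part
    """
    if factors is None:
        factors = [1.0] * len(fits)
    rate_per_hour = sum(f * k for f, k in zip(fits, factors)) * 1e-9
    return 1.0 / rate_per_hour

def series_reliability(t_hours, mtbf_hours):
    """P[Life > t] under the constant-failure-rate assumption."""
    return math.exp(-t_hours / mtbf_hours)

# Hypothetical three-part series system; the FIT values are illustrative.
example_fits = [100, 2000, 3200]
example_mtbf = parts_count_mtbf(example_fits)
one_year_reliability = series_reliability(8760, example_mtbf)
```

All three assumptions are built in: the rates simply add (series, independent), and the reliability is exponential (constant failure rate).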
Despite criticism of and the lack of credibility of MTBF predictions [George, June 2001; and Jones and Hayes], people continue to compare MTBF predictions. So let’s make MTBF, reliability, and availability predictions consistently and correctly, using established probability methods, with redundancy.
The traditional parts-count MTBF prediction is incorrect if the system has redundancy, because subsystem failure rates are not constant and reliability is not an exponential function. Telcordia (Bellcore) TR-332 provides no help; it says, “If the serial model is inappropriate, a suitable reliability model must be developed.” We can still predict MTBF, reliability, and availability, without the series and independence assumptions.
A redundant system usually provides greater reliability and availability than a series system. Redundancy methods include:
Parallel: usually independent and identical parts
K-out-of-n: at least k out of n parts must work for the system to work
Cold standby: parts are nominally in parallel, but their lives are sequential and, if lucky, parts don’t deteriorate in standby
Networks: more elaborate arrangements than parallel or k-out-of-n, such as a Wheatstone bridge (distributed processing seems equivalent to parallel processing, with overhead in series with the parallel processors)
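For the first three arrangements, the textbook reliability functions are short enough to sketch in Python. The assumptions here are mine, not the workbook's: parts are independent and identical with constant failure rates, and cold standby has perfect switching with no deterioration in standby. The function names and the 2000 FIT rate are illustrative.

```python
import math

def k_out_of_n_reliability(k, n, p):
    """P[at least k of n independent parts work], each working with probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def cold_standby_reliability(n, rate, t):
    """P[life > t] when n identical parts are used one after another.

    The system life is a sum of n exponential lives, so this is the
    Erlang survival function.
    """
    x = rate * t
    return math.exp(-x) * sum(x**i / math.factorial(i) for i in range(n))

rate = 2000e-9            # hypothetical part failure rate: 2000 FITs, per hour
t = 100_000.0             # mission time in hours
p = math.exp(-rate * t)   # reliability of one part at time t

parallel = k_out_of_n_reliability(1, 4, p)     # parallel is 1-out-of-4
two_of_four = k_out_of_n_reliability(2, 4, p)
standby = cold_standby_reliability(4, rate, t)
```

Parallel is the k = 1 case of k-out-of-n, and a series system is the k = n case. Networks such as a bridge need path or cut-set enumeration and are not covered by this sketch.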
Can we simply eliminate redundant subsystems from an MTBF prediction, because redundancy improves reliability so much? You can, but although a k-out-of-n system has a lower failure rate than n times the part failure rate (which is what the traditional parts-count prediction would use), its failure rate can exceed the part failure rate itself. Don’t eliminate redundant subsystems from MTBF predictions; even though they usually make systems more reliable, redundant subsystems may fail. Use the workbooks described in this article.
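A short Python sketch makes the failure-rate comparison concrete. It assumes independent, identical parts with a constant failure rate and computes the age-specific failure rate (density divided by reliability) of a k-out-of-n system; the function names and the 2000 FIT rate are illustrative.

```python
import math

def k_out_of_n_reliability(k, n, p):
    """P[at least k of n independent parts work], each working with probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def k_out_of_n_failure_rate(k, n, rate, t):
    """Age-specific failure rate of a k-out-of-n system at age t.

    The system fails at the (n-k+1)th part failure, so its life density is
    that of an order statistic of n exponential lives.
    """
    p = math.exp(-rate * t)
    density = k * math.comb(n, k) * rate * p**k * (1 - p)**(n - k)
    return density / k_out_of_n_reliability(k, n, p)

rate = 2000e-9  # hypothetical part failure rate: 2000 FITs, per hour

early = k_out_of_n_failure_rate(2, 4, rate, 0.1 / rate)  # young system
late = k_out_of_n_failure_rate(2, 4, rate, 5.0 / rate)   # old system
```

For this 2-out-of-4 example the failure rate starts well below the part rate and climbs toward twice the part rate as the system ages, staying below the four-times-part-rate figure that parts-count would use. It is not constant, which is the point of the paragraph above.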
Could we add the failure rates (1/MTBF) of redundant subsystems in series to obtain system MTBF predictions? [NASA] We could, but unfortunately, the MTBF and reliability will be wrong and not necessarily conservative. The failure rate of a redundant subsystem isn’t constant, 1/MTBF, even if the parts have constant failure rates. How should you make an MTBF prediction for a redundant system in series with another system, such as the system in figure 1? Don’t use the MTBF formula for two systems in series, 1/(1/MTBF1+1/MTBF2), unless both systems 1 and 2 have constant failure rates. Use the workbooks…
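One way to see the difference is to compute MTBF as the integral of the system reliability function and compare it with the series shortcut. The Python sketch below does this for a hypothetical redundant pair (1-out-of-2) in series with a single part, assuming independent parts with constant failure rates; the rates and function names are illustrative and this is not the workbook's method.

```python
import math

def integrate(f, upper, steps=100_000):
    """Trapezoid-rule integral of f over [0, upper]."""
    h = upper / steps
    total = 0.5 * (f(0.0) + f(upper))
    for i in range(1, steps):
        total += f(i * h)
    return total * h

lam = 2593e-9  # failure rate per hour of each part in the redundant pair
mu = 3200e-9   # failure rate per hour of the single part

def r_pair(t):
    """Reliability of a 1-out-of-2 parallel pair."""
    q = 1.0 - math.exp(-lam * t)
    return 1.0 - q * q

def r_single(t):
    return math.exp(-mu * t)

def r_system(t):
    """Series combination: both subsystems must survive."""
    return r_pair(t) * r_single(t)

# MTBF is the integral of reliability; upper limits are far into the tails.
mtbf_pair = integrate(r_pair, 40.0 / lam)
mtbf_single = 1.0 / mu
true_mtbf = integrate(r_system, 40.0 / (lam + mu))

# The shortcut treats the pair as if it had constant failure rate 1/MTBF.
shortcut_mtbf = 1.0 / (1.0 / mtbf_pair + 1.0 / mtbf_single)
```

With these particular rates the shortcut differs from the integrated value by roughly ten percent. The direction and size of the error depend on the configuration, so one example says nothing about whether the shortcut is conservative in general.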
Could we use the availability formula MTBF/(MTBF + MTTR) for redundant systems? We could, but the real availability depends on age as well as redundancy. Even though you plug in the redundant system MTBF formula, the availability is incorrect, because repair time may not be exponentially distributed and because that formula doesn’t take redundancy into account. Use the workbooks…
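The following Python sketch illustrates both points under deliberately simple assumptions that are mine, not the workbook's: each part has exponential failure and repair times, each part is repaired independently of the others, and a parallel system is up when at least one part is up. The MTBF and MTTR values are hypothetical.

```python
import math

def part_availability(t, mtbf, mttr):
    """Availability at age t of one repairable part that starts out working."""
    lam, mu = 1.0 / mtbf, 1.0 / mttr
    return mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)

def parallel_availability(t, n, mtbf, mttr):
    """Availability at age t of n such parts in parallel, repaired independently."""
    return 1.0 - (1.0 - part_availability(t, mtbf, mttr)) ** n

mtbf, mttr = 1000.0, 10.0  # hypothetical part MTBF and MTTR in hours

steady_part = mtbf / (mtbf + mttr)          # long-run availability of one part
steady_pair = 1.0 - (1.0 - steady_part) ** 2

# Plugging the redundant-pair MTBF (1.5 times the part MTBF, without repair)
# into MTBF/(MTBF + MTTR) gives a different number.
pair_mtbf = 1.5 * mtbf
plug_in = pair_mtbf / (pair_mtbf + mttr)

young_pair = parallel_availability(5.0, 2, mtbf, mttr)
```

Availability starts at 1 and decays toward its long-run value, so it depends on age, and the plug-in figure does not match the long-run availability of the pair under these assumptions. Non-exponential repair times would change the numbers further.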
WORKBOOK SOFTWARE
The workbooks are Excel spreadsheets with VBA software that compute, and simulate when necessary, MTBF, reliability, and availability, along with the standard deviations of these estimates induced by simulation. Spreadsheets suit parameter and reliability block diagram input, because spreadsheet cells resemble block diagrams. Figure 1 shows a series system in which subsystem 1 consists of 5 parts in parallel, one of which is required; subsystem 2 consists of 4 parts in parallel, 2 of which are required; and so on. The FITs of subsystems 5 and 6 could have been added to make a single subsystem in series with the other, redundant subsystems. Enter FITs to indicate the presence of a part, starting from the left, and specify how many parts each subsystem requires. FITs don’t have to be the same within a subsystem. The spreadsheet computes the remaining entries. (FIT stands for Failures In Time; one FIT is one failure per 1E9 (billion) hours, so a failure rate in FITs is 1000 times the same rate in failures per million hours.)
Part        Subsys 1  Subsys 2  Subsys 3  Subsys 4  Subsys 5  Subsys 6
1                100      2000      2593      3000      3200      2400  <FITs
2                100      2000      2593      3000                      <FITs
3                100      2000                                          <FITs
4                100      2000                                          <FITs
5                100                                                    <FITs
6                                                                       <FITs
7                                                                       <FITs
8                                                                       <FITs
K out of N         1         2         1         1         1         1  <Enter
N                  5         4         2         2         1         1  Computed
Nsubsys            6                                                    Computed
Nseries         TRUE                                                    Computed
Figure 1. Sample input to SeriesParallel workbook
Although the reliability statistics for parallel and k-out-of-n systems have been known for many years [Klion is cited in MIL-HDBK-217], the mathematics for combining such subsystems is not common [George, Dec. 2001]. At least five web sites do it incorrectly; three belong to NASA.
The workbook computes MTBF, reliability, and availability from 1000 simulations. If you want more simulation runs, run the VBA macro. There are two versions of this workbook: one for series-parallel systems such as the one shown in figure 1, and one for parallel-series systems, which replicate systems like figure 1 in parallel. If you have other needs, we will try to provide the spreadsheets and VBA software to satisfy them. Contact pstlarry@yahoo.com for the workbook.
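For readers without Excel, the simulation idea can be sketched in Python using the figure 1 inputs. This is not the workbook's VBA code, and it is not claimed to reproduce the workbook's results. It assumes independent parts with exponential lives, active (not cold standby) redundancy, and no repair; the function names and the random seed are illustrative.

```python
import math
import random

# (FITs of each part, number of parts required), transcribed from figure 1.
SUBSYSTEMS = [
    ([100] * 5, 1),
    ([2000] * 4, 2),
    ([2593] * 2, 1),
    ([3000] * 2, 1),
    ([3200], 1),
    ([2400], 1),
]

def subsystem_life(fits, k, rng):
    """Life of a k-out-of-n subsystem: it fails when fewer than k parts
    remain, which is the k-th longest of the n part lives."""
    lives = sorted((rng.expovariate(f * 1e-9) for f in fits), reverse=True)
    return lives[k - 1]

def simulate_mtbf(subsystems, runs=1000, seed=1):
    """Return (MTBF estimate in hours, standard error of that estimate)."""
    rng = random.Random(seed)
    # Subsystems are in series, so the system life is the shortest subsystem life.
    lives = [min(subsystem_life(f, k, rng) for f, k in subsystems)
             for _ in range(runs)]
    mean = sum(lives) / runs
    variance = sum((x - mean) ** 2 for x in lives) / (runs - 1)
    return mean, math.sqrt(variance / runs)

mtbf_estimate, std_error = simulate_mtbf(SUBSYSTEMS)

# Parts-count MTBF for comparison: every part treated as if in series.
parts_count_mtbf = 1e9 / sum(sum(fits) for fits, _ in SUBSYSTEMS)
```

The standard error shrinks with the square root of the number of runs, which is why more runs may be worth the wait. The simulated MTBF comes out well above the parts-count figure because the parts-count sum ignores the redundancy in subsystems 1 through 4.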
References
George, L. L., “MTBF Versus Age-Specific Reliability Prediction,” ASQ Reliability Review, Vol. 21, No. 2, pp. 13-15, June 2001
George, L. L., “MTBF Prediction for Redundant Systems,” ASQ Reliability Review, Vol. 21, No. 4, Dec. 2001
Jones, Jeff and Joseph Hayes, “A Comparison of Electronic-Reliability Prediction Models,” IEEE Trans. on Reliability, Vol. 48, No. 2, pp. 127–134, June 1999
Klion, J., “A Redundancy Notebook,” RADC-TR-77-287, AD A050837, 1977
MIL-HDBK-217F, Reliability Prediction of Electronic Equipment, U.S. Department of Defense, Washington, DC, Notice 1, July 1992
NASA, “Active Redundancy,” NASA Preferred Reliability Practices No. PD-ED-1216, http://www.hq.nasa.gov/office/codeq/relpract/n1216.pdf, NASA Headquarters, Washington, DC