UCSC-CRL-91-06: ESTIMATING THE RELIABILITY OF HOSTS USING THE INTERNET

03/01/1991 09:00 AM
Computer Science
Modeling the reliability of distributed systems, whether through analysis or simulation, requires a good understanding of the reliability of their components. Careful modeling allows highly fault-tolerant distributed databases and similar applications to be built at the least cost. It is often assumed that the failure and repair times of components are exponentially distributed (that is, that failure and repair rates are constant). This hypothesis can be tested for failures, although gathering the data and reducing it to a usable form can be difficult. For this study, data were collected from a large number of hosts via the Internet, with no special privileges or monitoring facilities. Over 350,000 hosts were considered, and more than 68,000 of these, judged likely to respond, were queried. These hosts were sampled several times over two months to obtain their up-times and, ultimately, to determine average host availability. The information gathered in this way allows estimates of availability, mean-time-to-failure (MTTF), and mean-time-to-repair (MTTR) to be derived. Applying an appropriate test statistic showed that some of the samples have a realistic chance of being drawn from an exponential distribution, while others can be confidently classed as non-exponential. The results reported here agree with those seen in practice.
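The following is a minimal sketch of the estimates named above, assuming each host's observed up-intervals and down-intervals are already available as lists of durations (the abstract does not describe the data format). MTTF and MTTR are taken as sample means and availability as MTTF / (MTTF + MTTR), the usual steady-state relation. The report does not name its test statistic, so the exponentiality check here substitutes a Kolmogorov-Smirnov distance against an exponential whose mean is estimated from the sample. Both function names are hypothetical, for illustration only.

```python
import math
from statistics import mean


def estimate_reliability(up_times, down_times):
    """Hypothetical helper: return (mttf, mttr, availability).

    up_times:   observed durations a host stayed up (e.g. hours)
    down_times: observed durations a host stayed down (same units)
    """
    if not up_times or not down_times:
        raise ValueError("need at least one up-interval and one down-interval")
    mttf = mean(up_times)    # mean time to failure
    mttr = mean(down_times)  # mean time to repair
    # Steady-state availability: fraction of time the host is up.
    availability = mttf / (mttf + mttr)
    return mttf, mttr, availability


def ks_exponential(samples):
    """Hypothetical helper: Kolmogorov-Smirnov distance D between the
    empirical CDF of `samples` and an exponential CDF whose rate is
    1 / (sample mean).

    Assumption: KS is used here as a stand-in; the report does not say
    which statistic it applied. Because the mean is estimated from the
    same data, standard KS critical values do not apply; compare D with
    Lilliefors-type tables for the exponential case instead.
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    rate = 1.0 / mean(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        # Check the gap just after and just before each step of the empirical CDF.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d
```

For example, `estimate_reliability([100, 200, 300], [10, 20, 30])` gives an MTTF of 200, an MTTR of 20, and an availability of about 0.909. A small D is consistent with exponential up-times; a large D (for instance, when every up-time is identical) argues against the exponential hypothesis.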
