Date
Journal Title
Journal ISSN
Volume Title
Publisher
In this thesis, we propose a new distributed diagnosis algorithm using the multilevel paradigm. This algorithm is a generalization of both the ADSD and Hi-ADSD algorithms. We present all details of the design and implementation of this multilevel adaptive distributed diagnosis algorithm called the ML-ADSD algorithm. We also present extensive simulation results comparing the performance of these three algorithms.
In 1967, Preparata, Metze and Chien proposed a model and a framework for diagnosing faulty processors in a multiprocessor system. To exploit the inherent parallelism available in a multiprocessor system and thereby improving fault tolerance, Kuhl and Reddy, in 1980, pioneered a new area of research known as distributed system level diagnosis. Following this pioneering work, in 1991, Bianchini and Buskens proposed an adaptive distributed algorithm to diagnose fully connected networks. This algorithm called the ADSD algorithm has a diagnosis latency of O(N) testing rounds for a network with N nodes. With a view to improving the diagnosis latency of the ADSD algorithm, in 1998 Duarte and Nanya proposed a hierarchical distributed diagnosis algorithm for fully connected networks. This algorithm called the Hi-ADSD algorithm has a diagnosis latency of O(log2N) testing rounds. The Hi-ADSD algorithm can be viewed as a generalization of the ADSD algorithm.
In all cases, the time required by the ML-ADSD algorithm is better than or the same as for the Hi-ADSD algorithm. The performance of the ML-ADSD algorithm can be improved by an appropriate choice of the number of clusters and the number of levels. Also, the ML-ADSD algorithm is scalable in the sense that only some minor modifications will be required to adapt the algorithm to networks of varying sizes. This property is not shared by the Hi-ADSD algorithm. The primary application of our research is to develop and implement a prototype network fault detection/monitoring system by integrating the ML-ADSD algorithm into a SNMP-based (Simple Network Management Protocol) fault management system. We report the details of the design and implementation of such a distributed network fault detection system.