
Failures of a Distributed System
July 25, 2013

In the words of Adam Savage from MythBusters, "failure is always an option." This holds especially true for distributed systems, which run over a computer network such as a Wide Area Network (WAN) or a Local Area Network (LAN). A distributed system is defined as a software system in which components located on networked computers communicate and coordinate their actions by passing messages (Coulouris, Dollimore, Kindberg, & Blair, 2012). This allows computers, and even devices such as smartphones and tablets, to share resources like printers, hard drives, and internet access. A centralized

To fix a hard disk drive that is not responding because it has been unplugged from the network, the system administrator should first check whether the drive is visible from one of the networked computers. If the drive does not show up or is unresponsive, he or she should check all cable connections and reconnect any that are unplugged. If no cables are unplugged and the drive is still not accessible, the drive may have failed and should be replaced. If its data was not backed up, the system administrator or technician could send the drive to a specialist data recovery service.
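The troubleshooting order above can be sketched in Python. This is a minimal illustration, assuming the network drive is mounted or mapped at a local path on the administrator's computer; the function name `check_network_drive`, the example path, and the returned messages are hypothetical. Code can only observe whether the drive is visible and responding; checking the cables is still a manual step.

```python
import os


def check_network_drive(mount_path: str) -> str:
    """Suggest the next troubleshooting step for a network drive.

    Hypothetical helper: assumes the drive is mounted or mapped at mount_path.
    """
    # Step 1: can this computer see the drive at all?
    if not os.path.exists(mount_path):
        return "not visible: check all cable connections and reconnect any that are unplugged"
    # Step 2: the path is visible, but does the drive actually respond to a read?
    try:
        os.listdir(mount_path)
    except OSError:
        return "not responding: if the cables are fine, the drive may be dead and need replacing"
    return "ok: drive is visible and responding"


if __name__ == "__main__":
    # "/mnt/nas" is a placeholder mount point, not a standard location.
    print(check_network_drive("/mnt/nas"))
```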
The next form of failure is a link failure. This occurs when a connection cannot be established between parts of the system. Links are the connections between computers; the internet itself is one enormous collection of links. As with hardware failure, a link failure can have many causes: a broken router, an incorrectly configured DNS, IP blocking or an IP address mismatch, or something as simple as a device's wireless card being turned off. When a link fails, the devices on the system cannot “talk” to each other. To troubleshoot this, the user, technician, or system administrator should first check that the device is on and that the cable or wireless connection is working properly. If that does not resolve the issue, the next step is to check that hardware such as the modem or router is working properly. If the devices still cannot connect after that, they should check the router and device settings to make sure everything is configured correctly to connect to the network. This type of fault can only happen on a distributed system since, as stated before, a centralized system is stand-alone.
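Some of the checks described above can be partly automated. The sketch below uses only Python's standard `socket` module and assumes the device should be able to reach some service at a known host name and port; `diagnose_link` is a hypothetical helper. It tries name resolution first, where a misconfigured DNS would fail, and then a TCP connection, where a broken cable, disabled wireless card, router fault, or blocked IP would usually show up. It cannot tell those physical causes apart, so the manual checks are still needed.

```python
import socket


def diagnose_link(host: str, port: int, timeout: float = 2.0) -> str:
    """Layered link check: resolve the name, then try to open a TCP connection.

    Hypothetical helper: host and port stand for whatever service the device should reach.
    """
    # Settings check: can the host name be resolved (is DNS configured correctly)?
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "name resolution failed: check DNS and network settings"

    # Link check: try each resolved address until one accepts a connection.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok: link is up"
        except OSError:
            continue  # this address failed; try the next one, if any

    return "connection failed: check cables, wireless, modem/router, and IP settings"
```

The order mirrors the text in reverse only because software cannot see cables: a successful result rules out most link problems at once, while a failure tells the administrator which layer to inspect by hand.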
The third type of failure is a lost message. A lost message means that one computer has sent data, or a message, to another computer and the second computer did not receive it. This can be a minor problem if the data was unimportant, or a major one if the information being sent was critical. This type of failure can only happen on a distributed system. To illustrate: in a distributed system, computer A sends a message to computer B asking whether B is responsive. Since the message was lost,
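Lost messages are commonly handled with timeouts and retries. The Python sketch below simulates the scenario; `LossyChannel` and `send_with_retry` are hypothetical names, and the simulated channel drops messages according to a fixed pattern rather than at random so its behavior is repeatable. Computer A treats a missing reply as a lost message and sends it again, giving up after a set number of attempts.

```python
from typing import Iterable, Iterator


class LossyChannel:
    """Simulated link from computer A to computer B (hypothetical helper).

    Each entry in pattern decides one send: True delivers, False loses it.
    Once the pattern is used up, every message is delivered.
    """

    def __init__(self, pattern: Iterable[bool]) -> None:
        self._pattern: Iterator[bool] = iter(pattern)
        self.delivered: list[str] = []  # messages that reached B

    def send(self, message: str) -> bool:
        # True: B got the message and replied; False: A sees only a timeout.
        if next(self._pattern, True):
            self.delivered.append(message)
            return True
        return False


def send_with_retry(channel: LossyChannel, message: str, max_attempts: int = 3) -> int:
    """Send from A to B, resending after each timeout.

    Returns the attempt number that succeeded; raises TimeoutError if all fail.
    """
    for attempt in range(1, max_attempts + 1):
        if channel.send(message):
            return attempt
    raise TimeoutError(f"no reply after {max_attempts} attempts")


if __name__ == "__main__":
    channel = LossyChannel([False, False, True])  # first two sends are lost
    print(send_with_retry(channel, "are you there?"))  # → 3
```

One caveat the simulation hides: in a real network, A cannot tell whether its request was lost, B's reply was lost, or B is simply slow, so a retry can cause B to receive the same message twice.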

