Distributed System Essay

Summary from the Papers:
Cloud computing is the latest evolution of Internet-based computing. Just as the public Internet spawned private corporate intranets, cloud computing is now spawning private cloud platforms. The database is a critical part of such a platform, so it is imperative that our cloud database be compatible with cloud computing.
Key design principles of the cloud model:
The shared-disk database architecture is ideally suited to cloud computing. It requires fewer and lower-cost servers, provides high availability, reduces

This provides Atomicity, Consistency, Isolation and Durability (ACID). Dynamic scalability has proven to be a vexing problem for databases, because most of them use a shared-nothing architecture, which splits the data into separate silos, one per server. Partitioning the data is time-consuming, and it degrades database performance unless it is done carefully to minimize data shipping between servers. Each server must also run at low CPU utilization so that it can absorb spikes in demand for its own data, which means buying expensive servers sized for the peaks. Because each node in a shared-nothing architecture owns a specific piece of the data, taking out one server in a shared-nothing database means the entire cluster must be shut down.
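The repartitioning cost described above can be illustrated with a minimal sketch. It assumes simple modulo hash partitioning, which is only one of several possible schemes, and the names owner and moved_keys, the key format, and the server counts are all hypothetical. It counts how many keys change owner, and would therefore have to be shipped between servers, when a fifth server is added to a four-server shared-nothing cluster.

```python
import zlib


def owner(key: str, num_servers: int) -> int:
    """Return the index of the server that owns `key`.

    Assumption: modulo hash partitioning over a stable hash (CRC32).
    Real systems may use range or consistent hashing instead.
    """
    return zlib.crc32(key.encode("utf-8")) % num_servers


def moved_keys(keys, old_n: int, new_n: int):
    """Return the keys whose owning server changes when the cluster is resized."""
    return [k for k in keys if owner(k, old_n) != owner(k, new_n)]


# Hypothetical data set: 10,000 customer records spread over 4 servers.
keys = [f"customer-{i}" for i in range(10_000)]
moved = moved_keys(keys, old_n=4, new_n=5)
fraction = len(moved) / len(keys)
print(f"{len(moved)} of {len(keys)} keys change owner ({fraction:.0%})")
```

Under this naive scheme a key keeps its owner only when its hash gives the same remainder for both cluster sizes, so most of the data has to move when the cluster grows. In a shared-disk cluster no key has an owner in this sense, so adding a node requires no such redistribution.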
By contrast, the shared-disk database architecture is ideal for cloud computing: it eliminates the need to partition data and supports elastic scalability. This architecture allows clusters of low-cost servers to use a single collection of data, typically served up by a Storage Area Network (SAN) or Network Attached Storage (NAS).
Advantages of the shared-disk database architecture:
a. Fewer servers required: Shared-disk is a master-master configuration rather than a master-slave configuration, so each node provides fail-over for the other nodes. This reduces the number of servers required by half.
b. Lower-cost servers: Spikes are spread across the entire cluster. As a result, each system can run at a higher CPU utilization, and low-cost commodity servers can be purchased.
c. Scale-in: Customers can be allocated and billed on the basis of the database instances being run on a multi-core machine.
d. Simplified maintenance/upgrade process: Servers can be upgraded individually while the cluster remains online.
e. High Availability
f. Reduced partitioning and tuning services
g. Reduced support costs
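The utilization argument in the list above can be made concrete with a small sketch. All numbers are invented for illustration, and the function names are hypothetical. The shared-disk figure also assumes an idealized cluster in which any node can serve any request and load is balanced perfectly, which a real system will only approximate.

```python
# Hypothetical load (in arbitrary capacity units) on three data partitions
# over four time intervals. Each partition spikes at a different time.
loads = {
    "A": [20, 90, 30, 25],
    "B": [25, 20, 85, 30],
    "C": [30, 25, 20, 95],
}


def shared_nothing_capacity(loads: dict) -> int:
    """Each server owns one partition, so each must be sized for its own peak."""
    return sum(max(series) for series in loads.values())


def shared_disk_capacity(loads: dict) -> int:
    """Any node can serve any data, so the cluster is sized for the combined peak.

    Assumption: perfect load balancing across nodes.
    """
    combined = [sum(interval) for interval in zip(*loads.values())]
    return max(combined)


print("shared-nothing:", shared_nothing_capacity(loads))  # 90 + 85 + 95 = 270
print("shared-disk:   ", shared_disk_capacity(loads))     # max(75, 135, 135, 150) = 150
```

The sum of the individual peaks can never be smaller than the peak of the combined load, so the pooled cluster never needs more total capacity than the partitioned one. The gap is largest when the spikes on different partitions do not coincide, as in this example.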
With the advent of high-throughput experimental technologies and high-speed Internet connections, the generation and transmission of large volumes of data have been automated over the last decade. Modern automated methods for the measurement, collection and analysis of data in all fields of science, industry and the economy are providing more and more data, with a drastically increasing complexity of structure. Representing complex objects by means of simple objects such as numerical feature vectors can be understood as a way to incorporate domain knowledge into the data mining process. "Data mining", often also referred to as "Knowledge Discovery in Databases", aims at the automatic interpretation of large datasets. Identifying patterns in data almost always requires a preprocessing step, and this preprocessing has a significant impact on the runtime and on the results of the subsequent data mining algorithm. To manage the huge volume of such complex data, database systems are employed. Thus, databases provide

