Data Warehousing and Data Mining
March 19, 2012.
Data mining is a form of quantitative data analysis. Analysts use technical tools to query and sort through terabytes of data, looking for patterns. Usually, the analyst develops a hypothesis, such as "customers who buy product X usually buy product Y within six months." Running a query on the relevant data to confirm or refute this hypothesis is data mining.
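As an illustration of the kind of query described above, the following Python sketch tests the product X/product Y hypothesis against a small set of purchase records. The record layout, the sample rows, and the function name `follow_on_rate` are hypothetical. The sketch also makes three simplifying assumptions: "six months" is approximated as 182 days, only each customer's earliest purchase of X is considered, and "usually" is read as "more than half of X buyers."

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical purchase records: (customer_id, product, purchase_date).
# The field layout and the sample rows are invented for illustration.
PURCHASES = [
    ("c1", "X", date(2011, 1, 10)),
    ("c1", "Y", date(2011, 3, 2)),
    ("c2", "X", date(2011, 2, 1)),
    ("c2", "Y", date(2011, 12, 15)),
    ("c3", "X", date(2011, 4, 20)),
    ("c3", "Y", date(2011, 6, 1)),
    ("c4", "Y", date(2011, 5, 5)),
]

SIX_MONTHS = timedelta(days=182)  # assumption: "six months" approximated as 182 days


def follow_on_rate(purchases, first, second, window):
    """Fraction of customers who bought `first` and then bought `second`
    strictly after, and within `window` of, their earliest `first` purchase."""
    first_dates = {}
    second_dates = defaultdict(list)
    for customer, product, day in purchases:
        if product == first:
            # Keep only the earliest purchase of the first product per customer.
            if customer not in first_dates or day < first_dates[customer]:
                first_dates[customer] = day
        elif product == second:
            second_dates[customer].append(day)

    if not first_dates:
        return 0.0  # no buyers of the first product, so nothing to test
    hits = sum(
        1
        for customer, start in first_dates.items()
        if any(start < d <= start + window for d in second_dates[customer])
    )
    return hits / len(first_dates)


rate = follow_on_rate(PURCHASES, "X", "Y", SIX_MONTHS)
# Assumption: "usually" is read as "more than half of X buyers".
print(f"{rate:.2f}", "supported" if rate > 0.5 else "not supported")
# prints: 0.67 supported
```

With these made-up rows, two of the three X buyers (c1 and c3) buy Y within the window, so the hypothesis is supported; widening the window to a year would also count c2.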
Data warehousing describes the process of designing how the data is stored in order to improve reporting and analysis. Data warehouse experts consider that the various stores of data are ...
However, if the data warehouse expert designs a data storage system that closely connects relevant data held in different databases, the data miner can run far more meaningful and efficient queries, and the results can then be used to improve the business.
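The following Python sketch, using the standard-library `sqlite3` module, illustrates that point under simplified assumptions. Two hypothetical source systems, one holding sales and one holding customer regions, are loaded into a single database and linked by a shared `customer_id` key. Once the data is connected, one join answers a question that neither source could answer alone: revenue for a product by region. The table names, columns, and sample data are invented for illustration.

```python
import sqlite3

# Hypothetical source systems: two separate stores of data that a
# warehouse design would connect. All rows are invented for illustration.
SALES_SYSTEM = [  # (customer_id, product, amount)
    (1, "X", 20.0),
    (1, "Y", 35.0),
    (2, "Y", 15.0),
    (3, "X", 50.0),
]
CUSTOMER_SYSTEM = [  # (customer_id, region)
    (1, "North"),
    (2, "South"),
    (3, "North"),
]


def build_warehouse():
    """Load both sources into one in-memory database, linked by customer_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT)")
    conn.execute(
        "CREATE TABLE sale (customer_id INTEGER REFERENCES customer(customer_id),"
        " product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?)", CUSTOMER_SYSTEM)
    conn.executemany("INSERT INTO sale VALUES (?, ?, ?)", SALES_SYSTEM)
    return conn


def revenue_by_region(conn, product):
    """Total revenue for one product per region, via a join on customer_id."""
    rows = conn.execute(
        """
        SELECT c.region, SUM(s.amount)
        FROM sale AS s
        JOIN customer AS c ON c.customer_id = s.customer_id
        WHERE s.product = ?
        GROUP BY c.region
        ORDER BY c.region
        """,
        (product,),
    )
    return rows.fetchall()


conn = build_warehouse()
print(revenue_by_region(conn, "Y"))
```

For product Y the query returns `[('North', 35.0), ('South', 15.0)]`. Keying both tables on the same customer identifier is what makes the join possible; if the two source systems used incompatible keys, they would first have to be reconciled.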
The choice of architecture will determine, or be determined by, where the data warehouses and/or data marts themselves reside and where control resides. For example, the data can reside in a central location that is managed centrally. Alternatively, the data can reside in distributed local and/or remote locations that are managed either centrally or independently. The architecture choices considered here are global, independent, interconnected, or some combination of the three. The implementation choices are top down, bottom up, or a combination of both. Architecture choices and implementation choices can also be combined with each other.
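The architecture and implementation choices above behave like independent dimensions of a single design. The Python sketch below is an illustrative assumption, not a method taken from the source: it records one value each for location and management, uses sets for the architecture and implementation fields so that combinations can be expressed, and rejects a design with no architecture or implementation chosen. All class and field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    CENTRAL = "central"
    DISTRIBUTED = "distributed (local and/or remote)"


class Management(Enum):
    CENTRAL = "managed centrally"
    INDEPENDENT = "managed independently"


class Architecture(Enum):
    GLOBAL = "global"
    INDEPENDENT = "independent"
    INTERCONNECTED = "interconnected"


class Implementation(Enum):
    TOP_DOWN = "top down"
    BOTTOM_UP = "bottom up"


@dataclass(frozen=True)
class WarehouseDesign:
    location: Location
    management: Management
    # Architectures and implementation approaches may be combined,
    # so these fields hold non-empty sets rather than single values.
    architectures: frozenset
    implementations: frozenset

    def __post_init__(self):
        if not self.architectures or not self.implementations:
            raise ValueError("choose at least one architecture and one implementation")

    def describe(self) -> str:
        """One-line summary, with combined choices joined by ' + '."""
        arch = " + ".join(sorted(a.value for a in self.architectures))
        impl = " + ".join(sorted(i.value for i in self.implementations))
        return f"{arch}; {self.location.value}; {self.management.value}; {impl}"


design = WarehouseDesign(
    location=Location.CENTRAL,
    management=Management.CENTRAL,
    architectures=frozenset({Architecture.GLOBAL}),
    implementations=frozenset({Implementation.TOP_DOWN, Implementation.BOTTOM_UP}),
)
print(design.describe())
# prints: global; central; managed centrally; bottom up + top down
```

Here a physically centralized, centrally managed global warehouse is implemented with a mix of top-down and bottom-up work; each dimension can be changed without affecting the others.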
For example, a data warehouse architecture could be physically distributed, managed centrally, and implemented from the bottom up, starting with data marts that serve a particular workgroup, department, or line of business.
A global data warehouse is one that supports all, or a large part, of a corporation that requires a more fully integrated data warehouse, with a high degree of data access and usage across departments or lines of business. That is, it is designed and constructed based on the needs of the enterprise as a whole. It can be considered a common repository for decision support data that is available across the entire organization, or a large subset of it. A common misunderstanding is that a global data warehouse must be centralized. The term global is used here to reflect the scope of data access and usage, not the physical structure. A global data warehouse can be physically centralized or physically distributed throughout the organization. A physically centralized global warehouse is used by the entire organization, resides in a single location, and is managed by the Information Systems (IS) department. A distributed global warehouse is also used by the entire organization, but its data is spread across multiple physical locations within the organization; it, too, is managed by the IS department.
When we say that the IS department manages the data warehouse, we do not necessarily mean that it controls the data warehouse. For example, the distributed locations could be controlled by a particular department or line of business. That is, that department or line of business decides what data goes into the data warehouse, when it is updated, which other departments or lines of business can access it, which individuals in those departments can access it, and so forth. However, to manage the implementation of...