01:59 PM
site_id
3. Flexible design - To normalize or denormalize? Since enterprise data warehouses consist of integrated architecture of various subject areas and data domains, they necessarily gravitate more toward normalized schema. Normalization creates a flexible environment that permits rapid accommodation of data in an iterative fashion. New data will always be the reality of such data architectures. Due to the size and versatility of EDW, changes will always occur and should be accounted for in the schema construction.
Normalization helps designers and implementers accommodate changes quickly and in the most economical manner. It also serves as a technique to directly reflect the business process workflow and can, therefore, change as fast as these processes do. Both of these observations express a crucial characteristic of data warehouses: adaptability.
It is generally difficult to anticipate what future requirements will be addressed by the warehouse or how it ultimately will be used. However, it is clear that even with advancements in hardware and RDBMS software, normalization will remain a problem in terms of performance and complexity. It may get costly to optimize and beef up the hardware to the point that performance is not an issue. The complexity of normalized structures also becomes a business issue since the less technically proficient users (business power users) of data and information will find them difficult to navigate and understand.
Therefore, for some time to come, denormalization still will be seen as a way to improve performance and, first and foremost, as a technique that simplifies data relationships for business users. It is also the basis for unifying all standard business views or conformed dimensions around the same base metrics, the same version of the truth. But does the denormalization violate the principles of EDW architecture? Not necessarily. The EDW should be an instrument for organizing data, while integrating and delivering information in the most comprehensive way to all facets of an organization. As a result, denormalization needs to accommodate all types of structures without becoming totally dependent on a certain degree of granularity or native relationships between data.
4. Performance vs. scalability
Some of the operational concerns with an EDW design are performance and scalability. One sign of an enterprise data warehouse's success is its widespread acceptance across the organization. For the most part, quick adoption depends on how much versatile information users can retrieve quickly. Versatile information facilitates large, complex queries. High acceptance means also high frequency of users querying the database concurrently.
Let's consider questions relating to the volume of data and speed of access. The amount of redundant data likely to end up in the EDW will result in slow performance. High volumes of concurrent queries are another cause of poor performance. Measures need to be taken in order to eliminate both of these degrading factors. Those measures could involve changes in the logical architecture, such as denormalization, aggregation and summarizations, as well as modifications to the physical architecture, DBMS and/or hardware-related strategies such as massive parallel processing, shared everything vs. shared nothing, workload distribution, failover protection, etc.
While scalability always is seen as a physical attribute of the EDW, there are some logical attributes that define a highly scalable architecture as well. Those attributes can include enterprise metadata, data quality, match/consolidation, performance monitoring, application check-pointing and the degree of freedom power users have when exploring the data in the warehouses. The EDW solution should be able to accommodate any type of query regardless of its complexity and data volume and bring back high-quality information within a reasonable amount of time. That means it is imperative to rationalize the design across all subject areas, and make it flexible enough to accept changes at any time and at any level of complexity.
To summarize, scalability for EDW means: flexible logical and physical model, query freedom, data quality, rich metadata, high availability system, high-performance loading and querying, support for high concurrency and, last but not least, tight operational control of the entire processes/systems.