What Is Data Warehouse Architecture?
A data warehouse (abbreviated DW or DWH) is a strategic collection of data that supports decision-making at all levels of an enterprise. It is a single data store created for analytical reporting and decision support. For businesses that need business intelligence, it provides guidance for improving business processes and for monitoring time, cost, quality, and control. [1]
- Divide the business of the entire organization, generally along business-unit lines; define the work each unit performs and clarify the relationships between the units.
- Gain insight into the specific business processes within each business unit and formalize them as procedures.
- Propose methods to modify and improve the business units' workflows, and formalize them as procedures.
- Define the scope of data modeling, and set the goals and phases of the entire data warehouse project.
- Extract and abstract key business concepts.
- Group business concepts and aggregate similar grouping concepts according to business lines.
- Refine grouping concepts, clarify and abstract business processes within grouping concepts.
- Clarify the relationships between grouping concepts to form a complete domain conceptual model.
- Materialize business concepts as entities and consider their specific attributes.
- Materialize events as entities and consider their content.
- Materialize descriptive information as entities and consider its content.
- Make corresponding technical adjustments for specific physical platforms
- Make adjustments to specific platforms based on model performance considerations
- According to the needs of management, make corresponding adjustments in combination with specific platforms
- Generate the final execution script and refine it.
- 1. The data warehouse is subject-oriented; the data organization of the operational database is oriented
- In the context of information technology and data intelligence, data warehouses benefit from many cost-effective computing resources in software and hardware, Internet and intranet solutions, databases, and data access technology.
- Open system technology makes the cost of analyzing large amounts of data reasonable, and the hardware solutions are more mature. The main technologies used in data warehouse applications are as follows:
- Parallel processing
- The computing hardware environment, the operating system environment, the database management system and all related database operations, query tools and technologies, applications, and other areas can all benefit from the latest advances in parallel processing.
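- The pattern behind parallel query processing can be sketched as "split the data, aggregate each piece independently, then combine the partial results". The following is a minimal illustration only; the function names and the sample data are hypothetical, and it uses threads from the Python standard library purely to show the structure (in CPython, threads will not speed up CPU-bound work the way a parallel database engine does).

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Aggregate one chunk independently of the others."""
    return sum(chunk)


def parallel_total(values, workers=4):
    """Split values into chunks, sum each chunk in a worker, then combine."""
    chunk_size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each chunk to a worker; the outer sum combines the partial results
        return sum(pool.map(partial_sum, chunks))


amounts = list(range(1, 101))   # hypothetical sales amounts 1..100
total = parallel_total(amounts)
```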
- Partitioning
- Partitioning makes it easier to support large tables and indexes, while also improving data management and query performance.
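- One way to picture why partitioning helps query performance: if rows are grouped by a partition key, a query that filters on that key only has to scan the matching partition. The sketch below is an assumption-laden toy model in plain Python (the `sold_on` and `amount` fields and the month-based key are hypothetical), not how any particular database implements partitioning.

```python
from collections import defaultdict
from datetime import date


def partition_by_month(rows):
    """Group rows into partitions keyed by (year, month) of the sale date."""
    partitions = defaultdict(list)
    for row in rows:
        key = (row["sold_on"].year, row["sold_on"].month)
        partitions[key].append(row)
    return partitions


def monthly_total(partitions, year, month):
    """Sum amounts for one month by scanning only that month's partition."""
    return sum(row["amount"] for row in partitions.get((year, month), []))


sales = [
    {"sold_on": date(2023, 1, 5), "amount": 120},
    {"sold_on": date(2023, 1, 20), "amount": 80},
    {"sold_on": date(2023, 2, 3), "amount": 200},
]
partitions = partition_by_month(sales)
january_total = monthly_total(partitions, 2023, 1)
```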
- Data compression
- The data compression function reduces the cost of disk systems that are often required in data warehouse environments to store large amounts of data. New data compression technologies have also eliminated the negative impact of compressed data on query performance. [1]
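- The text does not name a specific compression technique. As one illustration, dictionary encoding (chosen here as an assumption, since it is common in analytical storage) replaces repeated column values with small integer codes, and simple filters can then be evaluated on the codes without decompressing. All names in this sketch are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code plus a lookup table."""
    dictionary = []   # code -> original value
    codes = {}        # original value -> code
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(dictionary)
            dictionary.append(value)
        encoded.append(codes[value])
    return dictionary, encoded


def dictionary_decode(dictionary, encoded):
    """Rebuild the original column from the codes."""
    return [dictionary[code] for code in encoded]


def count_matching(dictionary, encoded, target):
    """Count rows equal to target by comparing integer codes, without decoding."""
    if target not in dictionary:
        return 0
    code = dictionary.index(target)
    return sum(1 for c in encoded if c == code)


regions = ["north", "south", "north", "north", "east", "south"]
dictionary, encoded = dictionary_encode(regions)
```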
- From database to data warehouse
- Enterprise data processing falls roughly into two categories. One is operational processing, also known as online transaction processing (OLTP): the day-to-day online operation of specific business functions against the database, usually querying and modifying a few records. The other is analytical processing, which generally analyzes historical data on particular subjects to support management decisions.
- The two have different characteristics, which are mainly reflected in the following aspects.
- 1. Processing performance
- The daily business involves frequent and simple data access, so the performance requirements for operational processing are relatively high, and the database needs to be able to respond in a short time.
- 2. Data integration
- The operational processing of enterprises is usually scattered, and the application-oriented nature of traditional databases makes data integration difficult.
- 3. Data update
- Operational processing consists mainly of atomic transactions; data is updated frequently, so concurrency control and recovery mechanisms are required.
- 4. Data time horizon
- Operational processing mainly serves daily business operations.
- 5. Data aggregation
- Operational processing systems usually have simple statistical functions.
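- The contrast between the two kinds of processing can be made concrete with a small sketch. It uses an in-memory SQLite database from the Python standard library; the `orders` table and its contents are hypothetical. The operational statement touches a single record in a short transaction, while the analytical statement reads across all of the history and aggregates it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (id, customer, year, amount) VALUES (?, ?, ?, ?)",
    [(1, "acme", 2022, 100.0), (2, "acme", 2023, 150.0), (3, "globex", 2023, 50.0)],
)

# Operational processing: modify one record inside a short transaction.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (175.0, 2))

# Analytical processing: a read-only aggregate over historical data.
yearly_totals = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
```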
- Data warehouses have the power to transform a business. They help companies gain insight into customer behavior, predict sales trends, and determine the profitability of a group of customers or products. Nevertheless, implementing a data warehouse is a long and risky process. A web survey published by DM Review found that 51% of respondents consider the lack of accurate data to be the number one obstacle to creating a data warehouse. The most important point is that not all data can be updated in real time.
- There are six guiding principles that can help companies quickly implement a data warehouse plan and evaluate its process:
- Simplify requirements collection and design.
- It is often difficult for companies to determine what data is important and what prevents them from leveraging valuable unstructured information to drive critical business processes. The organization should check whether the IT manager has a good understanding of the business plan and the information needed to support the plan. Where is the source data? What transformations are needed to make them useful for critical applications?
- Support business and IT user collaboration.
- Incomplete, outdated, or inaccurate data can lead to a lack of trusted information. Does the company have a business glossary that users can view, use for collaboration, and adjust based on their collective business perspective?
- Avoid costly low-level errors and rework.
- Does the company have a clear implementation strategy, with well-defined data models and applications that provide information?
- Identify matching information and create a single view.
- Multiple versions of the same fact can cause problems in managing user, product, and partner relationships, increasing the risk of regulatory compliance violations.
- Convert and publish using the fastest and most scalable method.
- Does the company have an automated process that can take advantage of parallel processing and reuse previous conversions? Can company systems publish data to users and applications on demand and in a timely manner?
- Extend information accessibility through information services.
- Can the company really treat information as common property? Can IT professionals safeguard this property and make it available to authorized persons? Can the information be released to the right place, in the right situation, at the right time? [3]
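- Of the principles above, "identify matching information and create a single view" lends itself to a small illustration. The sketch below assumes, hypothetically, that customer records from two source systems can be matched on a normalized email address; real matching rules are usually more involved. The field names and sample records are invented for the example.

```python
def normalize_email(email):
    """Normalize the matching key so trivially different spellings compare equal."""
    return email.strip().lower()


def build_single_view(*sources):
    """Merge records that share a normalized email.

    For each field, the first non-empty value seen across the sources is kept.
    """
    merged = {}
    for source in sources:
        for record in source:
            key = normalize_email(record["email"])
            target = merged.setdefault(key, {"email": key})
            for field, value in record.items():
                if field != "email" and value and field not in target:
                    target[field] = value
    return merged


crm = [{"email": "Ann@Example.com ", "name": "Ann Lee", "phone": ""}]
billing = [{"email": "ann@example.com", "name": "A. Lee", "phone": "555-0100"}]
customers = build_single_view(crm, billing)
```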
- Unlike a general online transaction processing (OLTP) system, a data warehouse has data model design as its foundation. At present, the two mainstream approaches are normalized design and dimensional design. Data models can be divided into logical and physical data models. The logical data model states the relationships among business-related data. It is essentially a structural design that is independent of the database, and it is usually designed in a formal way. The main idea is to formulate the subject area model from the perspective of the enterprise's business domains and then gradually drill down into entities and attributes, and will not consider future adoption when designing
- 1) Choose the right subject (the problem area to be addressed)
- 2) Clearly define the fact table
- 3) Determine and confirm the dimensions
- 4) Select
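- To make the fact table and dimension vocabulary in the steps above concrete, here is a minimal star-schema sketch using an in-memory SQLite database from the Python standard library. The subject (sales), the table and column names, and the sample rows are all hypothetical; it is an illustration of the dimensional approach, not a recommended design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who/what/when" of each measurement.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- The fact table holds the measures, keyed by references to the dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    quantity INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'manual', 'books');
INSERT INTO dim_date VALUES (20230101, 2023, 1), (20230201, 2023, 2);
INSERT INTO fact_sales VALUES
    (1, 20230101, 3, 30.0),
    (2, 20230101, 1, 12.0),
    (1, 20230201, 2, 20.0);
""")

# A typical analytical query: join facts to a dimension and aggregate a measure.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```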
- The data modeling of the data warehouse is roughly divided into four stages:
- 1. Business modeling. This part of the modeling work mainly includes the following parts:
- 2. Conceptual modeling of the domain. This part of the modeling work mainly includes the following parts:
- 3. Logical modeling. This part of the modeling work mainly includes the following parts:
- 4. Physical modeling. This part of the modeling work mainly includes the following parts:
- Every company has its own data. Many companies store large amounts of it in computer systems, recording information from purchasing, sales, and production processes as well as customer information. This data is usually stored in many different places.
- With a data warehouse, the enterprise stores all the collected information in a single place: the data warehouse. The data in the warehouse is organized in a defined way, making the information easy to access and useful.
- Specialized software tools have been developed to semi-automate the data warehousing process, helping companies import data into the data warehouse and use the data already stored there.
- Data warehouses have brought tremendous changes to the organization. The establishment of the data warehouse has brought some new
- The idea of building a data warehouse was proposed in the early days of computing. The term "data warehouse" was first introduced in 1990 by Bill Inmon, who described it as follows: a data warehouse is a collection of data specially designed and built to support business decisions.
- Enterprises build data warehouses to fill existing
- About decision support databases
- The connection between the two:
- The emergence of data warehouses is not intended to replace databases. Most data warehouses still use
- Sybase: IQ
- Oracle: Oracle Database / Oracle Exadata
- Teradata: Teradata
- IBM: Red Brick
- Netezza: Netezza TwinFin
- NEC: InfoFrame DWH Appliance
- Microsoft: Microsoft SQL Server
- Pivotal: Greenplum