The BI InfoCube: An Extended Star Schema
SAP Street discusses the database design for InfoCubes, which are the primary InfoProvider for BI. The lesson is broken down into three sections. The first section covers the basic industry-standard star schema (on which InfoCubes are loosely based). The second section showcases our BI's version, an extended star schema. The last section discusses creating your own BI star schema InfoCube, also known as an InfoCube.
Your organization has defined the requirements for a multidimensional data model to support the needs of its cost center managers. After analyzing the business content that was delivered with BI, the project group decides to define a new InfoCube named Cost Center Accounting. This new InfoCube will integrate the new InfoObjects you created earlier. Before you create an InfoCube, you want to understand the underlying database model in a more technical way.
OLTP: A Highly Normalized Relation Database Schema
Almost all OLTP system are designed using a highly normalized relational schema. In reality, many more tables are involved, especially in sophisticated systems like mySAP ERP (formally SAP R/3). A very simplified example of a sales order is shown on the following figure. As described in the figure, normalization is the process of removing repeated data from a table to auxiliary connected tables, there by making the original table much smaller. This decrease in size, combined with only the most basic indexing scheme, helps when creating, updating, and deleting records. The price paid for this advantage is a decrease in performance for analysis-type queries.
Click here for Training in SAP BI
Normalized OLTP Database Schema
The Classic Star Schema: The EDW Database Schema
Multidimensional data models are needed for the creation of Enterprise Data Warehousing or OLAP applications, in other words, for analytical applications. The problems with OLTP's normalized design preclude it being used to support complex, ad hoc data analysis. The classic star schema, as illustrated in the following figures, is the most frequently used multidimensional model for relational databases. This database schema classifies two groups of data: facts (sales amount or quantity, for example) and dimension attributes (customer, material, or time, for example). Facts, sometimes called measures, are the focus of the analysis for a business process.
The fact data (values for the facts) is stored in a highly normalized fact table. The values of the dimension attributes are stored from a technical perspective, in various denormalized dimension tables. From a business perspective, these tables are collectively revered to as dimensions of the business process, or for short, dimensions. Here, logically related dimension attributes are stored as a hierarchy (parent-child relationships) within the dimension table. The dimension tables are linked relationally with the central fact table by way of key relationships. In the star schema design shown, the key of the dimension tables is a machine-generated dimension key (DIM ID) that uniquely defines a combination of dimension attribute values. The DIM ID (a sequentially assigned number) is a foreign key in the fact table. In this way, all data records in the fact table can be uniquely identified.
Note: The star schema shown below is not the most basic. The only difference between this star schema and a very basic on is that the more basic design would use the master data ID itself as the key to the dimension tables. Although this design is valid, it is not optimal, as records with null for a master data key value could not be easily processed.
Classic Star (Sequential Assigned Machine Generated DIM)
In the next figure, the business process changes to cost center transactions as opposed to sales data, and we add in a few details from the previous lessons on InfoObjects. This results in a functional view of the BI star schema. The nice thing about this design is that it works for all areas of the business.
Star Schema: Functional View
The following section explains the star schema in greater detail using the cost center transaction example from the figure above.
Populated Dimension Table: Cost Center Transactions
In a dimension table, any number of semantically related dimension attributes are stored in a hierarchy (parent-child relationship as a 1:N relationship). In the figure, the time dimension tables are made up of the dimension attributes Fiscal Year and Period/Year. If an M:N relationship exists between dimension attributes, they are normally stored in different
Another example with reference to our sales InfoCube might be the product dimension, which contains individual products. In most companies, products are grouped into product lines and sub-product lines or categories (for example, product X is a chocolate bar, which is part of the candy product line in the food category). This is just a simple example; the important point is that the fields category, product line, and others would be in the dimension table.
In a classic star schema, a dimension attribute can possess any number of described attributes, also called non-dimension attributes, which can be used as supplemental information sources. Described attributes always have a 1:1 relationship with the dimension attribute. In the first figure, .material name. is the described attribute for the dimension attribute material in the material dimension table.
A dimension attribute/described attribute consists of any number of values. For example, .hardware and software are assigned to the dimension attribute material group., and the values .monitor.and keyboard. are assigned to the described attribute material name..
Semantically speaking, the dimension tables in the classic star schema are often referred to as dimensions. A dimension (perspective) describes a possible user's (decision-maker's) view of the facts.
- Each classic star schema consists of one or more dimension tables.
- Each dimension table has a primary key, called the dimension key. In this example it was a machine-generated sequential number. In other, more basic designs, this key is determined by the dimension attribute with the highest level of detail. If we had shown a more basic version of the sales star, the product ID would have been the key to the product dimension. The dimension tables are linked to the central fact table by way of key relationships.
- The dimension tables are fully denormalized. In other words, repeated information is not broken out to different tables like in our OLTP example.
Note: From an OLTP perspective, the values of the dimension attributes or described attributes correspond to master data.
The Fact Table and the Completed Star Schema
- The fact table serves as the central table in the star. The measures or facts . or, inBI terms, the key figures . are aggregated via the fact table.
- Each classic star schema is made up of exactly one fact table.
- The fact table contains the fact data. It contains the facts sales with the fact data (50,000, 3,000, 100,000, and so on) and quantity with the fact data (100,60,250, and so on).
- The central fact table is connected to the surrounding dimension tables via a unique key. The primary key of the fact table is made up of all dimension keys (foreign keys). In the figure of the populated star schema, below, the primary key of the fact table is made up of the dimension keys Cost_Center_DIM_ID, COST_ELEMENT_DIM_ID, and TIME_DIM_ID. The result is that all data records (and so all fact data) in the fact table can be identified uniquely. In the diagram below, the fact data (50,000/ 100) is uniquely identified with the value combination (1, 1,1) of the dimension keys.
- The fact table is highly normalized.
Note: From an OLTP perspective, fact data corresponds to transaction data.
The figure above shows how the dimension tables and fact tables are arranged in a star formation. It also shows the connections between the denormalized dimension tables and the highly normalized fact table.
Populated Cost Center Transaction Star Schema
Storing data in the classic star schema optimizes it for reporting. It allows the user to view facts from a variety of perspectives (dimensions). A user may be interested in finding answers to the following questions:
- Who was responsible for the most Travel-Hotel expenses incurred?
- What Cost Element (Travel, Salaried Payroll, or Depreciation) is the largest expense we have?
- What are the total expenses for all the cost centers combined?
- When did the costs occur?
The BI Schema: An Extended Star Schema
Although the figure above is a functional definition of a star schema, from the BI system perspective it not complete. The complete BI schema on which we base much of the EDW is a much enhanced (refined) star schema. The improvements eliminate both technical and business-reporting problems experienced with the classic star schema.
Note: In this class we do not focus on all the specific tables of BI Extended Star schema. Our perspective is quasi technical, just enough to understand the advantages and be able to perform our back end administration tasks.
Before we can delve into the schema of a BI InfoCube, we need to review characteristics InfoObjects. We want to focus on master-data-bearing characteristics InfoObjects. The figure below shows two of the many master-data-bearing characteristics delivered by BI. Although characteristics InfoObjects are the attribute fields on the dimension tables, the characteristics that have their own master data tables connected to them are very important in our overall schema design.
Master-Data-Bearing Characteristics InfoObjects
Note: The system built an SID table when you activated your COSTC# InfoObject. An SID (master data ID) table is a table that contains a system-generated sequential number that links to a characteristic value.
The following figure shows the crossover between the classic star schema and the BI star schema, using cost center transactions as an example. Notice how a master-data-bearing characteristic used in the dimensions of the InfoCube is linked to its master data.
BI InfoCube: An Extended Star Schema
In the figure above, you can see how the BI star schema is an enhancement of the classic star schema. The enhancement comes from the fact that the dimension tables do not contain master data information. This master data information is stored in separate tables, called master data tables. We can say that the master data tables .extend. the star schema, resulting in an extended star schema.
Click here for Training in SAP BI
BI InfoCube (InfoCube)
InfoCubes are the central objects of the multidimensional model in BI. Most BEx reports and analyses are based on these. From a reporting perspective, a InfoCube describes a self-contained data set within a business area, for which you can define queries.
A InfoCube consists of a quantity of relational tables arranged multidimensionally, meaning that it consists of a central fact table surrounded by several dimension tables. SID tables link these dimension tables to their respective master data tables.
Hint: There are various types of InfoCubes in BI. The InfoCube with type InfoCube is the InfoCube most relevant for modeling discussions, since physical database objects (objects that contain data) are the core of your BI project.
A Bigger Example of an Extended Star Schema
The figure above shows more dimensions and more master data. As we discussed in the first section of this lesson, the facts in the fact table are referred to as key figures and the dimension attributes are known as characteristics. The dimension tables are linked the central fact table by way of key relationships. In contrast to the classic star schema, characteristics are not components of the dimension tables; in other words, the characteristic values are not stored in the dimension tables. A numerical SID key is generated for each characteristic.
This alias key replaces the characteristic as the component of the dimension table. Here, SID stands for Master Data ID or Surrogate ID (replacement key). In the Master-Data-Bearing Characteristic InfoObjects figure, this these keys are given the prefix SID_. For example, SID_Cost_Center is the SID key for the characteristic Cost Center . The dimension table has a generated numerical primary key, called the dimension key. Dimension tables are denoted with the prefix DIM_ID_. Here, DIM_ID_CostCenter is the dimension key for the Cost Center dimension. As in the classic star schema, the primary key of the fact table is made up of dimension keys.
The customer (you) can define up to 13 dimension tables for a InfoCube, andmust define at least one. SAP provides three dimension tables for a total maximum of 16. The Package (DIM_ID_DATAPAKET), Time (DIM_ID_TIME), and Unit ('DIM_ID_UNITS) are the three supplied by SAP. The time of a transaction is necessary to add meaning, so the Time dimension is a required. The Unit dimension stores the unit of measure or currency of the key figures, again a critical piece of information. Finally, the Package dimension is used as is a technical
A Complete BI InfoCube: Extended Star Schema
One special aspect of the master data is that is shared (linked) with all InfoCubes that have the associated characteristic InfoObject as part of a dimension. This was made possible by removal of master data from the dimension tables using SID technology to create the links. The result is that you to use the master data with different BI InfoCubes. In other words, the master data is InfoCube-independent, and can be used by several queries off several different InfoCubes at the same time. This concept is shown in the figure below.
Shared Master Data Across Basic InfoCubes
Classic Star Schema Compared to the BI Star Schema
First, let us compare terminology for the two schemas.
Terminology Comparison: Basic Star vs. BI Extended Star
Note: The table shows how dimensions in BI not only consist of dimension tables, but also contain master data and SID tables.
Advantages and Disadvantages of the Classic Star Schema
- Data access performs fairly well due to the small number of joining operations (there are only join operations between the fact tables and the involved dimension tables).
Click here for Training in SAP BI
- Redundant entries exist in the dimension tables.
- In contrast to the historization (how time is modeled) of fact data (the time reference is given implicitly via the time dimension table), historization of dimensions (slowly changing dimension) is not easy to model.
- The multilingual capability is cumbersome.
- Modeling some hierarchy types in a dimension (parallel and imbalanced hierarchies, for example) can lead to anomalies.
- Query performance is lessened, since aggregates and Basis fact data are stored in the same table (fact table).
Advantages of the BI Star SchemaThanks to the SIDs, the link to the master data from the dimension tables, the following modeling possibilities exist:
- Easy modeling of slow-moving dimension (time dependant master data)
- Multilingual capability
- Cross-cube use of master data (similar to shared dimensions)
- Ability to handle null values for a characteristic
- The use of automatically generated INT4 keys (SID keys and DIM ID keys) enables faster access to data than via long alphanumeric keys (all of our big tables are 100% numbers).
Hint: Another enhancement to the classic star schema is the summarization of aggregated key figures in their own fact tables by the construction of aggregates that were previously not taken into account in the original BI star schema design.