The star and snowflake schema are logical storage designs commonly found in data marts and data warehouse architecture. While common database types use ER (Entity-Relationship) diagrams, the logical structure of warehouses uses dimensional models to conceptualize the storage system.
Read on to find out the differences, characteristics, and flaws of the star and snowflake schemas.
Star Schema vs. Snowflake Schema: The Main Difference
The two main elements of the dimensional model of the star and snowflake schema are:
1. Facts table. A table with the most considerable amount of data, also known as a cube.
2. Dimension tables. The derived data structure provides answers to ad hoc queries or dimensions, often called lookup tables.
Connecting chosen dimensions on a facts table forms the schema. Both the star and snowflake schemas make use of the dimensionality of data to model the storage system.
The main differences between the two schemas are:
|Fact table Dimension tables
|Fact table Dimension tables Subdimension tables
|One table per dimension
|Multiple tables for each dimension
|Uses more storage
|Uses less space
|Denormalized dimension tables
|Normalized dimension tables
|Fast, fewer JOINs needed because of fewer foreign keys
|Slow, more JOINs required because of more foreign keys
|Simple and easier to understand
|Complicated and more challenging to understand
|Dimension tables with several rows, typical with data marts
|Dimension tables with multiple rows found with data warehouses
Due to the complexity of the snowflake schema and the lower performances, the star schema is the preferred option whenever possible. One typical way to get around the problems in the snowflake schema is to decompose the dedicated storage into multiple smaller entities with a star schema.
What Is a Star Schema?
A star schema is a logical structure for the development of data marts and simpler data warehouses. The simple model consists of dimension tables connected to a facts table in the center.
The facts table typically consists of:
- Quantifiable numerical data, such as values or counts.
- References to the dimensions through foreign keys.
The lookup tables represent descriptive information directly connected to the facts table.
For example, to model the sales of an ecommerce business, the facts table for purchases might contain the total price of the purchase. On the other hand, dimensional tables have descriptive information about the items, customer data, the time or location of purchase.
The star schema for the analysis of purchases in the example has four dimensions. The facts table connects to the dimensional tables through the concept of foreign and primary keys. Apart from the numerical data, the facts table therefore also consists of foreign keys to define relations between tables.
Characteristics of a Star Schema
The main characteristics of the star schema are:
- Simplified and fast queries. Fewer JOIN operations due to denormalization make information more readily available.
- Simple relationships. The schema works great with one-to-one or one-to-many relationships.
- Singular dimensionality. One table describes each dimension.
- OLAP friendly. OLAP systems widely use star schema to design data cubes.
Drawbacks of a Star Schema
The disadvantages of using the star schema are:
- Redundancy. The dimensional tables are one-dimensional, and data redundancy is present.
- Low integrity. Due to denormalization, updating information is a complex task.
- Limited queries. The set of questions is limited, which also narrows down the analytical power.
What Is a Snowflake Schema?
The snowflake schema has a branched-out logical structure used in large data warehouses. From the center to the edges, entity information goes from general to more specific.
Apart from the dimensional model's common elements, the snowflake schema further decomposes dimensional tables into subdimensions.
The ecommerce sales analysis model from the previous example further branches ("snowflakes") into smaller categories and subcategories of interest.
The four dimensions decompose into subdimensions. The lookup tables further normalize through a series of connected objects.
Characteristics of a Snowflake Schema
The main features of the snowflake schema include:
- Small storage. The snowflake schema does not require as much storage space.
- High granularity. Dividing tables into subdimensions allows analysis at various depths of interest. Adding new subdimensions is a simple process as well.
- Integrity. Due to normalization, the schema has a higher level of data integrity and low redundancies.
Drawbacks of a Snowflake Schema
The weaknesses of the snowflake schema are:
- Complexity. The database model is complex, and so are the executed queries. Multiple multidimensional tables make the design complicated to work with overall.
- Slow processing. Many lookup tables require multiple JOIN operations, which slows down information retrieval.
- Hard to maintain. A high level of granularity makes the schema hard to manage and maintain.
The star and snowflake schema are both logical designs of database storage systems. Both schemas show up in big data systems, where the amount of information is massive and hard to maintain.
For more information about big data platforms, check out our article about the Hadoop platform.