What is snowflaking (snowflake schema)? | Definition from TechTarget (2024)

Robert Sheldon

What is snowflaking (snowflake schema) in warehousing?

In data warehousing, snowflaking is a form of dimensional modeling in which dimensions are stored in multiple related dimension tables. A snowflake schema is a variation of the star schema that normalizes the dimension tables to increase data integrity, simplify data maintenance and reduce the amount of disk space required. In certain situations, it can also improve query performance.

A star schema consists of a central fact table that references multiple dimension tables. Each dimension table is denormalized ("flattened") to avoid the query overhead that comes with a highly normalized schema, which can require a large number of joins to retrieve the necessary data.
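To make the contrast concrete, the following minimal sketch builds the same product dimension both ways in an in-memory SQLite database and runs the same report against each. All table names, column names and sample rows (dim_product_star, dim_product_snow, dim_product_type, fact_sales) are hypothetical and chosen only for illustration. Note that the snowflaked version needs one more join to produce the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: one flattened product dimension; the type name repeats per product.
CREATE TABLE dim_product_star (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    product_type TEXT NOT NULL
);

-- Snowflake schema: the product type is normalized into its own table.
CREATE TABLE dim_product_type (
    type_key  INTEGER PRIMARY KEY,
    type_name TEXT NOT NULL UNIQUE
);
CREATE TABLE dim_product_snow (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    type_key     INTEGER NOT NULL REFERENCES dim_product_type(type_key)
);

-- Central fact table; it references the product dimension by key in either design.
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,
    product_key INTEGER NOT NULL,
    amount      REAL NOT NULL
);
""")

# Hypothetical sample data, loaded into both versions of the dimension.
conn.executemany(
    "INSERT INTO dim_product_star VALUES (?, ?, ?)",
    [(1, "Trail Bike", "Bicycle"), (2, "Road Bike", "Bicycle"), (3, "Helmet", "Accessory")],
)
conn.executemany(
    "INSERT INTO dim_product_type VALUES (?, ?)",
    [(1, "Bicycle"), (2, "Accessory")],
)
conn.executemany(
    "INSERT INTO dim_product_snow VALUES (?, ?, ?)",
    [(1, "Trail Bike", 1), (2, "Road Bike", 1), (3, "Helmet", 2)],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 1, 900.0), (2, 2, 1200.0), (3, 3, 60.0)],
)

# Star: a single join from the fact table to the flattened dimension.
star_rows = conn.execute("""
    SELECT p.product_type, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_star p ON p.product_key = f.product_key
    GROUP BY p.product_type
    ORDER BY p.product_type
""").fetchall()

# Snowflake: the same report needs a second join to reach the type name.
snow_rows = conn.execute("""
    SELECT t.type_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_snow p ON p.product_key = f.product_key
    JOIN dim_product_type t ON t.type_key = p.type_key
    GROUP BY t.type_name
    ORDER BY t.type_name
""").fetchall()

print(star_rows)
print(snow_rows)
```

Both queries return the same totals per product type. The difference lies in how many tables each query has to touch.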

What is the purpose of the snowflake schema?

The normalized dimensions in a snowflake schema reduce the amount of redundant data, making the schema less susceptible to data integrity issues than a star schema. Whenever multiple copies of the same data are stored in a database, as is the case with the star schema, there is a greater risk that extract, load and transform (ELT) operations will result in problems with the data.

With a snowflake schema, data maintenance is easier and less likely to cause data integrity issues. There is less redundant data, and that data is organized into separate tables, simplifying the processes of adding data and updating data. A snowflake schema also requires less disk space for data storage.
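The maintenance argument can be illustrated with a small sketch, again using SQLite and hypothetical table names and data (dim_customer_star, dim_customer_snow, dim_country, and 1,000 made-up customers). It renames a country in both designs and counts how many rows each update has to modify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flattened customer dimension: the country name is stored on every row.
CREATE TABLE dim_customer_star (
    customer_key INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    country      TEXT NOT NULL
);

-- Snowflaked customer dimension: the country name is stored once.
CREATE TABLE dim_country (
    country_key INTEGER PRIMARY KEY,
    country     TEXT NOT NULL UNIQUE
);
CREATE TABLE dim_customer_snow (
    customer_key INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    country_key  INTEGER NOT NULL REFERENCES dim_country(country_key)
);
""")

# 1,000 hypothetical customers, all in the same country.
customers = [(i, f"Customer {i}") for i in range(1, 1001)]
conn.executemany(
    "INSERT INTO dim_customer_star VALUES (?, ?, 'Czech Republic')", customers
)
conn.execute("INSERT INTO dim_country VALUES (1, 'Czech Republic')")
conn.executemany("INSERT INTO dim_customer_snow VALUES (?, ?, 1)", customers)

# Renaming the country: every customer row changes in the flattened table...
star_rows_touched = conn.execute(
    "UPDATE dim_customer_star SET country = 'Czechia' WHERE country = 'Czech Republic'"
).rowcount

# ...but only one row changes in the normalized country table.
snow_rows_touched = conn.execute(
    "UPDATE dim_country SET country = 'Czechia' WHERE country = 'Czech Republic'"
).rowcount

print(star_rows_touched, snow_rows_touched)
```

The fewer rows an update has to touch, the fewer chances there are for copies of the same value to drift out of sync during a load.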

That said, a star schema is easier to set up than a snowflake schema. Developers can also build and update queries more easily, and those queries generally perform better because there are fewer joins. Many sources, including the Kimball Group, a data warehousing consultancy, generally recommend against the snowflake schema, although other sources, such as IBM, suggest that snowflaking is a viable alternative in some cases. However, snowflaking is rarely recommended simply to minimize disk space.

The snowflake schema might be used to support specific query needs, or it might be used when the data itself is not conducive to being easily denormalized. The decision to use a snowflake schema will often depend on the type of queries that will be supported and on the data being stored in the data warehouse. For instance, business intelligence (BI) applications that use a relational OLAP (ROLAP) architecture might perform better when the data warehouse schema is snowflaked.

Snowflaking should also be considered when evaluating the dimensions themselves. For example, you should consider using it in the following circumstances:

The dimension contains sparsely populated attributes whose values are mostly NULL.
The dimension supports a many-to-many relationship but limits the number of potential instances. For example, a data warehouse for a streaming video service might have to limit the number of categories that can be associated with each video if the dimension is denormalized.
The dimension is very large and full of redundant data in low cardinality attributes. For instance, a company's data warehouse might include a customer dimension that contains over a million rows of data, including a substantial amount of redundant geographical data.
The dimension includes low cardinality attributes that are queried independently. For example, a product dimension might contain thousands of records, but only a handful of product types. Moving the product type attribute to its own dimension table can improve performance when the product types are queried independently.
The dimension's attributes are part of a hierarchy and are queried independently, such as the year, quarter and month attributes of a date hierarchy or the country and state attributes of a geographic hierarchy.
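As one illustration of the many-to-many case in the list above, the sketch below contrasts a flattened video dimension, whose fixed category columns cap the number of categories per video, with one possible normalized layout that links videos to categories through a separate table. The table names (dim_video_flat, dim_video, dim_category, video_category) and the sample data are assumptions made for this example, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flattened dimension: two category columns, so at most two categories per video.
CREATE TABLE dim_video_flat (
    video_key  INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    category_1 TEXT,
    category_2 TEXT
);

-- Normalized layout: categories live in their own table and are linked
-- to videos through a table that holds one row per (video, category) pair.
CREATE TABLE dim_video (
    video_key INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE dim_category (
    category_key INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE
);
CREATE TABLE video_category (
    video_key    INTEGER NOT NULL REFERENCES dim_video(video_key),
    category_key INTEGER NOT NULL REFERENCES dim_category(category_key),
    PRIMARY KEY (video_key, category_key)
);
""")

# Count the category columns available in the flattened design.
flat_capacity = sum(
    1
    for col in conn.execute("PRAGMA table_info(dim_video_flat)")
    if col[1].startswith("category_")
)

# One hypothetical video tagged with four categories in the normalized design.
conn.execute("INSERT INTO dim_video VALUES (1, 'Example Documentary')")
conn.executemany(
    "INSERT INTO dim_category VALUES (?, ?)",
    [(1, "Documentary"), (2, "Nature"), (3, "Science"), (4, "Family")],
)
conn.executemany(
    "INSERT INTO video_category VALUES (1, ?)", [(1,), (2,), (3,), (4,)]
)

categories = [
    row[0]
    for row in conn.execute("""
        SELECT c.name
        FROM video_category vc
        JOIN dim_category c ON c.category_key = vc.category_key
        WHERE vc.video_key = 1
        ORDER BY c.name
    """)
]

print(flat_capacity, categories)
```

Adding a fifth category in the normalized layout is one more row, whereas the flattened table would need a schema change.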

Query performance will often drive the decision of whether to use a star schema or snowflake schema. However, data architects should still keep in mind other factors, such as data maintenance, data storage and the resources needed to develop and maintain the schema and queries.

This was last updated in July 2023

FAQs

What is a snowflake schema and what is its purpose?

A snowflake schema is a multi-dimensional data model that is an extension of a star schema, where dimension tables are broken down into subdimensions. Snowflake schemas are commonly used for business intelligence and reporting in OLAP data warehouses, data marts, and relational databases.

What is a snowflake schema vs. a star schema?

A star schema has denormalized dimension tables, while a snowflake schema has normalized dimension tables. A star schema is easier to design and implement than a snowflake schema. A star schema can be more efficient to query than a snowflake schema, because there are fewer JOINs between tables.

What is the Snowflake Information Schema?

The Snowflake Information Schema (aka “Data Dictionary”) consists of a set of system-defined views and table functions that provide extensive metadata information about the objects created in your account.

What is a starflake schema?

A starflake schema is a combination of a star schema and a snowflake schema. Starflake schemas are snowflake schemas where only some of the dimension tables have been denormalized. Starflake schemas aim to leverage the benefits of both star schemas and snowflake schemas.

What is the main purpose of Snowflake?

Snowflake enables data storage, processing, and analytic solutions that are faster, easier to use, and far more flexible than traditional offerings. The Snowflake data platform is not built on any existing database technology or “big data” software platforms such as Hadoop.

Why is it called a star schema?

The star schema gets its name from the physical model's resemblance to a star shape with a fact table at its center and the dimension tables surrounding it representing the star's points.

What is the difference between a schema and a view in Snowflake?

A view allows the result of a query to be accessed as if it were a table. The query is specified in the CREATE VIEW statement. If the schema is not specified, then Snowflake assumes that the table is in the same schema as the view.

How many fact tables are in a snowflake schema?

The snowflake schema consists of one fact table that is connected to many dimension tables, which can be connected to other dimension tables through a many-to-one relationship.
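A chain of many-to-one relationships of this kind might look like the following sketch, which uses a geographic hierarchy in SQLite. The tables (fact_sales, dim_store, dim_state, dim_country) and their rows are hypothetical. Each store belongs to one state and each state to one country, so a report by country walks the chain outward from the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Outermost level of the hierarchy.
CREATE TABLE dim_country (
    country_key  INTEGER PRIMARY KEY,
    country_name TEXT NOT NULL
);
-- Many states to one country.
CREATE TABLE dim_state (
    state_key   INTEGER PRIMARY KEY,
    state_name  TEXT NOT NULL,
    country_key INTEGER NOT NULL REFERENCES dim_country(country_key)
);
-- Many stores to one state.
CREATE TABLE dim_store (
    store_key  INTEGER PRIMARY KEY,
    store_name TEXT NOT NULL,
    state_key  INTEGER NOT NULL REFERENCES dim_state(state_key)
);
-- The single fact table references only the store dimension directly.
CREATE TABLE fact_sales (
    sale_id   INTEGER PRIMARY KEY,
    store_key INTEGER NOT NULL REFERENCES dim_store(store_key),
    amount    REAL NOT NULL
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO dim_country VALUES (?, ?)", [(1, "USA"), (2, "Canada")])
conn.executemany(
    "INSERT INTO dim_state VALUES (?, ?, ?)",
    [(1, "Oregon", 1), (2, "Texas", 1), (3, "Ontario", 2)],
)
conn.executemany(
    "INSERT INTO dim_store VALUES (?, ?, ?)",
    [(1, "Portland", 1), (2, "Austin", 2), (3, "Toronto", 3)],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 1, 100.0), (2, 2, 250.0), (3, 3, 75.0), (4, 1, 50.0)],
)

# Sales by country: fact -> store -> state -> country.
sales_by_country = conn.execute("""
    SELECT c.country_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store   s  ON s.store_key    = f.store_key
    JOIN dim_state   st ON st.state_key   = s.state_key
    JOIN dim_country c  ON c.country_key  = st.country_key
    GROUP BY c.country_name
    ORDER BY c.country_name
""").fetchall()

# The country list can be read on its own without touching stores or sales.
country_names = [r[0] for r in conn.execute(
    "SELECT country_name FROM dim_country ORDER BY country_name"
)]

print(sales_by_country)
print(country_names)
```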

Where are Snowflake schemas stored?

All data in Snowflake is maintained in databases. Each database consists of one or more schemas, which are logical groupings of database objects, such as tables and views. Snowflake does not place any hard limits on the number of databases, schemas (within a database), or objects (within a schema) you can create.

Is Snowflake schema-on-read or schema-on-write?

While Snowflake supports both Schema-on-Read and Schema-on-Write, the public preview of the Schema Detection feature improves Snowflake's Schema-on-Write capabilities and can greatly decrease the amount of effort at the beginning of data ingestion.

What is a database in Snowflake?

A Snowflake database is where an organization's uploaded structured and semistructured data sets are held for processing and analysis. Snowflake automatically manages all parts of the data storage process, including organization, structure, metadata, file size, compression, and statistics.

What is a snowflake schema example?

For example, a fact table might contain data on sales transactions, while the dimension tables might contain data on customers, products, and stores. The snowflake schema can be useful for organizing and querying large, complex databases because, for some query patterns, it allows for more efficient querying.

Which one is better, star schema or snowflake schema?

Star schemas join the fact table only with the dimension tables, leading to simpler, faster SQL queries. Snowflake schemas have less redundant data, so they're easier to maintain. Snowflake schemas are good for data warehouses, whereas star schemas are better for data marts with simple relationships.

Why use a snowflake schema?

The benefits of the snowflake schema include reduced data redundancy, optimized storage space, streamlined data organization and, in some cases, better query performance.

What are the use cases of a snowflake schema?

When to Use a Snowflake Schema in Power BI

Normalized Data Structure. Scenario: When your source data is highly normalized. ...
Complex Data Relationships. ...
Efficient Storage and Data Integrity. ...
Data Consistency and Maintenance. ...
Large and Complex Data Models. ...
Reducing Data Duplication. ...
Detailed Hierarchical Reporting.

Jun 12, 2024

What are the benefits of Snowflake?

Unlike traditional data warehouses, Snowflake operates on a cloud-based architecture, which allows for faster data processing, scalability, and flexibility. It separates storage and compute resources, ensuring that you only pay for what you use and can scale your resources up or down according to your needs.

What are the requirements for a snowflake schema?

Join requirements: Compared with a star schema, a snowflake schema requires more joins to develop hierarchies. The need for more complex SQL queries that involve multiple table joins can impact performance, especially when dealing with large data sets.