What is snowflaking (snowflake schema)? | Definition from TechTarget (2024)

By

  • Robert Sheldon

What is snowflaking (snowflake schema) in data warehousing?

In data warehousing, snowflaking is a form of dimensional modeling in which dimensions are stored in multiple related dimension tables. A snowflake schema is a variation of the star schema that normalizes the dimension tables to increase data integrity, simplify data maintenance and reduce the amount of disk space required. In certain situations, it can also improve query performance.

A star schema consists of a central fact table that references multiple dimension tables. Each dimension table is denormalized ("flattened") to avoid the query overhead that comes with a highly normalized schema, which can require a large number of joins to retrieve the necessary data.

Figure 1 shows a basic example of a star schema. It includes one fact table (green) and four dimension tables (blue). The fact table contains multiple foreign keys that reference the dimension tables. The data in each dimension table is denormalized, making queries fast and efficient.

Figure 1. A star schema with one fact table and four dimension tables.
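The star layout described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the dimension tables and the territory, interest, product type and supplier columns follow names used in this article, while factSales, SalesAmount, the key columns, and the other name, date, month and day columns are hypothetical and not taken from Figure 1.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing four flattened dimensions.
# factSales, SalesAmount and all *Key / *Name helper columns are assumptions.
STAR_DDL = """
CREATE TABLE dimTerritory (
    TerritoryKey INTEGER PRIMARY KEY,
    TerritoryName TEXT,
    TerritoryCountry TEXT,
    TerritoryRegion TEXT
);
CREATE TABLE dimProduct (
    ProductKey INTEGER PRIMARY KEY,
    ProductName TEXT,
    ProductType TEXT,
    ProductSupplier TEXT
);
CREATE TABLE dimCustomer (
    CustomerKey INTEGER PRIMARY KEY,
    CustomerName TEXT,
    CustomerInterest1 TEXT,
    CustomerInterest2 TEXT,
    CustomerInterest3 TEXT
);
CREATE TABLE dimDate (
    DateKey INTEGER PRIMARY KEY,
    FullDate TEXT,
    MonthName TEXT,
    DayName TEXT
);
CREATE TABLE factSales (
    SalesKey INTEGER PRIMARY KEY,
    TerritoryKey INTEGER REFERENCES dimTerritory(TerritoryKey),
    ProductKey INTEGER REFERENCES dimProduct(ProductKey),
    CustomerKey INTEGER REFERENCES dimCustomer(CustomerKey),
    DateKey INTEGER REFERENCES dimDate(DateKey),
    SalesAmount REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(STAR_DDL)
conn.execute(
    "INSERT INTO dimTerritory VALUES (1, 'Baja California', 'Mexico', 'North America')"
)
conn.execute("INSERT INTO factSales (TerritoryKey, SalesAmount) VALUES (1, 250.0)")

# A single join reaches every territory attribute, including the region.
star_rows = conn.execute("""
    SELECT t.TerritoryRegion, SUM(f.SalesAmount)
    FROM factSales AS f
    JOIN dimTerritory AS t ON t.TerritoryKey = f.TerritoryKey
    GROUP BY t.TerritoryRegion
""").fetchall()
```

Because each dimension is flattened, a rollup by region needs only the one join between the fact table and dimTerritory.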

Denormalized dimension tables can contain a significant amount of redundant data. For instance, the dimTerritory table includes the TerritoryName, TerritoryCountry and TerritoryRegion columns. Together, these columns form the hierarchy TerritoryRegion > TerritoryCountry > TerritoryName. An example of this might be North America > Mexico > Baja California.

Columns such as TerritoryRegion and TerritoryCountry can have a low cardinality, which refers to the number of unique values relative to the number of rows. For a fixed set of distinct values, the more rows the table holds, the lower the cardinality and the more redundant the data. Although data queries are generally faster in a star schema, the schema can also be more prone to data integrity issues and require more disk space than a more normalized structure.
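As a rough illustration of cardinality in the sense used here (unique values relative to rows), the ratio can be computed directly. The function name cardinality_ratio and the sample values are made up for this sketch.

```python
def cardinality_ratio(values):
    """Return distinct values divided by total rows (1.0 means every row is unique)."""
    values = list(values)
    if not values:
        raise ValueError("need at least one row")
    return len(set(values)) / len(values)


# Made-up sample column: six rows but only two distinct countries.
territory_country = ["Mexico", "Mexico", "Mexico", "Canada", "Canada", "Mexico"]
ratio = cardinality_ratio(territory_country)
```

A ratio close to zero suggests a column that repeats a few values across many rows, which is the kind of attribute snowflaking moves into its own table.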

In a snowflake schema, a fact table is surrounded by its associated dimensions (as in a star schema), but those dimensions are further related to other dimensions, branching out into a snowflake pattern. Snowflaking normalizes the dimensions by moving attributes with low cardinality into separate dimension tables.

The fact table in a snowflake schema uses foreign keys to reference the core dimensions, which in turn use foreign keys to reference the dimensions at the next level. The illustration in Figure 2 shows a snowflake schema that was created from the star schema shown in Figure 1, with each of the original dimensions now normalized.

Figure 2. A snowflake schema created by normalizing the dimensions of the star schema in Figure 1.
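The chain of foreign keys from the fact table out to the next level of dimensions might look like the following sqlite3 sketch. The dimTerritory, dimCountry and dimRegion table names come from this article; factSales, SalesAmount, the key columns, the RegionName and CountryName columns, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimRegion (RegionKey INTEGER PRIMARY KEY, RegionName TEXT);
CREATE TABLE dimCountry (
    CountryKey INTEGER PRIMARY KEY,
    CountryName TEXT,
    RegionKey INTEGER REFERENCES dimRegion(RegionKey)
);
CREATE TABLE dimTerritory (
    TerritoryKey INTEGER PRIMARY KEY,
    TerritoryName TEXT,
    CountryKey INTEGER REFERENCES dimCountry(CountryKey)
);
-- Hypothetical fact table; it references only the core dimension.
CREATE TABLE factSales (
    SalesKey INTEGER PRIMARY KEY,
    TerritoryKey INTEGER REFERENCES dimTerritory(TerritoryKey),
    SalesAmount REAL
);
INSERT INTO dimRegion VALUES (1, 'North America');
INSERT INTO dimCountry VALUES (1, 'Mexico', 1);
INSERT INTO dimTerritory VALUES (1, 'Baja California', 1), (2, 'Sonora', 1);
INSERT INTO factSales (TerritoryKey, SalesAmount) VALUES (1, 250.0), (2, 100.0);
""")

# Reaching the region now takes three joins: fact -> territory -> country -> region.
snowflake_rows = conn.execute("""
    SELECT r.RegionName, SUM(f.SalesAmount)
    FROM factSales AS f
    JOIN dimTerritory AS t ON t.TerritoryKey = f.TerritoryKey
    JOIN dimCountry AS c ON c.CountryKey = t.CountryKey
    JOIN dimRegion AS r ON r.RegionKey = c.RegionKey
    GROUP BY r.RegionName
""").fetchall()
```

Each region and country name is stored once, at the cost of extra joins whenever a query rolls facts up to those levels.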

The fact table still links to the four core dimensions, but those dimensions have been normalized in the following ways:

  • The dimCustomer dimension no longer includes the CustomerInterest1, CustomerInterest2 and CustomerInterest3 columns. Instead, a separate dimInterestArea table has been created, with a bridge (junction) table added between the dimInterestArea table and the dimCustomer table. The bridge table makes it possible to support a many-to-many relationship in which one customer can be associated with multiple interest areas and an interest area can be associated with multiple customers. It also enables customers to be associated with more than three interests.
  • The ProductType and ProductSupplier columns in the dimProduct table have been replaced with key columns that reference the new dimProductType and dimSupplier tables, removing the redundant data from the dimProduct table.
  • The hierarchical data in the dimTerritory table is now fully normalized. The table contains a foreign key that references the dimCountry table, and the dimCountry table contains a foreign key that references the dimRegion table. The data is now spread across the three dimension tables, eliminating the redundancy in the original dimTerritory table.
  • The month and day data have been removed from the dimDate table and replaced with key columns that reference two new dimension tables, eliminating the many duplicated instances of the day and month names. The dimDate table can also be normalized in other ways, as warranted by the supported workloads and date range covered by the dimension.
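The bridge table from the first bullet can be sketched as follows. The dimCustomer and dimInterestArea names come from this article; the bridge table name bridgeCustomerInterest, the key and name columns, and the sample customers and interests are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimCustomer (CustomerKey INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE dimInterestArea (InterestKey INTEGER PRIMARY KEY, InterestName TEXT);
-- Hypothetical bridge (junction) table: one row per customer/interest pair.
CREATE TABLE bridgeCustomerInterest (
    CustomerKey INTEGER REFERENCES dimCustomer(CustomerKey),
    InterestKey INTEGER REFERENCES dimInterestArea(InterestKey),
    PRIMARY KEY (CustomerKey, InterestKey)
);
INSERT INTO dimCustomer VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO dimInterestArea VALUES
    (1, 'Cycling'), (2, 'Hiking'), (3, 'Cooking'), (4, 'Chess');
-- Ana has four interests, more than three fixed columns could hold.
INSERT INTO bridgeCustomerInterest VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 2);
""")

# One customer, many interests.
interests_of_ana = [row[0] for row in conn.execute("""
    SELECT i.InterestName
    FROM bridgeCustomerInterest AS b
    JOIN dimInterestArea AS i ON i.InterestKey = b.InterestKey
    WHERE b.CustomerKey = 1
    ORDER BY i.InterestName
""")]

# One interest, many customers.
customers_hiking = [row[0] for row in conn.execute("""
    SELECT c.CustomerName
    FROM bridgeCustomerInterest AS b
    JOIN dimCustomer AS c ON c.CustomerKey = b.CustomerKey
    WHERE b.InterestKey = 2
    ORDER BY c.CustomerName
""")]
```

The composite primary key prevents the same customer/interest pair from being recorded twice, which is one common design choice for a junction table rather than a requirement of snowflaking.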

Some sources consider a schema to be snowflaked if at least one of the dimensions is normalized, while other sources insist that all dimensions must be normalized. A schema that is only partially normalized is sometimes referred to as a starflake schema because it combines the characteristics of both the star and snowflake schemas.

What is the purpose of the snowflake schema?

The normalized dimensions in a snowflake schema reduce the amount of redundant data, making the schema less susceptible to data integrity issues than a star schema. Whenever multiple copies of the same data are stored in a database, as is the case with the star schema, there is a greater risk that extract, load and transform (ELT) operations will result in problems with the data.

With a snowflake schema, data maintenance is easier and less likely to cause data integrity issues. There is less redundant data, and that data is organized into separate tables, simplifying the processes of adding data and updating data. A snowflake schema also requires less disk space for data storage.
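The maintenance difference can be illustrated by counting how many rows a single logical change touches. In this sketch, starTerritory stands in for a flattened territory dimension and dimRegion for its snowflaked counterpart; starTerritory, RegionKey, RegionName and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flattened dimension: the region name is repeated on every territory row.
CREATE TABLE starTerritory (
    TerritoryName TEXT, TerritoryCountry TEXT, TerritoryRegion TEXT
);
INSERT INTO starTerritory VALUES
    ('Baja California', 'Mexico', 'North America'),
    ('Sonora', 'Mexico', 'North America'),
    ('Ontario', 'Canada', 'North America');
-- Snowflaked dimension: the region name is stored once.
CREATE TABLE dimRegion (RegionKey INTEGER PRIMARY KEY, RegionName TEXT);
INSERT INTO dimRegion VALUES (1, 'North America');
""")

# Renaming the region must rewrite every matching row in the flattened table...
star_touched = conn.execute(
    "UPDATE starTerritory SET TerritoryRegion = 'NA' "
    "WHERE TerritoryRegion = 'North America'"
).rowcount

# ...but only a single row in the normalized table.
snowflake_touched = conn.execute(
    "UPDATE dimRegion SET RegionName = 'NA' WHERE RegionKey = 1"
).rowcount
```

If a load job updated only some of the flattened rows, the table would hold two spellings of the same region, which is the kind of integrity problem the normalized layout avoids.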

That said, a star schema is easier to set up than a snowflake schema. Developers can also build and update queries more easily, and those queries generally perform better because there are fewer joins. Many sources, including the Kimball Group, a data warehousing consultancy, generally recommend against the snowflake schema, although other sources, such as IBM, suggest that snowflaking is a viable alternative in some cases. However, snowflaking is rarely recommended simply to minimize disk space.

The snowflake schema might be used to support specific query needs, or it might be used when the data itself is not conducive to being easily denormalized. The decision to use a snowflake schema will often depend on the type of queries that will be supported and on the data being stored in the data warehouse. For instance, business intelligence (BI) applications that use a relational OLAP (ROLAP) architecture might perform better when the data warehouse schema is snowflaked.

Snowflaking should also be considered when evaluating the dimensions themselves. For example, you should consider using it in the following circumstances:

  • The dimension contains sparsely populated attributes whose values are mostly NULL.
  • The dimension supports a many-to-many relationship but limits the number of potential instances. For example, a data warehouse for a streaming video service might have to limit the number of categories that can be associated with each video if the dimension is denormalized.
  • The dimension is very large and full of redundant data in low cardinality attributes. For instance, a company's data warehouse might include a customer dimension that contains over a million rows of data, including a substantial amount of redundant geographical data.
  • The dimension includes low cardinality attributes that are queried independently. For example, a product dimension might contain thousands of records, but only a handful of product types. Moving the product type attribute to its own dimension table can improve performance when the product types are queried independently.
  • The dimension's attributes are part of a hierarchy and are queried independently, such as the year, quarter and month attributes of a date hierarchy or the country and state attributes of a geographic hierarchy.
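The case of a low cardinality attribute that is queried on its own can be sketched like this. The dimProduct and dimProductType names appear earlier in this article; dimProductFlat, the key and name columns, and the generated sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flattened product dimension: the type name is repeated on every row.
CREATE TABLE dimProductFlat (
    ProductKey INTEGER PRIMARY KEY, ProductName TEXT, ProductType TEXT
);
-- Snowflaked version: product types live in their own small table.
CREATE TABLE dimProductType (ProductTypeKey INTEGER PRIMARY KEY, ProductTypeName TEXT);
CREATE TABLE dimProduct (
    ProductKey INTEGER PRIMARY KEY,
    ProductName TEXT,
    ProductTypeKey INTEGER REFERENCES dimProductType(ProductTypeKey)
);
INSERT INTO dimProductType VALUES (1, 'Bike'), (2, 'Helmet');
""")

# Generate 1,000 made-up products that alternate between the two types.
for key in range(1, 1001):
    type_key = 1 if key % 2 else 2
    type_name = "Bike" if type_key == 1 else "Helmet"
    conn.execute(
        "INSERT INTO dimProductFlat VALUES (?, ?, ?)", (key, f"Product {key}", type_name)
    )
    conn.execute(
        "INSERT INTO dimProduct VALUES (?, ?, ?)", (key, f"Product {key}", type_key)
    )

# Listing the types from the flattened table means de-duplicating every product row.
flat_types = [r[0] for r in conn.execute(
    "SELECT DISTINCT ProductType FROM dimProductFlat ORDER BY ProductType"
)]
# The snowflaked table answers the same question from a two-row table.
snowflaked_types = [r[0] for r in conn.execute(
    "SELECT ProductTypeName FROM dimProductType ORDER BY ProductTypeName"
)]

flat_row_count = conn.execute("SELECT COUNT(*) FROM dimProductFlat").fetchone()[0]
type_row_count = conn.execute("SELECT COUNT(*) FROM dimProductType").fetchone()[0]
```

Both queries return the same list, but the second reads a table with two rows instead of a thousand. Whether that translates into a measurable gain depends on the database engine, its indexes and the workload.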

Query performance will often drive the decision of whether to use a star schema or snowflake schema. However, data architects should still keep in mind other factors, such as data maintenance, data storage and the resources needed to develop and maintain the schema and queries.

This was last updated in July 2023

FAQs

What is snowflaking (snowflake schema)? ›

In data warehousing, snowflaking is a form of dimensional modeling in which dimensions are stored in multiple related dimension tables. A snowflake schema is a variation of the star schema that normalizes the dimension tables to increase data integrity, simplify data maintenance and reduce the amount of disk space.

What is a snowflake schema and what is its purpose? ›

A snowflake schema is a multi-dimensional data model that is an extension of a star schema, where dimension tables are broken down into subdimensions. Snowflake schemas are commonly used for business intelligence and reporting in OLAP data warehouses, data marts, and relational databases.

What is snowflake schema vs star schema? ›

A star schema has denormalized dimension tables, while a snowflake schema has normalized dimension tables. A star schema is easier to design and implement than a snowflake schema. A star schema can be more efficient to query than a snowflake schema, because there are fewer JOINs between tables.

What is snowflake information schema? ›

The Snowflake Information Schema (aka “Data Dictionary”) consists of a set of system-defined views and table functions that provide extensive metadata information about the objects created in your account.

What is starflake schema? ›

A starflake schema is a combination of a star schema and a snowflake schema. Starflake schemas are snowflake schemas where only some of the dimension tables have been denormalized. Starflake schemas aim to leverage the benefits of both star schemas and snowflake schemas.

What is the main purpose of snowflake? ›

Snowflake enables data storage, processing, and analytic solutions that are faster, easier to use, and far more flexible than traditional offerings. The Snowflake data platform is not built on any existing database technology or “big data” software platforms such as Hadoop.

Why is it called a star schema? ›

The star schema gets its name from the physical model's resemblance to a star shape with a fact table at its center and the dimension tables surrounding it representing the star's points.

What is the difference between schema and view in snowflake? ›

A view allows the result of a query to be accessed as if it were a table. The query is specified in the CREATE VIEW statement. If the schema is not specified, then Snowflake assumes that the table is in the same schema as the view.

How many fact tables are in snowflake schema? ›

The snowflake schema consists of one fact table that is connected to many dimension tables, which can be connected to other dimension tables through a many-to-one relationship.

Where are snowflake schemas stored? ›

All data in Snowflake is maintained in databases. Each database consists of one or more schemas, which are logical groupings of database objects, such as tables and views. Snowflake does not place any hard limits on the number of databases, schemas (within a database), or objects (within a schema) you can create.

Is snowflake schema on read or write? ›

While Snowflake supports both Schema-on-Read and Schema-on-Write, the public preview of the Schema Detection feature improves Snowflake's Schema-on-Write capabilities and can greatly decrease the amount of effort at the beginning of data ingestion.

What is Snowflake database in Snowflake? ›

A Snowflake database is where an organization's uploaded structured and semistructured data sets are held for processing and analysis. Snowflake automatically manages all parts of the data storage process, including organization, structure, metadata, file size, compression, and statistics.

What is a snowflake schema example? ›

For example, a fact table might contain data on sales transactions, while the dimension tables might contain data on customers, products, and stores. The snowflake schema is useful for organizing and querying large, complex databases because it allows for more efficient querying and faster performance.

Which one is better, star schema or snowflake? ›

Star schemas will only join the fact table with the dimension tables, leading to simpler, faster SQL queries. Snowflake schemas store less redundant data, so they're easier to maintain. Snowflake schemas are good for data warehouses whereas star schemas are better for data marts with simple relationships.

Why use snowflake schema? ›

Purpose and Benefits of the Snowflake Schema

The benefits include:

  • The snowflake schema reduces data redundancy.
  • Optimizes storage space.
  • Streamlines data organization and query performance.

What are the use cases of snowflake schema? ›

When to Use a Snowflake Schema in Power BI
  • Normalized Data Structure. Scenario: When your source data is highly normalized. ...
  • Complex Data Relationships. ...
  • Efficient Storage and Data Integrity. ...
  • Data Consistency and Maintenance. ...
  • Large and Complex Data Models. ...
  • Reducing Data Duplication. ...
  • Detailed Hierarchical Reporting.

What are the benefits of snowflake? ›

Unlike traditional data warehouses, Snowflake operates on a cloud-based architecture, which allows for faster data processing, scalability, and flexibility. It separates storage and compute resources, ensuring that you only pay for what you use and can scale your resources up or down according to your needs.

What are the requirements for a snowflake schema? ›

Join Requirements: As compared to a star schema, more joins are required to develop hierarchies in a snowflake schema. The requirement for more complex SQL queries that involve multiple table joins can impact performance, especially when dealing with large data sets.
