Snowflake Best Practices for Data Engineering

We often create problems for ourselves while working with data. So, here are a few best practices for data engineering with Snowflake:

1. Transform your data incrementally:

A common mistake novice data engineers make is writing huge SQL statements that join, aggregate, and process many tables in one go, in the belief that this is an efficient way to work. In practice, the code becomes overly complex, difficult to maintain, and often slow. Instead, split the transformation pipeline into multiple steps and write the results of each step to an intermediate table. This makes it easier to test intermediate results, simplifies the code, and often produces simpler SQL that runs faster.
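
As a rough sketch of the pattern (the table and column names below are invented for the example), each step materializes its result before the next one runs:

-- Step 1: cleanse and restrict the source data
CREATE OR REPLACE TABLE stg_orders_clean AS
SELECT order_id, customer_id, order_date, amount
FROM   raw_orders
WHERE  order_date >= '2024-01-01';

-- Step 2: aggregate the cleansed data
CREATE OR REPLACE TABLE int_customer_totals AS
SELECT customer_id,
       SUM(amount) AS total_amount,
       COUNT(*)    AS order_count
FROM   stg_orders_clean
GROUP  BY customer_id;

-- Step 3: join to reference data to produce the final, consumable table
CREATE OR REPLACE TABLE customer_summary AS
SELECT c.customer_name,
       t.total_amount,
       t.order_count
FROM   int_customer_totals t
JOIN   dim_customer        c ON c.customer_id = t.customer_id;

Each intermediate table can be queried and validated on its own, which makes debugging a failed run far easier than unpicking one monolithic statement.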

2. Load data using COPY or SNOWPIPE:

Approximately 80% of data loaded into a data warehouse is ingested via regular batch processes, or, increasingly, as soon as the data files arrive. COPY and SNOWPIPE are the fastest and cheapest ways to load data, so resist the temptation to load data periodically using other methods (such as querying external tables). This is another example of using the right tool for the job.
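
As a minimal sketch (the stage, pipe, and table names here are assumptions), batch loads use COPY and continuous loads use a Snowpipe defined over the same stage:

-- Batch load: pull all new files from the stage into the raw table
COPY INTO raw_sales
FROM @my_stage/sales/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- Continuous load: Snowpipe auto-ingests files as they arrive in cloud storage
CREATE PIPE IF NOT EXISTS sales_pipe AUTO_INGEST = TRUE AS
COPY INTO raw_sales
FROM @my_stage/sales/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

Note that AUTO_INGEST relies on cloud storage event notifications, so the stage must be an external stage configured for them.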

3. Use multiple data models:

On-premises data storage was so costly that it was rarely possible to store multiple copies of the data, each using a different data model to meet a particular requirement. With Snowflake, however, it makes sense to store the raw data history in a structured or VARIANT format, cleanse and conform the data using third normal form or a Data Vault model, and store the final consumable data in a Kimball dimensional model. Each data model has its own advantages, and storing the results of intermediate steps has significant architectural benefits, most importantly the ability to reload and reprocess the data in the event of an error.
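
A layered layout of this kind might look like the sketch below (the schema and table names are illustrative, and the schemas are assumed to exist):

-- Raw layer: full history of landed data, kept as VARIANT
CREATE TABLE IF NOT EXISTS raw.customer_events (
    payload   VARIANT,
    loaded_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

-- Integration layer: cleansed, conformed data (3NF / Data Vault style)
CREATE TABLE IF NOT EXISTS integration.customer (
    customer_id   NUMBER,
    customer_name STRING,
    valid_from    TIMESTAMP_NTZ,
    valid_to      TIMESTAMP_NTZ
);

-- Presentation layer: Kimball-style dimension for consumption
CREATE TABLE IF NOT EXISTS presentation.dim_customer (
    customer_key  NUMBER AUTOINCREMENT,
    customer_id   NUMBER,
    customer_name STRING,
    current_flag  BOOLEAN
);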

4. Choose a required Virtual Warehouse size:

Another tip from the Top 3 Snowflake Performance Tuning Tactics: don't assume a 6X-Large virtual warehouse will load massive data files any faster than an X-Small. Each physical file is loaded sequentially, so it pays to follow the Snowflake file sizing recommendations and either split multi-gigabyte files into chunks of 100–250 MB or load multiple data files concurrently in parallel.
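
The sketch below illustrates the idea; the warehouse, stage, and file naming pattern are assumptions for the example:

-- A modestly sized warehouse is usually enough for loading
CREATE WAREHOUSE IF NOT EXISTS load_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND   = 60
  AUTO_RESUME    = TRUE;

USE WAREHOUSE load_wh;

-- One COPY over a folder of pre-split files loads them in parallel,
-- so splitting a multi-gigabyte file into ~100-250 MB chunks beats loading it whole
COPY INTO raw_sales
FROM @my_stage/sales/2024/
PATTERN = '.*sales_part_.*[.]csv[.]gz'
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);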

5. Keep raw data history:

Unless the data comes from a raw data lake, it makes sense to keep raw data history. This should ideally be stored in a VARIANT data type to benefit from automatic schema evolution. It means the data can be truncated and reprocessed if errors are found in the transformation pipeline, and it provides data scientists with a great source of raw data. If you don't have a need for machine learning yet, you'll almost certainly need it in the next few years, if not now.
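
A minimal sketch of such a raw history table is shown below; the stage, table, and attribute names are assumptions, and the JSON paths are purely illustrative:

CREATE TABLE IF NOT EXISTS raw_event_history (
    src_file  STRING,
    loaded_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    payload   VARIANT
);

-- Load each staged JSON file into a single VARIANT column
COPY INTO raw_event_history (src_file, payload)
FROM (
    SELECT METADATA$FILENAME, $1
    FROM   @my_stage/events/
)
FILE_FORMAT = (TYPE = JSON);

-- Downstream code reads attributes by path, so new fields in the source
-- JSON do not break the load
SELECT payload:customer.id::NUMBER AS customer_id,
       payload:event_type::STRING  AS event_type
FROM   raw_event_history;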

6. Do not use JDBC or ODBC for normal large data loads:

Another recommendation about suitable tools: JDBC or ODBC interfaces may be fine for loading a few megabytes of data, but they cannot match the massive throughput of COPY and SNOWPIPE. Use them where appropriate, but not for regular large data loads.

7. Avoid scanning files:

When loading data with the COPY command, use partitioned staging data files, as described in step 1 of the Snowflake Top 3 Performance Optimization Tactics. This reduces the effort of scanning large numbers of data files in cloud storage.
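
For example (assuming the files are staged under date-partitioned paths; the stage and table names are illustrative), pointing COPY at a specific partition avoids scanning the whole bucket:

-- Only the files under the 2024/09/15 prefix are listed and loaded
COPY INTO raw_sales
FROM @my_stage/sales/2024/09/15/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);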

8. Use the Tool according to requirement:

If you only know one tool, it will inevitably be used for jobs it is not suited to. The right choice depends on several factors, such as the skills available on your team, whether you need fast, near-real-time delivery, and whether you are performing a one-time data load or a process that repeats on a regular basis. Note that Snowflake can natively handle a variety of file formats, including Avro, Parquet, ORC, JSON, and CSV. See the online documentation for detailed instructions on loading data into Snowflake.
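
As a small illustration (the file format, stage, and table names are assumptions), named file formats let each load use the format that matches its source:

CREATE FILE FORMAT IF NOT EXISTS ff_parquet TYPE = PARQUET;
CREATE FILE FORMAT IF NOT EXISTS ff_json    TYPE = JSON;

-- Load Parquet files directly into matching columns of the target table
COPY INTO raw_sales_parquet
FROM @my_stage/sales_parquet/
FILE_FORMAT = (FORMAT_NAME = 'ff_parquet')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;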

9. Ensure 3rd party tools push down:

ETL tools such as Ab Initio, Talend, and Informatica were originally designed to extract data from source systems into an ETL server, transform it, and then write the results to the warehouse. Since Snowflake can draw upon massive on-demand compute resources and automatically scale out, it makes no sense to copy data to an external server. Instead, use the ELT (Extract, Load, and Transform) approach and ensure the tools generate and execute SQL statements on Snowflake to maximize throughput and reduce costs.

10. Using Query Tag:

When starting a multi-step transformation job, set the session query tag with ALTER SESSION SET QUERY_TAG = 'XXXXXX' and clear it afterwards with ALTER SESSION UNSET QUERY_TAG. This stamps an identifier on every SQL statement executed until the tag is unset, which is invaluable for administrators. Each SQL statement (and its QUERY_TAG) is logged in the QUERY_HISTORY view, allowing you to track job performance over time. This makes it easy to spot when a change to a job has degraded performance, identify inefficient transformation jobs, or decide when a job would run better on a larger or smaller warehouse.
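
A minimal sketch of the pattern (the tag value and the job's SQL are placeholders):

ALTER SESSION SET QUERY_TAG = 'nightly_sales_load';

-- ... run the job's SQL statements here ...

ALTER SESSION UNSET QUERY_TAG;

-- Later, review the tagged statements (ACCOUNT_USAGE views can lag by up to
-- about 45 minutes; INFORMATION_SCHEMA.QUERY_HISTORY is a near-real-time alternative)
SELECT query_text,
       warehouse_name,
       total_elapsed_time
FROM   SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE  query_tag = 'nightly_sales_load'
ORDER  BY start_time DESC;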

11. Use Transient Tables for Intermediate Results:

During complex ELT pipelines, write intermediate results to transient tables, which can be truncated before the next load. This cuts Time Travel storage down to at most one day and avoids an additional seven days of Fail-safe storage. Use temporary tables where it makes sense; however, it is often useful to be able to validate the results of intermediate steps in a complex ELT pipeline, which transient tables allow even after the session ends.
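
A minimal sketch, with invented table names and a minimal retention setting:

-- Transient table: no Fail-safe, and Time Travel reduced to zero days
CREATE TRANSIENT TABLE IF NOT EXISTS wrk_stage_orders
    DATA_RETENTION_TIME_IN_DAYS = 0
AS
SELECT * FROM raw_orders WHERE 1 = 0;   -- copy the structure only

-- Each run: clear and repopulate the working table
TRUNCATE TABLE wrk_stage_orders;

INSERT INTO wrk_stage_orders
SELECT * FROM raw_orders
WHERE  order_date = CURRENT_DATE();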

12. Avoid row-by-row processing:

Modern analytics platforms such as Snowflake are designed to ingest, process, and analyze billions of rows at incredible speed using simple SQL statements that operate on entire data sets. However, people tend to think in terms of row-by-row processing, which can lead to programming loops that fetch and update one row at a time. Row-by-row processing is one of the biggest causes of slow query performance. Use SQL statements that process all rows at once, and avoid row-by-row processing at all costs.
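
For instance, rather than looping over rows in a stored procedure, a single set-based MERGE (the table and column names below are invented for the sketch) handles every row in one statement:

MERGE INTO customer_balance AS tgt
USING daily_payments        AS src
      ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
    UPDATE SET tgt.balance = tgt.balance - src.payment_amount
WHEN NOT MATCHED THEN
    INSERT (customer_id, balance)
    VALUES (src.customer_id, -src.payment_amount);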

13. Follow standard ingestion patterns:

This involves a multi-step process of landing data files in cloud storage, loading them into a staging table, and then transforming the data. Breaking the entire process down into defined steps makes it easier to orchestrate and test.
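
The three steps might look like the sketch below; the stage URL, tables, and JSON paths are placeholders for illustration (a real external stage would also need a storage integration or credentials, and raw_landing is assumed to have a single VARIANT column named payload):

-- 1. Land files in cloud storage via an external stage
CREATE STAGE IF NOT EXISTS landing_stage
  URL = 's3://example-bucket/landing/'
  FILE_FORMAT = (TYPE = JSON);

-- 2. Load the files into a raw staging table
COPY INTO raw_landing
FROM @landing_stage/orders/;

-- 3. Transform into the consumable model
INSERT INTO fct_orders
SELECT payload:order_id::NUMBER,
       payload:amount::NUMBER(12,2),
       payload:order_date::DATE
FROM   raw_landing;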

Conclusion:

In this blog, we have discussed a few ways of working with data in Snowflake. The practices above cover data ingestion, transformation, batch ingestion and processing, continuous data streaming, and data lake exploration.
