How Google stores Exabytes of Data

Hey Everyone!

Today we’ll be talking about

  • How Google stores Exabytes of Data

    • Colossus is Google’s Distributed File System and it can scale to tens of thousands of machines and exabytes of storage.

    • It’s the next generation of Google File System (GFS), a distributed file system built in the early 2000s that inspired the Hadoop Distributed File System.

    • We’ll talk about some history of Google File System and MapReduce and then go into how Colossus works

  • Tech Snippets

    • Real World Recommendation Systems

    • Managing Technical Quality in a Codebase

    • How Dropbox Rewrote Their Web Serving Stack

    • AWS Builder’s Library - How to Instrument Your System for Visibility

    • When Is Short Tenure a Red Flag?


In a past article, we talked about how crucial Observability is for meeting your availability targets. However, having multiple teams build their own monitoring solutions doesn’t scale and is a waste of effort.

Instead, engineering teams are adopting a centralized approach to monitoring with a Metrics as a Service model.

With this model, different stakeholders like IT Ops, developers, managers, etc. will all have a single location they can easily access for observability data.

InfluxDB wrote a great blog post detailing how you can implement a centralized Observability Platform at your own organization.

You’ll need

sponsored

Google Colossus is a massive distributed storage system that Google uses to store and manage exabytes of data (1 exabyte is 1 million terabytes). Colossus is the next generation of Google File System (GFS), which was first introduced in 2003.

In the early 2000s, Sanjay Ghemawat, Jeff Dean and other Google engineers published two landmark papers in distributed systems: the Google File System paper and the MapReduce paper.

GFS was a distributed file system that could scale to over a thousand nodes and hundreds of terabytes of storage. The system consisted of 3 main components: a master node, Chunk Servers and clients.


The master was responsible for managing the metadata of all files stored in GFS; it maintained information about the location and status of each file. The Chunk Servers stored the actual data. Each chunk server managed a portion of the data in GFS and replicated it to other chunk servers to ensure fault tolerance. The GFS client library ran on any machine that needed to create, read or delete files in a GFS cluster.

To make the system scalable, Google engineers minimized the number of operations the master had to perform. If a client wanted to access a file, it would send a query to the master. The master would respond with the locations of the chunk servers holding the file’s chunks, and the client would then download the data directly from those chunk servers. The master shouldn’t be involved in the minutiae of transferring data between chunk servers and clients.
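
To make that flow concrete, here’s a rough sketch of a GFS-style read, with a tiny in-memory stand-in for the master and the chunk servers. The class and method names are invented for illustration; this is not Google’s actual client library.

    # A minimal sketch of the GFS read path (invented names, not Google's API).
    import random

    CHUNK_SIZE = 64 * 1024 * 1024  # GFS split files into 64 MB chunks

    class Master:
        """Holds only metadata: filename -> list of (chunk_handle, replica servers)."""
        def __init__(self):
            self.files = {}

        def lookup(self, filename, chunk_index):
            return self.files[filename][chunk_index]

    class ChunkServer:
        """Holds the actual chunk bytes, keyed by chunk handle."""
        def __init__(self):
            self.chunks = {}

        def read(self, chunk_handle, offset, length):
            return self.chunks[chunk_handle][offset:offset + length]

    def read_file(master, filename, offset, length):
        data = []
        while length > 0:
            # 1. Ask the master only for metadata: which servers hold this chunk.
            chunk_handle, replicas = master.lookup(filename, offset // CHUNK_SIZE)
            chunk_offset = offset % CHUNK_SIZE
            n = min(length, CHUNK_SIZE - chunk_offset)
            # 2. Fetch the bytes directly from a chunk server; the master never
            #    touches the actual data.
            data.append(random.choice(replicas).read(chunk_handle, chunk_offset, n))
            offset += n
            length -= n
        return b"".join(data)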

In order to efficiently run computations on the data stored on GFS, Google created MapReduce, a programming model and framework for processing large data sets. It consists of two main functions: the map function and the reduce function.


Map is responsible for processing the input data locally and generating a set of intermediate key-value pairs. The reduce function will then process these intermediate key-value pairs in parallel and combine them to generate the final output. MapReduce is designed to maximize parallel processing while minimizing the amount of network bandwidth needed (run computations locally when possible).
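
As a concrete example, here’s the classic word-count job expressed in the map/reduce style. This is a toy, single-process sketch of the programming model, not Google’s actual MapReduce framework, which runs the map, shuffle and reduce phases across many machines.

    # Word count in the MapReduce style: a toy, single-process sketch of the model.
    from collections import defaultdict

    def map_fn(document):
        # Emit an intermediate (word, 1) pair for every word in the input.
        for word in document.split():
            yield word, 1

    def reduce_fn(word, counts):
        # Combine all intermediate values for one key into the final output.
        return word, sum(counts)

    def run_mapreduce(documents):
        # Shuffle phase: group intermediate pairs by key.
        intermediate = defaultdict(list)
        for doc in documents:
            for key, value in map_fn(doc):
                intermediate[key].append(value)
        # Reduce phase: in the real framework this runs in parallel across workers.
        return dict(reduce_fn(key, values) for key, values in intermediate.items())

    print(run_mapreduce(["the cat sat", "the cat ran"]))
    # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}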

For more details, we wrote past deep dives on MapReduce and Google File System.

These technologies were instrumental in scaling Google and they were also reimplemented by engineers at Yahoo. In 2006, the Yahoo projects were open sourced and became the Apache Hadoop project. Since then, Hadoop has exploded into a massive ecosystem of data engineering projects.

One issue with Google File System was the decision to have a single master node for storing and managing the GFS cluster’s metadata. You can read Section 2.4 (Single Master) in the GFS paper for an explanation of why Google made this choice.

This choice worked well for batch-oriented applications like web crawling and indexing websites. However, it could not meet the latency requirements for applications like YouTube, where you need to serve a video extremely quickly.

Having the master node go down meant the cluster would be unavailable for a couple of seconds (during the automatic failover). This was no bueno for low latency applications.

To deal with this issue (and add other improvements), Google created Colossus, the successor to Google File System.

Dean Hildebrand is a Technical Director at Google and Denis Serenyi is a Tech Lead on the Google Cloud Storage team. They posted a great talk on YouTube delving into Google’s infrastructure and how Colossus works.

Infrastructure at Google

The Google Cloud Platform and all of Google’s products (Search, YouTube, Gmail, etc.) are powered by the same underlying infrastructure.

It consists of three core building blocks

  • Borg - A cluster management system that inspired Kubernetes. It launches and manages compute services at Google. Borg runs hundreds of thousands of jobs across many clusters (each with thousands of machines). For more details, Google published a paper on how Borg works.

  • Spanner - A highly scalable, consistent relational database with support for distributed transactions. Under the hood, Spanner stores its data on Google Colossus. It uses TrueTime, Google’s clock synchronization service, to provide ordering and consistency guarantees. For more details, check out the Spanner paper.

  • Colossus - Google’s successor to Google File System. This is a distributed file system that stores exabytes of data with low latency and high reliability. For more details on how it works, keep reading.

How Colossus Works


Similar to Google File System, Colossus consists of three components

  • Client Library

  • Control Plane

  • D Servers

Applications that need to store/retrieve data on Colossus will do so with the client library, which abstracts away all the communication the app needs to do with the Control Plane and D servers. Users of Colossus can select different service tiers based on their latency/availability/reliability requirements. They can also choose their own data encoding based on the performance/cost trade-offs they need to make.
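
As a purely hypothetical illustration of what that choice could look like from an application’s point of view, the sketch below models a per-file options object. The names, tiers and encodings are invented for this example and are not Colossus’s real API.

    # Hypothetical per-file options (invented names, not Colossus's real API).
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class FileOptions:
        tier: str = "standard"           # e.g. "low_latency" for serving, "batch" for analytics
        encoding: str = "replicated_3x"  # e.g. plain replication vs. an erasure-coded layout
        placement: str = "auto"          # let the system decide between flash and disk

    # A user-facing service might trade cost for latency:
    video_serving = FileOptions(tier="low_latency", encoding="replicated_3x")

    # A batch analytics pipeline might do the opposite:
    analytics_logs = FileOptions(tier="batch", encoding="erasure_coded_6_3")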

The Colossus Control Plane is the biggest improvement compared to Google File System, and it consists of Curators and Custodians.

Curators replace the functionality of the master node, removing the single point of failure. When a client needs a certain file, it will query a curator node. The curator will respond with the locations of the various data servers that are holding that file. The client can then query the data servers directly to download the file.

When creating files, the client can send a request to the Curator node to obtain a lease. The curator node will create the new file on the D servers and send the locations of the servers back to the client. The client can then write directly to the D servers and release the lease when it’s done.
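
Roughly, the write path could be sketched like the code below; the read path mirrors the GFS sketch above, just with a curator in place of the master. All class and method names are invented for illustration, and the real lease, replication and failure-handling semantics are far more involved.

    # A toy sketch of a Colossus-style write through a curator (invented API).

    class DServer:
        """Stores the actual bytes, keyed by block id."""
        def __init__(self):
            self.blocks = {}

        def write(self, block_id, data):
            self.blocks[block_id] = data

    class Curator:
        """Control plane: hands out leases and placement, never touches file data."""
        def __init__(self, d_servers):
            self.d_servers = d_servers
            self.metadata = {}      # filename -> [(d_server, block_id), ...]
            self.open_leases = set()

        def create(self, filename, num_blocks):
            # Pick D servers for the new file's blocks and record the placement
            # (in the real system this metadata lives in BigTable).
            placement = [(self.d_servers[i % len(self.d_servers)], (filename, i))
                         for i in range(num_blocks)]
            self.metadata[filename] = placement
            self.open_leases.add(filename)
            return filename, placement  # the filename stands in for a lease handle

        def release(self, lease):
            self.open_leases.discard(lease)

    BLOCK_SIZE = 8 * 1024 * 1024  # arbitrary block size for this sketch

    def write_file(curator, filename, data):
        blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        lease, placement = curator.create(filename, num_blocks=len(blocks))
        try:
            # The data goes straight to the D servers; the curator stays off the data path.
            for (d_server, block_id), block in zip(placement, blocks):
                d_server.write(block_id, block)
        finally:
            curator.release(lease)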

Curators store file system metadata in Google BigTable, a NoSQL database. Because of the distributed nature of the Control Plane, Colossus clusters can now scale to more than 100x the size of the largest Google File System clusters, while delivering lower latency.

Custodians in Colossus are background storage managers, and they handle things like disk space rebalancing, switching data between hot and cold storage, RAID reconstruction, failover and more.

D servers are the equivalent of the Chunk Servers in Google File System; they’re essentially network-attached disks. They store all the data held in Colossus. As mentioned previously, data flows directly from the D servers to the clients to minimize the involvement of the control plane. This makes it much easier to scale the amount of data stored in Colossus without the control plane becoming a bottleneck.

Colossus Abstractions

Colossus abstracts away a lot of the work you have to do in managing data.

Hardware Diversity

Engineers want Colossus to provide the best performance at the lowest cost. Data in the distributed file system is stored on a mixture of flash and disk. Figuring out the optimal amount of flash vs. disk and how to distribute data across them can mean a difference of tens of millions of dollars. To handle disk management intelligently, engineers looked at how data was accessed in the past.

Newly written data tends to be hotter, so it’s stored in flash memory. Old analytics data tends to be cold, so it’s stored on cheaper disk. Certain data is always read at specific time intervals, so it’s automatically moved into memory ahead of those reads to keep latency low.

Clients don’t have to think about any of this. Colossus manages it for them.
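
As a purely illustrative sketch, a policy along these lines boils down to a scoring function over a file’s age and recent access history. The thresholds below are made up; Google’s real placement decisions are driven by models built on actual access traces.

    # A toy hot/cold placement heuristic (made-up thresholds, for illustration only).
    import time

    ONE_DAY = 24 * 60 * 60

    def choose_tier(created_at, reads_last_7_days, now=None):
        now = now if now is not None else time.time()
        age_days = (now - created_at) / ONE_DAY
        if age_days < 1 or reads_last_7_days > 1000:
            return "flash"      # new or frequently read data stays on fast media
        if reads_last_7_days == 0 and age_days > 90:
            return "cold_disk"  # old, unread analytics data moves to the cheapest storage
        return "disk"

    # Example: a file written 200 days ago and never read recently goes to cold storage.
    print(choose_tier(created_at=time.time() - 200 * ONE_DAY, reads_last_7_days=0))  # cold_disk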

Requirements

As mentioned, apps have different requirements around consistency, latency, availability, etc.

Colossus provides different service tiers so that applications can choose what they want.

Fault Tolerance

At Google’s scale, machines are failing all the time (it’s inevitable when you have millions of machines).

Colossus steps in to handle things like replicating data across D servers, background recovery and steering reads/writes around failed nodes.
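
At a very high level, steering a read around a failed node looks something like the sketch below: try each D server that holds a copy and fall back when one doesn’t respond. The exception type and read API are invented for illustration.

    # A toy sketch of steering reads around failed replicas (invented API).

    class ReplicaUnavailable(Exception):
        """Raised by a D server stub when it cannot serve a read."""

    def read_with_failover(replicas, block_id):
        # Try each D server holding a copy; any healthy replica is enough.
        for replica in replicas:
            try:
                return replica.read(block_id)
            except ReplicaUnavailable:
                continue  # steer around the failed node and try the next copy
        raise IOError(f"all replicas unavailable for block {block_id!r}")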

For more details, you can read the full talk here.

How did you like this summary?

Your feedback really helps me improve curation for future emails.

  • I didn't like it
  • Mehh, it was okay
  • It was great



Working with large sets of time-stamped data has its challenges.

Fortunately, InfluxDB is a time series database purpose-built to handle the unique workloads of time series data.

Using InfluxDB, developers can ingest billions of data points in real-time with unbounded cardinality, and store, analyze, and act on that data – all in a single database.

No matter what kind of time series data you’re working with – metrics, events, traces, or logs – InfluxDB Cloud provides a performant, elastic, serverless time series platform with the tools and features developers need. Native SQL compatibility makes it easy to get started with InfluxDB and to scale your solutions.

Companies like IBM, Cisco, and Robinhood all rely heavily on InfluxDB to build and manage responsive backend applications, to power predictive intelligence, and to monitor their systems for insights that they would otherwise miss.

See for yourself by quickly spinning up InfluxDB Cloud and trying it out for free.

sponsored

Tech Snippets

Real World Recommendation Systems

This is an interesting read that delves into building a recommendation system at scale and serving live requests within a few hundred milliseconds. Recommendation systems are usually composed of components like Retrieval, Filtering, Feature Extraction, Scoring, Ranking, Logging and more. The post delves into all the stages and the engineering required.

fennel.ai/blog/real-world-recommendation-system

Managing Technical Quality in a Codebase

Will Larson is the CTO of Calm and he wrote a detailed post delving into the steps you should take when improving the technical quality of your codebase. He covers everything from finding hot spots that cause immediate problems to adopting best practices and creating technical quality teams at your company.

lethain.com/managing-technical-quality

How Dropbox Rewrote Their Web Serving Stack

Dropbox is a cloud file storage company. They recently rewrote their web app to be a Single Page Application with ReactJS. They published a great blog post delving into Edison, their web serving stack that allows high developer velocity and gives the user a great experience.

dropbox.tech/frontend/edison-webserver-a-faster-more-powerful-dropbox-on-the-web

When Is Short Tenure a Red Flag?

When you’re hiring, at what point should short tenure be a red flag? Jacob Kaplan-Moss is an Engineering Leader and he has experience building teams at Heroku, Hangar and other companies. He suggests that 1-2 instances are not something to be concerned about. However, having 4-5 short jobs (a tenure of under a year) in the past 5-7 years is something you should ask the candidate about.

jacobian.org/2022/oct/14/when-is-short-tenure-a-red-flag