Google reveals struggle to balance HDD and SSD use at scale

Even Google struggles to balance fast-but-pricey flash and cheap-but-slow hard disks

Reveals it ‘dramatically improved IOPS and throughput’ of its own storage with homebrew ‘L4’ automation and cache

Google has revealed that it still relies on hard disk drives for most of its storage needs, but has been able to ‘dramatically’ improve the performance of its storage systems with a homebrew automated data tiering system.

The ads and search giant admitted its ongoing fondness for spinning rust in a Thursday post that explains the workings of its “Colossus” universal storage platform.

Colossus underpins YouTube, Gmail, Google’s cloud storage services, and other applications.

“Most data centers have one cluster and thus one Colossus filesystem, regardless of how many workloads run inside the cluster,” the post states, before adding “Many Colossus filesystems have multiple exabytes of storage, including two different filesystems that have in excess of 10 exabytes of storage each.”

Colossus moves fast. Google’s post states that its largest filesystems regularly exceed read throughputs of 50 TB/s and write throughputs of 25 TB/s, with the busiest cluster delivering “over 600M IOPS, combined between reads and writes.”

Google last posted public info about Colossus in 2021, when it revealed the system “uses a mix of flash and disk storage” and places the most-frequently-accessed data on flash disks “for more efficient serving and lower latency.”

Colossus still moves in-demand data from hard disk drives (HDDs) to solid state disks (SSDs), and the new post states that doing so is “all the more pertinent today, as over the years, SSDs have gotten more affordable, increasing their prominence in our data centers.”

“However, SSD-only storage still poses a substantial cost premium over a blended storage fleet of SSD and HDD,” the post states. “The challenge is putting the right data — the data that gets the most I/Os or needs the lowest latency — on SSD while keeping the bulk of the data on HDD.”

Thursday’s post was penned by storage tech lead Larry Greenfield and storage software engineer Seth Pollen, who explain the tools Google uses to move data between solid state disks (aka flash storage) and hard disk drives.

The duo reveal that Google’s internal users can force files onto flash, or use a hybrid approach that places one copy of a file on SSD and the other copies on HDD. The latter is sub-optimal because the servers Google uses to house storage devices are not always available: if the server holding the lone SSD copy is unreachable, reads fall back to the HDD copies and internal users must live with the greater latency HDDs deliver.
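To make the hybrid trade-off concrete, here is a minimal Python sketch, assuming a file has one SSD replica plus several HDD replicas and that a reader simply prefers any reachable SSD copy. The `Replica` class and `pick_replica` function are hypothetical illustrations, not Colossus APIs.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    server: str
    medium: str      # "ssd" or "hdd"
    available: bool  # whether the hosting server is currently reachable


def pick_replica(replicas: list[Replica]) -> Replica:
    """Prefer a reachable SSD replica; otherwise fall back to a reachable HDD one."""
    reachable = [r for r in replicas if r.available]
    if not reachable:
        raise RuntimeError("no replica is reachable")
    # min() returns the first minimal element; False (ssd) sorts before True (hdd).
    return min(reachable, key=lambda r: r.medium != "ssd")


# Hybrid placement: one copy on SSD, the rest on HDD, and the SSD host is down.
file_replicas = [
    Replica("server-a", "ssd", available=False),
    Replica("server-b", "hdd", available=True),
    Replica("server-c", "hdd", available=True),
]
print(pick_replica(file_replicas).server)  # server-b
```

With only one SSD copy, a single unavailable server is enough to push every read for that file onto the slower medium.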

Most decisions about which medium best suits a given piece of data are therefore made by an automated caching system called “L4”, which Greenfield and Pollen wrote “dynamically picks the data that is most suitable for SSD.”

On The Register’s reading of the post, L4 caches data on SSD and builds an index that lists the data in those caches.

“That means that when an application wishes to read some data, it first consults an L4 index server. That index informs the client whether the data is in cache, in which case the client reads the data from one or more SSDs,” the pair wrote.

If data isn’t in cache, L4 reads it from HDD and moves it to a server that uses SSDs.
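The read path described above can be modelled as a toy Python class. This is a sketch under loose assumptions: `L4CacheSketch` is a hypothetical name, the index and the SSD tier are plain in-memory structures, every miss is admitted to the cache, and eviction and capacity limits are left out entirely.

```python
class L4CacheSketch:
    """Toy model of an index-fronted SSD read cache (hypothetical, not Google's code)."""

    def __init__(self, hdd: dict[str, bytes]):
        self.hdd = hdd                    # backing store: block id -> bytes
        self.ssd: dict[str, bytes] = {}   # cached copies held on SSD servers
        self.index: set[str] = set()      # the "index server": which blocks are cached
        self.hits = 0
        self.misses = 0

    def read(self, block_id: str) -> bytes:
        # 1. Consult the index before touching any storage device.
        if block_id in self.index:
            self.hits += 1
            return self.ssd[block_id]     # 2a. Hit: serve from SSD.
        # 2b. Miss: serve from HDD, then copy the data to SSD and index it.
        self.misses += 1
        data = self.hdd[block_id]
        self.ssd[block_id] = data
        self.index.add(block_id)
        return data


cache = L4CacheSketch({"a": b"alpha", "b": b"beta"})
for key in ["a", "a", "b", "a"]:
    cache.read(key)
print(cache.hits, cache.misses)  # 2 2
```

According to the post, L4 chooses an insertion policy per workload, so the unconditional insert on every miss here is a simplification.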

“L4 can be more or less aggressive about how much data to place on SSD,” the storage techs wrote. “We use a machine learning (ML) powered algorithm to decide between different policies for each workload: insert into the L4 cache when the data is written, after the first time it is read, or only after the second time it is read within a short time period.”

Google detailed some of those techniques in a 2022 presentation at a USENIX conference.
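The three insertion policies the pair list can be expressed as admission rules replayed over an I/O trace. This Python sketch is illustrative only: `Admission`, `admitted_blocks` and the 60-second `window` standing in for the unspecified “short time period” are assumptions, and the ML model that picks a policy for each workload is not modelled.

```python
from enum import Enum


class Admission(Enum):
    ON_WRITE = "on_write"
    ON_FIRST_READ = "on_first_read"
    ON_SECOND_READ = "on_second_read"


def admitted_blocks(trace, policy, window=60.0):
    """Return the set of blocks a policy would insert into the SSD cache.

    trace: iterable of (timestamp_seconds, op, block_id), op is "write" or "read".
    window: hypothetical "short time period" for ON_SECOND_READ, in seconds.
    """
    cached = set()
    last_read = {}
    for ts, op, block in trace:
        if block in cached:
            continue
        if policy is Admission.ON_WRITE and op == "write":
            cached.add(block)
        elif policy is Admission.ON_FIRST_READ and op == "read":
            cached.add(block)
        elif policy is Admission.ON_SECOND_READ and op == "read":
            prev = last_read.get(block)
            if prev is not None and ts - prev <= window:
                cached.add(block)  # re-read soon enough: admit it
            last_read[block] = ts


    return cached


trace = [
    (0, "write", "x"), (1, "write", "y"), (2, "write", "z"),
    (5, "read", "x"), (30, "read", "x"), (200, "read", "y"),
]
for policy in Admission:
    print(policy.value, sorted(admitted_blocks(trace, policy)))
# on_write ['x', 'y', 'z']
# on_first_read ['x', 'y']
# on_second_read ['x']
```

Even this tiny trace shows the trade-off: caching on write also stores `z`, which is never read, while caching only on a second read spends SSD space solely on data that was re-read quickly.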

Performance boost, but problems persist

Greenfield and Pollen’s post says L4’s caching tech “works well for applications that read the same data often and has dramatically improved our IOPS and throughput.”

They also admit it has “a major weakness”: Google still writes new data to HDD.

“And it turns out that there are other important classes of data where L4 read caching isn’t as effective at saving resources as we’d like, namely data that is written, read, and deleted quickly (such as intermediate results for a large batch-processing job), and database transaction logs and other files that see many tiny appends.”

Such workloads are poorly suited to HDD and the pair feel “it’s preferable to write them directly to SSD, and skip HDD entirely.”

L4 also automates placement for new files. Because applications haven’t used those files yet, there is no access history to show whether they deserve a place in the SSD-packed cache.

When applications create new files, they therefore supply hints such as the file type, or metadata about the database column whose data is stored in the file.

“L4 uses these features to segregate the files into ‘categories’ and observes the I/O patterns of each category over time,” Greenfield and Pollen write. “These I/O patterns drive an online simulation of different placement policies, such as ‘place on SSD for one hour,’ ‘place on SSD for two hours,’ or ‘don't place on SSD.’ Based on this simulation, L4 chooses the best policy for each category.”
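One way to picture the per-category simulation is to replay observed reads against each candidate time-on-SSD policy and keep the best-scoring one. In this Python sketch the scoring rule (HDD reads avoided minus a hypothetical SSD cost per GiB-hour), the `ssd_cost_per_gib_hour` weight and all function names are assumptions; the post does not spell out L4’s actual objective.

```python
# Candidate policies: hours a new file stays on SSD after creation.
# 0.0 means "don't place on SSD". These mirror the examples in the post.
POLICIES = {"ssd_1h": 1.0, "ssd_2h": 2.0, "no_ssd": 0.0}


def simulate(files, ttl_hours, ssd_cost_per_gib_hour):
    """Score one policy over observed files; higher is better (hypothetical metric).

    files: list of (size_gib, [read_age_hours, ...]), where each age is the
    number of hours between the file's creation and a read of it.
    """
    reads_saved = 0
    gib_hours = 0.0
    for size_gib, read_ages in files:
        gib_hours += size_gib * ttl_hours
        # A read is served from SSD only if it happens while the file is still there.
        reads_saved += sum(1 for age in read_ages if age < ttl_hours)
    return reads_saved - ssd_cost_per_gib_hour * gib_hours


def choose_policies(observed, ssd_cost_per_gib_hour=1.0):
    """Pick the best-scoring policy for each file category."""
    best = {}
    for category, files in observed.items():
        best[category] = max(
            POLICIES,
            key=lambda name: simulate(files, POLICIES[name], ssd_cost_per_gib_hour),
        )
    return best


observed = {
    # Batch intermediates: read heavily in their first hour.
    "batch_tmp": [(2.0, [0.1, 0.2, 0.5, 0.9]), (2.0, [0.3, 0.4, 0.6])],
    # Archival blobs: large and rarely read.
    "archive": [(50.0, [5.0]), (80.0, [])],
}
print(choose_policies(observed))  # {'batch_tmp': 'ssd_1h', 'archive': 'no_ssd'}
```

Because the scores are recomputed from fresh observations, a category’s chosen policy can change as its I/O pattern drifts, which is the point of running the simulation online.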

Those simulations also “predict what placement L4 would choose if more or less SSD capacity were available.

“Thus, we can predict how much I/O can be offloaded from HDD with different amounts of SSD. These signals drive purchases of new SSD hardware and inform planners of ways to shift SSD capacity between applications to maximize efficiency,” the pair wrote.
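The capacity forecast can be approximated by asking, for a range of SSD budgets, how much HDD read traffic the best-value categories would absorb. This sketch uses a simple greedy fill ranked by reads saved per GiB; that heuristic, the `offload_at_capacity` name and the example figures are all hypothetical and not a description of Google’s method.

```python
def offload_at_capacity(categories, capacity_gib):
    """Greedy estimate of HDD reads/s moved to SSD for a given SSD budget.

    categories: dict of name -> (ssd_gib_needed, hdd_reads_per_sec_saved),
    with ssd_gib_needed > 0. Ranking by savings per GiB is a heuristic,
    not an optimal allocation.
    """
    remaining = capacity_gib
    offloaded = 0.0
    ranked = sorted(categories.values(), key=lambda c: c[1] / c[0], reverse=True)
    for gib, saved in ranked:
        if gib <= remaining:
            remaining -= gib
            offloaded += saved
    return offloaded


categories = {
    "db_logs": (100, 5000),     # 50 reads/s saved per GiB
    "batch_tmp": (400, 8000),   # 20 reads/s saved per GiB
    "media": (2000, 6000),      # 3 reads/s saved per GiB
}
for capacity in (100, 500, 2500):
    print(capacity, offload_at_capacity(categories, capacity))
# 100 5000.0
# 500 13000.0
# 2500 19000.0
```

The resulting curve shows diminishing returns, with each extra GiB of SSD offloading less HDD traffic than the last, which is the kind of signal the pair say informs SSD purchases.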

Google is not alone in wrestling with how best to blend SSD and HDD: storage hardware vendors make a virtue of doing it well, but don’t have to operate at exabyte scale.

They, and you, may therefore benefit when Google reveals more info about its storage systems at the Google Cloud Next conference in April. Greenfield and Pollen recommend checking out sessions titled “What’s new with Google Cloud’s Storage” and “AI Hypercomputer: Mastering your Storage Infrastructure” if you show up at the Vegas gabfest. ®