# TurboQuant: Is This the MP3 Moment for AI Memory Stocks?

> We read the compression paper that knocked 3-6% off the memory complex in two sessions and stress-tested the MP3 analogy against the DeepSeek precedent.

- Author: Barebone Research, Barebone AI
- Published: 2026-03-26
- Canonical: https://barebone.ai/resources/google-turboquant-mp3-moment-ai-memory-stocks
- Publisher: Barebone AI (https://barebone.ai)

---

## The Paper That Moved Seoul

On Tuesday, March 24, two Google researchers published a blog post about a compression algorithm. No product. No pricing. No launch date. It was, in the most literal sense, math.

By Thursday's close in Seoul, SK Hynix had fallen **6%** and Samsung Electronics nearly **5%**, erasing billions of dollars in market value from two of Asia's most valuable companies on the strength of a research write-up. A day earlier in New York, SanDisk dropped **5.7%**, Western Digital **4.7%**, Seagate **4%**, and Micron **3%**. The internet, with its usual restraint, nicknamed the algorithm "Pied Piper," after the fictional compression startup from HBO's *Silicon Valley*.

The algorithm is called TurboQuant. The pitch: shrink the working memory an AI model needs at inference time by **at least 6x**, with effectively zero loss in quality. If true and deployed, that aims squarely at the most profitable shortage in the semiconductor industry — the memory chips that AI labs cannot currently buy enough of.

We used Barebone AI to rebuild the week from the source documents: the paper's actual claims, the tape across New York and Seoul, and the one precedent everyone is reaching for. The short version: **the technical claims are real, but much narrower than the selloff implies.** And the precedent, a January 2025 panic that erased **$589 billion** of NVIDIA's market value in a single day before NVIDIA went on to become the first $5 trillion company, cuts in both directions.

## What Google Actually Built

When a large language model answers you, it doesn't re-read the whole conversation from scratch for every word it generates. It keeps a running scratchpad — the **KV cache** (key-value cache) — holding a processed copy of everything already in the context window. Each new token the model produces gets checked against that entire scratchpad. The longer the conversation or document, the bigger the scratchpad grows, and it lives in high-bandwidth memory (HBM) stacked next to the GPU — some of the most expensive real estate in computing.
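A rough way to see why that scratchpad gets expensive: the standard back-of-envelope estimate counts one key vector and one value vector per layer, per KV head, per cached token. The sketch below assumes a Llama-3.1-8B-like shape (32 layers, 8 grouped-query KV heads, head dimension 128); treat those numbers as assumptions to check against the model card, and the function name as hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bits_per_value=16, batch=1):
    """Estimate bytes needed to hold keys AND values for every layer, head, and token."""
    values_per_token = 2 * num_layers * num_kv_heads * head_dim  # 2 = one key + one value vector
    total_values = values_per_token * seq_len * batch
    return total_values * bits_per_value / 8  # bits -> bytes


GIB = 1024 ** 3

# Assumed Llama-3.1-8B-like shape; verify against the actual model config.
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128)

print(kv_cache_bytes(**cfg, seq_len=1) / 1024)  # KiB per token at 16-bit → 128.0
print(kv_cache_bytes(**cfg, seq_len=131_072) / GIB)  # GiB for a 128K-token context → 16.0
print(kv_cache_bytes(**cfg, seq_len=131_072, bits_per_value=3) / GIB)  # same context at 3-bit → 3.0
```

The total is linear in context length, which is why long contexts bite: under these assumed dimensions, one 128K-token conversation holds about 16 GiB of cache at 16 bits. Note that the straight 16-to-3-bit ratio is about 5.3x, so the paper's "at least 6x" figure presumably reflects a baseline or bit budget this simple sketch does not capture. The estimate also ignores allocator overhead and any per-block scales a quantizer has to store.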

That scratchpad is what TurboQuant compresses. The paper — *TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate*, by Amir Zandieh, Majid Daliri, Majid Hadian, and Vahab Mirrokni — squeezes those cached values down to roughly **3 bits each**, versus the 16-bit standard, using randomized rotations plus a bias-correcting transform. No retraining, no fine-tuning, applied on the fly.
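The general idea (rotate each vector randomly, then quantize every coordinate coarsely) can be sketched in a few dozen lines. This is a simplified illustration, not TurboQuant: it swaps in a randomized Hadamard transform as a cheap stand-in for the paper's random rotation, uses a naive uniform 3-bit grid with one float scale per vector rather than the paper's near-optimal quantizer, and omits the bias-correcting step entirely. All function names are hypothetical.

```python
import math
import random


def fwht(vec):
    """Unnormalized fast Walsh-Hadamard transform; length must be a power of two."""
    v = list(vec)
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


def rotate(x, signs):
    """Randomized Hadamard rotation (H / sqrt(n)) @ diag(signs) @ x; preserves vector length."""
    n = len(x)
    return [c / math.sqrt(n) for c in fwht([s * xi for s, xi in zip(signs, x)])]


def unrotate(y, signs):
    """Exact inverse of rotate(): diag(signs) @ (H / sqrt(n)) @ y."""
    n = len(y)
    return [s * c / math.sqrt(n) for s, c in zip(signs, fwht(y))]


def quantize(x, signs, bits=3):
    """Rotate, then snap each coordinate to one of 2**bits evenly spaced levels."""
    y = rotate(x, signs)
    scale = max(abs(c) for c in y) or 1.0  # one float per vector, stored next to the codes
    top = 2 ** bits - 1
    codes = [round((c + scale) / (2 * scale) * top) for c in y]  # ints in [0, top]
    return codes, scale


def dequantize(codes, scale, signs, bits=3):
    """Map codes back to levels in [-scale, scale], then undo the rotation."""
    top = 2 ** bits - 1
    y = [q / top * 2 * scale - scale for q in codes]
    return unrotate(y, signs)


# Demo on a random vector; d = 128 is a hypothetical head dimension.
rng = random.Random(0)
d = 128
signs = [rng.choice((-1, 1)) for _ in range(d)]
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
codes, scale = quantize(x, signs)
x_hat = dequantize(codes, scale, signs)
err = math.dist(x, x_hat) / math.hypot(*x)
print(f"relative reconstruction error at 3 bits: {err:.3f}")
```

The rotation is what makes a single per-vector scale tolerable: a vector with one large outlier coordinate would otherwise force a coarse grid onto every other coordinate, while after rotation its energy is spread roughly evenly across all of them. Because the rotation is orthogonal and exactly invertible, it needs no training, which fits the "applied on the fly" property described above.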

Google's stated results: KV cache memory cut by **at least 6x**; attention computation up to **8x faster** on an NVIDIA H100; accuracy indistinguishable from full 16-bit precision on long-context benchmarks like LongBench and Needle in a Haystack, tested across Llama-3.1-8B-Instruct, Gemma, and Mistral models.

Impressive. Now the fine print, because the fine print is where the trade lives:

| The headline claim | What the paper actually shows |
|---|---|
| "AI memory cut 6x" | The **KV cache** only — model weights, activations, and training memory are untouched |
| "8x faster" | Attention computation at 4-bit on an H100, **versus an unquantized 32-bit baseline** — production stacks mostly run 16-bit, and end-to-end serving is not 8x faster |
| "No quality loss" | Demonstrated on benchmarks using small open models (8B-class) — not yet on frontier production systems |
| "Google just dropped this" | The paper has been public on arXiv since **April 2025**; it was accepted to ICLR 2026 and written up on Google's research blog March 24, with the conference presentation next month |

That last row deserves a beat. KV-cache quantization is not new — every serious inference stack already does some version of it. The paper's actual contribution is proving you can push compression near its mathematical limit, online, without retraining. And the math sat in public view for eleven months before anyone sold a share on it.

The market didn't reprice a preprint this week. It repriced a narrative.

## Two Sessions of Damage

The reaction, across the complex:

| Company | Listing | Session | Move | What they sell |
|---|---|---|---|---|
| SK Hynix | KRX | Thu, Mar 26 | **-6%** | DRAM, HBM |
| SanDisk | US | Wed, Mar 25 | **-5.7%** | NAND flash |
| Samsung Electronics | KRX | Thu, Mar 26 | **~-5%** | DRAM, HBM, NAND |
| Western Digital | US | Wed, Mar 25 | **-4.7%** | Hard drives |
| Seagate | US | Wed, Mar 25 | **-4%** | Hard drives |
| Micron | US | Wed, Mar 25 | **-3%** | DRAM, HBM, NAND |

<Chart name="TurboQuantSelloffChart" />

Two things about this tape are more informative than the headline.

First, the context. These stocks were coming off one of the great runs in semiconductor history: Korean coverage of the selloff put the trailing-12-month gains at roughly **+200% for Samsung** and **+300% for SK Hynix and Micron**. Brokers in Seoul argued the pullback said as much about profit-taking as about Google, and it landed in a week that also saw a broader AI-related selloff. Deutsche Bank told clients to "continue to brace themselves for continuous AI-related disruption." When a sector is priced this perfectly, any plausible story will do as a trigger.

Second, look at *who* fell. Seagate makes hard drives. The KV cache lives in HBM, bolted to a GPU — it will never touch a hard drive. A 4% drop in an HDD maker on a KV-cache paper is not analysis; it's a sector ETF being sold. The market processed "Google made AI memory smaller" without reading past the headline.

Which, to be fair, is exactly what the bears say happened to the bulls on the way up.

## The Bear Case, Properly Stated

Strip away the panic and there is a real argument here. Commodity memory is priced at the margin: profits swing on the gap between supply growth and demand growth, and right now that gap is historically wide in the suppliers' favor.

One week before TurboQuant hit the tape, Micron reported its fiscal second quarter:

| Metric | FQ2 2026 (reported Mar 18) |
|---|---|
| Revenue | **$23.86B**, +196% YoY |
| Non-GAAP EPS | **$12.20** vs $8.79 expected |
| Gross margin | **75%** |
| Next-quarter revenue guide | **$33.5B** ± $0.75B |
| Next-quarter gross margin guide | **~81%** |
| FY2026 capex | Raised to **$25B+**, from $20B |
| 2026 HBM supply | **Sold out**, on multi-year contracts |

An 81% gross margin guide is not a normal number for a memory company. Margins like that exist only under genuine scarcity — which is exactly what software like TurboQuant attacks. If the fastest-growing memory workload suddenly needs a sixth of the capacity per unit of work, the marginal gigabyte gets cheaper and the pricing power behind those margins erodes, even if total bits keep growing.

That's the MP3 logic. Compression didn't shrink the demand for music — people listened to more music than ever after 1999. It destroyed the economics of the physical layer underneath: the disc, the pressing plant. Demand survived; the people selling the *containers* didn't.

So the bear case isn't "AI needs less memory." It's "the container business is about to lose its scarcity premium."

## The DeepSeek Precedent

Everyone reaching for a comparison this week landed on the same one. Cloudflare CEO Matthew Prince put it bluntly: "This is Google's DeepSeek. So much more room to optimize AI inference."

Recall the original. On January 27, 2025, a Chinese lab's claim that it had trained a frontier-class model for a fraction of the assumed cost knocked **17%** off NVIDIA in one session: **$589 billion** of market value, the largest single-day loss for any company in US stock market history. The thesis was identical in shape to this week's: efficiency software destroys hardware demand.

What NVIDIA did next:

<Chart name="TurboQuantDeepSeekChart" />

The panic was not instantly refuted — that part matters. NVIDIA closed March 2025 about **10%** below even its post-crash January close, and didn't decisively reclaim its old highs until June. But the demand thesis resolved emphatically: companies didn't run less AI because it got cheaper; they ran more of it. From January's month-end close to October's, NVIDIA gained **69%**, and on October 29, 2025 it became the first company in history to close above a **$5 trillion** market cap.

Economists call this Jevons' paradox: make a resource cheaper to use and total consumption of it rises. AI has been a Jevons machine at every step. SemiAnalysis analyst Ray Wang made the same argument about TurboQuant this week: resolve the memory bottleneck and you don't get less AI hardware demand — you get more capable models that strain the next bottleneck.

The KV cache has its own version of this. The reason Google is compressing it at all is that context windows keep growing — agents, million-token documents, video. If serving a million-token context gets 6x cheaper, the product roadmap doesn't bank the savings. It ships a ten-million-token context.
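Whether compression shrinks or grows total memory demand is, in the simplest textbook framing, a question of price elasticity. A minimal sketch under a constant-elasticity assumption (usage scales as cost raised to the power of minus ε, where ε is a hypothetical parameter, not an estimate from any source): if compression cuts memory per unit of work by a factor c, usage rises by c^ε and total memory demanded changes by c^(ε-1). Demand grows, Jevons-style, only when ε > 1. The ten-million-token scenario is a simpler special case, since KV-cache size is linear in context length. Function names are hypothetical.

```python
def total_memory_multiplier(compression, elasticity):
    """Change in total memory demanded under constant-elasticity demand.

    compression: factor by which memory per unit of work falls (e.g. 6.0)
    elasticity:  hypothetical price elasticity of demand for that work (> 0)
    """
    usage_growth = compression ** elasticity  # cheaper work -> more work gets done
    return usage_growth / compression         # each unit now needs 1/compression the memory


def kv_memory_multiplier(context_growth, compression):
    """KV cache is linear in context length, so memory scales as growth / compression."""
    return context_growth / compression


# Total memory after a 6x compression for three assumed elasticities: 0.41, 1.0, 2.45
for eps in (0.5, 1.0, 1.5):
    print(eps, round(total_memory_multiplier(6.0, eps), 2))

# A 10M-token context at 6x compression vs a 1M-token context uncompressed → 1.67
print(round(kv_memory_multiplier(10.0, 6.0), 2))
```

ε = 1 is the break-even point where the savings are exactly consumed. The toy model is also silent on price: total bits can rise while the price per bit falls, which is precisely the distinction the bear case turns on.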

## Where the MP3 Analogy Breaks

So is memory the pressing plant, or is it NVIDIA in January 2025? This is the question the whole trade hangs on, and the honest answer is that the analogy fails in specific, checkable ways.

**TurboQuant compresses one line item, not the bill.** The KV cache is a large and fast-growing slice of inference memory, but weights, activations, and everything in the training pipeline still demand the same HBM as before. The MP3 compressed *the entire product*. TurboQuant compresses the scratchpad.

**It's a lab result, not a deployment.** Zero quality loss has been shown on small (8B-class) open models on academic benchmarks. Production frontier serving is a different animal, and Google has announced no deployment, no Gemini cost reduction, no TPU memory savings. The conference talk is next month.

**The suppliers don't look like pressing plants.** CD pressing was fragmented and replaceable. HBM is made by exactly three companies, all sold out for 2026 — SK Hynix and Micron have said so explicitly, and both Samsung and SK Hynix warned on recent earnings calls that the shortage could persist into 2027. When demand is rationed, efficiency gains get absorbed as relief, not destruction.

And one more reading, the one we find most persuasive: **you don't invent extreme compression for an abundant resource.** Google employs some of the best researchers alive and pointed them at squeezing 16-bit values into 3 bits, because even Google cannot buy enough memory at acceptable prices. TurboQuant is not evidence that the memory shortage is ending. It's evidence of how binding the shortage is.

## The Honest Caveat

Now the part the bulls skip. Jevons' paradox is a pattern, not a law of physics — and it has never promised that *every layer of the stack* captures the rebound.

After DeepSeek, demand came roaring back to NVIDIA — a company with a near-monopoly on its bottleneck and an ecosystem moat. Memory is different: it's a commodity oligopoly whose pricing power appears and disappears with the supply-demand gap. DRAM has spent most of its history in brutal boom-bust cycles. The bulls' "sold out through 2026" is real, but sold-out is a description of today's gap, not a property of the product. Supply is responding hard: Micron just raised capex 25%, and Samsung is reportedly planning a ~50% HBM capacity increase for 2026.

The competitive picture is shifting too — and not in the direction the knee-jerk trade assumed:

<Chart name="TurboQuantHbmShareChart" />

TrendForce projects SK Hynix's HBM bit share falling from **59% to ~50%** in 2026 as Samsung — which spent 2025 as the laggard, lost its annual-profit crown to SK Hynix for the first time, and only recently cleared NVIDIA's HBM4 qualification — climbs from **20% to ~28%**. The company with the most scarcity premium embedded in its price (SK Hynix supplies roughly two-thirds of NVIDIA's HBM4 for the upcoming Rubin platform) is the one with the most to lose if software relief and new supply arrive together. The diversified laggard arguably has the least TurboQuant exposure of the three.

So the honest statement is: TurboQuant alone probably doesn't end the memory cycle — but the memory cycle was always going to end the way memory cycles end, and a 6x compression paper is exactly the kind of headline that marks the moment investors start asking when.

## What This Means

Nobody resolves this debate today, including us. But the signals that will resolve it are specific:

1. **Deployment, not papers.** Watch whether TurboQuant-style quantization ships in production inference stacks — and especially whether Google ever discloses Gemini serving-cost or TPU memory savings from it.
2. **Contract prices, not tape.** HBM now sells on multi-year contracts. Real demand destruction would show up first in renegotiations and spot DRAM pricing, not in a Thursday selloff in Seoul.
3. **Micron's next print.** Guidance of $33.5B revenue at 81% gross margin is the boldest demand statement in the sector. The next quarter either validates it or becomes the first crack.
4. **The context-length race.** If model providers respond to cheaper KV caches by shipping dramatically longer contexts at flat prices, that's Jevons doing its work — the savings consumed, not banked.
5. **The DeepSeek clock.** NVIDIA needed five months to reclaim its highs and nine to make history. Efficiency scares in AI have so far marked repricings of *who* captures the spend, not reductions of the spend itself.

The MP3 didn't kill music. It killed the companies that confused the container for the product. The question this week posed — and didn't answer — is whether memory makers are selling the product or the container. The next two quarters of HBM contracts will say more than any paper.

---

*Data: Barebone | Sources: Google Research blog (Mar 24, 2026), TurboQuant paper (arXiv, ICLR 2026), Micron FQ2 2026 earnings (Mar 18, 2026), TechCrunch, KED Global, Digital Today, CNBC, TrendForce | Data as of March 26, 2026*
