Gustav Söderström

How Spotify Thinks — Gustav Söderström on Invest Like the Best

Spotify survives technology shifts by prototyping and stack-ranking its bets, running a fully synchronized leadership team, and demanding explanations, not pattern recognition, for why anything works, including A/B winners.

spotify · ai-or-die · bets-board · super-app · bundling · product-strategy · music-industry · deutsch · good-explanation · synchronized-org · free-tier · marginal-cost-ai · podcast-exclusivity · measure-inputs · 95% confidence

Why this is in the corpus

Rare operator-dense view into how a 700M-user super-app allocates capital (bets board), runs product (E-Team), embraces AI without overfitting to the current moment, and rebuilt its business model (free shuffle tier) from first principles.

Summary for skimmers

Gustav Söderström walks through Spotify's operating system: a VC-style "bets board" where ~44 bets from 14 VPs are stack-ranked every 6 months; a 3-hour Tuesday E-Team meeting where no topic goes "offline" and direct reports are banned so VPs must know their own details; prototyping the next 6 months in Figma/AI tools before committing to synchronize the super-app org; David Deutsch's "good explanation" bar — falsifiable, has reach, hard to vary — applied to product decisions (no launch without a theory); the macro-wind / "AI or Die" framing; generative AI flipping consumer products from asymmetric downlink to symmetric conversation; admitting podcast exclusivity was a bad bet and reversing quickly; the shuffle-mode free tier as a first-principles answer to YouTube's foreground ad model; Spotify as the de facto R&D department of the music industry (15 years unprofitable, labels profitable throughout); Bezos-style "measure inputs, not outputs" culture that lets Gustav survive failed launches like the Moments UI.

Briefing

What survives the editorial filter

This page should feel like a smart colleague already listened for you and left only the operating logic worth keeping. Not everything said in the episode makes it through.

Trust signal

Direct episode extraction


Principles

Durable claims that survive beyond the speaker's biography — each with explicit limits, transferability judgment, and evidence.

Principle

Admit bad strategy and reverse — defending past decisions is the real cost

Two ways to be right: always guess right, or change your mind when wrong. The second is cheaper.

There are two ways to always be right. One is to always guess right. The other is to just change your mind whenever you're wrong... The real cost is when you try to defend your past decisions.

Principle

Ban "offline" and "later" in executive meetings — resolve in the room

Decisions resolved in the room compound, but the cost of deferring them compounds faster. With all decision-makers present, deferral is a choice, not a necessity.

You're not allowed to say the word 'offline' or 'later' because that person is in the room... Very simple in theory, but incredibly powerful in practice.

Principle

Stack-rank every bet globally — equal priority is a decision punted to the org

Always stack-rank; refusing to rank is how leaders unknowingly set their orgs up for political fighting.

Very few people manage to say this is actually more important than that. They're just saying these things are very important, both of them... If you as a leader don't bring clarity, you're going to set your org up for fighting.

Principle

Prototype the next 6 months before committing — synchronize disagreement early

Render the future visually before committing so alignment is forced while changes are still cheap.

What I've tried to do now, together with Alex Nordstrom, we synchronized the entire company... we prototype everything up front. So all this so-called fighting happens before you actually commit to doing something.

Principle

No direct reports in the executive meeting — force VPs to know their own details

Executives should be able to defend their own work without backup; rotating participants kills candor.

You're not allowed to bring anyone else in to explain your thing. You have to be on top of it enough to explain it yourself. Over time these groups get very tight... People can be honest, no one is afraid.

Principle

Measure inputs, not outputs — good ideas that fail should still be rewarded

Judging outputs promotes the lucky; judging inputs gives good reasoners more at-bats until they hit.

Daniel was like, 'I understand. I agreed with the thoughts and the ideas. What was the mistake?'... I judged you by the inputs you had, not the outputs. That made me actually take more risk instead of scaling down.

Principle

Never launch an A/B winner without a theory of why it works

Require a causal explanation, not just an A/B lift, before launch — explanations scale across the org, pattern recognition doesn't.

I don't want to launch it until you have a good theory of why it works... if you figure out the why, it's the difference between pattern recognition and actually understanding something.

Frameworks

Reusable systems and operating models — including when they help and when they break.

Framework

Good-Calories Litmus (nutrition test for product)

A subscription model frees you from engagement-at-any-cost; pick verticals that produce "good calories" and you compound retention instead of guilt.

  1. Test 1: Post-hour feeling — energized vs. guilty
  2. Test 2: Parental-time-transfer — do parents push kids INTO it or OUT of it
  3. Test 3: Is it in line with the existing "nutritious" mission?
  4. Green-light if it passes all three
Use when: Choosing new bundle additions, vetoing feature ideas that would optimize short-term engagement at the cost of user regret.
Skip when: Ad-supported businesses where regret isn't penalized by the business model; discovery features where "junk food" engagement is the whole product.
If you lose an hour on Spotify, how do you come out feeling versus if you lose an hour doom scrolling in the bathroom... We see parents restricting screen time for their kids and saying, 'Go to Spotify instead.'

Framework

The Bets Board (6-month VC-style stack rank)

A structured ritual that combines bottom-up idea generation with global top-down prioritization, replacing political allocation with a transparent rank.

  1. VPs pitch bets as if they were startups pitching a VC
  2. Co-presidents stack-rank all bets 1..N globally
  3. Orgs resource from the top down until capacity is exhausted
  4. Orgs COMMIT to what they can deliver (bottom-up commitment)
  5. Execute for 6 months
  6. Prototyping phase for NEXT 6 months runs in parallel
Use when: Large, multi-team product orgs that need to allocate scarce engineering time across many competing bets without letting VP politics decide.
Skip when: Small teams (<50) where a single roadmap works, or pure research orgs where scheduled commitment destroys serendipity. It also fails when planning tooling is weak, because the overhead of the ritual then exceeds the execution it enables.
Every six months, these VPs, they pitch, literally pitch, as if we were a VC and they were a startup... This time we have 44 bets. We stack rank them from 1 to 44.
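Steps 2 to 4 of the Bets Board can be sketched as a small allocation routine. This is a minimal illustration, not Spotify's actual process: the `Bet` fields, the single shared capacity number, the engineer-month costs, and the strict cut line (stop at the first bet that does not fit) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bet:
    name: str
    rank: int   # position in the global stack rank, 1 = most important
    cost: int   # hypothetical effort estimate, in engineer-months


def fund_top_down(bets, capacity):
    """Fund bets in rank order and stop at the first one that does not fit."""
    funded, remaining = [], capacity
    ordered = sorted(bets, key=lambda b: b.rank)
    for i, bet in enumerate(ordered):
        if bet.cost > remaining:
            # Everything from here down sits below the cut line for this cycle.
            return funded, ordered[i:]
        remaining -= bet.cost
        funded.append(bet)
    return funded, []


# Hypothetical bets, deliberately listed out of rank order.
bets = [
    Bet("bet-c", rank=3, cost=40),
    Bet("bet-a", rank=1, cost=30),
    Bet("bet-d", rank=4, cost=10),
    Bet("bet-b", rank=2, cost=50),
]
funded, deferred = fund_top_down(bets, capacity=100)
print([b.name for b in funded])    # ['bet-a', 'bet-b']
print([b.name for b in deferred])  # ['bet-c', 'bet-d']
```

In this sketch `bet-d` would fit in the leftover capacity but is still deferred, because the cut line is strict. Letting smaller bets jump the queue is a different policy and would weaken the point of a global rank.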

Framework

Deutsch's Good Explanation bar

An explanation whose characters can be swapped without changing its predictions (a conspiracy theory, or Thor causing thunder) is too easy to vary; a theory in which every parameter is load-bearing is probably close to the truth.

  1. Test 1: Is it falsifiable?
  2. Test 2: Does it have reach? (works at multiple scales / domains)
  3. Test 3: Is it hard to vary? (swap a parameter → prediction breaks)
  4. Reject: pattern recognition dressed as reasoning
  5. Accept: a theory that survives parameter perturbation
Use when: Evaluating competing product/strategy explanations, filtering plausible-sounding rationales from durable ones, onboarding senior hires into structured reasoning.
Skip when: Pure exploratory brainstorming where premature falsification kills options too early.
A good explanation has to be hard to vary. If you move one of the parameters, the entire thing is not predictive anymore, then you're probably close to the truth.

Framework

Willingness-to-Pay vs Willingness-to-Sell value stick (Oberholzer-Gee)

Bundling while keeping price far below WTP is how Spotify manufactures consumer surplus; mission and culture lower willingness-to-sell, so talent will stay for less than the top market wage.

  1. Increase willingness-to-pay (stack value: music + podcast + books + video)
  2. Keep actual price far below willingness-to-pay
  3. Decrease willingness-to-sell via mission + culture, not just wages
  4. Capture value only where the gap is widest
Use when: Bundled consumer subscriptions that need to justify price raises over time; talent markets where cash alone won't win.
Skip when: Zero-margin commodity businesses where there is no surplus to divide; early-stage startups that can't afford mission-over-cash hiring.
Our goal as a service is to make sure that Spotify is just an amazing deal. You're always going to feel the willingness to pay, the actual value you perceive, is way over the price.
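The value stick is easiest to see as arithmetic. The sketch below splits total value created (WTP minus WTS) into three slices. The `cost` level is part of the standard value-stick picture but is not mentioned in the episode, and every number is invented for illustration.

```python
def value_stick(wtp, price, cost, wts):
    """Split total value created (WTP - WTS) into its three slices."""
    if not wtp >= price >= cost >= wts:
        raise ValueError("expected WTP >= price >= cost >= WTS")
    return {
        "customer_delight": wtp - price,  # surplus kept by the subscriber
        "firm_margin": price - cost,      # what the company captures
        "supplier_surplus": cost - wts,   # what employees or suppliers gain above their floor
        "total_value": wtp - wts,
    }


# Hypothetical monthly figures before and after adding content to the bundle.
before = value_stick(wtp=15, price=11, cost=8, wts=6)
after = value_stick(wtp=20, price=12, cost=8, wts=6)
print(before)
print(after)
```

With these made-up numbers, the bundle raises WTP by 5 while price rises by only 1, so customer delight doubles from 4 to 8 even though the firm's margin also grows from 3 to 4.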

Signals

What appears to be shifting, for whom it matters, and what happens if you ignore it.

Signal

AI has non-zero marginal cost — business models will tier by inference consumption

The next wave of consumer pricing will look more like Spotify's label-royalty model (per-use cost must be recovered) than Twitter's 2010s model (worry about monetization later).

The previous VC model was you make a big upfront investment and you get to almost zero marginal cost. That's how software worked, that's not how AI works. The marginal cost is high and you need to cover it... you're probably going to see more tiering of consumer products based on how much inference you want.
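To see why non-zero marginal cost pushes toward tiering, consider how many inference requests a plan can include before the margin disappears. This is a rough sketch with invented prices and costs (all in cents); `request_cap` and its parameters are hypothetical names, not anything described in the episode.

```python
def request_cap(price_cents, fixed_cents, cost_per_request_cents, target_margin_cents):
    """Largest monthly request count that still leaves the target margin per user.

    Returns None when marginal cost is zero, meaning no usage cap is needed.
    """
    if cost_per_request_cents == 0:
        return None
    budget = price_cents - fixed_cents - target_margin_cents
    return max(budget, 0) // cost_per_request_cents


# Hypothetical tiers: same fixed cost and margin target, different prices.
tiers = {"basic": 1000, "plus": 2000}
caps = {name: request_cap(price, 200, 2, 300) for name, price in tiers.items()}
print(caps)  # {'basic': 250, 'plus': 750}

# Classic software economics: zero marginal cost, so no cap is required.
print(request_cap(1000, 200, 0, 300))  # None
```

Under these assumptions the only way to offer more inference is to charge more, which is the tiering pattern the quote predicts.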

Signal

Non-developers are starting to use Cursor via MCP

The bottleneck for AI inside big companies is no longer AI engineering — it's boring old-school API exposure. Once data is real-time and MCP-wrapped, the user base of AI-native tools explodes past developers.

I had one of my PMs who doesn't speak Swedish, she did her taxes in Cursor, managed to wrap the Swedish tax authority in an MCP, not a developer. So I think it's going to grow outside of developers.

Signal

Big-company coding speedup from AI is ~7% today — but the unlock is yet to come

Public-market expectations of AI productivity gains at large companies are temporarily inflated; the durable gains will come from refactor-capable models + non-coding workflows, not Cursor-style autocomplete.

I've seen studies from other big companies that if you actually measure out of a developer's time, the speed up is 7% or something, which sounds very disappointing... but I think it's going to turn into the opposite. I think it's going to have tremendous impact over the longer term.

Opportunities

Only included where there is a buyer, a real wedge, and a plausible revenue path — not vague idea theater.

Opportunity

Wrap legacy enterprise data in MCP so the non-engineer 80% can reason over it

Boring-but-critical infra work: API-ify every cold dataset, wrap in MCP, ship an internal AI workbench per skill-group.

Wedge: Start with one workflow (e.g. "query 15 years of contracts") at orgs with deep structured data — banks, pharma, telcos, media.
Why now: LLMs cheap enough to be a reasoning substrate + MCP stabilizing as the standard + non-engineers demanding it. Three-way convergence that didn't exist 12 months ago.
Actually my biggest job to enable AI is not AI engineering, it's old-school engineering exposing all this data.

Opportunity

Mainstreaming audiobooks via subscription bundling (à la Nordics)

Bundle audiobooks into Premium with a generous monthly cap + top-up — exactly Spotify's playbook.

Wedge: Audio/content bundles that can license publisher catalogs — not just Spotify; also niche literary apps (e.g. Substack + audio).
Why now: Consumer willingness to stack subscriptions has peaked; bundled audiobooks land inside an existing subscription.
Audiobook was a very niche behavior. 10, 11 million or something... you can see in the Nordics where you have the access model that it's getting very mainstream.

Opportunity

Product-overhang exploitation — ship two years of features on today's models

Aggressive product refactoring on today's GPT/Claude-class models: rebuild core workflows as two-way conversation, not downlink-heavy UIs.

Wedge: Mid-stage consumer products with large user bases — Notion, Duolingo, any media subscription — not AI-native startups already doing this.
Why now: Gustav explicitly subscribes to product overhang; inference cost is dropping fast enough that the economics already work.
I subscribe to the product overhang idea that there's a huge product overhang and if we froze, I think we would see products shipped that look amazing for several years before we exhausted what we have.

Lessons still worth keeping

Useful takeaways that did not fully clear the bar for durable principle status.

Lesson

Podcast exclusivity — betting on celebrity content in a low-production-cost medium

Exclusivity is powerful when content is capital-intensive and content-picking skill is rare. In podcasts, neither held, and Spotify should have followed the YouTube model from the start.

Before copying a content-strategy from another medium, check whether the underlying economics (production cost, talent supply) match. If they don't, the strategy inverts.

The macro trend for podcasts was that the production cost was so low... to go in and do exclusivities on top of that is counter-purpose. The whole point is more like YouTube... We also betted a lot on celebrities and they are celebrities, but they're not always good podcast hosts.

Lesson

The free shuffle tier — first-principles reasoning beat pattern-matching YouTube

When an obvious competitor move is available to copy, reason from your own usage data instead. Foreground ads were a local optimum: roughly 91% of actual listening happened in the background.

Even inside a company, pattern-matching to a visible competitor feels safer than first-principles reasoning — but the first-principles answer is where durable differentiation lives.

Turns out back then it was 9% or something. So you have 91% of the use case being in the background... Even the people inside the company said that's a terrible idea. But we trusted the data... This is what made growth explode.

Lesson

The Moments UI — shipped ahead of the underlying ML

Great product vision + weak underlying technology = premature launch. Even a clean A/B result can hide an instrumentation bug when the UI is radically new.

Don't ship a UI paradigm that requires capability your stack doesn't yet have — and treat "A/B looks okay" on a novel surface with extreme skepticism.

The idea was far ahead of where the technology was and it costed a lot of money. We actually announced it. We had A/B tested it and it looked okay, which is what we launched. Then we discovered there was a bug in the A/B test when it was live. And it actually underperformed drastically what we had.

The Plays

Try these this week

Verb-first executable actions — each one tied to a stated outcome in the episode.

Launch a Free Shuffle-Only Tier Using Premium Engagement Data

Outcome: If premium users spend ~50% of time shuffling, offer shuffle-only access for free — it captures roughly half the value but never 100% of any user's need, minimizing cannibalization while driving top-of-funnel growth.

Gustav Söderström — How Spotify Thinks — Gustav Söderström on Invest Like the Best
  1. Analyze premium user behavior to identify the single highest-usage feature or mode.
     At Spotify, 50% of premium listening sessions were shuffle.
  2. Model cannibalization risk: confirm that the feature represents a large share of aggregate usage but not 100% of any individual's consumption.
     An aggregate 50% share does not by itself show that no user relies on the feature exclusively, so check the per-user distribution before assuming free access won't replace premium.
  3. Launch a free tier restricted to that single mode (e.g., shuffle-only playback).
     At Spotify, this became the free mobile experience.
  4. Track both free-tier growth and premium conversion/churn to validate the no-cannibalization hypothesis.
     Spotify saw 'growth explode' without material premium erosion.

Stop or pivot when

  • If the feature accounts for close to 100% of any significant user cohort's usage, cannibalization risk is too high
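The cannibalization check in step 2 and the stop condition above can be sketched as a per-user share calculation. The data shape (minutes per user), the 95% threshold, and the function name are assumptions for illustration; the episode only gives the aggregate figure.

```python
def mode_share_report(usage, exclusive_threshold=0.95):
    """usage maps user id -> (minutes in the candidate mode, total minutes)."""
    mode_total = sum(mode for mode, _ in usage.values())
    all_total = sum(total for _, total in usage.values())
    exclusive = [
        user
        for user, (mode, total) in usage.items()
        if total > 0 and mode / total >= exclusive_threshold
    ]
    return {
        "aggregate_share": mode_total / all_total,
        "exclusive_users": sorted(exclusive),
        "exclusive_fraction": len(exclusive) / len(usage),
    }


# Hypothetical telemetry: the aggregate is 50%, but one user is almost all shuffle.
usage = {
    "u1": (50, 100),
    "u2": (20, 100),
    "u3": (98, 100),
    "u4": (32, 100),
}
report = mode_share_report(usage)
print(report)
```

Here the aggregate share is exactly 0.5, yet `u3` is flagged as near-exclusive. A large `exclusive_fraction` would be the signal to stop or pivot.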

Before you start

  • Detailed usage telemetry by feature/mode
  • Ability to gate features at the product level (e.g., shuffle vs. on-demand)
  • Willingness to launch a lower-value tier despite internal skepticism
freemium-tier-design · growth · cannibalization-modeling · 1-10 · 10-50

Require a Falsifiable Theory Before Launching Winning A/B Tests

Outcome: Even when an A/B test shows positive results, insist that the team articulate a 'hard to vary' explanation for why it works before you ship — ensuring the learning scales across the organization.

  1. Run the A/B test and observe the metric lift.
     Standard experimentation; measure whether the treatment wins.
  2. Before greenlighting launch, ask the team for a theory that explains why the test worked.
     The theory must be falsifiable, have reach (scale), and be hard to vary (changing one parameter breaks the predictiveness).
  3. If no coherent theory emerges, hold the launch or run additional experiments to uncover the mechanism.
     Pattern recognition alone ('it worked in the test') is not sufficient justification.
  4. Document and share the theory so the entire org can apply the learning to future decisions.
     This is how insights compound across teams.

Stop or pivot when

  • If the team cannot articulate a hard-to-vary explanation, do not launch the winning variant

Before you start

  • Active A/B testing infrastructure
  • Organizational norm that metrics alone do not justify launches
  • Leadership willing to delay or reject statistically significant wins without theory
experimentation-rigor · knowledge-scaling · launch-gating · 1-10 · 10-50 · 50+

Hold a Weekly All-VPs Escalation Meeting with No-Offline Rule

Outcome: Run a recurring all-VPs meeting (e.g., Tuesday) where blocked issues are escalated, 'offline' discussions are banned, and no direct reports attend — forcing senior leaders to resolve cross-functional blockers in real time.

ongoing weekly cadence · 1 per 7 days
  1. Schedule a weekly all-VPs escalation meeting (e.g., every Tuesday).
     With one meeting per five-day work week, a blocker waits about 2.5 working days on average before it can be escalated, and at most one full week.
  2. Enforce a 'no offline' rule: when someone says 'let's take that offline' or 'I'll talk later,' immediately require resolution in the room.
     The goal is to eliminate deferred decisions and force closure on the spot.
  3. Ban direct reports from attending; only VPs may participate.
     This forces VPs to own details and resolve issues themselves rather than delegating on the fly.
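The 2.5-day figure in step 1 follows from a simple average: if blockers appear evenly across a five-day work week and the meeting happens once per week, the mean wait is half the interval. The sketch below checks this with exact fractions; the even-arrival assumption and the function name are mine, not from the episode.

```python
from fractions import Fraction


def average_wait(interval_days, slots):
    """Mean wait until the next meeting, for blockers arriving at the
    midpoint of each of `slots` equal slices of the interval."""
    width = Fraction(interval_days, slots)
    waits = [interval_days - (i + Fraction(1, 2)) * width for i in range(slots)]
    return sum(waits) / slots


print(average_wait(5, 5))     # 5/2
print(average_wait(5, 1000))  # 5/2
```

The average stays at 2.5 working days however finely the week is sliced, while the worst case approaches the full five working days for a blocker that surfaces just after the meeting ends.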

Before you start

  • VP-level or equivalent leadership cohort who own cross-functional outcomes
  • Cultural buy-in to real-time decision-making and no deferral norm
escalation-hygiene · cross-functional-coordination · decision-velocity · 10-50 · 50+

Run Six-Month VC-Style Bet Pitches with Stack-Ranked Resourcing

Outcome: Every six months, VPs pitch bets as if to a VC, the company stack-ranks them (e.g., 1–44), then resourcing teams work top-down until capacity is exhausted, ensuring only the highest-conviction initiatives get funded.

six months · 40 per 180 days
  1. Hold a pitch session every six months where VPs present their proposed bets as if pitching a VC.
     Treat the session formally; each VP makes the case for why the company should back their initiative.
  2. Stack-rank all submitted bets from highest to lowest priority.
     The example given was 44 bets ranked 1 through 44; typical range is 30–50 bets.
  3. Hand the ranked list to resource-allocation teams and start resourcing from the top.
     Teams work down the list until they hit capacity (e.g., 'maybe they get to 30' out of 44).
  4. Commit only the top bets that can be resourced over the next six months.
     Unfunded bets below the cut line are deferred or killed for the cycle.

Stop or pivot when

  • If a bet cannot be resourced after working top-down through the stack rank, it is not executed in that cycle

Before you start

  • VP-level or equivalent leadership team with ownership of strategic initiatives
  • Ability to estimate resourcing capacity for a six-month window
  • Organizational willingness to kill or defer lower-ranked bets
roadmap-prioritization · resource-allocation · portfolio-management · 10-50 · 50+

Tensions surfaced

Contradictions and trade-offs the episode raises — judgment calls a thoughtful operator has to navigate.

Tension

Build for today's AI workflows or wait for the next model

Ship velocity vs overfitting risk. Every feature you ship is effectively a bet on a snapshot of capability that will be obsolete before it pays back.

You're somewhere right now, but we're pretty certain that that somewhere is on this curve... you don't want to overfit too much to the moment.

Tension

Synchronized super-app vs divide-and-conquer speed

Global changes at scale require synchronization. Rapid local experimentation requires decoupling. The same org cannot do both equally well.

We're good at doing global changes, like changing the entire UI because we're synchronized, but we're probably much slower than other companies at trying something. It's not the right one. It's the right one for us.

Tension

Per-stream payout metric is lower when your product is BETTER

Engagement quality drives the per-stream metric down even as it drives aggregate label payouts up. Creator-facing transparency and shareholder-facing logic pull in opposite directions.

These other companies have higher per stream because they have a worse product... we have twice the engagement and half the churn of competing services. So that's a curse.
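The per-stream paradox is a matter of arithmetic once payouts are modeled as a share of subscription revenue divided across streams. The sketch below assumes that pooled model, a constant monthly churn rate, and invented figures for price, payout share, streams, and churn. Only the "twice the engagement and half the churn" ratio comes from the quote.

```python
from fractions import Fraction


def payout_metrics(price_cents, payout_share, streams_per_month, monthly_churn):
    """Per-stream rate and payouts per subscriber under a pooled revenue share."""
    pool = price_cents * payout_share    # paid to rights holders per subscriber-month
    lifetime_months = 1 / monthly_churn  # expected subscriber lifetime at constant churn
    return {
        "per_stream_cents": pool / streams_per_month,
        "monthly_payout_cents": pool,
        "lifetime_payout_cents": pool * lifetime_months,
    }


# Hypothetical services at the same price and payout share.
weaker = payout_metrics(1000, Fraction(7, 10), 200, Fraction(4, 100))
stronger = payout_metrics(1000, Fraction(7, 10), 400, Fraction(2, 100))

print(float(weaker["per_stream_cents"]), float(stronger["per_stream_cents"]))  # 3.5 1.75
print(int(weaker["lifetime_payout_cents"]), int(stronger["lifetime_payout_cents"]))  # 17500 35000
```

Under these assumptions the more engaging service reports half the per-stream rate while paying out the same amount per subscriber each month and twice as much over the subscriber's lifetime.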

Corpus connection

Where this episode fits for retrieval

What kinds of decisions this briefing is best pulled into.

Primary decisions

  • product-strategy
  • capital-allocation
  • business-model
  • hiring-culture