Refactoring Large, Stubborn Codebases

Jake Zimmerman
Getty D. Ritter

November 19, 2024

Complaints about stubborn codebases

  • Our code isn’t modular enough!
  • This dependency is 10 years out of date!

  • We need to change how we talk to the database!

→ We can refactor to a happy state!

Best to centralize the refactor ☀️

Have one team drive the refactor:

  • concentrates expertise
    most problems will be repeat problems

  • incentivizes automation
    fewer engineer-hours overall

  • more likely to finish
    no need to wait for each team to plan and prioritize

Centralized migration needs two things:

  • leverage over the codebase
  • a way to ratchet incremental progress


To refactor a large, stubborn codebase
you need to have a point of leverage
and to pick good ratchets.


Agenda 🎯


  • Improving developer satisfaction with Sorbet

  • Making a Ruby monolith more modular

  • Lessons learned from ratchets ratcheted


🥺 Stripe’s developers were unhappy

“hard to understand”
“waiting for tests is slow”
“only breaks in production”
“don’t trust the docs”
“too much low-quality code”

(sentiment from company-wide survey)

💡 Building Sorbet introduced leverage

“hard to understand” → IDE aids understanding
“waiting for tests is slow” → all code type checks in seconds
“only breaks in production” → type checker catches bugs in CI
“don’t trust the docs” → runtime makes types trustworthy
“too much low-quality code” → bad code is hard to type

(sentiment from company-wide survey)

Brief history of Sorbet 👨‍🏫

  • Began fall 2017
  • Stripe: ~800 employees (200 – 400 engineers)
  • Initial project: 3 engineers, full time

Timeline

  • 9 months to build Sorbet…
    … but has served as the foundation for hundreds of codemods
  • 3 months to get to 75% adoption…
    …contained to just three engineers

Aside: you can do it too! 👷🏼‍♀️

Tools that you can use to bootstrap something:

Ratcheting with # typed comments

# typed comment at the top of each file

# typed: false   → just syntax and constants
# typed: true    → inference in methods
# typed: strict  → every method needs a signature

# typed: false → # typed: true

💡 local, incremental, and actionable
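
A minimal sketch of how a per-file ratchet like this could be checked in CI. It is not Sorbet's own implementation: the helper names and the baseline format are hypothetical, only the three levels shown above are modelled, and treating a file with no comment as `false` is an assumption of the sketch.

```ruby
# Levels from the slide, ordered from least to most strict.
TYPED_LEVELS = %w[false true strict].freeze

# Rank of the `# typed:` level found in a file's source.
# Assumption: a file without the comment counts as `false`.
def typed_rank(source)
  match = source.match(/^#\s*typed:\s*(\w+)\s*$/)
  level = match ? match[1] : "false"
  TYPED_LEVELS.index(level) or raise ArgumentError, "unknown level: #{level}"
end

# Paths whose level dropped relative to a recorded baseline.
# `baseline` and `current` both map path => file contents (hypothetical shape).
def ratchet_violations(baseline, current)
  current.select do |path, source|
    next false unless baseline.key?(path) # new files cannot regress
    typed_rank(source) < typed_rank(baseline[path])
  end.keys
end
```

The check is local (one file at a time), incremental (levels are ordered), and actionable (it names the exact files that slipped).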

Alternatives to # typed: comment:

  • by folder → too broad
    (not local enough, not incremental enough)

  • by coverage percent → too granular
    (noisy, hard to action)

actionable = high signal, low noise

Developer satisfaction improved because

we refactored a large, stubborn codebase
by having a point of leverage (Sorbet)
and picking good ratchets (# typed:)

Agenda 🎯


  • Improving developer satisfaction with Sorbet

  • Making a Ruby monolith more modular

  • Lessons learned from ratchets ratcheted


Why do we need modularity?

Simple example

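# a toy logger
class Logger
  # the output stream defaults to standard output
  def initialize(output = $stdout)
    @output = output
  end

  # extra keyword arguments are logged as key=value pairs
  def log(message, **storytime)
    payload = storytime.map do |k, v|
      "#{k}=#{v.inspect}"
    end.join(" ")

    @output.puts("#{Time.now.to_i}: #{message} #{payload}")
  end
end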

Simple example

# elsewhere
logger.log("Attempting operation", op: my_op, merchant: m)
# 1730756308: Attempting operation op=:update merchant=#<Merchant id=22 secret="hunter2">

Simple example

    # ...
    payload = storytime.map do |k, v|
      if v.is_a?(Merchant)  # if we're logging a merchant...
        "#{k}=Merchant(id=#{v.id}, ...)"  # redact most fields
      else
        "#{k}=#{v.inspect}"  # other objects can be logged as-is
      end
    end.join(" ")
    # ...

Well-intentioned changes can produce tangled code

…and tangled code has non-local effects!

Why do we need modularity?

Tangled code is…

  • difficult to debug
  • difficult to test
  • prone to larger deploys or builds
  • prone to higher runtime memory usage

A drag on both developer velocity and runtime performance

Point of leverage: packaging

# lib/logger/__package.rb
class Logger < PackageSpec
  import Merchant

  export Logger
end

# lib/merchant/__package.rb
class Merchant < PackageSpec
  import Logger

  export Merchant
end

Point of leverage: layering

The essential principle is that any element of a layer depends only on other elements in the same layer or on elements of the layer ’beneath’ it. Communication upward must pass through some indirect mechanism.

—Eric Evans, Domain-Driven Design: Tackling Complexity in the Heart of Software

Point of leverage: layering

class Logger < PackageSpec
  layer 'utility'
  import Merchant # <- ill-layered import!
  export Logger
end

class Merchant < PackageSpec
  layer 'business'
  import Logger
  export Merchant
end
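
A rough sketch of the check that layering makes possible, assuming package metadata is available as plain data. `PackageInfo` and `ill_layered_imports` are hypothetical names for illustration and are not part of the packaging tool. The layer order follows from the comment above: importing Merchant from Logger is ill-layered, so 'utility' sits beneath 'business'.

```ruby
# Layers ordered from lowest to highest (only the two shown on the slide).
LAYER_ORDER = %w[utility business].freeze

# Hypothetical stand-in for what a __package.rb file declares.
PackageInfo = Struct.new(:name, :layer, :imports)

# Return [importer, imported] pairs where a package imports from a higher layer.
def ill_layered_imports(packages)
  by_name = packages.to_h { |pkg| [pkg.name, pkg] }
  packages.flat_map do |pkg|
    pkg.imports.filter_map do |imported|
      target = by_name.fetch(imported)
      upward = LAYER_ORDER.index(target.layer) > LAYER_ORDER.index(pkg.layer)
      [pkg.name, imported] if upward
    end
  end
end

packages = [
  PackageInfo.new("Logger", "utility", ["Merchant"]),
  PackageInfo.new("Merchant", "business", ["Logger"]),
]
p ill_layered_imports(packages)
```

Only the Logger-to-Merchant edge is reported; Merchant importing Logger points downward and is allowed.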

Building a ratchet: strict_dependencies

Level zero: 'false'

Level one: 'layered'

Level two: 'layered_dag'

Level three: 'dag'

Building a ratchet: strict_dependencies

  • strict_dependencies 'false'
  • strict_dependencies 'layered'
  • strict_dependencies 'layered_dag'
  • strict_dependencies 'dag'
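
The slides do not spell out what each level enforces. Going only by the names, the stricter levels presumably involve keeping the import graph acyclic, so one ingredient would be finding import cycles. The sketch below shows that piece alone; the method names and the hash shape are hypothetical, and it is not the tool's actual check.

```ruby
# Find one import cycle, or return nil when the graph is acyclic.
# `imports` maps a package name to the names it imports (hypothetical shape).
def find_import_cycle(imports)
  state = {} # name => :visiting while on the current path, :done afterwards
  imports.each_key do |start|
    cycle = walk_imports(imports, start, state, [])
    return cycle if cycle
  end
  nil
end

# Depth-first walk that reports the path of the first cycle it closes.
def walk_imports(imports, name, state, path)
  return nil if state[name] == :done
  # Reaching a package that is still on the current path closes a cycle.
  return path[path.index(name)..] + [name] if state[name] == :visiting

  state[name] = :visiting
  path.push(name)
  imports.fetch(name, []).each do |imported|
    cycle = walk_imports(imports, imported, state, path)
    return cycle if cycle
  end
  path.pop
  state[name] = :done
  nil
end
```

Run on the Logger and Merchant packages from the packaging slide, this reports the Logger, Merchant, Logger loop, which is the kind of "bad edge" a package owner can then remove.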

Developer velocity and production latency are improving because

we are modularizing a large, stubborn codebase
by having a point of leverage (packages and layering)
and picking good ratchets (strict_dependencies)

What makes a good ratchet?

  • local
    • Sorbet: per-file
    • Dependencies: per-package
  • incremental
    • Sorbet: false → true → strict
    • Dependencies: false → layered → layered_dag → dag
  • actionable
    • Sorbet: “Where do I need types in my current files?”
    • Dependencies: “What bad edges can I remove from my current package?”

Agenda 🎯


  • Improving developer satisfaction with Sorbet

  • Making a Ruby monolith more modular

  • Lessons learned from ratchets ratcheted


How can this approach fall down?


In theory, there is no difference between theory and practice. In practice, there is.

— Walter J. Savitch, relaying a quote overheard at a computer science conference


Beyond leverage and ratchets:

Important to have:

  • a reason to refactor
  • comprehensive documentation
  • targeted tooling
  • organizational support

Tools aren’t always perfect at first!


  • Originally, packaging was also runtime-enforced
    • …but this was invasive and potentially risky.
  • Originally, we intended exports to be hand-written
    • …but this slowed down developer velocity unacceptably.


Corollary: don’t rush the launch!


  • First impressions matter a lot!
  • For Sorbet:
    • …we ran it in quiet mode for a while
  • For packages:
    • …we proved them out on our CI service before we moved on to other code.


Who ratchets the ratchets?

Our approach: two-level ratchets


class MyPackage < PackageSpec
  sorbet 'strict'
  # ...
end
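
One possible reading of a two-level ratchet, sketched under a loud assumption: the package-level `sorbet 'strict'` line sets a floor that every file's `# typed:` level in that package must meet. The slides do not define these semantics, and the two-argument form shown next is not modelled here. `files_below_minimum` and the hash shape are hypothetical.

```ruby
# Levels from the earlier slides, ordered from least to most strict.
SIGIL_LEVELS = %w[false true strict].freeze

# Files in a package whose `# typed:` level is below the package's minimum.
# `files` maps path => level string (hypothetical shape).
def files_below_minimum(package_minimum, files)
  floor = SIGIL_LEVELS.index(package_minimum)
  files.select { |_path, level| SIGIL_LEVELS.index(level) < floor }.keys
end
```

Under this reading, the per-file ratchet moves individual files up, and the per-package ratchet stops any file in the package from slipping back below the floor.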


Our approach: two-level ratchets


class MyPackage < PackageSpec
  sorbet 'strict', 'true'
  # ...
end


Sorbet supporting tooling

  • Metrics and dashboards
  • Sorbet autocorrects
    • “autofix these problems to ratchet up”
    • makes the ratchet actionable even post-migration
  • Editor integration
    • turns unactionable (low-signal) errors into actionable ones

Packager supporting tooling


  • gen-packages, for automatically fixing up imports and exports
    • …and visualizing error messages when we hit the ratchet!
  • Package Explorer, for visualizing package dependencies
  • Dependency Doctor: “I’ve hit a dependency issue, what do I do?”
    • …including automated codemod suggestions!
  • Editor integration for packaging errors and fixes
    • …immediate feedback is invaluable!



One team can refactor a large, stubborn codebase
by having a point of leverage
and picking good ratchets


and don’t forget to be patient—it’s hard work!

Questions?