Summary
Figma’s database stack has grown almost 100x since 2020. To keep our databases from toppling, we needed a bigger lever. We outlined a number of goals and must-haves to tackle short-term challenges while setting us up for smooth long-term growth.
Horizontal sharding is the process of breaking up a single table or group of tables and splitting the data across multiple physical database instances. We wanted to avoid complex solutions like double-writes, which are challenging to implement without taking downtime or compromising on consistency. This reduced the risk of being stuck in a bad state when unknown unknowns occur.
Horizontal sharding was an order of magnitude more complex than our previous scaling efforts. Once a table is horizontally sharded at the application layer, it can support any number of shards at the physical layer. We can always scale out further by simply running a physical shard split.
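To make the split between logical and physical shards concrete, here is a minimal sketch. It assumes a fixed number of logical shards whose placement on physical hosts can change. The `Topology` class, the host names, and the "move the upper half" split policy are all hypothetical and are not taken from the post.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topology:
    """Maps a fixed set of logical shards onto a changeable set of physical hosts."""

    # logical shard id -> physical host name (hypothetical names)
    placement: dict[int, str] = field(default_factory=dict)

    def host_for(self, logical_shard: int) -> str:
        """Return the physical host that currently owns a logical shard."""
        return self.placement[logical_shard]

    def split(self, old_host: str, new_host: str) -> list[int]:
        """Move the upper half of old_host's logical shards to new_host.

        Returns the ids of the logical shards that were moved.
        """
        owned = sorted(s for s, h in self.placement.items() if h == old_host)
        moved = owned[len(owned) // 2:]
        for shard in moved:
            self.placement[shard] = new_host
        return moved


# Eight logical shards start out on one physical host.
topology = Topology({shard: "pg-1" for shard in range(8)})

# A physical shard split only rewrites placement; the logical shard ids
# that the application sees do not change.
moved = topology.split("pg-1", "pg-2")
```

In this sketch the application keeps addressing logical shards 0 through 7 before and after the split, which is the property that lets the physical layer scale out independently.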
Horizontal sharding adds many data model constraints that revolve around the shard key. We considered using the same sharding key for every table, but there was no single good candidate. Instead, we selected a handful of sharding keys like UserID, FileID, or OrgID.
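As a rough illustration of per-table shard keys, the sketch below looks up the shard key for a table and hashes its value to pick a logical shard. The table names, the shard count, and the use of SHA-256 are assumptions made for this example; the summary only names the keys themselves.

```python
import hashlib

# Hypothetical table -> shard key assignment. Only the key names
# (UserID, FileID, OrgID) come from the summary above.
SHARD_KEYS = {
    "files": "FileID",
    "users": "UserID",
    "orgs": "OrgID",
}

NUM_LOGICAL_SHARDS = 16  # assumed value, not from the source


def logical_shard(table: str, row: dict) -> int:
    """Pick a logical shard for a row by hashing the table's shard key value."""
    key_column = SHARD_KEYS[table]
    # A row without its shard key cannot be routed; this raises KeyError.
    key_value = str(row[key_column]).encode("utf-8")
    digest = hashlib.sha256(key_value).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS


shard_a = logical_shard("files", {"FileID": 42, "name": "draft"})
shard_b = logical_shard("files", {"FileID": 42, "name": "final"})
```

Rows with the same shard key value always land on the same logical shard, regardless of their other columns. This is one way to read the constraint that the data model has to revolve around the shard key.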
To support horizontal sharding, we had to significantly re-architect our backend stack. As part of this, we built a new Go service, DBProxy, which sits between the application layer and PgBouncer. It includes logic for load-shedding, transaction support, database topology management, and a lightweight query engine.
We wanted to simplify our API to minimize DBProxy’s complexity. We explored partitioning the data using separate Postgres databases or Postgres schemas. This would have required physical data changes when we logically sharded the application. Instead, we chose to represent our shards with Postgres views.
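One way to picture shards represented as views is a generator that emits one `CREATE VIEW` statement per logical shard over the existing table, so no rows have to move. This is only a sketch: `shard_hash` is a hypothetical SQL function assumed to return a non-negative integer, and the view naming scheme and the modulo predicate are illustrative choices, not the post's actual definitions.

```python
from __future__ import annotations


def shard_view_ddl(table: str, shard_key: str, num_shards: int) -> list[str]:
    """Build one CREATE VIEW statement per logical shard of `table`.

    Identifiers are interpolated unquoted, so this is only suitable for
    trusted, hard-coded table and column names.
    """
    statements = []
    for shard in range(num_shards):
        statements.append(
            f"CREATE VIEW {table}_shard_{shard} AS "
            f"SELECT * FROM {table} "
            # shard_hash is a hypothetical function, assumed non-negative.
            f"WHERE shard_hash({shard_key}) % {num_shards} = {shard};"
        )
    return statements


ddl = shard_view_ddl("files", "file_id", 4)
for statement in ddl:
    print(statement)
```

Because every view selects from the same underlying table, the application can be pointed at the views first, and the physical data layout can change later.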
Horizontal sharding will be a multi-year investment into Figma’s future scalability. To remove our last scaling limits and truly take flight, we will need to horizontally shard every table at Figma. A fully horizontally sharded world will bring many other benefits: improved reliability, cost savings, and developer velocity.
We’ve made a lot of exciting progress on our horizontal sharding journey, but our challenges are just beginning. Stay tuned for more deep dives into different parts of our horizontal sharding stack. If you’re interested in working on projects like this, please reach out!
Safe bet is that they've run into vendor lock-in with AWS. There was a response post on Medium that went into this.
I'm usually one to build a lot myself to avoid being stuck with a dependency I don't own, so I don't blame Figma for wanting to do something similar. They should be able to port a lot of this to a new platform should they choose to leave AWS, and not be stuck worrying about whether an extension is supported.