Database sharding, or shared nothing partitioning is a technique that helps sites with massive amounts of data scale. Usually sharding is built into the database technology, either RDBMS or NoSQL. Twitter has released Gizzard, a middleware networking service that allows you to shard data across arbitrary backend datastores.

The partitioning rules are stored in a forwarding table that maps key ranges to partitions. Each partition manages its own replication through a declarative replication tree. Gizzard supports “migrations” (for example, elastically adding machines to the cluster) and gracefully handles failures. The system is made eventually consistent by requiring that all write-operations are idempotent and as operations fail (because of, e.g., a network partition) they are retried at a later time.

Gizzard handles both physical and logical shards. Physical shards point to a physical database backend whereas logical shards are trees of other shards.

diagram

Gizzard supports advanced features such as fault tolerance, replication, and migrations. Twitter has a nice README with a great amount of documentation.

[Source on GitHub] [README]


Have comments? Send a tweet to @TheChangelog on Twitter.

Subscribe to The Changelog Weekly – our weekly email covering everything that hits our open source radar.