Lark is a REST interface for Redis #

You might have seen our post on webdis a couple years ago. Like webdis, Lark is a REST interface for Redis.

At its core, it’s just a way of transforming HTTP requests into Redis commands, but it comes with a few additions to make this a little more sane.

It comes with a Flask blueprint and a Django app, but it should work with any Python web framework.
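The core idea — turning an HTTP verb and path into a Redis command — can be sketched in a few lines. Note that the URL scheme and function below are hypothetical illustrations, not Lark’s actual routing:

```javascript
// Hypothetical sketch: translate a REST-style request into a Redis
// command array. Lark's real routing, serialization, and safety
// checks are richer than this.
function toRedisCommand(method, path, body) {
  var parts = path.split("/").filter(function (p) { return p.length; });
  var key = parts.join(":"); // e.g. "/users/42/name" -> "users:42:name"
  if (method === "GET") return ["GET", key];
  if (method === "PUT") return ["SET", key, body];
  if (method === "DELETE") return ["DEL", key];
  throw new Error("unsupported method: " + method);
}
```

The returned array maps directly onto a Redis protocol command, which is what makes the REST-to-Redis translation so mechanical.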

Disclaimer: Alex (this post’s author) is the creator of Lark.

Tesseract: fast n-dimensional filtering and grouping of records in the browser #

Square has released Tesseract, a JavaScript library for filtering and grouping datasets in the browser.


A tesseract represents a multi-dimensional dataset and contains an array of JavaScript objects or primitives:

var payments = tesseract([
  {date: "2011-11-14T16:17:54Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T16:20:19Z", quantity: 2, total: 190, tip: 100, type: "tab"},
  {date: "2011-11-14T16:28:54Z", quantity: 1, total: 300, tip: 200, type: "visa"}
]);

A tesseract dimension is created by providing an accessor function that does not modify the underlying collection:

// Create a new dimension by payment total
var paymentsByTotal = payments.dimension(function(d) { return d.total; });

Dimensions can then be filtered:

paymentsByTotal.filter([100, 200]); // selects payments whose total is between 100 and 200
paymentsByTotal.filter(120); // selects payments whose total equals 120
paymentsByTotal.filter(null); // selects all payments

Check out the API docs wiki or source for advanced filtering, grouping, and other features.
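The filter semantics above are easy to emulate in plain JavaScript, which helps when reasoning about what a dimension does. This is just an illustration of the behavior, not Tesseract’s implementation — the library keeps sorted indexes so filtering is far faster than a linear scan:

```javascript
// Illustrative only: apply Tesseract-style filter arguments to an array.
// A [lo, hi] pair selects a range, a scalar selects exact equality,
// and null selects everything.
function applyFilter(records, accessor, filter) {
  if (filter === null) return records.slice();
  if (Array.isArray(filter)) {
    return records.filter(function (d) {
      var v = accessor(d);
      return v >= filter[0] && v < filter[1];
    });
  }
  return records.filter(function (d) { return accessor(d) === filter; });
}

var payments = [
  { total: 190, type: "tab" },
  { total: 190, type: "tab" },
  { total: 300, type: "visa" }
];
var byTotal = function (d) { return d.total; };
```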


csonv.js: Fetch and transform CSV data into JSON #

Since it’s typed, human readable, and supported darn near everywhere,
JSON is the new hotness for data transport formats.
Unfortunately, many systems don’t expose a JSON API. Relational data is
often represented in the tried and true CSV format.

Paul Engel has introduced
CSONV.js, a JavaScript library
that can consume remote CSV data and transform it to JSON, a format
far friendlier to client-side developers.

Consider the following delimited files:

# books.csv
1;To Kill an Angry Bird;1
2;The Rabbit;2
3;Parslet;3
4;The Lord of the Things;2
5;The Michelangelo Code;4

# authors.csv
1;Harper Lee;
2;JRR Tolkien;
3;William Shakespeare;
4;Dan Brown;

CSONV.js can transform these two relational sources into the following JSON:

[
  {
    "id": 1,
    "name": "To Kill an Angry Bird",
    "author": {
      "id": 1,
      "name": "Harper Lee"
    }
  },
  {
    "id": 2,
    "name": "The Rabbit",
    "author": {
      "id": 2,
      "name": "JRR Tolkien"
    }
  },
  {
    "id": 3,
    "name": "Parslet",
    "author": {
      "id": 3,
      "name": "William Shakespeare"
    }
  },
  {
    "id": 4,
    "name": "The Lord of the Things",
    "author": {
      "id": 2,
      "name": "JRR Tolkien"
    }
  },
  {
    "id": 5,
    "name": "The Michelangelo Code",
    "author": {
      "id": 4,
      "name": "Dan Brown"
    }
  }
]

Be sure to check out the project on GitHub for a complete feature list
and advanced usage info.
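The transformation itself is a straightforward relational join by foreign key. Here’s a minimal sketch of that idea in plain JavaScript — this is not CSONV.js’s API, just an illustration of the join it performs (the real library also handles fetching, headers, and type coercion):

```javascript
// Illustrative join: embed each book's author record by foreign key.
function parseCsv(text) {
  return text.trim().split("\n").map(function (line) {
    return line.split(";");
  });
}

function joinBooksToAuthors(booksCsv, authorsCsv) {
  // Index authors by id for constant-time lookup.
  var authors = {};
  parseCsv(authorsCsv).forEach(function (row) {
    authors[row[0]] = { id: Number(row[0]), name: row[1] };
  });
  // Replace each book's author id with the embedded author object.
  return parseCsv(booksCsv).map(function (row) {
    return { id: Number(row[0]), name: row[1], author: authors[row[2]] };
  });
}
```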

[Source on GitHub]

Large Hadron Migrator: Update huge SQL tables without going offline #

With all the NoSQL hotness out
there, believe it or not, some people are still using relational
databases. (I know, right?).

When it comes to dealing with schema changes, Active Record
migrations in Rails make them so easy that developers often take
them for granted. However, for extremely large sets of data, running an
ALTER TABLE might mean taking your database offline for hours. After
considering other projects, Rany
and the smart folks at
SoundCloud developed their own solution.

Large Hadron Migrator, named for
CERN’s high-energy particle accelerator,
uses a combination of a copy table, triggers, and a journal table to move
data bit by bit into a new table while capturing everything still coming
into the source table from the live application.
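The copy-and-journal strategy is easier to see in miniature. Here’s a toy, in-memory simulation of the approach — illustrative only, since LHM does this with real MySQL tables and triggers:

```javascript
// Toy simulation of the copy-table + journal approach: copy existing
// rows one at a time while concurrent writes land in a journal, then
// replay the journal so the new table catches up before the switch.
function migrate(sourceRows, liveWrites) {
  var newTable = [];
  var journal = [];
  // A stand-in for the trigger: record every live write in the journal.
  function onWrite(row) {
    sourceRows.push(row);
    journal.push(row);
  }
  // Copy the original snapshot; live writes arrive during the copy.
  var snapshotSize = sourceRows.length;
  for (var i = 0; i < snapshotSize; i++) {
    newTable.push(sourceRows[i]);
    if (liveWrites.length) onWrite(liveWrites.shift());
  }
  // Replay the journalled writes, then the new table can take over.
  journal.forEach(function (row) { newTable.push(row); });
  return newTable;
}
```

The key property is that the source table keeps accepting writes for the whole duration of the copy, which is what lets the migration run without downtime.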


To install, configure the gem in your Gemfile:

gem 'large-hadron-migrator'

… and run bundle install.

Next, write your migration as you normally would, using the
LargeHadronMigration class instead:

class AddIndexToEmails < LargeHadronMigration
  def self.up
    large_hadron_migrate :emails, :wait => 0.2 do |table_name|
      execute %Q{
        alter table %s
          add index index_emails_on_hashed_address (hashed_address)
      } % table_name
    end
  end
end

Be sure to check out the project
or the blog post for advanced
usage and caveats.

[Source on GitHub]

Cascalog – Clojure-based query language for Hadoop #

Whilst at Chirp, Wynn interviewed Mike Montano from BackType (for a new API-focused podcast we’re launching next week), and he told us about Cascalog, a new Clojure-based query language for Hadoop.

Inspired by Datalog, Cascalog is a DSL in Clojure that lets the BackType team query their massive amounts of data. Since we don’t pretend to be Clojure or Hadoop nerds, I’ll let creator Nathan Marz’s blog post lay out the feature set:

  • Simple – Functions, filters, and aggregators all use the same syntax. Joins are implicit and natural.
  • Expressive – Logical composition is very powerful, and you can run arbitrary Clojure code in your query with little effort.
  • Interactive – Run queries from the Clojure REPL.
  • Scalable – Cascalog queries run as a series of MapReduce jobs.
  • Query anything – Query HDFS data, database data, and/or local data by making use of Cascading’s “Tap” abstraction.
  • Careful handling of null values – Null values can make life difficult. Cascalog has a feature called “non-nullable variables” that makes dealing with nulls painless.
  • First class interoperability with Cascading – Operations defined for Cascalog can be used in a Cascading flow and vice-versa.
  • First class interoperability with Clojure – Can use regular Clojure functions as operations or filters, and since Cascalog is a Clojure DSL, you can use it in other Clojure code.

If you are a Hadoop or Clojure nerd, let us know about cool projects using either, or both!

Also, this is the first Changelog post from 30,000 feet. Go go @gogoinflight!

[Source on GitHub] [Blog post]

Gizzard – Twitter just sharded #

Database sharding, or shared-nothing partitioning, is a technique that helps sites with massive amounts of data scale. Usually sharding is built into the database technology, either RDBMS or NoSQL. Twitter has released Gizzard, a middleware networking service that allows you to shard data across arbitrary backend datastores.

The partitioning rules are stored in a forwarding table that maps key ranges to partitions. Each partition manages its own replication through a declarative replication tree. Gizzard supports “migrations” (for example, elastically adding machines to the cluster) and gracefully handles failures. The system is made eventually consistent by requiring that all write-operations are idempotent and as operations fail (because of, e.g., a network partition) they are retried at a later time.

Gizzard handles both physical and logical shards. Physical shards point to a physical database backend whereas logical shards are trees of other shards.
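A forwarding table of this kind is, at heart, a mapping from key ranges to shards. A minimal sketch of the lookup — illustrative only, since Gizzard is a Scala/Thrift service with replication trees, not a simple in-memory table:

```javascript
// Illustrative forwarding table: entries are sorted by lowerBound,
// and a key belongs to the last entry whose lowerBound does not
// exceed it. Shard names here are made up for the example.
var forwardingTable = [
  { lowerBound: 0,    shard: "shard-a" },
  { lowerBound: 1000, shard: "shard-b" },
  { lowerBound: 2000, shard: "shard-c" }
];

function lookupShard(table, key) {
  var match = null;
  table.forEach(function (entry) {
    if (key >= entry.lowerBound) match = entry;
  });
  return match ? match.shard : null;
}
```

In Gizzard the key is typically a hash of the record’s id, so remapping a range in the table is all it takes to migrate that slice of data to another shard.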


Gizzard supports advanced features such as fault tolerance, replication, and migrations. Twitter has a nice README with a great amount of documentation.

[Source on GitHub] [README]