The Changelog

Open Source moves fast. Keep up.

Clean your HTML with Bleach

When developing for the web, a time will come when you’ll need to sanitize HTML. If you need to do this in Python, you should check out Bleach.

Bleach is an HTML sanitizing library that escapes or strips markup and attributes based on a white list. Bleach can also linkify text safely, applying filters that Django’s urlize filter cannot, and optionally setting rel attributes, even on links already in the text.

Even if all you want to do is apply rel='nofollow' to the links in user-generated content, Bleach has you covered. So, check it out the next time you need to clean some HTML.
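To show what whitelist-based sanitizing means, here is a minimal sketch built only on Python's standard `html.parser`. It is not Bleach's implementation or API, and the tag and attribute whitelist is a hypothetical choice: tags outside the list are escaped instead of rendered, unlisted attributes are dropped, only `http(s)` and `mailto:` links keep their `href`, and every surviving link gets `rel="nofollow"`. For real user content, use Bleach itself; a hand-rolled sanitizer is easy to get wrong.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative whitelist for this sketch; it is not Bleach's default configuration.
ALLOWED_TAGS = {"a", "b", "em", "i", "p", "strong"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistSanitizer(HTMLParser):
    """Escape tags that are not whitelisted, drop unlisted attributes,
    and add rel="nofollow" to every surviving link."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _render_start(self, tag, attrs):
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # attribute not on the whitelist (an existing rel is dropped too)
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript: URLs
            kept.append((name, value))
        if tag == "a":
            kept.append(("rel", "nofollow"))
        attrs_html = "".join(f' {name}="{escape(value)}"' for name, value in kept)
        return f"<{tag}{attrs_html}>"

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(self._render_start(tag, attrs))
        else:
            # Show the disallowed tag as text instead of letting the browser run it.
            self.out.append(escape(self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(self._render_start(tag, attrs) + f"</{tag}>")
        else:
            self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>" if tag in ALLOWED_TAGS else escape(f"</{tag}>"))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    # Comments, doctypes and processing instructions use the default no-op handlers,
    # so they are silently dropped.


def sanitize(text):
    parser = WhitelistSanitizer()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

For example, `sanitize('<b>hi</b> <script>alert(1)</script>')` keeps the bold tag and turns the script tag into visible, harmless text.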

Easily Build Mac OS X Status Bar Apps With Python

From time to time, the thought has occurred to me that it would be cool if I could build simple native apps with Python. So, I was excited when I found rumps.

Ridiculously Uncomplicated Mac os x Python Statusbar apps

You can’t make full-blown apps, but if you’ve ever had an idea for a status bar app, you can use rumps to build it.

Bunch lets you use a Python dict like it’s an object

Sometimes, in Python, I wish I could access dicts as if they were objects. Bunch makes that easy.

A Bunch is a Python dictionary that provides attribute-style access (a la JavaScript objects).

Bunch acts like an object and a dict.

>>> from bunch import Bunch
>>> b = Bunch()
>>> b.hello = 'world'
>>> b.hello
'world'
>>> b['hello'] += "!"
>>> b.hello
'world!'

And it even plays nice with serialization.

>>> b = Bunch(foo=Bunch(lol=True), hello=42, ponies='are pretty!')
>>> import json
>>> json.dumps(b)
'{"ponies": "are pretty!", "foo": {"lol": true}, "hello": 42}'

This approach isn’t for everything, but if you want a dict that acts like an object, check out Bunch.
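Attribute access like this usually comes down to a `dict` subclass that forwards attribute lookups to keys. Here is a minimal sketch of that idea using a hypothetical `AttrDict` class; it is not Bunch's source code, just the core mechanism.

```python
import json


class AttrDict(dict):
    """A dict whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Only called when normal lookup fails, so dict methods like .keys() still win.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


b = AttrDict()
b.hello = "world"
b["hello"] += "!"
print(b.hello)  # world!

# Still a plain dict underneath, so the json module serializes it directly.
print(json.dumps(AttrDict(foo=AttrDict(lol=True), hello=42)))
# {"foo": {"lol": true}, "hello": 42}
```

Raising `AttributeError` rather than `KeyError` for missing names matters: it keeps `hasattr()` and `getattr(obj, name, default)` behaving as expected.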

Quickly reduce the amount of data your Node API returns

When designing an API, it’s easy to forget that not everyone has a cable modem. What if a client could easily request exactly the data it needed? That is what JSON Mask aims to do.

This is a tiny language and an engine for selecting specific parts of a JS object, hiding/masking the rest.

A code example helps to demonstrate how this works.

var mask = require('json-mask')
mask({p: {a: 1, b: 2}, z: 1}, 'p/a,z')  // {p: {a: 1}, z: 1}

JSON Mask seems like an interesting way to reduce the amount of data we send down the pipes.
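To make the mask syntax concrete, here is a rough Python sketch of the same selection idea. It handles only `,` (select several fields) and `/` (descend into a nested object) from the example above. `parse_mask` and `apply_mask` are hypothetical names, json-mask's real grammar supports more, and applying a mask to every element of a list is this sketch's own assumption.

```python
def parse_mask(mask):
    """Turn 'p/a,z' into a tree of wanted keys: {'p': {'a': {}}, 'z': {}}."""
    tree = {}
    for path in mask.split(","):
        node = tree
        for key in path.strip().split("/"):
            node = node.setdefault(key, {})
    return tree


def apply_mask(obj, tree):
    """Keep only the parts of obj named in tree; an empty subtree keeps the whole value."""
    if not tree:
        return obj
    if isinstance(obj, list):
        # Apply the same mask to every element, e.g. a list of records.
        return [apply_mask(item, tree) for item in obj]
    if isinstance(obj, dict):
        return {key: apply_mask(obj[key], sub) for key, sub in tree.items() if key in obj}
    return obj  # a scalar cannot be narrowed any further


print(apply_mask({"p": {"a": 1, "b": 2}, "z": 1}, parse_mask("p/a,z")))
# {'p': {'a': 1}, 'z': 1}
```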

Dominate HTML in Python

Have you ever wished that you had a sweet little API to generate HTML in Python? Dominate is probably what you are looking for.

Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API.

Now, I’m a self-admitted HTML purist, but look at how the Dominate API works.

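from dominate.tags import ul, li

items = ul()  # named items to avoid shadowing the built-in list
for item in range(4):
    items += li('Item #', item)  # each li holds 'Item #' followed by the number

print(items)  # renders the <ul> as HTML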

If done correctly, HTML generators can blend in nicely with your code.

Check out Dominate the next time you’re looking for a nice native HTML generator API for Python.
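The `+=` style works because tag objects overload in-place addition to append children. Here is a minimal sketch of that mechanism with a hypothetical `Tag` class. It has no attributes or pretty-printing, so it is far simpler than Dominate, and any child that is not a `Tag` is converted to escaped text.

```python
from html import escape


class Tag:
    """Minimal element node: nest Tags, and any other child becomes escaped text."""

    def __init__(self, name, *children):
        self.name = name
        self.children = []
        for child in children:
            self += child

    def __iadd__(self, child):
        # `parent += child` appends the child and returns the parent.
        self.children.append(child if isinstance(child, Tag) else escape(str(child)))
        return self

    def __str__(self):
        inner = "".join(str(child) for child in self.children)
        return f"<{self.name}>{inner}</{self.name}>"


items = Tag("ul")
for n in range(4):
    items += Tag("li", "Item #", n)
print(items)
# <ul><li>Item #0</li><li>Item #1</li><li>Item #2</li><li>Item #3</li></ul>
```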

Can you use Python 3?

Good question. It’s a long road to Python 3, but it’s a little easier to navigate now with the release of caniusepython3.

This script takes in a set of dependencies and then figures out which of them are holding you up from porting to Python 3.

It’s a simple script which makes it just a little easier to use Python 3.

The output of the script will tell you how many (implicit) dependencies you need to transition to Python 3 in order to allow you to make the same transition. It will also list what projects have no explicit dependency blocking their transition so you can ask them to consider starting a port to Python 3.
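The idea behind that output can be sketched as a graph walk. The function below is hypothetical and runs on a hard-coded dependency map, whereas the real tool checks project metadata on PyPI. Starting from your direct dependencies, it collects every unported project in the way and flags those whose own dependencies are already ported, since those could start porting today.

```python
def py3_blockers(deps, ported, direct):
    """Walk a dependency graph from your direct dependencies.

    deps:   project -> list of its direct dependencies
    ported: projects that already support Python 3
    Returns (every unported project in the way, the subset with no unported deps).
    """
    blocking, ready = set(), set()
    stack = list(direct)
    while stack:
        project = stack.pop()
        if project in ported or project in blocking:
            continue  # already fine, or already visited
        blocking.add(project)
        unported = [dep for dep in deps.get(project, []) if dep not in ported]
        if not unported:
            ready.add(project)  # nothing beneath it blocks a port
        stack.extend(unported)
    return blocking, ready


# Hypothetical project names and data.
deps = {
    "webapp-lib": ["template-lib", "orm-lib"],
    "orm-lib": ["db-driver"],
    "db-driver": [],
}
blocking, ready = py3_blockers(
    deps, ported={"template-lib", "cli-lib"}, direct=["webapp-lib", "cli-lib"]
)
print(len(blocking), sorted(ready))  # 3 ['db-driver']
```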

Want to run SQL on a CSV file?

Now you can with q, a Python lib.

q allows performing SQL-like statements on tabular text data.

It seems this idea isn’t restricted to Python either. TextQL is a project written in Go that promises to do roughly the same thing.
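Under the hood, the trick amounts to loading the text into a SQL engine. This standard-library sketch is not q's implementation, and `query_csv` is a hypothetical helper: it loads CSV text into an in-memory SQLite table named `t` and runs a query against it.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="t"):
    """Load CSV text (first row is the header) into in-memory SQLite and run sql.

    Every value is stored as text; cast in SQL (CAST(x AS INTEGER)) when needed.
    """
    header, *rows = csv.reader(io.StringIO(csv_text))
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(f'"{name}"' for name in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


data = "name,lang\nbleach,python\nq,python\ntextql,go\n"
print(query_csv(data, "SELECT lang, COUNT(*) FROM t GROUP BY lang ORDER BY lang"))
# [('go', 1), ('python', 2)]
```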

You always need another Python task queue

I kid; diversity is the key to a healthy ecosystem. Huey is a simple offline Python task queue that has relatively few dependencies.

a lightweight alternative: written in python, no deps outside the standard lib except Redis (or you can roll your own backend), and support for Django.

Sometimes a little goes a long way. Check out Huey if you need a lightweight Python task queue. If you need more features, I would recommend RQ or Celery.

Generate 4 language bindings for your API in one Go

You just built an API, and you want to make sure everyone can use it. Building libraries in every language isn’t only going to be hard, it’s going to take a lot of time. Time you don’t have. This is where Alpaca can help.

You define your API according to the format, and alpaca builds the API libraries along with their documentation. All you have to do is publish them to their respective package managers.

Right now it can generate API clients in PHP, Python, Ruby, and JavaScript. You can see examples of the generated client libraries here. I can’t speak to the quality of all the generated language bindings, but I took a cursory look at the Python lib and it looks good. Looks like Alpaca could save us all a lot of time.

Show a progress bar for long-running loops with tqdm

I can’t tell you how many times I’ve kicked off a long-running process only to kill it and add in a progress indicator. I probably should have come up with something standard a while ago, but now I don’t have to: tqdm offers one solution.

Instantly make your loops show a progress meter – just wrap any iterator with tqdm(iterator), and you’re done!

Can’t say much more about it, but if you have had this problem in the past you might want to check out tqdm.
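With tqdm itself, usage really is just `for i in tqdm(range(10000)): ...`. To show what such a wrapper does, here is a minimal standard-library sketch, a hypothetical `progress()` generator rather than tqdm's code: it yields each item, then redraws a bar on the same line using a carriage return.

```python
import sys
import time


def progress(iterable, total=None, width=30, stream=None):
    """Yield items from iterable while redrawing a one-line text progress bar."""
    if stream is None:
        stream = sys.stderr
    if total is None:
        try:
            total = len(iterable)
        except TypeError:
            total = None  # generators have no len(): show a running count instead
    start = time.monotonic()
    count = 0
    for item in iterable:
        yield item  # the caller's loop body runs here
        count += 1
        elapsed = time.monotonic() - start
        if total:
            filled = width * count // total
            bar = "#" * filled + "-" * (width - filled)
            stream.write(f"\r|{bar}| {count}/{total} {elapsed:.1f}s")
        else:
            stream.write(f"\r{count} it {elapsed:.1f}s")
        stream.flush()
    stream.write("\n")


for _ in progress(range(50)):
    time.sleep(0.01)  # stand-in for real work
```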

Speed up AWS S3 by 2000x with this transparent proxy

Amazon S3 works pretty well, is cheap, and is not too slow. It is employed as a blob store by so many companies that it’s practically the de facto solution. So, if you could speed up S3 I am sure it would have a pretty big impact. That is exactly what MimicDB is trying to do.

By maintaining a transactional record of every API call to S3, MimicDB provides a local, isometric key-value store of data on S3. MimicDB stores everything except the contents of objects locally. Tasks like listing, searching and calculating storage usage on massive amounts of data are now fast and free.

The readme says that, on average, tasks like those are 2000x faster with MimicDB. It also reduces the number of API calls to S3, which reduces the cost. If you use S3 heavily, MimicDB looks like it could be an interesting addition to your stack.
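The core idea (record metadata locally on every write, so that listing and usage totals never hit S3) can be sketched without AWS at all. Both classes below are hypothetical and are not MimicDB's API: `CountingStore` stands in for S3 and counts API calls, while `MetadataMirror` keeps the local record.

```python
class CountingStore:
    """Stand-in for S3: keeps objects in memory and counts every API call."""

    def __init__(self):
        self.objects = {}
        self.calls = 0

    def put(self, key, data):
        self.calls += 1
        self.objects[key] = data

    def delete(self, key):
        self.calls += 1
        del self.objects[key]


class MetadataMirror:
    """Write through to the store, but keep key -> size locally for cheap queries."""

    def __init__(self, store):
        self.store = store
        self.sizes = {}

    def put(self, key, data):
        self.store.put(key, data)  # if this raises, the local record is not touched
        self.sizes[key] = len(data)

    def delete(self, key):
        self.store.delete(key)
        self.sizes.pop(key, None)

    def list_keys(self, prefix=""):
        # Served entirely from the local record: no call to the store.
        return sorted(key for key in self.sizes if key.startswith(prefix))

    def total_size(self):
        return sum(self.sizes.values())


s3 = CountingStore()
db = MetadataMirror(s3)
db.put("logs/a.txt", b"hello")
db.put("logs/b.txt", b"hi")
db.put("img/c.png", b"\x89PNG")
print(db.list_keys("logs/"), db.total_size(), s3.calls)
# ['logs/a.txt', 'logs/b.txt'] 11 3
```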

Build newsfeeds with Feedly (not the RSS reader)

Feedly (no, not the RSS reader) is a Python library that provides a high-level abstraction for building news feeds.

Feedly is a Python library, which allows you to build newsfeed and notification systems using Cassandra and/or Redis.

If you are building a social stream, at some point SELECT * FROM updates WHERE user_id IN (people user follows) ORDER BY id DESC stops scaling. At that point you need to build something a little more advanced. Feedly gives you those tools.
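One common way past that query is fan-out on write: when someone posts, push the activity into each follower's precomputed feed, so reading a feed becomes a single lookup. The sketch below shows that pattern in memory with a hypothetical `FanoutFeeds` class and a hypothetical feed cap. It is not Feedly's API, and Feedly keeps its feeds in Redis or Cassandra rather than Python objects.

```python
from collections import defaultdict, deque

FEED_LENGTH = 100  # hypothetical cap: keep only the newest entries per feed


class FanoutFeeds:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> users following them
        self.feeds = defaultdict(lambda: deque(maxlen=FEED_LENGTH))

    def follow(self, user, author):
        self.followers[author].add(user)

    def post(self, author, activity_id):
        # Fan-out on write: copy the activity id into every follower's feed now,
        # so reading a feed later needs no query across everyone the user follows.
        for user in self.followers[author]:
            self.feeds[user].appendleft(activity_id)  # newest first; oldest falls off

    def feed(self, user, limit=20):
        return list(self.feeds[user])[:limit]


feeds = FanoutFeeds()
feeds.follow("ann", "bob")
feeds.follow("ann", "cat")
feeds.post("bob", 1)
feeds.post("cat", 2)
print(feeds.feed("ann"))  # [2, 1]
```

The trade-off is more work and storage at write time in exchange for cheap, predictable reads.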

Thumbor is a self-hosted thumbnail-as-a-service

Thumbor is pretty impressive. Not only does it take something painful like thumbnailing and make it easy, it also has cool image operations out of the box.

It also features a VERY smart detection of important points in the image for better cropping and resizing, using state-of-the-art face and feature detection algorithms

It even sports an easy-to-use API: the output size, cropping, and smart detection are all specified right in the image URL.

If you’re like me and think thumbnailing is a pain, check out Thumbor.
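As a rough illustration of that URL style, here is a hypothetical `thumbor_path` helper. It assumes the layout `/unsafe/<width>x<height>/smart/<image>` and, for signed URLs, an HMAC-SHA1 of the rest of the path, encoded as URL-safe base64, in place of `unsafe`. Check your Thumbor version's documentation (or use libthumbor) before relying on it.

```python
import base64
import hashlib
import hmac


def thumbor_path(image_url, width, height, smart=True, security_key=None):
    """Build a Thumbor-style path like /unsafe/300x200/smart/<image_url>."""
    parts = [f"{width}x{height}"]
    if smart:
        parts.append("smart")  # ask for face/feature-aware cropping
    parts.append(image_url)
    path = "/".join(parts)
    if security_key is None:
        return f"/unsafe/{path}"  # only works if the server allows unsigned URLs
    # Signed form (assumed): HMAC-SHA1 of the path, URL-safe base64, replacing "unsafe".
    digest = hmac.new(security_key.encode(), path.encode(), hashlib.sha1).digest()
    return f"/{base64.urlsafe_b64encode(digest).decode()}/{path}"


print(thumbor_path("example.com/cat.jpg", 300, 200))
# /unsafe/300x200/smart/example.com/cat.jpg
```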

Service orchestration from the folks behind Vagrant

As we move into this world of loosely connected VMs, containers, and servers, we need a layer that tells everything where everything else is. Serf is HashiCorp’s solution to this problem.

Serf is a decentralized solution for service discovery and orchestration that is lightweight, highly available, and fault tolerant.

Are you wondering why you might need this? Let’s say you have an app that requires Redis. You could put Redis inside your container, but then every app server you push would have its own copy of Redis. Instead you might want to have a Redis container and an app container.

Of course, when you only have two containers, it isn’t hard to manage the connection by hand. As you add more elements to the system, the complexity increases as well. This is where service orchestration comes in: it helps handle that complexity. Serf is one solution to this problem.

Newspaper delivers Instapaper-style article extraction

Newspaper lets anyone do article extraction like Instapaper and Pocket.

Newspaper is a Python 2 library for extracting & curating articles from the web.
It wants to change the way people handle article extraction with a new, more precise layer of abstraction.

Besides “read later” services, there’s a growing number of APIs, such as diffbot, that provide article extraction as a service. Those services are great, but it’s nice that newspaper is open source and hackable.

For instance, when I first checked out newspaper, it only had plain text article extraction. Sometimes, though, I want the original markup of the article with some sanitization. It helps to have the paragraphs, links, and headers accurately represent the article. So, I forked the project, made some changes, and the maintainer codelucas was responsive and worked with me to get my changes merged in.

If you want a place to start working on article extraction, Newspaper looks like a good bet.

Lark is a REST interface for Redis

You might have seen our post on webdis a couple years ago. Like webdis, Lark is a REST interface for Redis.

At its core, it’s just a way of transforming HTTP requests into Redis commands, but it comes with a few additions to make this a little more sane.

It comes with a Flask blueprint and a Django app, but it should work with any Python web framework.

Disclaimer: Alex (this post’s author) is the creator of Lark.

Utterly simple data capture with Dataset

Dataset, from Friedrich Lindenberg, bills itself as “databases for lazy people.”

Why do we see an awful lot of data stored in static files in CSV or JSON format, even though they are hard to query and update incrementally? The answer is that programmers are lazy … This is what dataset is going to change!

We’ve all been there. You have an idea, but before you can start working on the fun part, you need to set up a database. Now you don’t have to do that. Use dataset to take care of the grunt work and let yourself concentrate on the juicy bits.
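The “lazy” part is that tables and columns appear as you insert data. Here is a standard-library sketch of that behavior using a hypothetical `LazyTable` class on SQLite. The real library's API looks different (roughly `db = dataset.connect('sqlite:///:memory:')` followed by `db['people'].insert(dict(name='Ada'))`) and is built on SQLAlchemy.

```python
import sqlite3


class LazyTable:
    """Insert plain dicts; the table and any missing columns are created on demand.

    Table and column names are interpolated into SQL, so use trusted keys only.
    """

    def __init__(self, conn, name):
        self.conn, self.name = conn, name
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{name}" (id INTEGER PRIMARY KEY)')

    def _columns(self):
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        return {row[1] for row in self.conn.execute(f'PRAGMA table_info("{self.name}")')}

    def insert(self, row):
        existing = self._columns()
        for column in row:
            if column not in existing:
                self.conn.execute(f'ALTER TABLE "{self.name}" ADD COLUMN "{column}"')
        columns = ", ".join(f'"{column}"' for column in row)
        placeholders = ", ".join("?" for _ in row)
        self.conn.execute(
            f'INSERT INTO "{self.name}" ({columns}) VALUES ({placeholders})',
            list(row.values()),
        )

    def all(self):
        cursor = self.conn.execute(f'SELECT * FROM "{self.name}"')
        names = [description[0] for description in cursor.description]
        return [dict(zip(names, values)) for values in cursor]


conn = sqlite3.connect(":memory:")
people = LazyTable(conn, "people")
people.insert({"name": "Ada"})
people.insert({"name": "Linus", "lang": "C"})  # the "lang" column is added here
print(people.all())
# [{'id': 1, 'name': 'Ada', 'lang': None}, {'id': 2, 'name': 'Linus', 'lang': 'C'}]
```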

Give any SQL database a REST interface with Sandman

Let’s say you have a SQL database. You might wish it had a REST interface and a nice admin UI. That used to mean writing a bunch of code, but now with Sandman it’s a one-liner.

Zero boilerplate code is required. In fact, using sandmanctl, no code is required at all. Your existing database structure and schema is introspected and your database tables magically get a RESTful API and admin interface.

Sandman was created by Jeff Knupp and can be used with any database that SQLAlchemy supports.
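The introspection step can be sketched with SQLite's own catalog. Sandman presumably relies on SQLAlchemy for this; the standard-library version below only illustrates the idea, and the URL scheme produced by the hypothetical `rest_routes` helper is an assumption, not Sandman's actual endpoints.

```python
import sqlite3


def introspect(conn):
    """Map each user table to its column names, read from SQLite's catalog."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    return {t: [col[1] for col in conn.execute(f'PRAGMA table_info("{t}")')] for t in tables}


def rest_routes(schema):
    """Derive REST-style endpoints per table (hypothetical URL scheme)."""
    routes = []
    for table in schema:
        routes += [("GET", f"/{table}"), ("POST", f"/{table}")]  # list, create
        routes += [(verb, f"/{table}/<id>") for verb in ("GET", "PATCH", "DELETE")]
    return routes


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE album (id INTEGER PRIMARY KEY, title TEXT, artist_id INTEGER)")
schema = introspect(conn)
print(schema)
# {'album': ['id', 'title', 'artist_id'], 'artist': ['id', 'name']}
```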