Cython Trouble

Here are a couple of things I ran into while using Cython to wrap my C++ library grakopp. Assignment and Coercion: I couldn’t find a nice way to wrap boost::variant. The direct approach works, but an assignment to the variant requires an unsafe cast, which also adds the overhead of a copy. To work around this, I used accessor functions (which requires changing the C++ implementation). Declaring operator= to provide custom assignment functions is not supported.

Read More…

Bayesian inference introduction

I wrote a small introduction to Bayesian inference, but because it is pretty heavy on math, I used the format of an IPython notebook. Bayesian inference is an important process in machine learning, with many real-world applications, but if you were born any time in the 20th century, you were most likely to learn about probability theory from a frequentist point of view. One reason may be that calculating some integrals in Bayesian statistics was too difficult to do without computers, so frequentist statistics was more economical.

Read More…

Replacing native code with Cython

Here is a little exercise in rewriting native code with Cython without losing performance. It turns out that this requires pulling out all the stops and applying a lot of the optimization magic provided by Cython. On the other hand, the resulting code is portable to Windows without worrying about compilers etc. A real-world example: The example code comes from a real-world project, OCRopus, a wonderful collection of OCR tools that uses the latest algorithms in machine learning (such as deep learning) to transform images to text.

Read More…

A better iterator for couchdb-python

The most used Python library for CouchDB seems to be couchdb-python. It is a simple little library that makes accessing a couch database a breeze, but it has a serious limitation when using views: the first thing it always does is fetch all rows in the view. Then it processes them and keeps them as a list in memory. Because it uses simple data types (as opposed to a C extension or at least records), these lists have a huge overhead of about 1 KB per row (ouch!).
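One common way around loading a whole view at once is to fetch it in fixed-size batches and yield rows lazily. The following is only a minimal sketch of that pattern, not necessarily the approach taken in the full post. The names `iter_rows`, `fetch_batch` and `fake_view` are hypothetical, and the sketch assumes that every row has a unique, sortable key. A real CouchDB view can contain duplicate keys, so it would also need the document id (the `startkey_docid` query parameter) to mark where the next batch begins.

```python
def iter_rows(fetch_batch, batch_size=100):
    """Yield rows one by one, holding at most batch_size + 1 rows in memory.

    fetch_batch(start_key, limit) is a hypothetical callable that returns up
    to `limit` rows in key order, beginning at `start_key` (or at the first
    row when start_key is None).
    """
    start_key = None
    while True:
        # Ask for one extra row; it tells us where the next batch starts.
        rows = fetch_batch(start_key, batch_size + 1)
        for row in rows[:batch_size]:
            yield row
        if len(rows) <= batch_size:
            return  # No extra row came back, so this was the last batch.
        start_key = rows[batch_size]["key"]


def fake_view(start_key, limit):
    """In-memory stand-in for a view query, used only for this demo."""
    rows = [{"key": k, "value": k * k} for k in range(10)]
    if start_key is not None:
        rows = [r for r in rows if r["key"] >= start_key]
    return rows[:limit]


keys = [row["key"] for row in iter_rows(fake_view, batch_size=3)]
```

With a batch size of 3, the ten demo rows are fetched in four requests, and each row is yielded exactly once.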

Read More…

Parsing Unicode text with Python PLY

Python PLY is quite helpful in writing a simple scanner and parser, but I had some trouble figuring out how to make it accept Unicode tokens. I kept getting weird errors like this: Getr. Zählung Syntax Error: '̈hlung' The syntax error is displayed as an h with umlaut, something that does not even exist in German. On closer inspection, the problem turns out to be the COMBINING DIAERESIS character, which prevents the regular expressions in PLY from matching.
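One common remedy for this kind of problem, which may or may not be the one used in the full post, is to normalize the input to Unicode form NFC before it reaches the lexer, so that a base letter and its combining mark become a single precomposed character. A minimal sketch follows; the helper name `normalize_for_lexer` is made up for illustration.

```python
import unicodedata


def normalize_for_lexer(text):
    """Compose base letters and combining marks into single code points."""
    return unicodedata.normalize("NFC", text)


# "a" followed by U+0308 COMBINING DIAERESIS, as in the failing input.
decomposed = "Getr. Za\u0308hlung"
composed = normalize_for_lexer(decomposed)
```

After normalization, the text contains the precomposed U+00E4 instead of the two-character sequence, so a token pattern that accepts letters can match the whole word.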

Read More…

Adding extra margin to a PDF

I wanted to add my own little margin notes to any PDF file (using LaTeX and the multistamp feature of PDFTk). However, because the PDF was arbitrary, I could not write on the existing margin. Instead, I had to add an extra margin at the side. But none of the standard tools supports that, and using pypdf and similar toolsets loses the metadata like bookmarks. Here is a quick hack to expand the relevant page boxes:

    #!env python2.7
    import os
    import os.path
    import subprocess
    import re
    # Margin in points.

Read More…

VcOrm source code available

I published the source code for VcOrm, my version-controlled object-relational model implementation using SQLAlchemy and SQLite3 with automatic triggers. It is modular enough to be used in arbitrary projects. Only basic features are implemented right now, but it’s a start.

How to use SQLite’s backup in Python

SQLite is a very portable database that is easy to deploy, as it does not require a server and works off a single file. This makes it ideally suited as an application file format. An important part of any application is the ability to make copies of documents (“Save-As”). There is no way to do this within SQL, but SQLite provides a backup interface that can be used for that purpose. Unfortunately, SQLite’s backup interface is generally not part of the generic database APIs in many higher-level languages, including Python.
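For readers on a current interpreter: since Python 3.7, the standard `sqlite3` module exposes the backup interface directly as `Connection.backup`, which postdates the situation described above. A minimal "Save-As" sketch under that assumption follows; the function name `save_as` is hypothetical.

```python
import sqlite3


def save_as(source, target_path):
    """Copy the whole database behind `source` into the file at target_path."""
    target = sqlite3.connect(target_path)
    try:
        # Connection.backup (Python 3.7+) wraps SQLite's online backup API.
        source.backup(target)
    finally:
        target.close()
```

The source connection stays open and usable afterwards, so the application can keep working on the original document or switch to the copy.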

Read More…

Automatic Version Control with SQL Triggers pt. 2

In the first part of this mini-series, we saw a database schema for storing all versions of all objects in an ORM database, using SQL triggers as the main mechanism. In this part, we will dig deep into the implementation. We will use SQLAlchemy, SQLite3, and a couple of standard libraries:

    import os
    import datetime
    import sqlite3
    import sqlalchemy
    from sqlalchemy import (Table, Column, Integer, String, DateTime,
                            MetaData, ForeignKey, DDL)
    from sqlalchemy.ext.declarative import declarative_base

To start off, we define the support tables that do not depend on specific object types.
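To make the trigger mechanism concrete, here is a small stand-alone sketch using only the `sqlite3` module. It is not the schema from this series: the tables `note` and `note_history` and the trigger `note_keep_old_version` are invented for illustration. They show only the general idea of letting a trigger copy the old version of a row into a history table whenever the row is updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT NOT NULL);

-- Hypothetical history table: one row per replaced version.
CREATE TABLE note_history (
    note_id INTEGER NOT NULL,
    body TEXT NOT NULL,
    replaced_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Save the previous contents each time a note is updated.
CREATE TRIGGER note_keep_old_version
AFTER UPDATE ON note
BEGIN
    INSERT INTO note_history (note_id, body) VALUES (OLD.id, OLD.body);
END;
""")

conn.execute("INSERT INTO note (id, body) VALUES (1, 'first draft')")
conn.execute("UPDATE note SET body = 'second draft' WHERE id = 1")
conn.commit()

history = conn.execute("SELECT note_id, body FROM note_history").fetchall()
current = conn.execute("SELECT body FROM note WHERE id = 1").fetchone()[0]
```

The application code only issues a plain UPDATE; the database itself records the superseded version, which is what makes the versioning automatic.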

Read More…

Documenting Interfaces in Python

Programs are written for humans as well as for computers, and never the twain shall meet. On the one hand, we are not smart enough to understand executables the way computers do: Programmers are notoriously bad at predicting the performance of their work as it is interpreted by actual machines, and much time is expended in chasing down bugs that result from an insufficient mental model of the machine in the programmer’s head.

Read More…
