Cython Trouble
Here are a couple of things I experienced using Cython to wrap my C++ library grakopp:
Assignment and Coercion
I couldn’t find a nice way to wrap boost::variant. The direct approach works, but an assignment to the variant requires an unsafe cast, which also adds the overhead of a copy. To work around this, I used accessor functions (this requires changing the C++ implementation). Cython does not support declaring operator=, so custom assignment functions cannot be declared.
Bayesian inference introduction
I wrote a small introduction to Bayesian inference, but because it is pretty heavy on math, I used the format of an IPython notebook. Bayesian inference is an important process in machine learning, with many real-world applications, but if you were born any time in the 20th century, you most likely learned probability theory from a frequentist point of view. One reason may be that some integrals in Bayesian statistics were too difficult to calculate without computers, so frequentist statistics was more economical.
Replacing native code with Cython
Here is a little exercise in rewriting native code with Cython without losing performance. It turns out that this requires pulling out all the stops and applying a lot of the optimization magic provided by Cython. On the other hand, the resulting code is portable to Windows without worrying about compilers etc.
A real world example
The example code comes from a real-world project, OCRopus, a wonderful collection of OCR tools that uses the latest algorithms in machine learning (such as deep learning) to transform images to text.
A better iterator for couchdb-python
The most used Python library for CouchDB seems to be couchdb-python. It is a simple little library that makes accessing a couch database a breeze, but it has a serious limitation when using views: the first thing it always does is fetch all rows in the view. Then it processes them and keeps them as a list in memory. Because it uses simple data types (as opposed to a C extension or at least records), these lists have a huge overhead of about 1 KB per row (ouch!
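
One common way around fetching a whole view at once is to page through it in batches. The following is only a minimal sketch of that idea, not the code from the post: `iter_rows`, `fetch_page`, `make_fake_view` and the `Row` type are hypothetical names, and `fetch_page` stands in for whatever call queries the view with a start row and a limit (CouchDB views accept `limit`, `startkey` and `startkey_docid` query parameters). Each request asks for one row more than the batch size, and that extra row becomes the start of the next request.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Row = Tuple[str, str]  # (key, document id): a hypothetical stand-in for a view row


def iter_rows(
    fetch_page: Callable[[Optional[Row], int], List[Row]],
    batch: int = 100,
) -> Iterator[Row]:
    """Yield all rows while holding at most batch + 1 rows in memory."""
    start: Optional[Row] = None
    while True:
        # Ask for one extra row; it marks where the next page begins.
        rows = fetch_page(start, batch + 1)
        yield from rows[:batch]
        if len(rows) <= batch:
            return  # no extra row came back, so this was the last page
        start = rows[batch]


def make_fake_view(rows: List[Row]) -> Callable[[Optional[Row], int], List[Row]]:
    """Build an in-memory fetch_page for demonstration (assumes unique rows)."""
    ordered = sorted(rows)

    def fetch_page(start: Optional[Row], limit: int) -> List[Row]:
        begin = 0 if start is None else ordered.index(start)
        return ordered[begin:begin + limit]

    return fetch_page


view_rows = [(f"k{i:03d}", f"doc{i}") for i in range(10)]
all_rows = list(iter_rows(make_fake_view(view_rows), batch=3))
```

With a real database, `fetch_page` would issue one view request per call, so memory use is bounded by the batch size rather than by the size of the view.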
Parsing Unicode text with Python PLY
Python PLY is quite helpful in writing a simple scanner and parser, but I had some trouble figuring out how to make it accept Unicode tokens. I kept getting weird errors like this:
Getr. Zählung
Syntax Error: '̈hlung'
The syntax error is displayed as an h with umlaut, something that does not even exist in German. On closer inspection, the problem turns out to be the COMBINING DIAERESIS character (U+0308), which prevents the regular expressions in PLY from matching.
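
The effect can be reproduced with Python's `re` module alone. The sketch below assumes a token pattern of `\w+` (a hypothetical stand-in for a PLY token rule) and shows one common remedy, normalizing the input to NFC before scanning so that a base letter and its combining mark become a single code point. Whether this matches the fix used in the post is an assumption.

```python
import re
import unicodedata

WORD = re.compile(r"\w+")  # hypothetical stand-in for a PLY token regex


def normalize_input(text: str) -> str:
    """Compose base letter + combining mark pairs into single code points (NFC)."""
    return unicodedata.normalize("NFC", text)


decomposed = "Za\u0308hlung"  # 'a' followed by U+0308 COMBINING DIAERESIS
composed = normalize_input(decomposed)  # 'a' + U+0308 becomes U+00E4

# \w does not match the combining mark, so the token ends right before it.
partial = WORD.match(decomposed).group()  # 'Za'

# After normalization the whole word is a single token.
whole = WORD.fullmatch(composed)
```

Normalization only helps for characters that have a precomposed form; marks without one would still need to be allowed explicitly in the token pattern.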
Adding extra margin to a PDF
I wanted to add my own little margin notes to any PDF file (using LaTeX and the multistamp feature of PDFTk). However, because the PDF was arbitrary, I could not write on the existing margin. Instead, I had to add an extra margin at the side. But none of the standard tools supports that, and using pypdf and similar toolsets loses the metadata like bookmarks. Here is a quick hack to expand the relevant page boxes:
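
As a minimal sketch of the box arithmetic only, and not the original hack: a PDF page box such as MediaBox or CropBox is a rectangle `[x0, y0, x1, y1]` in points, so adding a side margin means moving one of its x coordinates outward. The function name `expand_box` and its parameters are hypothetical, and how the new values are written back into the file is left open here.

```python
from typing import List, Sequence


def expand_box(box: Sequence[float], extra: float, side: str = "right") -> List[float]:
    """Return a copy of a PDF rectangle [x0, y0, x1, y1] widened by `extra` points."""
    x0, y0, x1, y1 = box
    if side == "right":
        return [x0, y0, x1 + extra, y1]
    if side == "left":
        return [x0 - extra, y0, x1, y1]
    raise ValueError(f"unsupported side: {side!r}")


# US Letter page (612 x 792 points) with a 2 inch (144 pt) margin added on the right.
letter = [0, 0, 612, 792]
wider = expand_box(letter, 144)  # [0, 0, 756, 792]
```

The same offset would have to be applied to every box a page defines, otherwise a viewer may still clip the page to the old, narrower rectangle.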
VcOrm source code available
I published the source code for VcOrm, my version-controlled object relational model implementation using SQLAlchemy and SQLite3 with automatic triggers. It is modular enough to be used in arbitrary projects. Only basic features are implemented right now, but it’s a start.