Query Semantic MediaWiki with Angular through CORS

I have a private MediaWiki with the Semantic MediaWiki extensions, which I use to keep some personal data. Wouldn’t it be nice to query that data from another server, or from a web app? Semantic MediaWiki has a nice API that returns data in JSON format. But first we need to get past the same-origin policy, which protects our servers from malicious code. JSONP is a well-known method that works, but only for anonymous requests on public wikis.
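
A minimal Python sketch of the two server-side pieces involved, assuming a wiki and an origin you control. It builds a Semantic MediaWiki action=ask API request that returns JSON, and computes the CORS response headers for a credentialed (cookie-carrying) request. The function names, the wiki URL and the app origin are hypothetical. For credentialed requests, browsers reject the wildcard "*" in Access-Control-Allow-Origin, so an allowed origin has to be echoed back explicitly, and the client (for example Angular's HttpClient) must also opt in to sending cookies with withCredentials. MediaWiki has its own setting for cross-site API requests ($wgCrossSiteAJAXdomains), which is probably the more practical route on a real wiki.

```python
from typing import Collection, Dict, Optional
from urllib.parse import urlencode


def smw_ask_url(api_base: str, query: str) -> str:
    """Build a Semantic MediaWiki 'ask' API request that returns JSON."""
    params = {"action": "ask", "query": query, "format": "json"}
    return api_base + "?" + urlencode(params)


def cors_headers(request_origin: Optional[str],
                 allowed_origins: Collection[str]) -> Dict[str, str]:
    """Return CORS headers for a credentialed request, or {} to refuse it.

    With credentials (cookies), browsers reject the wildcard "*",
    so the allowed origin must be echoed back explicitly.
    """
    if request_origin is None or request_origin not in allowed_origins:
        return {}  # no CORS headers: the browser will not expose the response
    return {
        "Access-Control-Allow-Origin": request_origin,
        "Access-Control-Allow-Credentials": "true",
        "Vary": "Origin",  # the response depends on the Origin header
    }


# Example with a hypothetical wiki and a hypothetical web app origin.
print(smw_ask_url("https://wiki.example/w/api.php", "[[Category:Book]]|?Author"))
print(cors_headers("https://app.example", {"https://app.example"}))
```

Keeping an explicit allowlist, instead of reflecting any Origin header, is what stops other sites from reading the private wiki with the user's cookies.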

Cython Trouble

Here are a couple of things I ran into while using Cython to wrap my C++ library grakopp.

Assignment and Coercion: I couldn’t find a nice way to wrap boost::variant. The direct approach works, but assigning to the variant requires an unsafe cast, which also adds the overhead of a copy. To work around this, I used accessor functions (which requires changing the C++ implementation). Declaring a custom assignment function via operator= is not supported.

6 things you didn’t know about MediaWiki

… (and were afraid to ask).

HTML Tag Scope: If you mix HTML tags with wikitext, which is allowed for so-called “transparent tags”, MediaWiki checks the element nesting in a preprocessing step (include/Sanitizer.php::removeHTMLtags), independently of the wikitext structure. Later, when the wikitext is parsed, some elements may be closed automatically (for example, at the end of a block). The now-dangling close tag is then ignored, even though by that point it has been detached from its counterpart. For example, <span style="color: red">test this</span> will result in: test this, while test this</span> will result in: test this</span>. This can happen across a long stretch of the wikitext document, with many intermediate blocks, so the treatment of close tags is highly context-sensitive, which is generally bad for formal parsing.
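
To make the preprocessing step more concrete, here is a toy model in Python. It is not MediaWiki’s Sanitizer code, only a rough sketch of the idea: a close tag that matches the innermost open tag is kept, and any other close tag is escaped so that it shows up as literal text, roughly as in the second example above. The function and constant names are hypothetical, and the later automatic closing during wikitext parsing is not modeled.

```python
import html
import re

# Matches opening, closing and self-closing tags such as <span style="...">, </span>, <br/>.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*>")
VOID_TAGS = {"br", "hr", "img"}  # tags that never get a close tag


def escape_unmatched_close_tags(text: str) -> str:
    """Toy model: keep close tags matching the innermost open tag, escape the rest.

    Unclosed open tags are left alone; a real sanitizer does much more.
    """
    pieces = []
    open_tags = []  # stack of currently open tag names
    pos = 0
    for m in TAG.finditer(text):
        pieces.append(text[pos:m.start()])
        is_close, name = m.group(1) == "/", m.group(2).lower()
        if not is_close:
            if name not in VOID_TAGS and not m.group(0).endswith("/>"):
                open_tags.append(name)
            pieces.append(m.group(0))
        elif open_tags and open_tags[-1] == name:
            open_tags.pop()
            pieces.append(m.group(0))
        else:
            pieces.append(html.escape(m.group(0)))  # rendered as literal text
        pos = m.end()
    pieces.append(text[pos:])
    return "".join(pieces)
```

Because the check only looks at tag nesting and ignores block structure, a close tag many paragraphs away can still count as “matched”, which is the source of the wide context-sensitivity described above.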

Including a binary file in an executable

A friend asked how to include a binary file in an executable. Under Windows, one would use resource files, but under Linux the basic tools are sufficient to include arbitrary binary data in object files and access them as extern symbols. Here is my example file. To make it more fun, the same file is also a Makefile and a shell script, and the program prints itself when run (without requiring the source file to be present).

A better iterator for couchdb-python

The most widely used Python library for CouchDB seems to be couchdb-python. It is a simple little library that makes accessing a CouchDB database a breeze, but it has a serious limitation when using views: it always fetches all rows of the view first, then processes them and keeps them as a list in memory. Because it uses simple Python data types (as opposed to a C extension or at least records), these lists have a huge overhead of about 1 KB per row (ouch!).
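
A minimal sketch of the usual way around this: page through the view using CouchDB’s startkey, startkey_docid and limit query options, so that only one page (plus one look-ahead row) is in memory at a time. This is not couchdb-python’s API and not necessarily the iterator from the post. fetch_page is a hypothetical callback that would run one view query with those options and return the rows as (key, doc_id, value) tuples. The sketch assumes no document emits the same key twice, because the pair (key, doc_id) marks where the next page starts.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple

# A view row as (key, doc_id, value); the same key may appear for many documents.
Row = Tuple[Any, str, Any]
# Hypothetical page fetcher: returns up to `limit` rows in view order, starting
# at (startkey, startkey_docid) inclusive, or at the beginning if docid is None.
FetchPage = Callable[[Any, Optional[str], int], List[Row]]


def iter_view(fetch_page: FetchPage, page_size: int = 1000) -> Iterator[Row]:
    """Yield all rows of a view while holding at most page_size + 1 rows in memory."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    startkey: Any = None
    startkey_docid: Optional[str] = None
    while True:
        # Ask for one extra row: it tells us whether there is a next page
        # and where that page starts.
        rows = fetch_page(startkey, startkey_docid, page_size + 1)
        yield from rows[:page_size]
        if len(rows) <= page_size:
            return
        startkey, startkey_docid = rows[page_size][0], rows[page_size][1]
```

Using startkey_docid instead of CouchDB’s skip option keeps each page query cheap, since the server does not have to walk past all earlier rows again.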

Parsing Unicode text with Python PLY

Python PLY is quite helpful for writing a simple scanner and parser, but I had some trouble figuring out how to make it accept Unicode tokens. I kept getting weird errors like this: Getr. Zählung Syntax Error: '̈hlung' The syntax error is displayed as an h with umlaut, something that does not even exist in German. On closer inspection, the problem turns out to be the COMBINING DIAERESIS character (U+0308), which prevents the regular expressions in PLY from matching.
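
The error suggests that the input stores “ä” in decomposed form, as “a” followed by U+0308. A word pattern such as \w does not match the combining mark, because combining marks are not alphanumeric, so the token is split. A minimal sketch of one common remedy, normalizing the input to NFC before tokenizing, is shown below. It uses Python’s re module rather than PLY, the WORD pattern is a hypothetical stand-in for a PLY token rule, and it is not necessarily the fix the post arrives at. NFC only helps where a precomposed character exists.

```python
import re
import unicodedata
from typing import List

# Hypothetical token rule, similar to a regex one might give a PLY t_WORD rule.
WORD = re.compile(r"\w+")


def tokenize_words(text: str, normalize: bool = True) -> List[str]:
    """Split text into word tokens, optionally composing characters first (NFC)."""
    if normalize:
        # NFC turns "a" + COMBINING DIAERESIS (U+0308) into the single code point U+00E4.
        text = unicodedata.normalize("NFC", text)
    return WORD.findall(text)


decomposed = "Getr. Za\u0308hlung"  # "ä" stored in decomposed (NFD) form
print(tokenize_words(decomposed, normalize=False))  # ['Getr', 'Za', 'hlung']
print(tokenize_words(decomposed))                   # ['Getr', 'Zählung']
```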

VcOrm source code available

I have published the source code for VcOrm, my version-controlled object-relational model implementation using SQLAlchemy and SQLite3 with automatic triggers. It is modular enough to be used in arbitrary projects. Only basic features are implemented so far, but it’s a start.
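
The post does not show VcOrm’s schema, so this is only a minimal sketch of the general idea of version control through automatic triggers, written with Python’s sqlite3 module instead of SQLAlchemy. The note and note_history tables and the trigger names are hypothetical: each UPDATE or DELETE on note copies the old row into note_history, so earlier versions are kept by the database itself.

```python
import sqlite3

# Hypothetical schema: old versions of "note" rows are preserved by triggers.
SCHEMA = """
CREATE TABLE note (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL
);
CREATE TABLE note_history (
    id          INTEGER NOT NULL,
    body        TEXT NOT NULL,
    replaced_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER note_keep_old_on_update AFTER UPDATE ON note
BEGIN
    INSERT INTO note_history (id, body) VALUES (OLD.id, OLD.body);
END;
CREATE TRIGGER note_keep_old_on_delete AFTER DELETE ON note
BEGIN
    INSERT INTO note_history (id, body) VALUES (OLD.id, OLD.body);
END;
"""


def open_versioned_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the versioned schema in it."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = open_versioned_db()
with conn:  # commits the transaction on success
    conn.execute("INSERT INTO note (id, body) VALUES (1, 'first draft')")
    conn.execute("UPDATE note SET body = 'second draft' WHERE id = 1")
print(conn.execute("SELECT body FROM note_history ORDER BY rowid").fetchall())
```

Because the triggers live in the database, every client that writes to the file gets versioning, not only code that goes through the ORM.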

How to use SQLite’s backup in Python

SQLite is a very portable database that is easy to deploy, as it does not require a server and works off a single file. This makes it ideally suited as an application file format. An important part of any application is the ability to make copies of documents (“Save-As”). There is no way to do this within SQL, but SQLite provides a backup interface that can be used for that purpose. Unfortunately, SQLite’s backup interface is not part of the generic database APIs in many higher-level languages, including Python.
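
Since Python 3.7, the standard sqlite3 module exposes SQLite’s online backup API as Connection.backup; on older Python versions that method is not available. A minimal “Save-As” sketch using it, assuming the document is open in a sqlite3 connection; save_as and pages_per_step are hypothetical names, not from the post.

```python
import sqlite3


def save_as(src: sqlite3.Connection, dest_path: str, pages_per_step: int = 256) -> None:
    """Copy the whole database behind `src` into the file at `dest_path` ("Save-As")."""
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup copies `pages` database pages per step until the
        # whole database has been copied into `dest`.
        src.backup(dest, pages=pages_per_step)
    finally:
        dest.close()
```

Copying in steps lets other connections access the source database between steps; committing pending changes on the source connection first makes it clear what ends up in the copy.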

SQLAlchemy and string interning

In a recent post I explored hash values as a more efficient alternative to string discriminators in joined table inheritance with SQLAlchemy. This time, I present another solution that is less efficient, but a bit clearer and requires no mathematical background: string interning. Normalizing databases so that referenced table names (or discriminators) are stored in a separate table and referenced by their primary key is a well-known best practice. The problem is making SQLAlchemy dynamically allocate and look up the IDs used as discriminators in an open name space.
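
The interning part on its own can be sketched with Python’s sqlite3 module. This is a minimal get-or-create sketch, not the post’s SQLAlchemy integration: the interned_string table and the StringInterner class are hypothetical, INSERT OR IGNORE is SQLite-specific, and wiring the returned integer into SQLAlchemy’s polymorphic discriminator is not shown.

```python
import sqlite3
from typing import Dict


class StringInterner:
    """Map strings (e.g. type names) to small integer IDs stored in a table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS interned_string ("
            " id INTEGER PRIMARY KEY,"
            " value TEXT NOT NULL UNIQUE)"
        )
        self._cache: Dict[str, int] = {}  # avoids a query for known strings

    def intern(self, value: str) -> int:
        """Return the ID for `value`, allocating a new one on first use."""
        cached = self._cache.get(value)
        if cached is not None:
            return cached
        # The UNIQUE constraint makes this a no-op if the string already exists.
        self.conn.execute(
            "INSERT OR IGNORE INTO interned_string (value) VALUES (?)", (value,))
        (ident,) = self.conn.execute(
            "SELECT id FROM interned_string WHERE value = ?", (value,)).fetchone()
        self._cache[value] = ident
        return ident

    def lookup(self, ident: int) -> str:
        """Return the string for a previously allocated ID."""
        row = self.conn.execute(
            "SELECT value FROM interned_string WHERE id = ?", (ident,)).fetchone()
        if row is None:
            raise KeyError(ident)
        return row[0]
```

The trade-off against hashing is visible here: interning needs a database round trip the first time a name is seen, but the IDs are small and collision-free by construction.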

How many digits of SHA1 are needed to differentiate English words?

For SQLAlchemy, when using joined table inheritance, a discriminator needs to be provided that specifies the type of a derived object, so that additional object data can be looked up in tables specific to that type. The problem is how to choose a discriminator in an open system, such as an application extensible by plugins that add their own types. One way to do this is to take the hash value of a representation of the exact signature of the type.
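
A minimal sketch of that idea with Python’s hashlib: a discriminator taken as a hex prefix of the SHA-1 of a type’s signature string, plus a helper that finds the smallest prefix length that still tells a given set of strings apart. The signature format and function names are hypothetical, not the post’s. As a rough guide from the birthday bound, with k distinct strings and d hex digits the chance of at least one collision is about 1 - exp(-k(k-1) / (2 * 16^d)), so the needed number of digits grows with the logarithm of the number of types.

```python
import hashlib
from typing import Iterable


def type_discriminator(signature: str, digits: int = 8) -> str:
    """Hex prefix of the SHA-1 of a type's signature string."""
    return hashlib.sha1(signature.encode("utf-8")).hexdigest()[:digits]


def min_unique_digits(signatures: Iterable[str]) -> int:
    """Smallest number of hex digits for which all distinct strings get distinct prefixes."""
    digests = {hashlib.sha1(s.encode("utf-8")).hexdigest() for s in signatures}
    for digits in range(1, 41):  # a SHA-1 hex digest has 40 digits
        if len({d[:digits] for d in digests}) == len(digests):
            return digits
    raise AssertionError("unreachable: distinct SHA-1 digests differ within 40 digits")


# Hypothetical signature string for a plugin-defined type.
print(type_discriminator("myplugin.Invoice(id:int,total:decimal)"))
```

Note that a prefix length that works for today’s set of types says nothing about types that plugins add later, so in an open system some safety margin beyond the minimum is advisable.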
