… (and were afraid to ask).
HTML Tag Scope: If you mix HTML tags with wikitext, which is allowed for so-called “transparent tags”, MediaWiki checks the element nesting in a preprocessing step (include/Sanitizer.php::removeHTMLtags), independently of the wikitext structure. Later, when the wikitext is parsed, some elements may be closed automatically (for example, at the end of a block). The now-dangling close tag is still ignored, even though it has been detached from its counterpart by then:
<span style="color: red">test
this</span>
will result in:
while
test
this</span>
will result in:
This can happen across a long stretch of the wikitext document, with many intermediate blocks in between, so the handling of a close tag depends on a wide context, which is generally bad for formal parsing.
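A toy model may help show why this is context-sensitive. The sketch below is not MediaWiki's code: `check_nesting` and `TAG_RE` are hypothetical names, it ignores tag whitelists and misnested tags, and it assumes that an unmatched close tag is escaped rather than dropped. It pairs close tags with open tags over the whole text, so blank lines between them make no difference at this stage.

```python
import re

# Hypothetical, simplified tag pattern; the real Sanitizer is far stricter.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def check_nesting(text):
    """Pair close tags with open tags across the whole text (toy model).

    Line breaks and blank lines are irrelevant here; only tag order counts.
    A close tag that does not match the innermost open tag is escaped.
    """
    stack = []
    out = []
    pos = 0
    for m in TAG_RE.finditer(text):
        out.append(text[pos:m.start()])
        is_close, name = m.group(1) == "/", m.group(2).lower()
        tag = m.group(0)
        if not is_close:
            if not tag.endswith("/>"):  # self-closing tags open nothing
                stack.append(name)
            out.append(tag)
        elif stack and stack[-1] == name:
            stack.pop()
            out.append(tag)
        else:
            # No open counterpart: render the tag as literal text.
            out.append(tag.replace("<", "&lt;").replace(">", "&gt;"))
        pos = m.end()
    out.append(text[pos:])
    return "".join(out)


print(check_nesting('<span style="color: red">test\n\nthis</span>'))
print(check_nesting("test\n\nthis</span>"))
```

In the first call the close tag is accepted because an open `span` precedes it somewhere in the text; only in the second call, with no counterpart at all, does it survive as visible text.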
Breaking the cssCheck: If a CSS style attribute contains character references to invalid Unicode code points, the page renderer terminates with a fatal error from include/normal/UtfNormalUtil.php::codepointToUtf8, called through include/Sanitizer.php::decodeCharReferencesCallback:
<span style="\110000">x</span>
leads to the fatal error:
Asked for code outside of range (1114112)
It’s a rare chance to see an uncaught exception leak through to the user, and it could be avoided by calling include/Sanitizer.php::validateCodepoint first and falling back to UTF8_REPLACEMENT.
Update: I submitted a patch for this to MediaWiki’s code review platform.
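The suggested fix (validate first, then fall back to the replacement character) can be sketched in Python rather than PHP. `decode_css_escapes`, `validate_codepoint` and `CSS_ESCAPE_RE` are hypothetical names, and the accepted ranges are an assumption based on the XML 1.0 `Char` production; the real `Sanitizer::validateCodepoint` should be checked for the exact set.

```python
import re

UTF8_REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

# Hypothetical pattern for CSS hex escapes: a backslash, 1-6 hex digits,
# and an optional single whitespace terminator.
CSS_ESCAPE_RE = re.compile(r"\\([0-9a-fA-F]{1,6})\s?")


def validate_codepoint(cp):
    # Assumption: the XML 1.0 "Char" ranges.
    return (cp in (0x09, 0x0A, 0x0D)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def decode_css_escapes(css):
    """Decode CSS hex escapes, substituting U+FFFD for invalid code points."""
    def repl(m):
        cp = int(m.group(1), 16)
        # An unchecked chr(0x110000) raises ValueError, the Python
        # analogue of the fatal error described above.
        return chr(cp) if validate_codepoint(cp) else UTF8_REPLACEMENT
    return CSS_ESCAPE_RE.sub(repl, css)


print(decode_css_escapes(r"\110000") == UTF8_REPLACEMENT)  # True
print(decode_css_escapes(r"\41 bc"))  # Abc
```

Note that hex 110000 is 1114112, the number in the error message, one past the highest Unicode code point U+10FFFF.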
HTML attribute junk: You can write just about anything (except <, > or />) in the attribute space of an HTML opening tag, and MediaWiki will ignore it. This even includes a signature, as in the following example:
<strong !@#$%^&*()_foobar?,./';": style="color: red" ~~~~>test</strong>
yields
test
As long as attributes are separated from the junk by whitespace, they are preserved (such as the style attribute above).
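This tolerance can be modeled as a scanner that only picks out name=value pairs starting at the beginning of the attribute text or right after whitespace, and silently skips everything else. The sketch is a hypothetical model of the observed behavior (`ATTR_RE` and `extract_attributes` are illustrative names), not the pattern Sanitizer.php actually uses.

```python
import re

# Hypothetical pattern: an attribute name must start at the beginning of the
# text or after whitespace; the value is double-quoted, single-quoted or bare.
ATTR_RE = re.compile(
    r"""(?:^|(?<=\s))([a-zA-Z][\w:.-]*)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)""")


def extract_attributes(attr_text):
    """Return {name: value} for well-formed attributes, skipping junk."""
    attrs = {}
    for m in ATTR_RE.finditer(attr_text):
        name, value = m.group(1).lower(), m.group(2)
        if value[0] in "\"'":
            value = value[1:-1]  # strip the surrounding quotes
        attrs[name] = value
    return attrs


junk = """!@#$%^&*()_foobar?,./';": style="color: red" ~~~~"""
print(extract_attributes(junk))  # {'style': 'color: red'}
```

Because of the whitespace requirement, `junk"style="a"` yields no attributes in this model, which matches the condition stated above.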
Missing block elements: You can avoid the generation of paragraph elements (<p>) around inline text by inserting an empty <div style="display:inline"> element on the same line. If you drop the style attribute, though, the browser will render the text as two separate blocks, and the text before and after the div will not connect to the preceding or following inline elements.
Table header data synonymity: In a table, after a !, the table data cell separator || is synonymous with the table header cell separator !!:
{|
! header1 || header2
|}
yields a table with two header cells. The opposite does not work, though:
{|
| data1 !! data2
|}
yields a single data cell containing the text “data1 !! data2”. Note that this example also introduces a non-breaking space character after “data1”, because MediaWiki treats the following exclamation mark as French punctuation.
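The asymmetry amounts to a rule of this shape: on a line starting with !, every !! is first treated like ||, and then the line is split on ||, while a line starting with | is split on || only. The sketch below (`split_table_row` is a hypothetical name) ignores cell attributes, row separators, captions, and the French spacing rule.

```python
def split_table_row(line):
    """Toy model: split one wikitext table cell line into (kind, cells)."""
    if line.startswith("!"):
        # Header lines: '!!' is normalized to '||', so both separate cells.
        body = line[1:].replace("!!", "||")
        return "th", [cell.strip() for cell in body.split("||")]
    if line.startswith("|"):
        # Data lines: only '||' separates cells; '!!' stays literal text.
        return "td", [cell.strip() for cell in line[1:].split("||")]
    raise ValueError("not a table cell line: %r" % line)


print(split_table_row("! header1 || header2"))  # ('th', ['header1', 'header2'])
print(split_table_row("| data1 !! data2"))      # ('td', ['data1 !! data2'])
```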
Using indent-pre where it is not allowed: In some places, indent-pre (creating <pre> elements by indenting text lines with a space) is disallowed for compatibility. This affects <blockquote>, <p>, <li>, <dt>, and <dd> elements, and also prevents you from creating new paragraphs and <br/> elements with empty lines. The restriction is only active up to the first block-level element, though, so it is easy to get around it:
<ul><li>test
this
<div></div>
and
this
</li></ul>
yields
which demonstrates the limitation of the restriction.
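The rule as described can be modeled as a flag that flips on at the first block-level tag inside the restricted element. In this hypothetical sketch, `pre_candidate_lines` receives the lines inside an <li>, with the text lines indented by one space, and `BLOCK_TAG_RE` is an assumed, incomplete list of block-level tags.

```python
import re

# Assumed, incomplete set of block-level tags that lift the restriction.
BLOCK_TAG_RE = re.compile(r"</?(div|table|pre|p|ul|ol|blockquote|h[1-6])\b", re.I)


def pre_candidate_lines(lines):
    """Return indices of lines that may become <pre> inside a restricted element.

    Indent-pre stays disabled until the first line containing a block-level
    tag; every later line starting with a space is a candidate.
    """
    allowed = False
    result = []
    for i, line in enumerate(lines):
        if allowed and line.startswith(" "):
            result.append(i)
        if BLOCK_TAG_RE.search(line):
            allowed = True  # restriction lifted for the rest of the element
    return result


print(pre_candidate_lines([" test", " this", "<div></div>", " and", " this"]))  # [3, 4]
```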