Misaka

Misaka is a CFFI-based binding for Hoedown, a fast markdown processing library written in C. It features a fast HTML renderer and functionality to make custom renderers (e.g. man pages or LaTeX).

See the Changelog for all changes.

Installation

Misaka has been tested on CPython 2.6, 2.7, 3.2, 3.3, 3.4, 3.5, 3.6 and PyPy 2.6+. CFFI 1.0 or newer is required. This means Misaka will not work on PyPy 2.5 and older versions.

If you’re installing from source on Debian or a Debian derivative (e.g. Ubuntu), make sure build-essential, python-dev and libffi-dev are installed.

Install with pip:

pip install misaka

Or grab the source from GitHub:

git clone https://github.com/FSX/misaka.git
cd misaka
python setup.py install

Consult the CFFI documentation if you experience problems installing CFFI.

Use the following commands to install Misaka in Termux:

apt update
apt upgrade
apt install clang python python-dev libffi libffi-dev
pip install misaka

Usage

Very simple example:

import misaka as m
print(m.html('some other text'))

Or:

from misaka import Markdown, HtmlRenderer

rndr = HtmlRenderer()
md = Markdown(rndr)

print(md('some text'))

Here’s a simple example that uses Pygments to highlight code (houdini is used to escape the HTML):

import houdini as h
import misaka as m
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.util import ClassNotFound
from pygments.lexers import get_lexer_by_name

class HighlighterRenderer(m.HtmlRenderer):
    def blockcode(self, text, lang):
        try:
            lexer = get_lexer_by_name(lang, stripall=True)
        except ClassNotFound:
            lexer = None

        if lexer:
            formatter = HtmlFormatter()
            return highlight(text, lexer, formatter)
        # default
        return '\n<pre><code>{}</code></pre>\n'.format(
                            h.escape_html(text.strip()))

renderer = HighlighterRenderer()
md = m.Markdown(renderer, extensions=('fenced-code',))

print(md("""
Here is some code:

```python
print(123)
```

More code:

    print(123)
"""))

The above code listing subclasses HtmlRenderer and overrides the blockcode() method inherited from BaseRenderer. See tests/test_renderer.py for a renderer with all its methods implemented.

Tests

tidy is needed to run the tests. tox can be used to run the tests on all supported Python versions with one command.

Run one of the following commands to install tidy:

apt-get install tidy  # Debian and derivatives
pacman -S tidyhtml    # Arch Linux

And run the tests with:

python setup.py test

It’s also possible to include or exclude tests. -i and -e accept a comma-separated list of testcases:

# Only run MarkdownConformanceTest_10
python setup.py test -i MarkdownConformanceTest_10

# Or everything except MarkdownConformanceTest_10
python setup.py test -e MarkdownConformanceTest_10

# Or everything except MarkdownConformanceTest_10 and MarkdownConformanceTest_103
python setup.py test -e MarkdownConformanceTest_10,MarkdownConformanceTest_103

-l prints a list of all testcases:

$ python setup.py test -l
[... build output ...]
MarkdownConformanceTest_10
MarkdownConformanceTest_103
BenchmarkLibraries
ArgsToIntTest
CustomRendererTest
SmartypantsTest

And -b runs benchmarks (-i and -e can also be used in combination with -b):

$ python setup.py test -b
[... build output ...]
>> BenchmarkLibraries
test_hoep                     3270         1.00 s/t     305.91 us/op
test_markdown                   20         1.23 s/t      61.44 ms/op
test_markdown2                  10         3.29 s/t     329.34 ms/op
test_misaka                   3580         1.00 s/t     280.01 us/op
test_misaka_classes           3190         1.00 s/t     314.00 us/op
test_mistune                    70         1.04 s/t      14.91 ms/op

Each row of the above output shows the benchmark name, the number of repetitions, the total amount of time (in seconds) and the time taken for one operation (one repetition). A benchmark tries to stay within one second: it runs a test for a minimum of ten repetitions and tries another ten if there’s time left.
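The repetition strategy described above can be sketched as follows. This is a standalone illustration and not the benchmark runner shipped with Misaka: the name run_benchmark and the budget and batch parameters are assumptions made for this example.

```python
import time


def run_benchmark(func, budget=1.0, batch=10):
    """Run func in batches until the time budget is used up.

    Hypothetical helper: one batch always runs, and another batch is
    started only while the elapsed time is still below `budget` seconds.
    Returns (repetitions, total_seconds, seconds_per_operation).
    """
    repetitions = 0
    start = time.perf_counter()
    while True:
        for _ in range(batch):
            func()
        repetitions += batch
        total = time.perf_counter() - start
        if total >= budget:
            break
    return repetitions, total, total / repetitions


# Small budget so the demonstration finishes quickly.
reps, total, per_op = run_benchmark(lambda: sum(range(1000)), budget=0.05)
print('{:>8} {:>8.2f} s/t {:>10.2f} us/op'.format(reps, total, per_op * 1e6))
```

Because a batch is never interrupted, a slow test can exceed the budget; that matches the totals slightly above one second in the output above.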

API

Extensions

Name                    Constant
tables                  EXT_TABLES
fenced-code             EXT_FENCED_CODE
footnotes               EXT_FOOTNOTES
autolink                EXT_AUTOLINK
strikethrough           EXT_STRIKETHROUGH
underline               EXT_UNDERLINE
highlight               EXT_HIGHLIGHT
quote                   EXT_QUOTE
superscript             EXT_SUPERSCRIPT
math                    EXT_MATH
no-intra-emphasis       EXT_NO_INTRA_EMPHASIS
space-headers           EXT_SPACE_HEADERS
math-explicit           EXT_MATH_EXPLICIT
disable-indented-code   EXT_DISABLE_INDENTED_CODE

HTML render flags

Name        Constant
skip-html   HTML_SKIP_HTML
escape      HTML_ESCAPE
hard-wrap   HTML_HARD_WRAP
use-xhtml   HTML_USE_XHTML

Functions

misaka.html(text, extensions=0, render_flags=0)

Convert markdown text to HTML.

extensions can be a list or tuple of extensions (e.g. ('fenced-code', 'footnotes', 'strikethrough')) or an integer (e.g. EXT_FENCED_CODE | EXT_FOOTNOTES | EXT_STRIKETHROUGH).

render_flags can be a list or tuple of flags (e.g. ('skip-html', 'hard-wrap')) or an integer (e.g. HTML_SKIP_HTML | HTML_HARD_WRAP).
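To illustrate why the two argument forms are interchangeable, the sketch below folds a tuple of names into a single integer with bitwise OR. It is not Misaka’s own code: FLAG_BITS, to_bitmask and the bit values are placeholders chosen for this example. The real values are defined by Hoedown and exposed through the EXT_* and HTML_* constants.

```python
# Placeholder bit values for illustration only; the real ones come from
# Hoedown and are exposed by Misaka as EXT_* / HTML_* constants.
FLAG_BITS = {
    'fenced-code': 1 << 0,
    'footnotes': 1 << 1,
    'strikethrough': 1 << 2,
}


def to_bitmask(argument, mapping=FLAG_BITS):
    """Accept an integer or an iterable of names and return an integer."""
    if isinstance(argument, int):
        return argument
    mask = 0
    for name in argument:
        # This sketch raises KeyError for a name that is not in the mapping.
        mask |= mapping[name]
    return mask


by_name = to_bitmask(('fenced-code', 'strikethrough'))
by_value = to_bitmask(FLAG_BITS['fenced-code'] | FLAG_BITS['strikethrough'])
print(by_name == by_value)  # True
```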

misaka.smartypants(text)

Transforms sequences of characters into HTML entities.

Markdown                      HTML                   Result
's (s, t, m, d, re, ll, ve)   &rsquo;s               ’s
"Quotes"                      &ldquo;Quotes&rdquo;   “Quotes”
---                           &mdash;                —
--                            &ndash;                –
...                           &hellip;               …
. . .                         &hellip;               …
(c)                           &copy;                 ©
(r)                           &reg;                  ®
(tm)                          &trade;                ™
3/4                           &frac34;               ¾
1/2                           &frac12;               ½
1/4                           &frac14;               ¼

misaka.escape_html(text, escape_slash=False)

Binding for Hoedown’s HTML escaping function.

The implementation is inspired by the OWASP XSS Prevention recommendations:

& --> &amp;
< --> &lt;
> --> &gt;
" --> &quot;
' --> &#x27;
/ --> &#x2F;  when escape_slash is set to True

New in version 2.1.0.
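The character mapping listed for escape_html can be mimicked in pure Python for illustration. This sketch is not the binding itself (Misaka calls into Hoedown’s C function); escape_html_sketch is a hypothetical name used only here.

```python
# Replacements taken from the mapping listed for misaka.escape_html.
BASE_ESCAPES = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#x27;',
}


def escape_html_sketch(text, escape_slash=False):
    """Pure-Python stand-in that applies the same replacements."""
    table = dict(BASE_ESCAPES)
    if escape_slash:
        table['/'] = '&#x2F;'
    # str.translate replaces each character in a single pass, so the '&'
    # of an already inserted entity is never escaped a second time.
    return text.translate(str.maketrans(table))


print(escape_html_sketch('<a href="/x">'))
# &lt;a href=&quot;/x&quot;&gt;
print(escape_html_sketch('<a href="/x">', escape_slash=True))
# &lt;a href=&quot;&#x2F;x&quot;&gt;
```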

Classes

class misaka.Markdown(renderer, extensions=0)

Parses markdown text and renders it using the given renderer.

extensions can be a list or tuple of extensions (e.g. ('fenced-code', 'footnotes', 'strikethrough')) or an integer (e.g. EXT_FENCED_CODE | EXT_FOOTNOTES | EXT_STRIKETHROUGH).

class misaka.HtmlRenderer(flags=0, nesting_level=0)

A wrapper for the HTML renderer that’s included in Hoedown.

flags can be a list or tuple of flags (e.g. ('skip-html', 'hard-wrap')) or an integer (e.g. HTML_SKIP_HTML | HTML_HARD_WRAP).

nesting_level limits what’s included in the table of contents. The default value is 0, no headers.

An instance of the HtmlRenderer cannot be shared between multiple Markdown instances, because it carries state that’s changed by the Markdown instance.

class misaka.SaferHtmlRenderer(flags=(), sanitization_mode='skip-html', nesting_level=0, link_rewrite=None, img_src_rewrite=None)

A subclass of HtmlRenderer which adds protections against Cross-Site Scripting (XSS):

  1. The 'skip-html' flag is turned on by default, preventing injection of HTML elements. If you want to escape HTML code instead of removing it entirely, change sanitization_mode to 'escape'.
  2. The URLs of links and images are filtered to prevent JavaScript injection. This also blocks the rendering of email addresses into links. See the check_url() method below.
  3. Optionally, the URLs can also be rewritten to counter other attacks such as phishing.

Enabling URL rewriting requires extra arguments:

Parameters:
  • link_rewrite – the URL of a redirect page, necessary to rewrite the href attributes of links
  • img_src_rewrite – the URL of an image proxy, necessary to rewrite the src attributes of images

Both strings should include a {url} placeholder for the URL-encoded target. Examples:

link_rewrite='https://example.com/redirect?url={url}',
img_src_rewrite='https://img-proxy-domain/{url}'

New in version 2.1.0.

autolink(raw_url, is_email)

Filters links generated by the autolink extension.

check_url(url, is_image_src=False)

This method is used to check a URL.

Returns True if the URL is “safe”, False otherwise.

The default implementation only allows HTTP and HTTPS links. That means no mailto:, no xmpp:, no ftp:, etc.

This method exists specifically to allow easy customization of link filtering through subclassing, so don’t hesitate to write your own.

If you’re thinking of implementing a blacklist approach, see “Which URL schemes are dangerous (XSS exploitable)?”.
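The allow-list behaviour described for check_url() can be sketched without Misaka as shown below. The factory make_url_checker and its schemes parameter are hypothetical, and the prefix match is an assumption about what “safe” means rather than a copy of the library’s implementation. In a real renderer the same logic would live in a check_url() method overridden on a SaferHtmlRenderer subclass.

```python
import re


def make_url_checker(schemes=('http', 'https')):
    """Build a check_url-style function for the given allowed schemes.

    Assumption for this sketch: a URL is "safe" when it starts with one
    of the allowed schemes followed by a colon, case-insensitively.
    """
    alternatives = '|'.join(re.escape(scheme) for scheme in schemes)
    pattern = re.compile(r'^(?:%s):' % alternatives, re.IGNORECASE)

    def check_url(url, is_image_src=False):
        # is_image_src is accepted for signature parity but unused here.
        return bool(pattern.match(url))

    return check_url


default_check = make_url_checker()
with_mail = make_url_checker(('http', 'https', 'mailto'))

print(default_check('https://example.com/'))         # True
print(default_check('javascript:alert(1)'))          # False
print(default_check('mailto:someone@example.com'))   # False
print(with_mail('mailto:someone@example.com'))       # True
```

An allow-list fails closed: a scheme nobody thought about is rejected until it is added explicitly.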

image(raw_url, title='', alt='')

Filters the src attribute of an image.

Note that filtering the source URL of an <img> tag is only a very basic protection, and it’s mostly useless in modern browsers (they block JavaScript in there by default). An example of an attack that filtering does not thwart is phishing based on HTTP Auth, see this issue for details.

To mitigate this issue you should only allow images from trusted services, for example your own image store, or a proxy (see rewrite_url()).

link(content, raw_url, title='')

Filters links.

rewrite_url(url, is_image_src=False)

This method is called to rewrite URLs.

It uses either self.link_rewrite or self.img_src_rewrite depending on the value of is_image_src. The URL is returned unchanged if the corresponding attribute is None.
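As a rough sketch of the rewriting step, the function below fills the {url} placeholder with the URL-encoded target. It is written as a free function for illustration, not as the method itself, and the choice of safe='' (which also percent-encodes ':' and '/') is an assumption of this example.

```python
from urllib.parse import quote


def rewrite_url(url, link_rewrite=None, img_src_rewrite=None,
                is_image_src=False):
    """Return the rewritten URL, or the URL unchanged if no template is set."""
    template = img_src_rewrite if is_image_src else link_rewrite
    if template is None:
        return url
    # Assumption: encode every reserved character, including '/' and ':'.
    return template.format(url=quote(url, safe=''))


print(rewrite_url('https://example.org/a b',
                  link_rewrite='https://example.com/redirect?url={url}'))
# https://example.com/redirect?url=https%3A%2F%2Fexample.org%2Fa%20b

print(rewrite_url('https://example.org/cat.png', is_image_src=True))
# https://example.org/cat.png
```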

class misaka.HtmlTocRenderer(nesting_level=6)

A wrapper for the HTML table of contents renderer that’s included in Hoedown.

nesting_level limits what’s included in the table of contents. The default value is 6, all headers.

An instance of the HtmlTocRenderer cannot be shared between multiple Markdown instances, because it carries state that’s changed by the Markdown instance.

class misaka.BaseRenderer
blockcode(text, lang='')

lang contains the language when fenced code blocks are enabled and a language is defined in the code block.

blockquote(content)
header(content, level)

level can be a number from 1 to 6.

hrule()
list(content, is_ordered, is_block)
listitem(content, is_ordered, is_block)
paragraph(content)
table(content)

Depends on the tables extension.

table_header(content)

Depends on the tables extension.

table_body(content)

Depends on the tables extension.

table_row(content)

Depends on the tables extension.

table_cell(content, align, is_header)

Depends on the tables extension.

align can be empty, center, left or right.

footnotes(content)

Depends on the footnotes extension.

footnote_def(content, num)

Depends on the footnotes extension.

footnote_ref(num)

Depends on the footnotes extension.

blockhtml(text)
autolink(link, is_email)

Depends on the autolink extension.

codespan(text)
double_emphasis(content)
emphasis(content)
underline(content)

Depends on the underline extension.

highlight(content)

Depends on the highlight extension.

quote(content)

Depends on the quote extension.

image(link, title='', alt='')
linebreak()
triple_emphasis(content)
strikethrough(content)

Depends on the strikethrough extension.

superscript(content)

Depends on the superscript extension.

math(text, displaymode)

Depends on the math extension.

displaymode can be 0 or 1. This is how HtmlRenderer handles it:

if displaymode == 1:
    return '\\[{}\\]'.format(text)
else:  # displaymode == 0
    return '\\({}\\)'.format(text)

raw_html(text)
entity(text)
normal_text(text)
doc_header(inline_render)