Misaka
Misaka is a CFFI-based binding for Hoedown, a fast markdown processing library written in C. It features a fast HTML renderer and functionality to make custom renderers (e.g. man pages or LaTeX).
See the Changelog for all changes.
Installation
Misaka has been tested on CPython 2.6, 2.7, 3.2, 3.3, 3.4, 3.5, 3.6 and PyPy 2.6+. CFFI 1.0 or newer is required. This means Misaka will not work on PyPy 2.5 and older versions.
If you’re installing from source and are using Debian or a Debian derivative (e.g. Ubuntu), make sure build-essential, python-dev and libffi-dev are installed.
Install with pip:
pip install misaka
Or grab the source from GitHub:
git clone https://github.com/FSX/misaka.git
cd misaka
python setup.py install
Consult the CFFI documentation if you experience problems installing CFFI.
Use the following commands to install Misaka in Termux:
apt update
apt upgrade
apt install clang python python-dev libffi libffi-dev
pip install misaka
Usage
Very simple example:
import misaka as m
print(m.html('some other text'))
Or:
from misaka import Markdown, HtmlRenderer
rndr = HtmlRenderer()
md = Markdown(rndr)
print(md('some text'))
Here’s a simple example that uses Pygments to highlight code (houdini is used to escape the HTML):
import houdini as h
import misaka as m
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name
from pygments.util import ClassNotFound


class HighlighterRenderer(m.HtmlRenderer):
    def blockcode(self, text, lang):
        # Look up a Pygments lexer for the block's language, if one is known.
        try:
            lexer = get_lexer_by_name(lang, stripall=True)
        except ClassNotFound:
            lexer = None

        if lexer:
            formatter = HtmlFormatter()
            return highlight(text, lexer, formatter)

        # Default: no usable lexer, so emit an escaped plain code block.
        return '\n<pre><code>{}</code></pre>\n'.format(
            h.escape_html(text.strip()))


renderer = HighlighterRenderer()
md = m.Markdown(renderer, extensions=('fenced-code',))

print(md("""
Here is some code:

```python
print(123)
```

More code:

    print(123)
"""))
The above code listing subclasses HtmlRenderer and implements a BaseRenderer.blockcode() method. See tests/test_renderer.py for a renderer with all its methods implemented.
Tests
tidy is needed to run the tests. tox can be used to run the tests on all supported Python versions with one command.
Run one of the following commands to install tidy:
apt-get install tidy # Debian and derivatives
pacman -S tidyhtml # Arch Linux
And run the tests with:
python setup.py test
It’s also possible to include or exclude tests. -i and -e accept a comma-separated list of test cases:
# Only run MarkdownConformanceTest_10
python setup.py test -i MarkdownConformanceTest_10
# Or everything except MarkdownConformanceTest_10
python setup.py test -e MarkdownConformanceTest_10
# Or everything except MarkdownConformanceTest_10 and MarkdownConformanceTest_103
python setup.py test -e MarkdownConformanceTest_10,MarkdownConformanceTest_103
-l prints a list of all test cases:
$ python setup.py test -l
[... build output ...]
MarkdownConformanceTest_10
MarkdownConformanceTest_103
BenchmarkLibraries
ArgsToIntTest
CustomRendererTest
SmartypantsTest
And -b runs benchmarks (-i and -e can also be used in combination with -b):
$ python setup.py test -b
[... build output ...]
>> BenchmarkLibraries
test_hoep              3270    1.00 s/t  305.91 us/op
test_markdown            20    1.23 s/t   61.44 ms/op
test_markdown2           10    3.29 s/t  329.34 ms/op
test_misaka            3580    1.00 s/t  280.01 us/op
test_misaka_classes    3190    1.00 s/t  314.00 us/op
test_mistune             70    1.04 s/t   14.91 ms/op
Each row of the above output shows the test name, the number of repetitions, the total time in seconds (s/t) and the time taken for one operation, i.e. one repetition (us/op or ms/op). A benchmark tries to stay within one second: it runs a test for a minimum of ten repetitions and runs another ten whenever there’s time left.
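The repetition scheme described above can be sketched in a few lines. This is an illustrative reimplementation, not the code in Misaka's test suite; the function name `run_benchmark` and its `budget` and `batch` parameters are assumptions made for the example.

```python
import time


def run_benchmark(func, budget=1.0, batch=10):
    """Call func in batches of `batch` until about `budget` seconds are used.

    Returns (repetitions, total_seconds, seconds_per_operation).
    """
    repetitions = 0
    total = 0.0
    # Always run one batch, then run another while there is time left.
    while repetitions == 0 or total < budget:
        start = time.perf_counter()
        for _ in range(batch):
            func()
        total += time.perf_counter() - start
        repetitions += batch
    return repetitions, total, total / repetitions


# Hypothetical workload; a small budget keeps the demo quick.
reps, total, per_op = run_benchmark(lambda: sum(range(1000)), budget=0.05)
print("{:<20} {:>6} {:.2f} s/t {:.2f} us/op".format(
    "sum_range", reps, total, per_op * 1e6))
```

Because the time check happens only between batches, a slow operation can overshoot the budget, which may be why some totals in the output above exceed one second.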
API
Extensions
Name | Constant |
---|---|
tables | EXT_TABLES |
fenced-code | EXT_FENCED_CODE |
footnotes | EXT_FOOTNOTES |
autolink | EXT_AUTOLINK |
strikethrough | EXT_STRIKETHROUGH |
underline | EXT_UNDERLINE |
highlight | EXT_HIGHLIGHT |
quote | EXT_QUOTE |
superscript | EXT_SUPERSCRIPT |
math | EXT_MATH |
no-intra-emphasis | EXT_NO_INTRA_EMPHASIS |
space-headers | EXT_SPACE_HEADERS |
math-explicit | EXT_MATH_EXPLICIT |
disable-indented-code | EXT_DISABLE_INDENTED_CODE |
HTML render flags
Name | Constant |
---|---|
skip-html | HTML_SKIP_HTML |
escape | HTML_ESCAPE |
hard-wrap | HTML_HARD_WRAP |
use-xhtml | HTML_USE_XHTML |
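Each name in the two tables above is paired with an integer constant, and the functions and classes below accept either a list or tuple of names or a single integer. The sketch below shows how a tuple of names could be folded into the equivalent bitmask. The numeric values, the `EXTENSION_MAP` table and the helper `names_to_flags` are placeholders invented for this illustration; they are not Misaka's actual constants or internals, so use the real constants from the tables in practice.

```python
# Placeholder values for illustration only; not Misaka's real constants.
EXT_TABLES = 1 << 0
EXT_FENCED_CODE = 1 << 1
EXT_FOOTNOTES = 1 << 2

# Hypothetical lookup table from extension name to flag value.
EXTENSION_MAP = {
    'tables': EXT_TABLES,
    'fenced-code': EXT_FENCED_CODE,
    'footnotes': EXT_FOOTNOTES,
}


def names_to_flags(names, mapping=EXTENSION_MAP):
    """Fold a list or tuple of names into one integer; pass integers through."""
    if isinstance(names, int):
        return names
    flags = 0
    for name in names:
        flags |= mapping[name]  # raises KeyError for an unknown name
    return flags


# Both spellings select the same set of extensions.
by_name = names_to_flags(('fenced-code', 'footnotes'))
by_constant = EXT_FENCED_CODE | EXT_FOOTNOTES
print(by_name == by_constant)
```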
Functions
- misaka.html(text, extensions=0, render_flags=0)
  Convert markdown text to HTML.
  extensions can be a list or tuple of extensions (e.g. ('fenced-code', 'footnotes', 'strikethrough')) or an integer (e.g. EXT_FENCED_CODE | EXT_FOOTNOTES | EXT_STRIKETHROUGH).
  render_flags can be a list or tuple of flags (e.g. ('skip-html', 'hard-wrap')) or an integer (e.g. HTML_SKIP_HTML | HTML_HARD_WRAP).
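- misaka.smartypants(text)
  Transforms sequences of characters into HTML entities.

  Markdown | HTML | Result |
  ---|---|---|
  's (s, t, m, d, re, ll, ve) | &rsquo;s | ’s |
  "Quotes" | &ldquo;Quotes&rdquo; | “Quotes” |
  --- | &mdash; | — |
  -- | &ndash; | – |
  ... | &hellip; | … |
  . . . | &hellip; | … |
  (c) | &copy; | © |
  (r) | &reg; | ® |
  (tm) | &trade; | ™ |
  3/4 | &frac34; | ¾ |
  1/2 | &frac12; | ½ |
  1/4 | &frac14; | ¼ |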
- misaka.escape_html(text, escape_slash=False)
  Binding for Hoedown’s HTML escaping function.
  The implementation is inspired by the OWASP XSS Prevention recommendations:
  & --> &amp;
  < --> &lt;
  > --> &gt;
  " --> &quot;
  ' --> &#x27;
  / --> &#x2F;     when escape_slash is set to True
  New in version 2.1.0.
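The escaping rules described for escape_html can be illustrated in pure Python. This is a sketch of the documented mapping, not Hoedown's C implementation; the name `escape_html_sketch` is hypothetical, and the exact entity spellings used for `'` and `/` are assumptions taken from the OWASP recommendations.

```python
# Characters that are always escaped (entity spellings are assumptions).
_BASE_ESCAPES = {
    ord('&'): '&amp;',
    ord('<'): '&lt;',
    ord('>'): '&gt;',
    ord('"'): '&quot;',
    ord("'"): '&#x27;',
}


def escape_html_sketch(text, escape_slash=False):
    """Pure-Python illustration of the escaping rules; not Misaka's code."""
    table = dict(_BASE_ESCAPES)
    if escape_slash:
        table[ord('/')] = '&#x2F;'
    # str.translate maps each input character once, so the '&' of a
    # freshly produced entity is never escaped a second time.
    return text.translate(table)


print(escape_html_sketch('<a href="/x">Tom & Jerry</a>'))
# &lt;a href=&quot;/x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```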
Classes
- class misaka.Markdown(renderer, extensions=0)
  Parses markdown text and renders it using the given renderer.
  extensions can be a list or tuple of extensions (e.g. ('fenced-code', 'footnotes', 'strikethrough')) or an integer (e.g. EXT_FENCED_CODE | EXT_FOOTNOTES | EXT_STRIKETHROUGH).
- class misaka.HtmlRenderer(flags=0, nesting_level=0)
  A wrapper for the HTML renderer that’s included in Hoedown.
  flags can be a list or tuple of flags (e.g. ('skip-html', 'hard-wrap')) or an integer (e.g. HTML_SKIP_HTML | HTML_HARD_WRAP).
  nesting_level limits what’s included in the table of contents. The default value is 0, no headers.
  An instance of the HtmlRenderer cannot be shared with multiple Markdown instances, because it carries state that’s changed by the Markdown instance.
- class misaka.SaferHtmlRenderer(flags=(), sanitization_mode='skip-html', nesting_level=0, link_rewrite=None, img_src_rewrite=None)
  A subclass of HtmlRenderer which adds protections against Cross-Site Scripting (XSS):
  - The 'skip-html' flag is turned on by default, preventing injection of HTML elements. If you want to escape HTML code instead of removing it entirely, change sanitization_mode to 'escape'.
  - The URLs of links and images are filtered to prevent JavaScript injection. This also blocks the rendering of email addresses into links. See the check_url() method below.
  - Optionally, the URLs can also be rewritten to counter other attacks such as phishing.
  Enabling URL rewriting requires extra arguments:
  Parameters:
  - link_rewrite – the URL of a redirect page, necessary to rewrite the href attributes of links
  - img_src_rewrite – the URL of an image proxy, necessary to rewrite the src attributes of images
  Both strings should include a {url} placeholder for the URL-encoded target. Examples:
  link_rewrite='https://example.com/redirect?url={url}', img_src_rewrite='https://img-proxy-domain/{url}'
  New in version 2.1.0.
  - autolink(raw_url, is_email)
    Filters links generated by the autolink extension.
  - check_url(url, is_image_src=False)
    This method is used to check a URL.
    Returns True if the URL is “safe”, False otherwise.
    The default implementation only allows HTTP and HTTPS links. That means no mailto:, no xmpp:, no ftp:, etc.
    This method exists specifically to allow easy customization of link filtering through subclassing, so don’t hesitate to write your own.
    If you’re thinking of implementing a blacklist approach, see “Which URL schemes are dangerous (XSS exploitable)?”.
  - image(raw_url, title='', alt='')
    Filters the src attribute of an image.
    Note that filtering the source URL of an <img> tag is only a very basic protection, and it’s mostly useless in modern browsers (they block JavaScript in there by default). An example of an attack that filtering does not thwart is phishing based on HTTP Auth, see this issue for details.
    To mitigate this issue you should only allow images from trusted services, for example your own image store, or a proxy (see rewrite_url()).
  - link(content, raw_url, title='')
    Filters links.
  - rewrite_url(url, is_image_src=False)
    This method is called to rewrite URLs.
    It uses either self.link_rewrite or self.img_src_rewrite depending on the value of is_image_src. The URL is returned unchanged if the corresponding attribute is None.
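The filtering and rewriting steps described for SaferHtmlRenderer can be sketched with the standard library alone. The functions below are standalone illustrations, not the class's actual methods: the names `is_safe_url` and `rewrite` are hypothetical, and percent-encoding every reserved character with `quote(url, safe='')` is an assumption about what "URL-encoded" means here.

```python
from urllib.parse import quote, urlsplit

# Allow-list of schemes, matching the documented HTTP/HTTPS-only default.
ALLOWED_SCHEMES = ('http', 'https')


def is_safe_url(url, allowed=ALLOWED_SCHEMES):
    """Accept only URLs whose scheme is explicitly permitted.

    Note that this sketch also rejects scheme-less (relative) URLs.
    """
    return urlsplit(url.strip()).scheme.lower() in allowed


def rewrite(url, template=None):
    """Insert the percent-encoded URL into a '{url}' template, if one is set."""
    if template is None:
        return url  # unchanged when no rewrite template is configured
    return template.format(url=quote(url, safe=''))


link_rewrite = 'https://example.com/redirect?url={url}'
for candidate in ('https://example.org/a?b=1', 'javascript:alert(1)'):
    if is_safe_url(candidate):
        print(rewrite(candidate, link_rewrite))
    else:
        print('blocked:', candidate)
```

Extending the allow-list (for example with 'mailto') is the kind of customization the check_url() description invites; an allow-list is generally easier to reason about than a blacklist of dangerous schemes.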
- class misaka.HtmlTocRenderer(nesting_level=6)
  A wrapper for the HTML table of contents renderer that’s included in Hoedown.
  nesting_level limits what’s included in the table of contents. The default value is 6, all headers.
  An instance of the HtmlTocRenderer cannot be shared with multiple Markdown instances, because it carries state that’s changed by the Markdown instance.
- class misaka.BaseRenderer
  - blockcode(text, lang='')
    lang contains the language when fenced code blocks are enabled and a language is defined in the code block.
  - blockquote(content)
  - header(content, level)
    level can be a number from 1 to 6.
  - hrule()
  - list(content, is_ordered, is_block)
  - listitem(content, is_ordered, is_block)
  - paragraph(content)
  - table(content)
    Depends on the tables extension.
  - table_header(content)
    Depends on the tables extension.
  - table_body(content)
    Depends on the tables extension.
  - table_row(content)
    Depends on the tables extension.
  - table_cell(content, align, is_header)
    Depends on the tables extension.
    align can be empty, center, left or right.
  - footnotes(content)
    Depends on the footnotes extension.
  - footnote_def(content, num)
    Depends on the footnotes extension.
  - footnote_ref(num)
    Depends on the footnotes extension.
  - blockhtml(text)
  - autolink(link, is_email)
    Depends on the autolink extension.
  - codespan(text)
  - double_emphasis(content)
  - emphasis(content)
  - underline(content)
    Depends on the underline extension.
  - highlight(content)
    Depends on the highlight extension.
  - quote(content)
    Depends on the quote extension.
  - image(link, title='', alt='')
  - linebreak()
  - link(content, link, title='')
  - triple_emphasis(content)
  - strikethrough(content)
    Depends on the strikethrough extension.
  - superscript(content)
    Depends on the superscript extension.
  - math(text, displaymode)
    Depends on the math extension.
    displaymode can be 0 or 1. This is how HtmlRenderer handles it:

        if displaymode == 1:
            return '\\[{}\\]'.format(text)
        else:  # displaymode == 0
            return '\\({}\\)'.format(text)

  - raw_html(text)
  - entity(text)
  - normal_text(text)
  - doc_header(inline_render)