API

API stability

Everything described here is considered “public”: this is what you can rely on. We will try to maintain backward-compatibility, although there is no hard promise until version 1.0.

Anything else should not be used outside of WeasyPrint itself: we reserve the right to change or remove it at any point. Use it at your own risk, or pin a specific WeasyPrint version in your setup.py or requirements.txt file.

Command-line API

weasyprint.__main__.main(argv=sys.argv)

The weasyprint program takes at least two arguments:

weasyprint [options] <input> <output>

The input is a filename or URL to an HTML document, or - to read HTML from stdin. The output is a filename, or - to write to stdout.

Options can be mixed anywhere before, between, or after the input and output:

-e <input_encoding>, --encoding <input_encoding>

Force the input character encoding (e.g. -e utf8).

-f <output_format>, --format <output_format>

Choose the output file format among PDF and PNG (e.g. -f png). Required if the output is not a .pdf or .png filename.

-s <filename_or_URL>, --stylesheet <filename_or_URL>

Filename or URL of a user CSS stylesheet (see Stylesheet origins) to add to the document (e.g. -s print.css). Multiple stylesheets are allowed.

-r <dpi>, --resolution <dpi>

For PNG output only. Set the resolution in PNG pixels per CSS inch. Defaults to 96, which means that PNG pixels match CSS pixels.

--base-url <URL>

Set the base for relative URLs in the HTML input. Defaults to the input’s own URL, or the current directory for stdin.

-m <type>, --media-type <type>

Set the media type to use for @media. Defaults to print.

-a <file>, --attachment <file>

Add an attachment to the document. The attachment is included in the PDF output. This option can be used multiple times.

-p, --presentational-hints

Follow HTML presentational hints.

--version

Show the version number. Other options and arguments are ignored.

-h, --help

Show the command-line usage. Other options and arguments are ignored.
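When the program is driven from a script, the options above can be assembled into an argument list. The following is a minimal sketch, not part of WeasyPrint: the helper name build_weasyprint_command and the file names are hypothetical, and only options documented above are used.

```python
import shlex


def build_weasyprint_command(source, output, stylesheets=(), media_type=None,
                             base_url=None, output_format=None):
    """Build an argument list for the weasyprint program (hypothetical helper)."""
    command = ["weasyprint"]
    for stylesheet in stylesheets:
        # -s may be repeated, once per user stylesheet.
        command += ["-s", stylesheet]
    if media_type is not None:
        command += ["-m", media_type]
    if base_url is not None:
        command += ["--base-url", base_url]
    if output_format is not None:
        # Needed when the output name does not end in .pdf or .png,
        # for example when writing to stdout with "-".
        command += ["-f", output_format]
    # Options may appear anywhere; this sketch puts them first.
    command += [source, output]
    return command


command = build_weasyprint_command(
    "report.html", "-", stylesheets=["print.css"], output_format="pdf")
print(shlex.join(command))  # weasyprint -s print.css -f pdf report.html -
```

The resulting list can be passed to subprocess.run(command, check=True) on a system where the weasyprint program is installed.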

Python API

class weasyprint.HTML(input, **kwargs)

Represents an HTML document parsed by html5lib.

You can just create an instance with a positional argument: doc = HTML(something). The class will try to guess whether the input is a filename, an absolute URL, or a file-like object.

Alternatively, use one named argument so that no guessing is involved:

Parameters:
  • filename – A filename, relative to the current directory, or absolute.
  • url – An absolute, fully qualified URL.
  • file_obj – A file-like: any object with a read() method.
  • string – A string of HTML source. (This argument must be named.)

Specifying multiple inputs is an error: HTML(filename="foo.html", url="localhost://bar.html") will raise a TypeError.

You can also pass optional named arguments:

Parameters:
  • encoding – Force the source character encoding.
  • base_url – The base used to resolve relative URLs (e.g. in <img src="../foo.png">). If not provided, try to use the input filename, URL, or name attribute of file-like objects.
  • url_fetcher – A function or other callable with the same signature as default_url_fetcher() called to fetch external resources such as stylesheets and images. (See URL fetchers.)
  • media_type – The media type to use for @media. Defaults to 'print'.

Note: In some cases, such as HTML(string=foo), relative URLs will be invalid if base_url is not provided.

render(stylesheets=None, enable_hinting=False, presentational_hints=False)

Lay out and paginate the document, but do not (yet) export it to PDF or another format.

This returns a Document object which provides access to individual pages and various meta-data. See write_pdf() to get a PDF directly.

New in version 0.15.

Parameters:
  • stylesheets – An optional list of user stylesheets. List elements are CSS objects, filenames, URLs, or file-like objects. (See Stylesheet origins.)
  • enable_hinting (bool) – Whether text, borders and background should be hinted to fall at device pixel boundaries. Should be enabled for pixel-based output (like PNG) but not for vector-based output (like PDF).
  • presentational_hints (bool) – Whether HTML presentational hints are followed.
Returns:

A Document object.

write_pdf(target=None, stylesheets=None, zoom=1, attachments=None, presentational_hints=False)

Render the document to a PDF file.

This is a shortcut for calling render(), then Document.write_pdf().

Parameters:
  • target – A filename, file-like object, or None.
  • stylesheets – An optional list of user stylesheets. The list’s elements are CSS objects, filenames, URLs, or file-like objects. (See Stylesheet origins.)
  • zoom (float) – The zoom factor in PDF units per CSS units. Warning: All CSS units are affected, including physical units like cm and named sizes like A4. For values other than 1, the physical CSS units will thus be “wrong”.
  • attachments – A list of additional file attachments for the generated PDF document or None. The list’s elements are Attachment objects, filenames, URLs or file-like objects.
  • presentational_hints (bool) – Whether HTML presentational hints are followed.
Returns:

The PDF as byte string if target is not provided or None, otherwise None (the PDF is written to target).
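The effect of zoom on physical sizes can be checked with simple arithmetic. This is a sketch of the scaling described above, not WeasyPrint code: the function name printed_size_mm is hypothetical, and the only assumption is that every CSS length is multiplied by the zoom factor.

```python
def printed_size_mm(css_width_mm, css_height_mm, zoom=1):
    """Physical size in the PDF of a box whose CSS size is given in mm."""
    return css_width_mm * zoom, css_height_mm * zoom


A4_MM = (210, 297)
print(printed_size_mm(*A4_MM))            # (210, 297): sizes are exact at zoom=1
print(printed_size_mm(*A4_MM, zoom=0.5))  # (105.0, 148.5): no longer A4 on paper
```

A page declared as A4 in CSS therefore only measures 210 × 297 mm on paper when zoom is 1.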

write_png(target=None, stylesheets=None, resolution=96, presentational_hints=False)

Paint the pages vertically to a single PNG image.

There is no decoration around pages other than those specified in CSS with @page rules. The final image is as wide as the widest page. Each page is below the previous one, centered horizontally.

This is a shortcut for calling render(), then Document.write_png().

Parameters:
  • target – A filename, file-like object, or None.
  • stylesheets – An optional list of user stylesheets. The list’s elements are CSS objects, filenames, URLs, or file-like objects. (See Stylesheet origins.)
  • resolution (float) – The output resolution in PNG pixels per CSS inch. At 96 dpi (the default), PNG pixels match the CSS px unit.
  • presentational_hints (bool) – Whether HTML presentational hints are followed.
Returns:

The image as byte string if target is not provided or None, otherwise None (the image is written to target).
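The relation between resolution and image size follows from the CSS definition of 96 px per inch. A small sketch of that conversion, with a hypothetical helper name; an actual image has a whole number of pixels, so expect rounding in real output:

```python
CSS_PX_PER_INCH = 96  # fixed by CSS: 1in == 96px


def png_pixels(css_px, resolution=96):
    """Convert a length in CSS pixels to PNG pixels at the given resolution."""
    return css_px * resolution / CSS_PX_PER_INCH


print(png_pixels(96))        # 96.0: one CSS inch at the default resolution
print(png_pixels(96, 300))   # 300.0: one CSS inch at 300 dpi
print(png_pixels(100, 192))  # 200.0: doubling the resolution doubles the size
```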

class weasyprint.CSS(input, **kwargs)

Represents a CSS stylesheet parsed by tinycss2.

An instance is created in the same way as HTML, except that the tree argument is not available. All other arguments are the same.

An additional argument called font_config must be provided to handle @font-face rules. The same fonts.FontConfiguration object must be used for different CSS objects applied to the same document.

CSS objects have no public attribute or method. They are only meant to be used in the write_pdf(), write_png() and render() methods of HTML objects.

weasyprint.default_url_fetcher(url)

Fetch an external resource such as an image or stylesheet.

Another callable with the same signature can be given as the url_fetcher argument to HTML or CSS. (See URL fetchers.)

Parameters: url (Unicode string) – The URL of the resource to fetch.
Raises: An exception indicating failure, e.g. ValueError on syntactically invalid URL.
Returns: A dict with the following keys:
  • One of string (a byte string) or file_obj (a file-like object)
  • Optionally: mime_type, a MIME type extracted e.g. from a Content-Type header. If not provided, the type is guessed from the file extension in the URL.
  • Optionally: encoding, a character encoding extracted e.g. from a charset parameter in a Content-Type header
  • Optionally: redirected_url, the actual URL of the resource if there were e.g. HTTP redirects.
  • Optionally: filename, the filename of the resource. Usually derived from the filename parameter in a Content-Disposition header

If a file_obj key is given, it is the caller’s responsibility to call file_obj.close().
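A custom fetcher only has to accept a URL and return a dict of the shape described above. The sketch below is one possible wrapper, under stated assumptions: the "asset:" URL prefix, the ASSETS table, and the helper names are all hypothetical, and the fallback would typically be weasyprint.default_url_fetcher.

```python
# Hypothetical in-memory resources: name -> (MIME type, content as bytes).
ASSETS = {
    "style.css": ("text/css", b"body { color: red }"),
}


def make_url_fetcher(fallback, prefix="asset:"):
    """Return a fetcher that serves prefixed URLs from ASSETS.

    Any other URL is handed to fallback, a callable with the same
    signature as default_url_fetcher().
    """
    def fetch(url):
        if url.startswith(prefix):
            name = url[len(prefix):]
            if name not in ASSETS:
                raise ValueError("unknown asset: %s" % url)
            mime_type, data = ASSETS[name]
            # "string" holds a byte string; "mime_type" is optional.
            return {"string": data, "mime_type": mime_type}
        return fallback(url)
    return fetch


def refuse(url):
    """Stand-in fallback that blocks every other URL."""
    raise ValueError("fetching is disabled: %s" % url)


fetch = make_url_fetcher(refuse)
result = fetch("asset:style.css")
print(result["mime_type"])  # text/css
```

With WeasyPrint installed, the returned callable would be passed as the url_fetcher argument, for example HTML(string=source, url_fetcher=make_url_fetcher(default_url_fetcher)).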

class weasyprint.document.Document(pages, metadata, url_fetcher)

A rendered document, with access to individual pages ready to be painted on any cairo surface.

Typically obtained from HTML.render(), but can also be instantiated directly with a list of pages, a set of metadata, and a url_fetcher.

pages = None

A list of Page objects.

metadata = None

A DocumentMetadata object. Contains information that does not belong to a specific page but to the whole document.

url_fetcher = None

A url_fetcher for resources that have to be read when writing the output.

copy(pages='all')

Take a subset of the pages.

Parameters: pages – An iterable of Page objects from pages.
Returns: A new Document object.

Examples:

Write two PDF files for odd-numbered and even-numbered pages:

# Python lists count from 0 but pages are numbered from 1.
# [::2] is a slice of even list indexes but odd-numbered pages.
document.copy(document.pages[::2]).write_pdf('odd_pages.pdf')
document.copy(document.pages[1::2]).write_pdf('even_pages.pdf')

Write each page to a numbered PNG file:

# copy() expects an iterable of pages, so wrap the single page in a list.
for i, page in enumerate(document.pages):
    document.copy([page]).write_png('page_%s.png' % i)

Combine multiple documents into one PDF file, using metadata from the first:

all_pages = [p for doc in documents for p in doc.pages]
documents[0].copy(all_pages).write_pdf('combined.pdf')

resolve_links()

Resolve internal hyperlinks.

Links to a missing anchor are removed with a warning. If multiple anchors have the same name, the first is used.

Returns: A generator yielding lists (one per page) like Page.links, except that target for internal hyperlinks is (page_number, x, y) instead of an anchor name. The page number is a 0-based index into the pages list, and x, y are in CSS pixels from the top-left of the page.

make_bookmark_tree()

Make a tree of all bookmarks in the document.

Returns: A list of bookmark subtrees. A subtree is (label, target, children). label is a string, target is (page_number, x, y) like in resolve_links(), and children is a list of child subtrees.

write_pdf(target=None, zoom=1, attachments=None)

Paint the pages in a PDF file, with meta-data.

PDF files written directly by cairo do not have meta-data such as bookmarks/outlines and hyperlinks.

Parameters:
  • target – A filename, file-like object, or None.
  • zoom (float) – The zoom factor in PDF units per CSS units. Warning: All CSS units are affected, including physical units like cm and named sizes like A4. For values other than 1, the physical CSS units will thus be “wrong”.
  • attachments – A list of additional file attachments for the generated PDF document or None. The list’s elements are Attachment objects, filenames, URLs, or file-like objects.
Returns:

The PDF as byte string if target is None, otherwise None (the PDF is written to target).

write_png(target=None, resolution=96)

Paint the pages vertically to a single PNG image.

There is no decoration around pages other than those specified in CSS with @page rules. The final image is as wide as the widest page. Each page is below the previous one, centered horizontally.

Parameters:
  • target – A filename, file-like object, or None.
  • resolution (float) – The output resolution in PNG pixels per CSS inch. At 96 dpi (the default), PNG pixels match the CSS px unit.
Returns:

A (png_bytes, png_width, png_height) tuple. png_bytes is a byte string if target is None, otherwise None (the image is written to target). png_width and png_height are the size of the final image, in PNG pixels.
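The stacking layout described for write_png() amounts to a little arithmetic. The following sketch computes where each page would land; it is an illustration of the stated rules (widest page sets the width, pages are stacked top to bottom and centered), not WeasyPrint's own code, and the function name stack_pages is hypothetical.

```python
def stack_pages(page_sizes):
    """Lay out pages vertically, centered horizontally.

    page_sizes is a list of (width, height) pairs in PNG pixels.
    Returns (image_width, image_height, positions), where positions
    holds one (left_x, top_y) pair per page.
    """
    image_width = max(width for width, _ in page_sizes)
    positions = []
    top_y = 0
    for width, height in page_sizes:
        # Center narrower pages inside the widest one.
        positions.append(((image_width - width) / 2, top_y))
        top_y += height
    return image_width, top_y, positions


width, height, positions = stack_pages([(800, 1000), (600, 400)])
print(width, height)  # 800 1400
print(positions)      # [(0.0, 0), (100.0, 1000)]
```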

class weasyprint.document.DocumentMetadata(title=None, authors=None, description=None, keywords=None, generator=None, created=None, modified=None, attachments=None)

Contains meta-information about a Document that belongs to the whole document rather than specific pages.

New attributes may be added in future versions of WeasyPrint.

title = None

The title of the document, as a string or None. Extracted from the <title> element in HTML and written to the /Title info field in PDF.

authors = None

The authors of the document as a list of strings. Extracted from the <meta name=author> elements in HTML and written to the /Author info field in PDF.

description = None

The description of the document, as a string or None. Extracted from the <meta name=description> element in HTML and written to the /Subject info field in PDF.

keywords = None

Keywords associated with the document, as a list of strings. (Defaults to the empty list.) Extracted from <meta name=keywords> elements in HTML and written to the /Keywords info field in PDF.

generator = None

The name of one of the software packages used to generate the document, as a string or None. Extracted from the <meta name=generator> element in HTML and written to the /Creator info field in PDF.

created = None

The creation date of the document, as a string or None. Dates are in one of the six formats specified in W3C’s profile of ISO 8601. Extracted from the <meta name=dcterms.created> element in HTML and written to the /CreationDate info field in PDF.

modified = None

The modification date of the document, as a string or None. Dates are in one of the six formats specified in W3C’s profile of ISO 8601. Extracted from the <meta name=dcterms.modified> element in HTML and written to the /ModDate info field in PDF.
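The six formats in the W3C profile range from a bare year to a full timestamp with fractional seconds. As a rough sketch, a value for the dcterms.created or dcterms.modified element can be produced with the standard library; the helper name w3c_datetime is hypothetical, and the seconds-precision UTC form is just one of the allowed choices.

```python
from datetime import datetime, timezone


def w3c_datetime(moment):
    """Format an aware datetime as YYYY-MM-DDThh:mm:ssZ (UTC)."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


created = w3c_datetime(datetime(2014, 3, 9, 14, 5, 0, tzinfo=timezone.utc))
print(created)  # 2014-03-09T14:05:00Z
```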

attachments = None

File attachments as a list of tuples of URL and a description or None. Extracted from the <link rel=attachment> elements in HTML and written to the /EmbeddedFiles dictionary in PDF.

class weasyprint.document.Page

Represents a single rendered page.

New in version 0.15.

Should be obtained from Document.pages but not instantiated directly.

width = None

The page width, including margins, in CSS pixels.

height = None

The page height, including margins, in CSS pixels.

bookmarks = None

A list of (bookmark_level, bookmark_label, target) tuples. bookmark_level and bookmark_label are respectively an integer and a Unicode string, based on the CSS properties of the same names. target is an (x, y) point in CSS pixels from the top-left of the page.
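To see how these flat per-page lists relate to the nested (label, target, children) subtrees returned by Document.make_bookmark_tree(), here is a simplified sketch using plain tuples in place of Page objects. It assumes levels start at 1; the function name is hypothetical, and WeasyPrint's own handling of edge cases such as skipped levels may differ.

```python
def build_bookmark_tree(pages_bookmarks):
    """Nest bookmarks by level.

    pages_bookmarks holds one list per page of (level, label, (x, y))
    tuples. Returns a list of (label, (page_number, x, y), children).
    """
    root = []
    # Stack of (level, children_list) for the currently open ancestors.
    # The sentinel level 0 is never popped because real levels are >= 1.
    stack = [(0, root)]
    for page_number, bookmarks in enumerate(pages_bookmarks):
        for level, label, (x, y) in bookmarks:
            # Close every open bookmark that is not shallower than this one.
            while stack[-1][0] >= level:
                stack.pop()
            children = []
            stack[-1][1].append((label, (page_number, x, y), children))
            stack.append((level, children))
    return root


tree = build_bookmark_tree([
    [(1, "Intro", (0, 10)), (2, "Scope", (0, 200))],  # page index 0
    [(1, "Usage", (0, 10))],                          # page index 1
])
print(tree)
```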

links = None

A list of (link_type, target, rectangle) tuples. A rectangle is (x, y, width, height), in CSS pixels from the top-left of the page. link_type is one of three strings:

  • 'external': target is an absolute URL.
  • 'internal': target is an anchor name (see Page.anchors).
  • 'attachment': target is an absolute URL and points to a resource to attach to the document.

anchors = None

A dict mapping each anchor name to its target, an (x, y) point in CSS pixels from the top-left of the page.
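The links and anchors of each page are what Document.resolve_links() combines. The sketch below imitates that behavior with plain tuples and dicts instead of Page objects: the first anchor with a given name wins, and internal links to a missing anchor are dropped. It is an illustration under those assumptions, with a hypothetical function name, not the library's implementation.

```python
def resolve_internal_links(pages):
    """Yield one resolved link list per page.

    pages is a list of (links, anchors) pairs shaped like Page.links
    and Page.anchors.
    """
    # First pass: locate every anchor; the first occurrence wins.
    anchor_targets = {}
    for page_number, (_, anchors) in enumerate(pages):
        for name, (x, y) in anchors.items():
            anchor_targets.setdefault(name, (page_number, x, y))
    # Second pass: rewrite internal targets, dropping dangling links.
    for links, _ in pages:
        resolved = []
        for link_type, target, rectangle in links:
            if link_type == "internal":
                if target not in anchor_targets:
                    continue  # the real method also emits a warning
                target = anchor_targets[target]
            resolved.append((link_type, target, rectangle))
        yield resolved


first_page = (
    [("internal", "end", (10, 10, 50, 12)),
     ("internal", "missing", (10, 30, 50, 12)),
     ("external", "http://example.com/", (10, 50, 50, 12))],
    {},
)
second_page = ([], {"end": (0, 700)})
resolved_pages = list(resolve_internal_links([first_page, second_page]))
print(resolved_pages[0][0])  # ('internal', (1, 0, 700), (10, 10, 50, 12))
```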

paint(cairo_context, left_x=0, top_y=0, scale=1, clip=False)

Paint the page in cairo, on any type of surface.

Parameters:
  • cairo_context – Any cairocffi.Context object.
  • left_x (float) – X coordinate of the left of the page, in cairo user units.
  • top_y (float) – Y coordinate of the top of the page, in cairo user units.
  • scale (float) – Zoom scale in cairo user units per CSS pixel.
  • clip (bool) – Whether to clip/cut content outside the page. If false or not provided, content can overflow.