Haskell userdata objects

Categories: Haskell, Lua, pandoc, Quarto

Author: Albert Krewinkel

Published: September 7, 2022

When extending pandoc (or Quarto) with Lua filters, we interact with so-called Lua userdata objects. These objects are used to wrap document AST elements, making them accessible from Lua scripts. They mostly behave like normal Lua tables. This post is intended as a quick overview, listing interesting properties of userdata objects.

Userdata objects

Haskell-generated userdata objects have three main components: a type name, properties, and methods.

Type name

The name can be retrieved with pandoc.utils.type. It should be treated as a read-only constant. Internally, the name is used as a type tag, which is important when retrieving an object from a Lua script back into the main program. It is possible to access the name through the debug interface, e.g.,

function typename (x)
  return debug.getmetatable(x).__name
end

The above typename function is actually a bit faster than pandoc.utils.type. It can improve performance when accessing this info in a tight loop, but its use is not recommended.
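For comparison, here is a small sketch of the recommended route via pandoc.utils.type. It assumes the code runs inside pandoc's Lua environment (for example in a Lua filter); the variable names are only illustrative.

```lua
-- Assumes pandoc's Lua environment, where the `pandoc` module is a global.
local utype = pandoc.utils.type

local para = pandoc.Para{pandoc.Str 'hello'}

print(utype(para))             -- Block
print(utype(para.content))     -- Inlines
print(utype(pandoc.Str 'x'))   -- Inline
print(utype('plain string'))   -- string
```

Note that the type name identifies the Haskell type (Block, Inline), not the constructor; the constructor name is available separately, e.g. through para.t or para.tag.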

Properties

Properties, e.g., the content fields of Para elements, are “lazy”: the property value is marshaled to Lua when the property is accessed for the first time. We are usually interested in no more than one or two element properties, so this is a big performance improvement for most scripts. Lazy properties are especially useful with large objects like Pandoc, which would otherwise take a long time to marshal and unmarshal with all their child elements.

However, this lazy marshaling is slower if all properties will be accessed anyway. If there are performance issues due to lazy properties, then please let me know, and I’ll try to find a fix.
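The idea can be modeled in plain Lua. The following is a conceptual sketch only, not pandoc's actual implementation: lazy_object, the getter functions, and marshal_count are hypothetical names, and the getters stand in for the expensive Haskell-to-Lua marshaling step.

```lua
-- Conceptual model of lazy properties in plain Lua (no pandoc required).
-- NOT how pandoc implements this; all names here are hypothetical.
local marshal_count = 0

local function lazy_object (getters)
  local cache = {}
  return setmetatable({}, {
    __index = function (_, key)
      local getter = getters[key]
      if getter == nil then return nil end
      if cache[key] == nil then
        -- The "expensive" marshaling happens only on first access.
        marshal_count = marshal_count + 1
        cache[key] = getter()
      end
      return cache[key]
    end
  })
end

-- A stand-in for a Div-like element with two properties.
local div = lazy_object {
  content = function () return {'Hello', 'world'} end,
  attr    = function () return {identifier = 'intro'} end,
}

print(marshal_count)   -- 0: nothing has been marshaled yet
print(#div.content)    -- 2: the first access marshals `content`
print(#div.content)    -- 2: the second access is served from the cache
print(marshal_count)   -- 1: `attr` was never touched
```

The model also shows the trade-off described above: each first access pays for a lookup and a cache check on top of the marshaling itself, which adds up when every property is read anyway.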

Methods

Methods are wrapped Haskell functions: when calling a method, the arguments are unmarshaled back into Haskell objects, which are then passed to the wrapped function and processed in Haskell. The computation’s result is then pushed back to Lua.

This may sound weirdly complicated. While it is slow for very simple functions (like pandoc.utils.type), it’s very fast and convenient for complex methods like Pandoc:walk: objects with lazy properties are fast to unmarshal, the main Haskell code is fast, and it frees us from having to re-implement Haskell algorithms in Lua.
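As an illustration, the filter sketch below delegates the whole document traversal to Pandoc:walk and only runs a tiny callback in Lua. The helper name count_strs is made up for this example; the traversal itself happens in Haskell.

```lua
-- Lua filter sketch; `count_strs` is a hypothetical helper name.
local function count_strs (doc)
  local n = 0
  doc:walk {
    -- Returning nil from a filter function leaves the element unchanged.
    Str = function (_) n = n + 1 end
  }
  return n
end

function Pandoc (doc)
  io.stderr:write('Str elements: ', count_strs(doc), '\n')
  return doc
end
```

Only the Str elements are handed to Lua here; everything else stays on the Haskell side.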

Iterating

The properties and methods are listed when iterating with pairs. The iteration order is defined at compile time, with properties listed first, followed by methods. We usually try to keep each of these lists sorted alphabetically, but there may be exceptions.

Calling pairs on a Haskell userdata object will always succeed, even if it has neither methods nor properties; the result will be the empty iterator in that case.
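A quick way to explore an object is to list what pairs reports. The sketch below assumes pandoc's Lua environment; describe is a hypothetical helper name, and the exact entries printed depend on the pandoc version.

```lua
-- Collect "name: type" lines for everything that pairs reports.
-- `describe` is a hypothetical helper, not part of pandoc's API.
local function describe (obj)
  local lines = {}
  for name, value in pairs(obj) do
    lines[#lines + 1] = name .. ': ' .. pandoc.utils.type(value)
  end
  return lines
end

-- As a filter: print the properties and methods of every Div to stderr.
function Div (div)
  io.stderr:write(table.concat(describe(div), '\n'), '\n')
end
```

Keep in mind that iterating this way touches every property, so it forces all of them to be marshaled; it is better suited to debugging than to production filters.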

Aliases

Some objects also have property aliases: e.g., div.classes is really just an alias for div.attr.classes. Both entries point to the same list object. Aliases are not included in the iterator generated by pairs. See the Internals section below for how to get hold of them.

List behavior

Userdata objects can be made to behave like lists, but iterating over those “lists” is comparatively slow. That’s why the only object that uses this feature is PANDOC_VERSION: for example, we can write PANDOC_VERSION[2] >= 19 to check just the major version.[1]
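Both styles of version check can be put side by side in a short sketch. It assumes pandoc's Lua environment, where PANDOC_VERSION is a global; the local variable names are illustrative.

```lua
-- Preferred: compare the version object as a whole.
local has_2_19 = PANDOC_VERSION >= '2.19'

-- Legacy list-style access to individual version components.
local first, second = PANDOC_VERSION[1], PANDOC_VERSION[2]

io.stderr:write(string.format(
  'pandoc %s (components %d, %d); at least 2.19: %s\n',
  tostring(PANDOC_VERSION), first, second, tostring(has_2_19)))
```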

Internals

Each type has a metatable, which defines its behavior. Most users should not need to access the metatable, so the getmetatable function returns true instead of the actual metatable when called on a Haskell userdata object. As we’ve seen with type names, it’s still possible to inspect the userdata metatable with the help of debug.getmetatable.

The metatable has four interesting fields: methods, aliases, getters, and setters. The fields contain just what you’d expect.

E.g., we can inspect the list of aliases defined for Inline objects with

local InlineMT = debug.getmetatable(pandoc.Str '')
for name, keys in pairs(InlineMT.aliases) do
  -- print the alias name and the alias value
  print(name, 'is an alias for', table.concat(keys, '.'))
end

Do not rely on the internal structure, and do not modify the metatable. Here be dragons.

Footnotes

  1. This feature exists only to ensure backwards compatibility. It is better to compare the version as a whole, as in PANDOC_VERSION >= '2.19'.