Haskell userdata objects
When extending pandoc (or Quarto) with Lua filters, we interact with so-called Lua userdata objects. These objects are used to wrap document AST elements, making them accessible from Lua scripts. They mostly behave like normal Lua tables. This post is intended as a quick overview, listing interesting properties of userdata objects.
Userdata objects
Haskell-generated userdata objects have three main components: a type name, properties, and methods.
Type name
The name can be retrieved with pandoc.utils.type. It should be treated as a read-only constant. Internally, the name is used as a type tag, which is important when retrieving an object from a Lua script back into the main program. It is possible to access the name through the debug interface, e.g.,
function typename (x)
  return debug.getmetatable(x).__name
end
The above typename function is actually a bit faster than pandoc.utils.type. It can improve performance when accessing this info in a tight loop, but its use is not recommended.
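As a rough illustration, the two lookups can be compared side by side. This is only a sketch and assumes it runs inside pandoc's Lua environment (for example in a filter), where the `pandoc` module is available. It also shows one reason to prefer the documented function: the shortcut raises an error for values that have no metatable.

```lua
-- Sketch: compare the documented lookup with the metatable shortcut.
local function typename (x)
  return debug.getmetatable(x).__name
end

local str = pandoc.Str 'hello'
print(pandoc.utils.type(str))  -- Inline
print(typename(str))           -- Inline

-- pandoc.utils.type falls back to Lua's normal `type` for plain values,
-- while the shortcut fails because numbers have no metatable.
print(pandoc.utils.type(42))   -- number
print(pcall(typename, 42))     -- false, followed by an error message
```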
Properties
Properties, e.g., the content field of Para elements, are “lazy”: the property value is marshaled to Lua when the property is accessed for the first time. We are usually interested in no more than one or two element properties, so this is a big performance improvement for most scripts. Lazy properties are especially useful with large objects like Pandoc, which would otherwise take a long time to marshal and unmarshal with all their child elements.
However, this lazy marshaling is slower if all properties will be accessed anyway. If there are performance issues due to lazy properties, then please let me know, and I’ll try to find a fix.
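A small sketch of what this means in practice, assuming pandoc's Lua environment; the helper name `is_empty_para` is hypothetical and only serves the illustration. The code touches a single property, so only that property has to cross the Haskell/Lua boundary.

```lua
-- Sketch: only the accessed property is marshaled to Lua.
local para = pandoc.Para{pandoc.Str 'Hello', pandoc.Space(), pandoc.Str 'world'}

-- First access of `content`: the list of inlines is pushed to Lua now.
local content = para.content
print(#content)  -- 3

-- Hypothetical helper that inspects nothing but `content`.
local function is_empty_para (p)
  return #p.content == 0
end
print(is_empty_para(para))  -- false
```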
Methods
Methods are wrapped Haskell functions: when calling a method, the arguments are unmarshaled back into Haskell objects, which are then passed to the wrapped function and processed in Haskell. The computation’s result is then pushed back to Lua.
This may sound weirdly complicated. While it is slow for very simple functions (like pandoc.utils.type), it’s very fast and convenient for complex methods like Pandoc:walk: objects with lazy properties are fast to unmarshal, the main Haskell code is fast, and it frees us from having to re-implement Haskell algorithms in Lua.
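For illustration, here is a minimal sketch of calling such a method, assuming a pandoc version that provides Pandoc:walk. The tiny document and the uppercasing filter are made up for the example; the traversal itself happens on the Haskell side.

```lua
-- Sketch: let the wrapped Haskell traversal do the work.
local doc = pandoc.Pandoc{pandoc.Para{pandoc.Str 'hello'}}

-- `walk` takes a filter table and returns a modified copy of the document.
local shouting = doc:walk{
  Str = function (str)
    return pandoc.Str(str.text:upper())
  end
}

print(shouting.blocks[1].content[1].text)  -- HELLO
print(doc.blocks[1].content[1].text)       -- hello (original is untouched)
```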
Iterating
The properties and methods are listed when iterating with pairs. The iteration order is defined at compile time, with properties listed first, followed by methods. We usually try to keep each of these lists sorted alphabetically, but there may be exceptions.
Calling pairs on a Haskell userdata object will always succeed, even if it has neither methods nor properties; the result will be the empty iterator in that case.
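The following sketch collects the names that pairs reports for a Str element, assuming pandoc's Lua environment. The exact names and their order depend on the pandoc version, so no output is shown here.

```lua
-- Sketch: list the property and method names of a userdata object.
local str = pandoc.Str 'hello'
local names = {}
for name, _ in pairs(str) do
  names[#names + 1] = tostring(name)
end

-- Properties come first, then methods; the result varies between versions.
print(table.concat(names, ', '))
```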
Aliases
Some objects also have property aliases: e.g., div.classes is really just an alias for div.attr.classes. Both entries point to the same list object. Aliases are not included in the iterator generated by pairs. See the Internals section below on how to get hold of them.
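A short sketch of the alias in action, assuming pandoc's Lua environment; the identifier and class name are made up for the example. The final loop checks whether pairs reports the alias; per the description above it should not, but no output is claimed for that line.

```lua
-- Sketch: the alias and the full path give access to the same classes.
local div = pandoc.Div({}, pandoc.Attr('note-1', {'note'}))
print(div.classes[1])       -- note
print(div.attr.classes[1])  -- note

-- Check whether the alias shows up when iterating.
local listed = false
for name in pairs(div) do
  if name == 'classes' then listed = true end
end
print('classes listed by pairs:', listed)
```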
List behavior
Userdata objects can be made to behave like lists, but iterating over those “lists” is comparatively slow. That’s why the only object that uses this feature is PANDOC_VERSION: for example, we can write PANDOC_VERSION[2] >= 19 to check just the major version.[1]
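As a sketch, both styles of version check look like this when run inside pandoc's Lua environment. The threshold 2.19 is taken from the example above; the printed messages are placeholders.

```lua
-- Sketch: list-style access to the version components.
print(PANDOC_VERSION[1], PANDOC_VERSION[2])

-- Comparing the whole version object against a version string.
if PANDOC_VERSION >= '2.19' then
  print('running pandoc 2.19 or later')
else
  print('running an older pandoc')
end
```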
Internals
Each type has a metatable, which defines its behavior. Most users should not need to access the metatable, so the getmetatable function returns true instead of the actual metatable when called on a Haskell userdata object. As we’ve seen with type names, it’s still possible to inspect the userdata metatable with the help of debug.getmetatable.
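A quick sketch of the difference between the two functions, assuming pandoc's Lua environment. The value printed by the first call is the one described above and may differ in other pandoc versions.

```lua
-- Sketch: the regular function hides the metatable, the debug one does not.
local str = pandoc.Str 'hello'
print(getmetatable(str))  -- `true`, as described above

local mt = debug.getmetatable(str)
print(type(mt))   -- table
print(mt.__name)  -- Inline
```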
The metatable has four interesting fields: methods, aliases, getters, and setters. The fields contain just what you’d expect.
E.g., we can inspect the list of aliases defined for Inline objects with
local InlineMT = debug.getmetatable(pandoc.Str '')
for name, keys in pairs(InlineMT.aliases) do
  -- print the alias name and the path it stands for
  print(name, 'is an alias for', table.concat(keys, '.'))
end
Do not rely on the internal structure, and do not modify the metatable. Here be dragons.
Footnotes
[1] This feature exists only to ensure backwards compatibility. It is better to do comparisons like PANDOC_VERSION >= '2.19' instead.