Post linebreak filter

From LuaTeXWiki


The post_linebreak_filter callback is called after a paragraph has been built, i.e. after the linebreak_filter callback or (if the latter isn't specified) TeX's default paragraph builder. The callback receives two arguments: the first is the head of the vertical list returned by the previous operation, the second is a string specifying in which context the paragraph is being built: an empty string means the main vertical list (a simple paragraph on the page), and other values include vbox, vtop...

The return value should be one of the following three possibilities: true signals that everything is OK and that the list passed to the callback should be processed further; false signals that the list should be flushed from memory, so that the paragraph simply disappears; finally, if a node list is returned, it is processed instead of the original one. Hence the syntax:

function (<node> head, <string> context)
  return true | false | <node> newhead
end

The callback doesn't replace any internal code and isn't expected to do anything meaningful. Its main use is to manipulate the paragraph once built; the list received as the first argument is made mostly of horizontal lists (the lines of text) interspersed with glues (to adjust baseline distances); any kind of postprocessing can be applied to those nodes (not to mention other types of nodes, if any: \vadjusted material, inserts...).
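As an illustration of the two arguments and of the return values, here is a minimal sketch. The function name count_lines and the log message are made up for this example; node.traverse_id(), node.id() and texio.write_nl() are the standard LuaTeX functions.

```lua
-- Hypothetical callback: log the number of lines of each paragraph
-- built on the main vertical list.
local function count_lines(head, context)
  -- Leave paragraphs built inside \vbox, \vtop, etc. untouched.
  if context ~= "" then
    return true
  end
  local n = 0
  for _ in node.traverse_id(node.id("hlist"), head) do
    n = n + 1
  end
  texio.write_nl("log", "paragraph broken into " .. n .. " lines")
  -- The list was not modified, so this has the same effect as true.
  return head
end
```

The function would be activated with callback.register("post_linebreak_filter", count_lines).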


Between raggedright and justified

In section 5.9.6 of TeX by Topic, Victor Eijkhout gives code so that `lines that would stretch beyond certain limits are set with their glue at natural width' (producing a paragraph between raggedright and justified, a style popular in advertisement at the time TeX by Topic was published).

That is exactly the kind of job the post_linebreak_filter callback is designed for (the smaller the \hsize, the more the result will stand out):

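local function check_lines (head)
  -- Visit every line (hlist node) of the paragraph.
  for line in node.traverse_id(node.id("hlist"), head) do
    -- Finite glue, stretched, and stretched by more than the threshold.
    if line.glue_order == 0 and line.glue_sign == 1 and line.glue_set > .4 then
      line.list = node.hpack(line.list)
    end
  end
  return head
end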

callback.register("post_linebreak_filter", check_lines)

All the lines in a paragraph are inspected, thanks to the node.traverse_id() iterator, which loops over all the nodes with a given id in a list of nodes. For each line, three conditions are checked. First, the line has been justified with finite glue (it makes little sense to reset lines justified with infinite glue), i.e. the glue_order field of the line is 0 (larger values mean higher orders of infinity). Second, the glues have been stretched, not shrunk, i.e. glue_sign is 1, not 2. Finally, the glue ratio, i.e. the amount of glue used, recorded in glue_set, is above an arbitrary threshold: lines whose glues aren't stretched so much are left untouched.
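The three conditions can be isolated in a small predicate, which makes the threshold easy to change. This is only a sketch: the name is_loose and the optional threshold parameter are assumptions of this example, and the function reads nothing but the glue_order, glue_sign and glue_set fields of whatever it is given.

```lua
-- Hypothetical helper: true if a line was stretched with finite glue
-- by more than `threshold` (default 0.4, the value used above).
local function is_loose(line, threshold)
  threshold = threshold or 0.4
  return line.glue_order == 0    -- finite glue only
     and line.glue_sign == 1     -- stretched, not shrunk
     and line.glue_set > threshold
end
```

Since the predicate only reads fields, it can be tried out on plain Lua tables as well as on hlist nodes, e.g. is_loose({ glue_order = 0, glue_sign = 1, glue_set = 0.5 }).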

If the conditions are satisfied, we simply reassign as the contents of the horizontal list those very contents processed with node.hpack(); the latter function turns a list of nodes into material suitable for a horizontal list, somewhat like \hbox. Here, since no extra argument is given to signal that the horizontal material should be set to a certain width (as with the keyword to in \hbox), glues are neither stretched nor shrunk and the material is set to its natural width, as wanted.
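For comparison, the sketch below shows both ways of calling node.hpack(): without extra arguments (natural width) and with a width in scaled points plus the string "exactly", which corresponds to \hbox to. The helper name repack and its width parameter are made up for this example.

```lua
-- Hypothetical helper: repack the contents of a line, either at
-- natural width or to a given width (in scaled points).
local function repack(line, width)
  if width then
    -- Like \hbox to <width>{...}: glue is stretched or shrunk.
    line.list = node.hpack(line.list, width, "exactly")
  else
    -- Like \hbox{...}: glue keeps its natural width.
    line.list = node.hpack(line.list)
  end
  return line
end
```

Calling repack(line) does what check_lines does for a single line, while repack(line, line.width) would pack the material to the full width of the line again.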

At the end, the head of the list is returned; returning true would have the same result, since LuaTeX would then process the same list that was received as an argument to the callback; that the list has been modified is immaterial here.

Margin notes

The callback can also be used to add material to the lines of text. Margin notes are such material: they are related to something in the paragraph, but as long as the paragraph isn't built, one doesn't know where they should be placed. With post_linebreak_filter, since the paragraph can be analysed, the limitation vanishes: one can spot the lines to which a note should be appended (their contents will be marked with attributes), and add that note.

Note that the following code is meant to illustrate the use of the post_linebreak_filter and of attributes, and isn't optimal for marginal notes themselves. They would be better dealt with in the output routine: there they can be moved up or down if necessary (so that they don't bleed into the bottom margin, for instance) and placed in the proper margin (left or right, depending on the page). Here all notes will be placed in the right margin.

The code works along the following lines: the \marginnote command takes two arguments, the first being the text in the paragraph to which the note relates, the second the note itself. The text in the paragraph is marked with an attribute, and can be identified later in the callback, each mark being a new value for the attribute. The note itself is built in a local box and assigned to a PDF Form XObject; the latter move is for practical reasons only and irrelevant to the code here (working with boxes all the way down would yield the same result). After the paragraph is built, we inspect each line of text to check whether it contains material marked with the attribute; if so, the value of that attribute points to a given Form XObject, which is appended to the line.

Here's the code for the \marginnote command:

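% Several lines of this listing are missing from the source. The \def, \setbox,
% \pdfxform, \directlua and \attribute lines are assumptions, reconstructed
% from the description that follows.
\def\marginnote#1#2{%
  \setbox0=\vtop{%
    \hsize10em \leftskip1em
    \rightskip0pt plus 1fill
    \noindent #2}%
  \count0=\wd0 \count2=\ht0 \count4=\dp0
  \pdfxform0
  \directlua{%
    local xform  = node.new(node.id("whatsit"), node.subtype("pdf_refxform"))
    xform.objnum = \the\pdflastxform;
    xform.width  = \the\count0;
    xform.height = \the\count2;
    xform.depth  = \the\count4;
    xforms[\the\pdflastxform] = xform
  }%
  {\attribute1=\pdflastxform #1}%
}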

The first part typesets the note in a box (a \vtop, so the first line of the note will be aligned with the text it refers to in the paragraph), which is assigned to an XObject thanks to \pdfxform. The box itself has reduced \hsize, is typeset raggedright, and has a non-zero \leftskip meant to leave a gap between its left margin and the right margin of the paragraph. Other changes (fonts, baseline distances) would be suitable here too.

The second part of the code creates a node using that XObject: that is the kind of node \pdfrefxform\pdflastxform would have created, except that we have to set the dimensions by hand, which is why the dimensions of the box were recorded (the box is emptied when assigned to \pdfxform). The node is then stored in a table (created once and for all in the code that follows).

The third part simply releases the first argument, to be typeset as part of the paragraph, but marked with a special value of attribute 1 (any attribute would do, of course, and actually one should have something like \newattribute to allocate attributes properly, see attributes).

The following code defines the function that handles the lines in post_linebreak_filter (and creates the xforms table beforehand):

xforms = {}

local function find_notes (head)
  for line in node.traverse_id(node.id("hlist"), head) do
    for item in node.traverse(line.list) do
      local attr = node.has_attribute(item, 1)
      if attr and xforms[attr] then
        -- Append the note at the very end of the line, then forget it.
        node.insert_after(line.list, node.tail(line.list), xforms[attr])
        xforms[attr] = nil
        break
      end
    end
  end
  return head
end

callback.register("post_linebreak_filter", find_notes)

In the node list representing the constructed paragraph, it inspects all lines (i.e. nodes of type hlist). It loops over the material of those lines to check whether some material is marked with attribute 1. If there is, and if the xforms table contains an XObject at the index with that value (a condition to be explained presently), the XObject is simply added at the end of the line's material with node.insert_after(). The function node.tail() returns the last node of a list, hence xform is added at the very end.

Then the entry in the table is deleted. That is necessary because the material marked to receive a note can end up broken over two lines (or more); if the entry weren't deleted, the code would try to add the note twice (once for each line). That is the reason why we check whether the entry xforms[attr] exists.

Finally, break ends the inspection of the line, under the assumption (typographic rather than logical) that there is no more than one note per line.
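The bookkeeping described above (one note per marked text, at most one note per line) can be checked outside TeX with plain tables standing in for nodes. In this sketch, which is not part of the original code, each line is an array of attribute values (false for unmarked items), notes plays the role of the xforms table, and the function name attach_notes is made up.

```lua
-- Hypothetical model of find_notes: returns a table mapping line
-- numbers to the note attached to that line.
local function attach_notes(lines, notes)
  local placed = {}
  for i, line in ipairs(lines) do
    for _, attr in ipairs(line) do
      if attr and notes[attr] then
        placed[i] = notes[attr]
        notes[attr] = nil   -- so the note is never placed twice
        break               -- at most one note per line
      end
    end
  end
  return placed
end
```

With lines = { { false, 7, 7 }, { 7, false }, { false, 9 } } and notes = { [7] = "A", [9] = "B" }, the marked text 7 spans the first two lines, yet note "A" is attached to the first line only, and note "B" goes to the third.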

Hyphenation points

The article Show the hyphenation points uses the callback to insert markers indicating where hyphenation might have occurred in the paragraph; doing so after the paragraph has been built avoids hindering its proper construction.