Writing Lua in TeX

From LuaTeXWiki
Revision as of 15:41, 18 December 2010 by Paul

Embedding Lua code in a TeX document

Although it is simpler to put Lua code in Lua files, from time to time one may want or need to switch to Lua in the middle of a document. To this end, LuaTeX has two commands: \directlua and \latelua. They work the same way, except that \latelua is processed when the page on which it appears is shipped out, whereas \directlua is processed immediately; the distinction is immaterial here, and what is said of \directlua also applies to \latelua.

\directlua can be called in three ways:

\directlua {<lua code>}
\directlua name {<name>} {<lua code>}
\directlua <number> {<lua code>}

Those three ways are equivalent as far as processing <lua code> is concerned, but in the second case processing occurs in a chunk named <name>, and in the third it occurs in a chunk whose name is entry <number> of the table lua.name. The difference shows only when errors occur, in which case the name of the chunk, if any, is reported.
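The effect of a chunk name can be sketched in stand-alone Lua, assuming Lua 5.2 or later, whose load accepts a string of code and an optional chunk name. The name mychunk is hypothetical, and this only mimics what a named \directlua call does; it makes no claim about how LuaTeX builds the name internally.

```lua
-- Compile a deliberately broken line twice: once without a chunk name
-- and once with one. On failure, load() returns nil plus an error message.
local code = "x = = 1"

-- Without an explicit name, Lua derives one from the source text itself.
local _, err_unnamed = load(code)

-- A chunk name starting with "=" is printed verbatim in error messages.
local _, err_named = load(code, "=mychunk")

print(err_unnamed)  -- something like: [string "x = = 1"]:1: unexpected symbol near '='
print(err_named)    -- something like: mychunk:1: unexpected symbol near '='
```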

Each call to \directlua, named or not, is processed in a separate chunk. That means that any local variable is defined for this call only and is lost afterward. Hence:

\directlua{
  one = 1
  local two = 2
}
\directlua{
  texio.write_nl(type(one))
  texio.write_nl(type(two))
}

will report number and nil (texio.write_nl writes to the log file and, unless in batch mode, to the terminal). Lua code, however, is completely insensitive to TeX's grouping mechanism: calling \directlua between \bgroup and \egroup does not confine its effects to the group, and a global Lua variable set there keeps its value after \egroup.
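A stand-alone Lua sketch of the same behavior, assuming Lua 5.2 or later: each call to load below plays the role of one \directlua call, so the local two disappears with its chunk while the global one survives.

```lua
-- Simulate two successive \directlua calls: each one is compiled
-- as its own chunk, here with load().
local first  = load("one = 1 local two = 2")
local second = load("return type(one), type(two)")

first()                        -- sets global 'one' and a chunk-local 'two'
local t_one, t_two = second()  -- 'two' is looked up as an (unset) global

print(t_one, t_two)            --> number  nil
```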

TeX catcodes in Lua

By default, the code passed to \directlua is read as ordinary TeX input, and only then sent to the Lua interpreter. This may lead to unwanted results that must be kept in mind.

Expansion

As with any other special, the code in \directlua (and the <name>, if specified) is fully expanded. This means that macros can safely be passed to \directlua when one wants their values, but also that they must be properly protected from expansion when needed. For instance:

\def\macro{1}
\directlua{
  myvar = \macro
}

defines myvar as the number 1. To store the control sequence \macro instead, another course of action is needed: see the section on backslash below.

Line ends

When TeX reads a file, it normally turns line ends into spaces. That means that what looks like several lines is actually fed to the Lua interpreter as one big line. For instance:

\directlua{
  myvar = 1
  anothervar = 2
  onelastvar = 3
}

amounts to this single line of Lua:

myvar = 1 anothervar = 2 onelastvar = 3

That is perfectly legitimate Lua, but it can have surprising consequences. First, as usual in TeX, the line end following a control word such as \macro is skipped rather than turned into a space. Hence:

\def\macro{1}
\directlua{
  myvar = \macro
  anothervar = 2
}

will be fed to the interpreter as

myvar = 1anothervar = 2

which is not legitimate at all: Lua stops with a "malformed number" error. Second, a Lua comment -- swallows everything up to the end of the \directlua call. That is:
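The failure can be reproduced in stand-alone Lua (assuming Lua 5.2 or later). The first string below is what TeX hands over in the example above: the number 1 runs straight into the name anothervar, which Lua's lexer rejects.

```lua
-- What TeX hands over once \macro (= 1) has swallowed the line end:
local flattened = "myvar = 1anothervar = 2"

local chunk, err = load(flattened)
print(chunk, err)  -- chunk is nil; err mentions a "malformed number"

-- With any separator between the two statements, it compiles fine.
local ok_chunk = assert(load("myvar = 1 anothervar = 2"))
ok_chunk()
print(myvar, anothervar)  --> 1  2
```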

\directlua{
  myvar = 1
--  anothervar = 2
  onelastvar = 3
}

will be processed as

myvar = 1 -- anothervar = 2 onelastvar = 3

which runs without error but defines only myvar. Third, when reporting an error, the Lua interpreter always gives the line number as 1, since it sees only one long line; that is not very helpful when the code is large.
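Both problems show up in stand-alone Lua as well (assuming Lua 5.2 or later; the chunk name flat is hypothetical):

```lua
-- The comment swallows everything that follows on the flattened line.
local flat = assert(load("myvar = 1 -- anothervar = 2 onelastvar = 3"))
flat()
print(myvar, anothervar, onelastvar)  --> 1  nil  nil

-- An error anywhere in flattened code is always reported on line 1.
local _, err = load("a = 1 b = 2 c = = 3", "=flat")
print(err)  -- begins with "flat:1:"
```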

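The solution is to set \endlinechar=10 or \catcode`\^^M=12 before the code is read. In both cases line ends are preserved (as line feeds or carriage returns respectively, both of which Lua treats as line breaks), and the code is processed as it was input.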
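A stand-alone Lua sketch (assuming Lua 5.2 or later; the chunk name multi is hypothetical) of the same code with its line ends preserved, both in the line-feed form that \endlinechar=10 produces and in the carriage-return form that \catcode`\^^M=12 produces:

```lua
-- "\n" is what \endlinechar=10 yields; "\r" is the character ^^M.
local with_lf = "myvar = 1\n-- anothervar = 2\nonelastvar = 3"
local with_cr = "myvar = 1\r-- anothervar = 2\ronelastvar = 3"

for _, src in ipairs({ with_lf, with_cr }) do
  onelastvar = nil           -- reset before each run
  assert(load(src))()        -- the comment now ends at the line break
  print(myvar, anothervar, onelastvar)  --> 1  nil  3
end

-- Errors now point at the right line.
local _, err = load("a = 1\nb = 2\nc = = 3", "=multi")
print(err)  -- begins with "multi:3:"
```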

Special characters

In TeX, some characters behave specially. This must be taken into account when writing Lua code: to have them handled as Lua would handle them, one must change their catcodes beforehand, as has just been done for line ends. Hence \directlua on its own is ill-suited to writing any sizeable chunk of code. It is better to devise a dedicated macro that sets the catcodes to the appropriate values, reads the Lua code, feeds it to \directlua, and restores the catcodes. The following code does the job:

\def\luacode{%
  \bgroup
  \catcode`\{=12
  \catcode`\}=12
  \catcode`\^^M=12
  \catcode`\#=12
  \catcode`\~=12
  \catcode`\%=12
  \doluacode
}

\bgroup
\catcode`\^^M=12 %
\gdef\doluacode#1^^M#2\endluacode{\directlua{#2}\egroup}%
\egroup

Note that not all special characters are turned into ordinary (catcode 12) characters; the reasons are given for each one below. Note also that \doluacode, which \luacode calls internally, is defined to discard everything up to the end of the line and then pass everything up to \endluacode to \directlua. Discarding what follows \luacode is important; otherwise code as simple as

\luacode
myvar = 1
\endluacode

would actually be split into two lines, the first one empty. This is annoying because errors are then reported with the wrong line number: any error in this one-line code would be reported as occurring on line 2.
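The off-by-one line number can be checked in stand-alone Lua (assuming Lua 5.2 or later; the chunk name luacode is only a label chosen here):

```lua
-- If the rest of the \luacode line were kept, the chunk would start with
-- a line end, shifting every reported line number by one.
local _, err_shifted = load("\nmyvar = = 1", "=luacode")
local _, err_exact   = load("myvar = = 1",   "=luacode")

print(err_shifted)  -- begins with "luacode:2:"
print(err_exact)    -- begins with "luacode:1:"
```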

However, the rest of the line after \luacode could also be processed, instead of discarded, to manage special effects (e.g. specifying a chunk's name, storing the code in a control sequence, or even setting which catcodes should be changed or not).

Backslash

In TeX, the backslash forms control sequences. The definition of \luacode above leaves it unchanged, so it behaves as usual, which allows macros to be expanded before the code reaches Lua. However, the backslash is also the escape character in Lua strings. Hence, if one wants to store the name of a macro in Lua code, the following won't work:

\luacode
myvar = "\noexpand\macro"
\endluacode

because to the Lua interpreter the string consists of \m followed by acro. Since \m is not a valid escape sequence, Lua 5.1 (the version LuaTeX used when this page was written) simply reads the string as macro, while Lua 5.2 and later reject it as an error; other escapes are worse, for instance \n is a newline. The proper way to pass a macro name verbatim is:

\luacode
myvar = "\noexpand\\macro"
\endluacode

which Lua will correctly read as

myvar = "\\macro"

with the backslash escaped to represent itself. Another solution is:

myvar = [[\noexpand\macro]]

because double brackets delimit a Lua long string, in which no escape sequences are processed (and which may span several lines). Note, however, that with this second solution myvar is defined with a trailing space, i.e. as "\macro ", because TeX appends a space when it writes out an unexpanded or unexpandable control word. This does not happen with the first solution, since \\ is a control symbol rather than a control word.
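The three variants can be compared in stand-alone Lua, assuming Lua 5.2 or later. Each string below is the text TeX would hand over to Lua; long brackets [=[ ... ]=] are used so that the demonstration itself adds no escape processing.

```lua
-- 1. "\macro " (from "\noexpand\macro"): \m is not a valid escape.
--    Lua 5.2 and later reject it; Lua 5.1 silently read "macro ".
local bad, err = load([=[myvar = "\macro "]=])
print(bad, err)  -- nil plus an "invalid escape sequence" message

-- 2. "\\macro" (from "\noexpand\\macro"): the escaped backslash stands for itself.
assert(load([=[myvar = "\\macro"]=]))()
print(myvar)     --> \macro

-- 3. [[\macro ]] (from [[\noexpand\macro]]): no escapes in long strings,
--    but the space TeX appended after the control word is kept.
assert(load([=[myvar2 = [[\macro ]]]=]))()
print("<" .. myvar2 .. ">")  --> <\macro >
```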

Braces

One may want to define a string in Lua which contains unbalanced braces, e.g.:

\luacode
myvar = "{"
\endluacode

If the braces' catcodes hadn't been changed beforehand, that would be impossible. Note, however, that this also means that arguments can't be passed to commands in the usual way. For instance, the following will not work as intended:

\luacode
myvar = "\dosomething{\macro}"
\endluacode

Here \dosomething is expanded with the left brace (now an ordinary character rather than a group delimiter) as its argument, and the rest of the line is likely to cause errors. Thus one may also choose to leave the catcodes of braces unchanged, depending on how \luacode is most likely to be used. Note that strings with unbalanced braces can still be defined when braces keep their usual catcodes, thanks to the following trick:

\luacode
myvar = "{" -- }
\endluacode

When the code is passed to \directlua, braces are balanced because the Lua comment means nothing to TeX; when passed to the Lua interpreter, on the other hand, the right brace is ignored.
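In stand-alone Lua (assuming Lua 5.2 or later, and line ends preserved as \luacode does), the text Lua receives from the trick behaves as follows; the line other = 1 is a hypothetical addition showing that the comment ends at the line break:

```lua
-- What Lua receives: the "}" sits inside a Lua comment.
local src = 'myvar = "{" -- }\nother = 1'
assert(load(src))()
print(myvar, other)  --> {  1
```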

Hash and comment

The hash sign # in Lua is the length operator: prefixed to a string or table, it returns its length. If its catcode were not changed, each # would reach the Lua interpreter as ##, because TeX doubles parameter characters when it converts tokens back to text. That is normal TeX behavior, but unwanted here.

As for the comment sign %, Lua uses it in string patterns and format strings, and as the modulo operator. If its catcode were not changed, TeX would discard the rest of every line in which it occurs, and a mutilated version of the input would reach the Lua interpreter. Conversely, to comment out a line in \luacode, use the Lua comment --.
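A short stand-alone Lua sketch (assuming Lua 5.2 or later) of the constructs at stake: what # does, what happens if it arrives doubled, and typical uses of %.

```lua
local s = "hello"

-- The length operator.
print(#s)  --> 5

-- "##s" still parses, but it takes the length of a number (#s is 5),
-- so it fails at run time.
local ok, err = pcall(assert(load('local s = "hello" return ##s')))
print(ok, err)  -- false, plus "attempt to get length of a number value"

-- Uses of % that TeX would otherwise treat as comments:
print(("a1b2c3"):gsub("%d", ""))  --> abc  3
print(string.format("%d%%", 50))  --> 50%
print(7 % 3)                      --> 1
```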

Active characters

The ~ character is generally active and used as a no-break space in TeX. If it were passed as is to \directlua, it would be expanded into control sequences that mean nothing to Lua, whereas in Lua it forms the not-equal operator ~=.

Any other active characters should also be taken care of, but which characters are active cannot be predicted. Punctuation marks might be active to produce special spacing, as with LaTeX's babel package, but such tricks are unlikely to survive in LuaTeX, where cleaner methods exist to add a space before punctuation marks when necessary.

Other characters

When processing verbatim text, TeX users generally also change the catcodes of $, &, ^, _ and the space character, because they too are special. In code passed to the Lua interpreter, however, their usual catcodes do no harm, which is why they are left unchanged here. One caveat: TeX still collapses runs of spaces into a single space, which matters only inside Lua strings.

\luaescapestring

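Although it can't do everything explained above, the \luaescapestring command can be useful in some cases: it fully expands its argument, which must be enclosed in real braces (with their usual catcodes), then modifies the result so that dangerous characters are escaped: backslashes, hashes, quotes and line ends. Since braces are ordinary characters in \luacode as defined above, the following example assumes a variant of \luacode that leaves brace catcodes unchanged (see Braces above):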

\def\macro{"\noexpand\foo"}
\luacode
myvar = "\luaescapestring{\macro}"
\endluacode

will be passed to Lua as

myvar = "\"\\foo \""

so that myvar is defined as "\foo ", including the quotes. Note that the space after \foo is still appended.
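Checking in stand-alone Lua (assuming Lua 5.2 or later) that the escaped line defines the intended string; long brackets are used so that the demonstration adds no escapes of its own:

```lua
-- The line handed to Lua after \luaescapestring has done its work:
assert(load([=[myvar = "\"\\foo \""]=]))()

print(myvar)   --> "\foo "
print(#myvar)  --> 7
```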