Writing Lua in TeX
Embedding Lua code in a TeX document
Although it is simpler to put Lua code in Lua files, from time to time one may want or need to switch to Lua in the middle of a document. To this end, LuaTeX has two commands: \directlua and \latelua. They work the same, except that \latelua is processed when the page where it appears is shipped out, whereas \directlua is processed at once; the distinction is immaterial here, and what is said of \directlua also applies to \latelua.
\directlua can be called in three ways:
\directlua {<lua code>}
\directlua name {<name>} {<lua code>}
\directlua <number> {<lua code>}
Those three ways are equivalent when it comes to processing <lua code>, but in the second case processing will occur in a chunk named <name>, and in the third it will occur in a chunk whose name is the entry <number> in the table lua.name. The difference manifests itself only when errors occur, in which case the name of the chunk, if any, is reported.
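As a minimal sketch of the third form (the name mychunk and the variable answer are hypothetical, chosen only for illustration), a name can first be stored in lua.name and then selected by number:

```tex
% Store a chunk name in entry 1 of the lua.name table.
\directlua{ lua.name[1] = "mychunk" }
% Run code in the chunk named by entry 1; the name matters only if an error occurs.
\directlua 1 { answer = 6 * 7 }
```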
Each call to \directlua, named or not, is processed in a separate chunk. That means that any local variable is defined for this call only and is lost afterward. Hence:
\directlua{
  one = 1
  local two = 2
}
\directlua{
  texio.write_nl(type(one))
  texio.write_nl(type(two))
}
will report number and nil (texio.write_nl writes to the log file and normally to the terminal as well). On the other hand, Lua code is completely insensitive to TeX's grouping mechanism. In other words, calling \directlua between \bgroup and \egroup doesn't affect the code to be processed.
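A minimal sketch of that last point (the variable name counter is only illustrative): a global set inside a TeX group is still there once the group has closed.

```tex
\bgroup
  \directlua{ counter = 42 }
\egroup
% The TeX group has ended, but the Lua global is untouched.
\directlua{ texio.write_nl(type(counter)) }
```

This should report number, not nil.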
TeX catcodes in Lua
By default, the code passed to \directlua is treated as normal TeX input and only then sent to the Lua interpreter. This may lead to unwanted results and must be kept in mind.
Expansion
As with any other special, the code in \directlua (and the <name>, if specified) is fully expanded. This means that macros can be safely passed to \directlua if one wants their values, but that they should also be properly escaped when needed. For instance:
\def\macro{1}
\directlua{
  myvar = \macro
}
defines myvar as the number 1. To store the control sequence \macro instead, another course of action is needed: see the section on backslash below.
It should be noted that \par tokens are removed from \directlua if they are unexpandable (i.e., most likely, if they have their original meaning). Hence, empty lines can be used in Lua code.
Line ends
When TeX reads a file, it normally turns line ends into spaces. That means that what looks like several lines is actually fed to the Lua interpreter as one big line. For instance:
\directlua{
  myvar = 1
  anothervar = 2
  onelastvar = 3
}
amounts to the following, if it were written in a separate Lua file:
myvar = 1 anothervar = 2 onelastvar = 3
That is perfectly legitimate, but strange things might happen. First, TeX macros gobble spaces as usual. Hence:
\def\macro{1}
\directlua{
  myvar = \macro
  anothervar = 2
}
will be fed to the interpreter as
myvar = 1anothervar = 2
which is not legitimate at all. Second, the Lua comment -- will affect everything to the end of the \directlua call. That is:
\directlua{
  myvar = 1 -- anothervar = 2
  onelastvar = 3
}
will be processed as
myvar = 1 -- anothervar = 2 onelastvar = 3
which works but only defines myvar. Third, when reporting an error, the Lua interpreter will always give the line number as 1, since it processes one big line only; that isn't very useful when the code is large.
The solution is to set \endlinechar=10 or \catcode`\^^M=12. In both cases, line ends will be preserved and the code will be processed as it is input.
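The first remedy can be sketched as follows. This assumes that the character ^^J (code 10) has catcode 12, as it does in plain TeX and LaTeX; the group and the trailing % signs are only there to keep the change local and to stop the extra line-end characters from reaching the page.

```tex
\begingroup
\endlinechar=10
\directlua{
  myvar = 1 -- anothervar = 2
  onelastvar = 3
}%
\endgroup%
```

With line ends preserved, the Lua comment stops at the end of its own line, so onelastvar is defined as well.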
Special characters
In TeX, some characters have a special behavior. That must be taken into account when writing Lua code: one must change their catcodes beforehand if one wants to handle them as Lua would, as has just been done for line ends. That means that \directlua, as such, is clearly insufficient to write any extended chunk of code. It is thus better to devise a special macro that sets the catcodes to the appropriate values, reads the Lua code, feeds it to \directlua, and restores the catcodes. The following code does the job:
\def\luacode{%
  \bgroup
  \catcode`\{=12
  \catcode`\}=12
  \catcode`\^^M=12
  \catcode`\#=12
  \catcode`\~=12
  \catcode`\%=12
  \doluacode
  }
\bgroup
\catcode`\^^M=12 %
\long\gdef\doluacode#1^^M#2\endluacode{\directlua{#2}\egroup}%
\egroup
Note that not all special characters are set to normal (catcode 12) characters; that is explained for each below. Note also that \doluacode, internally called by \luacode, is defined to get rid of anything up to the line end, and then pass anything up to \endluacode to \directlua. Discarding what follows \luacode is important, otherwise code as simple as
\luacode
myvar = 1
\endluacode
would actually create two lines, the first being empty; it is annoying because errors are then reported with the wrong line number (i.e. any error in this one-line code would be reported to happen on line 2).
However, the rest of the line after \luacode could also be processed, instead of discarded, to manage special effects (e.g. specifying a chunk's name, storing the code in a control sequence, or even setting which catcodes should be changed or not).
Backslash
The backslash in TeX is used to form control sequences. In the definition of \luacode above, it isn't changed and thus behaves as usual. It allows commands to be passed and expanded to the Lua code. However, a backslash in Lua is also an escape character in strings. Hence, if one wants to store the name of a macro in Lua code, the following won't work:
\luacode
myvar = "\noexpand\macro"
\endluacode
because to the Lua interpreter the string is made of \m followed by acro. Since \m is not a defined escape sequence in Lua, Lua 5.1 reads the string as macro (Lua 5.2 and later reject an invalid escape sequence with an error), and in other circumstances strange things might happen: for instance, \n is a newline. The proper way to pass a macro verbatim is:
\luacode
myvar = "\noexpand\\macro"
\endluacode
which Lua will correctly read as
myvar = "\\macro"
with the backslash escaped to represent itself. Another solution is:
myvar = [[\noexpand\macro]]
because the double brackets signal a string in Lua where no escape sequence occurs (and the string can also run on several lines). Note however that in the second case myvar will be defined with a trailing space, i.e. as "\macro ", because of TeX's habit of appending a trailing space to unexpanded (or unexpandable) control sequences.
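If the trailing space is unwanted, one possible workaround (a sketch that is not part of the method described above, and that assumes \escapechar has its default value) is the TeX primitive \string. It turns the control sequence into plain characters at expansion time, so nothing is left for TeX to append a space to.

```tex
\def\macro{1}
% \string\macro expands to the characters \, m, a, c, r, o.
\directlua{ myvar = [[\string\macro]] }
```

Lua then receives myvar = [[\macro]], without a trailing space.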
If one wants the backslash to be treated as any other character (not creating control sequences), then one has to rewrite \luacode as follows:
\def\luacode{%
  \bgroup
  \catcode`\\=12
  \catcode`\{=12
  \catcode`\}=12
  \catcode`\^^M=12
  \catcode`\#=12
  \catcode`\~=12
  \catcode`\%=12
  \doluacode
  }
\bgroup
\catcode`\|=0
\catcode`\^^M=12 %
\catcode`\\=12 %
|long|gdef|doluacode#1^^M#2\endluacode{|directlua{#2}|egroup}%
|egroup
Backslashes remain escape characters in Lua, though.
Braces
One may want to define a string in Lua which contains unbalanced braces, i.e.:
\luacode
myvar = "{"
\endluacode
If the braces' catcodes hadn't been changed beforehand, that would be impossible. Note, however, that this means that one can't feed arguments to commands in the usual way. I.e. the following will produce nothing good:
\luacode
myvar = "\dosomething{\macro}"
\endluacode
\dosomething will be expanded with the left brace (devoid of its usual delimiter-ness) as its argument, and the rest of the line might produce chaos. Thus, one may also choose not to change the catcodes of braces, depending on how \luacode is most likely to be used. Note that strings with unbalanced braces can still be defined, even if braces have their usual catcodes, thanks to the following trick:
\luacode
myvar = "{" -- }
\endluacode
When the code is passed to \directlua, braces are balanced because the Lua comment means nothing to TeX; when passed to the Lua interpreter, on the other hand, the right brace is ignored.
Hash and comment
The hash sign # in Lua is the length operator: prefixed to a string or table variable, it returns its length. If its catcode weren't taken care of, LuaTeX would pass to \directlua a double hash for each hash, i.e. each # would be turned into ##. That is normal TeX behavior, but unwanted here.
As for the comment sign %, it is useful in Lua when manipulating strings. If it weren't escaped it would discard parts of the code when TeX reads it, and a mutilated version of the input would be passed to the Lua interpreter. In turn, discarding a line by commenting it in \luacode should be done with the Lua comment --.
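A short sketch of both characters at work, assuming \luacode is defined as in the first definition given under "Special characters" (the table t is only illustrative):

```tex
\luacode
local t = {"a", "b", "c"}
-- # is the length operator; % introduces a format directive.
texio.write_nl(string.format("%d entries", #t))
\endluacode
```

Since # and % both have catcode 12 here, the code reaches Lua unchanged and should report 3 entries.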
Active characters
The ~ character is generally active and used as a no-break space in TeX. If it were passed as is to \directlua, it would expand to uninterpretable control sequences, whereas in Lua it is used to form the inequality operator ~=.
Other possible active characters should be taken care of, but which characters are active is unpredictable; punctuation marks might be so to accommodate special spacing, as with LaTeX's babel package, but such tricks are unlikely to survive in LuaTeX (cleaner methods exist that add a space before punctuation marks when necessary).
Other characters
When processing verbatim text in TeX, one generally also changes the catcodes of $, &, ^, _ and the space character, because they too are special. When passed to the Lua interpreter, though, their usual catcodes won't do any harm; that is why they are left unmodified here.
\luaescapestring
Although it can't do all of what's been explained, the \luaescapestring command might be useful in some cases: it expands its argument (which must be enclosed in real braces) fully, then modifies it so that dangerous characters are escaped: backslashes, hashes, quotes and line ends. For instance:
\def\macro{"\noexpand\foo"}
\directlua{ myvar = "\luaescapestring{\macro}" }
will be passed to Lua as
myvar = "\"\\foo \""
so that myvar is defined as "\foo ", with the quotes as parts of it. Note that the trailing space after \foo still happens.
From Lua to TeX
Inside Lua code, one can pass strings to be processed by TeX with the functions tex.print(), tex.sprint() and tex.tprint(). All such calls are processed at the end of a \directlua call, even though they might happen in the middle of the code. This behavior is worth noting because it might be surprising in some cases, although it is generally harmless.
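A small sketch of this deferred processing, using the scratch register \count255 purely as an illustration:

```tex
\count255=0
\directlua{
  tex.print("\string\\count255=7\string\\relax")
  texio.write_nl(tostring(tex.count[255]))
}
```

The log should show 0, not 7: the string handed to tex.print() is read by TeX only once the whole Lua chunk has finished, and only then is the register set to 7.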
tex.print()
This function receives as its argument(s) either one or more strings or an array of strings. Each string is processed as an input line: an end-of-line character is appended (except to the last string), and TeX is in state newline when processing it (i.e. leading spaces are skipped). Hence the two equivalent calls:
tex.print("a", "b")
tex.print({"a", "b"})
are both interpreted by TeX as would the following two lines:
a
b
Thus `a b' is produced, since line ends normally produce a space.
The function can also take an optional number as its first argument; it is interpreted as referring to a catcode table (as defined by \initcatcodetable and \savecatcodetable), and each line is processed by TeX with that catcode regime. For instance (note that with such a minimal catcode table, braces don't even have their usual values):
\bgroup
\initcatcodetable1
\catcode`\_=0
\savecatcodetable1
\egroup
\directlua{tex.print(1, "_TeX")}
The string will be read with _ as an escape character, and thus interpreted as the command commonly known as \TeX. The catcode regime holds only for the strings passed to tex.print() and the rest of the document isn't affected.
If the optional number is -1, or points to an invalid (i.e. undefined) catcode table, then the strings are processed with the current catcodes, as if there was no optional argument. If it is -2, then the strings are read as if the result of \detokenize: all characters have catcode 12 (i.e. `other', characters that have no function beside representing themselves), except space, which has catcode 10 (as usual).
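For instance, the -2 regime can be sketched like this (the sample string is arbitrary):

```tex
% _, & and ^ are read back as ordinary characters, not as special ones.
\directlua{ tex.print(-2, "a_b & c^d") }
```

Which glyphs actually appear on the page depends on the current font.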
tex.sprint()
Like tex.print(), this function can receive either one or more strings or an array of strings, with an optional number as its first argument pointing to a catcode table. Unlike tex.print(), however, each string is processed as if TeX were in the middle of a line and not at the beginning of a new one: spaces aren't skipped, no end-of-line character is added and trailing spaces aren't ignored. Thus:
tex.sprint("a", "b")
is interpreted by TeX as
ab
without any space in between.
tex.tprint()
This function takes an unlimited number of tables as its arguments; each table must be an array of strings, with the first entry optionally being a number pointing to a catcode table. Then each table is processed as if passed to tex.sprint(). Thus:
tex.tprint({1, "a", "b"}, {"c", "d"})
is equivalent to
tex.sprint(1, "a", "b")
tex.sprint("c", "d")
Debug note
Note that some Lua tables are available only when Lua runs inside lua(la)tex. One reference is functionref.tex: "\type{luatex} is a typesetter; \type{texlua} and \type{luatex --luaonly} are lua interpreters. In lua interpreter mode, the lua tables \type{tex}, \type{token}, \type{node}, and \type{pdf} are unavailable."
For utilizing lua-specific debug techniques, see
- 23. The Debug Library - Lua,
- 23.1 - Introspective Facilities,
- lua-users wiki: Debug Library Tutorial
You can use commands like these in your .tex file:
\typeout{==\directlua{for k,v in pairs(tex) do print(k,v) end}==}
\typeout{==\directlua{for k,v in pairs(lua) do print(k,v) end}==}
\typeout{==\directlua{print(debug.traceback(1))}==}
... and the output of Lua's print will be sent to stdout; the other characters in \typeout above simply serve as delimiters in the output, which will look something like:
tprint function: 0x89086e8
box table: 0x895a0a0
getsfcode function: 0x895ae10
pdffontsize function: 0x895b048
shipout function: 0x895b2d8
getmath function: 0x895b380
setuccode function: 0x895ae48
getskip function: 0x8908860
pdfxformname function: 0x895b170
setlist function: 0x895ab38
setattribute function: 0x8908898
count table: 0x895b600
getuccode function: 0x895ae80
getbox function: 0x895ab00
uniformdeviate function: 0x895b080
fontname function: 0x895af58
primitives function: 0x895b220
badness function: 0x895b310
setmath function: 0x895b348
setskip function: 0x8908828
getmathcode function: 0x895ada0
getcount function: 0x8908950
toks table: 0x895b6b0
setsfcode function: 0x895add8
run function: 0x8908638
sfcode table: 0x895a130
pdfpageref function: 0x895b138
setcount function: 0x8908918
setdimen function: 0x89087b8
print function: 0x89086d0
setlccode function: 0x895acf8
fontidentifier function: 0x895af90
getlccode function: 0x895ad30
setmathcode function: 0x895ad68
round function: 0x895aeb8
getdimen function: 0x89087f0
write function: 0x89086b8
set function: 0x8908770
definefont function: 0x895b1b0
dimen table: 0x895b550
get function: 0x89087a0
number function: 0x895b0c0
nest table: 0x895a600
lists table: 0x895a550
setcatcode function: 0x895ac18
delcode table: 0x895a4a0
setnest function: 0x895aba8
mathcode table: 0x895a3f0
getdelcode function: 0x895acc0
catcode table: 0x895a340
pdffontname function: 0x895afd0
uccode table: 0x895a290
setdelcode function: 0x895ac88
lccode table: 0x895a1e0
sp function: 0x895af28
attribute table: 0x895b410
getcatcode function: 0x895ac50
extraprimitives function: 0x895b258
linebreak function: 0x895b3b8
setbox function: 0x895aac8
skip table: 0x895b4c0
pdffontobjnum function: 0x895b008
settoks function: 0x8908988
gettoks function: 0x895aa90
enableprimitives function: 0x895b298
getlist function: 0x895ab70
getnest function: 0x895abe0
getattribute function: 0x89088d8
romannumeral function: 0x895b0f8
scale function: 0x895aef0
hashtokens function: 0x895b1e8
error function: 0x8908720
finish function: 0x8908668
====
setluaname function: 0x895efb8
setbytecode function: 0x895f028
name table: 0x895ef50
getluaname function: 0x895ef80
getbytecode function: 0x895eff0
version Lua 5.1
bytecode table: 0x895f080
====
1
stack traceback:
<\directlua >:1: in main chunk
====
Note that trying to print Lua's debug.traceback with the tex.print variants, e.g.:
\directlua{tex.print(debug.traceback())}
... will fail:
! Missing number, treated as zero.
<to be read again>
                   >
l.1 \directlua{tex.print(debug.traceback())}
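... because the debug.traceback printout contains the chunk name <\directlua >: TeX reads it back as the \directlua primitive followed by a ">" character, where it expects a number or an opening brace, and LaTeX chokes on it.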
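One way around this, sketched here as a suggestion rather than as the only option, is to keep the traceback away from TeX's input altogether and send it to the log with texio.write_nl instead:

```tex
% The traceback goes to the log; TeX never reads it back as input.
\directlua{ texio.write_nl(debug.traceback()) }
```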
The expansion of \directlua
A call to \directlua is fully expandable; i.e. it can be used in contexts where full expansion is required, as in:
\csname\directlua{tex.print("TeX")}\endcsname
which is a somewhat convoluted way of saying \TeX. Besides, since Lua code is processed at once, things that were previously unthinkable can now be done easily. For instance, it is impossible to perform an assignment in an \edef by TeX's traditional means. I.e. the following:
\edef\macro{\count1=5}
defines \macro as \count1=5 but doesn't perform the assignment (the \edef does nothing more than a simple \def). After the definition, the value of \count1 hasn't changed. The same is not true, though, if such an assignment is made with Lua code. The following:
\edef\macro{\directlua{tex.count[1] = 5}}
defines \macro emptily (since nothing remains after \directlua has been processed) and sets count 1 to 5. Since such a behavior is totally unexpected in normal TeX, one should be wary when using \directlua in such contexts.
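The effect can be checked with a short sketch; the scratch register \count255 is used here instead of \count1 purely as an illustrative choice:

```tex
\count255=0
\edef\macro{\directlua{ tex.count[255] = 5 }}
% \macro is empty, yet the register changed while the \edef was being built.
\message{[\meaning\macro] [\the\count255]}
```

The message should read [macro:->] [5]: an empty definition, and a register that was nevertheless modified during the \edef.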