Writing Lua in TeX

From LuaTeXWiki
Revision as of 18:35, 29 August 2011 by 87.60.144.211 (talk) (From Lua to TeX: - added debug note (sdaau))

Embedding Lua code in a TeX document

Although it is simpler to put Lua code in Lua files, from time to time one may want or need to switch to Lua in the middle of a document. To this end, LuaTeX has two commands: \directlua and \latelua. They work the same, except that \latelua is processed when the page where it appears is shipped out, whereas \directlua is processed at once; the distinction is immaterial here, and what is said of \directlua also applies to \latelua.

\directlua can be called in three ways:

\directlua {<lua code>}
\directlua name {<name>} {<lua code>}
\directlua <number> {<lua code>}

Those three ways are equivalent when it comes to processing <lua code>, but in the second case processing will occur in a chunk named <name>, and in the third it will occur in a chunk whose name is the entry <number> in the table lua.name. The difference manifests itself only when errors occur, in which case the name of the chunk, if any, is reported.
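As a minimal sketch of the numbered form, the following registers a name in slot 1 of lua.name and then runs a chunk under it. The name "setup-chunk" and the deliberate error are hypothetical, chosen only so that an error message is produced; the exact wording of that message is not shown here because it depends on the LuaTeX version.

```tex
% Register a (hypothetical) chunk name in entry 1 of the table lua.name.
\directlua{lua.name[1] = "setup-chunk"}
% Run a chunk under that entry; the deliberate error should be reported
% together with the chunk name registered above.
\directlua 1 {error("deliberate failure")}
```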

Each call to \directlua, named or not, is processed in a separate chunk. That means that any local variable is defined for this call only and is lost afterward. Hence:

\directlua{
  one = 1
  local two = 2
}
\directlua{
  texio.write_nl(type(one))
  texio.write_nl(type(two))
}

will report number and nil (texio.write_nl writes to the log file and, unless TeX runs in batch mode, to the terminal). On the other hand, Lua code is completely insensitive to TeX's grouping mechanism. In other words, calling \directlua between \bgroup and \egroup doesn't affect the code to be processed.
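The insensitivity to grouping can be checked with a short sketch like the one below; the variable name counter is an assumption made for illustration. A global Lua variable set inside a TeX group should still be visible once the group is closed.

```tex
\begingroup
\directlua{ counter = 42 }  % global Lua variable, set inside a TeX group
\endgroup
% The group has ended, yet the Lua variable is still defined.
\directlua{ texio.write_nl(tostring(counter)) }
```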

TeX catcodes in Lua

By default, the code passed to \directlua is treated as normal TeX input and only then sent to the Lua interpreter. This may lead to unwanted results and must be taken into account.

Expansion

As with the argument of \special, the code in \directlua (and the <name>, if specified) is fully expanded. This means that macros can be safely passed to \directlua if one wants their values, but that they should also be properly escaped when needed. For instance:

\def\macro{1}
\directlua{
  myvar = \macro
}

defines myvar as the number 1. To store the control sequence \macro instead, another course of action is needed: see the section on backslash below.

It should be noted that \par tokens are removed from \directlua if they are unexpandable (i.e., most likely, if they have their original meaning). Hence, empty lines can be used in Lua code.

Line ends

When TeX reads a file, it normally turns line ends into spaces. That means that what looks like several lines is actually fed to the Lua interpreter as one big line. For instance:

\directlua{
  myvar = 1
  anothervar = 2
  onelastvar = 3
}

amounts to the following, if it were written in a separate Lua file:

myvar = 1 anothervar = 2 onelastvar = 3

That is perfectly legitimate, but strange things might happen. First, TeX skips the spaces that follow a control word, as usual. Hence:

\def\macro{1}
\directlua{
  myvar = \macro
  anothervar = 2
}

will be fed to the interpreter as

myvar = 1anothervar = 2

which is not legitimate at all. Second, the Lua comment -- will affect everything to the end of the \directlua call. That is:

\directlua{
  myvar = 1
--  anothervar = 2
  onelastvar = 3
}

will be processed as

myvar = 1 -- anothervar = 2 onelastvar = 3

which works but only defines myvar. Third, when reporting an error, the Lua interpreter will always mention the line number as 1, since it processes one big line only; that isn't very useful when the code is large.

The solution is to set \endlinechar=10 or \catcode`\^^M=12 before the code is read. In both cases, line ends will be preserved and the code will be processed as it is input.
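A minimal sketch of the \endlinechar approach follows. It assumes that character 10 has catcode 12, as in plain TeX, and it keeps the change local to a group. The percent signs are there because, with this setting, a line end outside the Lua code would otherwise be typeset as a character.

```tex
\begingroup
\endlinechar=10 % from the next line on, each line ends with character 10
\directlua{%
  myvar = 1
  -- anothervar = 2
  onelastvar = 3
  texio.write_nl(type(onelastvar))
}% the percent sign keeps this line end out of the typeset output
\endgroup%
```

With line ends preserved, the Lua comment stops at the end of its own line, so onelastvar should be defined and its type reported as number.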

Special characters

In TeX, some characters have a special behavior. That must be taken into account when writing Lua code: one must change their catcodes beforehand if one wants to handle them as Lua would, as has just been done for line ends. That means that \directlua, as such, is clearly insufficient to write any extended chunk of code. It is thus better to devise a special macro that sets the catcodes to the appropriate values, reads the Lua code, feeds it to \directlua, and restores the catcodes. The following code does the job:

\def\luacode{%
  \bgroup
  \catcode`\{=12
  \catcode`\}=12
  \catcode`\^^M=12
  \catcode`\#=12
  \catcode`\~=12
  \catcode`\%=12
  \doluacode
}

\bgroup
\catcode`\^^M=12 %
\long\gdef\doluacode#1^^M#2\endluacode{\directlua{#2}\egroup}%
\egroup

Note that not all special characters are set to normal (catcode 12) characters; that is explained for each below. Note also that \doluacode, internally called by \luacode, is defined to get rid of anything up to the line end, and then pass anything up to \endluacode to \directlua. Discarding what follows \luacode is important, otherwise code as simple as

\luacode
myvar = 1
\endluacode

would actually create two lines, the first being empty; it is annoying because errors are then reported with the wrong line number (i.e. any error in this one-line code would be reported to happen on line 2).

However, the rest of the line after \luacode could also be processed, instead of discarded, to manage special effects (e.g. specifying a chunk's name, storing the code in a control sequence, or even setting which catcodes should be changed or not).

Backslash

The backslash in TeX is used to form control sequences. In the definition of \luacode above, it isn't changed and thus behaves as usual. It allows commands to be passed and expanded to the Lua code. However, a backslash in Lua is also an escape character in strings. Hence, if one wants to store the name of a macro in Lua code, the following won't work:

\luacode
myvar = "\noexpand\macro"
\endluacode

because to the Lua interpreter the string is made of \m followed by acro; in Lua 5.1, where an unknown escape such as \m simply yields the character itself, the string is read as macro (later Lua versions reject it as an invalid escape sequence), and in other circumstances stranger things might happen: for instance, \n is a newline. The proper way to pass a macro verbatim is:

\luacode
myvar = "\noexpand\\macro"
\endluacode

which Lua will correctly read as

myvar = "\\macro"

with the backslash escaped to represent itself. Another solution is:

myvar = [[\noexpand\macro]]

because the double brackets signal a string in Lua where no escape sequence occurs (and the string can also run on several lines). Note however that in the second case myvar will be defined with a trailing space, i.e. as "\macro ", because of TeX's habit of appending a trailing space to unexpanded (or unexpandable) control sequences.
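The trailing space can be made visible with a sketch such as the following, which uses plain \directlua and wraps the stored string in square brackets before writing it out. The variable name is hypothetical, and \TeX merely stands in for any control word.

```tex
\directlua{
  local name = [[\noexpand\TeX]]
  texio.write_nl("[" .. name .. "]")
}
```

If the behavior is as described above, a space should appear between the macro name and the closing bracket in the output.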

If one wants the backslash to be treated as any other character (not creating control sequences), then one has to rewrite \luacode as follows:

\def\luacode{%
  \bgroup
  \catcode`\\=12
  \catcode`\{=12
  \catcode`\}=12
  \catcode`\^^M=12
  \catcode`\#=12
  \catcode`\~=12
  \catcode`\%=12
  \doluacode
}

\bgroup
\catcode`\|=0
\catcode`\^^M=12 %
\catcode`\\=12 %
|long|gdef|doluacode#1^^M#2\endluacode{|directlua{#2}|egroup}%
|egroup

Backslashes remain escape characters in Lua, though.

Braces

One may want to define a string in Lua which contains unbalanced braces, i.e.:

\luacode
myvar = "{"
\endluacode

If the braces' catcodes hadn't been changed beforehand, that would be impossible. Note, however, that this means that one can't feed arguments to commands in the usual way. I.e. the following will produce nothing good:

\luacode
myvar = "\dosomething{\macro}"
\endluacode

\dosomething will be expanded with the left brace (devoid of its usual delimiter-ness) as its argument, and the rest of the line might produce chaos. Thus, one may also choose not to change the catcodes of braces, depending on how \luacode is most likely to be used. Note that strings with unbalanced braces can still be defined, even if braces have their usual catcodes, thanks to the following trick:

\luacode
myvar = "{" -- }
\endluacode

When the code is passed to \directlua, braces are balanced because the Lua comment means nothing to TeX; when passed to the Lua interpreter, on the other hand, the right brace is ignored.

Hash and comment

The hash sign # in Lua is the length operator: prefixed to a string or table variable, it returns its length. If its catcode weren't taken care of, LuaTeX would pass to \directlua a double hash for each hash, i.e. each # would be turned into ##. That is normal TeX behavior, but unwanted here.

As for the comment sign %, it is useful in Lua when manipulating strings. If it weren't escaped, it would discard parts of the code when TeX reads it, and a mutilated version of the input would be passed to the Lua interpreter. In turn, discarding a line by commenting it in \luacode should be done with the Lua comment --.
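As an illustration, here is a short sketch that relies on the \luacode macro defined earlier on this page (it will not work with bare \directlua). The table name words is an assumption made for the example; the code uses both the length operator and a percent sign in a format string.

```tex
\luacode
local words = {"alpha", "beta", "gamma"}
texio.write_nl(string.format("%d entries", #words))
\endluacode
```

Since #, % and the braces all have catcode 12 inside \luacode, the code should reach Lua unchanged.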

Active characters

The ~ character is generally active and used as a no-break space in TeX. If it were passed as is to \directlua, it would expand to uninterpretable control sequences, whereas in Lua it is used to form the inequality operator ~=.

Other possible active characters should be taken care of, but which characters are active is unpredictable; punctuation marks might be so to accommodate special spacing, as with LaTeX's babel package, but such tricks are unlikely to survive in LuaTeX (cleaner methods exist that add a space before punctuation marks when necessary).
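When \directlua is used directly, without any catcode change, one possible workaround is to prefix the active character with \string, which turns it into an ordinary character before the code reaches Lua. A minimal sketch, with hypothetical variable names:

```tex
\directlua{
  local a, b = 1, 2
  if a \string~= b then texio.write_nl("a and b differ") end
}
```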

Other characters

When processing verbatim text in TeX, one generally also changes the catcodes of $, &, ^, _ and the space character, because they too are special. When passed to the Lua interpreter, though, their usual catcodes won't do any harm, which is why they are left unmodified here.

\luaescapestring

Although it can't do all of what's been explained, the \luaescapestring command might be useful in some cases: it fully expands its argument (which must be enclosed in real braces), then modifies it so that dangerous characters are escaped: backslashes, hashes, quotes and line ends. For instance, assuming a version of \luacode that leaves the catcodes of braces unchanged:

\def\macro{"\noexpand\foo"}
\luacode
myvar = "\luaescapestring{\macro}"
\endluacode

will be passed to Lua as

myvar = "\"\\foo \""

so that myvar is defined as "\foo ", with the quotes as parts of it. Note that the trailing space after \foo still happens.

From Lua to TeX

Inside Lua code, one can pass strings to be processed by TeX with the functions tex.print(), tex.sprint() and tex.tprint(). All such calls are processed at the end of a \directlua call, even though they might happen in the middle of the code. This behavior is worth noting because it might be surprising in some cases, although it is generally harmless.
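The deferred processing can be observed with a sketch like the one below, which uses the scratch register \count255 (an arbitrary choice for illustration). The assignment passed to tex.print() is only read by TeX after the Lua chunk has finished, so the value seen from Lua inside the same chunk should still be the old one.

```tex
\count255=1
\directlua{
  tex.print("\noexpand\\count255=2\noexpand\\relax")
  texio.write_nl(tostring(tex.count[255]))
}
% Only from this point on should \count255 hold the new value.
```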

tex.print()

This function receives as its argument(s) either one or more strings or an array of strings. Each string is processed as an input line: an end-of-line character is appended (except to the last string), and TeX is in state newline when processing it (i.e. leading spaces are skipped). Hence the two equivalent calls:

tex.print("a", "b")
tex.print({"a", "b"})

are both interpreted by TeX as would the following two lines:

a
b

Thus `a b' is produced, since line ends normally produce a space.

The function can also take an optional number as its first argument; it is interpreted as referring to a catcode table (as defined by \initcatcodetable and \savecatcodetable), and each line is processed by TeX with that catcode regime. For instance (note that \savecatcodetable stores the catcodes in force at that point, so table 1 below holds the current regime plus _ as an escape character):

\bgroup
\initcatcodetable1
\catcode`\_=0
\savecatcodetable1
\egroup

\directlua{tex.print(1, "_TeX")}

The string will be read with _ as an escape character, and thus interpreted as the command commonly known as \TeX. The catcode regime holds only for the strings passed to tex.print() and the rest of the document isn't affected.

If the optional number is -1, or points to an invalid (i.e. undefined) catcode table, then the strings are processed with the current catcodes, as if there was no optional argument. If it is -2, then the strings are read as if the result of \detokenize: all characters have catcode 12 (i.e. `other', characters that have no function beside representing themselves), except space, which has catcode 10 (as usual).
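A small sketch of the -2 case follows; the string is arbitrary and chosen only because it contains characters that are normally special to TeX.

```tex
\directlua{tex.sprint(-2, "$x_1 & y^2$")}
```

Since every character except the space is read with catcode 12, TeX should treat the dollar signs, underscore, ampersand and circumflex as ordinary characters instead of entering math mode. How they look on the page depends on the font in use.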

tex.sprint()

Like tex.print(), this function can receive either one or more strings or an array of strings, with an optional number as its first argument pointing to a catcode table. Unlike tex.print(), however, each string is processed as if TeX were in the middle of a line and not at the beginning of a new one: spaces aren't skipped, no end-of-line character is added and trailing spaces aren't ignored. Thus:

tex.sprint("a", "b")

is interpreted by TeX as

ab

without any space in between.

tex.tprint()

This function takes an unlimited number of tables as its arguments; each table must be an array of strings, with the first entry optionally being a number pointing to a catcode table. Then each table is processed as if passed to tex.sprint(). Thus:

tex.tprint({1, "a", "b"}, {"c", "d"})

is equivalent to

tex.sprint(1, "a", "b")
tex.sprint("c", "d")

debug note

Note that some Lua tables exist only when Lua runs inside lua(la)tex itself; one reference is in functionref.tex: "\type{luatex} is a typesetter; \type{texlua} and \type{luatex --luaonly} are lua interpreters. In lua interpreter mode, the lua tables \type{tex}, \type{token}, \type{node}, and \type{pdf} are unavailable."

For utilizing lua-specific debug techniques, see

You can use commands like these in your .tex file (\typeout is a LaTeX command):

\typeout{==\directlua{for k,v in pairs(tex) do print(k,v) end}==}
\typeout{==\directlua{for k,v in pairs(lua) do print(k,v) end}==}
\typeout{==\directlua{print(debug.traceback(1))}==}

... and the output of Lua's print will be sent to stdout; the other characters in \typeout above merely serve as delimiters in the output, which will look something like:

tprint	function: 0x89086e8
box	table: 0x895a0a0
getsfcode	function: 0x895ae10
pdffontsize	function: 0x895b048
shipout	function: 0x895b2d8
getmath	function: 0x895b380
setuccode	function: 0x895ae48
getskip	function: 0x8908860
pdfxformname	function: 0x895b170
setlist	function: 0x895ab38
setattribute	function: 0x8908898
count	table: 0x895b600
getuccode	function: 0x895ae80
getbox	function: 0x895ab00
uniformdeviate	function: 0x895b080
fontname	function: 0x895af58
primitives	function: 0x895b220
badness	function: 0x895b310
setmath	function: 0x895b348
setskip	function: 0x8908828
getmathcode	function: 0x895ada0
getcount	function: 0x8908950
toks	table: 0x895b6b0
setsfcode	function: 0x895add8
run	function: 0x8908638
sfcode	table: 0x895a130
pdfpageref	function: 0x895b138
setcount	function: 0x8908918
setdimen	function: 0x89087b8
print	function: 0x89086d0
setlccode	function: 0x895acf8
fontidentifier	function: 0x895af90
getlccode	function: 0x895ad30
setmathcode	function: 0x895ad68
round	function: 0x895aeb8
getdimen	function: 0x89087f0
write	function: 0x89086b8
set	function: 0x8908770
definefont	function: 0x895b1b0
dimen	table: 0x895b550
get	function: 0x89087a0
number	function: 0x895b0c0
nest	table: 0x895a600
lists	table: 0x895a550
setcatcode	function: 0x895ac18
delcode	table: 0x895a4a0
setnest	function: 0x895aba8
mathcode	table: 0x895a3f0
getdelcode	function: 0x895acc0
catcode	table: 0x895a340
pdffontname	function: 0x895afd0
uccode	table: 0x895a290
setdelcode	function: 0x895ac88
lccode	table: 0x895a1e0
sp	function: 0x895af28
attribute	table: 0x895b410
getcatcode	function: 0x895ac50
extraprimitives	function: 0x895b258
linebreak	function: 0x895b3b8
setbox	function: 0x895aac8
skip	table: 0x895b4c0
pdffontobjnum	function: 0x895b008
settoks	function: 0x8908988
gettoks	function: 0x895aa90
enableprimitives	function: 0x895b298
getlist	function: 0x895ab70
getnest	function: 0x895abe0
getattribute	function: 0x89088d8
romannumeral	function: 0x895b0f8
scale	function: 0x895aef0
hashtokens	function: 0x895b1e8
error	function: 0x8908720
finish	function: 0x8908668

====
setluaname	function: 0x895efb8
setbytecode	function: 0x895f028
name	table: 0x895ef50
getluaname	function: 0x895ef80
getbytecode	function: 0x895eff0
version	Lua 5.1
bytecode	table: 0x895f080
====
1
stack traceback:
	<\directlua >:1: in main chunk
====

Note that trying to print Lua's debug.traceback with the tex.print variants, e.g.:

\directlua{tex.print(debug.traceback())}

... will fail:

! Missing number, treated as zero.
<to be read again> 
                   >
l.1 \directlua{tex.print(debug.traceback())}

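... because the debug.traceback output contains the chunk name <\directlua >: TeX reads the \directlua in it as a command and, finding a ">" character where a number or an opening brace is expected, chokes on it.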
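One way around this, sketched below, is to keep the traceback away from TeX's input altogether and write it to the log with texio.write_nl, which is used elsewhere on this page.

```tex
\directlua{texio.write_nl(debug.traceback())}
```

Since the text is never read back by TeX, the control sequence embedded in the chunk name is harmless here.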

The expansion of \directlua

A call to \directlua is fully expandable; i.e. it can be used in contexts where full expansion is required, as in:

\csname\directlua{tex.print("TeX")}\endcsname

which is a somewhat convoluted way of saying \TeX. Besides, since Lua code is processed at once, things that were previously unthinkable can now be done easily. For instance, it is impossible to perform an assignment in an \edef by TeX's traditional means. I.e. the following:

\edef\macro{\count1=5}

defines \macro as \count1=5 but doesn't perform the assignment (the \edef does nothing more than a simple \def). After the definition, the value of \count1 hasn't changed. The same is not true, though, if such an assignment is made with Lua code. The following:

\edef\macro{\directlua{tex.count[1] = 5}}

defines \macro as empty (since nothing remains after \directlua has been processed) and sets count 1 to 5. Since such behavior is totally unexpected in normal TeX, one should be wary when using \directlua in such contexts.
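Both effects can be checked with a short sketch; it uses the scratch register \count255 instead of \count1, an arbitrary choice made so that no register in use elsewhere is disturbed.

```tex
\count255=0
\edef\macro{\directlua{tex.count[255] = 5}}
% Show the register's value and the replacement text of \macro.
\message{count255=\the\count255, macro=[\meaning\macro]}
```

If the description above holds, the message should show the register set to 5 and \macro with an empty replacement text.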