Difference between revisions of "Traversing tokens"
(Copied from the old bluwiki.com luatex wiki)
Latest revision as of 09:40, 8 December 2010
Traversing (or tracing) tokens
Add the following where you want to start tracing tokens:
\directlua{
  callback.register('token_filter',
    function()
      local t = token.get_next()
      if (t[3]==0) then
        texio.write_nl('term and log', 'CHAR TOKEN char=' .. unicode.utf8.char(t[2]) .. ' catcode=' .. t[1])
      else
        texio.write_nl('term and log', 'CSEQ TOKEN name=' .. token.command_name(t) .. ' code=' .. t[1] .. ' id=' .. t[3])
      end
      return t
    end
  )
}
The anonymous function reads each token with token.get_next(), processes it, and then hands it (or a different token) to TeX as its return value. The function unicode.utf8.char takes a numeric character code and returns the corresponding UTF-8 character. Every function in Lua's string library has an equivalent in the unicode.utf8 library, which LuaTeX loads.
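To illustrate how unicode.utf8 mirrors the string library, here is a minimal sketch that builds a string from character codes and compares its length in characters and in bytes. The fallback to Lua 5.3's utf8 library is an assumption added here so the sketch also runs in a plain Lua interpreter; the names u8, s, chars and bytes are only illustrative.

```lua
-- Use LuaTeX's unicode.utf8 when present; otherwise fall back to the
-- utf8 library of Lua 5.3+ (an assumption, not part of the original page).
local u8 = (unicode and unicode.utf8) or utf8

-- Build a string from the character codes 98 ('b'), 108 ('l'), 228 ('ä').
local s = u8.char(98, 108, 228)

local chars = u8.len(s)      -- length in characters
local bytes = string.len(s)  -- length in bytes; 'ä' takes two bytes in UTF-8

print(s, chars, bytes)
```

The character count and the byte count differ as soon as the string contains a non-ASCII character, which is why the tracing function uses unicode.utf8.char rather than string.char.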
When I apply this to the file
\font\x=omlgc \x VAT bl? bla bla bla
I get the following:
This is luaTeX, Version 3.141592-snapshot-2007062922 (Web2C 7.5.6) (format= 2007.6.25) 4 JUL 2007 00:14
**&plain test
(
CHAR TOKEN char= catcode=10
CSEQ TOKEN name=register code=93 id=1115710
CSEQ TOKEN name=expand_after code=118 id=1114234
CHAR TOKEN char== catcode=12
CHAR TOKEN char=o catcode=11
CHAR TOKEN char=o catcode=11
CHAR TOKEN char=m catcode=11
CHAR TOKEN char=l catcode=11
CHAR TOKEN char=g catcode=11
CHAR TOKEN char=c catcode=11
CHAR TOKEN char= catcode=10
CSEQ TOKEN name=def_font code=92 id=1114234
CSEQ TOKEN name=def_font code=92 id=1114234
CSEQ TOKEN name=def_font code=92 id=1114234
CSEQ TOKEN name=def_font code=92 id=1114234
CSEQ TOKEN name=def_font code=92 id=1114234
CHAR TOKEN char=V catcode=11
CHAR TOKEN char=V catcode=11
CHAR TOKEN char=A catcode=11
CHAR TOKEN char=T catcode=11
CHAR TOKEN char= catcode=10
CHAR TOKEN char=b catcode=11
CHAR TOKEN char=l catcode=11
CHAR TOKEN char=? catcode=12
CHAR TOKEN char= catcode=10
CHAR TOKEN char=b catcode=11
CHAR TOKEN char=l catcode=11
CHAR TOKEN char=a catcode=11
CHAR TOKEN char= catcode=10
CHAR TOKEN char=b catcode=11
CHAR TOKEN char=l catcode=11
CHAR TOKEN char=a catcode=11
CHAR TOKEN char= catcode=10
CHAR TOKEN char=b catcode=11
CHAR TOKEN char=l catcode=11
CHAR TOKEN char=a catcode=11
CHAR TOKEN char= catcode=10
CSEQ TOKEN name=par_end code=13 id=1114870
...
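The trace keeps running for as long as the callback stays registered. A minimal sketch for switching it off again at a chosen point, assuming the usual LuaTeX convention that registering nil for a callback name clears it:

```tex
\directlua{ callback.register('token_filter', nil) }
```

The tokens of this line itself will presumably still show up in the log, since they pass through the filter before the Lua code is executed.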
If you are wondering why the letters o and V appear twice, here are the answers I received from Taco:
Why do I get the o character token twice?
This is because of how the filename scanning works: it has to pre-scan the first character to decide between the two allowed syntaxes:
\font\f=omlgc
and
\font\f={omlgc}
Why do I get the V character token twice?
Because it is a character encountered in vertical mode (vmode). TeX switches to horizontal mode (hmode), then re-reads the character.
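One way to check this explanation is to start the paragraph explicitly before the first letter. The following variant of the test file is only a sketch and has not been run against this LuaTeX snapshot: \noindent starts the paragraph itself, so TeX should already be in horizontal mode when it reaches V and would have no reason to re-read it.

```tex
\font\x=omlgc \x \noindent VAT bl? bla bla bla
```

If the explanation holds, the trace for this file should list the V character token only once, preceded by a control sequence token for \noindent.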
Yannis 04:40, 4 July 2007 (EDT)