Traversing tokens

From LuaTeXWiki

Traversing (or tracing) tokens

Add the following where you want to start tracing tokens:

\directlua{
callback.register('token_filter',
function()
	local t = token.get_next()
	if t[3] == 0 then
		texio.write_nl('term and log', 'CHAR TOKEN char=' .. unicode.utf8.char(t[2]) .. ' catcode=' .. t[1])
	else
		texio.write_nl('term and log', 'CSEQ TOKEN name=' .. token.command_name(t) .. ' code=' .. t[1] .. ' id=' .. t[3])
	end
	return t
end)
}

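The anonymous function reads each token with token.get_next(), logs it, and then hands it (or a different token) back to TeX through its return value. The token is a table of three numbers: when t[3] is 0 the token is a character token, with t[2] holding the character code and t[1] the category code; otherwise it is a control sequence, and t[1] and t[3] are logged as its command code and id. The function unicode.utf8.char takes a numeric code point and returns the corresponding UTF-8 character. Every function in Lua's string library has an equivalent in the unicode.utf8 library, which LuaTeX loads automatically.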
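The log lines built in the callback can be reproduced outside LuaTeX, which makes it easier to experiment with the formatting. The following is only a sketch: utf8_char is a hand-written stand-in for unicode.utf8.char (single code point only), describe_token is a hypothetical helper, and the control sequence name is passed in as a plain string because token.command_name is not available in standalone Lua. The value 0 used for t[2] of the control sequence is a placeholder, not something taken from the log.

```lua
-- Stand-in for unicode.utf8.char: encode one code point as UTF-8.
-- (Assumption: written by hand so the sketch runs in plain Lua.)
local function utf8_char(cp)
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + math.floor(cp / 0x40),
                       0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    return string.char(0xE0 + math.floor(cp / 0x1000),
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  else
    return string.char(0xF0 + math.floor(cp / 0x40000),
                       0x80 + math.floor(cp / 0x1000) % 0x40,
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
end

-- Hypothetical helper: build the same log line as the callback.
-- t is a three-element table as in the callback; command_name is
-- supplied by the caller instead of token.command_name(t).
local function describe_token(t, command_name)
  if t[3] == 0 then
    return 'CHAR TOKEN char=' .. utf8_char(t[2]) .. ' catcode=' .. t[1]
  else
    return 'CSEQ TOKEN name=' .. command_name .. ' code=' .. t[1] .. ' id=' .. t[3]
  end
end

print(describe_token({11, 86, 0}))                   -- a letter V
print(describe_token({92, 0, 1114234}, 'def_font'))  -- a control sequence
```

Inside LuaTeX itself the stand-in is unnecessary, since unicode.utf8.char is already loaded.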

When I apply this to the file

\font\x=omlgc
\x VAT bl? bla bla bla 

I get the following:

This is luaTeX, Version 3.141592-snapshot-2007062922 (Web2C 7.5.6) (format= 2007.6.25)  4 JUL 2007 00:14
**&plain test
(
CHAR TOKEN char=  catcode=10
CSEQ TOKEN name=register code=93 id=1115710
CSEQ TOKEN name=expand_after code=118 id=1114234
CHAR TOKEN char== catcode=12
CHAR TOKEN char=o catcode=11
CHAR TOKEN char=o catcode=11
CHAR TOKEN char=m catcode=11
CHAR TOKEN char=l catcode=11
CHAR TOKEN char=g catcode=11
CHAR TOKEN char=c catcode=11
CHAR TOKEN char=  catcode=10
CSEQ TOKEN name=def_font code=92 id=1114234
CSEQ TOKEN name=def_font code=92 id=1114234
CSEQ TOKEN name=def_font code=92 id=1114234
CSEQ TOKEN name=def_font code=92 id=1114234
CSEQ TOKEN name=def_font code=92 id=1114234
CHAR TOKEN char=V catcode=11
CHAR TOKEN char=V catcode=11
CHAR TOKEN char=A catcode=11
CHAR TOKEN char=T catcode=11
CHAR TOKEN char=  catcode=10
CHAR TOKEN char=b catcode=11
CHAR TOKEN char=l catcode=11
CHAR TOKEN char=? catcode=12
CHAR TOKEN char=  catcode=10
CHAR TOKEN char=b catcode=11
CHAR TOKEN char=l catcode=11
CHAR TOKEN char=a catcode=11
CHAR TOKEN char=  catcode=10
CHAR TOKEN char=b catcode=11
CHAR TOKEN char=l catcode=11
CHAR TOKEN char=a catcode=11
CHAR TOKEN char=  catcode=10
CHAR TOKEN char=b catcode=11
CHAR TOKEN char=l catcode=11
CHAR TOKEN char=a catcode=11
CHAR TOKEN char=  catcode=10
CSEQ TOKEN name=par_end code=13 id=1114870
...

If you are wondering why the letters o and V appear twice, here are the answers I received from Taco:

Why do I get the o character token twice?

This is because of how the filename scanning works: it has to pre-scan the first character to decide between the two allowed syntaxes:

  \font\f=omlgc

and

  \font\f={omlgc}
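The effect of such a pre-scan on a token filter can be illustrated with a toy model. This is not LuaTeX's actual implementation: make_reader and scan_filename are hypothetical names, the "tokens" are plain characters, and the sketch assumes that a pre-scanned character which is not a brace is put back and read a second time, so that every read passes through the filter.

```lua
-- Toy model (an assumption, not LuaTeX internals): a character reader
-- with a pushback stack; every read is recorded, as a filter would see it.
local function make_reader(chars)
  local pos, pushed, seen = 0, {}, {}
  local reader = { seen = seen }

  function reader.get()
    local c
    if #pushed > 0 then
      c = table.remove(pushed)   -- re-read a character that was put back
    else
      pos = pos + 1
      c = chars:sub(pos, pos)    -- '' once the input is exhausted
    end
    seen[#seen + 1] = c          -- what the tracing filter would log
    return c
  end

  function reader.back(c)
    pushed[#pushed + 1] = c
  end

  return reader
end

-- Hypothetical filename scanner: look at the first character to choose
-- between the unbraced and the braced syntax.
local function scan_filename(reader)
  local first = reader.get()
  local braced = (first == '{')
  if not braced then
    reader.back(first)           -- not a brace: put it back, scan normally
  end
  local name = {}
  while true do
    local c = reader.get()
    if c == '' or c == ' ' or (braced and c == '}') then break end
    name[#name + 1] = c
  end
  return table.concat(name)
end

local r = make_reader('omlgc ')
local name = scan_filename(r)
print(name, table.concat(r.seen))
```

In this model the unbraced name is scanned as omlgc while the recorded reads are o, o, m, l, g, c and a space, which mirrors the doubled o in the log above. With the braced input {omlgc} each character is recorded only once.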

Why do I get the V character token twice?

Because it is a character encountered in vertical mode (vmode). TeX switches to horizontal mode (hmode), then re-reads the character.

Yannis 04:40, 4 July 2007 (EDT)