Difference between revisions of "TeX without TeX"

From LuaTeXWiki

Revision as of 21:04, 3 July 2011

Use TeX's power without TeX macros

With LuaTeX you have access to most of TeX's capabilities for creating PDF documents. Most of the LuaTeX reference manual is about this topic.

The underlying idea is to build TeX's internal data structures directly. TeX normally transforms macros and primitives into objects called nodes, and these nodes are then transformed into instructions for the PDF file (PDF objects). The following sections therefore deal with node creation and node structures.

Let's start with a rule node. When you write \hrule width 20pt height 10pt depth 10pt in TeX, a node of type rule will be created, which has three fields: height, depth and width. We can visualize this node by a simple box like this:


The same can be done directly in Lua:

r = node.new("rule")
r.width  = 20 * 65536
r.height = 10 * 65536
r.depth  = 10 * 65536

Some remarks: the size is in scaled points, and 65536 (= 2^16) scaled points are 1 point, so 20pt becomes 20 * 2^16. Not shown in the example are (among others) the next and prev fields. These fields hold pointers (node addresses) to the following and the previous node. A node that exists by itself has its prev and next fields set to nil.
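
The conversion described above can be wrapped in two small helpers. This is a minimal sketch in plain Lua; pt_to_sp and sp_to_pt are hypothetical names and not part of the LuaTeX API.

```lua
-- Hypothetical helpers (not part of LuaTeX): convert between points and
-- scaled points. 65536 (= 2^16) scaled points make up 1 point.
local SP_PER_PT = 65536

function pt_to_sp(pt)
  -- round to the nearest whole scaled point
  return math.floor(pt * SP_PER_PT + 0.5)
end

function sp_to_pt(sp)
  return sp / SP_PER_PT
end

print(pt_to_sp(20))  -- 1310720, the value used for the rule width above
```

With such a helper the rule width could be written as r.width = pt_to_sp(20) instead of 20 * 65536.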

A slightly more complicated example is to create two rules and pack them into a vertical box. In plain TeX this would look like this:

\vbox {\hrule width 20pt height 10pt depth 10pt
           \hrule width 20pt height 10pt depth 10pt }

This would generate a node list like:


The same can be done directly in Lua:

r1 = node.new("rule")
r1.width  = 20 * 65536
r1.height = 10 * 65536
r1.depth  = 10 * 65536

r2 = node.new("rule")
r2.width  = 20 * 65536
r2.height = 10 * 65536
r2.depth  = 10 * 65536

r1.next = r2
r2.prev = r1

vbox = node.vpack(r1)

The result is a vbox containing the two rule nodes. The nodes are connected by their next and prev pointers, so the vbox contains them both. The size (width, height and depth) is adjusted automatically.

This also applies to all other things you can put on a TeX page: glyphs, horizontal boxes, math, images, hyperlinks, glue and so on. For a complete list of the nodes, see the reference manual, Chapter 8 "Nodes".
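
Linking r1 and r2 by hand, as in the vbox example, gets tedious for longer lists. The following sketch shows one way to chain any number of nodes. The name link_nodes is hypothetical, and plain Lua tables stand in for real nodes so that the snippet runs in any Lua interpreter. The helper only touches the next and prev fields, so it should behave the same with nodes created by node.new().

```lua
-- Hypothetical helper: chain all arguments through next/prev and
-- return the first one (the head of the list).
function link_nodes(...)
  local head = select(1, ...)
  local prev = head
  for i = 2, select("#", ...) do
    local cur = select(i, ...)
    prev.next = cur
    cur.prev = prev
    prev = cur
  end
  return head
end

-- Plain tables used as stand-ins for node.new("rule")
local r1 = { width = 20 * 65536 }
local r2 = { width = 20 * 65536 }
local r3 = { width = 20 * 65536 }

local head = link_nodes(r1, r2, r3)
print(head == r1, r1.next == r2, r3.prev == r2)  -- true  true  true
```

Inside LuaTeX the returned head could then be passed to node.vpack(), just like r1 in the example above.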

Beyond reports and articles

Perhaps the question that comes to your mind is: why should I bother writing Lua code instead of TeX code? For many documents, including articles, reports and books, the classic way to create a PDF with TeX is to use LaTeX or ConTeXt. But there are applications that use TeX as a fully automatic PDF generator. If you want to create data sheets or product listings, your normal input is not a classic document, but most likely an Excel spreadsheet or an XML file, perhaps extracted from a database. These documents contain no TeX markup, so you don't need the TeX interpreter to read them. You more likely need an XML parser or a spreadsheet 2 ... converter.

When you use regular LaTeX or ConTeXt code to put the contents of the database in your PDF (database publishing), you run into several problems:

  • Exact positioning of items on the page can be troublesome
  • Catcodes and command escapes have to be considered and are sure to get in your way
  • TeX is no fun to program in (well it is, but only for some people) - calculations, tests, exception handling, control flow
  • The packages that exist may reach their limit soon. You need a super flexible image-between-columns-with-parshape package? Write it yourself.

With Lua you have a very decent programming language that is fun to program in. When you use the Lua interface to TeX, you have full control of what is happening, which is usually the level of control you need for high quality output.

One approach is to create a stub TeX file and let Lua do the rest:


The file myprogram.lua then contains all the code to a) read the source file(s), b) extract the information and create the appropriate nodes and c) instruct TeX to create PDF pages from these nodes.

In a future version of LuaTeX you will be able to run the Lua interpreter without the stub TeX file above.

A first example

The example document given below creates two pages by using Lua code alone. You will learn how to access TeX's boxes and counters from the Lua side, ship out a page into the PDF file, create horizontal and vertical boxes (hbox and vbox), create new nodes and manipulate the link structure of the nodes. The example covers the following node types: rule, whatsit, vlist, hlist and action.

In the example code we use black squares as the contents of the pages and not normal text, because character handling takes more code and will be covered later on.

Save the following file as myprogram.lua and use the TeX stub from above.

  -- this will hold the items that go onto the page
  local pagelist

  -- Call tex.shipout() with the contents of the pagelist
  function shipout()
    local vbox,b = node.vpack(pagelist) -- we ignore the badness 'b'
    tex.box[666] = vbox
    tex.shipout(666)
    pagelist = nil
    -- Not strictly necessary. TeX holds the current page number in counter 0.
    -- TeX displays the contents of this counter when it puts a page into
    -- the pdf (tex.shipout()). If we don't change the counter, TeX will
    -- display [1] [1], instead of [1] [2] for our two page document.
    tex.count[0] = tex.count[0] + 1
  end

  function add_to_page( list )
    -- We attach the nodelist 'list' to the end of the pagelist
    -- if pagelist doesn't exist, 'list' is our new pagelist
    -- if it exists, we go to the end with node.tail() and adjust
    -- the prev and next pointers, so list becomes part
    -- of pagelist.
    if not pagelist then
      pagelist = list
    else
      local tail = node.tail(pagelist)
      tail.next = list
      list.prev = tail
    end
  end

-- This creates a new square rule and returns the pointer to it.
function mkrule( size )
  local r = node.new("rule")
  r.width  = size
  r.height = size / 2
  r.depth  = size / 2
  return r
end

  local destcounter = 0
  -- Create a pdf anchor (dest object). It returns a whatsit node and the
  -- number of the anchor, so it can be used in a pdf link or an outline.
  function mkdest()
    destcounter = destcounter + 1
    local d = node.new("whatsit","pdf_dest")
    d.named_id = 0
    d.dest_id = destcounter
    d.dest_type = 3
    return d, destcounter
  end

-- Take a list of nodes and put them into an hbox. The prev and next fields
-- of the nodes will be set automatically. Return a pointer to the hbox.
function hpack( ... )
  local start, tmp, cur
  start = select(1,...)
  tmp = start
  for i=2,select("#",...) do
    cur = select(i,...)
    tmp.next = cur
    cur.prev = tmp
    tmp = cur
  end
  local h,b = node.hpack(start) -- ignore badness
  return h
end

local tenpt = 10 * 2^16

-- page 1
local n,dest = mkdest() -- dest is needed for the link to this anchor
add_to_page(n)
add_to_page(mkrule(2 * tenpt))

-- The pagelist contains a pdf dest node (a link destination) and a rule of size 20pt x 20pt.
shipout()

-- page 2
-- This is the page with the link to the anchor (dest) on page one. A
-- link consists of three nodes: a pdf_start_link, a pdf_end_link and an
-- action node that specifies the action to perform when the user clicks on
-- the link.
-- The pdf link must be inside a horizontal box, that's why we hpack() it.
-- The link_attr (link attributes) is optional, here it draws a yellowish border
-- around the link.
local start_link = node.new("whatsit","pdf_start_link")
local end_link   = node.new("whatsit","pdf_end_link")

start_link.width     = tenpt
start_link.height    = tenpt / 2
start_link.depth     = tenpt / 2
start_link.link_attr = "/C [0.9 1 0] /Border [0 0 2]"

start_link.action = node.new("action")
start_link.action.action_type = 1
start_link.action.action_id   = dest

local rule = mkrule(tenpt)
local hbox = hpack(start_link, rule, end_link)
add_to_page(hbox)

-- This pagelist consists of an hbox whose contents is "start_link",
-- the 10pt x 10pt rule and the "end_link" node.
shipout()

-- Just to show you that you can get some memory usage statistics:
print(string.format("\nnode_mem_usage=%s", status.node_mem_usage))

The idea is to create nodes, collect them in a list (here: pagelist) and use tex.shipout() to put the page into the PDF.

And what about real content?

Black squares won't make anyone happy. So we need a typeset paragraph. For that we start with a single glyph and a glue first. A glyph node has this structure:


whereas a glue is more complicated:


The glue item comes in pairs: a glue node and a glue_spec node. The information about the shrink/stretch values goes into the glue_spec node. For example, the TeX glue 2pt plus 1fill has a width of 2 * 2^16, a stretch of 2^16 and a stretch_order of 3.
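
To make the numbers in the glue example concrete, here is a small sketch in plain Lua that computes the three values for a glue such as 2pt plus 1fill. The function glue_fields and the table GLUE_ORDER are hypothetical names. Only the value 3 for fill comes from the text above; the other order values are an assumption based on the usual TeX ordering and should be checked against the reference manual.

```lua
-- Order values: fill = 3 is stated in the text; the other entries are an
-- assumption and should be verified against the reference manual.
local GLUE_ORDER = { normal = 0, fi = 1, fil = 2, fill = 3, filll = 4 }

-- Hypothetical helper: return the field values for a glue with the given
-- natural width (in pt) and stretch component, e.g. "2pt plus 1fill".
function glue_fields(width_pt, stretch, stretch_unit)
  local order = GLUE_ORDER[stretch_unit or "normal"]
  assert(order, "unknown stretch unit")
  return {
    width         = width_pt * 65536,
    stretch       = (stretch or 0) * 65536,
    stretch_order = order,
  }
end

local spec = glue_fields(2, 1, "fill")
-- width is 2 * 2^16, stretch is 2^16, stretch_order is 3
print(spec.width, spec.stretch, spec.stretch_order)
```

The resulting values would then be assigned to the corresponding fields of the glue_spec node.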

local g = node.new("glyph")
g.font = font.current()
g.lang = tex.language
g.char = 86 -- V

local hbox = node.hpack(g)
local vbox = node.vpack(hbox)
node.write(vbox)


We create a glyph, pack it into an hbox (horizontal box), pack that hbox into a vbox (a vertical box) and write this box to the main vertical list. The char number (86 in the example above) refers to a position in the font's internal encoding, so we have to keep track of which character sits at which position. Now we do this for two glyphs:

local g1 = node.new("glyph")
g1.font = font.current()
g1.lang = tex.language
g1.char = 86

local g2 = node.new("glyph")
g2.font = font.current()
g2.lang = tex.language
g2.char = 97

g1.next = g2
g2.prev = g1

local hbox = node.hpack(g1)
local vbox = node.vpack(hbox)
node.write(vbox)


Pretty much the same as above. The glyphs g1 and g2 are chained together by setting their next and prev pointers to each other; otherwise only glyph g1 gets into the hbox.

If you take a close look at the PDF, you see that the two glyphs are too far apart: a (negative) kern should be inserted between them. You can insert a kern node manually, or you can ask TeX to do that for you. The last lines of the example above should then read:

local head,tail,success = node.kerning(g1)
local hbox = node.hpack(head)
local vbox = node.vpack(hbox)
node.write(vbox)
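
The char numbers used in the glyph examples (86 and 97) do not have to be looked up by hand. The sketch below is plain Lua, char_codes is a hypothetical name, and it assumes that the font's slots match the byte values of the input for basic Latin letters. That assumption does not hold for every font encoding.

```lua
-- Hypothetical helper: return the byte values of a string as a table.
-- Assumption: the current font maps these values to the matching letters.
function char_codes(text)
  return { string.byte(text, 1, -1) }
end

local codes = char_codes("Va")
print(codes[1], codes[2])  -- 86  97
```

Each value could then be assigned to the char field of a new glyph node.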


(todo: glyph subtype, ligaturing)