Process input buffer

From LuaTeXWiki

= Syntax =
  
 
This [[Callbacks|callback]] is called whenever TeX needs a new input line from the file it is currently reading. The argument passed to the function registered there is a string representing the original input line; the function should return another string, to be processed by TeX, or <tt>nil</tt>; in the latter case, the original string is processed. So the function should be defined according to the following blueprint:
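A minimal sketch of such a function, assuming the <tt>callback.register</tt> interface of LuaTeX; the name <tt>process_line</tt> is illustrative, and the rule it applies (leaving empty lines alone) is only there to show both kinds of return value:

```lua
-- Hypothetical callback: receives the current input line as a string.
local function process_line(line)
  if line == "" then
    -- Returning nil tells TeX to process the original line unchanged.
    return nil
  end
  -- Returning a string replaces the line that TeX will process.
  return line
end

-- callback.register exists only inside LuaTeX; the guard lets this
-- sketch load in a standalone Lua interpreter as well.
if callback then
  callback.register("process_input_buffer", process_line)
end
```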
  
  
= Examples =
 
 
== Encoding ==
 
  
 
LuaTeX understands UTF-8 and UTF-8 only (ASCII is a part of it, however). A line containing invalid characters will produce an error message: <em>String contains an invalid utf-8 sequence.</em> Thus documents with different encoding must be converted, and this can be done in <tt>process_input_buffer</tt>.
</pre>
  
=== Latin-1 ===
  
 
Reading a document written in Latin-1 (ISO-8859-1) is relatively straightforward, because the <tt>slnunicode</tt> library embedded in LuaTeX does the conversion at once.
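The conversion can be sketched as follows. This is an illustration under assumptions, not necessarily the code used on this page: it calls <tt>unicode.utf8.char</tt> when the <tt>slnunicode</tt> library is available and otherwise falls back to a hand-written encoder (<tt>latin1_byte_to_utf8</tt>, a hypothetical helper), relying on the fact that a Latin-1 byte value equals the Unicode code point of the character.

```lua
-- Encode a Latin-1 byte value (0-255) as UTF-8 without any library.
-- In Latin-1 the byte value equals the Unicode code point.
local function latin1_byte_to_utf8(b)
  if b < 128 then
    return string.char(b)            -- ASCII: unchanged
  end
  -- Code points 128-255 need two bytes: 110xxxxx 10xxxxxx.
  return string.char(192 + math.floor(b / 64), 128 + b % 64)
end

-- Prefer slnunicode when the interpreter provides it (assumption:
-- the global `unicode` table exists only in such builds).
local utf8_char = (unicode and unicode.utf8 and unicode.utf8.char)
  or latin1_byte_to_utf8

local function convert_char(char)
  return utf8_char(string.byte(char))
end

local function convert_latin1(line)
  -- "." matches every single byte; the outer parentheses drop the
  -- substitution count that gsub returns as a second value.
  return (string.gsub(line, ".", convert_char))
end

-- Registration only makes sense inside LuaTeX.
if callback then
  callback.register("process_input_buffer", convert_latin1)
end
```

Pure ASCII lines pass through unchanged, so the callback is harmless on files that contain no accented characters.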
 
The use of <tt>local</tt> variables ensures speed, and above all that those variables aren't defined outside the current chunk, for instance, the current <tt>\directlua</tt> call or the current Lua file; actually, the code could even be embedded between <tt>do</tt> and <tt>end</tt> and leave absolutely no trace whatsoever.
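The effect of <tt>do</tt> ... <tt>end</tt> can be sketched like this. All names are hypothetical, and the global <tt>line_counter</tt> is created on purpose, only so that the result can be inspected from outside the block; real code would omit it and leave no trace at all.

```lua
do
  local calls = 0                    -- visible only inside this block

  local function count_lines(line)
    calls = calls + 1
    return nil                       -- leave the input line untouched
  end

  -- Registration only makes sense inside LuaTeX.
  if callback then
    callback.register("process_input_buffer", count_lines)
  end

  -- Deliberate global handle, for demonstration purposes only.
  line_counter = count_lines
end

-- Here `calls` and `count_lines` are no longer accessible: both names
-- now refer to undefined globals and evaluate to nil.
```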
  
=== Other 8-bit encodings ===
  
 
When using another 8-bit encoding, the previous code won't work, because it handles Latin-1 only. One must then convert the characters one by one, by setting up a table that maps each input byte to its Unicode value; that value can be passed to <tt>unicode.utf8.char</tt> to yield the desired character.
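As an illustration, here is a sketch for ISO-8859-15 (Latin-9), chosen because it differs from Latin-1 in only eight positions, so the table stays short; a full table for another encoding would list all 128 upper bytes. The names <tt>latin9</tt>, <tt>utf8_char</tt> and <tt>convert_latin9</tt> are hypothetical, and the pure-Lua encoder is a fallback added here for interpreters without <tt>unicode.utf8.char</tt>.

```lua
-- Encode a code point below 0x10000 as UTF-8 in plain Lua.
local function utf8_char(cp)
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40)
  else
    return string.char(0xE0 + math.floor(cp / 0x1000),
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
end

-- Use slnunicode when present, the fallback above otherwise.
local to_utf8 = (unicode and unicode.utf8 and unicode.utf8.char)
  or utf8_char

-- Partial table: the bytes where ISO-8859-15 differs from Latin-1.
local latin9 = {
  [0xA4] = 0x20AC, -- euro sign
  [0xA6] = 0x0160, -- S with caron
  [0xA8] = 0x0161, -- s with caron
  [0xB4] = 0x017D, -- Z with caron
  [0xB8] = 0x017E, -- z with caron
  [0xBC] = 0x0152, -- OE ligature
  [0xBD] = 0x0153, -- oe ligature
  [0xBE] = 0x0178, -- Y with diaeresis
}

local function convert_char(char)
  local byte = string.byte(char)
  -- Bytes missing from the table keep their own value, which is
  -- correct here because the rest of Latin-9 coincides with Latin-1.
  return to_utf8(latin9[byte] or byte)
end

local function convert_latin9(line)
  return (string.gsub(line, ".", convert_char))
end

if callback then
  callback.register("process_input_buffer", convert_latin9)
end
```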
  
  
== TeX as a lightweight markup language ==
  
 
The <tt>process_input_buffer</tt> callback can be put to an entirely different use, namely preprocessing input strings written in some kind of lightweight markup and turning them into proper TeX.
</pre>
  
so as to produce: `This is in <em>italic</em> and this is in <strong>bold</strong>.' The following code does exactly that (the use of the percent sign and of <tt>\\</tt> without <tt>\noexpand</tt> means that this code is either in a Lua file or in the second version of <tt>\luacode</tt> as defined in [[Writing Lua in TeX#Backslash|Writing Lua in TeX]]; also, this code illustrates the registering of an anonymous function instead of a function variable as in the previous examples):
  
 
<pre>
</pre>
  
What happens is that the original string is replaced by successive calls to <tt>string.gsub</tt> (see [http://www.lua.org/manual/5.1/manual.html#pdf-string.gsub the Lua reference manual]), in which the captures in the patterns are replaced with themselves as arguments to the TeX function (the non-capture parts of the patterns are discarded). For instance, <tt>/a word/</tt> yields <tt>\italic{a word}</tt>. Note that with <tt>\bold</tt>, the asterisks in the pattern must be escaped with <tt>%</tt>, otherwise they would be interpreted as magic characters. The line can then be processed by TeX as usual.
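The two substitutions can be sketched as a standalone function. This is written as it would appear in a Lua file, with the backslashes doubled because they sit inside Lua strings; the name <tt>markup_to_tex</tt> is hypothetical, and <tt>\italic</tt> and <tt>\bold</tt> are assumed to be defined on the TeX side.

```lua
local function markup_to_tex(line)
  -- "/(.-)/" captures the shortest text between two slashes; %1 in
  -- the replacement stands for that capture.
  line = string.gsub(line, "/(.-)/", "\\italic{%1}")
  -- The asterisks are escaped with % because * is a magic character
  -- in Lua patterns.
  line = string.gsub(line, "%*(.-)%*", "\\bold{%1}")
  return line
end

-- Registration only makes sense inside LuaTeX.
if callback then
  callback.register("process_input_buffer", markup_to_tex)
end
```

Assigning the result of <tt>string.gsub</tt> to a single variable keeps only the new string and discards the substitution count that the function also returns.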
  
 
Only pairs of slashes or asterisks in the same line will be interpreted as markup, because lines are processed one by one and nothing is remembered from one line to the next (that can be implemented, but is a bit more complicated and dangerous). Hence, nothing will be in italics in the following example:
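The line-by-line behaviour can be checked outside TeX with a small sketch; the two sample lines and the name <tt>italicize</tt> are made up for the illustration. Each line contains a single slash, so the pattern never finds a pair and both lines come back unchanged, whereas the same text on one line would be converted.

```lua
local function italicize(line)
  return (string.gsub(line, "/(.-)/", "\\italic{%1}"))
end

-- Hypothetical input: the opening and closing slashes are on
-- different lines.
local lines = { "This is /not", "in italics/ at all." }

-- The callback sees one line at a time, as simulated here.
local out = {}
for i, line in ipairs(lines) do
  out[i] = italicize(line)
end

-- For comparison: the same text joined into a single line.
local joined = italicize(table.concat(lines, " "))
```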
