Difference between revisions of "Sort a token list"
From LuaTeXWiki
(Copied from the old bluwiki.com luatex wiki) |
(Copied from the old bluwiki.com luatex wiki) |
||
Line 1: | Line 1: | ||
− | + | = Sorting a list of tokens = | |
+ | |||
+ | Sorting in TeX can be done, as demonstrated in the paper "Sorting in TeX's mouth" by Bernd Raichle, but it is slow. Using Lua, sorting can be done much quicker. | ||
+ | |||
+ | The following code demonstrates sorting in PlainTeX-LuaTeX: | ||
+ | |||
+ | % We `preload' the necessary functions into the Lua state | ||
+ | % | ||
+ | % Note that comments inside \directlua must be TeX comments (`%'); | ||
+ | % Lua comments (`--') do not work correctly | ||
+ | % | ||
+ | % The code for transforming a string into a table and back | ||
+ | % has been copied from the Lua users wiki (page `MakingLuaLikePhp') | ||
+ | function explode(str,divider) | ||
+ | local result = {} | ||
+ | % Sanity check: | ||
+ | if divider == "" then | ||
+ | table.insert(result,str) | ||
+ | return result | ||
+ | end | ||
+ | % We search for the start of <divider> or an opening/closing | ||
+ | % brace. That way, we can jump through <str> instead of having to | ||
+ | % check every character | ||
+ | local search = "([{}" .. string.sub(divider,1,1) .. "])" | ||
+ | % The current search position | ||
+ | local curpos = 0 | ||
+ | % The position of the last divider | ||
+ | local divpos = 0 | ||
+ | % The number of open left braces | ||
+ | local braces = 0 | ||
+ | % The length of <divider> | ||
+ | local divlen = string.len(divider) | ||
+ | |||
+ | % Search for whatever comes first | ||
+ | for st,sp,first in function() return string.find(str,search,curpos) end | ||
+ | do | ||
+ | % Braces? Then increase/decrease the brace count | ||
+ | if first == "{" then | ||
+ | braces = braces + 1 | ||
+ | curpos = st + 1 | ||
+ | elseif first == "}" then | ||
+ | braces = braces - 1 | ||
+ | curpos = st + 1 | ||
+ | else | ||
+ | % Otherwise, if we are not inside braces, check if the | ||
+ | % position found is the start of <divider>. Then split | ||
+ | % the substring off <str> and store the first part in <result> | ||
+ | if braces == 0 and string.sub(str,st,st + divlen - 1) == divider then | ||
+ | table.insert(result,string.sub(str,divpos,st - 1)) | ||
+ | divpos = st + divlen | ||
+ | curpos = st + divlen | ||
+ | else | ||
+ | curpos = st + 1 | ||
+ | end | ||
+ | end | ||
+ | end | ||
+ | |||
+ | % Do not forget to add the last substring to the result table | ||
+ | if divpos < string.len(str) then | ||
+ | table.insert(result,string.sub(str,divpos)) | ||
+ | end | ||
+ | |||
+ | % Done | ||
+ | return result | ||
+ | end | ||
+ | |||
+ | function implode(t,div) | ||
+ | return table.concat(t,div) | ||
+ | end | ||
+ | } | ||
+ | |||
+ | % #1 = continuation (called with the sorted list as its only | ||
+ | % parameter), #2 = delimiter, #3 = <delimiter>-separated list | ||
+ | \def\sort#1#2#3{% | ||
+ | \directlua0{% | ||
+ | % We prevent expansion and then escape all characters special | ||
+ | % to lua in the parameters. This is necessary to prevent them | ||
+ | % from (a) executed right now and (b) being misunderstood by | ||
+ | % Lua. | ||
+ | % | ||
+ | % We convert the string into a table, delimited by a | ||
+ | % (unexpanded, escaped) delimiter: | ||
+ | local t = explode("\luaescapestring{\unexpanded{#3}}", | ||
+ | "\luaescapestring{\unexpanded{#2}}") | ||
+ | % We sort the table in-place: | ||
+ | table.sort(t) | ||
+ | % We output the table as a string. | ||
+ | % | ||
+ | % We use a single call to tex.print to prevent spaces in | ||
+ | % the output, since all tex.print calls except the last add | ||
+ | % an \endlinechar (LuaTeX manual 4.1.10.1): | ||
+ | tex.print("\luaescapestring{\unexpanded{#1}}{" .. implode(t," ") .. "}") | ||
+ | }% | ||
+ | } | ||
+ | |||
+ | |||
+ | % Output tokens without expansion | ||
+ | \def\typeout#1{\message{\unexpanded{[#1]}}} | ||
+ | |||
+ | |||
+ | % Example (text taken from Wikipedia, ``Hyperreality'') | ||
+ | \sort{\typeout}{ }{% | ||
+ | In semiotics and postmodern philosophy, the term hyperreality | ||
+ | characterizes the inability of consciousness to distinguish reality | ||
+ | from fantasy, especially \in technologically advanced postmodern | ||
+ | cultures. Hyperreality is a means of characterising the way | ||
+ | consciousness defines what is actually `real' in a world where a | ||
+ | multitude of media can radically shape and filter the original event | ||
+ | or experience being depicted. Some famous theorists of hyperreality | ||
+ | include Jean Baudrillard, \Albert Borgmann, Daniel Boorstin, and | ||
+ | Umberto Eco.} | ||
+ | |||
+ | \bye | ||
+ | |||
+ | Note: The above code correctly skips the delimiter if enclosed in braces. |
Latest revision as of 22:16, 7 December 2010
Sorting a list of tokens[edit]
Sorting in TeX can be done, as demonstrated in the paper "Sorting in TeX's mouth" by Bernd Raichle, but it is slow. Using Lua, sorting can be done much quicker.
The following code demonstrates sorting in PlainTeX-LuaTeX:
% We `preload' the necessary functions into the Lua state % % Note that comments inside \directlua must be TeX comments (`%'); % Lua comments (`--') do not work correctly % % The code for transforming a string into a table and back % has been copied from the Lua users wiki (page `MakingLuaLikePhp') function explode(str,divider) local result = {} % Sanity check: if divider == "" then table.insert(result,str) return result end % We search for the start of <divider> or an opening/closing % brace. That way, we can jump through <str> instead of having to % check every character local search = "([{}" .. string.sub(divider,1,1) .. "])" % The current search position local curpos = 0 % The position of the last divider local divpos = 0 % The number of open left braces local braces = 0 % The length of <divider> local divlen = string.len(divider) % Search for whatever comes first for st,sp,first in function() return string.find(str,search,curpos) end do % Braces? Then increase/decrease the brace count if first == "{" then braces = braces + 1 curpos = st + 1 elseif first == "}" then braces = braces - 1 curpos = st + 1 else % Otherwise, if we are not inside braces, check if the % position found is the start of <divider>. Then split % the substring off <str> and store the first part in <result> if braces == 0 and string.sub(str,st,st + divlen - 1) == divider then table.insert(result,string.sub(str,divpos,st - 1)) divpos = st + divlen curpos = st + divlen else curpos = st + 1 end end end % Do not forget to add the last substring to the result table if divpos < string.len(str) then table.insert(result,string.sub(str,divpos)) end % Done return result end function implode(t,div) return table.concat(t,div) end } % #1 = continuation (called with the sorted list as its only % parameter), #2 = delimiter, #3 = <delimiter>-separated list \def\sort#1#2#3{% \directlua0{% % We prevent expansion and then escape all characters special % to lua in the parameters. This is necessary to prevent them % from (a) executed right now and (b) being misunderstood by % Lua. % % We convert the string into a table, delimited by a % (unexpanded, escaped) delimiter: local t = explode("\luaescapestring{\unexpanded{#3}}", "\luaescapestring{\unexpanded{#2}}") % We sort the table in-place: table.sort(t) % We output the table as a string. % % We use a single call to tex.print to prevent spaces in % the output, since all tex.print calls except the last add % an \endlinechar (LuaTeX manual 4.1.10.1): tex.print("\luaescapestring{\unexpanded{#1}}{" .. implode(t," ") .. "}") }% } % Output tokens without expansion \def\typeout#1{\message{\unexpanded{[#1]}}} % Example (text taken from Wikipedia, ``Hyperreality) \sort{\typeout}{ }{% In semiotics and postmodern philosophy, the term hyperreality characterizes the inability of consciousness to distinguish reality from fantasy, especially \in technologically advanced postmodern cultures. Hyperreality is a means of characterising the way consciousness defines what is actually `real' in a world where a multitude of media can radically shape and filter the original event or experience being depicted. Some famous theorists of hyperreality include Jean Baudrillard, \Albert Borgmann, Daniel Boorstin, and Umberto Eco.} \bye
Note: The above code correctly skips the delimiter if enclosed in braces.