Difference between revisions of "Sort a token list"

From LuaTeXWiki
(Copied from the old bluwiki.com luatex wiki)
 
Latest revision as of 22:16, 7 December 2010

Sorting a list of tokens

Sorting can be done in pure TeX, as demonstrated in the paper "Sorting in TeX's mouth" by Bernd Raichle, but it is slow. With Lua, sorting is much faster.

The following code demonstrates sorting in plain TeX under LuaTeX:

% We `preload' the necessary functions into the Lua state
\directlua0{%
% Note that comments inside \directlua must be TeX comments (`%');
% Lua comments (`--') do not work correctly
% 
% The code for transforming a string into a table and back
% has been copied from the Lua users wiki (page `MakingLuaLikePhp')
function explode(str,divider)
	local result = {}
	% Sanity check:
	if divider == "" then
		table.insert(result,str)
		return result
	end
	% We search for the start of <divider> or an opening/closing
	% brace. That way, we can jump through <str> instead of having to
	% check every character
	local search = "([{}" .. string.sub(divider,1,1) .. "])"
	% The current search position
	local curpos = 0
	% The position of the last divider
	local divpos = 0
	% The number of open left braces
	local braces = 0
	% The length of <divider>
	local divlen = string.len(divider)
	
	% Search for whatever comes first
	for st,sp,first in function() return string.find(str,search,curpos) end
	do
		% Braces? Then increase/decrease the brace count
		if first == "{" then
			braces = braces + 1
			curpos = st + 1
		elseif first == "}" then
			braces = braces - 1
			curpos = st + 1
		else
			% Otherwise, if we are not inside braces, check if the 
			% position found is the start of <divider>. Then split 
			% the substring off <str> and store the first part in <result>
			if braces == 0 and string.sub(str,st,st + divlen - 1) == divider then
				table.insert(result,string.sub(str,divpos,st - 1))
				divpos = st + divlen
				curpos = st + divlen
			else
				curpos = st + 1
			end
		end
	end
	
	% Do not forget to add the last substring to the result table
	% (`<=' so that a final one-character item is not dropped)
	if divpos <= string.len(str) then
		table.insert(result,string.sub(str,divpos))
	end
	
	% Done
	return result
end

function implode(t,div)
	return table.concat(t,div)
end
}

% #1 = continuation (called with the sorted list as its only
% parameter), #2 = delimiter, #3 = <delimiter>-separated list
\def\sort#1#2#3{%
	\directlua0{%
		% We prevent expansion and then escape all characters special
		% to Lua in the parameters. This is necessary to prevent them
		% from (a) being executed right now and (b) being misunderstood
		% by Lua.
		% 
		% We convert the string into a table, delimited by a
		% (unexpanded, escaped) delimiter:
		local t = explode("\luaescapestring{\unexpanded{#3}}",
				  "\luaescapestring{\unexpanded{#2}}")
		% We sort the table in-place:
		table.sort(t)
		% We output the table as a string.
		% 
		% We use a single call to tex.print to prevent spaces in 
		% the output, since all tex.print calls except the last add 
		% an \endlinechar (LuaTeX manual 4.1.10.1):
		tex.print("\luaescapestring{\unexpanded{#1}}{" .. implode(t," ") .. "}")
	}%
}


% Output tokens without expansion
\def\typeout#1{\message{\unexpanded{[#1]}}}


% Example (text taken from Wikipedia, ``Hyperreality'')
\sort{\typeout}{ }{%
In semiotics and postmodern philosophy, the term hyperreality
characterizes the inability of consciousness to distinguish reality
from fantasy, especially \in technologically advanced postmodern
cultures. Hyperreality is a means of characterising the way
consciousness defines what is actually `real' in a world where a
multitude of media can radically shape and filter the original event
or experience being depicted. Some famous theorists of hyperreality
include Jean Baudrillard, \Albert Borgmann, Daniel Boorstin, and
Umberto Eco.}

\bye

Note: The above code does not split at a delimiter that is enclosed in braces, so a braced group is kept together as a single item.
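
The brace handling can be tried outside TeX as well. The following is only a minimal sketch in standalone Lua, not the code from the listing: the helper name split_outside_braces and the sample string are made up for illustration, and it scans the string character by character instead of jumping with string.find. Because it uses Lua comments (`--`) and the length operator (`#`), it is meant for a .lua file or the standalone interpreter, not for pasting into \directlua inside a \def.

```lua
-- Hypothetical helper (not part of the listing above): split `str` at each
-- occurrence of `divider`, ignoring dividers nested inside {...}.
local function split_outside_braces(str, divider)
  if divider == "" then
    return { str }
  end
  local result = {}
  local depth = 0   -- current brace nesting depth
  local start = 1   -- first character of the piece being collected
  local i = 1
  while i <= #str do
    local c = str:sub(i, i)
    if c == "{" then
      depth = depth + 1
      i = i + 1
    elseif c == "}" then
      depth = depth - 1
      i = i + 1
    elseif depth == 0 and str:sub(i, i + #divider - 1) == divider then
      -- A divider outside braces: close the current piece
      result[#result + 1] = str:sub(start, i - 1)
      i = i + #divider
      start = i
    else
      i = i + 1
    end
  end
  -- Keep the trailing piece, even if it is a single character
  if start <= #str then
    result[#result + 1] = str:sub(start)
  end
  return result
end

local words = split_outside_braces("pear {red apple} fig b", " ")
table.sort(words)
print(table.concat(words, "|"))   -- b|fig|pear|{red apple}
```

The space inside {red apple} does not split the group, so it is sorted as one item. With the default string comparison the braced item sorts last, because `{` comes after the lowercase letters in ASCII.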