%query: read_tokens(o,o).

%   File   : RDTOK.PL
%   Author : R.A.O'Keefe
%   Updated: 2 July 1984
%   Purpose: Tokeniser in reasonably standard Prolog.

/*  This tokeniser is meant to complement the library READ routine.
    It recognises Dec-10 Prolog with the following exceptions:

	%( is not accepted as an alternative to {

	%) is not accepted as an alternative to )

	NOLC convention is not supported (read_name could be made to do it)

	,.. is not accepted as an alternative to | (hooray!)

	large integers are not read in as xwd(Top18Bits,Bottom18Bits)

	After a comma, "(" is read as ' (' rather than '('.  This does
	the parser no harm at all, and the Dec-10 tokeniser's behaviour
	here doesn't actually buy you anything.  This tokeniser
	guarantees never to return '(' except immediately after an atom,
	yielding ' (' everywhere else.

    In particular, radix notation is EXACTLY as in Dec-10 Prolog version
    3.53.  Some times might be of interest.  Applied to an earlier
    version of this file:

	this code took			1.66 seconds
	the Dec-10 tokeniser took	1.28 seconds
	a Pascal version took		0.96 seconds

    The Dec-10 tokeniser was called via the old RDTOK interface, with
    which this file is compatible.  One reason for the difference in
    speed is the way variables are looked up: this code uses a linear
    list, while the Dec-10 tokeniser uses some sort of tree.  The Pascal
    version is the program WLIST which lists "words" and their
    frequencies.  It uses a hash table.  Another difference is the way
    characters are classified: the Dec-10 tokeniser and WLIST have a
    table which maps ASCII codes to character classes, and don't do all
    this comparison and memberchecking.  We could do that without
    leaving standard Prolog, but what do you want from one evening's
    work?
*/

%   :- public
%	read_tokens/2.
%   :- mode
%	read_after_atom(+, ?, -),
%	read_digits(+, -, -),
%	read_fullstop(+, ?, -),
%	read_integer(+, -, -),
%	read_lookup(?, +),
%	read_name(+, -, -),
%	read_solidus(+, ?, -),
%	read_solidus(+, -),
%	read_string(-, +, -),
%	read_string(+, -, +, -),
%	more_string(+, +, -, -),
%	read_symbol(+, -, -),
%	read_tokens(?, ?),
%	read_tokens(+, ?, -).


%   read_tokens(TokenList, Dictionary)
%   returns a list of tokens.  It is needed to "prime" read_tokens/2
%   with the initial blank, and to check for end of file.  The
%   Dictionary is a list of AtomName=Variable pairs in no particular
%   order.  The way end of file is handled is that everything else
%   FAILS when it hits character "-1", sometimes printing a warning.
%   It might have been an idea to return the atom 'end_of_file'
%   instead of the same token list that you'd have got from reading
%   "end_of_file. ", but (a) this file is for compatibility, and (b)
%   there are good practical reasons for wanting this behaviour.

%:- entry(read_tokens(X,Y),[share([[X],[Y]]),free([X,Y])]).

goal :- read_tokens(TokenList, Dict).

read_tokens(TokenList, Dictionary) :-
	read_tokens(32, Dict, ListOfTokens),
	append(Dict, [], Dict),		%  fill in the "hole" at the end
	!,
	Dictionary = Dict,		%  unify explicitly so we'll read and
	TokenList = ListOfTokens.	%  then check even with filled in arguments
read_tokens([atom(end_of_file)], []).	%  End Of File is all that can go wrong

read_tokens(-1, _X, _Y) :- !,		%  -1 is the end-of-file character
	fail.				%  in every standard Prolog

read_tokens(Ch, Dict, Tokens) :-
	Ch =< 32,			%  ignore layout.  CR, LF, and the
	!,				%  ASCII newline character (10)
	get0(NextCh),			%  are all skipped here.
	read_tokens(NextCh, Dict, Tokens).

read_tokens(37, Dict, Tokens) :- !,	%  %comment
	repeat,				%  skip characters to a line
	    get0(Ch),			%  terminator (should we be
	    ( Ch = 10 ; Ch = -1 ),	%  more thorough, e.g. ^L?)
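%   To make the interface comment above concrete, here is a
%   hypothetical session sketch.  It assumes only what this file's
%   comments state: tokens such as atom(A) and var(Var,Name),
%   punctuation returned as atoms, and a Dictionary of
%   AtomName=Variable pairs.  The exact token shapes shown are an
%   illustration, not output produced by running this code.

```prolog
%   ?- read_tokens(Tokens, Dict).
%   |: likes(X, mary).
%
%   One would expect bindings along the lines of
%
%   Tokens = [atom(likes), '(', var(V,'X'), ',', atom(mary), ')'],
%   Dict   = ['X'=V]
%
%   Note the '(' token (no space): it immediately follows the atom
%   likes, so it marks a functor application.  A merely bracketed
%   term, as in (a , b), would instead begin with the ' (' token,
%   which is the guarantee described in the header comment.
```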