Scannerless parsing

An algorithm that performs tokenization and parsing in a single step
(Learn how and when to remove this message)

In computer science, scannerless parsing (also called lexerless parsing) performs tokenization (breaking a stream of characters into words) and parsing (arranging the words into phrases) in a single step, rather than breaking it up into a pipeline of a lexer followed by a parser, executing concurrently. A language grammar is scannerless if it uses a single formalism to express both the lexical (word level) and phrase level structure of the language.

Dividing processing into a lexer followed by a parser is more modular; scannerless parsing is primarily used when a clear lexer–parser distinction is unneeded or unwanted. Examples of when this is appropriate include TeX, most wiki grammars, makefiles, simple application-specific scripting languages, and Raku.

Advantages

Disadvantages

Implementations

Notes

References

  1. ^ Economopoulos, Giorgios; Klint, Paul; Vinju, Jurgen (2009). "Faster Scannerless GLR Parsing" (PDF). Compiler Construction. Lecture Notes in Computer Science. Vol. 5501. pp. 126–141. doi:10.1007/978-3-642-00722-4_10. ISBN 978-3-642-00721-7.

Further reading

  • v
  • t
  • e
Top-down
Bottom-up
Mixed, other
Related topics