.\" Copyright (c) 2023 Benjamin Stürz
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.Op Fl v Ar visibility
.Op Fl o Ar output-file
is a clone of
.Xr lex 1
for generating lexers that use Lexic.
See the
.Sx FORMAT
section for a description of the input file format.
The options are as follows:
.It Fl v Ar visibility
Set the visibility of the lexer-internal data structures and functions.
The default is private.
Set the Java package of the generated class.
.It Fl o Ar output-file
instead of a generated path.
is
.Sq - ,
then write to the standard output.
A valid file consists of three sections:
The declaration section can contain the following statements:
.It Sy %import Ar package
Declare an end-of-file token called
This statement should be specified at most once.
Declare an error token called
This statement should be specified at most once.
.It Ar name No = Ar syntax No ;
Define a variable called
that can be referred to from token definitions.
The definition section contains token definitions.
A token definition consists of a
.Ar syntax expression .
The name must consist of alphabetic characters.
The following syntax expressions are supported:
An identifier is a sequence of one or more alphabetic characters.
Match if the rule identified by
Match if the input matches all characters of
Match if any of the fragments match.
character, then negate the matching result.
The range consists of fragments.
A fragment can be a single character, an escape sequence, or a character range.
Only single-character escape sequences are supported.
A character range has the syntax
.Ar a Ns - Ns Ar b ,
where
.Ar a
is the first character and
.Ar b
is the last.
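.Pp
The matching rule for bracket expressions can be sketched in Java.
This is an illustration only, not the code that
.Nm
generates.
The class name
.Ql CharClass
is hypothetical, escape sequences are left out for brevity, ranges are assumed to be inclusive, and the negating first character is assumed to be
.Sq ^
as in most regular expression dialects.
.Bd -literal -offset indent
import java.util.ArrayList;
import java.util.List;
// Hypothetical sketch: matching one character against a bracket
// expression body such as "a-zA-Z_" or "^0-9" (brackets stripped).
final class CharClass {
    private final boolean negated;
    // Each fragment is an inclusive range {first, last};
    // a single character c is stored as {c, c}.
    private final List<int[]> fragments = new ArrayList<>();
    CharClass(String body) {
        negated = body.startsWith("^");
        int i = negated ? 1 : 0;
        while (i < body.length()) {
            char first = body.charAt(i);
            // "x-y" is a range only if a character follows the dash.
            if (i + 2 < body.length() && body.charAt(i + 1) == '-') {
                fragments.add(new int[] {first, body.charAt(i + 2)});
                i += 3;
            } else {
                fragments.add(new int[] {first, first});
                i += 1;
            }
        }
    }
    // Matches if any fragment matches; a leading '^' inverts the result.
    boolean matches(char c) {
        boolean any = false;
        for (int[] f : fragments) {
            if (f[0] <= c && c <= f[1]) {
                any = true;
                break;
            }
        }
        return any != negated;
    }
    public static void main(String[] args) {
        CharClass ident = new CharClass("a-zA-Z_");
        CharClass notDigit = new CharClass("^0-9");
        System.out.println(ident.matches('q'));    // true
        System.out.println(ident.matches('7'));    // false
        System.out.println(notDigit.matches('x')); // true
        System.out.println(notDigit.matches('3')); // false
    }
}
.Ed
.Pp
Storing every fragment as a range, with a single character as a one-element range, lets
.Fn matches
test all fragments with the same comparison.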
Zero or one occurrence of
One or more occurrences of
Zero or more occurrences of
Match if either or both of
The code section contains arbitrary code that is copied into the generated file.
The following code is an example of a lexicgen file:
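.Pp
Assuming that the operators described above keep their usual regular expression meaning, the
.Ql Integer
and
.Ql Ident
variables of the example file can be approximated with Java's
.Ql java.util.regex.Pattern
class.
The class name
.Ql SyntaxAnalogy
and the
.Ql DIGITS
and
.Ql SIGNED
patterns are hypothetical additions that only show the one-or-more and zero-or-one operators.
.Bd -literal -offset indent
import java.util.regex.Pattern;
// Hypothetical illustration: lexicgen-style definitions approximated
// with java.util.regex, assuming the operators have the same meaning.
final class SyntaxAnalogy {
    // Integer = "0" | [1-9][0-9]*;  (alternation, ranges, zero or more)
    static final Pattern INTEGER = Pattern.compile("0|[1-9][0-9]*");
    // Ident = [a-zA-Z_][a-zA-Z_0-9]*;
    static final Pattern IDENT = Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*");
    // Not in the example: one or more digits.
    static final Pattern DIGITS = Pattern.compile("[0-9]+");
    // Not in the example: zero or one leading minus sign, then an Integer.
    static final Pattern SIGNED = Pattern.compile("-?(0|[1-9][0-9]*)");
    public static void main(String[] args) {
        System.out.println(INTEGER.matcher("0").matches());   // true
        System.out.println(INTEGER.matcher("042").matches()); // false
        System.out.println(IDENT.matcher("_tmp1").matches()); // true
        System.out.println(IDENT.matcher("1abc").matches());  // false
        System.out.println(DIGITS.matcher("").matches());     // false
        System.out.println(SIGNED.matcher("-17").matches());  // true
    }
}
.Ed
.Pp
The sketch uses
.Fn matches ,
which tests the whole input; it says nothing about how the generated lexer chooses between competing token definitions.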
%import eu.bandm.tools.lexic.*
%import java.io.IOException
%import java.io.InputStreamReader
%import java.nio.charset.StandardCharsets
// Declare an end-of-file token called EOF.
// Declare an error token called Error.
Integer = "0" | [1-9][0-9]*;
Ident = [a-zA-Z_][a-zA-Z_0-9]*;
Whitespace : [ \en\et\er\ef]+;
// A File can be either an integer or an identifier.
File : Integer | Ident;
public static void main(String[] args) {
    // The construct() function is generated by lexicgen and returns a Lexer.
    var lexer = construct();
    try (var rdr = new InputStreamReader(System.in, StandardCharsets.UTF_8)) {
        TokenSource<String, TokenType> tokens = lexer
            .lex(CodePointSource.read(rdr, e -> {}))
            .removeTypes(TokenType.Whitespace);
            var token = tokens.get();
            if (token.getType() == TokenType.EOF)
            System.out.println(token);
    } catch (IOException e) {
        // Report I/O errors instead of silently discarding them.
        e.printStackTrace();
    }
.An Benjamin Stürz Aq Mt benni@stuerz.xyz