.\"
.\" Copyright (c) 2023 Benjamin Stürz
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate$
.Dt LEXICGEN 1
.Os
.Sh NAME
.Nm lexicgen
.Nd lexic generator
.Sh SYNOPSIS
.Nm
.Op Fl v Ar visibility
.Op Fl p Ar package
.Op Fl o Ar output-file
.Ar input-file
.Sh DESCRIPTION
.Nm
is a clone of
.Xr lex 1
that generates lexers using Lexic.
See the
.Sx FORMAT
section for a description of the input format.
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl v Ar visibility
Change the visibility of the lexer-internal data structures and functions.
The default is private.
.It Fl p Ar package
Set the Java package of the generated class.
.It Fl o Ar output-file
Write the output to
.Ar output-file
instead of a generated path.
If
.Ar output-file
is
.Sq - ,
write to the standard output.
.El
.Sh FORMAT
A valid file consists of three sections:
.Bl -bullet -compact
.It
Declarations
.It
Definitions
.It
Code (optional)
.El
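.Pp
Judging from the file shown in the
.Sx EXAMPLES
section, the sections are separated by lines containing only
.Dq %% .
A minimal skeleton might look like the following sketch, in which the token name
.Dq Digit
is only a placeholder:
.Bd -literal
// Declarations
%eof EOF
%error Error
%%
// Definitions
Digit : [0-9];
%%
// Code (optional)
.Ed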
.Pp
The declaration section can contain the following statements:
.Bl -tag -width Ds
.It Sy %import Ar package
Import
.Ar package .
.It Sy %eof Ar name
Declare an end-of-file token called
.Ar name .
This statement should only be specified once.
.It Sy %error Ar name
Declare an error token called
.Ar name .
This statement should only be specified once.
.It Ar name No = Ar syntax No ;
Define a variable called
.Ar name
containing
.Ar syntax
that can be referred to from token definitions.
.El
.Pp
The definition section contains token definitions.
A token definition consists of a
.Ar name
and a
.Ar syntax
expression.
The name must consist of alphabetic characters.
.Pp
The following syntax expressions are supported:
.Bl -tag -width Ds
.It Sy ident
An identifier is a sequence of one or more alphabetic characters.
Match if the rule identified by
.Ar ident
matches.
.It Qq Sy string
Match if the input matches all characters of
.Ar string .
.It Bq Sy range
The range consists of fragments.
A fragment can be a single character, an escape sequence, or a character range.
Only single-character escape sequences are supported.
A character range has the syntax a-b,
where a is the first character and b is the last.
Match if any of the fragments matches.
If
.Ar range
starts with the
.Dq ^
character, negate the matching result.
.It Pq expr
Grouping.
.It expr?
Zero or one occurrence of
.Ar expr .
.It expr+
One or more occurrences of
.Ar expr .
.It expr*
Zero or more occurrences of
.Ar expr .
.It expr1 expr2
.Ar expr1
followed by
.Ar expr2 .
.It expr1 & expr2
Match only if both
.Ar expr1
and
.Ar expr2
match.
.It expr1 \e expr2
Match only if
.Ar expr1
matches and
.Ar expr2
does not.
.It expr1 | expr2
Match if
.Ar expr1 ,
.Ar expr2 ,
or both match.
.El
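.Pp
As an illustration of ranges, negation, and the difference operator, the following sketch declares a few variables and tokens.
All names in it are hypothetical, and it is not meant to be a complete grammar.
.Bd -literal
// Declarations
%eof EOF
%error Error
Word = [a-z]+;
Reserved = "if" | "else";
NotNewline = [^\en];
%%
// Definitions
// Matches "if" or "else".
Keyword : Reserved;
// Lowercase words other than the reserved ones.
Name : Word \e Reserved;
// A "#" followed by everything up to the end of the line.
Comment : "#" NotNewline*;
.Ed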
.Pp
The code section contains arbitrary code that is placed in the generated file.
.Sh EXAMPLES
The following is an example of a
.Nm
input file:
.Bd -literal
// Declarations
%import eu.bandm.tools.lexic.*
%import java.io.IOException
%import java.io.InputStreamReader
%import java.nio.charset.StandardCharsets

// Declare an end-of-file token called EOF.
%eof EOF

// Declare an error token called Error.
%error Error

Integer = "0" | [1-9][0-9]*;
Ident = [a-zA-Z_][a-zA-Z_0-9]*;

%%
// Definitions

Whitespace : [ \en\et\er\ef]+;
// A File can be either an integer, or an identifier.
File : Integer | Ident;

%%
// Code

public static void main(String[] args) {
    // The construct() function is generated by lexicgen and returns a Lexer.
    var lexer = construct();

    try (var rdr = new InputStreamReader(System.in, StandardCharsets.UTF_8)) {
        TokenSource<String, TokenType> tokens = lexer
            .lex(CodePointSource.read(rdr, e -> {}))
            .removeTypes(TokenType.Whitespace);
        // Print every token until the end-of-file token is reached.
        while (true) {
            var token = tokens.get();
            if (token.getType() == TokenType.EOF)
                break;
            System.out.println(token);
        }
    } catch (IOException e) {}
}
.Ed
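.Pp
Assuming the file above is saved under the hypothetical name
.Pa example.lexic ,
the lexer could be generated and written to the standard output as follows.
The package name is likewise only an illustration.
.Bd -literal
$ lexicgen -p com.example -o - example.lexic
.Ed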
.Sh EXIT STATUS
.Ex -std
.Sh SEE ALSO
.Xr lex 1
.Sh AUTHORS
.An Benjamin Stürz Aq Mt benni@stuerz.xyz