.\"
.\" Copyright (c) 2023 Benjamin Stürz
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate$
.Dt LEXICGEN 1
.Os
.Sh NAME
.Nm lexicgen
.Nd lexic generator
.Sh SYNOPSIS
.Nm
.Op Fl v Ar visibility
.Op Fl p Ar package
.Op Fl o Ar output-file
.Ar input-file
.Sh DESCRIPTION
.Nm
is a clone of
.Xr lex 1
for generating lexers using Lexic.
See
.Sx FORMAT
for a description of the input file format.
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl v Ar visibility
Change the visibility of the lexer-internal data structures and functions.
The default is private.
.It Fl p Ar package
Set the Java package of the generated class.
.It Fl o Ar output-file
Write the output to
.Ar output-file
instead of a generated path.
If
.Ar output-file
is
.Sq - ,
write to the standard output.
.El
.Sh FORMAT
A valid file consists of three sections, separated by lines containing
.Ql %% :
.Bl -bullet -compact
.It
Declarations
.It
Definitions
.It
Code (optional)
.El
.Pp
The declaration section can contain the following statements:
.Bl -tag -width Ds
.It Sy %import Ar package
Import
.Ar package
into the generated Java source, as with a Java
.Ql import
statement.
.It Sy %eof Ar name
Declare an end-of-file token called
.Ar name .
This should be specified only once.
.It Sy %error Ar name
Declare an error token called
.Ar name .
This should be specified only once.
.It Ar name No = Ar syntax No ;
Define a variable called
.Ar name
containing
.Ar syntax
that can be referred to from token definitions.
.El
.Pp
The definition section contains token definitions.
A token definition consists of a
.Ar name
and a
.Ar syntax
expression, written as:
.Dl Ar name No \&: Ar syntax No ;
The name must consist of alphabetic characters.
The following syntax expressions are supported:
.Bl -tag -width Ds
.It Ar ident
An identifier, which is a sequence of one or more alphabetic characters.
Match if the rule identified by
.Ar ident
matches.
.It Qq Ar string
Match the literal character sequence
.Ar string .
.It Bq Ar range
Match if any of the fragments of
.Ar range
match.
A fragment is a single character, an escape sequence, or a character range.
Only single-character escape sequences, such as
.Ql \en ,
are supported.
A character range has the form
.Ar a Ns - Ns Ar b ,
where
.Ar a
is the first character and
.Ar b
is the last.
If
.Ar range
starts with the
.Sq ^
character, the result is negated.
.It Pq Ar expr
Grouping.
.It Ar expr Ns ?
Zero or one occurrence of
.Ar expr .
.It Ar expr Ns +
One or more occurrences of
.Ar expr .
.It Ar expr Ns *
Zero or more occurrences of
.Ar expr .
.It Ar expr1 expr2
.Ar expr1
followed by
.Ar expr2 .
.It Ar expr1 No & Ar expr2
Match only if both
.Ar expr1
and
.Ar expr2
match.
.It Ar expr1 No \e Ar expr2
Match only if
.Ar expr1
matches and
.Ar expr2
does not.
.It Ar expr1 No \&| Ar expr2
Match if
.Ar expr1 ,
.Ar expr2 ,
or both match.
.El
.Pp
The code section contains arbitrary code that is inserted into the generated file.
.Sh EXAMPLES
The following is an example of a
.Nm
input file:
.Bd -literal
// Declarations
%import eu.bandm.tools.lexic.*
%import java.io.IOException
%import java.io.InputStreamReader
%import java.nio.charset.StandardCharsets

// Declare an end-of-file token called EOF.
%eof EOF

// Declare an error token called Error.
%error Error

Integer = "0" | [1-9][0-9]*;
Ident = [a-zA-Z_][a-zA-Z_0-9]*;

%%
// Definitions
Whitespace : [ \en\et\er\ef]+;

// A File can be either an integer or an identifier.
File : Integer | Ident;

%%
// Code
public static void main(String[] args) {
	// The construct() method is generated by lexicgen and returns a Lexer.
	var lexer = construct();
	try (var rdr = new InputStreamReader(System.in, StandardCharsets.UTF_8)) {
		TokenSource tokens = lexer
			.lex(CodePointSource.read(rdr, e -> {}))
			.removeTypes(TokenType.Whitespace);
		while (true) {
			var token = tokens.get();
			if (token.getType() == TokenType.EOF)
				break;
			System.out.println(token);
		}
	} catch (IOException e) {
		// Report I/O errors instead of silently ignoring them.
		e.printStackTrace();
	}
}
.Ed
.Sh EXIT STATUS
.Ex -std
.Sh SEE ALSO
.Xr lex 1
.Sh AUTHORS
.An Benjamin Stürz Aq Mt benni@stuerz.xyz