| This directory contains data needed by Bison. |
| |
| # Directory Content |
| ## Skeletons |
| Bison skeletons: the general shapes of the different parser kinds, that are |
| specialized for specific grammars by the bison program. |
| |
| Currently, the supported skeletons are: |
| |
| - yacc.c |
| Formerly named bison.simple. It corresponds to C Yacc-compatible |
| LALR(1) parsers. |
| |
| - lalr1.cc |
| Produces a C++ parser class. |
| |
| - lalr1.java |
| Produces a Java parser class. |
| |
| - glr.c |
| A Generalized LR C parser based on Bison's LALR(1) tables. |
| |
| - glr.cc |
| A Generalized LR C++ parser. Actually a C++ wrapper around glr.c. |
| |
| These skeletons are the only ones supported by the Bison team. Because the |
| interface between skeletons and the bison program is not finished, *we are |
| not bound to it*. In particular, Bison is not mature enough for us to |
| consider that "foreign skeletons" are supported. |
| |
| ## m4sugar |
| This directory contains M4sugar, sort of an extended library for M4, which |
| is used by Bison to instantiate the skeletons. |
| |
| ## xslt |
| This directory contains XSLT programs that transform Bison's XML output into |
| various formats. |
| |
| - bison.xsl |
| A library of routines used by the other XSLT programs. |
| |
| - xml2dot.xsl |
| Conversion into GraphViz's dot format. |
| |
| - xml2text.xsl |
| Conversion into text. |
| |
| - xml2xhtml.xsl |
| Conversion into XHTML. |
| |
| # Implementation Notes About the Skeletons |
| |
| "Skeleton" in Bison parlance means "backend": the bison executable feeds a |
| skeleton with LR tables, facts about the symbols, etc., and the skeleton |
| generates the output (say parser.cc, parser.hh, location.hh, etc.). |
| Skeletons are only in charge of generating the parser and its auxiliary |
| files; they do not generate the XML output, the parser.output reports, or |
| the graphical rendering. |
| |
| The bits of information passed from bison to the backend are named |
| "muscles". Muscles are passed to M4 via its standard input, as a set of M4 |
| definitions. To see them, use `--trace=muscles`. |
| |
| Except for muscles, whose names are generated by bison, the skeletons have |
| no constraint at all on the macro names. There is no technical or |
| theoretical limitation: as long as you generate the output, you can do what |
| you want. However, it would be a bad idea if, say, the C and C++ skeletons |
| used different approaches and had completely different implementations: |
| that would be a maintenance nightmare. |
| |
| Below, we document some of the macros that we use in several of the |
| skeletons. If you write a new skeleton, please implement them for your |
| language. Overall, be sure to follow the same patterns as the existing |
| skeletons. |
| |
| ## Vocabulary |
| |
| We use "formal arguments", or "formals" for short, to denote the declared |
| parameters of a function (e.g., `int argc, const char **argv`). Yes, this |
| is somewhat contradictory with `param` in the `%param` directives. |
| |
| We use "effective arguments", or "args" for short, to denote the values |
| passed in function calls (e.g., `argc, argv`). |
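| |
| The distinction can be sketched with a small Python model. This is only an |
| illustration: the `params` list and the `formals`/`args` helpers are |
| hypothetical names chosen for this example, and each pair simply assumes |
| that a parameter is known by both its full declaration and its bare name. |
| |

```python
# Hypothetical model: each parameter is a (declaration, name) pair,
# e.g. what a directive such as `%param {int argc}` might contribute.
params = [
    ("int argc", "argc"),
    ("const char **argv", "argv"),
]


def formals(params):
    """Formal arguments: the declarations, as in a function signature."""
    return ", ".join(decl for decl, _ in params)


def args(params):
    """Effective arguments: the bare names, as passed in a call."""
    return ", ".join(name for _, name in params)


print(f"int parse({formals(params)});")  # a declaration uses the formals
print(f"parse({args(params)});")         # a call uses the args
```

| |
| Run as is, this prints `int parse(int argc, const char **argv);` followed |
| by `parse(argc, argv);`: the same parameters, once as formals and once as |
| args. |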
| |
| ## Symbols |
| |
| ### `b4_symbol(NUM, FIELD)` |
| In order to unify the handling of the various aspects of symbols (tag, type |
| name, whether terminal, etc.), bison.exe defines one macro per (symbol, |
| field), where field can be `has_id`, `id`, etc.: see |
| `prepare_symbol_definitions()` in `src/output.c`. |
| |
| NUM can be: |
| - `empty` to denote the "empty" pseudo-symbol when it exists, |
| - `eof`, `error`, or `undef`, |
| - a symbol number. |
| |
| FIELD can be: |
| |
| - `has_id`: 0 or 1 |
| Whether the symbol has an `id`. |
| |
| - `id`: string (e.g., `exp`, `NUM`, or `TOK_NUM` with api.token.prefix) |
| If `has_id`, the name of the token kind (prefixed by api.token.prefix if |
| defined), otherwise empty. Guaranteed to be usable as a C identifier. |
| This is used to define the token kind (i.e., the enum used by the return |
| value of yylex). Should be named `token_kind`. |
| |
| - `tag`: string |
| A human-readable representation of the symbol. Can be `'foo'`, |
| `'foo.id'`, `'"foo"'` etc. |
| |
| - `code`: integer |
| The token code associated with the token kind `id`. |
| The external number as used by yylex. Can be the ASCII code when the token |
| is a character, some number chosen by bison, or some user number in the |
| case of `%token FOO <NUM>`. Corresponds to `yychar` in `yacc.c`. |
| |
| - `is_token`: 0 or 1 |
| Whether this is a terminal symbol. |
| |
| - `kind_base`: string (e.g., `YYSYMBOL_exp`, `YYSYMBOL_NUM`) |
| The base of the symbol kind, i.e., the enumerator of this symbol (token or |
| nonterminal) which is mapped to its `number`. |
| |
| - `kind`: string |
| Same as `kind_base`, but possibly with a prefix in some languages. E.g., |
| EOF's `kind_base` and `kind` are both `YYSYMBOL_YYEOF` in C, but are |
| `S_YYEOF` and `symbol_kind::S_YYEOF` in C++. |
| |
| - `number`: integer |
| The code associated with the `kind`. |
| The internal number (computed from the external number by `yytranslate`). |
| Corresponds to `yytoken` in `yacc.c`. This is the same number that serves |
| as key in `b4_symbol(NUM, FIELD)`. |
| |
| In bison, symbols are first assigned increasing numbers in order of |
| appearance (but tokens first, then nterms). After grammar reduction, |
| unused nterms are then renumbered to appear last (i.e., first tokens, then |
| used nterms and finally unused nterms). This final number NUM is the one |
| contained in this field, and it is the one used as key in `b4_symbol(NUM, |
| FIELD)`. |
| |
| The code of the rule actions, however, is emitted before we know which |
| symbols are unused, so it uses the original numbers. To avoid confusion, |
| it actually uses "orig NUM" instead of just "NUM". bison also emits |
| definitions for `b4_symbol(orig NUM, number)` that map from original |
| numbers to the new ones. `b4_symbol` actually resolves `orig NUM` for the |
| other fields too, i.e., `b4_symbol(orig 42, tag)` would return the tag of |
| the symbol whose original number was 42. |
| |
| - `has_type`: 0 or 1 |
| Whether the symbol has a semantic value. |
| |
| - `type_tag`: string |
| When api.value.type=union, the generated name for the union member: |
| `yytype_INT` etc. for symbols that have an `id`, otherwise `yytype_1` etc. |
| |
| - `type`: string |
| If it has a semantic value, its type tag, or, if variants are used, |
| its type. |
| In the case of api.value.type=union, type is the real type (e.g., `int`). |
| |
| - `slot`: string |
| If it has a semantic value, the name of the union member (i.e., bounces to |
| either `type_tag` or `type`). It would be better to fix our mess and |
| always use `type` for the true type of the member, and `type_tag` for the |
| name of the union member. |
| |
| - `has_printer`: 0, 1 |
| - `printer`: string |
| - `printer_file`: string |
| - `printer_line`: integer |
| - `printer_loc`: location |
| If the symbol has a printer, everything about it. |
| |
| - `has_destructor`, `destructor`, `destructor_file`, `destructor_line`, `destructor_loc` |
| Likewise. |
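| |
| The renumbering described under `number`, and the way `orig NUM` keys are |
| resolved, can be sketched with a toy Python model. This is not Bison's |
| code: `renumber`, `lookup_symbol`, the `fields` table and the six-symbol |
| grammar are all hypothetical, and only three fields are modeled. |
| |

```python
# Illustrative model only; names and data are hypothetical, not Bison's code.

def renumber(symbols, unused):
    """Map original symbol numbers to final ones.

    symbols: list of (tag, is_token) in original order (tokens first,
             then nonterminals); the list index is the original number.
    unused:  set of tags of nonterminals found unused by grammar reduction.
    """
    tokens = [n for n, (_, is_token) in enumerate(symbols) if is_token]
    used = [n for n, (tag, is_token) in enumerate(symbols)
            if not is_token and tag not in unused]
    dropped = [n for n, (tag, is_token) in enumerate(symbols)
               if not is_token and tag in unused]
    # Final order: tokens, then used nterms, then unused nterms.
    return {orig: new for new, orig in enumerate(tokens + used + dropped)}


def lookup_symbol(fields, orig_to_new, num, field):
    """Look up FIELD for symbol NUM; NUM may also be ("orig", N)."""
    if isinstance(num, tuple) and num[0] == "orig":
        num = orig_to_new[num[1]]  # resolve the original number first
    return fields[(num, field)]


symbols = [("$end", True), ("error", True), ("NUM", True),
           ("exp", False), ("dead", False), ("term", False)]
orig_to_new = renumber(symbols, unused={"dead"})

# One entry per (final number, field), like one macro per (symbol, field).
fields = {}
for orig, (tag, is_token) in enumerate(symbols):
    new = orig_to_new[orig]
    fields[(new, "tag")] = tag
    fields[(new, "is_token")] = int(is_token)
    fields[(new, "number")] = new
```

| |
| In this toy grammar the unused nterm `dead` moves from original number 4 |
| to final number 5, and `term` moves from 5 to 4. Looking up |
| `("orig", 5)` therefore yields the fields of `term`, while the plain key |
| 5 yields those of `dead`. |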
| |
| ### `b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])` |
| Expansion of `$$`, `$1`, `$<TYPE-TAG>3`, etc. |
| |
| The semantic value from a given VAL. |
| - `VAL`: some semantic value storage (typically a union), e.g., `yylval`. |
| - `SYMBOL-NUM`: the symbol number from which we extract the type tag. |
| - `TYPE-TAG`: the type tag forced by the user with `<TYPE-TAG>`. |
| |
| The result can be used safely: it is put in parentheses to avoid nasty |
| precedence issues. |
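| |
| As a rough sketch of the kind of text such an expansion produces, here is |
| a hypothetical Python model. The function `symbol_value` and its `slot` |
| parameter are assumptions made for this example; in particular, the |
| precedence (a user-forced TYPE-TAG wins over the symbol's own slot, and an |
| untyped value is returned as is) is a guess, not taken from the skeletons. |
| |

```python
def symbol_value(val, slot=None, type_tag=None):
    """Return text that accesses the semantic value stored in VAL.

    Assumed precedence for this sketch: a user-forced TYPE-TAG wins,
    then the symbol's own slot; with neither, VAL is returned unchanged.
    """
    member = type_tag if type_tag else slot
    if not member:
        return val
    # Parenthesize so the result can be embedded in a larger expression.
    return f"({val}.{member})"


print(symbol_value("yylval", slot="ival"))
print(symbol_value("yyvsp[0]", slot="ival", type_tag="fval"))
print(symbol_value("yylval"))
```

| |
| Under these assumptions the three calls print `(yylval.ival)`, |
| `(yyvsp[0].fval)` and `yylval`. |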
| |
| ### `b4_lhs_value(SYMBOL-NUM, [TYPE])` |
| Expansion of `$$` or `$<TYPE>$`, for symbol `SYMBOL-NUM`. |
| |
| ### `b4_rhs_data(RULE-LENGTH, POS)` |
| The data corresponding to the symbol `#POS`, where the current rule has |
| `RULE-LENGTH` symbols on RHS. |
| |
| ### `b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])` |
| Expansion of `$<TYPE>POS`, where the current rule has `RULE-LENGTH` symbols |
| on RHS. |
| |
| <!-- |
| |
| Local Variables: |
| mode: markdown |
| fill-column: 76 |
| ispell-dictionary: "american" |
| End: |
| |
| Copyright (C) 2002, 2008-2015, 2018-2022, 2025 Free Software Foundation, |
| Inc. |
| |
| This file is part of GNU Bison. |
| |
| This program is free software: you can redistribute it and/or modify |
| it under the terms of the GNU General Public License as published by |
| the Free Software Foundation, either version 3 of the License, or |
| (at your option) any later version. |
| |
| This program is distributed in the hope that it will be useful, |
| but WITHOUT ANY WARRANTY; without even the implied warranty of |
| MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
| GNU General Public License for more details. |
| |
| You should have received a copy of the GNU General Public License |
| along with this program. If not, see <https://www.gnu.org/licenses/>. |
| |
| --> |