| This is the todo list for texi2any |
| |
| Copyright 2012-2025 Free Software Foundation. |
| |
| Copying and distribution of this file, with or without modification, |
| are permitted in any medium without royalty provided the copyright |
| notice and this notice are preserved. |
| |
| |
| Before next release |
| =================== |
| |
| update libintl-perl before the release |
| |
| system.h macros are duplicated in C/main/utils.h. Ok? |
| |
| Update paths to htmlxref.d/Texinfo_*.cnf in these files and in README-hacking |
| and in texinfo.texi |
| |
| document command_name in texi2any HTML customization API |
| |
| |
| Bugs |
| ==== |
| |
| HTML API |
| ======== |
| |
| Issues |
| ------ |
| |
Some private functions used in conversion:
| _convert_printindex_command |
| _new_document_context |
| _convert_def_line_type |
| _set_code_context |
| _pop_code_context |
| |
| |
| Missing documentation |
| ===================== |
| |
| Tree documentation in ParserNonXS.pm |
| ------------------------------------ |
| |
| elided_rawpreformatted, elided_brace_command_arg types. |
| |
| 'comment_at_end' in info hash |
| |
| alias_of in info hash |
| |
| source marks. |
| |
| special_unit_element type (only in HTML code) |
| |
| Other |
| ----- |
| |
| Document *XS_EXTERNAL_FORMATTING *XS_EXTERNAL_CONVERSION? |
| |
| No documentation of Texinfo::Options hashes used in converters: |
| multiple_at_command_options, converter_cmdline_options, |
| converter_customization_options, unique_at_command_options. |
| |
| |
| Texinfo tree reader |
| =================== |
| |
Go through the misc_args extra value as if going through regular arguments
and not as if it were an attribute.
| |
| Treat spaces in info as tree elements, not as attributes. With ignorable |
| as a category. Maybe treat verb delimiter the same. And also braces |
| of types with implicit braces? |
| |
| Categories: |
| ignorable text |
| text |
| element begin |
| element end |
| |
| accessors for all: |
| * parent |
| * source marks |
| * type? |
| |
| accessors for elements: |
* command_name and/or enum
| * attributes (extra/some string info) |
| * associated_unit? |
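The categories and accessors above could be mocked up as follows (an illustrative Python sketch of one possible reader API, not the planned implementation; all names are placeholders):

```python
from enum import Enum, auto

class Category(Enum):
    """Reader event categories proposed above."""
    IGNORABLE_TEXT = auto()
    TEXT = auto()
    ELEMENT_BEGIN = auto()
    ELEMENT_END = auto()

class Event:
    """One item produced by the tree reader.

    Accessors common to all categories: parent, source_marks, type.
    Element begin/end events additionally expose command_name,
    attributes (extra/some string info) and associated_unit.
    """
    def __init__(self, category, parent=None, source_marks=(),
                 type=None, command_name=None, attributes=None,
                 associated_unit=None):
        self.category = category
        self.parent = parent
        self.source_marks = list(source_marks)
        self.type = type
        self.command_name = command_name
        self.attributes = dict(attributes or {})
        self.associated_unit = associated_unit

# A @code{...} brace command could then be read as three events:
events = [
    Event(Category.ELEMENT_BEGIN, command_name='code'),
    Event(Category.TEXT),
    Event(Category.ELEMENT_END, command_name='code'),
]
```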
| |
| |
| Delayed bugs/features |
| ===================== |
| |
There is disagreement on the strategy to use for library versioning.
For Gavin, the code that can link against the libraries should be the code
with the exact same version as the libraries.
For Patrice, it is possible to explicitly say that mixing versions is
not supported, but still follow the libtool versioning scheme, such that
if better and compatible libraries are installed they will be preferred
to the same-version libraries.
| |
Could libraries be compiled with the Perl C compiler or another C compiler
and be mixed at the library level? Ask on the list.
| |
| Gavin on using CSV files and a Perl script to generate code for Perl and C. |
| For something like html_style_commands_element.csv, it could potentially |
| make the code more impenetrable. Suppose for example someone wants to |
| find which part of the code handles the @sansserif command. If they |
| search HTML.pm for 'sansserif' that string isn't there. As this file |
| is shorter, there is less benefit to avoiding duplicating its contents. |
| However, the purpose, structure and meaning of this file is quite clear. |
| (Files such as HTML.pm are also not self-contained, accessing information |
| in files such as Commands.pm, so having another file to access does not |
| really change the situation.) |
| |
| For shorter files like default_special_unit_info.csv and |
| default_direction_strings.csv it is less clear that it offers a net |
benefit. To be honest, the function of these files is not particularly
clear to me, other than that one of them has something to do with node
direction pointers. I don't think someone looking at these files for the
first time would have an easy time figuring out what they are for.
| |
| |
| Make building "source marks" optional? |
| |
| |
| hyphenation: should only appear in toplevel. |
| |
| |
Some dubious nesting could be warned about. The parser's context
command stacks could be used for that.
| |
| Some erroneous constructs not already warned against: |
| |
| @table in @menu |
| |
| @example |
| @heading A heading |
| @end example |
| Example in heading/heading_in_example. |
| |
| @group outside of @example (maybe there is no need for command stack for |
| this one if @group can only appear directly |
| in @example). |
| |
There is no warning for a block command between @def* and @def*x,
only for a paragraph. Not sure that this can be detected with
the context command stack.
| |
| @defun a b c d e f |
| |
| @itemize @minus |
| truc |
| @item t |
| @end itemize |
| |
| @defunx g h j k l m |
| |
| @end defun |
| |
| |
| Modules included in tta/maintain/lib/ are stable, but still need |
| to be updated from time to time. |
| |
| Unicode::EastAsianWidth \p{InFullwidth} could be replaced |
| by native \p{East_Asian_Width=Fullwidth} + \p{East_Asian_Width=Wide} |
| when the oldest supported Perl version is 5.12.0 (released in 2010). |
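For a cross-check of the character set involved, the same Unicode property can be queried from Python's standard library (illustration only, not project code):

```python
import unicodedata

def is_fullwidth_or_wide(ch):
    # East_Asian_Width property values 'F' (Fullwidth) and 'W' (Wide)
    # are the same set that the native Perl properties
    # East_Asian_Width=Fullwidth and East_Asian_Width=Wide would match.
    return unicodedata.east_asian_width(ch) in ('F', 'W')

print(is_fullwidth_or_wide('日'))   # CJK ideograph, property value Wide
print(is_fullwidth_or_wide('a'))    # ASCII letter, property value Narrow
print(is_fullwidth_or_wide('ａ'))  # U+FF41 fullwidth Latin a, Fullwidth
```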
| |
| |
Transliteration/protection with iconv in C leads to a result different from Perl
for some characters. It seems that the iconv result depends on the locale, and
there is quite a lot of '?' output, probably when there is no obvious
transliteration. In those cases, the Unidecode transliterations are not
necessarily very good, either.
| |
| |
Sorting indices in C with strxfrm_l using a UTF-8 locale with
LC_COLLATE_MASK on Debian GNU/Linux with glibc is quite consistent with Perl
for numbers and letters, but leads to a different output than Perl for
non-alphanumeric characters. This is because in Perl we set
'variable' => 'Non-Ignorable' to set Variable Weighting to Non-ignorable (see
http://www.unicode.org/reports/tr10/#Variable_Weighting).
For spaces, the output with Non-Ignorable Variable Weighting looks better for
index sorting, as it allows spaces and punctuation marks to sort before
letters. Right now, the XS code calls Perl to get the sorting
collation strings with Non-Ignorable Variable Weighting. The
undocumented XS_STRXFRM_COLLATION_LOCALE customization variable can be used
to specify a locale and use it with strxfrm_l to sort, but it is only
for testing and should not be kept in the long term; the plan is to replace
it with C code that sets Variable Weighting to Non-ignorable, and to keep
calling Perl until then.
| Related glibc enhancement request: |
| request for Non-Ignorable Variable Weighting Unicode collation |
| https://sourceware.org/bugzilla/show_bug.cgi?id=31658 |
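A toy illustration of why Variable Weighting matters for index sorting (not a real UCA implementation; "shifted" weighting is approximated here by simply dropping spaces and punctuation from the sort key, while plain code point order stands in for non-ignorable weighting):

```python
entries = ['ab', 'a b', 'a-b', 'ac']

# "Shifted"-like: spaces/punctuation are ignored at the primary level,
# so 'a b' and 'a-b' tie with 'ab' (stable sort keeps input order).
shifted = sorted(entries, key=lambda s: ''.join(c for c in s if c.isalnum()))

# "Non-ignorable"-like: spaces and punctuation keep their weight and,
# having low code points, sort before letters.
non_ignorable = sorted(entries)

print(shifted)
print(non_ignorable)
```

With non-ignorable weighting the entries containing a space or hyphen come first, which is the behavior described above as preferable for index sorting.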
| |
| |
| Missing tests |
| ============= |
| |
| There is a test of translation in parser in a translation in converter, in |
| |
| tta/perl/t/init_files_tests.t translation_in_parser_in_translation |
| |
| It would be nice to also have a translation in parser in a translation |
| in parser. That would mean having a po/gmo file where the string |
| translated in the parser for @def* indices, for instance "{name} of {class}" |
| is translated to a string including @def* commands, like |
| @deftypeop a b c d e f |
| AA |
| @end deftypeop |
| |
| @documentlanguage fr |
| |
| @deftypemethod g h i j k l |
| BB |
| @end deftypemethod |
| |
| |
| Unit test of end_line_count for Texinfo/Convert/Paragraph.pm .... containers. |
| |
| anchor in flushright, on an empty line, with a current byte offset. |
| |
| |
| Future features |
| =============== |
| |
Add the possibility to add text to a parsed document by restarting
parsing, when called as parse_texi_piece or parse_texi_line, by
storing in the document the parser state that is not already there.
There would be a customization variable to set the parser to be
restartable, and then parse_texi_piece and parse_texi_line could take
a document to retrieve the parsing state from. This should probably
wait for a clear use case. Currently, the parser is never reused
for different documents in the main code, only in specific tests.
| |
| |
| From Gavin on the preamble_before_beginning implementation: |
Another way might be to add special input code to trim off and return
a file prelude. This would move the handling of this from the "parser" code
to the "input" code. It would avoid the problematic "pushing back" of input
and would be a clean way of doing this. It would isolate the handling of
the "\input" line from the other parsing code.
| |
| I understand that the main purpose of the preamble_before_beginning element |
| is not to lose information so that the original Texinfo file could be |
| regenerated. If that's the case, maybe the input code could return |
| all the text in this preamble as one long string - it wouldn't have to be |
| line by line. |
| |
| |
| See message/thread from Reißner Ernst: Feature request: api docs |
| https://lists.gnu.org/archive/html/bug-texinfo/2022-02/msg00000.html |
| |
| Right now VERBOSE is almost not used. |
| |
| Should we warn if output is on STDOUT and OUTPUT_ENCODING_NAME != MESSAGE_OUTPUT_ENCODING_NAME? |
| |
Handle @exdent better in HTML? (there is a FIXME in the code)
| |
| For plaintext, implement an effect of NO_TOP_NODE_OUTPUT |
| * if true, output some title, possibly based on titlepage |
| and do not output the Top node. |
| * if false, current output is ok |
| Default is false. |
| |
| In Plaintext, @quotation text could have the right margin narrowed to be more |
| in line with other output formats. |
| |
| |
| DocBook |
| ------- |
| |
| deftypevr, deftypecv: use type and not returnvalue for the type |
| |
| also informalfigure in @float |
| |
| also use informaltable or table, for multitable? |
| |
| Add an @abstract command or similar to Texinfo? |
And put it in DocBook <abstract>? Beware that the DocBook abstract is quite
limited in terms of content: only a title and paragraphs. Although block
commands can be in paragraphs in DocBook, that is not the case for Texinfo,
so it would be very limited.
| |
| what about @titlefont in docbook? |
| |
maybe use simpara instead of para. Indeed, simpara is for paragraphs without
block elements within, and it should be that way in the generated output.
| |
* in docbook, when there is only one section, <article> should be better
  than <book>. Maybe the best way to do that would be to pass the
  information that there is only one section to the functions formatting
  the page header and page footer.
| |
there is a mark= attribute for the itemizedlist element for the initial mark
of each item, but the standard "does not specify a set of appropriate
keywords", so it cannot be used.
| |
| |
| Manual tests |
| ============ |
| |
Some tests are interesting but are not in the test suite for various
reasons. Many regressions are not really expected with these tests.
They are shown here for information. This list was up to date in
March 2024; it may drift as test file names or content change.
Commands are run from the tta/perl directory.
| |
| |
| Tests in non utf8 locale |
| ------------------------ |
| |
In practice these tests were run in latin1. They are not
in the main test suite because a latin1 locale cannot be expected
to be reliably present.
| |
| Tests with correct or acceptable results |
| **************************************** |
| |
| File not found error message with accented characters in its name: |
| ./texi2any.pl not_éxisting.texi |
| |
| t/formats_encodings.t manual_simple_utf8_with_error |
| utf8 manual with errors involving non ascii strings |
| ./texi2any.pl ./t/input_files/manual_simple_utf8_with_error.texi |
| |
| t/formats_encodings.t manual_simple_latin1_with_error |
| latin1 manual with errors involving non ascii strings |
| ./texi2any.pl ./t/input_files/manual_simple_latin1_with_error.texi |
| |
| tests/formatting cpp_lines |
| CPP directive with non ascii characters, utf8 manual |
| ./texi2any.pl -I ./t/include/ ./t/input_files/cpp_lines.texi |
| accentêd:7: warning: là ng is not a valid language code |
The file is UTF-8 encoded and the @documentencoding is obeyed, which leads,
in the Parser, to a UTF-8 encoding of the include file name, and not to the
latin1 encoding which should be used for the output messages encoding.
This output is by design (Gavin).
| |
| many_input_files/output_dir_file_non_ascii.sh |
| non ascii output directory, utf8 manual |
| ./texi2any.pl -o encodé/ ./t/input_files/simplest.texi |
| |
| test of non ascii included file name in utf8 locale is already in formatting: |
| formatting/osé_utf8.texi:@include included_akçentêd.texi |
| ./texi2any.pl --force -I ../tests/ ../tests/input/non_ascii/os*_utf8.texi |
| The file name is utf-8 encoded in messages, which is expected as we do not |
| decode/encode file names from the command line for messages |
| osé_utf8.texi:15: warning: undefined flag: vùr |
| |
| t/80include.t cpp_line_latin1 |
| CPP directive with non ascii characters, latin1 manual |
| ./texi2any.pl --force ./t/input_files/cpp_line_latin1.texi |
| |
Need to have the file name recoded to latin1, see ../tests/README
| tests/encoded manual_include_accented_file_name_latin1 |
| ./texi2any.pl --force -I ../tests/built_input/ ../tests/encoded/manual_include_accented_file_name_latin1.texi |
| |
| latin1 encoded and latex2html in latin1 locale |
| ./texi2any.pl --html --init ext/latex2html.pm ../tests/tex_html/tex_encode_latin1.texi |
| |
| latin1 encoded and tex4ht in latin1 locale |
| ./texi2any.pl --html --init ext/tex4ht.pm ../tests/tex_html/tex_encode_latin1.texi |
| |
| cp -p ../tests/tex_html/tex_encode_latin1.texi tex_encodé_latin1.texi |
| ./texi2any.pl --html --init ext/tex4ht.pm tex_encodé_latin1.texi |
| Firefox can't find tex_encod%uFFFD_latin1_html/Chapter.html (?) |
| Opened from within the directory, works well. |
| |
| epub for utf8 encoded manual in latin1 locale |
| ./texi2any.pl --force -I ../tests/ --init ext/epub3.pm ../tests/input/non_ascii/os*_utf8.texi |
| |
| epub for latin1 encoded manual in latin1 locale |
| cp ../tests/tex_html/tex_encode_latin1.texi tex_encodé_latin1.texi |
| ./texi2any.pl --init ext/epub3.pm tex_encodé_latin1.texi |
| |
| ./texi2any.pl --force -I ../tests/ -c TEXINFO_OUTPUT_FORMAT=rawtext ../tests/input/non_ascii/os*_utf8.texi |
| output file name is in latin1, but the encoding inside is utf8 consistent |
| with the document encoding. |
| |
| ./texi2any.pl --force -I ../tests/ -c TEXINFO_OUTPUT_FORMAT=rawtext ../tests/input/non_ascii/os*_utf8_no_setfilename.texi |
| output file name is utf8 because the utf8 encoded input file name |
| is decoded using the locale latin1 encoding keeping the 8bit characters |
| from the utf8 encoding, and the encoding inside is utf8 |
| consistent with the document encoding. |
| |
| ./texi2any.pl --force -I ../tests/ -o encodé/raw.txt -c TEXINFO_OUTPUT_FORMAT=rawtext ../tests/input/non_ascii/os*_utf8.texi |
| encodé/raw.txt file name encoded in latin1, and the encoding inside is utf8 |
| consistent with the document encoding. |
| |
| ./texi2any.pl --force -I ../tests/ -c TEXINFO_OUTPUT_FORMAT=rawtext -c 'SUBDIR=subdîr' ../tests/input/non_ascii/os*_utf8.texi |
| subdîr/osé_utf8.txt file name encoded in latin1, and the encoding inside is utf8 |
| consistent with the document encoding. |
| |
| ./texi2any.pl --set TEXINFO_OUTPUT_FORMAT=debugtree -o résultat/encodé.txt ./t/input_files/simplest_no_node_section.texi |
| résultat/encodé.txt file name encoded in latin1. |
| |
| ./texi2any.pl --set TEXINFO_OUTPUT_FORMAT=debugtree -o char_latin1_latin1_in_refs_tree.txt ./t/input_files/char_latin1_latin1_in_refs.texi |
| char_latin1_latin1_in_refs_tree.txt content encoded in latin1 |
| |
| utf8 encoded manual name and latex2html in latin1 locale |
| ./texi2any.pl --verbose -c 'COMMAND_LINE_ENCODING=utf-8' --html --init ext/latex2html.pm -c 'L2H_CLEAN 0' ../tests/input/non_ascii/tex_encod*_utf8.texi |
COMMAND_LINE_ENCODING=utf-8 is required in order to have the
input file name correctly decoded as document_name, which is used
in the init file to set the file names.
| |
| latin1 encoded manual name and latex2html in latin1 locale |
| cp ../tests/tex_html/tex_encode_latin1.texi tex_encodé_latin1.texi |
| ./texi2any.pl -c 'L2H_CLEAN 0' --html --init ext/latex2html.pm tex_encodé_latin1.texi |
| |
| Tests with incorrect results, though not bugs |
| ********************************************* |
| |
| utf8 encoded manual name and latex2html in latin1 locale |
| ./texi2any.pl --html --init ext/latex2html.pm -c 'L2H_CLEAN 0' ../tests/input/non_ascii/tex_encod*_utf8.texi |
| No error, but the file names are like |
| tex_encodé_utf8_html/tex_encodÃ'$'\203''©_utf8_l2h.html |
| That's in particular because the document_name is incorrect because it is |
| decoded as if it was latin1. |
| |
| utf8 encoded manual name and tex4ht in latin1 locale |
| ./texi2any.pl --html --init ext/tex4ht.pm ../tests/input/non_ascii/tex_encod*_utf8.texi |
| html file generated by tex4ht with content="text/html; charset=iso-8859-1">, |
| with character encoded in utf8 <img src="tex_encodé_utf8_tex4ht_tex0x.png" ...> |
| |
| |
| Tests in utf8 locales |
| --------------------- |
| |
| The archive epub file is not tested in the automated tests. |
| |
| epub for utf8 encoded manual in utf8 locale |
| ./texi2any.pl --force -I ../tests/ --init ext/epub3.pm ../tests/input/non_ascii/osé_utf8.texi |
| |
The following tests require latin1-encoded file names. Note that this
could be done automatically now with
tta/maintain/copy_change_file_name_encoding.pl.
However, there is already a test with an include file in latin1, which
is enough.
Create the latin1-encoded file from a latin1 console:
cp ../tests/tex_html/tex_encode_latin1.texi tex_encodé_latin1.texi
Run from a UTF-8 locale console. The resulting file has a ? in the name
but the result is otherwise ok.
| ./texi2any.pl tex_encod*_latin1.texi |
| |
The following tests are not important enough to have regression tests:
| ./texi2any.pl --force -I ../tests/ -o encodé/raw.txt -c TEXINFO_OUTPUT_FORMAT=rawtext ../tests/input/non_ascii/os*_utf8.texi |
| ./texi2any.pl --force -I ../tests/ -c TEXINFO_OUTPUT_FORMAT=rawtext -c 'SUBDIR=subdîr' ../tests/input/non_ascii/os*_utf8.texi |
| |
This test is more interesting in a non-utf8 locale:
| ./texi2any.pl --set TEXINFO_OUTPUT_FORMAT=debugtree -o résultat/encodé.txt ./t/input_files/simplest_no_node_section.texi |
| résultat/encodé.txt file name encoded in utf8 |
| |
| ./texi2any.pl --set TEXINFO_OUTPUT_FORMAT=debugtree -o char_latin1_latin1_in_refs_tree.txt ./t/input_files/char_latin1_latin1_in_refs.texi |
| char_latin1_latin1_in_refs_tree.txt content encoded in latin1 |
| |
| |
| Notes on classes names in HTML |
| ============================== |
| |
In January 2022 the classes in HTML elements were normalized. There are no
rules, but here is a description of the choices made at that time, in case one
wants to use the same conventions. The objective was to make the link between
@-commands and classes easy to understand, to avoid ambiguities, and to have
ways to style most of the output.
| |
The class names without hyphen were only used for @-commands, with at most
one class attribute on an element for each @-command appearing in the
Texinfo source. It was also attempted to have such a class for all
the @-commands with an effect on output, though the coverage was not perfect;
sometimes it is not easy to select an element that would correspond to the
most logical association with the @-command (the case of the @*ref @-commands,
with both a <cite> and an <a href>, for example).
| |
Class names <command>-*, with <command> a Texinfo @-command name, were
only used for classes marking elements within an @-command, but on elements
other than the main element for that @-command, in general sub-elements.
For example, a @flushright leads to a <div class="flushright"> where the
@flushright command is, and to <p class="flushright-paragraph"> for the
paragraphs within the @flushright.
| |
| Class names *-<command> with <command> a Texinfo @-command name were |
| reserved for uses related to @-command <command>. For example |
| classes like summary-letter-printindex, cp-entries-printindex or |
| cp-letters-header-printindex for the different parts of the @printindex |
| formatting. |
| |
| def- and -def are used for classes related to @def*, in general without |
| the specific command name used. |
| |
| For the classes not associated with @-commands, the names were selected to |
| correspond to the role in the document rather than to the formatting style. |
| |
| |
In HTML, some @-commands do not have an element with an associated class, or
the association is not perfect. Examples are @author in @quotation, and
@-commands affected by @definfoenclose. @pxref and similar @-commands have no
class for references to external nodes, and don't have the 'See ' in the
element for references to internal nodes. In general, this is because gdt()
is used instead of direct HTML.
| |
| |
| Notes on protection of punctuation in nodes (done) |
| ================================================== |
| |
| This is implemented, in tta/perl/Texinfo/Transformations.pm in _new_node for |
| Texinfo generation, and in Info with INFO_SPECIAL_CHARS_QUOTE. *[nN]ote |
is not protected, though, but it is not clear it would be right to do so.
| There is a warning with @strong{note...}. |
| |
| Automatic generation of node names from section names. To be protected: |
| * in every case |
| ( at the beginning |
| * In @node line |
| commas |
| * In menu entry |
| * if there is a label |
| tab comma dot |
| * if there is no label |
| : |
| * In @ref |
| commas |
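As a minimal illustration of the @node-line case above, commas can be protected with the real Texinfo @comma{} command (a sketch only; the actual logic is _new_node in tta/perl/Texinfo/Transformations.pm, which handles more cases):

```python
def protect_node_name_on_node_line(name):
    # Commas on a @node line separate the node's pointers, so a comma in
    # an automatically generated node name must be replaced by @comma{}.
    # Illustrative sketch, not the real implementation.
    return name.replace(',', '@comma{}')

print(protect_node_name_on_node_line('f(a, b)'))
```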
| |
| In Info |
| |
in cross-references. First a : is searched for. If it is followed by
another :, the node name is found and there is no label. When parsing a
node, a filename with ( is searched for. Nested parentheses are taken
into account.
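The label/node lookup just described could be sketched like this (illustrative only, not the Info reader's actual code; with a label, the node name is taken to end at a tab, comma or period as in the list below):

```python
import re

def parse_info_xref(text):
    # Find the first ':'; if it is immediately followed by another ':'
    # the text before it is the node name and there is no label.
    # Otherwise it ends the label, and the node name runs up to the
    # next tab, comma or period.  Sketch only.
    i = text.index(':')
    if text[i + 1:i + 2] == ':':
        return (None, text[:i])            # no label
    rest = text[i + 1:]
    m = re.match(r'\s*([^\t,.]*)', rest)
    return (text[:i], m.group(1).strip())  # (label, node)

print(parse_info_xref('Some Node::'))
print(parse_info_xref('a label: Other Node.'))
```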
| |
| Nodes: |
| * in every case |
| ( at the beginning |
| * in Node line |
| commas |
| * In menu entry and *Note |
| * if there is a label |
| tab comma dot |
| * if there is no label |
| : |
| |
Labels in Info (not index entries; in index entries the last : not in
a quoted node should be used to determine the end of the
index entry).
| : |
| |
| * at the beginning of a line in a @menu |
| *note more or less everywhere |
| |
| |
| Interrogations and remarks |
| ========================== |
| |
For converters in C, agreed with Gavin that it is better not to
translate a Perl tree as input, but to access directly the C tree that
was set up by the XS parser.
| |
| There is no forward looking code anymore, so maybe a lex/yacc parser |
| could be used for the main loop. More simply, a binary tokenizer, at |
| least, could make for a notable speedup. |
| |
From Vincent Belaïche, about svg image files in HTML:
| |
I don't think that supporting svg would be easy: it seems that to embed an
svg picture you need to declare the width x height of the frame in
which you embed it, and this information cannot be derived quite
straightforwardly from the picture.
With @image you can declare width and height, but this is intended for
scaling. I am not sure whether these arguments can be used
for the purpose of defining that frame...
What I did in 5x5 is that I coded the height of the frame directly in
the macro @FIGURE with which I embed the figure, without going through
an argument.
| The @FIGURE @macro is, for html: |
| @macro FIGURE {F,W} |
| @html |
| <div align="center"> |
| <embed src="5x5_\F\.svg" height="276" |
| type="image/svg+xml" |
| pluginspage="http://www.adobe.com/svg/viewer/install/" /></div> |
| @end html |
| @end macro |
| |
| |
In general, external information for cross-references to other manuals from
htmlxref.cnf or htmlxref.d/*.cnf files should be used to determine
the split of a referenced manual, and if no information is set, it
is a good idea to generate both a mono and a split manual. Therefore the
following situation is not something that needs to be supported/implemented;
however, we keep the information on the javascript code here.
| |
If a manual is split and the person generating the manual wants
references to a mono manual to be redirected to the split files, it should
be possible to create a manual.html file that redirects to the
manual_html/node.html files using the following javascript function:
| |
| function redirect() { |
| switch (location.hash) { |
| case "#Node1": |
| location.replace("manual_html/Node1.html#Node1"); break; |
| case "#Node2" : |
| location.replace("manual_html/Node2.html#Node2"); break; |
| ... |
| default:; |
| } |
| } |
| |
| And, in the <body> tag of manual.html: |
| <body onLoad="redirect();"> |
| |
| |
Need to check whether a fix is needed
-------------------------------------
| |
In HTML, HEADERS is used, but not in other modules, especially not in
Plaintext.pm or Info.pm; there, this is determined by the module used
(Plaintext.pm or Info.pm). No idea whether this is right or wrong.
| |
| def/end_of_lines_protected_in_footnote.pl the footnote is |
| (1) -- category: deffn_name arguments arg2 more args with end of line |
| and not |
| (1) |
| -- category: deffn_name arguments arg2 more args with end of line |
| It happens this way because the paragraph starts right after the footnote |
| number. |
| |
| in HTML, the argument of a quotation is ignored if the quotation is empty, |
| as in |
| @quotation thing |
| @end quotation |
| Is it really a bug? |
| |
In @copying, things like some raw formats may be expanded. However it is
not clear that it should be the same as in the main converter. Maybe a
specific list of formats could be passed to Texinfo::Convert::Text::convert,
which would be different (for example Info and Plaintext even if converting
HTML). Not clear that it is a good idea. Also this requires a test, to begin
with.
| |
Punctuation and spaces before @image do not lead to a doubling of space.
In fact @image is completely formatted outside of the usual formatting
containers. Not sure what the right way should be.
| test in info_test/image_and_punctuation |
| |
in info_tests/error_in_footnote there is an error message for each
listoffloats. Line numbers are right, though, so maybe this is not
an issue.
| |
| converters_tests/things_before_setfilename there is no error |
| for anchor and footnote before setfilename. It is not clear that |
| there should be, though. |
| |
In Info, the length of an image special directive on a sectioning command
line is taken into account for the count of underlining characters inserted
below the section title. There is no reason to underline the image
special directive. Since the image rendering and the length of the
replacement text depend on the Info viewer, however, there is no way to
know in advance the length of text to underline (if any). It is therefore
unclear what the correct underlining character count would be.
| An example in formats_encodings/at_commands_in_refs. |
| |
| When using Perl modules, many strings in debugging output are internal |
| Perl strings not encoded before being output, leading to |
| 'Wide character in print' messages (in C those strings are always encoded |
| in UTF-8). Not clear that it is an issue. For example with |
| export TEXINFO_XS=omit |
| /usr/bin/perl -w ./../perl/texi2any.pl --force --conf-dir ./../perl/t/init/ --conf-dir ./../perl/init --conf-dir ./../perl/ext -I ./coverage/ -I coverage// -I ./ -I . -I built_input --error-limit=1000 -c TEST=1 --output coverage//out_parser/formatting_macro_expand/ --macro-expand=coverage//out_parser/formatting_macro_expand/formatting.texi -c TEXINFO_OUTPUT_FORMAT=structure ./coverage//formatting.texi --debug=1 2>t.err |
| |
| |
| HTML5 validation tidy errors that do not need fixing |
| ---------------------------------------------------- |
| |
| # to get only errors: |
| tidy -qe *.html |
| |
| Some can also be validation errors in other HTML versions. |
| |
| missing </a> before <a> |
| discarding unexpected </a> |
| nested <a> which happens for @url in @xref, which is valid Texinfo. |
| |
| Warning: <a> anchor "..." already defined |
| Should only happen with multiple insertcopying. |
| |
| Warning: trimming empty <code> |
| Normally happens only for invalid Texinfo, missing @def* name, empty |
| @def* line... |
| |
| <td> attribute "width" not allowed for HTML5 |
| <th> attribute "width" not allowed for HTML5 |
| These attributes are obsolete (though the elements are |
| still part of the language), and must not be used by authors. |
| The CSS replacement would be style="width: 40%". |
| However, width is kept as an attribute in texi2any @multitable output and not |
| as CSS because it is not style, but table or even line specific formatting. |
| If the _INLINE_STYLE_WIDTH undocumented option is set, CSS is used. |
| It is set for EPUB. |
| See |
| https://lists.gnu.org/archive/html/bug-texinfo/2024-09/msg00065.html |
| |
| |
| Specialized synopsis in DocBook |
| ------------------------------- |
| |
| Use of specialized synopsis in DocBook is not a priority and it is not even |
| obvious that it is interesting to do so. The following notes explain the |
| possibilities and issues extensively. |
| |
Instead of synopsis it might seem relevant to use specialized synopsis,
funcsynopsis/funcprototype for deftype* and some def*, and others for object
oriented ones. There are many issues, such that this possibility does not
appear appealing at all.
| |
| 1) there is no possibility to have a category. So the category must be |
| added somewhere as a role= or in the *synopsisinfo, or this should only |
| be used for specialized @def, like @defun. |
| |
| 2) @defmethod and @deftypemethod cannot really be mapped to methodsynopsis |
| as the class name is not associated with the method as in Texinfo, but |
| instead the method should be in a class in docbook. |
| |
| 3) From the docbook reference for funcsynopsis |
| "For the most part, the processing application is expected to |
| generate all of the parentheses, semicolons, commas, and so on |
| required in the rendered synopsis. The exception to this rule is |
| that the spacing and other punctuation inside a parameter that is a |
| pointer to a function must be provided in the source markup." |
| |
So this means it is language-specific (C, as said in the DocBook doc)
and one has to remove the parentheses, semicolons and commas.
| |
| See also the mails from Per Bothner bug-texinfo, Sun, 22 Jul 2012 01:45:54. |
| |
| specialized @def, without a need for category: |
| @defun and @deftypefun |
| <funcsynopsis><funcprototype><funcdef>TYPE <function>NAME</function><paramdef><parameter>args</parameter></paramdef></funcprototype></funcsynopsis> |
| |
| specialized @def, without a need for category, but without DocBook synopsis |
| because of missing class: |
| @defmethod, @deftypemethod: methodsynopsis cannot be used since the class |
| is not available |
| @defivar and @deftypeivar: fieldsynopsis cannot be used since the class |
| is not available |
| |
| Generic @def with a need for a category |
| For deffn deftypefn (and defmac?, defspec?), the possibilities of |
| funcsynopsis, with a category added could be used: |
| <funcsynopsis><funcprototype><funcdef role=...>TYPE <function>NAME</function></funcdef><paramdef>PARAMTYPE <parameter>PARAM</parameter></paramdef></funcprototype></funcsynopsis> |
| |
| Alternatively, use funcsynopsisinfo for the category. |
| |
| Generic @def with a need for a category, but without DocBook synopsis because |
| of missing class: |
| @defop and @deftypeop: methodsynopsis cannot be used since the class |
| is not available |
| defcv, deftypecv: fieldsynopsis cannot be used since the class |
| is not available |
| |
| Remaining @def without a DocBook synopsis because there is no equivalent; |
| a category is required: |
| defvr (defvar, defopt), deftypevr (deftypevar) |
| deftp |
| |
| |
| Solaris 11 |
| ========== |
| |
| # recent Test::Deep requires perl 5.12 |
| cpan> o conf urllist push http://backpan.perl.org/ |
| cpan RJBS/Test-Deep-1.127.tar.gz |
| |
| It is also possible to install Texinfo dependencies with openCSW, like |
| pkgutil -y -i CSWhelp2man CSWpm-data-compare CSWpm-test-deep |
| |
| The system perl may not be suitable to build XS modules, and the system |
| gawk may be too old; openCSW may be needed. For example: |
| ./configure PERL=/opt/csw/bin/perl GAWK=/opt/csw/bin/gawk CFLAGS='-g' |
| |
| ./configure PERL=/opt/csw/bin/perl GAWK=/opt/csw/bin/gawk CFLAGS='-g' LDFLAGS=-L/opt/csw/lib/ CPPFLAGS='-I/opt/csw/include/' PERL_EXT_LDFLAGS=-L/opt/csw/lib/ LIBS=-liconv |
| |
| |
| Misc notes |
| ========== |
| |
| # compare in-source and out-of-source dist contents after maintainer-clean |
| rm -f texinfo-7.*.*.tar.gz |
| ./autogen.sh |
| ./configure |
| make maintainer-clean |
| ./autogen.sh |
| ./configure |
| make dist |
| rm -rf in_source_dist_contents |
| mkdir in_source_dist_contents |
| (cd in_source_dist_contents && tar xvf ../texinfo-7.*.*.tar.gz) |
| make maintainer-clean |
| rm -rf bb |
| mkdir bb |
| cd bb |
| ../configure |
| make dist |
| tar xvf texinfo-7.*.*.tar.gz |
| diff -u -r ../in_source_dist_contents/texinfo-7.*.*/ texinfo-7.*.*/ > ../build_in_out_source_differences.diff |
| |
| Test validity of Texinfo XML or docbook |
| export XML_CATALOG_FILES=~/src/texinfo/tta/maintain/catalog.xml |
| xmllint --nonet --noout --valid commands.xml |
| |
| tidy does not seem to be released and/or maintained anymore. It |
| incorrectly emits an error for the ol type attribute. |
| tidy -qe *.html |
| |
| profiling: package on debian: |
| libdevel-nytprof-perl |
| In doc: |
| perl -d:NYTProf ../tta/perl/texi2any.pl texinfo.texi --html |
| perl -d:NYTProf ../tta/perl/texi2any.pl texinfo.texi |
| nytprofhtml |
| # firefox nytprof/index.html |
| |
| Test with 8bit locale: |
| export LANG=fr_FR; export LANGUAGE=fr_FR; export LC_ALL=fr_FR |
| xterm & |
| |
| Turkish locale, interesting as the ASCII upper-case letter I can become |
| a (non-ASCII) dotless i when lower-casing (Eli's recommendation). |
| export LANG=tr_TR.UTF-8; export LANGUAGE=tr_TR.UTF-8; export LC_ALL=tr_TR.UTF-8 |
| |
| On ExtUtils::Embed flags (the documentation tends to be inaccurate): |
| ldopts: ccdlflags ldflags "libperl and perllibs through MakeMaker + static_ext" |
| ccopts: ccflags perl_inc |
| |
| convert to pdf from docbook |
| xsltproc -o intermediate-fo-file.fo /usr/share/xml/docbook/stylesheet/docbook-xsl/fo/docbook.xsl texinfo.xml |
| fop -r -pdf texinfo-dbk.pdf -fo intermediate-fo-file.fo |
| |
| dblatex -o texinfo-dblatex.pdf texinfo.xml |
| |
| Open a specific info file in Emacs Info reader: C-u C-h i |
| |
| In tta/tests/, generate Texinfo file for Texinfo TeX coverage |
| ../perl/texi2any.pl --force --error=100000 -c TEXINFO_OUTPUT_FORMAT=plaintexinfo -D valid layout/formatting.texi > formatting_valid.texi |
| |
| From doc/ |
| texi2pdf -I ../tta/tests/layout/ ../tta/tests/formatting_valid.texi |
| |
| To generate valgrind .supp rules: --gen-suppressions=all --log-file=gen_supp_rules.log |
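| The log then mixes valgrind diagnostics with the generated suppression |
| blocks; a possible sketch to keep only the { ... } blocks (the demo log |
| contents and the output file name are made up): |

```shell
# Demo input: a fake --gen-suppressions=all log (real logs mix
# valgrind diagnostics with the suppression blocks).
printf '%s\n' 'diagnostic noise' '{' '   <name>' \
  '   Memcheck:Leak' '   fun:malloc' '}' > gen_supp_rules.log
# Keep only the suppression blocks: lines from '{' to '}' at column 0.
awk '/^{/,/^}/' gen_supp_rules.log > new_rules.supp
cat new_rules.supp
```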
| |
| mkdir -p val_res |
| PERL_DESTRUCT_LEVEL=2 |
| export PERL_DESTRUCT_LEVEL |
| for file in t/*.t ; do bfile=`basename $file .t`; echo $bfile; valgrind --suppressions=../texi2any.supp -q perl -w $file > val_res/$bfile.out 2>&1 ; done |
| |
| With memory leaks |
| for file in t/*.t ; do bfile=`basename $file .t`; echo $bfile; valgrind --suppressions=../texi2any.supp -q --leak-check=full perl -w $file > val_res/$bfile.out 2>&1 ; done |
| for file in t/*.t ; do bfile=`basename $file .t`; echo $bfile; valgrind -q --leak-check=full perl -w $file > val_res/$bfile.out 2>&1 ; done |
| (for file in t/z_misc/*.t ; do bfile=`basename $file .t`; echo z_misc/$bfile; valgrind -q --leak-check=full perl -w $file ; done) > val_res/z_misc.out 2>&1 |
| |
| For tests in tta/tests, a way to have a valgrind call prepended is to |
| add, in tta/defs: |
| prepended_command='valgrind --leak-check=full -q' |
| prepended_command='valgrind --leak-check=full -q --suppressions=../texi2any.supp' |
| |
| Before the code reorganization and the separation of code linked against |
| Gnulib from code linked against Perl, memory that was not released/freed |
| but should still have been accessible at the end of conversion was in |
| some cases shown by valgrind as leaked, typically static/global |
| variables freed upon reuse or left unreleased on purpose (parser conf, |
| for example), and some symbols were shown as ???. It is unclear what |
| the problem was; the valgrind documentation hints that memory appearing |
| as leaked could have come from a dlclosed object. In that case, adding |
| --keep-debuginfo=yes showed the missing symbols, as described in the |
| valgrind documentation. |
| |
| rm -rf t/check_debug_differences/ |
| mkdir t/check_debug_differences/ |
| for file in t/*.t ; do bfile=`basename $file .t`; perl -w $file -d 1 2>t/check_debug_differences/XS_$bfile.err ; done |
| export TEXINFO_XS_PARSER=0 |
| for file in t/*.t ; do bfile=`basename $file .t`; perl -w $file -d 1 2>t/check_debug_differences/PL_$bfile.err ; done |
| for file in t/*.t ; do bfile=`basename $file .t`; sed 's/^XS|//' t/check_debug_differences/XS_$bfile.err | diff -u t/check_debug_differences/PL_$bfile.err - > t/check_debug_differences/debug_$bfile.diff; done |
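| The idea of the last loop can be demonstrated on toy files: the XS debug |
| output carries an XS| prefix that is stripped before comparing with the |
| pure Perl output (demo file names are made up): |

```shell
# Demo of the compare idiom: strip the XS| prefix from the XS
# debug output, then diff against the pure Perl output.
printf 'XS|line one\nXS|line two\n' > XS_demo.err
printf 'line one\nline two\n' > PL_demo.err
sed 's/^XS|//' XS_demo.err | diff -u PL_demo.err - && echo 'no differences'
# → no differences
```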
| |
| Full check, including XS conversion and memory leaks in debug: |
| PERL_DESTRUCT_LEVEL=2 |
| export PERL_DESTRUCT_LEVEL |
| for file in t/*.t ; do bfile=`basename $file .t`; valgrind -q --leak-check=full --keep-debuginfo=yes perl -w $file -d 1 2>t/check_debug_differences/XS_$bfile.err ; done |
| |
| Check of XS interface to Perl |
| export TEXINFO_XS_EXTERNAL_FORMATTING=1 |
| export TEXINFO_XS_EXTERNAL_CONVERSION=1 |
| |
| |
| Analysing memory use: |
| valgrind massif useful-heap approximate distribution in 2024 (obsolete) |
| mkdir -p massif |
| valgrind --tool=massif --massif-out-file=massif/massif_info.out perl -w texi2any.pl ../doc/texinfo.texi |
| ms_print massif/massif_info.out > massif/ms_print_info.out |
| 16M Perl |
| 36M C tree |
| 50M Perl tree (visible in detailed use, but difficult to do the |
| imputation right, some may correspond with other uses |
| of Perl memory) |
| 5M (approximate, not visible in the detailed use, based on difference |
| in use over time) conversion |
| |
| With full XS (7.2 64M, with text separate 58.5M, without info_info 56M, |
| with integer extra keys 54M, with source marks as pointers 52.3M) |
| mkdir -p massif |
| valgrind --tool=massif --massif-out-file=massif/massif_html.out perl -w texi2any.pl --html ../../doc/texinfo.texi |
| ms_print massif/massif_html.out > massif/ms_print_html.out |
| useful-heap |
| 25M = 13.1 + 5.8 + 2.9 + 2.5 + 0.7 Perl |
| 17.8M Tree |
| 6 + 5 = 11M new_element |
| 3.5M reallocate_list |
| 0.5M get_associated_info_key (below threshold in later reports) |
| 2.8M = 0.8 + 0.7 +1.3 text |
| 5.2M = 3.8 (text) + 0.7 (text printindex) + 0.7: conversion, |
| mainly text in convert_output_output_unit* |
| (+1.3M by approximate difference with total) |
| (7.5 + 1.3) - (3.8 + 0.7 + 0.7 + 0.8 +1.3) = 1.5 M Text not imputed |
| 3. - 0.5 = 2.5M remaining not imputed (- get_associated_info_key) |
| 52M TOTAL (for 52.3M reported) |
| |
| |
| Using callgrind to find the time used by functions |
| |
| valgrind --tool=callgrind perl -w texi2any.pl ../../doc/texinfo.texi --html |
| # to avoid cycles (some remain in Perl only code) that mess up the graph: |
| valgrind --tool=callgrind --separate-callers=3 --separate-recs=10 perl -w texi2any.pl ../../doc/texinfo.texi --html |
| valgrind --tool=callgrind --separate-callers=4 --separate-recs=11 perl -w texi2any.pl ../../doc/texinfo.texi |
| kcachegrind callgrind.out.XXXXXX |
| |
| This is obsolete with output overriding, although the distribution changed |
| very little after reattributing the shares. |
| For the Texinfo manual with full XS, in 2024, Perl uses 22% of the time |
| (for html), now only for code hopefully called once. The switch to |
| global locales for setlocale calls, which is needed for Perl, also |
| takes 4%. Calling Perl getSortKey uses about 28% (more on sorting and |
| C below). |
| Decomposition of the time used for the Texinfo manual with full XS |
| (in percent): |
| parser: 11.5 |
| index sorting: 30 |
| main conversion to HTML: 24.8 = 54.8 - 30 |
| |
| node redirections: 2.6 |
| prepare conversion units: 2.3 |
| remove document: 1.8 |
| associate internal references: 0.53 |
| prepare unit directions: 0.41 |
| setup indices sort strings: 0.36 |
| reset converter: 0.23 |
| structuring transformation1: 0.19 |
| structuring transformation2: 0.19 |
| remaining Texinfo XS code: 0.35 |
| = 8.95 |
| Perl: 22.5 = 7 + 15.2 + (75.57 - 54.8 - 11.5 - 8.95) |
| SUM: 98 |
| |
| |
| Setting flags |
| # some features are only enabled at -O2, but sometimes -O0 is better |
| # for debugging with valgrind |
| our_CFLAGS='-g -O0 -Wformat-security -Wstrict-prototypes -Wall -Wno-parentheses -Wno-unused-parameter -Wextra' |
| # keep cpp expanded files |
| # -save-temps |
| # Without -Wstack-protector there is no message on functions not protected. |
| # All these are in gnulib or gettext for now. |
| # -fno-omit-frame-pointer is better for debugging with valgrind, but has |
| # some speed penalty |
| our_CFLAGS='-g -O2 -D_FORTIFY_SOURCE=2 -Wformat-security -Wstrict-prototypes -Wall -Wno-parentheses -Wno-missing-braces -Wno-unused-parameter -fstack-protector-all -Wextra -fno-omit-frame-pointer' |
| our_CFLAGS='-g -O2 -Wformat-security -Wstrict-prototypes -Wall -Wno-parentheses -Wno-missing-braces -Wno-unused-parameter -Wextra' |
| ./configure --enable-additional-checks "CFLAGS=$our_CFLAGS" "PERL_EXT_CFLAGS=$our_CFLAGS" |
| ./configure --enable-c-texi2any --enable-additional-checks "CFLAGS=$our_CFLAGS" "PERL_EXT_CFLAGS=$our_CFLAGS" |
| unset our_CFLAGS |
| |