| This is the todo list for texi2any |
| |
| Copyright 2012-2026 Free Software Foundation. |
| |
| Copying and distribution of this file, with or without modification, |
| are permitted in any medium without royalty provided the copyright |
| notice and this notice are preserved. |
| |
| |
| Before next release |
| =================== |
| |
| Document command_is_in_referred_command_stack in texi2any_api |
| |
| verify input_files/cpp_lines parse_refold.pl |
| |
| Bugs |
| ==== |
| |
| HTML API |
| ======== |
| |
| Issues |
| ------ |
| |
Some private functions are used in conversion:
| _convert_printindex_command |
| _new_document_context |
| _convert_def_line_type |
| _set_code_context |
| _pop_code_context |
| |
| |
| Missing documentation |
| ===================== |
| |
Document the "language" used for the tta/perl/t/*.t test results for the tree,
| node and sectioning relations, output units, floats, indices and error |
| messages? |
| |
| Tree documentation in ParserNonXS.pm |
| ------------------------------------ |
| |
Change the documentation strategy: document extra and info as arguments,
as in the SWIG interface?
| |
| Document the tree by showing the relations of inclusion? |
| |
| elided_rawpreformatted, elided_brace_command_arg types. |
| linemacro_arg |
| |
| alias_of in info hash |
| |
| source marks. |
| |
| special_unit_element type (only in HTML code) |
| |
Node, headings and section relations are explicitly and correctly documented in
the HTML customization manual. However, they are also mentioned in the
Texinfo::Structuring POD without being clearly introduced in this POD. See
also FIXMEs in ParserNonXS.pm in =begin comment blocks. It is not clear
whether there needs to be something more explicit in the POD documentation;
what is in the HTML customization manual may be sufficient.
| |
| extra node_number, heading_number and section_number are not documented |
| anywhere. |
| |
| Other |
| ----- |
| |
| Document *XS_EXTERNAL_FORMATTING *XS_EXTERNAL_CONVERSION? |
| |
No documentation of the Texinfo::Options hashes used in converters:
| multiple_at_command_options, converter_cmdline_options, |
| converter_customization_options, unique_at_command_options. |
| |
| |
| Delayed bugs/features |
| ===================== |
| |
According to the libtool documentation and Bruno's experience, on Windows,
variables exported/imported from a DLL need to be declared, at
least when they are accessed from another library, which is not needed
when they are const:
https://mail.gnu.org/archive/html/bug-texinfo/2026-02/msg00139.html
https://mail.gnu.org/archive/html/bug-texinfo/2026-02/msg00150.html
Some variables could be in that situation, although they did not
make the Cygwin GitHub CI nor Eli's MinGW tests fail. They are all set in
libtexinfo.
| The list: |
| C/main/customization_options.h: txi_base_options, txi_base_sorted_options, |
| txi_options_command_map => libtexinfo-convert (converter.c, format_html.c...) |
| C/main/document.h: txi_paths_info => libtexinfo-convert (convert_html.c...), |
| libtexinfo-main (texinfo.c) |
| C/main/translations.h: translation_cache |
| => libtexinfo-convert (converter.c, convert_to_text.c) |
| C/main/unicode.h: unicode_character_brace_no_arg_commands |
| => libtexinfo-convert (html_prepare_converter.c) |
| C/main/utils.h: output_conversions, input_conversions |
| => libtexinfo-convert (html_prepare_converter.c...), |
| libtexinfo-main (swig_interface.c) |
| C/main/utils.h: null_device_names |
| => texi2any.c |
| |
On Windows, the DLLs are not found by texi2any.exe. According to Eli, the
easiest way to fix this could be to copy the DLLs into the directory where
texi2any.exe is installed (in addition to the directory where they are already
copied by "make install"). That is, on Windows there will be two copies of each
DLL installed, not one.
| https://lists.gnu.org/archive/html/bug-texinfo/2026-02/msg00090.html |
| |
In gdt, do not add parent in the tree in the first place instead of removing
it afterwards? Or, possibly even better, mark parent as a weak reference with
Scalar::Util weaken as explained in man perlref (Gavin).
In perlapi for XS: sv_rvweaken.
| |
| inline contents_child_by_index in C? |
| |
The Perl/C interface when no XS code is used for structuring works because
all the C data is built as a Perl structure at one point, after which C code
is not called anymore to modify the tree or other Document data. At that
point the F_DOCM_* flags should remain unset as there is no modification of
C data needing to be built to Perl. Therefore all the calls to XS interfaces
afterwards, mainly on a Document, return the Perl data. This is not
fundamentally problematic, as it works. However, the design of the Document XS
interface was that the Perl structure could be built on demand, and in that
case, it is not possible.
| |
When the structuring code is done in C and conversion is done in Perl, there
is no need to rebuild all the document data to Perl. This is probably
because there is no Perl code modifying the document data without also
modifying the underlying C data, i.e. no case of a missing XS interface in
code run in tests at that point, before conversion. But it is probably
more by chance than by design: if code changing the structure of the tree
that lacks XS to change the C data were called at that point, the C and Perl
data would diverge.
| |
There is disagreement on the strategy to use for library versioning.
For Gavin, the code that can link against the libraries should be the code
with the exact same version as the libraries.
For Patrice, it is possible to explicitly say that mixing versions is
not supported, but still follow the libtool versioning scheme such that
if better and compatible libraries are installed they will be preferred
to same-version libraries.
| |
Could the libraries be compiled with the Perl C compiler or another C compiler
and be mixed at the library level? Ask on the list.
| |
Gavin on using CSV files and a Perl script to generate code for Perl and C:
For something like html_style_commands_element.csv, it could potentially
make the code more impenetrable. Suppose for example someone wants to
find which part of the code handles the @sansserif command. If they
search HTML.pm for 'sansserif', that string isn't there. As this file
is shorter, there is less benefit to avoiding duplicating its contents.
However, the purpose, structure and meaning of this file are quite clear.
(Files such as HTML.pm are also not self-contained, accessing information
in files such as Commands.pm, so having another file to access does not
really change the situation.)
| |
For shorter files like default_special_unit_info.csv and
default_direction_strings.csv it is less clear that it offers a net
benefit. To be honest, the function of these files is not particularly
clear to me, other than that one of them has something to do with node
direction pointers. I don't think someone looking at these files for the
first time would have an easy time figuring out what they are for.
| |
| |
| Make building "source marks" optional? |
| |
| |
hyphenation: should only appear at top level.
| |
| |
Some dubious nesting could be warned against. The parser's context
command stacks could be used for that.
| |
| Some erroneous constructs not already warned against: |
| |
| @table in @menu |
| |
| @example |
| @heading A heading |
| @end example |
| Example in heading/heading_in_example. |
| |
@group outside of @example (maybe there is no need for a command stack for
this one if @group can only appear directly in @example).
| |
There is no warning for a block command between @def* and @def*x,
only for a paragraph. Not sure that this can be detected with
the context command stack.
| |
| @defun a b c d e f |
| |
| @itemize @minus |
| truc |
| @item t |
| @end itemize |
| |
| @defunx g h j k l m |
| |
| @end defun |
| |
| |
| Modules included in tta/maintain/lib/ are stable, but still need |
| to be updated from time to time. |
| |
| Unicode::EastAsianWidth \p{InFullwidth} could be replaced |
| by native \p{East_Asian_Width=Fullwidth} + \p{East_Asian_Width=Wide} |
| when the oldest supported Perl version is 5.12.0 (released in 2010). |
| |
| |
Transliteration/protection with iconv in C leads to a result different from Perl
for some characters. It seems that the iconv result depends on the locale, and
there is quite a bit of '?' output, probably when there is no obvious
transliteration. In those cases, the Unidecode transliterations are not
necessarily very good either.
| |
| |
Sorting indices in C with strxfrm_l using a UTF-8 locale with
LC_COLLATE_MASK on Debian GNU/Linux with glibc is quite consistent with Perl
for numbers and letters, but leads to a different output than Perl for
non-alphanumeric characters. This is because in Perl we set
'variable' => 'Non-Ignorable' to set Variable Weighting to Non-ignorable (see
http://www.unicode.org/reports/tr10/#Variable_Weighting).
For spaces, the output with Non-Ignorable Variable Weighting looks better for
index sorting, as it allows spaces and punctuation marks to sort before
letters. Right now, the XS code calls Perl to get the sorting
collation strings with Non-Ignorable Variable Weighting. The
undocumented XS_STRXFRM_COLLATION_LOCALE customization variable can be used
to specify a locale and use it with strxfrm_l to sort, but it is only
for testing and should not be kept in the long term; the plan is to replace it
with C code that sets Variable Weighting to Non-ignorable and, before that, to
keep calling Perl.
Related glibc enhancement request:
request for Non-Ignorable Variable Weighting Unicode collation
https://sourceware.org/bugzilla/show_bug.cgi?id=31658
| |
| |
| Missing tests |
| ============= |
| |
There is a test of a translation in the parser within a translation in the converter, in
| |
| tta/perl/t/init_files_tests.t translation_in_parser_in_translation |
| |
It would be nice to also have a translation in the parser within a translation
in the parser. That would mean having a po/gmo file where a string
translated in the parser for @def* indices, for instance "{name} of {class}",
is translated to a string including @def* commands, like
| @deftypeop a b c d e f |
| AA |
| @end deftypeop |
| |
| @documentlanguage fr |
| |
| @deftypemethod g h i j k l |
| BB |
| @end deftypemethod |
| |
| |
| Unit test of end_line_count for Texinfo/Convert/Paragraph.pm .... containers. |
| |
| anchor in flushright, on an empty line, with a current byte offset. |
| |
| |
| Future features |
| =============== |
| |
Add the possibility to add text to a parsed document by restarting
parsing, when called as parse_texi_piece or parse_texi_line, by
storing in the document the parser state not already in the document.
There would be a customization variable to set the parser to be
restartable, and then parse_texi_piece and parse_texi_line could pass
a document to retrieve the parsing state. This should probably
wait for a clear use case. Currently, the parser is never reused
for different documents in the main code, only in specific tests.
| |
| |
| From Gavin on the preamble_before_beginning implementation: |
| Another way might be to add special input code to trim off and return |
a file prelude. This would move the handling of this from the "parser" code
| to the "input" code. This would avoid the problematic "pushing back" of input |
| and would be a clean way of doing this. It would isolate the handling of |
| the "\input" line from the other parsing code. |
| |
| I understand that the main purpose of the preamble_before_beginning element |
| is not to lose information so that the original Texinfo file could be |
| regenerated. If that's the case, maybe the input code could return |
| all the text in this preamble as one long string - it wouldn't have to be |
| line by line. |
| |
| |
| See message/thread from Reißner Ernst: Feature request: api docs |
| https://lists.gnu.org/archive/html/bug-texinfo/2022-02/msg00000.html |
| |
Right now VERBOSE is hardly used.
| |
| Should we warn if output is on STDOUT and OUTPUT_ENCODING_NAME != MESSAGE_OUTPUT_ENCODING_NAME? |
| |
Handle @exdent better in HTML? (there is a FIXME in the code)
| |
For Plaintext, implement an effect of NO_TOP_NODE_OUTPUT:
* if true, output some title, possibly based on titlepage,
and do not output the Top node.
* if false, the current output is ok.
The default is false.
| |
| In Plaintext, @quotation text could have the right margin narrowed to be more |
| in line with other output formats. |
| |
| |
| DocBook |
| ------- |
| |
| deftypevr, deftypecv: use type and not returnvalue for the type |
| |
| also informalfigure in @float |
| |
| also use informaltable or table, for multitable? |
| |
Add an @abstract command or similar to Texinfo?
And put it in DocBook <abstract>? Beware that the DocBook abstract is quite
limited in terms of content: only a title and paragraphs. Although block
commands can be in paragraphs in DocBook, this is not the case for Texinfo,
so it would be very limited.
| |
| what about @titlefont in docbook? |
| |
maybe use simpara instead of para. Indeed, simpara is for paragraphs without
block elements within, and it should be that way in the generated output.
| |
* in DocBook, when there is only one section, <article> would be better
than <book>. Maybe the best way to do that would be passing the
information that there is only one section to the functions formatting
the page header and page footer.
| |
there is a mark= attribute on the itemizedlist element for the initial mark
of each item, but the standard "does not specify a set of appropriate keywords",
so it cannot be used.
| |
| |
| Manual tests |
| ============ |
| |
Some tests are interesting but are not in the test suite for various
reasons. It is not really expected to have many regressions with these
tests. They are shown here for information. This list was up to date in
March 2024; it may drift as test file names or content change.
Commands are run from the tta/perl directory.
| |
| |
| Tests in non utf8 locale |
| ------------------------ |
| |
In practice these tests were run in latin1. They are not
in the main test suite because a latin1 locale cannot reliably be
expected to be present.
| |
| Tests with correct or acceptable results |
| **************************************** |
| |
| File not found error message with accented characters in its name: |
| ./texi2any.pl not_éxisting.texi |
| |
| t/formats_encodings.t manual_simple_utf8_with_error |
| utf8 manual with errors involving non ascii strings |
| ./texi2any.pl ./t/input_files/manual_simple_utf8_with_error.texi |
| |
| t/formats_encodings.t manual_simple_latin1_with_error |
| latin1 manual with errors involving non ascii strings |
| ./texi2any.pl ./t/input_files/manual_simple_latin1_with_error.texi |
| |
| tests/formatting cpp_lines |
| CPP directive with non ascii characters, utf8 manual |
| ./texi2any.pl -I ./t/include/ ./t/input_files/cpp_lines.texi |
| accentêd:7: warning: làng is not a valid language code |
The file is UTF-8 encoded; the @documentencoding is obeyed, which leads,
in the Parser, to a UTF-8 encoding of the include file name, and not to the
latin1 encoding that should be used for the output messages encoding.
This output is by design (Gavin).
| |
| many_input_files/output_dir_file_non_ascii.sh |
| non ascii output directory, utf8 manual |
| ./texi2any.pl -o encodé/ ./t/input_files/simplest.texi |
| |
A test of a non ascii included file name in a utf8 locale is already in formatting:
| formatting/osé_utf8.texi:@include included_akçentêd.texi |
| ./texi2any.pl --force -I ../tests/ ../tests/input/non_ascii/os*_utf8.texi |
| The file name is utf-8 encoded in messages, which is expected as we do not |
| decode/encode file names from the command line for messages |
| osé_utf8.texi:15: warning: undefined flag: vùr |
| |
| t/80include.t cpp_line_latin1 |
| CPP directive with non ascii characters, latin1 manual |
| ./texi2any.pl --force ./t/input_files/cpp_line_latin1.texi |
| |
Need to have the file name recoded to latin1 to be OK, see ../tests/README
| tests/encoded manual_include_accented_file_name_latin1 |
| ./texi2any.pl --force -I ../tests/built_input/ ../tests/encoded/manual_include_accented_file_name_latin1.texi |
| |
| latin1 encoded and latex2html in latin1 locale |
| ./texi2any.pl --html --init ext/latex2html.pm ../tests/tex_html/tex_encode_latin1.texi |
| |
| latin1 encoded and tex4ht in latin1 locale |
| ./texi2any.pl --html --init ext/tex4ht.pm ../tests/tex_html/tex_encode_latin1.texi |
| |
| cp -p ../tests/tex_html/tex_encode_latin1.texi tex_encodé_latin1.texi |
| ./texi2any.pl --html --init ext/tex4ht.pm tex_encodé_latin1.texi |
| Firefox can't find tex_encod%uFFFD_latin1_html/Chapter.html (?) |
Opened from within the directory, it works well.
| |
| epub for utf8 encoded manual in latin1 locale |
| ./texi2any.pl --force -I ../tests/ --init ext/epub3.pm ../tests/input/non_ascii/os*_utf8.texi |
| |
| epub for latin1 encoded manual in latin1 locale |
| cp ../tests/tex_html/tex_encode_latin1.texi tex_encodé_latin1.texi |
| ./texi2any.pl --init ext/epub3.pm tex_encodé_latin1.texi |
| |
| ./texi2any.pl --force -I ../tests/ -c TEXINFO_OUTPUT_FORMAT=rawtext ../tests/input/non_ascii/os*_utf8.texi |
| output file name is in latin1, but the encoding inside is utf8 consistent |
| with the document encoding. |
| |
| ./texi2any.pl --force -I ../tests/ -c TEXINFO_OUTPUT_FORMAT=rawtext ../tests/input/non_ascii/os*_utf8_no_setfilename.texi |
output file name is utf8 because the utf8 encoded input file name
is decoded using the locale latin1 encoding, keeping the 8bit characters
from the utf8 encoding; the encoding inside is utf8,
consistent with the document encoding.
| |
| ./texi2any.pl --force -I ../tests/ -o encodé/raw.txt -c TEXINFO_OUTPUT_FORMAT=rawtext ../tests/input/non_ascii/os*_utf8.texi |
| encodé/raw.txt file name encoded in latin1, and the encoding inside is utf8 |
| consistent with the document encoding. |
| |
| ./texi2any.pl --force -I ../tests/ -c TEXINFO_OUTPUT_FORMAT=rawtext -c 'SUBDIR=subdîr' ../tests/input/non_ascii/os*_utf8.texi |
| subdîr/osé_utf8.txt file name encoded in latin1, and the encoding inside is utf8 |
| consistent with the document encoding. |
| |
| ./texi2any.pl --set TEXINFO_OUTPUT_FORMAT=debugtree -o résultat/encodé.txt ./t/input_files/simplest_no_node_section.texi |
| résultat/encodé.txt file name encoded in latin1. |
| |
| ./texi2any.pl --set TEXINFO_OUTPUT_FORMAT=debugtree -o char_latin1_latin1_in_refs_tree.txt ./t/input_files/char_latin1_latin1_in_refs.texi |
| char_latin1_latin1_in_refs_tree.txt content encoded in latin1 |
| |
| utf8 encoded manual name and latex2html in latin1 locale |
| ./texi2any.pl --verbose -c 'COMMAND_LINE_ENCODING=utf-8' --html --init ext/latex2html.pm -c 'L2H_CLEAN 0' ../tests/input/non_ascii/tex_encod*_utf8.texi |
COMMAND_LINE_ENCODING=utf-8 is required in order to have the
input file name correctly decoded as document_name, which is used
in the init file to set the file names.
| |
| latin1 encoded manual name and latex2html in latin1 locale |
| cp ../tests/tex_html/tex_encode_latin1.texi tex_encodé_latin1.texi |
| ./texi2any.pl -c 'L2H_CLEAN 0' --html --init ext/latex2html.pm tex_encodé_latin1.texi |
| |
| Tests with incorrect results, though not bugs |
| ********************************************* |
| |
| utf8 encoded manual name and latex2html in latin1 locale |
| ./texi2any.pl --html --init ext/latex2html.pm -c 'L2H_CLEAN 0' ../tests/input/non_ascii/tex_encod*_utf8.texi |
| No error, but the file names are like |
| tex_encodé_utf8_html/tex_encodÃ'$'\203''©_utf8_l2h.html |
That's in particular because the document_name is incorrect, as it is
decoded as if it were latin1.
| |
| utf8 encoded manual name and tex4ht in latin1 locale |
| ./texi2any.pl --html --init ext/tex4ht.pm ../tests/input/non_ascii/tex_encod*_utf8.texi |
| html file generated by tex4ht with content="text/html; charset=iso-8859-1">, |
with characters encoded in utf8: <img src="tex_encodé_utf8_tex4ht_tex0x.png" ...>
| |
| |
| Tests in utf8 locales |
| --------------------- |
| |
| The archive epub file is not tested in the automated tests. |
| |
| epub for utf8 encoded manual in utf8 locale |
| ./texi2any.pl --force -I ../tests/ --init ext/epub3.pm ../tests/input/non_ascii/osé_utf8.texi |
| |
The following tests require latin1 encoded file names. Note that this
could now be done automatically with
tta/maintain/copy_change_file_name_encoding.pl.
However, there is already a test with an include file in latin1, which
is enough.
Create the latin1 encoded file from a latin1 console:
cp ../tests/tex_html/tex_encode_latin1.texi tex_encodé_latin1.texi
Run from a UTF-8 locale console. The resulting file has a ? in the name
but the result is otherwise ok.
| ./texi2any.pl tex_encod*_latin1.texi |
| |
The following tests are not important enough to have regression tests:
| ./texi2any.pl --force -I ../tests/ -o encodé/raw.txt -c TEXINFO_OUTPUT_FORMAT=rawtext ../tests/input/non_ascii/os*_utf8.texi |
| ./texi2any.pl --force -I ../tests/ -c TEXINFO_OUTPUT_FORMAT=rawtext -c 'SUBDIR=subdîr' ../tests/input/non_ascii/os*_utf8.texi |
| |
Tests more interesting in a non utf8 locale:
| ./texi2any.pl --set TEXINFO_OUTPUT_FORMAT=debugtree -o résultat/encodé.txt ./t/input_files/simplest_no_node_section.texi |
| résultat/encodé.txt file name encoded in utf8 |
| |
| ./texi2any.pl --set TEXINFO_OUTPUT_FORMAT=debugtree -o char_latin1_latin1_in_refs_tree.txt ./t/input_files/char_latin1_latin1_in_refs.texi |
| char_latin1_latin1_in_refs_tree.txt content encoded in latin1 |
| |
| |
| Texinfo manuals translations |
| ============================ |
| |
A possible system, for translation only, with no special handling of
cross-references, is to use po4a, possibly with the new texinfoparser
format based on the Texinfo SWIG interface.
| |
| Discussions |
| ----------- |
| |
| Main TODO |
| Support installation of manuals in different languages, along these lines: |
| . support a LINGUAS file or variable saying which subdirs LL in the |
| source to descend into (under doc/). |
| . within each subdir LL, install the info files into $infodir/LL, |
| and run install-info on $infodir/LL/dir. |
| . info (both emacs and standalone) should read $infodir/$LANG/dir |
| as the first dir file, and likewise read info files first from |
| $infodir/$LANG, before falling back to $infodir. |
| . consider ways to avoid installing images in both places. |
| In fact, images probably need to be installed in a subdir |
| $infodir/MANUAL/ in the first place, to avoid conflicts of having |
| the same image name in different manuals. |
| For a test case, see texinfo cvs, with its one translated manual |
| (info-fr.texi). |
| From Wojciech Polak. |
This is only one possibility, and it would not necessarily need a
change to the info readers; changing installation and INFOPATH
could be enough. There is another often used possibility: naming
manuals with a suffix for the language.
| |
| https://lists.gnu.org/archive/html/bug-texinfo/2008-07/msg00005.html |
| https://lists.gnu.org/archive/html/bug-texinfo/2008-07/msg00022.html |
using GUILE to store parsed Texinfo input, so one can write Scheme
code to play with it
| - generating skeletons of Texinfo files to start a translation (it's |
| very useful to mention original node names in English in the |
| translation), |
- "linking" the same nodes in different languages,
| so that e.g. a cross-reference to a node untranslated in some language |
| can be replaced by a cross-reference to the same node in a different |
| language. |
| - making statistics about translations (e.g. which parts of a document |
| are translated), which is useful for huge documents that cannot be |
easily translated in one go, or looking for nodes that exist in one
language but not another (if you allow translators to write new
documentation in their own language).
| |
| https://lists.gnu.org/archive/html/bug-texinfo/2014-09/msg00002.html |
| https://lists.gnu.org/archive/html/bug-texinfo/2014-09/msg00010.html |
| Currently the usual practice is that a translated document is just another |
| document with some -xx ending in the base name (where xx refers to the |
| language, e.g. fr for French). |
In my opinion an alternative method would be for the translated document to
bear exactly the same name but be in a language specific directory
named xx, with some fallbacks to another language (meaning ../oo, where oo
refers to the other language) when the manual does not exist.
| I think that the alternative method is closer to what people usually do for Web |
sites, in which each language is its own tree replica (file/directory names are
the same, but only file contents differ).
About node name translation for presentation to the user (there should remain
a true node name that stays the same whatever the translation).
| |
| https://lists.gnu.org/archive/html/bug-texinfo/2025-02/msg00079.html |
About a second argument for @anchor for the displayed text:
| * For translators, having the same anchor name as in the original |
| document helps a lot in translation. And vice versa, it helps |
| maintainers who don't speak the particular language to still do |
| various maintenance tasks easier. |
| |
| * It helps avoid issues with transliteration. All redirection file |
| names are in a single language, namely English. |
| |
| https://lists.gnu.org/archive/html/bug-texinfo/2025-02/msg00080.html |
My feeling is that another solution would be better, for example a macro
that separates the association with the English label from the output
itself, based on the document language. Something like
| |
| @macro translateanchor {en, name} |
| @anchor{\name\} |
| @end macro |
| |
| @macro translateref {en, name} |
| \name\ |
| @end macro |
| |
| @translateanchor{in english, en français} |
| |
| @ref{@translateref{in english, en français}, ...}. |
| |
It could probably be possible to do something using customization
functions to get the macro call arguments by using the source marks.
| |
| https://lists.gnu.org/archive/html/bug-texinfo/2025-05/msg00008.html |
| entry in 'dir' |
| |
| * Guix: (guix.es). Gestión del software instalado y la |
| configuración del sistema. |
| ... |
| * Guix: (guix). Manage installed software and system |
| configuration. |
| |
| I don't know if any changes are needed to Texinfo to support installation |
| of Info files in multiple languages. It is something that could |
| perhaps be implemented with existing Info features, such as search paths |
| and installing Info files in different directories depending on the language. |
| |
| https://lists.gnu.org/archive/html/help-texinfo/2024-01/msg00009.html |
| @dircategory |
| what about using “Translated manuals” for now, |
| since the number of translated manuals is very low? |
| |
| The day when we have a reasonable amount of translated manuals, it will |
| make sense to have them under the original English manuals. I guess it |
| won’t be too difficult to make the change. |
| |
Would @author also be used for the translator?
| |
| For ex. emacs-gnutls.txi has |
| @author by Ted Zlatanov |
| |
| In French that would be |
| @author par Ted Zlatanov |
| @author traduit en français par Achille Talon |
| |
| https://lists.gnu.org/archive/html/help-texinfo/2024-01/msg00057.html |
| # Translating the Emacs manuals |
| |
| ## Location |
| |
Translated manual sources are located in the ’doc/lang’ directory, under the
directory whose name corresponds to the translated language.
| |
| For ex. French manual sources are found under ’doc/lang/fr’. |
| |
| The structure of the language folders should match the structure of the English |
| manuals (i.e. ’misc’, ’man’, ’lispref’, ’lispintro’, ’emacs’). |
| |
| ## Format |
| |
The translated manuals should be in the same format as the English sources:
Texinfo.
| |
### Texinfo specific issues
| |
Until Emacs/Texinfo provide better solutions, here are a few rules to follow:
| |
| - ’@node’ items should be translated but should be accompanied by an ’@anchor’ |
| that contains the original English ’@node’ contents. |
| |
| - You should add a ’@documentlanguage’ directive that includes your language. |
| |
| For ex. ’@documentlanguage zh’ |
| |
| - ’@author’ can be used for the translator’s name. |
| |
For ex. `@author traduit en français par Achille Talon`
| |
| ## Committing the files |
| |
| Like other source files, translations should be committed to a separate branch |
| for revision. Ideally, the branch name should be suggestive of what they |
| contain. |
| |
| For ex: ’origin/translations/emacs-lisp-intro-ar.texi’ |
| |
| Before committing the files for revision, ensure that they have been properly |
| checked for spelling/grammar/typography by at least using the tools that Emacs |
| provides. |
| |
You should also make sure that the Texinfo files build properly on your system.
| |
| Once the files are committed, announce the commit to the emacs-devel list so |
| that fellow translators can check the file and review it. |
| |
| ## Discussions about translation issues |
| |
| Translation-related discussions are welcome on the emacs-devel list. |
| Discussions specific to your language do not have to take place in English. |
| |
| ## Notes about the original document |
| |
| During the course of the translation, you will find parts of the original |
document that need to be updated or otherwise fixed. If you do not intend to
| modify the original documents right away, do not add notes to the original |
| documents but rather keep such notes inside your translation as TODO items |
| until you action them. |
| |
| ## Translation teams |
| |
The number of words in the Emacs manuals is above 2,000,000. While one
individual could theoretically translate all the files, it is more practical to
work in language teams.
| |
| If you have a small group of translators willing to help, make sure that the |
| files are properly reviewed before committing them (see above.) |
| |
| ## Translation processes |
| |
Emacs does not yet provide tools that significantly help the translation
process. A few ideal functions would be:
| |
| - automatic lookup of a list of glossary items when starting to work on a |
| translation “unit” (paragraph or otherwise), such glossary terms should be |
| easily insertable at point |
| |
| - automatic lookup of past translations to check for similarity and improve |
| homogeneity over the whole document set, such past translation matches should |
| be easily insertable at point |
| |
| Although the PO format has not been developed with documentation in mind, it is |
| well known among free software translation teams and you can easily use the |
’po4a’ utility to convert Texinfo to PO for work in translation tools that
| support the PO format. |
| |
| https://lists.gnu.org/archive/html/bug-texinfo/2014-09/msg00010.html |
| ---> related to docstrings in Emacs |
A second thing was internationalisation for *DocStrings*; in that context I
was suggesting that docstrings could contain a link to some manual variable
or function description, so switching the Help language based on the configured
locale would just mean switching to the correct manual translation.
| |
| |
| LilyPond translations system |
| ---------------------------- |
| |
| LilyPond has a very elaborate system: |
| https://lilypond.org/doc/v2.25/Documentation/contributor-big-page#cross_002dreferences |
| This system handles both translations and inclusion of LilyPond |
| snippets in output manuals. |
| |
| Cross-references link target @rchanges: Changes file |
All these commands also have a @...named version, which allows specifying the
displayed text for the reference as a second argument.  This is mainly used in
translations, for example @rlearningnamed{I'm hearing voices, J'entends des
voix}.
| |
| Documentation/en/macros.itexi |
| @macro rchanges{TEXT} |
| @ref{\TEXT\,,,changes-big-page,Changes} |
| @end macro |
| |
| @macro rchangesnamed{TEXT,DISPLAY} |
| @ref{\TEXT\,,\DISPLAY\,changes-big-page,Changes} |
| @end macro |
| |
| https://lilypond.org/doc/v2.25/Documentation/contributor-big-page#translating-the-website-and-other-texinfo-documentation |
| |
| Node names are not translated, only the section titles are. That is, every |
| piece in the original file like |
| |
| @node Foo bar |
| @section_command Bar baz |
| |
| should be translated as |
| |
| @node Foo bar |
| @section_command translation of Bar baz |
| |
You should at least write the node definition in the expected source file and
define all its parent nodes; for each node you have defined this way but have
not translated, insert a line that contains @untranslated.  That is, for each
untranslated node you should end up with something like
| |
| @node Foo bar |
| @section_command translation of Bar baz |
| |
| @untranslated |
| |
| ... |
However, some music snippets contain text that shows in the rendered music,
and sometimes translating this text really helps the user understand the
documentation; in this case, and only in this case, you may as an exception
translate text in the music snippet, and then you must add a line immediately
before the @lilypond block, starting with
| |
| @c KEEP LY |
| |
| When you encounter |
| |
| @lilypondfile[...,texidoc,...]{filename.ly} |
| |
in the source, open Documentation/snippets/filename.ly, translate the texidoc
header field it contains, wrap it with texidocMY-LANGUAGE = "...", and write
it into Documentation/MY-LANGUAGE/texidocs/filename.texidoc.  Additionally,
you may translate the snippet's title in the doctitle header field in case
doctitle is a fragment option used in @lilypondfile; you can do this exactly
the same way as texidoc.  For instance,
Documentation/es/texidocs/filename.texidoc may contain
| |
| doctitlees = "Spanish title baz" |
| texidoces = " |
| Spanish translation blah |
| " |
| |
| https://lilypond.org/doc/v2.25/Documentation/contributor-big-page#documentation-translation-maintenance |
| |
| make ISOLANG=MY-LANGUAGE check-translation |
| |
At the beginning of each translated file (except for .po files), there is a
committish (i.e., an SHA-1 tag consisting of 40 hexadecimal digits that
uniquely identifies a specific commit in a git repository) representing the
revision of the English sources from which you translated the file.
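A small sketch of what such a committish looks like and how its format can be
checked (the value and the check are illustrative, not part of the LilyPond
tooling):

```shell
# Example committish value (hypothetical commit id):
committish="8d138e9bd7a468fdaabe8dd4e3acb8c8e34a1bb7"
# A full SHA-1 committish is exactly 40 lowercase hexadecimal digits:
if printf '%s\n' "$committish" | grep -Eq '^[0-9a-f]{40}$'; then
  echo "committish format ok"
else
  echo "committish format wrong"
fi
```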
| |
When you have pulled and updated a translation, it is very important to update
this committish in the files you have completely updated (and only in these).
| |
| |
| Notes on classes names in HTML |
| ============================== |
| |
In January 2022 the classes in HTML elements were normalized.  There are no
strict rules, but here is a description of the choices made at that time, in
case one wants to use the same conventions.  The objective was to make the
link between @-commands and classes easy to understand, to avoid ambiguities,
and to provide ways to style most of the output.
| |
Class names without a hyphen were only used for @-commands, with at most one
class attribute on an element for each @-command appearing in the Texinfo
source.  It was also attempted to have such a class for all the @-commands
with an effect on output, though the coverage was not perfect; sometimes it is
not easy to select an element that would correspond to the most logical
association with the @-command (the case of @*ref @-commands with both a
<cite> and an <a href>, for example).
| |
Class names <command>-* with <command> a Texinfo @-command name were only used
for classes marking elements within an @-command but in elements other than
the main element for that @-command, in general sub-elements.  For example, a
@flushright leads to a <div class="flushright"> where the @flushright command
is, and to <p class="flushright-paragraph"> for the paragraphs within the
@flushright.
| |
Class names *-<command> with <command> a Texinfo @-command name were reserved
for uses related to the @-command <command>.  For example, classes like
summary-letter-printindex, cp-entries-printindex or
cp-letters-header-printindex are used for the different parts of the
@printindex formatting.
| |
| def- and -def are used for classes related to @def*, in general without |
| the specific command name used. |
| |
| For the classes not associated with @-commands, the names were selected to |
| correspond to the role in the document rather than to the formatting style. |
| |
| |
In HTML, some @-commands do not have an element with an associated class, or
the association is not perfect.  Examples are @author in @quotation and
@-commands affected by @definfoenclose.  @pxref and similar @-commands have no
class for references to external nodes, and do not have the 'See ' in the
element for references to internal nodes.  In general, this is because gdt()
is used instead of direct HTML.
| |
| |
| Notes on protection of punctuation in nodes (done) |
| ================================================== |
| |
| This is implemented, in tta/perl/Texinfo/TransformationsNonXS.pm in |
| _new_node for Texinfo generation, and in Info with INFO_SPECIAL_CHARS_QUOTE. |
*[nN]ote is not protected, though, but it is not clear it would be right to
do so.  There is a warning with @strong{note...}.
| |
| Automatic generation of node names from section names. To be protected: |
| * in every case |
| ( at the beginning |
| * In @node line |
| commas |
| * In menu entry |
| * if there is a label |
| tab comma dot |
| * if there is no label |
| : |
| * In @ref |
| commas |
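A hypothetical sketch of the comma protection applied when deriving a @node
line from a section title (the real implementation is _new_node in
tta/perl/Texinfo/TransformationsNonXS.pm; the sed rule here only illustrates
the comma case):

```shell
# Replace commas in a generated node name with @comma{}, since a literal
# comma would be taken as an argument separator on the @node line.
section_title='Invoking foo, with options'
node_name=$(printf '%s' "$section_title" | sed 's/,/@comma{}/g')
printf '@node %s\n' "$node_name"
```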
| |
| In Info |
| |
in cross-references, a first : is searched for.  If it is followed by a :,
the node name is found and there is no label.  When parsing a node, a
filename with ( is searched for.  Nested parentheses are taken into account.
| |
| Nodes: |
| * in every case |
| ( at the beginning |
| * in Node line |
| commas |
| * In menu entry and *Note |
| * if there is a label |
| tab comma dot |
| * if there is no label |
| : |
| |
Labels in Info (not index entries; in index entries the last : not in
a quoted node should be used to determine the end of the
index entry).
| : |
| |
| * at the beginning of a line in a @menu |
| *note more or less everywhere |
| |
| |
| Interrogations and remarks |
| ========================== |
| |
| In general, it is better to avoid extra information in the tree that |
| can be retrieved, or that is better in separate data, such as node, |
| sectioning commands and heading relations. In particular to avoid |
| referring to elements in other parts of the tree. Right now, there |
| isn't any extra information referring to elements in other parts of |
| the tree (no extra element, contents nor directions). It would |
| be better to keep it like this, but we keep the corresponding code in |
| case we need those extra categories back again. |
| |
For converters in C, agreed with Gavin that, in general, it is better not to
translate a Perl tree on input, but to access directly the C tree that was
set up by the XS parser.  When using the TreeElements interface, being able to
create a C element based on a Perl element is needed, though.
| |
| There is no forward looking code anymore, so maybe a lex/yacc parser |
| could be used for the main loop. More simply, a binary tokenizer, at |
| least, could make for a notable speedup. |
| |
From Vincent Belaïche.  About svg image files in HTML:

I don't think that supporting svg would be easy: it seems that to embed an
svg picture you need to declare the width x height of the frame in
which you embed it, and this information cannot be derived quite
straightforwardly from the picture.
With @image you can declare width and height but this is intended for
scaling.  I am not sure whether these arguments can be used
for the purpose of defining that frame...
What I did in 5x5 is that I coded the height of the frame directly in
the macro @FIGURE with which I embed the figure, without going through
an argument.
| The @FIGURE @macro is, for html: |
| @macro FIGURE {F,W} |
| @html |
| <div align="center"> |
| <embed src="5x5_\F\.svg" height="276" |
| type="image/svg+xml" |
| pluginspage="http://www.adobe.com/svg/viewer/install/" /></div> |
| @end html |
| @end macro |
| |
| |
| In general, external information for cross-references to other manuals from |
| htmlxref.cnf or htmlxref.d/*.cnf files should be used to determine |
| the split of a reference manual, and if no information is set, it |
| is a good idea to generate both a mono and a split manual. Therefore the |
| following situation is not something that needs to be supported/implemented, |
| however we keep the information on the javascript code here. |
| |
If a manual is split and the person generating the manual wants references to
the mono manual to be redirected to the split files, it should be possible to
create a manual.html file that redirects to the manual_html/node.html files
using the following javascript function:
| |
| function redirect() { |
| switch (location.hash) { |
| case "#Node1": |
| location.replace("manual_html/Node1.html#Node1"); break; |
| case "#Node2" : |
| location.replace("manual_html/Node2.html#Node2"); break; |
| ... |
| default:; |
| } |
| } |
| |
| And, in the <body> tag of manual.html: |
| <body onLoad="redirect();"> |
| |
| |
Need to check whether a fix is needed
-------------------------------------
| |
In HTML, HEADERS is used.  It is not used in other modules; in particular in
Plaintext.pm and Info.pm the behavior is determined by the module used
(Plaintext.pm or Info.pm).  No idea whether this is right or wrong.
| |
| def/end_of_lines_protected_in_footnote.pl the footnote is |
| (1) -- category: deffn_name arguments arg2 more args with end of line |
| and not |
| (1) |
| -- category: deffn_name arguments arg2 more args with end of line |
| It happens this way because the paragraph starts right after the footnote |
| number. |
| |
| in HTML, the argument of a quotation is ignored if the quotation is empty, |
| as in |
| @quotation thing |
| @end quotation |
| Is it really a bug? |
| |
In @copying things like some raw formats may be expanded.  However it is
not clear that it should be the same as in the main converter.  Maybe a
| specific list of formats could be passed to Texinfo::Convert::Text::convert, |
| which would be different (for example Info and Plaintext even if converting |
| HTML). Not clear that it is a good idea. Also this requires a test, to begin |
| with. |
| |
Punctuation and spaces before @image do not lead to a doubling of space.
In fact @image is completely formatted outside of the usual formatting
containers.  Not sure what the right way should be.
| test in info_test/image_and_punctuation |
| |
| in info_tests/error_in_footnote there is an error message for each |
| listoffloats; Line numbers are right, though, so maybe this is not |
| an issue. |
| |
| converters_tests/things_before_setfilename there is no error |
| for anchor and footnote before setfilename. It is not clear that |
| there should be, though. |
| |
In Info, the length of an image special directive on a sectioning command
line is taken into account for the count of underline characters inserted
below the section title.  There is no reason to underline the image special
directive.  However, since the image rendering and the length of the
replacement text depend on the Info viewer, there is no way to know in
advance the length of text to underline (if any).  It is therefore unclear
what the correct underline character count would be.
| An example in formats_encodings/at_commands_in_refs. |
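The basic rule in question can be sketched as follows: the underline inserted
below a section title in Info has one character per column of the rendered
title, which is exactly what becomes ambiguous when part of the line (such as
an image directive) has an unknown rendered width.

```shell
# One underline character per column of the title text (the underline
# character itself varies by sectioning level; '*' is just an example).
title='Some Section Title'
printf '%s\n' "$title"
printf '%*s\n' "${#title}" '' | tr ' ' '*'
```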
| |
| When using Perl modules, many strings in debugging output are internal |
| Perl strings not encoded before being output, leading to |
| 'Wide character in print' messages (in C those strings are always encoded |
| in UTF-8). Not clear that it is an issue. For example with |
| export TEXINFO_XS=omit |
| /usr/bin/perl -w ./../perl/texi2any.pl --force --conf-dir ./../perl/t/init/ --conf-dir ./../perl/init --conf-dir ./../perl/ext -I ./coverage/ -I coverage// -I ./ -I . -I built_input --error-limit=1000 -c TEST=1 --output coverage//out_parser/formatting_macro_expand/ --macro-expand=coverage//out_parser/formatting_macro_expand/formatting.texi -c TEXINFO_OUTPUT_FORMAT=structure ./coverage//formatting.texi --debug=1 2>t.err |
| |
| |
| HTML5 validation tidy errors that do not need fixing |
| ---------------------------------------------------- |
| |
| # to get only errors: |
| tidy -qe *.html |
| |
| Some can also be validation errors in other HTML versions. |
| |
| missing </a> before <a> |
| discarding unexpected </a> |
| nested <a> which happens for @url in @xref, which is valid Texinfo. |
| |
| Warning: <a> anchor "..." already defined |
| Should only happen with multiple insertcopying. |
| |
| Warning: trimming empty <code> |
| Normally happens only for invalid Texinfo, missing @def* name, empty |
| @def* line... |
| |
| <td> attribute "width" not allowed for HTML5 |
| <th> attribute "width" not allowed for HTML5 |
| These attributes are obsolete (though the elements are |
| still part of the language), and must not be used by authors. |
| The CSS replacement would be style="width: 40%". |
| However, width is kept as an attribute in texi2any @multitable output and not |
| as CSS because it is not style, but table or even line specific formatting. |
| If the _INLINE_STYLE_WIDTH undocumented option is set, CSS is used. |
| It is set for EPUB. |
| See |
| https://lists.gnu.org/archive/html/bug-texinfo/2024-09/msg00065.html |
| |
| |
| Specialized synopsis in DocBook |
| ------------------------------- |
| |
Use of specialized synopses in DocBook is not a priority and it is not even
obvious that it would be interesting to do so.  The following notes explain
the possibilities and issues extensively.

Instead of synopsis it might seem relevant to use specialized synopses,
funcsynopsis/funcprototype for deftype* and some def*, and others for
object-oriented constructs.  There are many issues, such that this
possibility does not appear appealing at all.
| |
| 1) there is no possibility to have a category. So the category must be |
| added somewhere as a role= or in the *synopsisinfo, or this should only |
| be used for specialized @def, like @defun. |
| |
2) @defmethod and @deftypemethod cannot really be mapped to methodsynopsis,
as the class name is not associated with the method as in Texinfo; instead,
the method should be inside a class in DocBook.
| |
| 3) From the docbook reference for funcsynopsis |
| "For the most part, the processing application is expected to |
| generate all of the parentheses, semicolons, commas, and so on |
| required in the rendered synopsis. The exception to this rule is |
| that the spacing and other punctuation inside a parameter that is a |
| pointer to a function must be provided in the source markup." |
| |
So this means it is language specific (C, as said in the DocBook
documentation) and one has to remove the parentheses, semicolons and commas.
| |
See also the mails from Per Bothner on bug-texinfo, Sun, 22 Jul 2012 01:45:54.
| |
| specialized @def, without a need for category: |
| @defun and @deftypefun |
| <funcsynopsis><funcprototype><funcdef>TYPE <function>NAME</function><paramdef><parameter>args</parameter></paramdef></funcprototype></funcsynopsis> |
| |
| specialized @def, without a need for category, but without DocBook synopsis |
| because of missing class: |
| @defmethod, @deftypemethod: methodsynopsis cannot be used since the class |
| is not available |
| @defivar and @deftypeivar: fieldsynopsis cannot be used since the class |
| is not available |
| |
| Generic @def with a need for a category |
For deffn and deftypefn (and defmac?, defspec?), the possibilities of
funcsynopsis, with a category added, could be used:
| <funcsynopsis><funcprototype><funcdef role=...>TYPE <function>NAME</function></funcdef><paramdef>PARAMTYPE <parameter>PARAM</parameter></paramdef></funcprototype></funcsynopsis> |
| |
| Alternatively, use funcsynopsisinfo for the category. |
| |
| Generic @def with a need for a category, but without DocBook synopsis because |
| of missing class: |
| @defop and @deftypeop: methodsynopsis cannot be used since the class |
| is not available |
| defcv, deftypecv: fieldsynopsis cannot be used since the class |
| is not available |
| |
| Remaining @def without DocBook synopsis because there is no equivalent, |
| and requires a category |
| defvr (defvar, defopt), deftypevr (deftypevar) |
| deftp |
| |
| |
| Solaris 11 |
| ========== |
| |
| # recent Test::Deep requires perl 5.12 |
| cpan> o conf urllist push http://backpan.perl.org/ |
| cpan RJBS/Test-Deep-1.127.tar.gz |
| |
| Also possible to install Texinfo dependencies with openCSW, like |
| pkgutil -y -i CSWhelp2man CSWpm-data-compare CSWpm-test-deep |
| |
The system perl may not be suitable to build XS modules, and the system
gawk may be too old; openCSW may be needed.  For example:
| ./configure PERL=/opt/csw/bin/perl GAWK=/opt/csw/bin/gawk CFLAGS='-g' LDFLAGS=-L/opt/csw/lib/ CPPFLAGS='-I/opt/csw/include/' PERL_EXT_CFLAGS='-g' PERL_EXT_CPPFLAGS='-I/opt/csw/include/' PERL_EXT_LDFLAGS=-L/opt/csw/lib/ |
| |
| Misc notes |
| ========== |
| |
| # differences between dist in and out source after maintainer-clean |
| rm -f texinfo-7.*.*.tar.gz |
| ./autogen.sh |
| ./configure |
| make maintainer-clean |
| ./autogen.sh |
| ./configure |
| make dist |
| rm -rf in_source_dist_contents |
| mkdir in_source_dist_contents |
| (cd in_source_dist_contents && tar xvf ../texinfo-7.*.*.tar.gz) |
| make maintainer-clean |
| rm -rf bb |
| mkdir bb |
| cd bb |
| ../configure |
| make dist |
| tar xvf texinfo-7.*.*.tar.gz |
| diff -u -r ../in_source_dist_contents/texinfo-7.*.*/ texinfo-7.*.*/ > ../build_in_out_source_differences.diff |
| |
| Test validity of Texinfo XML or docbook |
| export XML_CATALOG_FILES=~/src/texinfo/tta/maintain/catalog.xml |
| xmllint --nonet --noout --valid commands.xml |
| |
tidy does not seem to be released and/or maintained anymore.  It incorrectly
emits an error for the ol type attribute.
| tidy -qe *.html |
| |
| profiling: package on debian: |
| libdevel-nytprof-perl |
| In doc: |
| perl -d:NYTProf ../tta/perl/texi2any.pl texinfo.texi --html |
| perl -d:NYTProf ../tta/perl/texi2any.pl texinfo.texi |
| nytprofhtml |
| # firefox nytprof/index.html |
| |
| Test with 8bit locale: |
| export LANG=fr_FR; export LANGUAGE=fr_FR; export LC_ALL=fr_FR |
| xterm & |
| |
| Turkish locale, interesting as ASCII upper-case letter I can become |
| a (non-ASCII) dotless i when lower casing. (Eli recommendation). |
| export LANG=tr_TR.UTF-8; export LANGUAGE=tr_TR.UTF-8; export LC_ALL=tr_TR.UTF-8 |
| |
| On ExtUtils::Embed flags (the documentation tends to be inaccurate): |
| ldopts: ccdlflags ldflags "libperl and perllibs through MakeMaker + static_ext" |
| ccopts: ccflags perl_inc |
| |
| convert to pdf from docbook |
| xsltproc -o intermediate-fo-file.fo /usr/share/xml/docbook/stylesheet/docbook-xsl/fo/docbook.xsl texinfo.xml |
| fop -r -pdf texinfo-dbk.pdf -fo intermediate-fo-file.fo |
| |
| dblatex -o texinfo-dblatex.pdf texinfo.xml |
| |
| Open a specific info file in Emacs Info reader: C-u C-h i |
| |
| Count reference of the SV |
| Devel::Peek::SvREFCNT. Not actually used because it cannot load in an eval. |
| Instead implemented in |
| Texinfo::ManipulateTree::SvREFCNT |
| Count reference of the object pointed to |
| Devel::Refcount::refcount |
| |
| In tta/tests/, generate Texinfo file for Texinfo TeX coverage |
| ../perl/texi2any.pl --force --error=100000 -c TEXINFO_OUTPUT_FORMAT=plaintexinfo -D valid layout/formatting.texi > formatting_valid.texi |
| |
| From doc/ |
| texi2pdf -I ../tta/tests/layout/ ../tta/tests/formatting_valid.texi |
| |
| To generate valgrind .supp rules: --gen-suppressions=all --log-file=gen_supp_rules.log |
| |
| mkdir -p val_res |
| PERL_DESTRUCT_LEVEL=2 |
| export PERL_DESTRUCT_LEVEL |
| for file in t/*.t ; do bfile=`basename $file .t`; echo $bfile; valgrind --suppressions=../texi2any.supp -q perl -w $file > val_res/$bfile.out 2>&1 ; done |
| |
| With memory leaks |
| for file in t/*.t ; do bfile=`basename $file .t`; echo $bfile; valgrind --suppressions=../texi2any.supp -q --leak-check=full perl -w $file > val_res/$bfile.out 2>&1 ; done |
| for file in t/*.t ; do bfile=`basename $file .t`; echo $bfile; valgrind -q --leak-check=full perl -w $file > val_res/$bfile.out 2>&1 ; done |
| (for file in t/z_misc/*.t ; do bfile=`basename $file .t`; echo z_misc/$bfile; valgrind -q --leak-check=full perl -w $file ; done) > val_res/z_misc.out 2>&1 |
| |
For tests in tta/tests, a way to have a valgrind call prepended is to add,
in tta/defs:
| prepended_command='valgrind --leak-check=full -q' |
| prepended_command='valgrind --leak-check=full -q --suppressions=../texi2any.supp' |
| |
Before the code reorganization and the separation of code linked against
Gnulib from code linked against Perl, in some cases memory that was not
released/freed, but that should still be accessible at the end of conversion,
was shown by valgrind as being leaked: typically static/global variables
freed upon reuse, or left unreleased on purpose (parser conf, for example),
and some symbols were shown as ???.  It is unclear what the problem was; the
valgrind documentation hints that the memory appearing as leaked could have
come from a dlclosed object.  In that case adding --keep-debuginfo=yes
showed the missing symbols, as described in the valgrind documentation.
| |
| rm -rf t/check_debug_differences/ |
| mkdir t/check_debug_differences/ |
| for file in t/*.t ; do bfile=`basename $file .t`; perl -w $file -d 1 2>t/check_debug_differences/XS_$bfile.err ; done |
| export TEXINFO_XS_PARSER=0 |
| for file in t/*.t ; do bfile=`basename $file .t`; perl -w $file -d 1 2>t/check_debug_differences/PL_$bfile.err ; done |
| for file in t/*.t ; do bfile=`basename $file .t`; sed 's/^XS|//' t/check_debug_differences/XS_$bfile.err | diff -u t/check_debug_differences/PL_$bfile.err - > t/check_debug_differences/debug_$bfile.diff; done |
| |
| Full check, including XS conversion and memory leaks in debug: |
| PERL_DESTRUCT_LEVEL=2 |
| export PERL_DESTRUCT_LEVEL |
| for file in t/*.t ; do bfile=`basename $file .t`; valgrind -q --leak-check=full --keep-debuginfo=yes perl -w $file -d 1 2>t/check_debug_differences/XS_$bfile.err ; done |
| |
| Check of XS interface to Perl |
| export TEXINFO_XS_EXTERNAL_FORMATTING=1 |
| export TEXINFO_XS_EXTERNAL_CONVERSION=1 |
| |
| |
| Analysing memory use: |
valgrind massif useful-heap approximate distribution in 2024 (obsolete)
| mkdir -p massif |
| valgrind --tool=massif --massif-out-file=massif/massif_info.out perl -w texi2any.pl ../doc/texinfo.texi |
| ms_print massif/massif_info.out > massif/ms_print_info.out |
| 16M Perl |
| 36M C tree |
| 50M Perl tree (visible in detailed use, but difficult to do the |
| imputation right, some may correspond with other uses |
| of Perl memory) |
| 5M (approximate, not visible in the detailed use, based on difference |
| in use over time) conversion |
| |
| With full XS (7.2 64M, with text separate 58.5M, without info_info 56M |
| with integer extra keys 54M, with source marks as pointers 52.3M) |
| mkdir -p massif |
| valgrind --tool=massif --massif-out-file=massif/massif_html.out perl -w texi2any.pl --html ../../doc/texinfo.texi |
| ms_print massif/massif_html.out > massif/ms_print_html.out |
| useful-heap |
| 25M = 13.1 + 5.8 + 2.9 + 2.5 + 0.7 Perl |
| 17.8M Tree |
| 6 + 5 = 11M new_element |
| 3.5M reallocate_list |
| 0.5M get_associated_info_key (below threshold in later reports) |
| 2.8M = 0.8 + 0.7 +1.3 text |
| 5.2M = 3.8 (text) + 0.7 (text printindex) + 0.7: conversion, |
| mainly text in convert_output_output_unit* |
| (+1.3M by approximate difference with total) |
| (7.5 + 1.3) - (3.8 + 0.7 + 0.7 + 0.8 +1.3) = 1.5 M Text not imputed |
| 3. - 0.5 = 2.5M remaining not imputed (- get_associated_info_key) |
| 52M TOTAL (for 52.3M reported) |
| |
| |
| Using callgrind to find the time used by functions |
| |
| valgrind --tool=callgrind perl -w texi2any.pl ../../doc/texinfo.texi --html |
| # to avoid cycles (some remain in Perl only code) that mess up the graph: |
| valgrind --tool=callgrind --separate-callers=3 --separate-recs=10 perl -w texi2any.pl ../../doc/texinfo.texi --html |
| valgrind --tool=callgrind --separate-callers=4 --separate-recs=11 perl -w texi2any.pl ../../doc/texinfo.texi |
| kcachegrind callgrind.out.XXXXXX |
| |
| This is obsolete with output overriding, although the distribution changed |
| very little after reattributing the shares. |
| For the Texinfo manual with full XS, in 2024, Perl uses 22% of the time |
| (for html), now only for code hopefully called once. The switch to |
global locales for setlocale calling that is needed for Perl also takes 4%.
| Calling Perl getSortKey uses about 28% (more on sorting and C below). |
| Decomposition of the time used for the Texinfo manual with full XS |
| (in percent): |
| parser: 11.5 |
| index sorting: 30 |
| main conversion to HTML: 24.8 = 54.8 - 30 |
| |
| node redirections: 2.6 |
| prepare conversion units: 2.3 |
| remove document: 1.8 |
| associate internal references: 0.53 |
| prepare unit directions: 0.41 |
| setup indices sort strings: 0.36 |
| reset converter: 0.23 |
| structuring transformation1: 0.19 |
| structuring transformation2: 0.19 |
| remaining Texinfo XS code: 0.35 |
| = 8.95 |
| Perl: 22.5 = 7 + 15.2 + (75.57 - 54.8 - 11.5 - 8.95) |
| SUM: 98 |
| |
| |
| Setting flags |
# some features are only enabled at -O2, but sometimes -O0 is better
| # for debugging with valgrind |
| our_CFLAGS='-g -O0 -Wformat-security -Wstrict-prototypes -Wall -Wno-parentheses -Wno-unused-parameter -Wextra' |
| # keep cpp expanded files |
| # -save-temps |
| # Without -Wstack-protector there is no message on functions not protected. |
| # All these are in gnulib or gettext for now. |
| # -fno-omit-frame-pointer is better for debugging with valgrind, but has |
| # some speed penalty |
| our_CFLAGS='-g -O2 -D_FORTIFY_SOURCE=2 -Wformat-security -Wstrict-prototypes -Wall -Wno-parentheses -Wno-missing-braces -Wno-unused-parameter -fstack-protector-all -Wextra -fno-omit-frame-pointer' |
| our_CFLAGS='-g -O2 -Wformat-security -Wstrict-prototypes -Wall -Wno-parentheses -Wno-missing-braces -Wno-unused-parameter -Wextra' |
| ./configure --with-swig --enable-additional-checks "CFLAGS=$our_CFLAGS" "PERL_EXT_CFLAGS=$our_CFLAGS" |
| ./configure --with-swig --enable-using-c-texi2any --enable-additional-checks "CFLAGS=$our_CFLAGS" "PERL_EXT_CFLAGS=$our_CFLAGS" |
| unset our_CFLAGS |
| |
| # test non-XS build |
| ./configure --disable-perl-xs --with-swig --enable-additional-checks "CFLAGS=$our_CFLAGS" PERL_EXT_CFLAGS=fail |
| or |
| ./configure --with-swig --enable-additional-checks "CFLAGS=$our_CFLAGS" PERL_EXT_CC=fail PERL_EXT_CFLAGS=fail |