This is the todo list for texi2any
Copyright 2012-2025 Free Software Foundation.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.
Before next release
===================
update libintl-perl before the release
system.h macros are duplicated in C/main/utils.h. Ok?
Update paths to htmlxref.d/Texinfo_*.cnf in these files and in README-hacking
and in texinfo.texi
document command_name in texi2any HTML customization API
change name of substitute
Bugs
====
HTML API
========
Issues
------
Some private functions used in conversion
_convert_printindex_command
_new_document_context
_convert_def_line_type
_set_code_context
_pop_code_context
Missing documentation
=====================
Tree documentation in ParserNonXS.pm
------------------------------------
elided_rawpreformatted, elided_brace_command_arg types.
'comment_at_end' in info hash
alias_of in info hash
source marks.
special_unit_element type (only in HTML code)
Other
-----
Document *XS_EXTERNAL_FORMATTING *XS_EXTERNAL_CONVERSION?
No documentation of Texinfo::Options hashes used in converters:
multiple_at_command_options, converter_cmdline_options,
converter_customization_options, unique_at_command_options.
Texinfo tree reader
===================
Go through misc_args extra as if going through regular arguments and not
as if it was an attribute.
Treat spaces in info as tree elements, not as attributes. With ignorable
as a category. Maybe treat verb delimiter the same. And also braces
of types with implicit braces?
Categories:
ignorable text
text
element begin
element end
accessors for all:
* parent
* source marks
* type?
accessors for elements:
* command_name and/or enum
* attributes (extra/some string info)
* associated_unit?
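The categories and accessors listed above could be pictured with the
following minimal sketch. It is written in Python purely for illustration
and is not a proposed API: the names Category and read_tree, and the toy
dictionary tree with 'cmdname', 'text' and 'contents' keys, are all
assumptions. Whitespace-only text stands in for the ignorable category here,
whereas the idea above concerns spaces currently kept in info.

```python
from enum import Enum


class Category(Enum):
    """The four reader categories listed above."""
    IGNORABLE_TEXT = "ignorable text"
    TEXT = "text"
    ELEMENT_BEGIN = "element begin"
    ELEMENT_END = "element end"


def read_tree(element, parent=None):
    """Yield (category, element, parent) triples in document order.

    The tree is a toy: text elements are dicts with a 'text' key, other
    elements may have 'cmdname' and 'contents' keys (hypothetical layout).
    """
    if "text" in element:
        # Assumption: whitespace-only text stands in for "ignorable" text.
        if element["text"].strip() == "":
            yield (Category.IGNORABLE_TEXT, element, parent)
        else:
            yield (Category.TEXT, element, parent)
        return
    yield (Category.ELEMENT_BEGIN, element, parent)
    for child in element.get("contents", []):
        yield from read_tree(child, element)
    yield (Category.ELEMENT_END, element, parent)
```

The parent is passed along with each item instead of being stored in the
element, which is only one possible way to provide the "parent" accessor.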
Delayed bugs/features
=====================
There is disagreement on the strategy to use for library versioning.
For Gavin, the code that can link against libraries should be the code
with the exact same version as the libraries.
For Patrice it is possible to explicitly say that mixing versions is
not supported, but still follow the libtool versioning scheme such that
if better and compatible libraries are installed they will be preferred
to the same-version library.
Could libraries be compiled with Perl C compiler or other C compiler
and be mixed at the library level? Ask on the list.
Gavin on using CSV files and a Perl script to generate code for Perl and C.
For something like html_style_commands_element.csv, it could potentially
make the code more impenetrable. Suppose for example someone wants to
find which part of the code handles the @sansserif command. If they
search HTML.pm for 'sansserif' that string isn't there. As this file
is shorter, there is less benefit to avoiding duplicating its contents.
However, the purpose, structure and meaning of this file is quite clear.
(Files such as HTML.pm are also not self-contained, accessing information
in files such as Commands.pm, so having another file to access does not
really change the situation.)
For shorter files like default_special_unit_info.csv and
default_direction_strings.csv it is less clear that it offers a net
benefit. To be honest, the function of these files is not particularly
clear to me other than one of them is something to do with node direction
pointers. I don't think someone looking at these files for the first time
would have an easy time figuring out what they are for.
Make building "source marks" optional?
hyphenation: should only appear in toplevel.
Some dubious nesting could be warned against. The parser's context
command stacks could be used for that.
Some erroneous constructs not already warned against:
@table in @menu
@example
@heading A heading
@end example
Example in heading/heading_in_example.
@group outside of @example (maybe there is no need for command stack for
this one if @group can only appear directly
in @example).
There is no warning with a block command between @def* and @def*x,
only for paragraph. Not sure that this can be detected with
the context command stack.
@defun a b c d e f
@itemize @minus
truc
@item t
@end itemize
@defunx g h j k l m
@end defun
Modules included in tta/maintain/lib/ are stable, but still need
to be updated from time to time.
Unicode::EastAsianWidth \p{InFullwidth} could be replaced
by native \p{East_Asian_Width=Fullwidth} + \p{East_Asian_Width=Wide}
when the oldest supported Perl version is 5.12.0 (released in 2010).
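As a rough cross-check of which characters the two native properties cover,
here is a small sketch in Python (not Perl, and not part of texi2any) using
the standard unicodedata module. It assumes, as the note above suggests, that
\p{InFullwidth} is meant to match East_Asian_Width values F (Fullwidth) and
W (Wide). The helper names is_fullwidth_or_wide and display_width are
hypothetical.

```python
import unicodedata


def is_fullwidth_or_wide(char):
    """Return True if char has East_Asian_Width F (Fullwidth) or W (Wide)."""
    return unicodedata.east_asian_width(char) in ("F", "W")


def display_width(text):
    """Count two columns for Fullwidth/Wide characters, one otherwise.

    Simplification: combining and zero-width characters are not handled.
    """
    return sum(2 if is_fullwidth_or_wide(c) else 1 for c in text)
```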
Transliteration/protection with iconv in C leads to a result different from Perl
for some characters. It seems that the iconv result depends on the locale, and
there are quite a bit of ? output, probably when there is no obvious
transliteration. In those cases, the Unidecode transliterations are not
necessarily very good, either.
Sorting indices in C with strxfrm_l using a UTF-8 locale with
LC_COLLATE_MASK on Debian GNU/Linux with glibc is quite consistent with Perl
for numbers and letters, but leads to a different output than with Perl for
non-alphanumeric characters. This is because in Perl we set
'variable' => 'Non-Ignorable' to set Variable Weighting to Non-ignorable (see
http://www.unicode.org/reports/tr10/#Variable_Weighting).
For spaces, the output with Non-Ignorable Variable Weighting looks better for
index sorting, as it makes spaces and punctuation marks sort before
letters. Right now, the XS code calls Perl to get the sorting
collation strings with Non-Ignorable Variable Weighting. The
undocumented XS_STRXFRM_COLLATION_LOCALE customization variable can be used
to specify a locale and use it with strxfrm_l to sort, but it is only
for testing and should not be kept in the long term. The plan is to replace it
with C code that sets Variable Weighting to Non-ignorable, and to keep
calling Perl until then.
Related glibc enhancement request:
request for Non-Ignorable Variable Weighting Unicode collation
https://sourceware.org/bugzilla/show_bug.cgi?id=31658
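The effect of the Variable Weighting setting on index order can be pictured
with a toy sketch in Python. This is not the Unicode Collation Algorithm and
not what Perl or glibc actually compute: the VARIABLE set and the two key
functions are assumptions made up for illustration. key_shifted ignores
spaces and punctuation in the first comparison, key_non_ignorable lets them
take part in it with a weight lower than letters and digits.

```python
# Toy set of "variable" characters (spaces and some punctuation marks).
VARIABLE = set(" -.,;:!?_")


def key_shifted(entry):
    """Variable characters are ignored first, then only break ties."""
    primary = "".join(c for c in entry.lower() if c not in VARIABLE)
    return (primary, entry)


def key_non_ignorable(entry):
    """Variable characters take part in the first comparison and sort
    before letters and digits."""
    weights = [(0, c) if c in VARIABLE else (1, c) for c in entry.lower()]
    return (weights, entry)


entries = ["newest", "new line", "new"]
print(sorted(entries, key=key_shifted))        # ['new', 'newest', 'new line']
print(sorted(entries, key=key_non_ignorable))  # ['new', 'new line', 'newest']
```

With the second key, "new line" sorts right after "new", which is the
ordering described above as looking better for index sorting.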
Missing tests
=============
There is a test of translation in parser in a translation in converter, in
tta/perl/t/init_files_tests.t translation_in_parser_in_translation
It would be nice to also have a translation in parser in a translation
in parser. That would mean having a po/gmo file where the string
translated in the parser for @def* indices, for instance "{name} of {class}"
is translated to a string including @def* commands, like
@deftypeop a b c d e f
AA
@end deftypeop
@documentlanguage fr
@deftypemethod g h i j k l
BB
@end deftypemethod
Unit test of end_line_count for Texinfo/Convert/Paragraph.pm .... containers.
anchor in flushright, on an empty line, with a current byte offset.
Future features
===============
Add the possibility to add text to a parsed document by restarting
parsing, when called as parse_texi_piece or parse_texi_line, by
storing the parser document state not already in document in document.
There would be a customization variable to set the parser to be
restartable, and then parse_texi_piece and parse_texi_line could pass
a document to retrieve the parsing state. This should probably
wait for a clear use case. Currently, the parser is never reused
for different documents in the main codes, only in specific tests.
From Gavin on the preamble_before_beginning implementation:
Another way might be to add special input code to trim off and return
a file prelude. This would move the handling of this from the "parser" code
to the "input" code. This would avoid the problematic "pushing back" of input
and would be a clean way of doing this. It would isolate the handling of
the "\input" line from the other parsing code.
I understand that the main purpose of the preamble_before_beginning element
is not to lose information so that the original Texinfo file could be
regenerated. If that's the case, maybe the input code could return
all the text in this preamble as one long string - it wouldn't have to be
line by line.
See message/thread from Reißner Ernst: Feature request: api docs
https://lists.gnu.org/archive/html/bug-texinfo/2022-02/msg00000.html
Right now VERBOSE is almost not used.
Should we warn if output is on STDOUT and OUTPUT_ENCODING_NAME != MESSAGE_OUTPUT_ENCODING_NAME?
Handle better @exdent in html? (there is a FIXME in the code)
For plaintext, implement an effect of NO_TOP_NODE_OUTPUT
* if true, output some title, possibly based on titlepage
and do not output the Top node.
* if false, current output is ok
Default is false.
In Plaintext, @quotation text could have the right margin narrowed to be more
in line with other output formats.
DocBook
-------
deftypevr, deftypecv: use type and not returnvalue for the type
also informalfigure in @float
also use informaltable or table, for multitable?
Add an @abstract command or similar to Texinfo?
And put in DocBook <abstract>? Beware that DocBook abstract is quite
limited in terms of content, only a title and paragraphs. Although block
commands can be in paragraphs in DocBook, it is not the case for Texinfo,
so it would be very limited.
what about @titlefont in docbook?
maybe use simpara instead of para. Indeed, simpara is for paragraph without
block element within, and it should be that way in generated output.
* in docbook, when there is only one section <article> should be better
than book. Maybe the best way to do that would be passing the
information that there is only one section to the functions formatting
the page header and page footer.
there is a mark= attribute for itemizedlist element for the initial mark
of each item but the standard "does not specify a set of appropriate keywords"
so it cannot be used.
Manual tests
============
Some tests are interesting but are not in the test suite for various
reasons. Many regressions are not really expected with these
tests. They are shown here for information. The list was up to date in
March 2024; it may drift away as test file names or content change.
From tta/perl directory.
Tests in non utf8 locale
------------------------
In practice these tests were tested in latin1. They are not
in the main test suite because a latin1 locale cannot be expected
to be present reliably.
Tests with correct or acceptable results
****************************************
File not found error message with accented characters in its name:
./texi2any.pl not_éxisting.texi
t/formats_encodings.t manual_simple_utf8_with_error
utf8 manual with errors involving non ascii strings
./texi2any.pl ./t/input_files/manual_simple_utf8_with_error.texi
t/formats_encodings.t manual_simple_latin1_with_error
latin1 manual with errors involving non ascii strings
./texi2any.pl ./t/input_files/manual_simple_latin1_with_error.texi
tests/formatting cpp_lines
CPP directive with non ascii characters, utf8 manual
./texi2any.pl -I ./t/include/ ./t/input_files/cpp_lines.texi
accentêd:7: warning: làng is not a valid language code
The file is UTF-8 encoded, the @documentencoding is obeyed which leads,
in the Parser, to a UTF-8 encoding of include file name, and not to the latin1
encoding which should be used for the output messages encoding.
This output is by (Gavin) design.
many_input_files/output_dir_file_non_ascii.sh
non ascii output directory, utf8 manual
./texi2any.pl -o encodé/ ./t/input_files/simplest.texi
test of non ascii included file name in utf8 locale is already in formatting:
formatting/osé_utf8.texi:@include included_akçentêd.texi
./texi2any.pl --force -I ../tests/ ../tests/input/non_ascii/os*_utf8.texi
The file name is utf-8 encoded in messages, which is expected as we do not
decode/encode file names from the command line for messages
osé_utf8.texi:15: warning: undefined flag: vùr
t/80include.t cpp_line_latin1
CPP directive with non ascii characters, latin1 manual
./texi2any.pl --force ./t/input_files/cpp_line_latin1.texi
Need to have recoded file name to latin1 OK, see ../tests/README
tests/encoded manual_include_accented_file_name_latin1
./texi2any.pl --force -I ../tests/built_input/ ../tests/encoded/manual_include_accented_file_name_latin1.texi
latin1 encoded and latex2html in latin1 locale
./texi2any.pl --html --init ext/latex2html.pm ../tests/tex_html/tex_encode_latin1.texi
latin1 encoded and tex4ht in latin1 locale
./texi2any.pl --html --init ext/tex4ht.pm ../tests/tex_html/tex_encode_latin1.texi
cp -p ../tests/tex_html/tex_encode_latin1.texi tex_encodé_latin1.texi
./texi2any.pl --html --init ext/tex4ht.pm tex_encodé_latin1.texi
Firefox can't find tex_encod%uFFFD_latin1_html/Chapter.html (?)
Opened from within the directory, works well.
epub for utf8 encoded manual in latin1 locale
./texi2any.pl --force -I ../tests/ --init ext/epub3.pm ../tests/input/non_ascii/os*_utf8.texi
epub for latin1 encoded manual in latin1 locale
cp ../tests/tex_html/tex_encode_latin1.texi tex_encodé_latin1.texi
./texi2any.pl --init ext/epub3.pm tex_encodé_latin1.texi
./texi2any.pl --force -I ../tests/ -c TEXINFO_OUTPUT_FORMAT=rawtext ../tests/input/non_ascii/os*_utf8.texi
output file name is in latin1, but the encoding inside is utf8 consistent
with the document encoding.
./texi2any.pl --force -I ../tests/ -c TEXINFO_OUTPUT_FORMAT=rawtext ../tests/input/non_ascii/os*_utf8_no_setfilename.texi
output file name is utf8 because the utf8 encoded input file name
is decoded using the locale latin1 encoding keeping the 8bit characters
from the utf8 encoding, and the encoding inside is utf8
consistent with the document encoding.
./texi2any.pl --force -I ../tests/ -o encodé/raw.txt -c TEXINFO_OUTPUT_FORMAT=rawtext ../tests/input/non_ascii/os*_utf8.texi
encodé/raw.txt file name encoded in latin1, and the encoding inside is utf8
consistent with the document encoding.
./texi2any.pl --force -I ../tests/ -c TEXINFO_OUTPUT_FORMAT=rawtext -c 'SUBDIR=subdîr' ../tests/input/non_ascii/os*_utf8.texi
subdîr/osé_utf8.txt file name encoded in latin1, and the encoding inside is utf8
consistent with the document encoding.
./texi2any.pl --set TEXINFO_OUTPUT_FORMAT=debugtree -o résultat/encodé.txt ./t/input_files/simplest_no_node_section.texi
résultat/encodé.txt file name encoded in latin1.
./texi2any.pl --set TEXINFO_OUTPUT_FORMAT=debugtree -o char_latin1_latin1_in_refs_tree.txt ./t/input_files/char_latin1_latin1_in_refs.texi
char_latin1_latin1_in_refs_tree.txt content encoded in latin1
utf8 encoded manual name and latex2html in latin1 locale
./texi2any.pl --verbose -c 'COMMAND_LINE_ENCODING=utf-8' --html --init ext/latex2html.pm -c 'L2H_CLEAN 0' ../tests/input/non_ascii/tex_encod*_utf8.texi
COMMAND_LINE_ENCODING=utf-8 is required in order to have the
input file name correctly decoded as document_name which is used
in init file to set the file names.
latin1 encoded manual name and latex2html in latin1 locale
cp ../tests/tex_html/tex_encode_latin1.texi tex_encodé_latin1.texi
./texi2any.pl -c 'L2H_CLEAN 0' --html --init ext/latex2html.pm tex_encodé_latin1.texi
Tests with incorrect results, though not bugs
*********************************************
utf8 encoded manual name and latex2html in latin1 locale
./texi2any.pl --html --init ext/latex2html.pm -c 'L2H_CLEAN 0' ../tests/input/non_ascii/tex_encod*_utf8.texi
No error, but the file names are like
tex_encodé_utf8_html/tex_encodÃ'$'\203''©_utf8_l2h.html
That's in particular because the document_name is incorrect because it is
decoded as if it was latin1.
utf8 encoded manual name and tex4ht in latin1 locale
./texi2any.pl --html --init ext/tex4ht.pm ../tests/input/non_ascii/tex_encod*_utf8.texi
html file generated by tex4ht with content="text/html; charset=iso-8859-1">,
with character encoded in utf8 <img src="tex_encodé_utf8_tex4ht_tex0x.png" ...>
Tests in utf8 locales
---------------------
The archive epub file is not tested in the automated tests.
epub for utf8 encoded manual in utf8 locale
./texi2any.pl --force -I ../tests/ --init ext/epub3.pm ../tests/input/non_ascii/osé_utf8.texi
The following tests require latin1 encoded file names. Note that it
could be done automatically now with
tta/maintain/copy_change_file_name_encoding.pl.
However, there is already a test with an include file in latin1, it
is enough.
Create the latin1 encoded file from a latin1 console:
cp ../tests/tex_html/tex_encode_latin1.texi tex_encodé_latin1.texi
Run from a UTF-8 locale console. The resulting file has a ? in the name
but the result is otherwise ok.
./texi2any.pl tex_encod*_latin1.texi
The following tests are not important enough to have regression tests
./texi2any.pl --force -I ../tests/ -o encodé/raw.txt -c TEXINFO_OUTPUT_FORMAT=rawtext ../tests/input/non_ascii/os*_utf8.texi
./texi2any.pl --force -I ../tests/ -c TEXINFO_OUTPUT_FORMAT=rawtext -c 'SUBDIR=subdîr' ../tests/input/non_ascii/os*_utf8.texi
Test more interesting in non utf8 locale
./texi2any.pl --set TEXINFO_OUTPUT_FORMAT=debugtree -o résultat/encodé.txt ./t/input_files/simplest_no_node_section.texi
résultat/encodé.txt file name encoded in utf8
./texi2any.pl --set TEXINFO_OUTPUT_FORMAT=debugtree -o char_latin1_latin1_in_refs_tree.txt ./t/input_files/char_latin1_latin1_in_refs.texi
char_latin1_latin1_in_refs_tree.txt content encoded in latin1
Notes on classes names in HTML
==============================
In January 2022 the classes in HTML elements were normalized. There are no
rules, but here are descriptions of the choices made at that time in case one
wants to use the same conventions. The objective was to have the link between
@-commands and classes easy to understand, avoid ambiguities, and have ways to
style most of the output.
The class names without hyphen were only used for @-commands, with one
class attribute on an element maximum for each @-command appearing in the
Texinfo source. It was also attempted to have such a class for all
the @-commands with an effect on output, though the coverage was not perfect:
sometimes it is not easy to select an element that would correspond to the
most logical association with the @-command (case of @*ref @-commands with
both a <cite> and a <a href> for example).
Class names <command>-* with <command> a Texinfo @-command name were
only used for classes marking elements within an @-command but in other
elements than the main element for that @-command, in general sub elements.
For example, a @flushright leads to a <div class="flushright"> where the
@flushright command is and to <p class="flushright-paragraph"> for the
paragraphs within the @flushright.
Class names *-<command> with <command> a Texinfo @-command name were
reserved for uses related to @-command <command>. For example
classes like summary-letter-printindex, cp-entries-printindex or
cp-letters-header-printindex for the different parts of the @printindex
formatting.
def- and -def are used for classes related to @def*, in general without
the specific command name used.
For the classes not associated with @-commands, the names were selected to
correspond to the role in the document rather than to the formatting style.
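The naming conventions above can be summarized with a small sketch in Python.
It is only an illustration of the conventions, not code used by texi2any:
the function classify_class_name, the category labels it returns and the
KNOWN_COMMANDS sample are all assumptions.

```python
# Small illustrative sample of @-command names, not a complete list.
KNOWN_COMMANDS = {"flushright", "printindex"}


def classify_class_name(name, commands=KNOWN_COMMANDS):
    """Return (kind, command) for an HTML class name.

    kind is one of "command", "def", "sub-element", "related" or "role".
    """
    if "-" not in name:
        # Names without hyphen were only used for @-commands.
        return ("command", name)
    first = name.partition("-")[0]
    last = name.rpartition("-")[2]
    if first == "def" or last == "def":
        # def- and -def are used for classes related to @def*.
        return ("def", None)
    if first in commands:
        # <command>-*: element within the @-command, not its main element.
        return ("sub-element", first)
    if last in commands:
        # *-<command>: other uses related to the @-command.
        return ("related", last)
    # Otherwise the name describes a role in the document.
    return ("role", None)
```

For example, "flushright-paragraph" is classified as a sub-element of
flushright and "summary-letter-printindex" as related to printindex.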
In HTML, some @-commands do not have an element with a class associated, or the
association is not perfect. There is @author in @quotation, @-commands affected
by @definfoenclose. @pxref and similar @-commands have no class for references
to external nodes, and don't have the 'See ' in the element for references to
internal nodes. In general, it is because gdt() is used instead of direct
HTML.
Notes on protection of punctuation in nodes (done)
==================================================
This is implemented, in tta/perl/Texinfo/Transformations.pm in _new_node for
Texinfo generation, and in Info with INFO_SPECIAL_CHARS_QUOTE. *[nN]ote
is not protected, though, but it is not clear it would be right to do.
There is a warning with @strong{note...}.
Automatic generation of node names from section names. To be protected:
* in every case
( at the beginning
* In @node line
commas
* In menu entry
* if there is a label
tab comma dot
* if there is no label
:
* In @ref
commas
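The list above can be read as a lookup from context to problematic
characters. The following sketch, in Python for illustration only, is not the
code of _new_node in Transformations.pm: the context names, the CONTEXT_CHARS
table and the function chars_needing_protection are assumptions.

```python
# Characters that are problematic in each context, following the list above.
CONTEXT_CHARS = {
    "node_line": {","},
    "menu_entry_with_label": {"\t", ",", "."},
    "menu_entry_without_label": {":"},
    "ref": {","},
}


def chars_needing_protection(name, context):
    """Return the sorted list of characters of name to protect in context."""
    found = {c for c in name if c in CONTEXT_CHARS[context]}
    # In every case, an opening parenthesis at the beginning is a problem.
    if name.startswith("("):
        found.add("(")
    return sorted(found)
```

For instance, for a section name such as "(a) b, c" used on a @node line,
both the leading parenthesis and the comma are reported.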
In Info
in cross-references. First : is searched. If followed by a : the node
name is found and there is no label. When parsing a node a filename
with ( is searched for. Nested parentheses are taken into account.
Nodes:
* in every case
( at the beginning
* in Node line
commas
* In menu entry and *Note
* if there is a label
tab comma dot
* if there is no label
:
Labels in Info (not index entries, in index entries the last : not in
a quoted node should be used to determine the end of the
index entry).
:
* at the beginning of a line in a @menu
*note more or less everywhere
Interrogations and remarks
==========================
For converters in C, agreed with Gavin that it is better not to
translate a Perl tree in input, but access directly the C tree that
was set up by the XS parser.
There is no forward looking code anymore, so maybe a lex/yacc parser
could be used for the main loop. More simply, a binary tokenizer, at
least, could make for a notable speedup.
From Vincent Belaïche. About svg image files in HTML:
I don't think that supporting svg would be easy: it seems that to embed an
svg picture you need to declare the width x height of the frame in
which you embed it, and this information cannot be derived quite
straightforwardly from the picture.
With @image you can declare width and height but this is intended for
scaling. I am not sure whether or not these arguments can be used
for the purpose of defining that frame...
What I did in 5x5 is that I coded the height of the frame directly in
the macro @FIGURE with which I embed the figure, without going through
an argument.
The @FIGURE @macro is, for html:
@macro FIGURE {F,W}
@html
<div align="center">
<embed src="5x5_\F\.svg" height="276"
type="image/svg+xml"
pluginspage="http://www.adobe.com/svg/viewer/install/" /></div>
@end html
@end macro
In general, external information for cross-references to other manuals from
htmlxref.cnf or htmlxref.d/*.cnf files should be used to determine
the split of a reference manual, and if no information is set, it
is a good idea to generate both a mono and a split manual. Therefore the
following situation is not something that needs to be supported/implemented,
however we keep the information on the javascript code here.
If a manual is split and the person generating the manual wants
references to a mono manual to be redirected to the split files, it should
be possible to create a manual.html file that redirects to the
manual_html/node.html files using the following javascript function:
function redirect() {
switch (location.hash) {
case "#Node1":
location.replace("manual_html/Node1.html#Node1"); break;
case "#Node2" :
location.replace("manual_html/Node2.html#Node2"); break;
...
default:;
}
}
And, in the <body> tag of manual.html:
<body onLoad="redirect();">
Need to make sure that a fix is needed
--------------------------------------
In HTML, HEADERS is used. But not in other modules, especially not in
Plaintext.pm or Info.pm, this is determined by the module used (Plaintext.pm
or Info.pm). No idea whether it is right or wrong.
def/end_of_lines_protected_in_footnote.pl the footnote is
(1) -- category: deffn_name arguments arg2 more args with end of line
and not
(1)
-- category: deffn_name arguments arg2 more args with end of line
It happens this way because the paragraph starts right after the footnote
number.
in HTML, the argument of a quotation is ignored if the quotation is empty,
as in
@quotation thing
@end quotation
Is it really a bug?
In @copying things like some raw formats may be expanded. However it is
not clear that it should be the same as in the main converter. Maybe a
specific list of formats could be passed to Texinfo::Convert::Text::convert,
which would be different (for example Info and Plaintext even if converting
HTML). Not clear that it is a good idea. Also this requires a test, to begin
with.
Punctuation and spaces before @image do not lead to a doubling of space.
In fact @image is completely formatted outside of usual formatting containers.
Not sure what should be the right way?
test in info_test/image_and_punctuation
in info_tests/error_in_footnote there is an error message for each
listoffloats; Line numbers are right, though, so maybe this is not
an issue.
converters_tests/things_before_setfilename there is no error
for anchor and footnote before setfilename. It is not clear that
there should be, though.
In Info, image special directive on sectioning command line length
is taken into account for the underlining characters line count inserted
below the section title. There is no reason to underline the image
special directive. Since the image rendering and length of replacement
text depend on the Info viewer, however, there is no way to know in
advance the length of text to underline (if any). It is therefore unclear
what would be the correct underlining characters count.
An example in formats_encodings/at_commands_in_refs.
When using Perl modules, many strings in debugging output are internal
Perl strings not encoded before being output, leading to
'Wide character in print' messages (in C those strings are always encoded
in UTF-8). Not clear that it is an issue. For example with
export TEXINFO_XS=omit
/usr/bin/perl -w ./../perl/texi2any.pl --force --conf-dir ./../perl/t/init/ --conf-dir ./../perl/init --conf-dir ./../perl/ext -I ./coverage/ -I coverage// -I ./ -I . -I built_input --error-limit=1000 -c TEST=1 --output coverage//out_parser/formatting_macro_expand/ --macro-expand=coverage//out_parser/formatting_macro_expand/formatting.texi -c TEXINFO_OUTPUT_FORMAT=structure ./coverage//formatting.texi --debug=1 2>t.err
HTML5 validation tidy errors that do not need fixing
----------------------------------------------------
# to get only errors:
tidy -qe *.html
Some can also be validation errors in other HTML versions.
missing </a> before <a>
discarding unexpected </a>
nested <a> which happens for @url in @xref, which is valid Texinfo.
Warning: <a> anchor "..." already defined
Should only happen with multiple insertcopying.
Warning: trimming empty <code>
Normally happens only for invalid Texinfo, missing @def* name, empty
@def* line...
<td> attribute "width" not allowed for HTML5
<th> attribute "width" not allowed for HTML5
These attributes are obsolete (though the elements are
still part of the language), and must not be used by authors.
The CSS replacement would be style="width: 40%".
However, width is kept as an attribute in texi2any @multitable output and not
as CSS because it is not style, but table or even line specific formatting.
If the _INLINE_STYLE_WIDTH undocumented option is set, CSS is used.
It is set for EPUB.
See
https://lists.gnu.org/archive/html/bug-texinfo/2024-09/msg00065.html
Specialized synopsis in DocBook
-------------------------------
Use of specialized synopsis in DocBook is not a priority and it is not even
obvious that it is interesting to do so. The following notes explain the
possibilities and issues extensively.
Instead of synopsis it might seem to be relevant to use specialized synopsis,
funcsynopsis/funcprototype for deftype* and some def*, and other for object
oriented. There are many issues such that this possibility does not appear
appealing at all.
1) there is no possibility to have a category. So the category must be
added somewhere as a role= or in the *synopsisinfo, or this should only
be used for specialized @def, like @defun.
2) @defmethod and @deftypemethod cannot really be mapped to methodsynopsis
as the class name is not associated with the method as in Texinfo, but
instead the method should be in a class in docbook.
3) From the docbook reference for funcsynopsis
"For the most part, the processing application is expected to
generate all of the parentheses, semicolons, commas, and so on
required in the rendered synopsis. The exception to this rule is
that the spacing and other punctuation inside a parameter that is a
pointer to a function must be provided in the source markup."
So this means it is language specific (C, as said in the docbook doc)
and one has to remove the parentheses, semicolons, commas.
See also the mails from Per Bothner bug-texinfo, Sun, 22 Jul 2012 01:45:54.
specialized @def, without a need for category:
@defun and @deftypefun
<funcsynopsis><funcprototype><funcdef>TYPE <function>NAME</function><paramdef><parameter>args</parameter></paramdef></funcprototype></funcsynopsis>
specialized @def, without a need for category, but without DocBook synopsis
because of missing class:
@defmethod, @deftypemethod: methodsynopsis cannot be used since the class
is not available
@defivar and @deftypeivar: fieldsynopsis cannot be used since the class
is not available
Generic @def with a need for a category
For deffn deftypefn (and defmac?, defspec?), the possibilities of
funcsynopsis, with a category added could be used:
<funcsynopsis><funcprototype><funcdef role=...>TYPE <function>NAME</function></funcdef><paramdef>PARAMTYPE <parameter>PARAM</parameter></paramdef></funcprototype></funcsynopsis>
Alternatively, use funcsynopsisinfo for the category.
Generic @def with a need for a category, but without DocBook synopsis because
of missing class:
@defop and @deftypeop: methodsynopsis cannot be used since the class
is not available
defcv, deftypecv: fieldsynopsis cannot be used since the class
is not available
Remaining @def without DocBook synopsis because there is no equivalent,
and requires a category
defvr (defvar, defopt), deftypevr (deftypevar)
deftp
Solaris 11
==========
# recent Test::Deep requires perl 5.12
cpan> o conf urllist push http://backpan.perl.org/
cpan RJBS/Test-Deep-1.127.tar.gz
Also possible to install Texinfo dependencies with openCSW, like
pkgutil -y -i CSWhelp2man CSWpm-data-compare CSWpm-test-deep
The system perl may not be suitable to build XS modules, and the
system gawk may be too old, openCSW may be needed. For example:
./configure PERL=/opt/csw/bin/perl GAWK=/opt/csw/bin/gawk CFLAGS='-g'
./configure PERL=/opt/csw/bin/perl GAWK=/opt/csw/bin/gawk CFLAGS='-g' LDFLAGS=-L/opt/csw/lib/ CPPFLAGS='-I/opt/csw/include/' PERL_EXT_LDFLAGS=-L/opt/csw/lib/ LIBS=-liconv
Misc notes
==========
# differences between dist in and out source after maintainer-clean
rm -f texinfo-7.*.*.tar.gz
./autogen.sh
./configure
make maintainer-clean
./autogen.sh
./configure
make dist
rm -rf in_source_dist_contents
mkdir in_source_dist_contents
(cd in_source_dist_contents && tar xvf ../texinfo-7.*.*.tar.gz)
make maintainer-clean
rm -rf bb
mkdir bb
cd bb
../configure
make dist
tar xvf texinfo-7.*.*.tar.gz
diff -u -r ../in_source_dist_contents/texinfo-7.*.*/ texinfo-7.*.*/ > ../build_in_out_source_differences.diff
Test validity of Texinfo XML or docbook
export XML_CATALOG_FILES=~/src/texinfo/tta/maintain/catalog.xml
xmllint --nonet --noout --valid commands.xml
tidy does not seem to be released and/or maintained anymore. It incorrectly
emits an error for ol type attribute.
tidy -qe *.html
profiling: package on debian:
libdevel-nytprof-perl
In doc:
perl -d:NYTProf ../tta/perl/texi2any.pl texinfo.texi --html
perl -d:NYTProf ../tta/perl/texi2any.pl texinfo.texi
nytprofhtml
# firefox nytprof/index.html
Test with 8bit locale:
export LANG=fr_FR; export LANGUAGE=fr_FR; export LC_ALL=fr_FR
xterm &
Turkish locale, interesting as ASCII upper-case letter I can become
a (non-ASCII) dotless i when lower casing. (Eli recommendation).
export LANG=tr_TR.UTF-8; export LANGUAGE=tr_TR.UTF-8; export LC_ALL=tr_TR.UTF-8
On ExtUtils::Embed flags (the documentation tends to be inaccurate):
ldopts: ccdlflags ldflags "libperl and perllibs through MakeMaker + static_ext"
ccopts: ccflags perl_inc
convert to pdf from docbook
xsltproc -o intermediate-fo-file.fo /usr/share/xml/docbook/stylesheet/docbook-xsl/fo/docbook.xsl texinfo.xml
fop -r -pdf texinfo-dbk.pdf -fo intermediate-fo-file.fo
dblatex -o texinfo-dblatex.pdf texinfo.xml
Open a specific info file in Emacs Info reader: C-u C-h i
In tta/tests/, generate Texinfo file for Texinfo TeX coverage
../perl/texi2any.pl --force --error=100000 -c TEXINFO_OUTPUT_FORMAT=plaintexinfo -D valid layout/formatting.texi > formatting_valid.texi
From doc/
texi2pdf -I ../tta/tests/layout/ ../tta/tests/formatting_valid.texi
To generate valgrind .supp rules: --gen-suppressions=all --log-file=gen_supp_rules.log
mkdir -p val_res
PERL_DESTRUCT_LEVEL=2
export PERL_DESTRUCT_LEVEL
for file in t/*.t ; do bfile=`basename $file .t`; echo $bfile; valgrind --suppressions=../texi2any.supp -q perl -w $file > val_res/$bfile.out 2>&1 ; done
With memory leaks
for file in t/*.t ; do bfile=`basename $file .t`; echo $bfile; valgrind --suppressions=../texi2any.supp -q --leak-check=full perl -w $file > val_res/$bfile.out 2>&1 ; done
for file in t/*.t ; do bfile=`basename $file .t`; echo $bfile; valgrind -q --leak-check=full perl -w $file > val_res/$bfile.out 2>&1 ; done
(for file in t/z_misc/*.t ; do bfile=`basename $file .t`; echo z_misc/$bfile; valgrind -q --leak-check=full perl -w $file ; done) > val_res/z_misc.out 2>&1
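The loops over t/*.t above differ only in the command put in front of each test file. A sketch of a wrapper that factors this out; the run_tests_with name is hypothetical, and the variables it sets are not local since plain sh has no local.

```shell
# Hypothetical helper: run every t/*.t file under a command prefix and
# store stdout and stderr of each run in OUT_DIR/<test name>.out.
# Usage: run_tests_with OUT_DIR COMMAND [ARGS...]
run_tests_with () {
  out_dir=$1
  shift
  mkdir -p "$out_dir"
  for file in t/*.t; do
    [ -e "$file" ] || continue   # the glob matched no test file
    bfile=$(basename "$file" .t)
    echo "$bfile"
    # keep going with the next file even if this run fails
    "$@" "$file" > "$out_dir/$bfile.out" 2>&1 || :
  done
}
```

For example, the run with memory leak checks would become:
run_tests_with val_res valgrind --suppressions=../texi2any.supp -q --leak-check=full perl -w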
For tests in tta/tests, a way to have the valgrind call prepended is to add,
in tta/defs:
prepended_command='valgrind --leak-check=full -q'
prepended_command='valgrind --leak-check=full -q --suppressions=../texi2any.supp'
Before the code reorganization and the separation of code linked against
Gnulib and code linked against Perl, in some cases, memory that was not
released/freed, but that should still be accessible at the end of conversion,
was shown by valgrind as being leaked: typically static/global
variables freed upon reuse, or left unreleased on purpose (parser conf, for
example). Some symbols were also shown as ???. It is unclear what the problem
was; the documentation of valgrind hints that the memory appearing as leaked
could have come from a dlclosed object. In that case adding
--keep-debuginfo=yes showed the missing symbols, as described in the valgrind
documentation.
rm -rf t/check_debug_differences/
mkdir t/check_debug_differences/
for file in t/*.t ; do bfile=`basename $file .t`; perl -w $file -d 1 2>t/check_debug_differences/XS_$bfile.err ; done
export TEXINFO_XS_PARSER=0
for file in t/*.t ; do bfile=`basename $file .t`; perl -w $file -d 1 2>t/check_debug_differences/PL_$bfile.err ; done
for file in t/*.t ; do bfile=`basename $file .t`; sed 's/^XS|//' t/check_debug_differences/XS_$bfile.err | diff -u t/check_debug_differences/PL_$bfile.err - > t/check_debug_differences/debug_$bfile.diff; done
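Most of the resulting debug_*.diff files are expected to be empty. A sketch to list only the non-empty ones; the list_debug_differences name is hypothetical.

```shell
# Hypothetical helper: print the names of the non-empty debug_*.diff files.
# Usage: list_debug_differences [DIR]
list_debug_differences () {
  dir=${1:-t/check_debug_differences}
  for diff_file in "$dir"/debug_*.diff; do
    # -s is true only for an existing file with a size greater than zero
    if [ -s "$diff_file" ]; then
      basename "$diff_file"
    fi
  done
}
```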
Full check, including XS conversion and memory leaks in debug:
PERL_DESTRUCT_LEVEL=2
export PERL_DESTRUCT_LEVEL
for file in t/*.t ; do bfile=`basename $file .t`; valgrind -q --leak-check=full --keep-debuginfo=yes perl -w $file -d 1 2>t/check_debug_differences/XS_$bfile.err ; done
Check of XS interface to Perl
export TEXINFO_XS_EXTERNAL_FORMATTING=1
export TEXINFO_XS_EXTERNAL_CONVERSION=1
Analysing memory use:
valgrind massif useful-heap approximate distribution in 2024 (obsolete)
mkdir -p massif
valgrind --tool=massif --massif-out-file=massif/massif_info.out perl -w texi2any.pl ../doc/texinfo.texi
ms_print massif/massif_info.out > massif/ms_print_info.out
16M Perl
36M C tree
50M Perl tree (visible in detailed use, but difficult to do the
imputation right, some may correspond with other uses
of Perl memory)
5M (approximate, not visible in the detailed use, based on difference
in use over time) conversion
With full XS (7.2 64M, with text separate 58.5M, without info_info 56M
with integer extra keys 54M, with source marks as pointers 52.3M)
mkdir -p massif
valgrind --tool=massif --massif-out-file=massif/massif_html.out perl -w texi2any.pl --html ../../doc/texinfo.texi
ms_print massif/massif_html.out > massif/ms_print_html.out
useful-heap
25M = 13.1 + 5.8 + 2.9 + 2.5 + 0.7 Perl
17.8M Tree
6 + 5 = 11M new_element
3.5M reallocate_list
0.5M get_associated_info_key (below threshold in later reports)
2.8M = 0.8 + 0.7 +1.3 text
5.2M = 3.8 (text) + 0.7 (text printindex) + 0.7: conversion,
mainly text in convert_output_output_unit*
(+1.3M by approximate difference with total)
(7.5 + 1.3) - (3.8 + 0.7 + 0.7 + 0.8 +1.3) = 1.5 M Text not imputed
3. - 0.5 = 2.5M remaining not imputed (- get_associated_info_key)
52M TOTAL (for 52.3M reported)
Using callgrind to find the time used by functions
valgrind --tool=callgrind perl -w texi2any.pl ../../doc/texinfo.texi --html
# to avoid cycles (some remain in Perl only code) that mess up the graph:
valgrind --tool=callgrind --separate-callers=3 --separate-recs=10 perl -w texi2any.pl ../../doc/texinfo.texi --html
valgrind --tool=callgrind --separate-callers=4 --separate-recs=11 perl -w texi2any.pl ../../doc/texinfo.texi
kcachegrind callgrind.out.XXXXXX
This is obsolete with output overriding, although the distribution changed
very little after reattributing the shares.
For the Texinfo manual with full XS, in 2024, Perl uses 22% of the time
(for HTML), now only for code hopefully called once. The switch to
global locales for the setlocale calls, which is needed for Perl, also takes
4%. Calling Perl getSortKey uses about 28% (more on sorting and C below).
Decomposition of the time used for the Texinfo manual with full XS
(in percent):
parser: 11.5
index sorting: 30
main conversion to HTML: 24.8 = 54.8 - 30
node redirections: 2.6
prepare conversion units: 2.3
remove document: 1.8
associate internal references: 0.53
prepare unit directions: 0.41
setup indices sort strings: 0.36
reset converter: 0.23
structuring transformation1: 0.19
structuring transformation2: 0.19
remaining Texinfo XS code: 0.35
= 8.95
Perl: 22.5 = 7 + 15.2 + (75.57 - 54.8 - 11.5 - 8.95)
SUM: 98
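The subtotal of the smaller XS steps can be rechecked from the listed shares. A minimal sketch, with the values copied by hand from the list above. LC_ALL=C is set for awk so that a locale with a decimal comma does not change number parsing or output. The result may differ from the 8.95 above in the last digit, since the listed shares are rounded.

```shell
# Sum the shares from "node redirections" to "remaining Texinfo XS code".
xs_small_total=$(printf '%s\n' 2.6 2.3 1.8 0.53 0.41 0.36 0.23 0.19 0.19 0.35 |
  LC_ALL=C awk '{ sum += $1 } END { printf "%.2f\n", sum }')
echo "$xs_small_total"
```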
Setting flags
# some features are only enabled at -O2, but sometimes -O0 is better
# for debugging with valgrind
our_CFLAGS='-g -O0 -Wformat-security -Wstrict-prototypes -Wall -Wno-parentheses -Wno-unused-parameter -Wextra'
# keep cpp expanded files
# -save-temps
# Without -Wstack-protector there is no message on functions not protected.
# All these are in gnulib or gettext for now.
# -fno-omit-frame-pointer is better for debugging with valgrind, but has
# some speed penalty
our_CFLAGS='-g -O2 -D_FORTIFY_SOURCE=2 -Wformat-security -Wstrict-prototypes -Wall -Wno-parentheses -Wno-missing-braces -Wno-unused-parameter -fstack-protector-all -Wextra -fno-omit-frame-pointer'
our_CFLAGS='-g -O2 -Wformat-security -Wstrict-prototypes -Wall -Wno-parentheses -Wno-missing-braces -Wno-unused-parameter -Wextra'
./configure --enable-additional-checks "CFLAGS=$our_CFLAGS" "PERL_EXT_CFLAGS=$our_CFLAGS"
./configure --enable-c-texi2any --enable-additional-checks "CFLAGS=$our_CFLAGS" "PERL_EXT_CFLAGS=$our_CFLAGS"
unset our_CFLAGS
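The three our_CFLAGS values above share most flags, and only the last assignment is in effect if they are pasted together. A sketch of a selector; the select_cflags name and the debug/hardened/default mode names are hypothetical, and the flags are the same as above in a different order.

```shell
# Hypothetical selector for the three flag sets listed above.
# Usage: our_CFLAGS=$(select_cflags debug|hardened|default)
select_cflags () {
  common='-g -Wformat-security -Wstrict-prototypes -Wall -Wno-parentheses -Wno-unused-parameter -Wextra'
  case $1 in
    debug)
      # no optimization, easier to follow with valgrind
      echo "$common -O0" ;;
    hardened)
      echo "$common -O2 -D_FORTIFY_SOURCE=2 -Wno-missing-braces -fstack-protector-all -fno-omit-frame-pointer" ;;
    *)
      echo "$common -O2 -Wno-missing-braces" ;;
  esac
}
```

The configure calls stay unchanged after, for example, our_CFLAGS=$(select_cflags debug).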