blob: 4d6e9b7b608cfe8298236badbceb8ed02424110a [file] [log] [blame]
This file discusses issues related to HTML being used alongside or
instead of Info as a documentation format. As this has been discussed
for many years without much progress being made, it was thought helpful
to create a file like this to collect all the issues in one place. Then,
we could avoid having the same discussions over and over again. Anybody
raising the question could be referred to this file so they could be
aware of the current situation.
* Why?
(Excerpt from mail to bug-texinfo@gnu.org in April 2019).
> Documentation for GNU packages and others is often installed in the
> Info format, a plain text format. Using a plaintext based format for
> documentation does not take advantage of bitmapped displays that have
> been available for decades. It does not allow styling of text or
> reflowing of text. Much information is lost in the conversion from
> Texinfo to Info and any attempt in, for example, Emacs to re-add this
> information is unreliable.
>
> Nonetheless, Info viewers have continued to have advantages over web
> browsers. They are fast, and have features for searching the manual
> with index lookup. They allow the use of keyboard commands.
Links to these discussions:
https://lists.gnu.org/archive/html/bug-texinfo/2019-04/msg00001.html
https://lists.gnu.org/archive/html/bug-texinfo/2019-10/msg00010.html
https://lists.gnu.org/archive/html/bug-texinfo/2019-11/msg00003.html
Image support would be another advantage of HTML.
An earlier document on this issue, from Per Bothner in 2016:
http://per.bothner.com/blog/2016/texinfo-roadmap/
* Coordinate with other projects
It's hard to make progress on this due to the need to coordinate with
different projects. It's a chicken-and-egg problem: an HTML Info
program could not be released until people had documentation installed
where the program could read it, but people are also not motivated
to install documentation in a place that no program would use it.
Changes could be needed in other projects
- Distributions (Guix, Debian...) - Install HTML documentation
- Automake - rules for generating and installing HTML documentation
- Emacs - Info mode
- GNOME/other desktop environments - see if work on webkitgtk-info
or other could be integrated into other help systems like yelp
or DevHelp.
- there was some discussion (December 2024) at
https://gitlab.gnome.org/GNOME/yelp/-/issues/220 (although that
issue tracker is often inaccessible due to their anti-bot measures).
A possible first step is to encourage installation of Texinfo-derived
HTML documentation and add support to whatever programs people use
to read locally-installed documentation on their systems. (Better
support for indices and search paths could be added later.)
So that these Info features could be added later, installed HTML
manuals should be installed in standardised locations and contain
needed information. There should be some degree of stability in
the HTML produced by texi2any to allow such features and the Texinfo
project should agree and document what this stability entails (e.g.
inter-manual links, identifying index nodes).
* Location of locally installed HTML files
The location of files is needed for users to find a manual by name
and to follow cross-references between manuals.
The GNU coding standards specify a 'docdir' directory for a package. For
example, '/usr/local/share/doc/YOURPKG'. It would be natural to install
Texinfo HTML documentation under '/usr/local/share/doc/YOURPKG/html'.
However, manual names are distinct from package names. For example,
"emacs" would be the package name for many manuals ("tramp",
"org", "calc", etc.). Hence, a user might not find the tramp
manual under /usr/local/share/doc/tramp/html but somewhere under
/usr/local/share/doc/emacs.
We propose that HTML manuals should be sought in a directory named
datadir/texinfo/html/MANUAL or datadir/texinfo/html/MANUAL_html, where
datadir is as defined in the GNU Coding Standards. (For example,
datadir could take the values "/usr/share" or "/usr/local/share".)
$XDG_DATA_DIRS/texinfo/html should also be checked where
XDG_DATA_DIRS is as defined in the "XDG Base Directory Specification"
(https://specifications.freedesktop.org/basedir-spec/latest/).
Users or distributions would install manuals by copying/symlinking the
manuals to this location. (The /usr/share/texinfo/html directory is akin
to the /usr/share/info/dir file in the Info system.) For example,
HTML manuals could be symlinked from locations under /usr/share/doc to
be accessed by manual name under /use/share/texinfo/html in a standard
format.
Debian alse has some guidance on where HTML documentation could
be installed, specifying /usr/share/doc/PACKAGE or /usr/share/doc/PACKAGE-doc
as installation directories:
https://www.debian.org/doc/debian-policy/ch-docs.html#additional-documentation
File locations on Debian are not completely consistent; e.g. the
make manual is under /usr/share/doc/make-doc/make.html, while gdb
is under /usr/share/doc/gdb-doc/html/gdb.
* Getting names of manuals from links
It was also an issue as to how to get the manual name from a link from
one Texinfo manual to another Texinfo manual.
In recent Texinfo releases, links to other manuals are annotated with the
"data-manual" attribute. This solves the problem of getting the manual
name.
Texinfo manuals are put on the web with a variety of URL formats and
it is not possible to reliably get a manual name from a URL or even
to tell if a URL is to a Texinfo manual or to some other webpapge. It's
unlikely we could persuade everybody to change their URL formats to
be something a browser could parse reliably to get the manual name. Here
are some examples of URL's and the manual names:
https://www.gnu.org/software/trans-coord/manual/cvs/html_node/ - "cvs"
https://gmplib.org/manual/ - "gmp"
http://lilypond.org/doc/v2.22/Documentation/internals/ - "lilypond-internals"
https://gcc.gnu.org/onlinedocs/gfortran/ - "gfortran"
Prior to this, we mooted the idea of outputting links in HTML documents
differently if they were to be installed locally rather than put on the web.
For example, link to "../auth/index.html" instead of
"https://www.gnu.org/software/emacs/manual/html_node/auth/index.html". With
the right installation standards, this would have allowed inter-manual
links to work in regular web browsers (although only for a single
installation location).
* Where to store non-English manuals
Possibly in a subdirectory named with the language code
* Requirements
Simply using the system web browser to access HTML manuals is not
a full Info replacement. The following features would be missing:
- Search in the document's indices should be allowed.
- There should also be facility for handling intermanual-links and searching
a path for manuals installed in multiple locations on a system.
* Browser software
There have been two browsing systems developed specially for reading
the texi2any HTML output:
- info.js
- webkitgtk-info
info.js has acquired a small number of users and is fairly
feature-complete.
(However, the current Texinfo developers (as of July 2025) have little
capacity to push forward development (or adoption) of either system.)
On info.js:
> In attempt to bring some of the benefits of the Info viewers to HTML
> documentation in web browsers, in 2017, as part of Google Summer of
> Code, Matthieu Lirzin worked on a JavaScript interface that works with
> the HTML that texi2any produces. His work is substantially complete.
> ...All the important keyboard commands that work in the Info viewers are
> implemented, including index lookup.
This system is contained within the 'info.js' file (under js/ in
the Texinfo sources), and is added to texi2any HTML output with
the INFO_JS_DIR variable.
However, this system is only fully appropriate for online manuals,
not for locally installed manuals. The reason for this is that it only
handles one manual at a time and does not handle searching for a manual.
On webkitgtk-info:
Another system has been developed using the embedded WebKitGTK browser,
under the 'infog' directory in the Texinfo sources.
Video demo https://www.gnu.org/software/texinfo/video/demo.webm
(from November 2019)
From a private email (October 2020):
> The main problem with the two embedded HTML renderers we used (we also
> developed a program with QtWebEngine in this branch:
> https://git.savannah.gnu.org/cgit/texinfo.git/log/?h=qt-info) is their
> segmented architecture, and the difficulty of getting information in and
> out of them. As web browsers and websites are so complicated and
> unreliable, they make the code which processes and renders the HTML pages
> run in a separate process, so if the renderer crashes or freezes, some
> recovery action can be taken.
>
> Most of the complexity and code in both programs was to work around
> this, creating and using communications channels between the two halves
> of the program. If the process displaying the UI wants access to data
> from within the HTML page that the rendering process has displayed,
> for example, HTML elements for index entries in a page, so that
> index entries can be listed in some widget, then it can't simply
> access those data structures in the address space of the process:
> it has to create, send and receive messages over some channel.
>
> WebKitGTK had an API to access DOM elements in the renderer thread,
> but this API was deprecated in favour of injected JavaScript (I never
> worked out how to use this so am not sure if it is possible to achieve
> the same results). Hence a two-part architecture would have to be
> replaced with a three-part architecture: instead of
>
> UI process<--->Browser process
>
> there would be
>
> UI process<--->Browser process (native code)<-->Browser process (Javascript)
>
> (The QtWebEngine program had a similar architecture except it was
> even worse, with a message-passing system between JavaScript code
> running in different contexts.)
(Since that email was written some use has been made of injected
JavaScript.)
Use of datadir and XDG directories as described above is (as of July 2025)
incomplete, but would be straighforward to add.
Currently infog only supports "split" installed HTML manuals
(meaning a collection of smaller HTML files rather than one large HTML file).