| @node Texinfo@asis{::}Convert@asis{::}Unicode |
| @chapter Texinfo::Convert::Unicode |
| |
| @node Texinfo@asis{::}Convert@asis{::}Unicode NAME |
| @section Texinfo::Convert::Unicode NAME |
| |
| Texinfo::Convert::Unicode - Representation as Unicode characters |
| |
| @node Texinfo@asis{::}Convert@asis{::}Unicode SYNOPSIS |
| @section Texinfo::Convert::Unicode SYNOPSIS |
| |
| @verbatim |
| use Texinfo::Convert::Unicode qw(unicode_accent encoded_accents |
| unicode_text); |
| use Texinfo::Convert::Text qw(convert_to_text); |
| |
| my ($contents_element, $stack) |
| = Texinfo::Common::find_innermost_accent_contents($accent); |
| |
| my $formatted_accents = encoded_accents($converter, |
| convert_to_text($contents_element), $stack, $encoding, |
| \&Texinfo::Text::ascii_accent_fallback); |
| @end verbatim |
| |
| @node Texinfo@asis{::}Convert@asis{::}Unicode NOTES |
| @section Texinfo::Convert::Unicode NOTES |
| |
| The Texinfo Perl module main purpose is to be used in @code{texi2any} to convert |
| Texinfo to other formats. There is no promise of API stability. |
| |
| @node Texinfo@asis{::}Convert@asis{::}Unicode DESCRIPTION |
| @section Texinfo::Convert::Unicode DESCRIPTION |
| |
| @code{Texinfo::Convert::Unicode} provides methods dealing with Unicode |
| representation and conversion of Unicode code points, to be used in Texinfo |
| converters. |
| |
| When an encoding supported in Texinfo is given as argument of a method of the |
| module, the accented letters or characters returned by the method should only |
| be represented by Unicode code points if it is known that Perl should manage |
| to convert the Unicode code points to encoded characters in the encoding |
| character set. Note that the actual conversion is done by Perl, not by the |
| module. |
| |
| @node Texinfo@asis{::}Convert@asis{::}Unicode METHODS |
| @section Texinfo::Convert::Unicode METHODS |
| |
| @table @asis |
| @item $result = brace_no_arg_command($command_name, $encoding) |
| @anchor{Texinfo@asis{::}Convert@asis{::}Unicode $result = brace_no_arg_command($command_name@comma{} $encoding)} |
| @cindex @code{brace_no_arg_command} |
| |
| Return the Unicode representation of a command with brace and no argument |
| @emph{$command_name} (like @code{@@bullet@{@}}, @code{@@aa@{@}} or @code{@@guilsinglleft@{@}}), |
| or @code{undef} if the Unicode representation cannot be converted to encoding |
| @emph{$encoding}. |
| |
| @item $possible_conversion = check_unicode_point_conversion($arg, $output_debug) |
| @anchor{Texinfo@asis{::}Convert@asis{::}Unicode $possible_conversion = check_unicode_point_conversion($arg@comma{} $output_debug)} |
| @cindex @code{check_unicode_point_conversion} |
| |
| Check that it is possible to output actual UTF-8 binary bytes |
| corresponding to the Unicode code point string @emph{$arg} (such as |
| @code{201D}). Perl gives a warning and will not output UTF-8 for |
| Unicode non-characters such as U+10FFFF. If the optional |
| @emph{$output_debug} argument is set, a debugging output warning |
| is emitted if the test of the conversion failed. |
| Returns 1 if the conversion is possible and can be attempted, |
| 0 otherwise. |
| |
| @item $result = encoded_accents($converter, $text, $stack, $encoding, $format_accent, $set_case) |
| @anchor{Texinfo@asis{::}Convert@asis{::}Unicode $result = encoded_accents($converter@comma{} $text@comma{} $stack@comma{} $encoding@comma{} $format_accent@comma{} $set_case)} |
| @cindex @code{encoded_accents} |
| |
| @emph{$encoding} is the encoding the accented characters should be encoded to. If |
| @emph{$encoding} not set, @emph{$result} is set to @code{undef}. Nested accents and |
| their content are passed with @emph{$text} and @emph{$stack}. @emph{$text} is the text |
| appearing within nested accent commands. @emph{$stack} is an array reference |
| holding the nested accents texinfo tree elements. In general, @emph{$text} is |
| the formatted contents and @emph{$stack} the stack returned by |
| @ref{Texinfo@asis{::}Common ($contents_element@comma{} |
| \@@accent_commands) = find_innermost_accent_contents($element),, Texinfo::Common::find_innermost_accent_contents}. The function |
| tries to convert as much as possible the accents to @emph{$encoding} starting from the |
| innermost accent. |
| |
| @emph{$format_accent} is a function reference that is used to format the accent |
| commands if there is no encoded character available at some point of the |
| conversion of the @emph{$stack}. @emph{$converter} is a converter object optionaly |
| used by @emph{$format_accent}. It may be @code{undef} if there is no need of |
| converter object in @emph{$format_accent}. |
| |
| The @emph{$set_case} argument is optional. If @emph{$set_case} is positive, the result |
| is upper-cased, while if it is negative, the result is lower-cased. |
| |
| @item $width = string_width($string) |
| @anchor{Texinfo@asis{::}Convert@asis{::}Unicode $width = string_width($string)} |
| @cindex @code{string_width} |
| |
| Return the string width, taking into account the fact that some characters |
| have a zero width (like composing accents) while some have a width of 2 |
| (most chinese characters, for example). |
| |
| @item $result = unicode_accent($text, $accent_command, $index_in_stack, $accents_stack) |
| @anchor{Texinfo@asis{::}Convert@asis{::}Unicode $result = unicode_accent($text@comma{} $accent_command@comma{} $index_in_stack@comma{} $accents_stack)} |
| @cindex @code{unicode_accent} |
| |
| @emph{$text} is the text appearing within an accent command. @emph{$accent_command} |
| should be a Texinfo tree element corresponding to an accent command taking |
| an argument. @emph{$index_in_stack} is the position in the @emph{$accents_stack} |
| of the accent command being converted. The function returns the Unicode |
| representation of the accented character. |
| |
| @item $is_decoded = unicode_point_decoded_in_encoding($encoding, $unicode_point) |
| @anchor{Texinfo@asis{::}Convert@asis{::}Unicode $is_decoded = unicode_point_decoded_in_encoding($encoding@comma{} $unicode_point)} |
| @cindex @code{unicode_point_decoded_in_encoding} |
| |
| Return true if the @emph{$unicode_point} will be encoded in the encoding |
| @emph{$encoding}. The @emph{$unicode_point} should be specified as a four letter |
| string describing an hexadecimal number with letters in upper case |
| (such as @code{201D}). Tables are used to determine if the @emph{$unicode_point} |
| will be encoded, when the encoding does not cover the whole Unicode range. |
| |
| If the encoding is not supported in Texinfo, the result will always be false. |
| |
| @item $result = unicode_text($text, $in_code) |
| @anchor{Texinfo@asis{::}Convert@asis{::}Unicode $result = unicode_text($text@comma{} $in_code)} |
| @cindex @code{unicode_text} |
| |
| Return @emph{$text} with dashes and quotes corresponding, for example to @code{---} or |
| @code{'}, represented as Unicode code points. If @emph{$in_code} is set, the text is |
| considered to be in code style. |
| |
| @end table |
| |
| @node Texinfo@asis{::}Convert@asis{::}Unicode AUTHOR |
| @section Texinfo::Convert::Unicode AUTHOR |
| |
| Patrice Dumas, <bug-texinfo@@gnu.org> |
| |
| @node Texinfo@asis{::}Convert@asis{::}Unicode COPYRIGHT AND LICENSE |
| @section Texinfo::Convert::Unicode COPYRIGHT AND LICENSE |
| |
| Copyright 2010- Free Software Foundation, Inc. See the source file for |
| all copyright years. |
| |
| This library is free software; you can redistribute it and/or modify |
| it under the terms of the GNU General Public License as published by |
| the Free Software Foundation; either version 3 of the License, or (at |
| your option) any later version. |
| |